Opened 4 years ago

Last modified 3 weeks ago

#14744 reopened defect

Automate upload of latest Tor Browser to cloud services

Reported by: ilv
Owned by: ilv
Priority: High
Milestone:
Component: Applications/GetTor
Version:
Severity: Normal
Keywords:
Cc: sukhbir, mrphs, gk, boklm
Actual Points:
Parent ID: #8542
Points:
Reviewer:
Sponsor:

Description

Currently, delivering the latest Tor Browser version requires manually uploading the files every time a new version of Tor Browser is released. This could easily be automated thanks to RecommendedTBBVersions, which would help to avoid delivering old Tor Browser versions (see #12502). A preliminary script for this can be found here.
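The version check that this automation relies on can be sketched as follows. This is only a minimal Python 3 sketch, not the GetTor script: it assumes RecommendedTBBVersions is a JSON list of version strings, some of which carry a platform suffix such as "-Linux". The function name, the suffix list, and the sample data are hypothetical.

```python
import json

# Assumed platform suffixes; the real file may use others.
PLATFORM_SUFFIXES = ("-Linux", "-MacOS", "-Windows")


def needs_upload(recommended_json, uploaded_version):
    """Return True if uploaded_version is not among the recommended versions.

    recommended_json is the text of RecommendedTBBVersions, assumed here to
    be a JSON list such as ["4.0.4", "4.0.5-Linux"].
    """
    recommended = set()
    for entry in json.loads(recommended_json):
        # Drop an assumed platform suffix: "4.0.5-Linux" -> "4.0.5".
        for suffix in PLATFORM_SUFFIXES:
            if entry.endswith(suffix):
                entry = entry[:-len(suffix)]
                break
        recommended.add(entry)
    return uploaded_version not in recommended


# Hypothetical sample data, not a real copy of the file.
sample = '["4.0.4", "4.0.5", "4.0.5-Linux", "4.0.5-MacOS", "4.0.5-Windows"]'
print(needs_upload(sample, "4.0.3"))
print(needs_upload(sample, "4.0.5"))
```

A scheduled job could call a check like this and only start the upload scripts when it returns True.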

Change History (16)

comment:1 Changed 4 years ago by ilv

Note: manually here means that you have to *run a script*, not upload every file by yourself. The goal is to automate the execution of that script.

comment:2 Changed 4 years ago by ilv

As part of the integration of gettor as a tor2web feature, evilaviv3 has made some great improvements to the previous code here and here. These changes fix issues related to security, like possible directory traversals and https certificate validation. It also uses twisted instead of a system call to wget.

I will apply these improvements to the current script in GetTor.

comment:3 Changed 4 years ago by ilv

Resolution: implemented
Status: new → closed

Deployed.

comment:4 Changed 4 years ago by mrphs

Deployed as in... deployed on the running GetTor machine? or implemented and pushed to the repository?

In a typical Tor workflow, ideally you'd create a branch for your new feature and push it to the git, then set the ticket as 'needs_review'. Then $someone would review the code and if everything's fine it gets merged to the master. Then it can get deployed.

While skipping all this could speed up the process, it might also introduce unwanted (human) errors.

comment:5 Changed 4 years ago by ilv

Sure, my bad. This is deployed on the GetTor machine. I did a final test when 4.0.5 went out, and a cron job was added to check for updates every 12 hours. The code is already in the repo under the develop branch, so yes, I should have merged it to master I guess.

Anyway, thanks for the advice! I'm all ears for more suggestions.

comment:6 in reply to: 2 Changed 4 years ago by isis

Replying to ilv:

As part of the integration of gettor as a tor2web feature, evilaviv3 has made some great improvements to the previous code here and here. These changes fix issues related to security, like possible directory traversals and https certificate validation. It also uses twisted instead of a system call to wget.

I will apply these improvements to the current script in GetTor.


Hey ilv! Great work! I see that your current script still uses os.system(cmd)… were you still planning to use Twisted? Using os.system() is really not recommended in the Python world.

Some issues I see with the current implementation are:

  1. If the os.system("wget […]") command fails entirely, or only downloads a portion of a bundle, you'll never know because you're not checking the returned exit status code.
  2. There is no mechanism for resuming downloads, if #1 happens.
  3. Doing
    for provider in UPLOAD_SCRIPTS:
        os.system("python2.7 %s" % UPLOAD_SCRIPTS[provider])
    
    doesn't scale to more provider scripts than the Gettor machine has CPU cores, since most Python scripts will stupidly hog an entire core. It also doesn't take into account memory limitations (and thus, the more providers Gettor has, the more likely this code is to OOM the Gettor machine), nor network bandwidth limitations (nor the effect that any network bandwidth limitations might have on other upload scripts being executed).
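Points 1 and 2 can be addressed without leaving wget. The following is a minimal Python 3 sketch, not the GetTor code: run_checked and download are hypothetical helper names, and the retry count is an arbitrary choice. It relies on wget exiting with a non-zero status on failure and on its -c option to continue a partially downloaded file.

```python
import subprocess
import sys


def run_checked(cmd, retries=3):
    """Run cmd (a list of arguments) until it exits with status 0.

    Returns True on success, False once `retries` attempts have failed.
    """
    for attempt in range(1, retries + 1):
        status = subprocess.call(cmd)
        if status == 0:
            return True
        print("attempt %d failed with exit status %d" % (attempt, status),
              file=sys.stderr)
    return False


def download(url, dest_dir):
    """Download url into dest_dir, resuming a partial file if one exists."""
    # "-c" continues a partial download; "-P" sets the target directory.
    return run_checked(["wget", "-c", "-P", dest_dir, url])
```

Because every retry passes -c again, a bundle that was only partly fetched is continued rather than restarted, and the caller learns from the return value whether the download ever succeeded.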

Second, which doesn't matter, but the syntax is a bit odd; normally one might do

for provider, script in UPLOAD_SCRIPTS.items():
    os.system("python2.7 %s" % script)

or, if nothing is using provider, then the for loop should more optimally look like:

for script in UPLOAD_SCRIPTS.values():
    […]
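Combined with an exit status check, the loop above might look like the following. This is a hedged Python 3 sketch: it assumes UPLOAD_SCRIPTS is a dict mapping provider names to script paths, as the loops above suggest, and run_upload_scripts is a hypothetical name. Like os.system(), subprocess.call() blocks until the child exits, so the scripts still run one after another.

```python
import subprocess
import sys


def run_upload_scripts(upload_scripts, interpreter=sys.executable):
    """Run each provider's script in turn; return the providers that failed."""
    failed = []
    for provider, script in sorted(upload_scripts.items()):
        # A list of arguments avoids passing the path through a shell.
        status = subprocess.call([interpreter, script])
        if status != 0:
            failed.append(provider)
    return failed
```

Returning the failed providers lets the caller retry or report them instead of silently ignoring a broken upload.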

By using Twisted instead, particularly if you have the service_identity module installed, and then with a trivially implementable amount of extra code, having leaf or root certificate pinning is possible. Not to mention the speed increases and parallelisation that become possible using Twisted. If you want an example of a standalone script for downloading something over TLS with Twisted, BridgeDB's script for downloading the list of Tor Exit relays (into memory or a file, in this case) might be helpful, as well as the way BridgeDB uses this script as a Protocol (twisted.internet.protocol.Protocol) and manages that Protocol within a Twisted program (so that the list in this case is loaded directly into memory for the servers in the cluster without wasting a bunch of time doing disk I/O). This latter part is less applicable to your case, but it does demonstrate how tasks such as these can run in parallel with the rest of your program. Oh, and they can also be easily scheduled, because f!@# cron too.

/me stops preaching about how awesome Twisted is
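To make the idea of leaf certificate pinning concrete, here is a small Python 3 sketch. It deliberately uses only the standard library (ssl, socket, hashlib) rather than Twisted, so it illustrates the check itself and not the Twisted or service_identity API. The function names and the choice of a SHA-256 fingerprint as the pin are assumptions.

```python
import hashlib
import hmac
import socket
import ssl


def fingerprint(der_cert):
    """Return the SHA-256 fingerprint of a DER-encoded certificate as hex."""
    return hashlib.sha256(der_cert).hexdigest()


def matches_pin(der_cert, pinned_hex):
    """Compare a certificate's fingerprint against a pinned value."""
    # Accept pins written as "AB:CD:..." as well as plain lowercase hex.
    normalized = pinned_hex.replace(":", "").lower()
    return hmac.compare_digest(fingerprint(der_cert), normalized)


def fetch_leaf_certificate(host, port=443):
    """Connect with normal CA and hostname checks; return the leaf cert (DER)."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=30) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert(binary_form=True)
```

A downloader would call fetch_leaf_certificate() for the download host and refuse to continue unless matches_pin() returns True for the stored fingerprint.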

You could also quite easily check the *.asc files on the downloaded bundles to ensure that the whole thing downloaded properly. If you were to use python-gnupg to do it, it would look something like:

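import glob

import gnupg

# The GNUPG_HOME_DIR should have the correct signing keys in its pubring.gpg
# file (so geko's and mikeperry's keys, and the Tor Browser signing key, at
# the minimum).
gpg = gnupg.GPG(homedir=GNUPG_HOME_DIR)
signatures = glob.glob("%s/*.asc" % latest_version)
verified = []
unverified = []
for sig in signatures:
    # Slice off the suffix; rstrip(".asc") would strip any trailing run of
    # the characters ".", "a", "s" and "c" rather than the suffix itself.
    bundle = sig[:-len(".asc")]
    with open(bundle, 'rb') as fh:
        # verify_file() with sig_file checks a detached signature (this
        # assumes the gnupg package that accepts the homedir argument).
        result = gpg.verify_file(fh, sig_file=sig)
    if result.valid:
        verified.append(bundle)
    else:
        unverified.append(bundle)
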
Last edited 4 years ago by isis

comment:7 in reply to:  6 Changed 4 years ago by ilv

Resolution: implemented
Status: closed → reopened

Replying to isis:


Hey ilv! Great work! I see that your current script still uses os.system(cmd)… were you still planning to use Twisted? Using os.system() is really not recommended in the Python world.


Hey isis, thanks! And thanks for taking the time to review this! To be honest, I dropped the idea of using Twisted (for SSL verification) because wget already fails, and the whole script with it, if the certificate is invalid.

Some issues I see with the current implementation are:

  1. If the os.system("wget […]") command fails entirely, or only downloads a portion of a bundle, you'll never know because you're not checking the returned exit status code.
  2. There is no mechanism for resuming downloads, if #1 happens.


Correct, thanks for pointing this out.

  3. Doing
    for provider in UPLOAD_SCRIPTS:
        os.system("python2.7 %s" % UPLOAD_SCRIPTS[provider])
    
    doesn't scale to more provider scripts than the Gettor machine has CPU cores, since most Python scripts will stupidly hog an entire core. It also doesn't take into account memory limitations (and thus, the more providers Gettor has, the more likely this code is to OOM the Gettor machine), nor network bandwidth limitations (nor the effect that any network bandwidth limitations might have on other upload scripts being executed).


Correct me if I'm wrong, but the scripts for each provider should be executed sequentially, since os.system() waits for each one to finish, so I'm not sure about the scalability problems related to the CPU cores. And you are right again, I haven't taken into account either the memory limitations or the network bandwidth limitations. I guess Twisted should be helpful for these points.

Second, which doesn't matter, but the syntax is a bit odd; normally one might do

for provider, script in UPLOAD_SCRIPTS.items():
    os.system("python2.7 %s" % script)

or, if nothing is using provider, then the for loop should more optimally look like:

for script in UPLOAD_SCRIPTS.values():
    […]


/me is still a python noob :P

By using Twisted instead, particularly if you have the service_identity module installed, and then with a trivially implementable amount of extra code, having leaf or root certificate pinning is possible. Not to mention the speed increases and parallelisation that become possible using Twisted. If you want an example of a standalone script for downloading something over TLS with Twisted, BridgeDB's script for downloading the list of Tor Exit relays (into memory or a file, in this case) might be helpful, as well as the way BridgeDB uses this script as a Protocol (twisted.internet.protocol.Protocol) and manages that Protocol within a Twisted program (so that the list in this case is loaded directly into memory for the servers in the cluster without wasting a bunch of time doing disk I/O). This latter part is less applicable to your case, but it does demonstrate how tasks such as these can run in parallel with the rest of your program. Oh, and they can also be easily scheduled, because f!@# cron too.


Thanks a lot for this info! Now I'm convinced again that I should use Twisted :)

You could also quite easily check the *.asc files on the downloaded bundles to ensure that the whole thing downloaded properly. If you were to use python-gnupg to do it, it would look something like:

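import glob

import gnupg

# The GNUPG_HOME_DIR should have the correct signing keys in its pubring.gpg
# file (so geko's and mikeperry's keys, and the Tor Browser signing key, at
# the minimum).
gpg = gnupg.GPG(homedir=GNUPG_HOME_DIR)
signatures = glob.glob("%s/*.asc" % latest_version)
verified = []
unverified = []
for sig in signatures:
    # Slice off the suffix; rstrip(".asc") would strip any trailing run of
    # the characters ".", "a", "s" and "c" rather than the suffix itself.
    bundle = sig[:-len(".asc")]
    with open(bundle, 'rb') as fh:
        # verify_file() with sig_file checks a detached signature (this
        # assumes the gnupg package that accepts the homedir argument).
        result = gpg.verify_file(fh, sig_file=sig)
    if result.valid:
        verified.append(bundle)
    else:
        unverified.append(bundle)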


Awesome, thanks again!

comment:8 Changed 15 months ago by arma

Severity: Blocker

It looks like https://github.com/TheTorProject/gettorbrowser is telling people the latest tor browser version is 7.0.2, yet actually it's 7.0.4.

Is that because this automation is broken somehow? Or is the github.com url not one of the mirrors that this automation runs on? Or something else?

(reported by "kingu" on irc.)

comment:9 Changed 15 months ago by arma

Severity: Blocker → Normal

(wait, why did my comment change the severity on this ticket. ugh, new trac.)

comment:10 Changed 15 months ago by teor

The current links are for 7.0.2 (Windows, macOS), and 7.0.3 (Linux). Could skipping 7.0.3 for Windows and macOS have confused the bot?

What version should we show in the first paragraph when different platforms have different versions?

comment:11 in reply to:  10 Changed 15 months ago by arma

Replying to teor:

The current links are for 7.0.2 (Windows, macOS), and 7.0.3 (Linux). Could skipping 7.0.3 for Windows and macOS have confused the bot?

A fine question.

What version should we show in the first paragraph when different platforms have different versions?

Option A would be to list one, two, or three versions, depending on how many there are.

Option B would be to stop listing "a" version, since it's clear sometimes there is no single version.

Option C would be to lobby the tor browser team to stop doing unsynchronized version bumps between platforms.

Option A would be the least invasive out of these.
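Option A could be sketched roughly as below. This is a hypothetical Python 3 helper, not part of the existing upload scripts: the function names, the platform labels, and the wording of the output are assumptions, and the sort key only handles purely numeric dotted versions such as 7.0.3.

```python
def version_key(version):
    """Sort key for purely numeric dotted versions, e.g. "7.0.3"."""
    return tuple(int(part) for part in version.split("."))


def describe_versions(versions_by_platform):
    """Build a sentence naming each distinct version and its platforms."""
    platforms_by_version = {}
    for platform, version in sorted(versions_by_platform.items()):
        platforms_by_version.setdefault(version, []).append(platform)
    parts = []
    # Newest version first.
    for version in sorted(platforms_by_version, key=version_key, reverse=True):
        platforms = ", ".join(platforms_by_version[version])
        parts.append("%s (%s)" % (version, platforms))
    return "Latest Tor Browser: " + "; ".join(parts)


print(describe_versions({"Windows": "7.0.2", "macOS": "7.0.2", "Linux": "7.0.3"}))
# prints "Latest Tor Browser: 7.0.3 (Linux); 7.0.2 (Windows, macOS)"
```

When all platforms share one version, the same function yields a single entry, so the first paragraph would read as it does today apart from the platform list.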

comment:12 Changed 15 months ago by ilv

Indeed, the different versions 7.0.2 and 7.0.3 confused the upload scripts. To my knowledge, this is the first time it has happened in about 2-3 years. However, the automation is broken. Right now I have to run the upload scripts manually, which is why the repository had older versions of Tor Browser.

comment:13 Changed 7 months ago by gk

Cc: gk added

comment:14 Changed 5 weeks ago by traumschule

Meanwhile 8.0.2 is the current stable.

I would like to help but it seems the person with access rights to the github repository is MIA.

From my perspective it makes a lot of sense to combine GetTor with mirrors because the easiest way to synchronize files is rsync. It would help both projects to use existing synergies, see #22150.

Last edited 5 weeks ago by traumschule

comment:15 Changed 5 weeks ago by boklm

Cc: boklm added

comment:16 Changed 3 weeks ago by traumschule

Parent ID: #8542

Make #8542 parent of orphaned GetTor providers' tickets.
