Design

The easiest way to understand how GetTor works is by enumerating the different steps involved in the problem we want to solve. Consider the following situations:

  1. Receive requests from users via different channels.
  2. Process the received requests, extracting the information needed to provide a useful response, namely: source address/user, operating system and language.
  3. Construct a response according to the information extracted.
  4. Create an anti-flood mechanism that allows specific users to be blacklisted.
  5. Verify that the source address/user of the request is not permanently or temporarily blacklisted.
  6. Send back a reply with the links to download Tor Browser from some popular non-blocked cloud service.
  7. Keep track of the number of requests received by GetTor.
  8. Upload Tor Browser to popular non-blocked cloud services.

The current design of GetTor consists of a series of modules, each intended for a specific task. There are two main groups: the main modules and the service modules. The main modules are Core, Blacklist and Database, which cover points 3), 4), 5) and 7). The service modules are SMTP, XMPP and Twitter, which cover points 1), 2), 5) and 6).

Whenever a request is received, it is handled by one of the service modules, according to the channel the user sent it through. These channels are email for SMTP, chat for XMPP, and direct messages (DM) for Twitter. The corresponding module processes the request, collecting all the data needed to provide a useful reply, namely: operating system, language and source address/user. It also makes sure that the source address/user is not blacklisted (see Blacklisting for details). If no valid data is found, a help message is sent back to the user. Otherwise, the service module asks the Core module for the links and then replies to the user. In both cases, the Core module increases the number of requests received in the database. A very simple diagram of the module interactions looks like this:



                  -----------
               ->|SMTP Module|          -----------
             /    -----------  \      >| Blacklist |<
            /                   \   /   -----------   \
           /     -----------     \ /    ------         \       ----------
    USERS <---> |XMPP Module| <------> | Core | <-----------> | Database |
           \     -----------     /      ------                 ----------
            \                   /         |
             \     --------------         |
              \-->|Twitter Module|        |
                   --------------         |
                \                         |
                 \             ----------------
                  \---------->| Other Services |
                               ----------------

One of the points enumerated above is not covered by these modules: uploading Tor Browser to popular non-blocked cloud services. This is handled by a series of scripts, one for each supported cloud service. Currently, there are scripts for Dropbox and Google Drive.

Below you will find a more detailed description of each of the modules and scripts of GetTor.

Core

As its name suggests, this is the core module of GetTor, and its main purpose is to provide a simple and robust interface for obtaining the links to download Tor Browser. The design of this module is based on one main concept: storing the links in files. The idea is to have one file for each cloud service or provider, where each file follows the INI-style format read by Python's ConfigParser module, which means that the data is organized into sections and accessible by keys. Every links file must have the following five sections:

[provider]: Contains only one key, the name of the cloud service/provider.

[key]: Contains only one key, the fingerprint of the PGP key used to sign the Tor Browser packages.

[linux]: Contains all the links for the Linux operating system, with one key for each locale available. Every locale should have no more than six lines. There is one line for the Tor Browser link, another for the ASC signature link, and another for the sha256 of Tor Browser. There is one set of three lines for 32-bit and another for 64-bit (six lines in total).

[windows]: Contains all the links for the Windows operating system, with one key for each locale available. Every locale should have no more than three lines. There is one line for the Tor Browser link, another for the ASC signature link, and another for the sha256 of Tor Browser. The Windows package of Tor Browser is intended for both 32-bit and 64-bit systems.

[osx]: Contains all the links for the Mac OS X operating system, with one key for each locale available. Every locale should have no more than six lines. There is one line for the Tor Browser link, another for the ASC signature link, and another for the sha256 of Tor Browser. There is one set of three lines for 32-bit and another for 64-bit (six lines in total).

A sample links file should look like this:

[provider]
name = Dropbox

[key]
fingerprint = 8738 A680 B84B 3031 A630 F2DB 416F 0610 63FE E659

[linux]
en = Package (64-bit): link-to-dropbox-en64
	ASC signature (64-bit): link-to-dropbox-en64.asc
	Package SHA256 checksum (64-bit): 98ea6e4f216f2fb4b69fff9b3a44842c38686ca685f3f55dc48c5d3fb1107be4,
	Package (32-bit): link-to-dropbox-en32
	ASC signature (32-bit): link-to-dropbox-en32.asc
	Package SHA256 checksum (32-bit): 98ea6e4f216f2fb4b69fff9b3a44842c38686ca685f3f55dc48c5d3fb1107be4
es = Package (32-bit): link-to-dropbox-es32
	ASC signature (32-bit): link-to-dropbox-es32.asc
	Package SHA256 checksum (32-bit): 98ea6e4f216f2fb4b69fff9b3a44842c38686ca685f3f55dc48c5d3fb1107be4
	

[windows]
...

[osx]
....
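
As an illustration, a file in this layout can be read with Python's standard configparser module (named ConfigParser in Python 2). The following is a minimal sketch under stated assumptions, not GetTor's actual code: the helper name read_links and the abbreviated SAMPLE text are hypothetical and exist only for this example.

```python
import configparser

# Abbreviated copy of the sample links file (placeholder links, 64-bit only).
SAMPLE = """\
[provider]
name = Dropbox

[key]
fingerprint = 8738 A680 B84B 3031 A630 F2DB 416F 0610 63FE E659

[linux]
en = Package (64-bit): link-to-dropbox-en64
    ASC signature (64-bit): link-to-dropbox-en64.asc
    Package SHA256 checksum (64-bit): 98ea6e4f216f2fb4b69fff9b3a44842c38686ca685f3f55dc48c5d3fb1107be4
"""


def read_links(text, os_name, locale):
    """Return the lines stored for one operating system and locale.

    Hypothetical helper: sections are operating systems, keys are locales,
    and indented lines continue the value of the key above them.
    """
    # interpolation=None keeps any '%' in a URL from being treated as a reference.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return parser.get(os_name, locale).splitlines()


if __name__ == "__main__":
    for line in read_links(SAMPLE, "linux", "en"):
        print(line)
```

Storing the operating system as the section and the locale as the key means a lookup needs only those two values, which is the information a request carries.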

Please note that, to make things easier, the name of a links file should be provider_in_lowercase.links. All of the above allows easy access to the links for the operating system and language we need. The public method for doing this is the following:

  get_links(service, os, lc)

This returns a string with the links, where:

service: String that identifies the service communicating with the core module. This is for stats purposes only.

os: The operating system for which we need the links. There are currently three options: windows, linux, and osx.

lc: The locale for which we need the links. There is currently one supported option: en (for English).

Below is a sample script that communicates with the core module:

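#!/usr/bin/python

import gettor.core

# Ask the core module for the English links for Linux.
# The first argument only identifies the caller for stats purposes.
core = gettor.core.Core()
links = core.get_links('dummy service', 'linux', 'en')
print(links)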

For more details you are welcome to see the implementation in the code repository.

The secondary purpose of the core module is to provide methods to ease the creation of links files for cloud services. There are two public methods for this:

create_links_file(provider, fingerprint)

This creates a links file named provider_in_lowercase.links, where:

provider: String for the name of the provider/cloud service (e.g. Dropbox)

fingerprint: String that represents the fingerprint used to sign the Tor Browser packages.

And,

   add_link(provider, os, lc, link)

This adds a link to the links file of the provider, where:

provider: String that identifies the provider/cloud service. This is also the name of the links file.

os: The operating system for which we intend to add the link. There are currently three options: windows, linux, and osx.

lc: Locale for which we intend to add the link. There is currently one supported option: en (for English).

link: String that represents the actual link to be added.

Below is a sample script to create a links file and add a couple of links to it:

#!/usr/bin/python
import gettor.core

link64 = """Package (64-bit): link-to-dropbox?dl=1
ASC signature (64-bit): link-to-dropbox.asc?dl=1
Package SHA256 checksum (64-bit): 98ea6e4f216f2fb4b69fff9b3a44842c38686ca685f3f55dc48c5d3fb1107be4"""

link32 = """Package (32-bit): link-to-dropbox?dl=1
ASC signature (32-bit): link-to-dropbox.asc?dl=1
Package SHA256 checksum (32-bit): 98ea6e4f216f2fb4b69fff9b3a44842c38686ca685f3f55dc48c5d3fb1107be4"""

core = gettor.core.Core()
core.create_links_file('Dropbox', '8738 A680 B84B 3031 A630 F2DB 416F 0610 63FE E659')
core.add_link('Dropbox', 'linux', 'en', link64)
core.add_link('Dropbox', 'linux', 'en', link32)

For more details on these methods please check the code repository and/or see the cloud service scripts section.

Distribution Channels

Ideally, a user should have various ways to contact GetTor and receive Tor Browser. Each distribution channel should parse a request, get the user's OS and language, ask the core module for the links and then send this information back to the user. Ideally, each distribution channel should be handled by a separate module. Currently, there is one distribution channel deployed (SMTP), one implemented but not deployed (XMPP), and one not finished (Twitter).

SMTP

This module is in charge of receiving and replying to requests via email. Back in 2008, when GetTor was conceived, SMTP was the main and only distribution channel. Requests were answered with the actual bundle as an attachment instead of links. This approach worked well, but the bundles grew to the point where it was no longer feasible to send them as attachments (the current size of Tor Browser is ~40 MB).

There are three steps involved in sending links via email:

  • Listen for users' emails directed to the GetTor robot.
  • Determine the type of request and get the data needed to reply to it.
  • Send back a reply to the user.

The first point is handled by the mail server provided by the Tor Project. In addition, we use email forwarding to make sure we get all the emails directed to the GetTor robot. For this, a .forward file like the following is used:

|"python2.7 /path/to/gettor/smtp_process.py"

With this, the only concern of the smtp_process.py script is to receive emails from the standard input and pass them to the SMTP module for processing. The SMTP module has only one public method:

process_email(raw_msg)

Where:

raw_msg: String that represents the email received.

A basic script for communicating with the SMTP module should look like this:

#!/usr/bin/env python
import sys
import gettor.smtp

service = gettor.smtp.SMTP()
incoming = sys.stdin.read()
service.process_email(incoming)

The other two points are handled by the SMTP module. The first step after receiving a request is to determine whether the address is blacklisted; see the Blacklisting section for the current process. The next step is to determine the type of request received. For now, there are only two types of request: help and links. The decision process to determine which type we have received is the following:

  • Does the body of the message include the words windows, linux, or osx? If so, we have received a links request.
  • Any other case should be considered as a help request, including blank emails.
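
The decision process above can be sketched as a small function. This is only an illustration, not the SMTP module's code: the name classify_request, the whole-word matching, and the rule that the first operating system mentioned wins are all assumptions.

```python
import re

# Operating systems that turn a message into a links request.
SUPPORTED_OS = ("windows", "linux", "osx")


def classify_request(body):
    """Return ('links', os_name) or ('help', None) for an email body.

    Hypothetical helper: the body is lowercased and split into words,
    and the first supported operating system found decides the reply.
    """
    for word in re.findall(r"[a-z]+", body.lower()):
        if word in SUPPORTED_OS:
            return "links", word
    # Blank emails and anything else fall back to a help request.
    return "help", None


if __name__ == "__main__":
    print(classify_request("Please send me Tor Browser for Windows"))
    print(classify_request(""))
```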

For both types of request, the language is obtained from the address the email was sent to: gettor+lc@…, where lc stands for one of the locales supported by Tor Browser. Currently, the only locale supported is English. If no locale is specified, we assume English by default.
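
A minimal sketch of extracting the locale from the recipient address follows. It uses email.utils.parseaddr from the standard library; the function name locale_from_address, the example.org domain, and the fallback to English for unsupported locales are assumptions made for illustration.

```python
from email.utils import parseaddr

SUPPORTED_LOCALES = ("en",)
DEFAULT_LOCALE = "en"


def locale_from_address(to_header, supported=SUPPORTED_LOCALES):
    """Return the lc part of a 'gettor+lc@domain' recipient address.

    Hypothetical helper: a missing or unsupported locale yields the default.
    """
    # parseaddr splits 'Name <user@host>' into (name, address).
    _, address = parseaddr(to_header)
    local_part = address.partition("@")[0]
    _, plus, tag = local_part.partition("+")
    tag = tag.lower()
    if plus and tag in supported:
        return tag
    return DEFAULT_LOCALE


if __name__ == "__main__":
    print(locale_from_address("GetTor <gettor+en@example.org>"))
    print(locale_from_address("gettor@example.org"))
```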

Knowing the type and language of the request is enough to construct a reply and send it to the user. Every time a reply is sent, the number of requests received is increased in the database. See Database to check the current DB schema.

XMPP

To be written.

Twitter

To be written.

Database

The database module, as its name suggests, is in charge of interacting with the GetTor database. The current design is quite simple and satisfies two needs:

Add a request. For now this consists only of counting how many requests we have received so far. No other data is collected.

Add/delete/update a user. This allows us to know how many requests a single user has made and thus avoid any type of flood (see Blacklisting). For this purpose we collect the following data:

  • user: SHA-256 hash of the user address/account.
  • service: string that represents the distribution method used by the user (e.g. SMTP).
  • times: number of requests received from the same user.
  • blocked: boolean flag to know if user is permanently blacklisted.
  • last_request: timestamp that represents the last time a given user made a request from the same distribution channel.
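
The fields above could map onto an SQLite table along these lines. This is a sketch under assumptions, not the schema shipped in gettor.db: the column types, the requests counter table, the composite primary key, and the helper names are hypothetical, and the hash is assumed to be SHA-256.

```python
import hashlib
import sqlite3
import time


def create_schema(conn):
    """Create a hypothetical request counter and users table."""
    conn.execute("CREATE TABLE requests (counter INTEGER NOT NULL)")
    conn.execute("INSERT INTO requests VALUES (0)")
    conn.execute(
        "CREATE TABLE users ("
        " user TEXT NOT NULL,"             # hash of the address/account
        " service TEXT NOT NULL,"          # distribution channel, e.g. SMTP
        " times INTEGER NOT NULL,"         # requests made by this user
        " blocked INTEGER NOT NULL,"       # 1 if permanently blacklisted
        " last_request INTEGER NOT NULL,"  # Unix timestamp of last request
        " PRIMARY KEY (user, service))"
    )


def hash_user(address):
    """Hash the address so the raw value is never stored."""
    return hashlib.sha256(address.encode("utf-8")).hexdigest()


def add_request(conn):
    """Increase the global request counter by one."""
    conn.execute("UPDATE requests SET counter = counter + 1")


def add_user(conn, address, service, now=None):
    """Insert a first-time user with one request and no block."""
    now = int(time.time()) if now is None else now
    conn.execute(
        "INSERT INTO users VALUES (?, ?, 1, 0, ?)",
        (hash_user(address), service, now),
    )


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    create_schema(db)
    add_request(db)
    add_user(db, "alice@example.org", "SMTP")
    print(db.execute("SELECT times, blocked FROM users").fetchone())
```

Keying the table on (user, service) reflects that requests are counted per user and per distribution channel.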

The initial design of the database module (during the revamp) considered collecting a lot of data (type of request, language, OS, etc.), but eventually we decided to keep just the data needed to know how many requests GetTor has received and to avoid flooding. The type of database chosen for this purpose was SQLite. You can check a sample database in the code repository (gettor.db).

Blacklisting

The current blacklisting mechanism is quite simple and is based on the data collected in the 'users' table of GetTor's database, plus some extra parameters defined in blacklist.cfg, which help us establish limits to avoid flooding. The current mechanism depends on four parameters:

  • user: Hashed address/account of the user. It helps to identify malicious users.
  • service: Service or distribution channel used by the user trying to contact GetTor.
  • max_req: Maximum number of requests per user and service allowed at the moment.
  • wait_time: Number of minutes a user must wait after reaching max_req before she can make requests again.

Both the user and service parameters are obtained in real time when GetTor receives a request. The other two, max_req and wait_time, are specified in blacklist.cfg. Each service module (e.g. SMTP) should be in charge of specifying the path to this configuration file and interacting with the Blacklisting module according to that information. The current mechanism also depends on the last_request, times, and blocked fields of the database record associated with user and service. With all of this, the decision algorithm can be described as follows:

if blocked:
    # permanently blacklisted user
    update_user_on_db(user, service, times+1, 1)
    raise BlacklistError("Blocked user")
elif times >= max_req:
    last = get last_request from db
    next = last + wait_time

    if now < next:
        # too many requests from the same user
        update_user_on_db(user, service, times+1, 0)
        raise BlacklistError("Too many requests")
    else:
        # fresh user again!
        update_user_on_db(user, service, 1, 0)
else:
    # adding up a request for user
    update_user_on_db(user, service, times+1, 0)

This simple mechanism helps us prevent malicious users from flooding one or more services/distribution channels with endless requests. As you may notice, if a user makes a request before wait_time has passed, she must wait another wait_time before making a request again; and if a user makes a request after she has reached the maximum number of requests and waited wait_time, her request counter is reset to one. You can check the _is_blacklisted method of the SMTP module to see how a service should interact with the Blacklisting module.
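
For readers who want to experiment with the algorithm, here is a self-contained sketch that replaces the database with a plain dictionary. It is not the Blacklisting module itself: the function name check_user, the record layout, and the assumption that every update also refreshes last_request (which is what makes an early request restart the wait) are hypothetical.

```python
import time


class BlacklistError(Exception):
    """Raised when a request must be rejected."""


def check_user(record, max_req, wait_time, now=None):
    """Apply the decision algorithm to one user/service record.

    record is a dict with 'times', 'blocked' and 'last_request' (seconds).
    wait_time is given in minutes, as in blacklist.cfg.
    """
    now = time.time() if now is None else now
    if record["blocked"]:
        record["times"] += 1
        record["last_request"] = now
        raise BlacklistError("Blocked user")
    if record["times"] >= max_req:
        next_allowed = record["last_request"] + wait_time * 60
        if now < next_allowed:
            # Too early: count the request and restart the wait.
            record["times"] += 1
            record["last_request"] = now
            raise BlacklistError("Too many requests")
        # The user waited long enough, so the counter starts over.
        record["times"] = 1
    else:
        record["times"] += 1
    record["last_request"] = now


if __name__ == "__main__":
    user = {"times": 0, "blocked": False, "last_request": 0}
    for second in range(4):
        try:
            check_user(user, max_req=2, wait_time=10, now=second)
            print("request", second, "accepted")
        except BlacklistError as error:
            print("request", second, "rejected:", error)
```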

This mechanism could certainly be improved. If you have any ideas/comments about it, please tell us (ideally by filing a ticket :).

Cloud Services

For each service used by GetTor to distribute the Tor Browser files, there should be a script in charge of uploading those files using the methods provided by that service. Each of these scripts must assume that the latest Tor Browser files have been downloaded (see Other Scripts) and perform the following tasks (in order):

  1. Get the fingerprint from the key used to sign the Tor Browser.
  2. Use the Core module to create a new links file (core.create_links_file).
  3. Obtain the sha256 checksum of each {tar.xz, exe, dmg} file to be uploaded.
  4. Check that the corresponding .asc signature exists for each {tar.xz, exe, dmg} file to be uploaded.
  5. Identify the architecture, language and operating system associated with each {tar.xz, exe, dmg} file to be uploaded.
  6. Create a string describing a new link, using the information identified before.
  7. Use the Core module to add a link to the new links file created (core.add_link), specifying the service, the operating system, and the language (locale).
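
Tasks 3, 4 and 6 can be sketched with the standard library alone. The code below is an illustration, not one of the existing upload scripts: the function names sha256_of_file and build_link are hypothetical, and the URLs are expected to come from whatever upload API the service provides.

```python
import hashlib
import os


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_link(package_path, package_url, signature_url, arch):
    """Build the three-line link text for one uploaded package.

    Hypothetical helper: it refuses to build a link when the .asc
    signature is missing next to the package (task 4).
    """
    if not os.path.isfile(package_path + ".asc"):
        raise ValueError("missing signature for %s" % package_path)
    checksum = sha256_of_file(package_path)
    return "\n".join([
        "Package (%s): %s" % (arch, package_url),
        "ASC signature (%s): %s" % (arch, signature_url),
        "Package SHA256 checksum (%s): %s" % (arch, checksum),
    ])
```

The string returned by build_link has the same shape as the link text passed to core.add_link in the sample script of the Core section.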

You can check the existing scripts for Dropbox and Google Drive to see the current methods used for the points listed above, especially 1, 3, 5, and 6. For more details on how the links files are created and how the links are stored, check the documentation about the Core module. Below is a list of the current services/providers integrated with GetTor:

  • Dropbox: Deployed. In use for a long time.
  • Google Drive: Implemented, but not yet deployed.
  • GitHub: Implemented, but not yet deployed. This one should be especially useful to distribute Tor Browser in places where Dropbox and Google Drive are blocked (e.g. China).

If you have an idea for a new service that could be used (even if you don't know how to implement it), please contact us (ideally by filing a ticket :).

Other Scripts

Below is a list of scripts used for diverse and "smaller" tasks:

  • blacklist.py: Handle blacklisting of users. Execute blacklist.py -h for more details.
  • create_db.py: Handle the creation of the SQLite database used by GetTor for managing the blacklisting of users and keeping track of basic stats. Execute create_db.py -h for more details.
  • stats.py: Handle basic stats according to the information stored in the SQLite database. Execute stats.py -h for more details.
  • fetch_latest_torbrowser.py: Automate the download of Tor Browser files from Tor Project's website and upload of these files to the services used by GetTor every time a new stable version of Tor Browser is available. Implemented, but not yet deployed. See the source file in the repository for more details.

Last modified on Mar 6, 2015, 1:45:22 AM