Opened 8 years ago

Closed 5 years ago

Last modified 5 years ago

#4405 closed defect (fixed)

bridgedb's list of tor exit relays is down since bulk exit list is down

Reported by: arma
Owned by: isis
Priority: Medium
Component: Circumvention/BridgeDB
Keywords: isis2015Q1Q2, isisExB, isisExC, bridgedb-0.3.0
Cc: sebastian, aagbsn, kaner, isis@…

Description

bridgedb has a PROXY_LIST config option that pulls in a list of IPs that should be treated specially for the https bucket. The goal is to prevent people from using their open proxy list (or heck, the Tor network) to appear to be many users of the https bucket.

Unfortunately, when we took down the bulk exit list cgi, we broke bridgedb's ability to learn Tor exit relay IPs.

The new plan has been to wait for TorBEL to go live, since it will have a replacement bulk exit list script. But it looks like TorBEL will be a while more.

In the meantime, weasel suggested that we use the old 'exitlist' script to generate a set of Tor exit IPs. That sounds like a great plan to me.

Change History (14)

comment:1 Changed 8 years ago by arma

We're going to need a place that runs a Tor client, has the cached-descriptors and cached-descriptors.new files, and runs exitlist periodically and puts the file up for download. Bonus points if the download is via https. :)

That Tor client needs to be 0.2.2.x, or 0.2.3.x with UseMicrodescriptors set to 0.

comment:2 Changed 8 years ago by arma

To clarify, ordinarily I don't like exitlist anymore because it has too many false positives. (Specifically, since it pulls in all the files from cached-descriptors and doesn't pay any attention to the consensus, it overcounts addresses for relays that a) aren't up anymore, and b) are on a different address and published a newer descriptor.) But in this case, some false positives for the https bridge bucket are perfectly fine.

comment:3 Changed 6 years ago by isis

Cc: isis@… added
Status: changed from new to needs_information

I saw recently on the tor-dev mailing list that someone wanted to volunteer to work on TorDNSEL. Does this mean that TorBEL still has a very long while to go?

If it is useful (in other words, if it is going to be months, at least, before TorBEL is ready) I can try to fix up or rewrite the old exitlist script.

There is also karsten's and delber's compass, and though I only briefly looked at the code, it seems much more productive to make any necessary improvements on compass, rather than fix exitlist.py.

comment:4 Changed 6 years ago by sysrqb

So https://exitlist.torproject.org/ already exists, do we know to what degree it is unreliable? And is it reliable enough that, for the time being, we can use it and not spend time hacking something else together?

comment:5 in reply to:  4 Changed 6 years ago by isis

Replying to sysrqb:

> So https://exitlist.torproject.org/ already exists, do we know to what degree it is unreliable?

According to arma, exitlist.tpo uses TorDNSEL, so as mentioned above it is currently unmaintained (except for a possible upcoming contribution from a volunteer on tor-dev@…). Perhaps I am Doing It Wrong, but I've never gotten any response from the server at https://exitlist.torproject.org -- I assume it's not up. Do you see something different? Or perhaps do you mean check.tpo?

> And is it reliable enough that, for the time being, we can use it and not spend time hacking something else together?

Well, we could fix TorDNSEL (I know the basics of Haskell, and wouldn't at all mind a chance to practice) or we could fix exitlist (python).

comment:6 Changed 6 years ago by arma

On the old bridgedb, I had a cron job to fetch http://freehaven.net/~arma/exitlist periodically and use that as bridgedb's list of Tor IPs. (The file gets autogenerated from moria1's info, at x:20 and x:50 each hour.) Maybe the new bridgedb is still using that file? If it is currently using nothing, maybe it should resume using that file.

comment:7 Changed 6 years ago by isis

Status: changed from needs_information to needs_review

See this branch for a twisted.internet.protocol.Protocol class for handling downloading the exitlist from within bridgedb and parsing/loading it asynchronously into the ProxyCategory/ProxyList, which fixes this ticket.

comment:8 Changed 5 years ago by chingucha

Status: changed from needs_review to needs_revision

Is this still relevant? If it is, you should periodically download https://check.torproject.org/exit-addresses and check against that, instead of querying for every IP address against https://check.torproject.org/cgi-bin/TorBulkExitList.py.
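
A minimal sketch of the parsing half of that suggestion, assuming the exit-addresses file keeps its usual layout of ExitNode / Published / LastStatus / ExitAddress lines. The function name, the sample fingerprints, and the documentation-range IPs below are made up for illustration; this is not code from BridgeDB.

```python
import ipaddress

def parse_exit_addresses(text):
    """Collect the IPs found on 'ExitAddress <ip> <date> <time>' lines."""
    exits = set()
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "ExitAddress":
            try:
                # Normalise through ipaddress so malformed entries are dropped.
                exits.add(str(ipaddress.ip_address(fields[1])))
            except ValueError:
                continue
    return exits

# Invented sample data in the assumed format.
SAMPLE = """\
ExitNode 0000000000000000000000000000000000000001
Published 2015-03-01 10:00:00
LastStatus 2015-03-01 11:00:00
ExitAddress 192.0.2.10 2015-03-01 11:05:00
ExitNode 0000000000000000000000000000000000000002
Published 2015-03-01 09:00:00
LastStatus 2015-03-01 11:00:00
ExitAddress 198.51.100.7 2015-03-01 11:06:00
ExitAddress not-an-ip 2015-03-01 11:07:00
"""

exit_ips = parse_exit_addresses(SAMPLE)
is_exit = "192.0.2.10" in exit_ips  # membership test against the local copy
```

With the set held in memory, each incoming client IP becomes a local lookup rather than a remote query.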

comment:9 Changed 5 years ago by isis

Keywords: isis2015Q1Q2 isisExB isisExC added

comment:10 Changed 5 years ago by isis

Owner: set to isis
Status: changed from needs_revision to assigned

comment:11 in reply to:  7 Changed 5 years ago by isis

Keywords: bridgedb-0.2.5 added
Resolution: fixed
Status: changed from assigned to closed

Replying to isis:

> See this branch for a twisted.internet.protocol.Protocol class for handling downloading the exitlist from within bridgedb and parsing/loading it asynchronously into the ProxyCategory/ProxyList, which fixes this ticket.

The latest version of this code is in my fix/4405-tor-exit-check_2_r1 branch, with full test coverage, and has been merged into develop for bridgedb-0.2.5.

comment:12 Changed 5 years ago by arma

(For those following along and not wanting to paw through all the git commits: it looks like isis opted to resume fetching from TorBulkExitList now that it's more consistently back up.)

comment:13 in reply to:  12 Changed 5 years ago by isis

Replying to arma:

> (For those following along and not wanting to paw through all the git commits: it looks like isis opted to resume fetching from TorBulkExitList now that it's more consistently back up.)

Oh, sorry to make you paw through them! I should have summarised better.

Basically, for the past several years BridgeDB had a cronjob to download the exit list via https://check.torproject.org/cgi-bin/TorBulkExitList.py?ip=38.229.72.19&port=443. The issue wasn't that the TorBulkExitList.py script was down (or at least, that hasn't been an issue for quite a while). Rather, the cronjobs kept downloading new versions of the list, but the new versions weren't getting loaded into the running BridgeDB process. Because BridgeDB tends to run for at least a couple of months at a time without being restarted, its notion of which IPs are Tor exits could drift out of date, allowing someone to use exits that BridgeDB doesn't know about to bypass its rate-limiting mechanisms more effectively and gain more information on bridge nodes.

Additionally, I personally think it's messy to have cronjobs downloading files that are supposed to be in a certain directory and then get loaded into the process… (I'd rather have BridgeDB do its own thing, as much as possible by itself, so that it's easier for others to someday run their own BridgeDBs.) So I changed this to no longer need the external cronjob, but rather to use twisted.internet.task. Because of this, there is now infrastructure to have BridgeDB run other repeating tasks without writing much code:

In bridgedb.conf:

# TASKS is a dictionary mapping the names of tasks to the frequency with which
# they should be run (in seconds). If a task's value is set to 0, it will not
# be scheduled to run.
TASKS = {
    # Download a list of Tor exit relays once every three hours (by running
    # scripts/get-exit-list) and add those exit relays to the list of proxies
    # loaded from the PROXY_LIST_FILES: 
    'GET_TOR_EXIT_LIST': 3 * 60 * 60,
}

And in lib/bridgedb/Main.py:

    tasks = {}
    # Setup all our repeating tasks:
    if config.TASKS['GET_TOR_EXIT_LIST']:
        tasks['GET_TOR_EXIT_LIST'] = task.LoopingCall(
            proxy.downloadTorExits,
            proxyList,
            config.SERVER_PUBLIC_EXTERNAL_IP)

    # Schedule all configured repeating tasks:
    for name, seconds in config.TASKS.items():
        if seconds:
            try:
                tasks[name].start(abs(seconds))
            except KeyError:
                logging.info("Task %s is disabled and will not run." % name)
            else:
                logging.info("Scheduled task %s to run every %s seconds."
                             % (name, seconds))

    # Actually run the servers.
    try:
        logging.info("Starting reactors.")
        reactor.run()
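
For readers without Twisted at hand, the selection logic in the excerpt above can be sketched in isolation with the standard library only. The helper plan_tasks and the task name EXAMPLE_DISABLED_TASK are hypothetical and exist only for this illustration; the real code starts LoopingCall objects rather than returning a dictionary.

```python
def plan_tasks(configured, available):
    """Map each enabled, known task name to its interval in seconds."""
    plan = {}
    for name, seconds in configured.items():
        if not seconds:
            continue  # an interval of 0 disables the task
        if name not in available:
            continue  # configured, but no callable is registered for it
        plan[name] = abs(seconds)
    return plan

# Hypothetical configuration: only GET_TOR_EXIT_LIST appears in the ticket.
TASKS = {
    'GET_TOR_EXIT_LIST': 3 * 60 * 60,
    'EXAMPLE_DISABLED_TASK': 0,
}

# Stand-in callables; in BridgeDB these would be the real task functions.
available = {
    'GET_TOR_EXIT_LIST': lambda: None,
    'EXAMPLE_DISABLED_TASK': lambda: None,
}

plan = plan_tasks(TASKS, available)
```

Under this reading, adding another repeating task means one new entry in TASKS plus one registered callable.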

Also, the entire process of retrieving, parsing, and loading the exit list is now async, and data is handled as it arrives off the wire, so there's no stalling while writing to disk.
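
As a rough illustration of handling data as it arrives, the sketch below buffers partial lines between chunks and adds each complete address to a set. It assumes the list is served as one address per line with '#' comment lines; the class name ExitListBuffer and its methods are invented here and do not reflect the actual classes in the branch.

```python
import ipaddress

class ExitListBuffer:
    """Accumulate exit addresses from byte chunks of arbitrary size."""

    def __init__(self):
        self.addresses = set()
        self._pending = b""  # bytes of a line that has not ended yet

    def feed(self, chunk):
        """Process one chunk; keep any trailing partial line for later."""
        lines = (self._pending + chunk).split(b"\n")
        self._pending = lines.pop()
        for line in lines:
            self._add(line)

    def close(self):
        """Flush a final line that was not newline-terminated."""
        if self._pending:
            self._add(self._pending)
            self._pending = b""

    def _add(self, raw):
        text = raw.decode("ascii", "replace").strip()
        if not text or text.startswith("#"):
            return  # skip blank lines and comments
        try:
            self.addresses.add(str(ipaddress.ip_address(text)))
        except ValueError:
            pass  # ignore anything that is not a valid IP address

# Chunks deliberately split addresses in the middle.
buf = ExitListBuffer()
for chunk in (b"# comment line\n192.0", b".2.1\n198.51.", b"100.2"):
    buf.feed(chunk)
buf.close()
```

In a Twisted protocol, feed would correspond roughly to what happens in dataReceived and close to connectionLost.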

Lastly, I wrote those patches two years ago. This ticket somehow got lost in the bug tracker, and the branch got buried by other branches. :(

comment:14 Changed 5 years ago by isis

Keywords: bridgedb-0.3.0 added; bridgedb-0.2.5 removed