Opened 9 years ago

Closed 22 months ago

#1839 closed enhancement (fixed)

Rotate available bridges over time

Reported by: arma
Owned by: isis
Priority: High
Milestone:
Component: Circumvention/BridgeDB
Version:
Severity: Blocker
Keywords: bridgedb-dist, bridgedb-0.3.2
Cc: isis, sysrqb
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor:

Description

We need to design a new algorithm for deciding which bridge addresses to give out in response to a given query. The result of this algorithm should be that only a small fraction (e.g. 20%) of the bridges in the https or gmail bucket are available at a given time, and we rotate to a new fraction each week.

Benefits: a) an adversary who does a high-intensity push to enumerate bridges can't get them all in one week no matter how hard the push, and b) if you learned your bridge last week, the enumerating adversary won't find it this week.

This task depends on #1606 (bridgedb spec), since I don't have a good handle on what we're doing now, so I don't know what needs to change to get there.

And finally, we're going to need to improve Vidalia's "who has used my bridge" interface to explain to the users what's going on, so they don't see zero usage and turn off their bridge.

Change History (12)

comment:1 Changed 6 years ago by sysrqb

Keywords: important added
Priority: normal → major

comment:2 Changed 6 years ago by isis

Cc: isis added
Keywords: easy added; important removed
Owner: set to isis
Priority: major → normal
Status: new → assigned

In order to implement this, one should look at the bridgedb.Dist.uniformMap() function and either write a new function based on it, or write some sort of wrapper/decorator for it, which adds a timestamp (truncated to whatever interval is decided upon) to the client's IP address (the IP address's last quad should probably still be truncated)
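
A minimal sketch of that wrapper idea, not the actual `bridgedb.Dist.uniformMap()` code. The helper names, the one-week interval, and the `<TIMESTAMP>AREA` ordering are all assumptions made for illustration; the resulting string is what would then be fed to the HMAC.

```python
import ipaddress

WEEK = 7 * 24 * 60 * 60  # assumed rotation interval, in seconds


def truncate_ip(ip):
    """Zero the last quad of an IPv4 address (i.e. keep only its /24)."""
    network = ipaddress.ip_network(f"{ip}/24", strict=False)
    return str(network.network_address)


def truncate_timestamp(timestamp, interval=WEEK):
    """Round a Unix timestamp down to the start of its interval."""
    return int(timestamp) - (int(timestamp) % interval)


def rotating_area(ip, timestamp, interval=WEEK):
    """Hypothetical wrapper: combine the truncated time and truncated IP."""
    return f"<{truncate_timestamp(timestamp, interval)}>{truncate_ip(ip)}"
```

Two clients in the same /24 asking during the same interval produce the same string, and the same client produces a different string once the interval rolls over.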

comment:3 Changed 6 years ago by sysrqb

Another question that will need to be answered is how long we should make the period for this rotation. Also, ideally, the subset of the bridges won't rotate in a predictable order, either. I wonder if weekly rotation will offer enough protection, too.

comment:4 in reply to:  3 Changed 6 years ago by isis

Replying to sysrqb:

Also, ideally, the subset of the bridges won't rotate in a predictable order, either.


Actually, we probably shouldn't have to worry about this too much.

If we implement this with the rotation period as something like either "YEAR || WEEK_NUMBER" or a rounded-off timestamp, and append it to the truncated client IP and then HMACed, then — working in the random oracle model — the probability of a collision is negligible, dependent ultimately on the key/IV/security parameter size, approximately Pr[HMAC] ≤ 1ᵏ where k is the security parameter (in our case, 256, which is the bitlength of the master HMAC key, MASTER_KEY in bridgedb.conf). Hopefully I'm remembering the equation correctly... I think I remember it being a unary relation to the security parameter. For your concerns on predictability, the probability is lower because it needs a second preimage for SHA-1.

And bonus points, that probability is often actually even smaller: the security of an HMAC can be proven if only the compression function is modeled as random, the digest function doesn't need to be. This is also good news, because digest functions are very clearly not random oracles. For the bad news, there are several types of differential distinguishers for HMAC-SHA-1, though only on reduced-step variants. We should make sure we're rotating the MASTER_KEY often enough, and move to SHA-2/3 when we get around to it (should be simple as 's/\'sha1\'/\'sha256\'/')

I wonder if weekly rotation will offer enough protection, too.


It all basically boils down to an adversary using some fancypants tricks to brute force the MASTER_KEY. If they get that key, and also given that some of our adversaries have large IP spaces, it doesn't really matter how often we rotate. In the end, the problem isn't so much whether our adversaries can actually computationally/economically afford to beat us and our crypto; it's that no matter what, they've got more resources than the people we're trying to help.

comment:5 Changed 5 years ago by isis

Keywords: bridgedb-dist added

comment:6 Changed 4 years ago by isis

Cc: sysrqb added

During #4771, BridgeDB was changed to distribute a maximum of four sets of bridges to all Tor/proxy users during a time period. The way that this works is that the HMAC determining the client's position in the hashring is given:

known-proxy<EPOCH>GROUP

where EPOCH is the beginning of the current time period (this is the part that needs this rotation ticket done), and GROUP is calculated by int(ipaddr.IPAddress(CLIENT_IP)) % 4, such that all Tor/proxy users are split into four groups (effectively meaning that regardless of how many times one gets a new Exit node or proxy, there are only four sets of bridges available).

Therefore, we absolutely need to be certain that periodic rotation is enabled now (it isn't), otherwise only four sets of bridges will go to all Tor/proxy users forever, and everyone will mail Join AOL! CDs and explody packages of glitter to me in their sheer anger and frustration.
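
The known-proxy construction above can be sketched roughly as follows. This is an illustration, not BridgeDB's code: it uses the standard-library `ipaddress` module in place of the `ipaddr` package named above, treats the angle brackets as literal characters, assumes HMAC-SHA-1, and the function names are hypothetical.

```python
import hashlib
import hmac
import ipaddress


def proxy_group(client_ip, groups=4):
    """Map a Tor exit or open-proxy address to one of `groups` buckets."""
    return int(ipaddress.ip_address(client_ip)) % groups


def proxy_ring_position(key, client_ip, epoch):
    """HMAC the string "known-proxy<EPOCH>GROUP" to get a ring position."""
    message = f"known-proxy<{epoch}>{proxy_group(client_ip)}"
    return hmac.new(key, message.encode("ascii"), hashlib.sha1).hexdigest()
```

Because only GROUP and EPOCH enter the message, every proxy address in the same group lands on the same position until EPOCH changes; with a fixed EPOCH the four positions never move, which is the problem this comment describes.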

comment:7 Changed 4 years ago by isis

Also, arma suggested on IRC that we expose these rotation periods as configurable settings in the config file, which I think is a fine idea.

comment:8 Changed 4 years ago by isis

Priority: normal → blocker

This is blocking #4771 deployment.

comment:9 Changed 4 years ago by isis

Keywords: bridgedb-0.3.2 added; easy removed
Status: assigned → needs_review

My changes for this are in my fix/1839-rotation-periods branch.

There are now HTTPS_ROTATION_PERIOD and EMAIL_ROTATION_PERIOD config settings which may be used to control the hashring rotation periods for BridgeDB's distributors via the configuration file.

The defaults for those settings will likely need to be changed and fiddled with to make the behaviour an optimum balance between user-friendly and resilient to enumeration. Currently, they are:

HTTPS_ROTATION_PERIOD = "3 hours"
EMAIL_ROTATION_PERIOD = "1 day"
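
As a rough illustration of how such period strings could be turned into an EPOCH value, the sketch below parses the string into seconds and rounds the current time down to the start of the period. The helper names, the accepted units, and the alignment of periods to the Unix epoch are assumptions; BridgeDB's own parsing may differ.

```python
import re

# Seconds per unit; the set of accepted units is an assumption.
UNITS = {"second": 1, "minute": 60, "hour": 3600, "day": 86400, "week": 604800}


def period_to_seconds(period):
    """Convert a string such as "3 hours" or "1 day" into seconds."""
    match = re.fullmatch(r"\s*(\d+)\s+([a-z]+?)s?\s*", period.lower())
    if match is None or match.group(2) not in UNITS:
        raise ValueError(f"unrecognised rotation period: {period!r}")
    return int(match.group(1)) * UNITS[match.group(2)]


def epoch_start(now, period_seconds):
    """Return the Unix time at which the current rotation period began."""
    return int(now) - (int(now) % period_seconds)
```

With "3 hours", every request made within the same three-hour window yields the same `epoch_start` value, so the hashed string only changes when the window does.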

Behavioural changes

Email Distributor

The behaviour for hashring rotation for the EmailBasedDistributor is such that, when bridgedb.email.autoresponder.createResponseBody() calls EmailBasedDistributor.getBridgesForEmail(EMAIL_ADDRESS, EPOCH) with the EPOCH set to the start of the current EMAIL_ROTATION_PERIOD, the client's position in the hashring is determined by the HMAC of the string "<EPOCH>EMAIL_ADDRESS". Therefore, the EMAIL_ROTATION_PERIOD directly affects where the client is placed in the hashring, resulting in different bridges for that client, depending on whether the period has elapsed.

With the default setting of "1 day", and taking into account also that the EmailBasedDistributor only responds to a particular user once per three hours, this results in the client being able to ask for (and receive) vanilla bridges (starting from hashring position A) at 9:00, obfs3 bridges (also from position A) at 12:00, obfs4 bridges (from position A again) at 15:00, and so on, and finally different vanilla bridges (from hashring position B) the next morning at 9:00.
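
The timeline above can be checked with a small sketch. It assumes HMAC-SHA-1, literal angle brackets, and days aligned to midnight UTC; `email_ring_position` is a hypothetical stand-in, not `EmailBasedDistributor.getBridgesForEmail()`.

```python
import hashlib
import hmac

DAY = 86400  # the default "1 day" period, in seconds


def email_ring_position(key, email_address, now, period=DAY):
    """HMAC the string "<EPOCH>EMAIL_ADDRESS" for the period containing `now`."""
    epoch = int(now) - (int(now) % period)
    message = f"<{epoch}>{email_address}"
    return hmac.new(key, message.encode("utf-8"), hashlib.sha1).hexdigest()


key = b"\x01" * 32
address = "user@example.com"
hour = 3600

# Requests at 9:00, 12:00 and 15:00 on the same day share one position ("A").
same_day = {email_ring_position(key, address, h * hour) for h in (9, 12, 15)}

# The request at 9:00 the next morning falls in a new period (position "B").
next_day = email_ring_position(key, address, DAY + 9 * hour)
```

Here `same_day` contains a single position and `next_day` differs from it, matching the A/A/A/B sequence described above.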

HTTPS Distributor

For the IPBasedDistributor, the behaviour for hashring rotation is that the client's hashring position is determined by the HMAC of the string "<EPOCH>AREA" where EPOCH is the start of the current HTTPS_ROTATION_PERIOD, and the AREA is the /16 subnet which the client's IP address resides within. If the client is using Tor or some other open proxy, then the client's hashring position is determined by "known-proxy<EPOCH>GROUP" where EPOCH is the same as before, and GROUP is an integer (currently 1 through 4, inclusive) deterministically derived from the IP address of the Tor Exit relay or open proxy that the client is using. Using GROUP causes there to only be 4 sets of bridges available to any and all Tor/proxy users at a given time. Hence additionally using EPOCH rotates the 4 sets of bridges available.

The default setting is "3 hours", causing all Tor/proxy users to have 4 different sets of bridges every three hours, while non-Tor users have a new set of bridges (probably) unique to their /16 subnet of their IP address available every three hours.
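
To make the non-proxy case concrete, here is a hedged sketch of mapping "<EPOCH>AREA" to a position and then reading bridges off a sorted ring. The AREA string format, the use of HMAC-SHA-1 for both clients and bridges, the "next k bridges" lookup, and every function name are assumptions for illustration rather than the IPBasedDistributor implementation.

```python
import bisect
import hashlib
import hmac
import ipaddress


def hmac_hex(key, message):
    """Hex HMAC-SHA-1 of an ASCII message."""
    return hmac.new(key, message.encode("ascii"), hashlib.sha1).hexdigest()


def area_of(client_ip):
    """Return the /16 network containing an IPv4 address, as a string."""
    return str(ipaddress.ip_network(f"{client_ip}/16", strict=False))


def build_ring(key, bridges):
    """Place each bridge on the ring at the HMAC of its identifier."""
    return sorted((hmac_hex(key, bridge), bridge) for bridge in bridges)


def bridges_for_client(key, ring, client_ip, epoch, count=3):
    """Return the `count` bridges that follow the client's ring position."""
    position = hmac_hex(key, f"<{epoch}>{area_of(client_ip)}")
    start = bisect.bisect_left(ring, (position,))
    size = len(ring)
    return [ring[(start + i) % size][1] for i in range(min(count, size))]
```

Clients in the same /16 get the same answer within one period, and the answer for a given /16 is recomputed from a fresh position whenever EPOCH advances.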

comment:10 Changed 4 years ago by isis

Some IRC logs, because they started off with arma reviewing the design of this ticket's implementation and recent changes in #4771, and then drifted to other future BridgeDB/Metrics related tasks.

05:50          armadev  | i was looking at #1839
05:50 -zwiebelbot:#tor-dev- tor#1839: Rotate available bridges over time - https://bugs.torproject.org/1839
05:50             isis@ | oh great, thanks
05:50          armadev  | i need to look more at the plan there, but i continue to think that the strategy of "don't let an attacker learn very many bridges in a given time period no matter how much effort they 
                          put in" is a good one
05:50          armadev  | but that made me remember another thing i was wanting us to look at
05:51          armadev  | which is #10 on https://blog.torproject.org/blog/research-problems-ten-ways-discover-tor-bridges
05:51          armadev  | right now, we have a hash ring design, where the "address" of the requestor maps to a point on the hash ring, and we give them the next k bridges?
05:51            *      coderman_ wants more redteam arma blog
05:52          armadev  | so that naturally will lead somebody who can attack some points in this ring to learn all the rest of the bridges if they do this attack
05:52          armadev  | whereas we could imagine something other than a hash ring, or rather, using the ring differently, to make it so all bridges map to a small closed cycle
05:52          armadev  | i haven't thought through the details and maybe it cannot be made to work easily, but i wanted to raise the topic again.
05:55             isis@ | so the alternative that i could do would be to have "consistent" hashrings, which is something used usually in backend systems for data replication, where if the number of duplicates 
                          N=3, then you end up with the resource places into three buckets as-evenly-as-possible placed around the main hashring
05:56             isis@ | with a replication level of N=1, this would result in each bridge being in its own little subgroup, and no others
05:57             isis@ | BridgeDB kind of has like four classes which try to implement this concept, and IMO they all do a really bad job, with code duplication, unused code, and half-implemented stuff all over 
                          the place
05:57             isis@ | i am finishing up my branch which cleans up all the hashring code today, it is for #12505
05:57 -zwiebelbot:#tor-dev- tor#12505: Refactor Bridges.py and Dist.py in BridgeDB - https://bugs.torproject.org/12505
05:59             isis@ | anyway, if we did this, they we could easily say "every distributor gets one main consistent hashring which is split into X subhashrings. depending on what week it is, only one of those 
                          subhashrings is available." then, in the subhashring, rotate the clients around the ring with a different frequency
06:00             isis@ | does that sound like it would solve #10?
06:00          armadev  | but it's still a ring. this is good for the "change what you're giving out over time" feature, but no, i think it doesn't address #10.
06:00          armadev  | the issue is that my address maps to bridges 5, 6, and 7
06:00          armadev  | and your address maps to bridges 7, 8, and 9
06:01          armadev  | so if the adversary sees your behavior, it learns about 7, and from 7 it learns about me, and then it learns about 5 and 6
06:01          armadev  | it would be better, for #10, if every address maps to a trio of bridges that are the same trio that other people get when they're mapped there
06:01          armadev  | rather than these partially overlapping sets that we do now.
06:02             isis@ | ah, i see
06:02             isis@ | yes, the overlap has also bothered me, but i wasn't thinking of the zig-zag problem
06:02          armadev  | not sure this one needs to be solved now
06:02             isis@ | hmm. the overlap is a much more difficult one to solve
06:02          armadev  | and i think "different bridges at different times" is a more important topic to do
06:03          armadev  | the blog post describes a potential solution.
06:04          armadev  | but 'more work remains' before that solution will actually do what we want.
06:04          armadev  | it's the sort of thing we should write up as a math problem for somebody's grad class, and then sit back and wait
06:04             isis@ | the overlap is also even harder to solve because, when BridgeDB parses new descriptors, it rebuilds all the hashrings entirely, causing rings to add and lose bridges. however, for the 
                          bridges which remain, their place in the hashring remains the same.
06:04          armadev  | rather than get caught up in ourselves
06:05          armadev  | huh. yeah.
06:05          armadev  | which leads me to another topic that we should be pondering:
06:05          armadev  | all of these steps we take to make it less likely for an attacker to Get All The Bridges lead to more bridges going unused for some time periods
06:05          armadev  | we should think about ways to tell the bridge operator when they're in action, and when they're in reserve
06:05          armadev  | so we can reassure them that being in reserve is a great and valuable role.
06:06          armadev  | (some people run a bridge for a day, then stop. if they were in reserve the whole time, technically speaking, that wasn't a great and valuable role after all.)
06:06             isis@ | so, e.g. if you ask for bridges right now (without #1839 deployed) and you get bridges A, B, and C, and then BridgeDB reparses and rebuilds, and B goes offline, then three hours later 
                          you ask for more bridges, you'll likely get bridges A, B, and D
06:07          armadev  | perhaps you mean C goes offline?
06:07          armadev  | otherwise, this sounds bad :)
06:08             isis@ | oh yeah. that. :)
06:08          armadev  | hey, it's bridgedb, you never know
06:09             isis@ | haha, the thing is becoming a tiny bit more well-behaved now
06:09             isis@ | just a tiny bit
06:09             isis@ | i think you once had a ticket for designing some bridge statistics interface for BridgeDB…
06:09            *           isis  is looking for it
06:09             isis@ | #7877
06:09 -zwiebelbot:#tor-dev- tor#7877: Web interface for looking up bridge status? - https://bugs.torproject.org/7877
06:09             isis@ | why did that never happen?
06:10             isis@ | do we still want that to happen?
06:10          Yawning  | hm
06:10             isis@ | or do we consider Globe to solve that problem?
06:10          armadev  | didn't we do something related to #7877 in globe?
06:11          armadev  | except, i vaguely remember hearing from karsten that he decided to drop that data point from the globe interface, because i-don't-remember-why
06:11          armadev  | it does seem a bit silly for bridgedb to grow a new interface for users,
06:12          armadev  | when it's already exporting stuff to globe and globe is already an interface for users
06:12          armadev  | but it might be wise for us to export a bit more stuff from bridgedb to globe, so it can give that stuff to users
06:12             isis@ | once the database stuff for prop#226 is merged, we get a pretty neat stucture to build statistics gathering and analysis tools on top of
06:12 -zwiebelbot:#tor-dev- Prop#226: "Scalability and Stability Improvements to BridgeDB: Switching to a Distributed Database System and RDBMS" [OPEN]
06:12          armadev  | and i guess, step zero is for globe to resume giving out that info at all
06:13             isis@ | yeah, i suppose i could also more easily support giving the metrics server access to certain queries, so that it benefits from BridgeDB keeping state and all
06:14             isis@ | plus then metrics wouldn't have to do a bunch of crazy reparsing and recalculation of any things which bridgedb already does
06:16          armadev  | yeah, hm, the 'pool assignment' entry on globe appears empty
06:16          armadev  | for e.g. https://globe.torproject.org/#/bridge/1513028CD43BD34798D829719D76E6EC3F5391CA
06:17          armadev  | #13921
06:17             isis@ | yeah, see #13921
06:17 -zwiebelbot:#tor-dev- tor#13921: Remove "bridge pool assignment" UI element from Atlas/Globe - https://bugs.torproject.org/13921
06:17             isis@ | which replaces it in Globe with the `transport` field instead
06:17             isis@ | showing which transports a bridge currently supports
06:17          armadev  | well, great, but that removes the thing i was just talking about where we give feedback to the user about whether her bridge is in action or what
06:18          armadev  | which i think will become even more important with #1839
06:22             isis@ | armadev: well, right, but then we should probably do either #2755 or…
06:22 -zwiebelbot:#tor-dev- tor#2755: Reconsider BridgeDB's pool assignment file implementation and deployment - https://bugs.torproject.org/2755
06:22            *           isis  can't find the other ticket
06:23             isis@ | i had a ticket that was for adding somewhere in the bridge-extrainfo descriptor a line like `BridgeDistribution 0` or `BridgeDistribution https`
06:23          armadev  | isis: to me #2755 is more about documenting how bridges were given out over the past, so we can match up load and blocking measurements with distribution to find patterns.
06:33            *           isis  found the torrc `BridgeDistribution https` tickets, they are #13727 and #13504
06:33 -zwiebelbot:#tor-dev- tor#13727: BridgeDB should not distribute Tor Browser's default bridges - https://bugs.torproject.org/13727
06:33 -zwiebelbot:#tor-dev- tor#13504: Bridges in Tor Browser Bundles should be public so that we have metrics on them - https://bugs.torproject.org/13504
07:03          armadev  | isis: so in summary (there are a lot of tickets), where are we at with the goals of remembering how we gave out bridges at which time, so we can use that to study the effectiveness of 
                          bridge distribution strategies in the past? and where are we at communicating to the operator what strategies we've used recently to give out her bridge?
07:08             isis  | currently, there is a pile of assignments.log files which continued to be produced and never got synced to Metrics
07:09             isis  | i could do #2755 soon, and ask karsten to allow BridgeDB to start syncing to Metrics again
07:09 -zwiebelbot:#tor-dev- tor#2755: Reconsider BridgeDB's pool assignment file implementation and deployment - https://bugs.torproject.org/2755
07:09            *        karsten looks at #2755
07:10             isis  | or, if karsten likes, i can provide an interface to BridgeDB's newer databases, so that the Metrics server can obtain data without additional processing/storage
07:10          karsten  | isis: or should we think about better usage statistics here?
07:11          karsten  | well, Metrics has only data that is archived by CollecTor.
07:11             isis  | sure, that sounds better than a string that likely has no meaning to most operators
07:11          karsten  | we could come up with better stats that are collected by CollecTor and then displayed/processed by Metrics and/or Onionoo.
07:11          armadev  | if there is historical how-we-distributed-it-when data that we have but we're not keeping, that's a bit sad
07:12             isis  | i have #14453 and #10218 which are along those lines
07:12          karsten  | you mean past assignment.log files?
07:12 -zwiebelbot:#tor-dev- tor#14453: Implement statistics gathering for number of Bridges-per-Transport in BridgeDB - https://bugs.torproject.org/14453
07:12 -zwiebelbot:#tor-dev- tor#10218: Provide "users-per-transport-per-country" statistics for obfsbridges - https://bugs.torproject.org/10218
07:12          armadev  | (though ideally there is how-it-got-blocked-when data somewhere out there, that we are not collecting and not keeping, and we'd ideally like to have both.)
07:12             isis  | karsten: yes, i have some past assignments.log files
07:13          armadev  | karsten: i think i don't mean statistics summaries, but rather, than underlying data.
07:13          armadev  | the sort of thing that researchers are going to want, a year from now, when they ask how that blocking event happened and which bridges it affected.
07:13          karsten  | we could convert existing logs into the new format.
07:14          karsten  | the yet-to-be-designed format.
07:15             isis  | the assignments.log files, did Metrics used to sanitise them by replacing the fingerprints with hashed fingerprints?
07:15          karsten  | isis: it seems #10218 is for little-t-tor, not bridgedb.
07:15          karsten  | yes, that's what it did. and it sorted them by hashed fingerprint, so that the order didn't reveal anything.
07:16          karsten  | maybe more.
07:16             isis  | steps like those are something that BridgeDB could easily do to begin with, if it would make the processing less intense
07:16          karsten  | that's right.
07:17          karsten  | it totally should do those steps.
07:18             isis  | and BridgeDB is parsing all the bridges into stem classes anyway, and is going to store them as json in couchDB, if that json is something more accessible
07:19          karsten  | json is easier than inventing our own data format, yes.
07:19          karsten  | we still need to think what to put into the json though.
07:19          Yawning  | xmllllllll
07:19          karsten  | xml in json, ok.
07:19          Yawning  | :D
07:19             isis  | one idea i had earlier was to allow collecTor to have certain queries on the new database (or the output of the query and some processing) for whatever statistics we wish to extract
07:21          karsten  | ideally, collector would fetch a thing every hour or so, verify it, and store it.
07:21          Yawning  | armadev: would #15515 count as what you want out of the defense at the intro point?
07:21 -zwiebelbot:#tor-dev- tor#15515: Don't allow multiple INTRODUCE1s on the same circuit - https://bugs.torproject.org/15515
07:21          Yawning  | or do you want something more sophisticated?
07:21             isis  | karsten: i was just going to put everything in the json, that way BridgeDB could do cooler stuff with detecting when certain fields have changed
07:22             isis  | karsten: verify means verify the descriptor signatures?
07:22          karsten  | ah, mostly that it's valid json and contains certain required fields.
07:22          karsten  | I think.
07:23          karsten  | not sure about putting in everything, including things that are already contained elsewhere,
07:23          karsten  | but it might be possible to remove certain fields while exporting to collector.
07:23             isis  | i planned on writing protobufs to define what data was valid for BridgeDB to be exporting
07:24          karsten  | okay, happy to learn what exactly that means. :)
07:24          Yawning  | it's google's serialization format
07:24             isis  | https://developers.google.com/protocol-buffers/docs/overview
07:24          Yawning  | you feed a definition file into a code generator and it outputs code that marshals/demarshalls stuffs
07:25          karsten  | nice, ok.
07:25             isis  | basically, i write a .proto file and it generates python, java, c, and/or go
07:25          Yawning  | https://capnproto.org/
07:25          Yawning  | see also
07:25          Yawning  | which is protobufs redesigned by the author after he left google
07:25          Yawning  | haven't used it, claims to be be better
07:26             isis  | lol, i can't tell if they are joking
07:26             isis  | "∞% faster!!"
07:27          Yawning  | heh 
07:27          Yawning  | if you read on they clarify what they mean
07:28          Yawning  | ymmv, protobufs is a fine format to use
07:28          Yawning  | and this thing may eat all ur dataz
07:28          karsten  | isis: okay, want to start a list of things to put into that json that are safe to be collected and published by collector?
07:28            *           isis  totally thinks they are joking at "Time-traveling RPC"
07:29          karsten  | isis: and is this something for a tor proposal?
07:29             isis  | but it is interesting and if they are not totally lying their pants off, then SUBSCRIBE
07:29          Yawning  | isis: the idea they're doingis actually p clever
07:29          karsten  | isis: or as addition to bridgedb-spec (if that exists)?
07:30          karsten  | oh, yes, it still exists, I helped write it..
07:30             isis  | karsten: currently, there is no proposal for better bridge statistics
07:30             isis  | although we could start one
07:30             isis  | and bridgedb-spec.txt lives in the top-level of tor-spec.git now
07:30          karsten  | oh, nice.
07:30          karsten  | you mean bridgedb has changed since  Date:   Fri Jul 5 01:40:49 2013 +0000
07:31          karsten  | I should git pull..
07:31             isis  | hah, it's almost entirely rewritten
07:31             isis  | i am finishing the final refactorings now
07:32             isis  | which is why i will have time and ability to do cool stuff, like the social distributor
07:32             isis  | (and better bridge metrics, if we want that)
07:34          karsten  | isis: just let me know if I can help with the stats side of things. it would be useful for bridge operators (onionoo/atlas/globe) and for sponsors (metrics).
07:35             isis  | karsten: is that sponsor S, or sponsors in general
07:35          karsten  | sponsors in general. I don't know what S wants.
07:35             isis  | karsten: ok, i will start making a proposal, and ask you to review it
07:36          karsten  | sounds great!
Version 0, edited 4 years ago by isis

comment:11 Changed 4 years ago by isis

Priority: blocker → major
Status: needs_review → new

Okay, I merged the changes mentioned above for bridgedb-0.3.2. This at least fixes the issues with getting #4771 deployed. I'm decreasing the priority, because it's no longer blocking.

However, I think we still want separated hashrings, where only one subhashring of a hashring is available per period. This sounds like a good idea. I'm not sure how compatible it will be with running BridgeDB's distributors on separate machines (#12506), since in that model, each distributor will pretty much get assigned a chunk of bridges and be able to do whatever it wants with them. However, if we have generalised hashring classes that can be easily configured to do this subhashring rotation behaviour (see #12505 for hashring refactoring), this can be easily used by distributor implementations who wish to better protect their bridges from any attack enabling rapid bridge enumeration.
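
One way the subhashring idea could look, purely as a sketch: each bridge is assigned to one of X subrings by a keyed hash of its fingerprint, and only the subring selected for the current period is handed to the distributor. The subring count, the HMAC-SHA-256 assignment, the round-robin selection (which is predictable, unlike what comment 3 asks for), and the function names are all assumptions.

```python
import hashlib
import hmac

WEEK = 7 * 24 * 60 * 60  # assumed subring rotation period, in seconds


def subring_index(key, fingerprint, count):
    """Assign a bridge to one of `count` subrings via a keyed hash."""
    digest = hmac.new(key, fingerprint.encode("ascii"), hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "big") % count


def active_subring(key, fingerprints, now, count=5, period=WEEK):
    """Return the bridges in the one subring available during this period."""
    active = (int(now) // period) % count  # simple round-robin selection
    return sorted(
        fp for fp in fingerprints if subring_index(key, fp, count) == active
    )
```

With `count=5`, roughly a fifth of the bridges are reachable in any one week, and a full pass over all subrings takes five weeks regardless of how many queries are made.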

Last edited 4 years ago by isis

comment:12 Changed 22 months ago by isis

Resolution: fixed
Severity: Blocker
Status: new → closed

The above comment on subhashrings should be a different ticket, if/when we decide to do it.
