BridgeDB re-assigns unallocated bridges from/to file buckets without need
It looks like BridgeDB re-assigns unallocated bridges from/to file buckets without need. That is, a bridge that keeps running from one network status to the next might be removed from a file bucket and replaced with another bridge. This leads to quick enumeration of all bridges in the unallocated pool when using file buckets.
A second bug seems to be that BridgeDB appends bridges to file buckets instead of overwriting these files. The result is that there are duplicate entries in files that external distributors use.
Here's how one can reproduce the problem using sanitized bridge descriptors. Yes, this description is lengthy and ugly, but it works for testing BridgeDB in general, even if one doesn't have the original bridge descriptors handy.
Download and extract the sanitized bridge descriptors from January 2009:
https://metrics.torproject.org/data/bridge-descriptors-2009-01.tar.bz2
Create a single bridge-descriptors file containing all bridge descriptors from that month.
$ cd bridge-descriptors-2009-01/
$ echo "@purpose bridge" > purpose
$ echo "router-signature" > routersignature
$ find server-descriptors/ -type f | xargs -I{} cat purpose {} routersignature > bridge-descriptors
Copy the new bridge-descriptors file to BridgeDB's working directory (here: ~/run/).
$ cd ~/run/
$ cp bridge-descriptors-2009-01/bridge-descriptors .
Also copy the sanitized network status file from 2009-01-10 00:07:04 to BridgeDB's working directory and rename it to networkstatus-bridges.
$ cp bridge-descriptors-2009-01/statuses/10/20090110-000704-4A0CCD2DDC7995083D73F5D667100C8A5831F16D networkstatus-bridges
Configure BridgeDB to write 4 bridges to file bucket twitter, and otherwise keep the default configuration. Start BridgeDB (note that it may take 30 seconds to digest the 8.5M bridge-descriptors file) and dump bridges to file buckets.
The result is a new file twitter-2011-03-09.brdgs with this content (may vary for you):
10.134.79.249:443
10.236.199.173:443
10.116.76.140:9001
10.51.76.151:18443
It also writes this unallocated-2011-03-09.brdgs file (again, content may vary):
10.126.198.237:443
10.251.69.61:9003
10.31.186.235:49001
10.81.88.5:9001
Replace networkstatus-bridges with a later network status, here the one from 2009-01-10 03:07:09 (roughly 3 hours after the first, going by the file names):
$ cp bridge-descriptors-2009-01/statuses/10/20090110-030709-4A0CCD2DDC7995083D73F5D667100C8A5831F16D networkstatus-bridges
Give BridgeDB a HUP, wait at least 30 seconds, and tell it to dump bridges to file buckets again.
Here's my new twitter-2011-03-09.brdgs file:
10.134.79.249:443
10.236.199.173:443
10.116.76.140:9001
10.51.76.151:18443
10.51.76.151:18443
10.237.143.0:443
10.241.115.62:443
10.239.76.198:443
And my unallocated-2011-03-09.brdgs file:
10.126.198.237:443
10.251.69.61:9003
10.31.186.235:49001
10.81.88.5:9001
10.126.198.237:443
10.251.69.61:9003
10.31.186.235:49001
10.81.88.5:9001
10.134.79.249:443
10.236.199.173:443
10.116.76.140:9001
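The duplication in the two dumps above can be made visible by listing every line that occurs more than once. This is only a minimal sketch, assuming the two .brdgs files are in the current directory; the helper name show_dups is made up for illustration and is not part of BridgeDB.

```shell
# List each IP:port line that appears more than once in a bucket file.
# show_dups is a hypothetical helper name, not something BridgeDB provides.
show_dups() {
    sort "$1" | uniq -d
}

# Check both bucket files if they exist in the current directory.
for f in twitter-2011-03-09.brdgs unallocated-2011-03-09.brdgs; do
    if [ -f "$f" ]; then
        echo "== $f"
        show_dups "$f"
    fi
done
```

For the twitter file shown above, this reports 10.51.76.151:18443; for the unallocated file it reports the four bridges from the first dump. A bucket file that is overwritten on each dump should report nothing.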
There are two bugs:
- Why was 10.236.199.173:443 (2nd line in twitter-2011-03-09.brdgs) removed from this file bucket and put back in the unallocated ring again (last but one line in unallocated-2011-03-09.brdgs)? I confirmed that this is the same bridge using the new pool assignments patch that is not merged yet. This is the first bug described above.
- Why are IP:port lines appended to these files? This is the second bug described above.
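The first bug can also be checked mechanically: split each appended file into its first and second dump, then look for bridges from the first twitter dump that show up in the second unallocated dump. This is a rough sketch, assuming the caller knows how many lines the first dump wrote to each file (4 and 4 in the listings above); the function name moved_to_unallocated is illustrative only.

```shell
# moved_to_unallocated TWITTER_FILE UNALLOCATED_FILE TWITTER_N UNALLOCATED_N
# Prints bridges from the first twitter dump (the first TWITTER_N lines) that
# appear in the second unallocated dump (everything after UNALLOCATED_N lines).
# Hypothetical helper; it relies on the files having been appended to.
moved_to_unallocated() {
    twitter_file=$1
    unallocated_file=$2
    twitter_n=$3
    unallocated_n=$4
    old_twitter=$(mktemp)
    head -n "$twitter_n" "$twitter_file" > "$old_twitter"
    # -F: fixed strings, -x: whole-line match, -f: read patterns from file
    tail -n "+$((unallocated_n + 1))" "$unallocated_file" | grep -Fx -f "$old_twitter"
    status=$?
    rm -f "$old_twitter"
    return "$status"
}
```

Usage would be: moved_to_unallocated twitter-2011-03-09.brdgs unallocated-2011-03-09.brdgs 4 4. With the listings above this prints 10.134.79.249:443, 10.236.199.173:443 and 10.116.76.140:9001. A matching IP:port line only suggests that it is the same bridge; confirming it needs the fingerprint, e.g. from the pool assignments.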