When parsing bridge descriptors, BridgeDB assumes that descriptors in the bridge descriptor files are in chronological order and that descriptors in cached-descriptors.new are newer than those in cached-descriptors. If this is not the case, BridgeDB overwrites a bridge's IP address and OR port with those from an older descriptor.
I think that the current cached-descriptors* files that Tor produces always have descriptors in chronological order. But once we change that, e.g., when trying to limit the number of descriptors that Tor memorizes, BridgeDB will behave funny.
We should look at the bridge descriptor that is referenced from the bridge network status by its publication time and ignore all other bridge descriptors from the same bridge.
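A minimal sketch of that selection rule, assuming each parsed descriptor exposes a fingerprint and a publication time. The `ServerDescriptor` type and `select_referenced()` are hypothetical names for illustration, not BridgeDB's actual API:

```python
from datetime import datetime
from typing import Dict, Iterable, NamedTuple


class ServerDescriptor(NamedTuple):
    """Hypothetical stand-in for a parsed bridge descriptor."""
    fingerprint: str
    published: datetime
    address: str
    or_port: int


def select_referenced(
    descriptors: Iterable[ServerDescriptor],
    status_published: Dict[str, datetime],
) -> Dict[str, ServerDescriptor]:
    """Keep only the descriptor whose publication time matches the one
    listed for that bridge in the network status.

    File order is irrelevant here: descriptors that are not referenced
    from the network status are ignored wherever they appear.
    """
    selected = {}
    for desc in descriptors:
        if status_published.get(desc.fingerprint) == desc.published:
            selected[desc.fingerprint] = desc
    return selected
```

With this approach an older descriptor appearing later in the file can no longer overwrite the address and OR port taken from the referenced one.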
If this was referring to the cached-extrainfo and cached-extrainfo.new files (to my knowledge, BridgeDB has never had cached-descriptor* files), then this is a bug (related to #11216 (moved)), and it would mean that the transport lines of newer descriptors would potentially be overwritten by older, duplicate descriptors.
If that's the bug we're talking about, then here's the fix. :) Otherwise, feel free to reopen and/or add more information.
Those commits introduce the bridgedb.parse.descriptors.deduplicate() function, which is called in the bridgedb.parse.descriptors.parseExtraInfoFiles() function. The former deduplicates all descriptors for every bridge, selecting only the newest descriptor for a particular bridge. Additionally, if any Bridge has multiple @type bridge-extrainfo descriptors with exactly the same timestamps, then a bridgedb.parse.descriptors.DescriptorWarning will be issued, since perfectly identical descriptors shouldn't be something an unmodified tor is capable of producing (and thus would imply that there is either a drastic regression in tor, or that someone has created a possibly-malicious OR implementation). Unit tests and integration tests which verify that these behaviours are functioning as expected have also been added.
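For readers who don't want to dig through the commits, the deduplication behaviour described above can be sketched roughly as follows. This is an illustration only, not the code from those commits: `ExtraInfo`, `deduplicate_sketch()`, and the local `DescriptorWarning` class are stand-ins, and the sketch only warns when a descriptor's timestamp equals that of the newest descriptor seen so far for the same bridge.

```python
import warnings
from datetime import datetime
from typing import Dict, Iterable, NamedTuple, Tuple


class DescriptorWarning(Warning):
    """Local stand-in for bridgedb.parse.descriptors.DescriptorWarning."""


class ExtraInfo(NamedTuple):
    """Hypothetical parsed @type bridge-extrainfo descriptor."""
    fingerprint: str
    published: datetime
    transports: Tuple[str, ...]


def deduplicate_sketch(descriptors: Iterable[ExtraInfo]) -> Dict[str, ExtraInfo]:
    """Return the newest descriptor per bridge, regardless of input order."""
    newest: Dict[str, ExtraInfo] = {}
    for desc in descriptors:
        current = newest.get(desc.fingerprint)
        if current is None:
            newest[desc.fingerprint] = desc
        elif desc.published == current.published:
            # Same bridge, same timestamp: suspicious, so warn and keep
            # the descriptor we already have.
            warnings.warn(
                "Duplicate extrainfo timestamp for bridge %s" % desc.fingerprint,
                DescriptorWarning,
            )
        elif desc.published > current.published:
            newest[desc.fingerprint] = desc
    return newest
```

Because the comparison is on the publication time rather than on position in the file, an older duplicate can no longer overwrite the transport lines of a newer descriptor.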
> to my knowledge, BridgeDB has never had cached-descriptor* files
Hm? That's how bridgedb used to know what bridges exist -- Tonga would export its cached-descriptor* files and bridgedb would import them.
In fact, I'm a bit confused that it doesn't still have them, yet there are extrainfo descriptors. How do you know which extrainfo descriptor matches up to which bridge descriptor? Isn't that what the "extra-info-digest" line in the bridge descriptor is for?
> > to my knowledge, BridgeDB has never had cached-descriptor* files
>
> Hm? That's how bridgedb used to know what bridges exist -- Tonga would export its cached-descriptor* files and bridgedb would import them.
The files currently given to BridgeDB by Tonga are: networkstatus-bridges, bridge-descriptors, cached-extrainfo, and cached-extrainfo.new.
> In fact, I'm a bit confused that it doesn't still have them, yet there are extrainfo descriptors. How do you know which extrainfo descriptor matches up to which bridge descriptor? Isn't that what the "extra-info-digest" line in the bridge descriptor is for?
Yes, that is what it is for.
No, BridgeDB (as of #9380 (moved)) doesn't currently do this. Instead, it chains the verification of descriptors using the router-signature on the @type bridge-extrainfo document. (Although, I can gladly add code to check the descriptor digest too… that would be part of #9380 (moved). That might require more resources for parsing and hashing the @type bridge-extrainfo descriptors during the extrainfo deduplication, stage #6 below, since the deduplication would need to hash each one and check that the hashes match. I would still prefer to additionally check the signature on the @type bridge-extrainfo descriptor, so that both would need to validate before updating the Bridge with any of the extrainfo.)
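A rough sketch of what requiring both checks could look like. The assumptions here are mine: that the extra-info-digest is the uppercase hex SHA-1 of the extrainfo document from its "extra-info" line through the newline after "router-signature" (my reading of the Tor directory spec), and that signature checking is supplied as a callable. `extrainfo_digest()` and `may_update()` are hypothetical names, not existing BridgeDB functions.

```python
import hashlib
from typing import Callable

SIG_MARKER = "\nrouter-signature\n"


def extrainfo_digest(document: str) -> str:
    """Hash the signed portion of an extrainfo document (assumed to run
    from the "extra-info" line through "router-signature\\n")."""
    if document.startswith("extra-info "):
        start = 0
    else:
        # Skip any leading annotation lines such as "@type ...".
        start = document.index("\nextra-info ") + 1
    end = document.index(SIG_MARKER) + len(SIG_MARKER)
    signed = document[start:end].encode("ascii")
    return hashlib.sha1(signed).hexdigest().upper()


def may_update(
    document: str,
    expected_digest: str,
    signature_ok: Callable[[str], bool],
) -> bool:
    """Only allow the Bridge to be updated if the digest matches AND the
    signature validates; signature_ok is a caller-supplied check."""
    digest_ok = extrainfo_digest(document) == expected_digest.upper()
    return digest_ok and signature_ok(document)
```

The extra cost mentioned above is the one SHA-1 per extrainfo descriptor computed in `extrainfo_digest()`.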
BridgeDB's verification chain for descriptors currently (as of #9380 (moved)) goes like this:
1. Parse the @type bridge-networkstatus documents in the networkstatus-bridges file.
2. Create Bridge class instances for each of the documents we parsed in step #1. Call the Bridge.updateFromNetworkStatus() method with the corresponding networkstatus document for each Bridge. This includes storing the descriptor digest for each Bridge.
3. Parse the @type bridge-server-descriptors found in the bridge-descriptors file.
4. Store the extra-info-digest from each @type bridge-server-descriptor.
5. Parse and deduplicate the @type bridge-extrainfo descriptors in cached-extrainfo and cached-extrainfo.new.
6. Verify the router-signature on the @type bridge-extrainfo descriptor for each bridge, using the signing-key from the Bridge's @type bridge-server-descriptor.
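To make the order of the chain above concrete, here is a simplified sketch. Everything in it is a placeholder: the `Bridge` dataclass, the dict-based inputs, and `run_chain()` are hypothetical, parsing is assumed to have happened already, and signature verification is passed in as a callable rather than implemented.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Bridge:
    """Hypothetical, heavily reduced stand-in for BridgeDB's Bridge class."""
    fingerprint: str
    descriptor_digest: str = ""   # from the networkstatus document
    extra_info_digest: str = ""   # from the server descriptor
    signing_key: str = ""         # from the server descriptor
    transports: List[str] = field(default_factory=list)


def run_chain(
    statuses: Dict[str, dict],
    server_descs: Dict[str, dict],
    extrainfos: Dict[str, dict],
    verify_signature: Callable[[dict, str], bool],
) -> Dict[str, Bridge]:
    bridges: Dict[str, Bridge] = {}
    # Networkstatus documents create the Bridges and carry the
    # descriptor digest.
    for fp, status in statuses.items():
        bridges[fp] = Bridge(fp, descriptor_digest=status["digest"])
    # Server descriptors contribute the extra-info-digest and signing key.
    for fp, desc in server_descs.items():
        bridge = bridges.get(fp)
        if bridge is None:
            continue  # not listed in the networkstatus, so ignored
        bridge.extra_info_digest = desc["extra_info_digest"]
        bridge.signing_key = desc["signing_key"]
    # Extrainfo descriptors (assumed already deduplicated) are applied
    # only if their signature verifies against the stored signing key.
    for fp, extra in extrainfos.items():
        bridge = bridges.get(fp)
        if bridge is None or not bridge.signing_key:
            continue
        if verify_signature(extra, bridge.signing_key):
            bridge.transports = list(extra["transports"])
    return bridges
```

The point of the ordering is that each stage only trusts data vouched for by the previous one: a Bridge exists only if the networkstatus lists it, and extrainfo is applied only if it is signed by the key from that Bridge's server descriptor.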
> The files currently given to BridgeDB by Tonga are: networkstatus-bridges, bridge-descriptors, cached-extrainfo, and cached-extrainfo.new.
Ah ha. Somewhere in the transfer process I believe it does a "cat cached-descriptors* > bridge-descriptors". So those are indeed the bridge descriptors, just in a different file name, and both files together.
> BridgeDB's verification chain for descriptors currently (as of #9380 (moved)) goes like this: