When we fetch a bridge descriptor and already have a descriptor with the same identity digest cached, we refuse to add the new one. This means that when we have an old descriptor with purpose general, we won't change the purpose to bridge when we configured that relay as bridge in the Tor client.
To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information
Child items ...
Show closed items
Linked items 0
Link issues together to show that they're related.
Learn more.
if (sdmap_get(routerlist->desc_digest_map, id_digest)) {
This part is bad news. Descriptor digests and identity digests are different things, and we shouldn't mix them. We shouldn't try looking up the id_digest in the desc_digest_map -- it'll never be there.
Check out git master (e.g. 8d588e7b1a4fccf), run Tor to fetch all your directory stuff, kill Tor, then run Tor pointing to a public relay as your bridge. It will fetch that relay's descriptor using router purpose bridge, and successfully store it (so the above patch worked to that extent).
But then it will discover that it doesn't have any general-purpose descriptor for that relay, and it's listed in the consensus so clearly it should have one. So it fetches one, and then replaces the bridge-purpose descriptor with the general-purpose descriptor, and then Tor freaks out because none of its bridges are up anymore.
A) Teach the Tor directory stuff that if a descriptor it wants is a
configured bridge, it doesn't want the general-purpose descriptor.
B) If you get a general-purpose descriptor for a relay which is currently
one of your bridges, for god's sake don't accept it.
C) ?
We clearly need to do A. It's not clear that doing B as
belt-and-suspenders helps -- if we do A correctly, then we don't
need B, and if we do B but don't do A correctly, we end up doing the
fetch-and-discard-repeat dance.
Trac: Status: reopened to accepted Owner: N/Ato arma
So back when I wrote the patch, I tested it and it appeared to work.
I'm testing with "src/or/tor --UseBridges 1 --Bridge 78.47.18.110:80", and could just use Tor for a few minutes without any problem. While I did that, "Dropping descriptor that we already have for router 'fluxe3'" was logged exactly once. I'm not quite sure what's up here.
It certainly seems we're not overzealously fetching the descriptor again. After running for more than one hour, my Tor only logged "Dropping descriptor that we already have for router 'fluxe3'" three times, roughly 15 minutes apart.
Ahhh, now I can reproduce the problem. It happens when I have a descriptor for fluxe3 without the bridge annotation when I first start up. It isn't dependant on how much other network information is present, the important thing is that there is a general-purpose descriptor in my cached-descriptors file. So I believe that this is not related to fetching a new descriptor from the network, but rather Tor goes back to the descriptor it has on disk.
I talked to arma about whether the assert is appropriate. He said to leave it in for now until he can review, but see bug1776_v4 for an alternative that doesn't assert, but just warns and continues.
Sebastian: for context, I originally put in the if (old_router) check just as a defensive programming measure, so we don't look at *old_router if it's NULL.
The first case that comes to mind where old_router could be NULL is if Tor has a bug somewhere such that routerlist->desc_digest_map and routerlist->identity_map get out of sync. We don't know of bugs now, but that's no promise they won't be introduced later.
I guess the second case that you're pointing out is if somebody can create a sha1 collision on structured data (relay descriptors here), such that they give you a descriptor A for relay identity B, and then when you're fetching a bridge descriptor for bridge identity C they give you a new descriptor D signed by C whose hash matches the hash of A. That would cause you to trigger the assert and exit.
I have to say, I'm not much worried about that DoS vulnerability. Somebody should audit the Tor code in their spare time for other such cases, to get a sense of how many there are, but I bet there are lots.
Nick, can you give some insight here? How pedantic should we be? bug1776_v3 seems like it resolves the issues, and it has an assert that shouldn't be triggered except if we have a bug or a bad assumption about sha1. Good enough?
Switching an assert to a log-and-carry-on is pretty trivial. I wouldn't be to worried about that, if the assert is as hard-to-trigger as you say. Is there something else you wanted insight on here?
I do not know what yetonetime is talking about; I can't figure out what he is saying. I am trying to translate it as 'It's not enough to warn and continue; we must drop the descriptor for a good reason [in the hash collision case]".
I am fine with either v3 or v4. If I were pushing for a v5, I'd also say "drop the descriptor in a collision case too," since it's not nice to have hash collisions flying around. Still, v3 is okay with me; I like v4 least. I'd push harder for my "v5", except for the fact that if if somebody can use hash collisions to attack Tor users, this isn't the right vector to do it with.
bug1776_v3 introduced an assert that could trigger on directory authorities.
The issue is that router_get_by_digest(id_digest) and sdmap_get(routerlist->desc_digest_map, router->cache_info.signed_descriptor_digest) are deliberately not kept in sync if you're a directory cache. That's because the desc_digest_map includes descriptors that are in old_routers, that is, the descriptors that are cached but not in the currently recommended routerlist.
The assert should only be triggerable on directory authorities, since other Tors avoid downloading something they already have: e.g. router_get_by_descriptor_digest() in update_consensus_router_descriptor_downloads() looks at routerlist->desc_digest_map, which is what we want.
The logic is going to be messy here, because it's not just a matter of "drop or don't drop if !old_router" -- we want to drop the new descriptor if desc_digest_map already has it and the copy we have (whether in routerlist or old_routers) is not worth replacing.
Probably we'll be happier working out the logic, using intermediate variables, rather than trying to cram it into one awful if. For example, a truth table based on all the variables would help us sort out what we actually want to do in each case.
So first off, this isn't authorities only: caches also have a nonempty routerlist->old_routers.
Second, assuming the behavior we want is:
"Drop duplicate descriptors, unless adding this one would turn a nonbridge into a bridge" then the inputs we need are:
'Is this a duplicate?' (Yes if we got anything from the sdmap_get(desc_digest_map) call.
'Is it a bridge?' (Yes if router->purpose says it is.)
'Was the old one a bridge?' (Yes if old_router != NULL and old_router->purpose says it is. But what if old_router == NULL? We'd want to look at the result of the sdmap_get() call. Can a bridge descriptor even make it into old_routers? Should it? If "yes" and "yes", is there anything about the signed_descriptor_t that tells us whether it's a bridge?)
Caches have a nonempty routerlist->old_routers but the assert can be detonated on a directory authorities only (by default). Another story is client with configured UseBridges and Bridge options.
piebeer was concerned that some of the calls from router_add_to_routerlist to the routerlist manipulation functions could fail if we do an operation on the routerlist when we already have a routerinfo or a signed_descriptor_t with the same identity_digest. I should fix that before we merge.
What 114a371c0ea4 does doesn't match its comment though.
/* If we have this descriptor already and the new descriptor is a bridge * descriptor, replace it. If we had a bridge descriptor before and the * new one is not a bridge descriptor, don't replace it. */
Well, this is a crash bug atm, so we should think about what we're trying to do here before the next 0.2.2.x release. I believe that the new code fixes that crash bug, even if the logic could be improved. I do not know what the right logic is. Maybe build a truth table and figure out what it should be? I tried that, and came up with this, so I am not the one to to it.
history note: signed_descs_update_status_from_consensus_networkstatus() was very happy with non actual signed_descriptor_t *d. software was good enough long time ago.