When a v2 directory goes away, the other Tor authorities keep serving their cached copy of that directory's v2 status document.
If that status document is old, this results in clients (clients or other relays?) downloading the status document, realizing it's too old, and trying to download it again, ad infinitum.
While tor26 was serving dizum's two-day-old status document, it was completely swamped. It had thousands of directory requests open at a time; they were consuming all the bandwidth and memory, and it didn't even get to properly participate in consensus building.
Removing dizum's old status document from the cache and restarting tor26 made it happy. It now says 404 and clients don't come back (or if they do, at least it's a cheap 404 and not "here's 100k you'll throw away immediately, have it as often as you want").
I think we should stop serving expired status documents.
Or maybe we should stop serving them entirely. If we still need them between authorities, let's move them to a different URL.
Who the heck is downloading v2 networkstatuses? Only authorities should be doing that.
So, I'd like to finally implement proposal 147 in the 0.2.4.x series, which should be the final nail in the coffin of any need for v2 networkstatuses. With that in mind, let's figure out if there's some minimal version of this that will work in the meantime to prevent a recurrence of the issue you describe. No need for anything fancy, since we're going to be removing the need entirely.
I think "no longer serve expired ones" is just fine for now; changing the URL on the other hand would break authorities that haven't upgraded to know the new URL.
Marking for 0.2.4.x, unless this is vital enough for 0.2.3.
I think we don't need them between authorities anymore. As Nick says, proposal 147 would be nice. But git commits 2e692bd8c9 and eaf5487d95 (in Tor 0.2.2.12-alpha) made authorities look at v3 votes and fetch descriptors that are new to them:
- Many relays have been falling out of the consensus lately because not enough authorities know about their descriptor for them to get a majority of votes. When we deprecated the v2 directory protocol, we got rid of the only way that v3 authorities can hear from each other about other descriptors. Now authorities examine every v3 vote for new descriptors, and fetch them from that authority. Bugfix on 0.2.1.23.
and I think that is basically the "minimal version that will work in the meantime" that Nick wants.
FYI: Because of massive amounts of legacy badcode fail, I have begun restricting dirport requests to only the directory authorities. I am willing to whitelist legitimate legacy services by IP. Adding myself to Cc in case this actually matters for anyone who cares.
> FYI: Because of massive amounts of legacy badcode fail, I have begun restricting dirport requests to only the directory authorities. I am willing to whitelist legitimate legacy services by IP. Adding myself to Cc in case this actually matters for anyone who cares.
Wait, what? Almost all relays use your dirport to ask you for the consensus or descriptors that you tell them about. That's what your dirport is for.
> Who the heck is downloading v2 networkstatuses? Only authorities should be doing that.
I believe there are some old Tor relays out there who still go to the authorities for v2 status documents. Heck, they might even be Tor clients, if they're old enough.
I believe we made dir mirrors stop mirroring v2 a while ago. Before we dump v2 statuses entirely, it might be wise to look through the old code to see what they would do. (Another option is to make a plan for how to measure how it's going, and then dump them and measure how it's going.)
weasel is being polite in his choice of words, but he's totally right: this is a serious hassle and threat to the network.
Nick/Andrea, can you fit either "debug this" or "make it easier for others to debug stuff like this" into your medium-term schedule?
I don't have much of a clue for what the "debugging" would be here. It would be very easy to implement "serve no authority's v2 status document but your own" some time this week. Would that be a fix here? If so, it's like a 20-line patch at most.
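To make the "serve no authority's v2 status document but your own" idea concrete, here is a rough sketch in Python rather than Tor's C. The cache layout, the function name `lookup_v2_status`, and the fingerprint strings are all hypothetical and only illustrate the filtering rule; this is not the actual patch.

```python
def lookup_v2_status(cache, own_identity, requested_identity):
    """Answer a v2 status request, serving only our own document.

    `cache` is a hypothetical dict mapping an authority identity to the
    bytes of its v2 status document. Anything that is not ours gets a
    cheap 404, even if a cached copy exists.
    """
    if requested_identity != own_identity:
        return 404, b""
    body = cache.get(own_identity)
    if body is None:
        return 404, b""
    return 200, body
```

Under this rule, a request to tor26 for dizum's document would get a 404 whether or not tor26 still holds a cached copy.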
To be concrete, I've put a few possible things up at "bug6783_maybe" in my public repository. They're untested as heck. Under some characterizations of the problem above, they'll solve it. Under others, they won't. What do you think?
I was having a hard time tracking down the definition of expiry for v2 networkstatuses. Is it "24 hours"? According to dir-spec-v2 it is, but it looks like Tor has defined MAX_NETWORKSTATUS_AGE as "240 hours" since at least 0.1.2.x.
I tried changing the rule to "Don't serve networkstatuses older than X hours" (for X=2) in 94b6d1d7e60e933e84c52c3335e58be851958bb1; we could change X to whatever the critical interval is, if we can figure out what it should be.
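As a rough illustration of the "don't serve networkstatuses older than X hours" rule, here is a minimal Python sketch. It is not the code in the commit above: the `CachedStatus` type, the `respond` function, and the constant name `MAX_SERVE_AGE_SECONDS` are assumptions made for the example, and the only value taken from the discussion is X = 2.

```python
from dataclasses import dataclass

# Hypothetical cutoff: the "X hours" from the comment above, with X = 2.
MAX_SERVE_AGE_SECONDS = 2 * 60 * 60


@dataclass
class CachedStatus:
    """Hypothetical stand-in for a cached v2 networkstatus."""
    identity: str      # which authority published the document
    published_on: int  # publication time, seconds since the epoch
    body: bytes


def respond(status, now, max_age=MAX_SERVE_AGE_SECONDS):
    """Return (http_status, body); 404 for missing or too-old documents."""
    if status is None or now - status.published_on > max_age:
        return 404, b""
    return 200, status.body
```

With this check, a two-day-old document like the one described at the top of this ticket would be answered with an empty 404 instead of the full body. Changing X only means changing the one constant.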
Okay, so here's a possible plan to do what Roger prefers:
1. Implement a patch with an option that disables serving v2 directory information entirely.
2. Try a test network where there are v2 authorities, and clients and servers running older versions of Tor, with that option. Try enabling that option for a subset of the v2 authorities, and for all of them. Monitor the load on the authorities, and the behavior of the clients, to see if anybody's hosing anybody else.
3. If that doesn't explode, merge that patch into master, and have the remaining v2 authorities try it out all at once. This WILL require coordination, so that y'all can turn the option off in a hurry if it turns out to DoS the network in a way we didn't anticipate.
(I'm okay with doing this testing after the 10 Dec deadline, since the actual amount of code changes is small, and authority stuff doesn't have to be quite as stable quite as fast. That said, this is a tricky feature, and we shouldn't let it wait indefinitely.)