In the past, it made sense to use directory authorities for all other network functions too, because they provided a significant contribution to the network's available bandwidth. Now that this isn't so anymore, and we're starting to see more and more bugs where the dirauths also act as relays, we should change that so the dirauths can focus on providing a consensus and bootstrapping functionality.
I could go either way here. That is, I'm not sure if I want to triage this into 'minor priority for 0.2.3.x' or into the 'tor: unspecified' milestone.
I don't think it's urgent to make this change; the particular issue that prompted Sebastian to make this trac entry turned out to be a false positive, and I don't know of any others that are biting us much currently.
We'd want to think harder about the design in any case, since for smaller Tor networks the directory authorities could be a substantial fraction of the network (heck, the Ironkey Tor network has no nodes that aren't authorities).
I guess one reason to make this change is because eventually Mike's bwauth scripts will make moot the MaxAdvertisedBandwidth hack that we use to discourage 'too much' relay traffic on authorities. I say 'eventually' because his scripts aren't robust/accurate enough to detect the hack.
We're seeing more problems with dirauths that don't have enough capacity, because they're also popular relays. I feel that we should make sure dirauths aren't used as relays at all, so that bootstrapping clients, consensus generation and reachability testing always take precedence.
arma argues that this is bad, because dirauths provide more bandwidth to the network and more bandwidth is always good. I'm not sure I agree, though, because bootstrapping and consensus generation should take precedence.
Another argument is that private networks might be harder to bootstrap and use if dirauths can't be relays; but we could just add a flag to change that behaviour for private networks, or declare that you need to run relays. I think even the latter wouldn't be a big constraint: setting up a private network is trivial thanks to chutney and even my privnet hacks, and throwing in a few relays doesn't hurt anything.
I'm listing problems that we've seen on dirauths and that I can remember offhand:
Up until a year ago or so, dannenberg was extremely flaky because it didn't have enough bandwidth.
maatuska, dizum and gabelmoo were/are configured without a low enough MaxAdvertisedBandwidth, so they often max out their RelayBandwidthRate. This causes problems for bootstrapping clients, and also during consensus generation.
tor26 frequently hits extremely high memory usage; not having to handle relay traffic might help.
> tor26 is frequently hitting extremely high memory usage, not having to handle relay traffic might help
I think tor26's problems come from the fact that it's the sole remaining directory authority for a lot of obsolete Tor versions. So when one of those obsolete Tor versions has a bug that involves hammering the directory, it focuses on tor26.
(moria1 is not quite as old, but still sometimes hits the 32000+ socket mark, and most of those sockets are directory hammering attempts.)
> We're seeing more problems with dirauths that don't have enough capacity
I guess the days of directory authorities having plenty of excess bandwidth are gone. That's bad news too when it comes to dirauth DDoS concerns. Oops. Should we say that future directory authorities need to have strong (e.g. 100mbit) connections, even if they don't use it all in normal operation?
My main reason for not thinking we should add complexity to directory authorities (by special-casing them further) is that I think it's dangerous to have directory authorities that need it. I guess if it's really the case that our directory authorities are hitting their bandwidth limits, we've already failed at that goal. So be it.
Well, you can hook up a Tor server to a one-gigabit connection and often mostly fill it with just relay traffic. I think most dirauths are configured to rates much lower than their link speed: gabelmoo is configured for 500KB (before I took it over it was at 250KB) on a 10Mbps link, which it also shares with a bw auth.
> My main reason for not thinking we should add complexity to directory authorities (by special-casing them further) is that I think it's dangerous to have directory authorities that need it. I guess if it's really the case that our directory authorities are hitting their bandwidth limits, we've already failed at that goal. So be it.
I'm now fine with special-casing directory authorities so they don't carry traffic by default.
Clients already avoid them for directory requests when there's a non-dir-auth available. Should we have clients avoid authorities for circuits too when there are 'enough' other relays available?
Or said another way, how were you expecting to implement the idea?
Hm. That code would make us never list authorities in the consensus. I'm worried that this would make clients and caches decide that all the authorities were down for all purposes, not just for relaying-traffic purposes. Perhaps instead we could just give them very very low bandwidths and very very low weights? Is there a reason that wouldn't work?
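To make the low-weight suggestion concrete, here is a minimal sketch of bandwidth-weighted selection. It is not Tor's actual path-selection code (which also applies position-specific consensus weights); the relay names, the weights, and both helper functions are hypothetical and only illustrate how a very low weight makes an entry rarely chosen while keeping it listed.

```python
import random

def selection_share(relays, name):
    """Expected fraction of picks that land on `name`.

    `relays` is a list of (name, weight) pairs; all values are made up.
    """
    total = sum(weight for _, weight in relays)
    return dict(relays)[name] / total

def pick_relay(relays, rng=random):
    """Pick one relay name with probability proportional to its weight."""
    names = [name for name, _ in relays]
    weights = [weight for _, weight in relays]
    return rng.choices(names, weights=weights, k=1)[0]

# Hypothetical consensus: two ordinary relays and one authority that
# has been given a deliberately tiny weight.
relays = [("relayA", 5000), ("relayB", 3000), ("authority", 20)]

share = selection_share(relays, "authority")  # 20 / 8020, about 0.25%
chosen = pick_relay(relays, random.Random(0))
```

The authority stays in the list, so anything that only checks whether it is present still finds it; only its share of relay traffic shrinks.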
It was my idea to not have them in the consensus at all, yeah.
I looked around in master and didn't see anything where we'd fail to work, and ran a test network, which didn't have any problems bootstrapping and being used. The situation looks to be a bit more complex in maint-0.2.1 and 0.2.2. If we decide to try this, we'd need more careful evaluation there.
As for why I'm favoring this approach, I'm mostly worried that we have some corner case where relays without the Fast flag are preferred for traffic, and we end up pushing lots of users onto the dirauths when we lower traffic. Also I would generally like to head in a direction where dirauths aren't required to speak the Tor protocol as much, can't act as clients/HS, etc. For example, maybe bugs like the one plaguing tor26 that stem from the HS client code we left over could be avoided.
All that said, I'd also be happy to only go so far as to remove all the flags from dirauths and see where we stand then.
> It was my idea to not have them in the consensus at all, yeah.
This patch would make bridges fail to publish to Tonga, yes? Since they won't know the onion key so they can't extend their three-hop circuit to it. Similarly, it would break bridge users fetching descriptors from Tonga.
> I looked around in master and didn't see anything where we'd fail to work, and ran a test network, which didn't have any problems bootstrapping and being used. The situation looks to be a bit more complex in maint-0.2.1 and 0.2.2. If we decide to try this, we'd need more careful evaluation there.
> As for why I'm favoring this approach, I'm mostly worried that we have some corner case where relays without the Fast flag are preferred for traffic, and we end up pushing lots of users onto the dirauths when we lower traffic.
There are some relays now without the Fast flag, and they're not getting mobbed. (If I have my way with #4489 (moved), there will be many more soon.)
I think it would be a much safer move to arrange to take away the Fast, Stable, Guard, and HSDir flag from authorities.
> Also I would generally like to head in a direction where dirauths aren't required to speak the Tor protocol as much, can't act as clients/HS, etc. For example, maybe bugs like the one plaguing tor26 that stem from the HS client code we left over could be avoided.
We need them to still speak the Tor protocol enough to do reachability tests. That's most of the Tor protocol right there. Unless we change things so the authorities don't do their own reachability tests I guess.
> All that said, I'd also be happy to only go so far as to remove all the flags from dirauths and see where we stand then.
Don't take away Running or they'll disappear from the consensus. :) And don't take away Valid or they'll hit that bug where relays without the Valid flag lose the Running flag.
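As a rough illustration of the flag-stripping variant discussed above, the sketch below removes Fast, Stable, Guard and HSDir from an authority's flag set while leaving Running and Valid alone. The function name and the set-based representation are hypothetical; this is not how the voting code is actually structured.

```python
# Flags proposed for removal from authorities in this thread.
STRIPPED_FOR_AUTHORITIES = frozenset({"Fast", "Stable", "Guard", "HSDir"})

def flags_to_vote(flags, is_authority):
    """Return the flags to assign to a router (hypothetical helper).

    Ordinary relays keep every flag. Authorities lose only the flags that
    attract relay traffic; Running and Valid are never touched, so the
    authority stays listed in the consensus.
    """
    flags = set(flags)
    if not is_authority:
        return flags
    return flags - STRIPPED_FOR_AUTHORITIES

authority_flags = flags_to_vote(
    {"Authority", "Fast", "Guard", "HSDir", "Running", "Stable", "Valid"},
    is_authority=True,
)
# authority_flags is {"Authority", "Running", "Valid"}
```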
I think this is also a DoS resistance feature. People can currently create loads of circuits through dirauths and sap their CPU or bandwidth resources.
In addition to not being in the consensus, in an ideal world dirauths would also refuse circuit creation attempts (save for one-hop tunneled dirconns and descriptor submissions).
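The refusal rule in the previous paragraph could be sketched as a simple policy check. This assumes the authority can tell what an incoming circuit is for, which a real relay generally cannot know at creation time; the request labels and the function are hypothetical, and the sketch ignores cases such as bridges that reach an authority over multi-hop circuits.

```python
# Hypothetical labels for the only uses an authority would still serve.
ALLOWED_ONE_HOP_USES = frozenset({"tunneled_dir_request", "descriptor_upload"})

def authority_accepts(request_kind, wants_extend):
    """Decide whether a directory authority should serve this circuit.

    Any attempt to extend the circuit to another hop is relay traffic and
    is refused. One-hop circuits are served only for directory use.
    """
    if wants_extend:
        return False
    return request_kind in ALLOWED_ONE_HOP_USES

decisions = {
    "dir fetch": authority_accepts("tunneled_dir_request", wants_extend=False),
    "upload": authority_accepts("descriptor_upload", wants_extend=False),
    "relay hop": authority_accepts("tunneled_dir_request", wants_extend=True),
    "exit stream": authority_accepts("exit_stream", wants_extend=False),
}
```

Only the first two entries in `decisions` are accepted; anything that extends past the authority or is not a directory operation is refused.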