Opened 8 years ago

Closed 2 years ago

Last modified 2 years ago

#3023 closed defect (duplicate)

Tor directory authorities should not act as regular relays/hsdirs

Reported by: Sebastian
Owned by:
Priority: Medium
Milestone: Tor: unspecified
Component: Core Tor/Tor
Version:
Severity: Normal
Keywords: performance, bootstrap, needs-proposal, tor-dos-dirauth, tor-dirauth
Cc: weasel
Actual Points:
Parent ID:
Points: 3
Reviewer:
Sponsor:

Description

In the past, it made sense to use directory authorities for all other network functions too, because they contributed a significant share of the network's available bandwidth. That is no longer the case, and we're seeing more and more bugs that arise because the dirauths also act as relays. We should change that so the dirauths can focus on producing a consensus and providing bootstrapping functionality.

Change History (25)

comment:1 Changed 8 years ago by arma

I could go either way here. That is, I'm not sure if I want to triage this into 'minor priority for 0.2.3.x' or into the 'tor: unspecified' milestone.

I don't think it's urgent to make this change; the particular issue that prompted Sebastian to make this trac entry turned out to be a false positive, and I don't know of any others that are biting us much currently.

We'd want to think harder about the design in any case, since for smaller Tor networks the directory authorities could be a substantial fraction of the network (heck, the Ironkey Tor network has no nodes that aren't authorities).

I guess one reason to make this change is because eventually Mike's bwauth scripts will make moot the MaxAdvertisedBandwidth hack that we use to discourage 'too much' relay traffic on authorities. I say 'eventually' because his scripts aren't robust/accurate enough to detect the hack.

comment:2 Changed 8 years ago by Sebastian

I thought the bw auths special-case the dirauths to not give them high weights?

comment:3 Changed 7 years ago by Sebastian

We're seeing more problems with dirauths that don't have enough capacity, because they're also popular relays. I feel that we should make sure dirauths aren't used as relays at all, so that bootstrapping clients, consensus generation and reachability testing always take precedence.

arma argues that this is bad, because dirauths provide more bandwidth to the network and more bandwidth is always good. I'm not sure I agree with that, though, because bootstrapping and consensus generation should take precedence.

Another argument is that private networks might be harder to bootstrap and use if dirauths can't be relays; but we could just make a flag to change that behaviour for private networks - or declare you need to run relays. I think even the latter wouldn't be a big constraint: setting up a private network is trivial thanks to chutney and even my privnet hacks, and throwing in a few relays doesn't hurt anything.

I'm listing problems that we've seen on dirauths and that I can remember offhand:

up until a year ago or so, dannenberg was extremely flaky because it didn't have enough bandwidth.
maatuska, dizum and gabelmoo were/are configured without a low enough MaxAdvertisedBandwidth, so that they often max out their RelayBandwidthRate. This causes problems for bootstrapping clients, and also during consensus generation.
tor26 is frequently hitting extremely high memory usage, not having to handle relay traffic might help

comment:4 in reply to:  3 Changed 7 years ago by arma

Replying to Sebastian:

tor26 is frequently hitting extremely high memory usage, not having to handle relay traffic might help

I think tor26's problems come from the fact that it's the sole remaining directory authority for a lot of obsolete Tor versions. So when one of those obsolete Tor versions has a bug that involves hammering the directory, it focuses on tor26.

(moria1 is not quite as old, but still sometimes hits the 32000+ socket mark, and most of those sockets are directory hammering attempts.)

comment:5 in reply to:  3 ; Changed 7 years ago by arma

Replying to Sebastian:

We're seeing more problems with dirauths that don't have enough capacity

I guess the days of directory authorities having plenty of excess bandwidth are gone. That's bad news too when it comes to dirauth DDoS concerns. Oops. Should we say that future directory authorities need to have strong (e.g. 100mbit) connections, even if they don't use it all in normal operation?

My main reason for not thinking we should add complexity to directory authorities (by special-casing them further) is that I think it's dangerous to have directory authorities that need it. I guess if it's really the case that our directory authorities are hitting their bandwidth limits, we've already failed at that goal. So be it.

comment:6 Changed 7 years ago by Sebastian

Well, you can often hook up a tor server to a one gigabit connection and mostly fill it with just relay traffic. I think most dirauths are configured for rates much lower than their link speed: gabelmoo is configured for 500KB (before I took it over it was at 250KB) on a 10Mbps link, with a bw auth also on the same link.

comment:7 in reply to:  5 Changed 7 years ago by arma

Replying to arma:

My main reason for not thinking we should add complexity to directory authorities (by special-casing them further) is that I think it's dangerous to have directory authorities that need it. I guess if it's really the case that our directory authorities are hitting their bandwidth limits, we've already failed at that goal. So be it.

I'm now fine with special-casing directory authorities so they don't carry traffic by default.

Clients already avoid them for directory requests when there's a non-dir-auth available. Should we have clients avoid authorities for circuits too when there are 'enough' other relays available?

Or said another way, how were you expecting to implement the idea?

comment:8 Changed 7 years ago by Sebastian

My idea was basically this:

index 58ceeda..3d7dd7e 100644
--- a/src/or/dirserv.c
+++ b/src/or/dirserv.c
@@ -2699,7 +2699,8 @@ dirserv_generate_networkstatus_vote_obj(crypto_pk_env_t *p
       vote_routerstatus_t *vrs;
       microdesc_t *md;
       node_t *node = node_get_mutable_by_id(ri->cache_info.identity_digest);
-      if (!node)
+      if (!node ||
+          router_digest_is_trusted_dir(ri->cache_info.identity_digest))
         continue;
 
       vrs = tor_malloc_zero(sizeof(vote_routerstatus_t));

comment:9 Changed 7 years ago by weasel

Cc: weasel added

comment:10 Changed 7 years ago by nickm

Hm. That code would make us never list authorities in the consensus. I'm worried that this would make clients and caches decide that all the authorities were down for all purposes, not just for relaying-traffic purposes. Perhaps instead we could just give them very very low bandwidths and very very low weights? Is there a reason that wouldn't work?

comment:11 Changed 7 years ago by Sebastian

It was my idea to not have them in the consensus at all, yeah.

I looked around in master and didn't see anything where we'd fail to work, and ran a test network, which didn't have any problems bootstrapping and being used. The situation looks to be a bit more complex in maint-0.2.1 and 0.2.2. If we decide to try this, we'd need more careful evaluation there.

As for why I'm favoring this approach, I'm mostly worried that we have some corner case where relays without the Fast flag are preferred for traffic, and we end up pushing lots of users onto the dirauths when we lower traffic. Also I would generally like to head in a direction where dirauths aren't required to speak the Tor protocol as much, can't act as clients/HS, etc. For example, maybe bugs like the one plaguing tor26 that stem from the HS client code we left over could be avoided.

All that said, I'd also be happy to only go so far as to remove all the flags from dirauths and see where we stand then.

comment:12 in reply to:  11 Changed 7 years ago by arma

Replying to Sebastian:

It was my idea to not have them in the consensus at all, yeah.

This patch would make bridges fail to publish to Tonga, yes? Since they won't know the onion key so they can't extend their three-hop circuit to it. Similarly, it would break bridge users fetching descriptors from Tonga.

I looked around in master and didn't see anything where we'd fail to work, and ran a test network, which didn't have any problems bootstrapping and being used. The situation looks to be a bit more complex in maint-0.2.1 and 0.2.2. If we decide to try this, we'd need more careful evaluation there.

As for why I'm favoring this approach, I'm mostly worried that we have some corner case where relays without the Fast flag are preferred for traffic, and we end up pushing lots of users onto the dirauths when we lower traffic.

There are some relays now without the Fast flag, and they're not getting mobbed. (If I have my way with #4489, there will be many more soon.)

I think it would be a much safer move to arrange to take away the Fast, Stable, Guard, and HSDir flag from authorities.

Also I would generally like to head in a direction where dirauths aren't required to speak the Tor protocol as much, can't act as clients/HS, etc. For example, maybe bugs like the one plaguing tor26 that stem from the HS client code we left over could be avoided.

We need them to still speak the Tor protocol enough to do reachability tests. That's most of the Tor protocol right there. Unless we change things so the authorities don't do their own reachability tests I guess.

All that said, I'd also be happy to only go so far as to remove all the flags from dirauths and see where we stand then.

Don't take away Running or they'll disappear from the consensus. :) And don't take away Valid or they'll hit that bug where relays without the Valid flag lose the Running flag.
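A minimal sketch of the flag-stripping idea discussed above, assuming a simplified bitmask representation of router flags. The type `sketch_status_t`, the `SK_FLAG_*` constants, and `strip_authority_flags()` are hypothetical names made up for illustration; they are not Tor's actual data structures or functions. Fast, Stable, Guard and HSDir are cleared for authorities, while Running and Valid are left alone:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bits; Tor's real vote code does not use this layout. */
enum {
  SK_FLAG_RUNNING = 1u << 0,
  SK_FLAG_VALID   = 1u << 1,
  SK_FLAG_FAST    = 1u << 2,
  SK_FLAG_STABLE  = 1u << 3,
  SK_FLAG_GUARD   = 1u << 4,
  SK_FLAG_HSDIR   = 1u << 5
};

/* Flags that would be withheld from authorities under this proposal. */
#define SK_AUTHORITY_STRIPPED_FLAGS \
  (SK_FLAG_FAST | SK_FLAG_STABLE | SK_FLAG_GUARD | SK_FLAG_HSDIR)

/* Hypothetical, heavily simplified stand-in for a router status entry. */
typedef struct {
  int is_authority;   /* nonzero if this entry describes a dirauth */
  uint32_t flags;     /* bitwise OR of SK_FLAG_* values */
} sketch_status_t;

/* Clear the traffic-attracting flags on authority entries only.
 * Running and Valid are deliberately never touched. */
static void
strip_authority_flags(sketch_status_t *rs)
{
  assert(rs != NULL);
  if (rs->is_authority)
    rs->flags &= ~(uint32_t)SK_AUTHORITY_STRIPPED_FLAGS;
}
```

Under these assumptions, an authority entry that starts with all six flags ends up with only Running and Valid, and a non-authority entry is unchanged.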

comment:13 Changed 7 years ago by arma

Keywords: performance bootstrap added

comment:14 Changed 7 years ago by nickm

Milestone: Tor: 0.2.3.x-final → Tor: unspecified

comment:15 Changed 7 years ago by mikeperry

Keywords: dos-resistance added
Parent ID: #2664

I think this is also a DoS resistance feature. People can currently create loads of circuits through dirauths and sap their CPU or bandwidth resources.

In addition to not being in the consensus, in an ideal world dirauths would also refuse circuit creation attempts (save for one-hop tunneled dirconns and descriptor submissions).
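One possible reading of that policy, as a rough sketch only. It assumes that one-hop tunneled directory connections and descriptor submissions can be recognised as a directory stream on a circuit that is never extended, so an authority would still accept the first hop but refuse extensions and ordinary streams. The enum `sk_request_t` and the function `authority_allows_request()` are hypothetical and do not correspond to anything in the Tor source:

```c
#include <stdbool.h>

/* Hypothetical request kinds, loosely modelled on what a relay sees. */
typedef enum {
  SK_REQ_CREATE,     /* a peer opens a circuit whose first hop is this node */
  SK_REQ_EXTEND,     /* the circuit asks this node to extend to another relay */
  SK_REQ_BEGIN,      /* an ordinary exit stream */
  SK_REQ_BEGIN_DIR   /* a tunneled stream to this node's own directory service */
} sk_request_t;

/* Decide whether a node should serve a request under the sketched policy.
 * Non-authorities are unaffected here; any other policy they apply
 * (exit policy, rate limits) is not modelled. */
static bool
authority_allows_request(bool is_authority, sk_request_t req)
{
  if (!is_authority)
    return true;

  switch (req) {
    case SK_REQ_CREATE:     /* needed for one-hop directory circuits */
    case SK_REQ_BEGIN_DIR:  /* directory fetches and descriptor uploads */
      return true;
    case SK_REQ_EXTEND:     /* would make the authority a middle hop */
    case SK_REQ_BEGIN:      /* would make the authority an exit */
      return false;
  }
  return false;  /* unknown request kinds are refused */
}
```

Refusing at the extend and stream level rather than at circuit creation is a design choice of this sketch, made because the purpose of a circuit is generally not visible when it is created.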

comment:16 Changed 7 years ago by mikeperry

Keywords: dirauth-dos-resistance added; dos-resistance removed

comment:17 Changed 7 years ago by nickm

Keywords: needs-proposal added

comment:18 Changed 7 years ago by nickm

Keywords: tor-auth added

comment:19 Changed 7 years ago by nickm

Component: Tor Directory Authority → Tor

comment:20 Changed 3 years ago by mikeperry

Keywords: tor-dos-dirauth added; dirauth-dos-resistance removed

Canonicalize dirauth-dos to tor-dos-dirauth

comment:21 Changed 3 years ago by nickm

Points: 3
Severity: Normal

comment:22 Changed 2 years ago by dgoulet

Keywords: tor-dirauth added; tor-auth removed

Turns out that tor-auth is for directory authority so make it clearer with tor-dirauth

comment:23 Changed 2 years ago by nickm

Parent ID: #2664
Resolution: duplicate
Status: new → closed

#18364 captures our current thinking on this, I think

comment:24 Changed 2 years ago by arma

#18364 is "Tor Browser in Gnu+Linux doesn't support Dingbats properly".

That may indeed be a great summary of our current thinking on this topic.

But if so, perhaps we should keep thinking. :)

comment:25 Changed 2 years ago by nickm

Oops. Try #18346, "Separate the various roles that directory authorities play, from a configuration POV".

The only dingbat at issue here is me. :)
