The idea here is that even if we hit the global write limit (bw), we should not return a 503 code to another directory authority but rather answer it.
Dirauths must be able to talk to each other at all times, regardless of the bandwidth state.
Setting the 043 milestone because this should be considered a bug, and it could even be considered for backport since dirauths are suffering from this at the moment.
"is the one of a configured directory" -> "is a configured directory"
and a bigger issue:
"so it might get a 503 code and thus fail the upload of its brand new descriptor" -- I don't think you can get a 503 in response to a post attempt. That is, we only check global_write_bucket_low() in five cases:
handle_get_current_consensus(), in response to a vanilla or microdesc consensus request
handle_get_status_vote(), for when somebody is asking for our current or most recent vote [that one's fun because only dir auths serve votes, and previously dir auths would never decide to reply with a 503]
handle_get_microdesc(), when somebody is asking for individual microdescs
handle_get_descriptor(), same as above but for vanilla descriptors
handle_get_keys(), when somebody is asking for authority certificates
So the "To clarify further the situation:" paragraph in the commit comment needs to change. I think the problematic scenario is that relays try to fetch new consensus and descriptor documents from authorities, because directory_fetches_from_authorities(), but the authorities give them a 503 and then they don't have a proper cached version to give out when clients come asking, and then clients don't get their network view and it all falls apart.
That's why this patch here should be ok for one or two authorities to run, but not more, until we also do the "whitelist relays" piece of it.
Oh! I see that it does include this clause. So we should rename this ticket. I tried a new name here.
Trac: Summary: dir-auth: Never send a 503 directory request code to another directory authority to dir-auth: Dir auths should resume sending 503's but never to relays or other dir auths
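The policy in the new summary can be sketched as a small decision function. This is only an illustration of the intended behavior, not the patch itself: `requester_kind_t` and `should_send_503()` are hypothetical names, and `write_bucket_low` stands in for whatever global_write_bucket_low() would return.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical classification of who is asking; not a Tor type. */
typedef enum {
  REQ_FROM_CLIENT,
  REQ_FROM_RELAY,
  REQ_FROM_DIRAUTH
} requester_kind_t;

/* Return true if the request should get "503 Dir busy".
 * write_bucket_low stands in for the result of global_write_bucket_low(). */
bool
should_send_503(requester_kind_t who, bool write_bucket_low)
{
  assert(who == REQ_FROM_CLIENT || who == REQ_FROM_RELAY ||
         who == REQ_FROM_DIRAUTH);
  if (who == REQ_FROM_DIRAUTH || who == REQ_FROM_RELAY)
    return false;          /* Never refuse dir auths or relays. */
  return write_bucket_low; /* Everyone else is refused only under load. */
}
```

With this shape, the "resume sending 503's" half and the "never to relays or other dir auths" half of the summary live in one place.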
Another optimization we could do here: the whole priority thing is no longer used, since all five callers send in "priority 2". So we could either rip it out, or we could re-purpose it to use new kinds of priorities, like microdesc vs vanilla.
Slight preference for keeping it in, and then using it for item (6) in #33018 (moved) ("handle vanilla vs microdesc flavors differently").
Please check for authority IPv6 addresses. I'm just about to make relays use IPv6 to authorities as part of sponsor 55, so we need an IPv6 check when we whitelist relays.
There are a few subtle address issues in this patch, which we should document:
configured vs consensus addresses
inbound vs outbound addresses
See the PR for details.
I have been running this patch on moria1 lately, with an additional patch where I send a 503 response to vanilla-flavored consensus fetches, or old-style descriptor fetches, if they're not from a dir auth or relay address, even if I otherwise have enough bandwidth to answer them.
With both patches in place, moria's outbound traffic has gone from 200-500mbit/s down to 10-40mbit/s.
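For readers trying to picture the combined behavior of the two patches, here is a rough sketch. It is a guess at the shape of the logic from the description above, not the code running on moria1: `doc_kind_t` and `experimental_should_send_503()` are hypothetical names, and `write_bucket_low` again stands in for global_write_bucket_low().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical document kinds, for illustration only. */
typedef enum {
  DOC_CONSENSUS_VANILLA,
  DOC_CONSENSUS_MICRODESC,
  DOC_DESCRIPTOR_OLD_STYLE,
  DOC_MICRODESC
} doc_kind_t;

/* Return true if the request should get "503 Dir busy". */
bool
experimental_should_send_503(doc_kind_t doc, bool from_relay_or_dirauth,
                             bool write_bucket_low)
{
  assert(doc >= DOC_CONSENSUS_VANILLA && doc <= DOC_MICRODESC);
  if (from_relay_or_dirauth)
    return false; /* Whitelisted: always answer. */
  if (doc == DOC_CONSENSUS_VANILLA || doc == DOC_DESCRIPTOR_OLD_STYLE)
    return true;  /* Refused even when there is bandwidth to spare. */
  return write_bucket_low; /* Other documents: refuse only under load. */
}
```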
Here are some stats from a one hour period (1400 to 1500 EST):
Dir auth requests
I whitelisted 13 dirport connections from dir auths during this time:
Jan 24 14:18:32.445 [notice] Prioritizing dir auth response
Jan 24 14:31:28.374 [notice] Prioritizing dir auth response
Jan 24 14:50:03.921 [notice] Prioritizing dir auth response
Jan 24 14:50:04.452 [notice] Prioritizing dir auth response
Jan 24 14:50:04.836 [notice] Prioritizing dir auth response
Jan 24 14:50:05.016 [notice] Prioritizing dir auth response
Jan 24 14:50:05.261 [notice] Prioritizing dir auth response
Jan 24 14:50:05.458 [notice] Prioritizing dir auth response
Jan 24 14:50:05.575 [notice] Prioritizing dir auth response
Jan 24 14:50:05.697 [notice] Prioritizing dir auth response
Jan 24 14:50:05.808 [notice] Prioritizing dir auth response
Jan 24 14:50:07.510 [notice] Prioritizing dir auth response
Jan 24 14:50:09.915 [notice] Prioritizing dir auth response
Looking through the logs, these are all for /extra/d or server/d, i.e. old-style descriptors. Most of them occur a little bit after :50, which makes sense because that would be when those other dir auths discovered new descriptors from the vote I just sent them.
Relay requests
I whitelisted 847 dirport requests from relay addresses during this period. Of these, 763 of them happened in the first half of the period (:00 through :30), which makes sense because fetch-extra-early tries to get a cached copy of the consensus in place before clients start asking for it. Accounting for a bit of time skew, 830 of the 847 happened between :00 and :33.
Spot-checking these fetches, every single one of them that I looked at was a fetch for /micro/d, i.e. fetching a new microdescriptor. That's weird! I would have thought many of them would be fetching a microdesc-flavored consensus. I wonder if I am failing to log those, or if the process of answering them bypasses global_write_bucket_low().
Non-relay requests
I sent back 110818 "503" responses during this hour (i.e. averaging over thirty "503" responses per second).
Of those, 105452 were for "network status lists":
Jan 24 14:00:00.038 [info] handle_get_current_consensus(): Client asked for network status lists, but we've been writing too many bytes lately. Sending 503 Dir busy.
which I think is almost entirely requests to
And the remaining 5366 were for "server descriptors":
Jan 24 14:00:02.387 [info] handle_get_descriptor(): Client asked for server descriptors, but we've been writing too many bytes lately. Sending 503 Dir busy.
Please check for authority IPv6 addresses. I'm just about to make relays use IPv6 to authorities as part of sponsor 55, so we need an IPv6 check when we whitelist relays.
Thanks, teor, this is a great point.
I actually think I have a better plan that will accomplish your goal and the rest of these goals better: let's make sure the dir auth addresses (all of them) are added to the bloom filter that nodelist_probably_contains_address() checks, and then just only check that and we're done. That is, the logic should be: "if we're a dir auth, always answer every question from other relays."
It looks like the ipv6 address for relays gets added properly in node_add_to_address_set():
  if (!tor_addr_is_null(&node->rs->ipv6_addr))
    address_set_add(the_nodelist->node_addrs, &node->rs->ipv6_addr);
  [...]
  if (!tor_addr_is_null(&node->ri->ipv6_addr))
    address_set_add(the_nodelist->node_addrs, &node->ri->ipv6_addr);
  [...]
  if (!tor_addr_is_null(&node->md->ipv6_addr))
    address_set_add(the_nodelist->node_addrs, &node->md->ipv6_addr);
So the change we would want to make is in or near nodelist_set_consensus(), where right after we call
  /* Now add all the nodes we have to the address set. */
  SMARTLIST_FOREACH_BEGIN(the_nodelist->nodes, node_t *, node) {
    node_add_to_address_set(node);
  } SMARTLIST_FOREACH_END(node);
we call some similar thing that loops through trusted_dir_servers and calls address_set_add() and/or address_set_add_ipv4h() on each known dir auth address. That way we get the dir auth addresses in the consensus because they are relays, and we get the configured addresses (if different) with this new code.
And then the minor worry in dgoulet's code about "if this shows up in the profile, we can move to have an address set instead" gets resolved too because it is just one thing we are checking.
And as a tiny bonus, we handle dir auth addresses as though they are relays from the perspective of the DoS module, which is probably a thing we should have done from the beginning there anyway.
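The loop over trusted_dir_servers proposed above could look roughly like the following. This is a self-contained toy, not Tor code: every `toy_*` type and function is a hypothetical stand-in (an exact array-backed set in place of the real address set, a `family` of 0 in place of tor_addr_is_null()), chosen only so the example compiles on its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_MAX_ADDRS 64

/* Toy address: family 0 means "unset" (the null-address case). */
typedef struct {
  int family;          /* 0 = unset, 4 = IPv4, 6 = IPv6 */
  uint8_t bytes[16];
} toy_addr_t;

/* Toy exact set standing in for the nodelist address set. */
typedef struct {
  toy_addr_t addrs[TOY_MAX_ADDRS];
  size_t n;
} toy_addr_set_t;

/* Toy configured dir auth with an IPv4 and an optional IPv6 address. */
typedef struct {
  toy_addr_t ipv4;
  toy_addr_t ipv6;
} toy_dir_server_t;

static void
toy_set_add(toy_addr_set_t *set, const toy_addr_t *a)
{
  assert(set != NULL && a != NULL);
  if (a->family == 0 || set->n >= TOY_MAX_ADDRS)
    return; /* Skip unset addresses; silently drop when full. */
  set->addrs[set->n++] = *a;
}

bool
toy_set_contains(const toy_addr_set_t *set, const toy_addr_t *a)
{
  for (size_t i = 0; i < set->n; i++) {
    if (set->addrs[i].family == a->family &&
        memcmp(set->addrs[i].bytes, a->bytes, sizeof(a->bytes)) == 0)
      return true;
  }
  return false;
}

/* Add every configured dir auth address (IPv4 and IPv6) to the set. */
void
toy_add_dirauths(toy_addr_set_t *set, const toy_dir_server_t *auths,
                 size_t n_auths)
{
  for (size_t i = 0; i < n_auths; i++) {
    toy_set_add(set, &auths[i].ipv4);
    toy_set_add(set, &auths[i].ipv6);
  }
}
```

Running this right after the per-node loop would mean a single set lookup covers both the consensus addresses and the configured ones.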
The reason I did not go with the dirauth + bloomfilter idea is because when we ask "is this address a dirauth?", we should NOT get a probabilistic answer which is what the address_set does unfortunately.
The patch does consider IPv6 but, as teor mentions, only for the configured trusted authorities, not for addresses from the consensus. For dirauths, that is fine for now since they all configure the other dirauths in their torrc.
Also, this can backfire if a dirauth outbound address is different from its inbound DirPort address.
The reason I did not go with the dirauth + bloomfilter idea is because when we ask "is this address a dirauth?", we should NOT get a probabilistic answer which is what the address_set does unfortunately.
Are you sure here? We never get false negatives; only false positives. When we're asking "is address a dirauth" it's not so bad if we accidentally allow a tiny fraction of non-dirauths; only if we accidentally block a dirauth.
Ah true!!! I always forget it is the other way around. Ok then, we should simply use the address set, which it seems also gives us IPv6 for free.
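To make the "false positives only" property concrete, here is a tiny Bloom filter sketch. It is not Tor's address set implementation: the `toy_*` names, the 1024-bit size, the two hash probes, and the FNV-1a-style hash are all assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_FILTER_BITS 1024u

typedef struct {
  uint8_t bits[TOY_FILTER_BITS / 8];
} toy_bloom_t;

/* FNV-1a-style hash, seeded so that two different probes can be derived. */
static uint32_t
toy_hash(const uint8_t *key, size_t len, uint32_t seed)
{
  uint32_t h = 2166136261u ^ seed;
  for (size_t i = 0; i < len; i++) {
    h ^= key[i];
    h *= 16777619u;
  }
  return h % TOY_FILTER_BITS;
}

/* Adding a key sets its two probe bits; bits are never cleared. */
void
toy_bloom_add(toy_bloom_t *f, const uint8_t *key, size_t len)
{
  assert(f != NULL && key != NULL);
  for (uint32_t seed = 0; seed < 2; seed++) {
    uint32_t bit = toy_hash(key, len, seed);
    f->bits[bit / 8] |= (uint8_t)(1u << (bit % 8));
  }
}

/* True if both probe bits are set. An added key always passes; a key that
 * was never added can pass too if other keys happened to set its bits. */
bool
toy_bloom_probably_contains(const toy_bloom_t *f, const uint8_t *key,
                            size_t len)
{
  for (uint32_t seed = 0; seed < 2; seed++) {
    uint32_t bit = toy_hash(key, len, seed);
    if (!(f->bits[bit / 8] & (1u << (bit % 8))))
      return false;
  }
  return true;
}
```

Because bits are only ever set, a dir auth address that was added can never be reported as absent; the worst case is that some unrelated address is occasionally treated as whitelisted.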
Now the approach is simplified. HOWEVER, because this branch only uses the nodelist address set, an authority will fail to recognize its fellow authorities for as long as it doesn't have a consensus. I think that is fine, unless anyone can think of a reason why it is not. One case I can imagine: an authority that has just started is already getting bombarded but doesn't have a consensus yet.
If this is indeed an issue, we'll have to fall back to testing the trusted dir list directly.
In commit 823006f5, in the commit message, s/seperate/separate/ and s/addresse/address/
If somebody shows me how to submit this as a comment on the commit in github, I will do that. :)
[Edit: ok, there is a way to submit a comment on the whole PR. I did that. If you know a way to submit a comment on a commit, I am still hoping to learn that. :)]