This is an idea how to improve the current situation, where sometimes a directory authority is slow to get its vote out to the other dirauths, and so the dirauths don't all have the same sets of votes. To simplify, I'm illustrating with an example of three dirauths:
at :50, all dirauths make their vote and start uploading. auth1 and auth2 get their vote to all auths, but auth3 doesn't. it cannot publish a vote to auth1 at all, and it takes more than 2.5 minutes to publish its vote to auth2. at :52:30, all auths try fetching the votes they're missing from the other auths, so auth1 asks auth2 for auth3's vote, and auth2 asks auth1 for auth3's vote. auth3 asks nobody, and nobody asks auth3. At this point, neither auth1 nor auth2 have auth3's vote. auth3 now (at, for example, :53:30) succeeds publishing to auth2, so auth1 now has a vote from auth1 and auth2, auth2 and auth3 have a vote from auth1, auth2, and auth3. At :55 the auths try to make a consensus, but auth1 will end up with a different consensus than auth2 and auth3.
My idea to make this less of a problem would be that only for two minutes will we accept a vote that gets pushed to us, and anything we get later than that is considered "too late" and will be dropped. At :52:30 minutes, we still go ahead and try and fetch all votes from all the other authorities, and if they have a vote we will accept it. We repeat that fetching of all votes that we don't have at 53:00, 53:30, 54:00 and 54:30. That way, a delayed publication of the original vote will not cause this kind of split, where the dirauths have different opinions on who has voted, so only the dirauth that took more than 2 minutes to publish its descriptor to any of the other dirauths will be affected. There's still a race condition here, which is when a dirauth (within two minutes) only publishes to one other dirauth, and then that dirauth gets so slow it cannot get the vote to any of the other votes. But since it was fast enough to get the vote the first time, hopefully that's rather rare.
Does this all sound viable? Am I overlooking something?
Update: This bug was introduced in Tor 0.2.0.5-alpha, with the v3 authority voting code.
To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information
Child items 0
Show closed items
No child items are currently assigned. Use child items to break down this issue into smaller parts.
Linked items 0
Link issues together to show that they're related.
Learn more.
I have been running my ticket4631-demo branch on moria1 lately, and it is catching and handling real buggy behavior by other directory authorities:
Jan 28 15:59:50.804 [warn] Rejecting vote from 199.58.81.140 received at 2020-01-28 20:59:50; our cutoff for received votes is 2020-01-28 20:52:30Jan 28 15:59:50.805 [warn] Rejected vote from 199.58.81.140 ("Vote received too late, would be dangerous to count it").Jan 29 01:52:52.667 [warn] Rejecting vote from 204.13.164.118 received at 2020-01-29 06:52:52; our cutoff for received votes is 2020-01-29 06:52:30Jan 29 01:52:52.669 [warn] Rejected vote from 204.13.164.118 ("Vote received too late, would be dangerous to count it").Jan 29 04:53:26.323 [warn] Rejecting vote from 204.13.164.118 received at 2020-01-29 09:53:26; our cutoff for received votes is 2020-01-29 09:52:30Jan 29 04:53:26.326 [warn] Rejected vote from 204.13.164.118 ("Vote received too late, would be dangerous to count it").
It would be swell for somebody to turn it into a branch that the network team is willing to merge. I don't think it would fully resolve this ticket but it is a nice step.
And in the mean time, other dir auths, if they're ok with manual patching, might enjoy running it.
I don't think we should backport this change, it's medium-risk, and it's also very hard to test. We would need to get authorities under severe load, or with bad clock skew.
That said, it resolves at least one common issue in chutney, so it is a definite improvement.
I'll open a child ticket for the spec patch.
Trac: Actualpoints: N/Ato 0.2 Description: This is an idea how to improve the current situation, where sometimes a directory authority is slow to get its vote out to the other dirauths, and so the dirauths don't all have the same sets of votes. To simplify, I'm illustrating with an example of three dirauths:
at :50, all dirauths make their vote and start uploading. auth1 and auth2 get their vote to all auths, but auth3 doesn't. it cannot publish a vote to auth1 at all, and it takes more than 2.5 minutes to publish its vote to auth2. at :52:30, all auths try fetching the votes they're missing from the other auths, so auth1 asks auth2 for auth3's vote, and auth2 asks auth1 for auth3's vote. auth3 asks nobody, and nobody asks auth3. At this point, neither auth1 nor auth2 have auth3's vote. auth3 now (at, for example, :53:30) succeeds publishing to auth2, so auth1 now has a vote from auth1 and auth2, auth2 and auth3 have a vote from auth1, auth2, and auth3. At :55 the auths try to make a consensus, but auth1 will end up with a different consensus than auth2 and auth3.
My idea to make this less of a problem would be that only for two minutes will we accept a vote that gets pushed to us, and anything we get later than that is considered "too late" and will be dropped. At :52:30 minutes, we still go ahead and try and fetch all votes from all the other authorities, and if they have a vote we will accept it. We repeat that fetching of all votes that we don't have at 53:00, 53:30, 54:00 and 54:30. That way, a delayed publication of the original vote will not cause this kind of split, where the dirauths have different opinions on who has voted, so only the dirauth that took more than 2 minutes to publish its descriptor to any of the other dirauths will be affected. There's still a race condition here, which is when a dirauth (within two minutes) only publishes to one other dirauth, and then that dirauth gets so slow it cannot get the vote to any of the other votes. But since it was fast enough to get the vote the first time, hopefully that's rather rare.
Does this all sound viable? Am I overlooking something?
to
This is an idea how to improve the current situation, where sometimes a directory authority is slow to get its vote out to the other dirauths, and so the dirauths don't all have the same sets of votes. To simplify, I'm illustrating with an example of three dirauths:
at :50, all dirauths make their vote and start uploading. auth1 and auth2 get their vote to all auths, but auth3 doesn't. it cannot publish a vote to auth1 at all, and it takes more than 2.5 minutes to publish its vote to auth2. at :52:30, all auths try fetching the votes they're missing from the other auths, so auth1 asks auth2 for auth3's vote, and auth2 asks auth1 for auth3's vote. auth3 asks nobody, and nobody asks auth3. At this point, neither auth1 nor auth2 have auth3's vote. auth3 now (at, for example, :53:30) succeeds publishing to auth2, so auth1 now has a vote from auth1 and auth2, auth2 and auth3 have a vote from auth1, auth2, and auth3. At :55 the auths try to make a consensus, but auth1 will end up with a different consensus than auth2 and auth3.
My idea to make this less of a problem would be that only for two minutes will we accept a vote that gets pushed to us, and anything we get later than that is considered "too late" and will be dropped. At :52:30 minutes, we still go ahead and try and fetch all votes from all the other authorities, and if they have a vote we will accept it. We repeat that fetching of all votes that we don't have at 53:00, 53:30, 54:00 and 54:30. That way, a delayed publication of the original vote will not cause this kind of split, where the dirauths have different opinions on who has voted, so only the dirauth that took more than 2 minutes to publish its descriptor to any of the other dirauths will be affected. There's still a race condition here, which is when a dirauth (within two minutes) only publishes to one other dirauth, and then that dirauth gets so slow it cannot get the vote to any of the other votes. But since it was fast enough to get the vote the first time, hopefully that's rather rare.
Does this all sound viable? Am I overlooking something?
Update: This bug was introduced in Tor 0.2.0.5-alpha, with the v3 authority voting code. Type: enhancement to defect Status: assigned to needs_review
Reviewing the idea only, before looking at the patch:
So the idea is that we cut off the deadline for POSTed votes early, to increase the odds that fetching votes from else will give everybody the same set of votes? That's plausible, but I'm wondering whether we're replacing one race condition with another. I'm also interested in:
what happens during the transition period where some authorities have upgraded but others have not.
whether the fetch time is the right point for the POST cutoff, or whether we'd be better off having it be a second or two before.
We should also think about our diagnostics for unsynchronized consensuses. Are they good enough that we'll be able to tell whether this is helping or hurting?
I've added a small comment on the patch. This looks reasonably solid.
Teor, how confident are you that this is actually helping out for the chutney case? Since you've been running into sync issues on chutney, I guess you've probably been testing this patch in that case?
Reviewing the idea only, before looking at the patch:
So the idea is that we cut off the deadline for POSTed votes early, to increase the odds that fetching votes from else will give everybody the same set of votes? That's plausible, but I'm wondering whether we're replacing one race condition with another.
I think we are replacing one race condition with another, but in practice, the replacement is less risky. Here's why:
in the first phase, each authority uploads its own vote to each other authority (1->N)
in the second phase, each authority downloads all votes from each other authority (N<-N)
So the second phase ensures that all well-connected authorities have the same set of votes. But it requires "no partial, late uploads", because each partial, late upload splits the set of authorities.
This patch enforces the "no late uploads" rule, which implies "no partial, late uploads".
I'm also interested in:
what happens during the transition period where some authorities have upgraded but others have not.
Some authorities reject late uploads, others do not.
If the late upload is a full, late upload, then the set of authorities splits between:
authorities that reject the late upload, and
authorities that accept the late upload.
The mitigation is to ensure that all authorities' clocks are synchronised within ~5 minutes.
If the late upload is a partial, late upload, then the set of authorities is already split. The authorities that reject the late upload are protected from the split. (Strictly, they reject the late upload, and end up in the set of authorities that didn't get the vote).
In my chutney tests, this patch replaces "mistmatched digest" (consensus failure) with "Vote received too late" (late vote rejection, consensus success).
I've also done some testing in mixed 0.3.5 and master networks (equal numbers of authorities). I haven't seen a consensus failure in a mixed network after applying this patch, they were common before.
whether the fetch time is the right point for the POST cutoff, or whether we'd be better off having it be a second or two before.
The cutoff time is the time that the entire vote has been received. So the current code works as-is. But ideally, we would want multiple authorities to receive each vote before the cutoff.
For chutney (and other fast networks), the current cutoff should not be changed: the intervals are only a few seconds long.
For the public network (and other slow networks), votes are about 3.5 MB, and let's say authorities have about 10 Mbps of bandwidth spare for votes. So that's:
3 seconds per vote upload * 8 authorities = 24 seconds
So let's set the cutoff to: fetch_missing_votes - vote_delay/6 ?
Note that fetch_missing_votes is already start - dist_delay - (vote_delay/2), so we can't go any earlier than fetch_missing_votes - vote_delay/4.
Also, fast networks need about 4 seconds to vote reliably (upload, process, download, process), so the extra cutoff should be at most vote_delay/5.
Interestingly, the "successful vote upload" bandwidth cutoff for public authorities is about:
3.5 MB * 8 vote uploads / 300 seconds * 8 bits = 0.75 Mbit/s
We should also think about our diagnostics for unsynchronized consensuses. Are they good enough that we'll be able to tell whether this is helping or hurting?
Yes, I've seen the diagnostics change in chutney from "mistmatched digest" to "Vote received too late".
If rejecting late votes splits the consensus, we'll see both "Vote received too late" and "mistmatched digest" for the same consensus period.
Teor, how confident are you that this is actually helping out for the chutney case? Since you've been running into sync issues on chutney, I guess you've probably been testing this patch in that case?
It is surprisingly effective. I haven't seen a consensus failure in ~100 chutney runs, since applying this patch.
I've added a small comment on the patch. This looks reasonably solid.
I've fixed some of the logging in the patch.
Our remaining work is to choose the exact threshold.
Add all the tickets from sponsor 55 that are done and being worked on to the keyword #network-team-roadmap-2020Q1 so I can look at them in the wiki page...