Opened 3 years ago
Last modified 2 years ago
#19570 reopened enhancement
Shared random round gets out of sync
Reported by: | teor | Owned by: | |
---|---|---|---|
Priority: | High | Milestone: | Tor: unspecified |
Component: | Core Tor/Tor | Version: | |
Severity: | Normal | Keywords: | tor-sr, tor-dirauth |
Cc: | dgoulet, asn | Actual Points: | |
Parent ID: | Points: | 0.5 | |
Reviewer: | Sponsor: |
Description (last modified by )
I have two test directory authorities which disagree on which round it is. I think it's because one is voting every half hour (due to consensus failure), and the other is voting every hour. It could also be due to their start times.
I started this test directory authority at 00:54:30 UTC
(Log times UTC+10)
Jul 05 12:00:01.000 [info] sr_state_update(): SR: State prepared for upcoming voting period (2016-07-05 03:00:00). Upcoming phase is commit (counters: 3 commit & 0 reveal rounds).
I started another test directory authority at 01:47:52 UTC
(Log times are UTC)
Jul 05 02:00:01.000 [info] sr_state_update(): SR: State prepared for upcoming voting period (2016-07-05 03:00:00). Upcoming phase is commit (counters: 2 commit & 0 reveal rounds).
Is this just a logging / counting issue, or a serious bug that could affect the consensus?
Child Tickets
Attachments (2)
Change History (20)
comment:1 Changed 3 years ago by
Description: | modified (diff) |
---|
comment:2 Changed 3 years ago by
Keywords: | tor-hs added |
---|
comment:3 Changed 3 years ago by
Keywords: | tor-sr added; tor-hs shared-random removed |
---|---|
Milestone: | → Tor: 0.2.9.x-final |
Priority: | Medium → High |
Sponsor: | → SponsorR-must |
comment:4 Changed 3 years ago by
Status: | new → needs_information |
---|
comment:5 Changed 3 years ago by
Replying to teor:
I have two test directory authorities which disagree on which round it is. I think it's because one is voting every half hour (due to consensus failure), and the other is voting every hour. It could also be due to their start times.
I started this test directory authority at 00:54:30 UTC
(Log times UTC+10)
Jul 05 12:00:01.000 [info] sr_state_update(): SR: State prepared for upcoming voting period (2016-07-05 03:00:00). Upcoming phase is commit (counters: 3 commit & 0 reveal rounds).
I started another test directory authority at 01:47:52 UTC
(Log times are UTC)
Jul 05 02:00:01.000 [info] sr_state_update(): SR: State prepared for upcoming voting period (2016-07-05 03:00:00). Upcoming phase is commit (counters: 2 commit & 0 reveal rounds).
Is this just a logging / counting issue, or a serious bug that could affect the consensus?
This might indeed be a logging/counting issue and not a serious bug. The round counters that caused the confusion are just integer counters that get incremented everytime Tor calls sr_state_update()
. The counters don't actually understand the SR protocol and that's what causes these inconsistencies.
For example, if a dirauth does not have a consensus in its data directory at startup, it will call sr_state_update()
one extra time, because it will go into emergency voting mode, call sr_state_update()
and set up its state to vote at InitialVotingInterval
; that is at the half hour (e.g. at 02:30), instead of the next full hour (e.g. at 03:00). Then when that dirauth downloads a valid consensus, sr_state_update()
will be called again and it will correct its state to vote on the next full hour (e.g. at 03:00) as it should. So the dirauth without a consensus will call that function one extra time, and that's why the counter values can be inconsistent.
To see if that's the case, please attach your $ grep SR info.log
. :)
WRT the voting issue that dgoulet mentioned, I think this is just a timezone confusion. Since the log timezones are in local time UTC+10, at 12:00:01 when the message was logged it's actually 02:00:01UTC. So at that point it indeed makes sense to prepare our state for 03:00UTC. So I don't think there is an issue here.
comment:6 Changed 3 years ago by
Status: | needs_information → new |
---|
comment:7 follow-up: 8 Changed 3 years ago by
It seems like just a counting issue. But let's make sure that count is only ever used for display.
Evelyn was sleeping for a few hours today, and so the counts are:
Jul 07 23:00:01.000 [info] sr_state_update(): SR: State prepared for upcoming voting period (2016-07-07 14:00:00). Upcoming phase is reveal (counters: 10 commit & 3 reveal rounds).
Which is ok - the authority was present for 10/12 commit rounds, and rolled over to the reveal phase at the right time. (Remember, Evelyn's logs are in UTC+10, the time after "upcoming voting period" is in UTC.)
comment:8 Changed 3 years ago by
Resolution: | → not a bug |
---|---|
Status: | new → closed |
Replying to teor:
It seems like just a counting issue. But let's make sure that count is only ever used for display.
Agreed. We only use the counters for logging. Nothing else.
I'm talking about n_commit_rounds
, n_reveal_rounds
and n_protocol_runs
.
I'm gonna close this ticket for now. Feel free to reopen it if an issue is found. Cheers.
comment:9 Changed 3 years ago by
Resolution: | not a bug |
---|---|
Status: | closed → reopened |
Ok, now that I think about it, I really want to see the number of rounds based on the current time in the log. This would help diagnose any round mismatches (but feel free to close if you think it's not that important).
Jul 07 23:00:01.000 [info] sr_state_update(): SR: State prepared for upcoming voting period (2016-07-07 14:00:00). Upcoming phase is reveal (counters: 10/12 commit & 3/3 reveal rounds).
Where the number before the slash is the number of rounds the authority has seen since starting up, and the number after the slash is the current round number based on the time (assuming that there's a round every consensus period).
Sometimes, the number of rounds the authority has seen will be less, if it's just started up or has been down or (ugh) sleeping.
Sometimes, the number of rounds the authority has seen will be more, if it's voting every half hour because it doesn't have a valid consensus.
comment:10 Changed 3 years ago by
I think that's a plausible feature request. I've been wanting the same thing for a while, but never got down to actually implementing it.
To do so, we would need a function with similar slotting logic to get_sr_protocol_phase()
which given a timestamp returns the canonical phase and round at that time. Then we could also replace get_sr_protocol_phase()
with the new function.
comment:11 Changed 3 years ago by
Keywords: | 029-proposed added |
---|---|
Milestone: | Tor: 0.2.9.x-final → Tor: 0.2.??? |
Sponsor: | SponsorR-must → SponsorR-can |
This ticket is not a bug anymore. It's a non-essential feature request.
Changing ticket to more realistic severities.
comment:12 Changed 3 years ago by
Keywords: | 029-proposed removed |
---|
comment:13 Changed 3 years ago by
Please re-add this to 0.2.9 if you're going to have time to do it. :)
comment:15 Changed 3 years ago by
Keywords: | tor-03-unspecified-201612 added |
---|---|
Milestone: | Tor: 0.3.??? → Tor: unspecified |
Finally admitting that 0.3.??? was a euphemism for Tor: unspecified all along.
comment:16 Changed 3 years ago by
Keywords: | tor-03-unspecified-201612 removed |
---|
Remove an old triaging keyword.
comment:17 Changed 3 years ago by
Keywords: | tor-dirauth added |
---|---|
Sponsor: | SponsorR-can |
Type: | defect → enhancement |
comment:18 Changed 2 years ago by
Cc: | dgoulet asn added |
---|
Replying to teor:
This one is worrying to me because there is suppose to be a gap of *one hour* (or 30 minutes) between the log time
12:00:01
and the upcoming voting period which here is03:00:00
.The upcoming time we see is taken from the
voting_schedule_t->interval_starts
.We use
get_voting_schedule()
for this and we log it at NOTICE level so you should have a bit before that log message this that should help us:I would probably speculate towards a voting schedule bug here since we really don't do any modification on that voting interval. The phases completely follows the voting schedule.
Attaching logs to this ticket would be useful also.