I have two test directory authorities which disagree on which round it is. I think it's because one is voting every half hour (due to consensus failure), and the other is voting every hour. It could also be due to their start times.
I started this test directory authority at 00:54:30 UTC
(Log times UTC+10)
Jul 05 12:00:01.000 [info] sr_state_update(): SR: State prepared for upcoming voting period (2016-07-05 03:00:00). Upcoming phase is commit (counters: 3 commit & 0 reveal rounds).
I started another test directory authority at 01:47:52 UTC
(Log times are UTC)
Jul 05 02:00:01.000 [info] sr_state_update(): SR: State prepared for upcoming voting period (2016-07-05 03:00:00). Upcoming phase is commit (counters: 2 commit & 0 reveal rounds).
Is this just a logging / counting issue, or a serious bug that could affect the consensus?
To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information
Child items ...
Show closed items
Linked items 0
Link issues together to show that they're related.
Learn more.
Trac: Description: I have two test directory authorities which disagree on which round it is. I think it's because one is voting every half hour (due to consensus failure), and the other is voting every hour. It could also be due to their start times.
I started this test directory authority at 00:54:30 UTC
(Log times UTC+10)
Jul 05 12:00:01.000 [info] sr_state_update(): SR: State prepared for upcoming voting period (2016-07-05 03:00:00). Upcoming phase is commit (counters: 3 commit & 0 reveal rounds).
I started another test directory authority at 01:47:52 UTC
(Log times are UTC)
Jul 05 02:00:01.000 [info] sr_state_update(): SR: State prepared for upcoming voting period (2016-07-05 03:00:00). Upcoming phase is commit (counters: 2 commit & 0 reveal rounds).
Is this just a logging / counting issue, or a serious bug that could affect the consensus?
to
I have two test directory authorities which disagree on which round it is. I think it's because one is voting every half hour (due to consensus failure), and the other is voting every hour. It could also be due to their start times.
I started this test directory authority at 00:54:30 UTC
(Log times UTC+10)
Jul 05 12:00:01.000 [info] sr_state_update(): SR: State prepared for upcoming voting period (2016-07-05 03:00:00). Upcoming phase is commit (counters: 3 commit & 0 reveal rounds).
I started another test directory authority at 01:47:52 UTC
(Log times are UTC)
Jul 05 02:00:01.000 [info] sr_state_update(): SR: State prepared for upcoming voting period (2016-07-05 03:00:00). Upcoming phase is commit (counters: 2 commit & 0 reveal rounds).
Is this just a logging / counting issue, or a serious bug that could affect the consensus?
I started this test directory authority at 00:54:30 UTC
(Log times UTC+10)
Jul 05 12:00:01.000 [info] sr_state_update(): SR: State prepared for upcoming voting period (2016-07-05 03:00:00). Upcoming phase is commit (counters: 3 commit & 0 reveal rounds).
This one is worrying to me because there is suppose to be a gap of one hour (or 30 minutes) between the log time 12:00:01 and the upcoming voting period which here is 03:00:00.
The upcoming time we see is taken from the voting_schedule_t->interval_starts.
We use get_voting_schedule() for this and we log it at NOTICE level so you should have a bit before that log message this that should help us:
format_iso_time(tbuf, new_voting_schedule->interval_starts); tor_log(severity, LD_DIR,"Choosing expected valid-after time as %s: " "consensus_set=%d, interval=%d", tbuf, consensus?1:0, interval);
I would probably speculate towards a voting schedule bug here since we really don't do any modification on that voting interval. The phases completely follows the voting schedule.
Attaching logs to this ticket would be useful also.
I have two test directory authorities which disagree on which round it is. I think it's because one is voting every half hour (due to consensus failure), and the other is voting every hour. It could also be due to their start times.
I started this test directory authority at 00:54:30 UTC
(Log times UTC+10)
Jul 05 12:00:01.000 [info] sr_state_update(): SR: State prepared for upcoming voting period (2016-07-05 03:00:00). Upcoming phase is commit (counters: 3 commit & 0 reveal rounds).
I started another test directory authority at 01:47:52 UTC
(Log times are UTC)
Jul 05 02:00:01.000 [info] sr_state_update(): SR: State prepared for upcoming voting period (2016-07-05 03:00:00). Upcoming phase is commit (counters: 2 commit & 0 reveal rounds).
Is this just a logging / counting issue, or a serious bug that could affect the consensus?
This might indeed be a logging/counting issue and not a serious bug. The round counters that caused the confusion are just integer counters that get incremented everytime Tor calls sr_state_update(). The counters don't actually understand the SR protocol and that's what causes these inconsistencies.
For example, if a dirauth does not have a consensus in its data directory at startup, it will call sr_state_update() one extra time, because it will go into emergency voting mode, call sr_state_update() and set up its state to vote at InitialVotingInterval; that is at the half hour (e.g. at 02:30), instead of the next full hour (e.g. at 03:00). Then when that dirauth downloads a valid consensus, sr_state_update() will be called again and it will correct its state to vote on the next full hour (e.g. at 03:00) as it should. So the dirauth without a consensus will call that function one extra time, and that's why the counter values can be inconsistent.
To see if that's the case, please attach your $ grep SR info.log. :)
WRT the voting issue that dgoulet mentioned, I think this is just a timezone confusion. Since the log timezones are in local time UTC+10, at 12:00:01 when the message was logged it's actually 02:00:01UTC. So at that point it indeed makes sense to prepare our state for 03:00UTC. So I don't think there is an issue here.
It seems like just a counting issue. But let's make sure that count is only ever used for display.
Evelyn was sleeping for a few hours today, and so the counts are:
Jul 07 23:00:01.000 [info] sr_state_update(): SR: State prepared for upcoming voting period (2016-07-07 14:00:00). Upcoming phase is reveal (counters: 10 commit & 3 reveal rounds).
Which is ok - the authority was present for 10/12 commit rounds, and rolled over to the reveal phase at the right time. (Remember, Evelyn's logs are in UTC+10, the time after "upcoming voting period" is in UTC.)
Ok, now that I think about it, I really want to see the number of rounds based on the current time in the log. This would help diagnose any round mismatches (but feel free to close if you think it's not that important).
Jul 07 23:00:01.000 [info] sr_state_update(): SR: State prepared for upcoming voting period (2016-07-07 14:00:00). Upcoming phase is reveal (counters: 10/12 commit & 3/3 reveal rounds).
Where the number before the slash is the number of rounds the authority has seen since starting up, and the number after the slash is the current round number based on the time (assuming that there's a round every consensus period).
Sometimes, the number of rounds the authority has seen will be less, if it's just started up or has been down or (ugh) sleeping.
Sometimes, the number of rounds the authority has seen will be more, if it's voting every half hour because it doesn't have a valid consensus.
Trac: Resolution: not a bug toN/A Status: closed to reopened
I think that's a plausible feature request. I've been wanting the same thing for a while, but never got down to actually implementing it.
To do so, we would need a function with similar slotting logic to get_sr_protocol_phase() which given a timestamp returns the canonical phase and round at that time. Then we could also replace get_sr_protocol_phase() with the new function.