Opened 3 years ago

Last modified 2 years ago

#19570 reopened enhancement

Shared random round gets out of sync

Reported by: teor Owned by:
Priority: High Milestone: Tor: unspecified
Component: Core Tor/Tor Version:
Severity: Normal Keywords: tor-sr, tor-dirauth
Cc: dgoulet, asn Actual Points:
Parent ID: Points: 0.5
Reviewer: Sponsor:

Description (last modified by teor)

I have two test directory authorities which disagree on which round it is. I think it's because one is voting every half hour (due to consensus failure), and the other is voting every hour. It could also be due to their start times.

I started this test directory authority at 00:54:30 UTC
(Log times UTC+10)
Jul 05 12:00:01.000 [info] sr_state_update(): SR: State prepared for upcoming voting period (2016-07-05 03:00:00). Upcoming phase is commit (counters: 3 commit & 0 reveal rounds).

I started another test directory authority at 01:47:52 UTC
(Log times are UTC)
Jul 05 02:00:01.000 [info] sr_state_update(): SR: State prepared for upcoming voting period (2016-07-05 03:00:00). Upcoming phase is commit (counters: 2 commit & 0 reveal rounds).

Is this just a logging / counting issue, or a serious bug that could affect the consensus?

Child Tickets

Attachments (2)

evelyn_rounds.log (12.4 KB) - added by teor 3 years ago.
SR info log from test authority Evelyn
betty_rounds.log (57.0 KB) - added by teor 3 years ago.
SR info log from test authority Betty

Download all attachments as: .zip

Change History (20)

comment:1 Changed 3 years ago by teor

Description: modified (diff)

comment:2 Changed 3 years ago by teor

Keywords: tor-hs added

comment:3 Changed 3 years ago by dgoulet

Keywords: tor-sr added; tor-hs shared-random removed
Milestone: Tor: 0.2.9.x-final
Priority: MediumHigh
Sponsor: SponsorR-must

comment:4 in reply to:  description Changed 3 years ago by dgoulet

Status: newneeds_information

Replying to teor:

I started this test directory authority at 00:54:30 UTC
(Log times UTC+10)
Jul 05 12:00:01.000 [info] sr_state_update(): SR: State prepared for upcoming voting period (2016-07-05 03:00:00). Upcoming phase is commit (counters: 3 commit & 0 reveal rounds).

This one is worrying to me because there is suppose to be a gap of *one hour* (or 30 minutes) between the log time 12:00:01 and the upcoming voting period which here is 03:00:00.

The upcoming time we see is taken from the voting_schedule_t->interval_starts.

We use get_voting_schedule() for this and we log it at NOTICE level so you should have a bit before that log message this that should help us:

    format_iso_time(tbuf, new_voting_schedule->interval_starts);
    tor_log(severity, LD_DIR,"Choosing expected valid-after time as %s: "
            "consensus_set=%d, interval=%d",
            tbuf, consensus?1:0, interval);

I would probably speculate towards a voting schedule bug here since we really don't do any modification on that voting interval. The phases completely follows the voting schedule.

Attaching logs to this ticket would be useful also.

comment:5 in reply to:  description Changed 3 years ago by asn

Replying to teor:

I have two test directory authorities which disagree on which round it is. I think it's because one is voting every half hour (due to consensus failure), and the other is voting every hour. It could also be due to their start times.

I started this test directory authority at 00:54:30 UTC
(Log times UTC+10)
Jul 05 12:00:01.000 [info] sr_state_update(): SR: State prepared for upcoming voting period (2016-07-05 03:00:00). Upcoming phase is commit (counters: 3 commit & 0 reveal rounds).

I started another test directory authority at 01:47:52 UTC
(Log times are UTC)
Jul 05 02:00:01.000 [info] sr_state_update(): SR: State prepared for upcoming voting period (2016-07-05 03:00:00). Upcoming phase is commit (counters: 2 commit & 0 reveal rounds).

Is this just a logging / counting issue, or a serious bug that could affect the consensus?

This might indeed be a logging/counting issue and not a serious bug. The round counters that caused the confusion are just integer counters that get incremented everytime Tor calls sr_state_update(). The counters don't actually understand the SR protocol and that's what causes these inconsistencies.

For example, if a dirauth does not have a consensus in its data directory at startup, it will call sr_state_update() one extra time, because it will go into emergency voting mode, call sr_state_update() and set up its state to vote at InitialVotingInterval; that is at the half hour (e.g. at 02:30), instead of the next full hour (e.g. at 03:00). Then when that dirauth downloads a valid consensus, sr_state_update() will be called again and it will correct its state to vote on the next full hour (e.g. at 03:00) as it should. So the dirauth without a consensus will call that function one extra time, and that's why the counter values can be inconsistent.

To see if that's the case, please attach your $ grep SR info.log. :)


WRT the voting issue that dgoulet mentioned, I think this is just a timezone confusion. Since the log timezones are in local time UTC+10, at 12:00:01 when the message was logged it's actually 02:00:01UTC. So at that point it indeed makes sense to prepare our state for 03:00UTC. So I don't think there is an issue here.

Changed 3 years ago by teor

Attachment: evelyn_rounds.log added

SR info log from test authority Evelyn

Changed 3 years ago by teor

Attachment: betty_rounds.log added

SR info log from test authority Betty

comment:6 Changed 3 years ago by teor

Status: needs_informationnew

comment:7 Changed 3 years ago by teor

It seems like just a counting issue. But let's make sure that count is only ever used for display.

Evelyn was sleeping for a few hours today, and so the counts are:
Jul 07 23:00:01.000 [info] sr_state_update(): SR: State prepared for upcoming voting period (2016-07-07 14:00:00). Upcoming phase is reveal (counters: 10 commit & 3 reveal rounds).

Which is ok - the authority was present for 10/12 commit rounds, and rolled over to the reveal phase at the right time. (Remember, Evelyn's logs are in UTC+10, the time after "upcoming voting period" is in UTC.)

comment:8 in reply to:  7 Changed 3 years ago by asn

Resolution: not a bug
Status: newclosed

Replying to teor:

It seems like just a counting issue. But let's make sure that count is only ever used for display.

Agreed. We only use the counters for logging. Nothing else.

I'm talking about n_commit_rounds, n_reveal_rounds and n_protocol_runs.

I'm gonna close this ticket for now. Feel free to reopen it if an issue is found. Cheers.

comment:9 Changed 3 years ago by teor

Resolution: not a bug
Status: closedreopened

Ok, now that I think about it, I really want to see the number of rounds based on the current time in the log. This would help diagnose any round mismatches (but feel free to close if you think it's not that important).

Jul 07 23:00:01.000 [info] sr_state_update(): SR: State prepared for upcoming voting period (2016-07-07 14:00:00). Upcoming phase is reveal (counters: 10/12 commit & 3/3 reveal rounds).

Where the number before the slash is the number of rounds the authority has seen since starting up, and the number after the slash is the current round number based on the time (assuming that there's a round every consensus period).

Sometimes, the number of rounds the authority has seen will be less, if it's just started up or has been down or (ugh) sleeping.

Sometimes, the number of rounds the authority has seen will be more, if it's voting every half hour because it doesn't have a valid consensus.

comment:10 Changed 3 years ago by asn

I think that's a plausible feature request. I've been wanting the same thing for a while, but never got down to actually implementing it.

To do so, we would need a function with similar slotting logic to get_sr_protocol_phase() which given a timestamp returns the canonical phase and round at that time. Then we could also replace get_sr_protocol_phase() with the new function.

comment:11 Changed 3 years ago by asn

Keywords: 029-proposed added
Milestone: Tor: 0.2.9.x-finalTor: 0.2.???
Sponsor: SponsorR-mustSponsorR-can

This ticket is not a bug anymore. It's a non-essential feature request.
Changing ticket to more realistic severities.

comment:12 Changed 3 years ago by nickm

Keywords: 029-proposed removed

comment:13 Changed 3 years ago by nickm

Please re-add this to 0.2.9 if you're going to have time to do it. :)

comment:14 Changed 3 years ago by teor

Milestone: Tor: 0.2.???Tor: 0.3.???

Milestone renamed

comment:15 Changed 3 years ago by nickm

Keywords: tor-03-unspecified-201612 added
Milestone: Tor: 0.3.???Tor: unspecified

Finally admitting that 0.3.??? was a euphemism for Tor: unspecified all along.

comment:16 Changed 3 years ago by nickm

Keywords: tor-03-unspecified-201612 removed

Remove an old triaging keyword.

comment:17 Changed 3 years ago by dgoulet

Keywords: tor-dirauth added
Sponsor: SponsorR-can
Type: defectenhancement

comment:18 Changed 2 years ago by nickm

Cc: dgoulet asn added
Note: See TracTickets for help on using tickets.