Opened 4 years ago

Closed 4 years ago

#19124 closed defect (not a bug)

Shared Random and Half-Hour Consensuses

Reported by: teor
Owned by:
Priority: Medium
Milestone: Tor: 0.2.9.x-final
Component: Core Tor/Tor
Version:
Severity: Normal
Keywords:
Cc:
Actual Points:
Parent ID: #16943
Points:
Reviewer:
Sponsor:

Description

My test authority was asleep for an hour, so it didn't have a recent consensus. It started voting at 10:30 for round commit 3/reveal 3, but the rest of the network wasn't voting for commit 3/reveal 3 until 11:00.

How does an authority recover from this situation if it has thrown away its state at 10:30?

Do we fix this by implementing #19045 (keep voting for shared random values)?
Or does that make this worse?

Change History (4)

comment:1 Changed 4 years ago by teor

So what happened is:
cat debug.log | grep 'Current phase is'

May 19 00:00:01.000 [info] sr_state_update: SR: State prepared for new voting period (2016-05-19 05:00:00). Current phase is reveal (3 commit & 3 reveal rounds).
May 19 01:00:01.000 [info] sr_state_update: SR: State prepared for new voting period (2016-05-19 06:00:00). Current phase is commit (1 commit & 0 reveal rounds).
May 19 02:00:01.000 [info] sr_state_update: SR: State prepared for new voting period (2016-05-19 07:00:00). Current phase is commit (2 commit & 0 reveal rounds).
May 19 03:00:01.000 [info] sr_state_update: SR: State prepared for new voting period (2016-05-19 08:00:00). Current phase is commit (3 commit & 0 reveal rounds).
May 19 04:00:01.000 [info] sr_state_update: SR: State prepared for new voting period (2016-05-19 09:00:00). Current phase is reveal (3 commit & 1 reveal rounds).
May 19 10:17:11.000 [info] sr_state_update: SR: State prepared for new voting period (2016-05-19 15:00:00). Current phase is reveal (3 commit & 2 reveal rounds).
May 19 11:31:10.000 [info] sr_state_update: SR: State prepared for new voting period (2016-05-19 16:00:00). Current phase is reveal (3 commit & 3 reveal rounds).
May 19 12:03:10.000 [info] sr_state_update: SR: State prepared for new voting period (2016-05-19 17:00:00). Current phase is reveal (3 commit & 4 reveal rounds).
May 19 13:00:01.000 [info] sr_state_update: SR: State prepared for new voting period (2016-05-19 18:00:00). Current phase is commit (1 commit & 0 reveal rounds).
May 19 14:00:01.000 [info] sr_state_update: SR: State prepared for new voting period (2016-05-19 19:00:00). Current phase is commit (2 commit & 0 reveal rounds).

The "3 commit & 4 reveal rounds" is obviously wrong, but it recovers in the next round.

Also, the "11:31:10.000 [info] sr_state_update: SR: State prepared for new voting period (2016-05-19 16:00:00)" is correct. But I wonder what would have happened at 11:00 had the authority been awake at that time. Would it have prepared state with the same valid-after time "2016-05-19 16:00:00"? Is this a bug?

How do we get rounds and valid-after times working correctly when one or more authorities is voting on the half-hour because they don't have a recent consensus?

comment:2 Changed 4 years ago by teor

Oops, ignore me, I was using the 12-round branch.

My test authority was asleep for a while, and has just logged:

May 20 11:00:01.000 [info] sr_state_update(): SR: State prepared for new voting period (2016-05-20 16:00:00). Current phase is reveal (13 commit & 7 reveal rounds).

Something is really wrong here, unless the round numbers just keep counting up?

comment:3 Changed 4 years ago by teor

Apparently it's possible to get up to 5 & 3 (so I would imagine the values depend on the amount of time the authority is offline):

May 21 10:52:31.000 [debug] should_keep_commit(): SR: Ignoring non-authoritative commit.
May 21 12:59:15.000 [warn] Your system clock just jumped 7469 seconds forward; assuming established circuits no longer work.
May 21 12:59:16.000 [notice] Choosing expected valid-after time as 2016-05-21 17:00:00: consensus_set=1, interval=3600
May 21 12:59:16.000 [info] sr_state_update(): SR: State prepared for new voting period (2016-05-21 17:00:00). Current phase is reveal (5 commit & 2 reveal rounds).
May 21 12:59:16.000 [notice] Choosing expected valid-after time as 2016-05-21 17:00:00: consensus_set=1, interval=3600
May 21 12:59:17.000 [notice] Choosing valid-after time in vote as 2016-05-21 17:00:00: consensus_set=1, last_interval=3600
May 21 13:00:01.000 [notice] Choosing expected valid-after time as 2016-05-21 17:30:00: consensus_set=0, interval=1800
May 21 13:00:01.000 [info] sr_state_update(): SR: State prepared for new voting period (2016-05-21 17:30:00). Current phase is reveal (5 commit & 3 reveal rounds).
May 21 13:00:01.000 [notice] Choosing expected valid-after time as 2016-05-21 17:30:00: consensus_set=0, interval=1800
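The valid-after times in these log lines are consistent with simply rounding the current time up to the next multiple of the voting interval. The following is only a sketch of that arithmetic, not Tor's scheduling code: `next_valid_after` is a hypothetical helper, and it assumes the log timestamps are local time four hours behind UTC (an inference from the 00:00:01 / 05:00:00 pairing in comment:1, not something the ticket states).

```python
from datetime import datetime, timezone


def next_valid_after(now: datetime, interval_s: int) -> datetime:
    """Round `now` up to the next multiple of `interval_s` seconds.

    Hypothetical helper for illustration only. Tor's real scheduling
    code applies additional rules that this sketch ignores.
    """
    ts = int(now.timestamp())
    next_ts = (ts // interval_s + 1) * interval_s
    return datetime.fromtimestamp(next_ts, tz=timezone.utc)


# Assumption: the log's 12:59:16 is 16:59:16 UTC.
with_consensus = next_valid_after(
    datetime(2016, 5, 21, 16, 59, 16, tzinfo=timezone.utc), 3600)
# Assumption: the log's 13:00:01 is 17:00:01 UTC; no consensus, so 1800 s.
without_consensus = next_valid_after(
    datetime(2016, 5, 21, 17, 0, 1, tzinfo=timezone.utc), 1800)

print(with_consensus.isoformat())     # 2016-05-21T17:00:00+00:00
print(without_consensus.isoformat())  # 2016-05-21T17:30:00+00:00
```

Under those assumptions, the switch from interval=3600 to interval=1800 is what moves the expected valid-after time onto the half-hour.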

comment:4 Changed 4 years ago by dgoulet

Resolution: not a bug
Status: new → closed

Yes, those are simple counters. When no consensus is available, we use the InitialInterval, so our "update state" function is called twice per hour and the counter is incremented each time. Those counters play no part in any check of which round we are in. The code doesn't do something like if (n_commit_rounds > 12) { change phase }.

They are purely used for logging purposes so we can keep track of where we are between commit and reveal and time. Tests use those to know where the subsystem is at. Also, this is why you don't see a big gap in numbers in the first log output above.
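A minimal sketch of this counter behaviour, for illustration only. `SrStateSketch` and its fields are hypothetical names, not Tor's data structures, and the rule that both counters restart when a new commit phase begins is an assumption read off the log output in comment:1.

```python
from dataclasses import dataclass


@dataclass
class SrStateSketch:
    """Hypothetical stand-in for the shared-random state; not Tor's struct."""
    phase: str = "commit"
    n_commit_rounds: int = 0
    n_reveal_rounds: int = 0

    def update(self, phase_from_clock: str) -> None:
        # The phase is an input here: it is decided elsewhere, from the
        # clock, and never from the counters.
        if phase_from_clock == "commit" and self.phase == "reveal":
            # Assumption: both counters restart with a new commit phase.
            self.n_commit_rounds = 0
            self.n_reveal_rounds = 0
        self.phase = phase_from_clock
        if phase_from_clock == "commit":
            self.n_commit_rounds += 1
        else:
            self.n_reveal_rounds += 1


hourly = SrStateSketch()
half_hourly = SrStateSketch()
for _ in range(3):        # three hours of commit phase, one update per hour
    hourly.update("commit")
for _ in range(3 * 2):    # same three hours, two updates per hour
    half_hourly.update("commit")

print(hourly.n_commit_rounds, half_hourly.n_commit_rounds)  # 3 6
```

Both objects are in the same phase throughout. Only the logged numbers differ, which matches the explanation that inflated counters alone do not indicate a protocol problem.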

An authority should recover if it ever has an offset in the phases, because the phase is lined up to the time and the voting interval. So if an authority jumps into the reveal phase by mistake, it will still start the commit phase with everyone else at the right time (provided all dirauths have synchronized clocks, of course).
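One way to picture "lined up to the time and voting interval" is to derive the phase from the valid-after time alone. This is a sketch under assumptions, not the actual implementation: `phase_for` is a hypothetical function, and the default of 12 rounds per phase with a one-hour interval is assumed for the example.

```python
from datetime import datetime, timezone


def phase_for(valid_after_ts: int, voting_interval_s: int = 3600,
              n_rounds: int = 12) -> str:
    """Return "commit" or "reveal" for a valid-after Unix timestamp.

    Hypothetical sketch: a protocol run is assumed to be n_rounds commit
    slots followed by n_rounds reveal slots, one slot per voting interval.
    """
    slot = (valid_after_ts // voting_interval_s) % (2 * n_rounds)
    return "commit" if slot < n_rounds else "reveal"


midnight = int(datetime(2016, 5, 19, 0, 0, tzinfo=timezone.utc).timestamp())

print(phase_for(midnight))              # commit
print(phase_for(midnight + 12 * 3600))  # reveal
print(phase_for(midnight + 24 * 3600))  # commit
```

Because the result depends only on the timestamp, two authorities with synchronized clocks get the same phase for the same valid-after time, whatever their local counters or earlier mistakes were.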

Re-open if you see a bug or a potential issue that we can fix. I'm closing this one; discussion can continue on IRC or the mailing list if more information is needed, but for now I don't see a bug.
