Opened 2 years ago

Closed 6 months ago

#27146 closed defect (fixed)

Mismatched digest in 0.3.3.9 and master mixed chutney network

Reported by: teor
Owned by:
Priority: High
Milestone: Tor: unspecified
Component: Core Tor/Tor
Version:
Severity: Normal
Keywords: regression, tor-dirauth, macOS, 035-roadmap-proposed, 035-can, teor-unreached-2019-03-08
Cc: teor
Actual Points:
Parent ID: #4631
Points:
Reviewer:
Sponsor:

Description (last modified by teor)

When I run master i386 and Tor 0.3.3.9 x86_64 in a mixed chutney network, I see the following error:

Detail: chutney/tools/warnings.sh /Users/USER/tor/chutney/tools/../net/nodes.1534263100
Warning: Unable to add signatures to consensus: Mismatched digest. Number: 102
Exit 255

When I run master x86_64 and Tor 0.3.3.9 x86_64 in a mixed chutney network, I get the same error:

Detail: chutney/tools/warnings.sh /Users/USER/tor/chutney/tools/../net/nodes.1534285790
Warning: Unable to add signatures to consensus: Mismatched digest. Number: 102
Exit 255
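A toy illustration of what this warning means. It is not tor's code. An authority can only add a detached signature to its pending consensus if the signer signed a document with the same digest. Comment 8 below suggests one authority built its consensus before a late vote arrived. If so, the two consensus documents differ and so do their digests. The "SHA-256 over sorted votes" model and the names `build_consensus()`, `digest()` and `add_signature()` are hypothetical.

```python
import hashlib


def build_consensus(votes):
    """Toy consensus: sort the votes and join them, so every authority
    holding the same set of votes produces byte-identical output."""
    return "\n".join(sorted(votes)).encode("utf-8")


def digest(document):
    """Hex SHA-256 digest of a (toy) consensus document."""
    return hashlib.sha256(document).hexdigest()


def add_signature(local_consensus, signed_digest):
    """Accept a detached signature only if it covers the same consensus
    we hold locally; otherwise reject it."""
    if digest(local_consensus) != signed_digest:
        return "Unable to add signatures to consensus: Mismatched digest."
    return "ok"


all_votes = ["vote-000a", "vote-001a", "vote-002a", "vote-003a"]
# Hypothetical: this authority built its consensus before vote-003a arrived.
missing_late_vote = all_votes[:3]

remote_consensus = build_consensus(all_votes)
local_consensus = build_consensus(missing_late_vote)

print(add_signature(local_consensus, digest(remote_consensus)))
print(add_signature(remote_consensus, digest(remote_consensus)))
```

In this model the signature is rejected only because the two authorities started from different vote sets.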

We should fix this issue if it affects Linux and BSD. If it doesn't, maybe we should downgrade our support for authorities on macOS.

If we want to diagnose this issue, implementing #20625 and #4539 would help.

Child Tickets

Ticket | Status | Owner | Summary | Component
#27298 | closed | teor | Remove obsoleted code for setting TestingV3AuthVotingStartOffset | Core Tor/Chutney
#27300 | closed | teor | Increase chutney timings to allow for Tor timing changes in 0.3.4 | Core Tor/Chutney
#27303 | closed | | Silence duplicate vote warning in chutney | Core Tor/Chutney
#27382 | closed | | Bad valid-after time in 0.3.3 and 0.3.4 | Core Tor/Tor
#28135 | closed | | Bad CERTS cells in mixed chutney network | Core Tor/Tor

Attachments (2)

nodes.1534263100.zip (404.3 KB) - added by teor 2 years ago.
macOS i386 chutney nodes directory
nodes.1534285790.zip (389.0 KB) - added by teor 2 years ago.
macOS x86_64 chutney nodes directory


Change History (26)

Changed 2 years ago by teor

Attachment: nodes.1534263100.zip added

macOS i386 chutney nodes directory

Changed 2 years ago by teor

Attachment: nodes.1534285790.zip added

macOS x86_64 chutney nodes directory

comment:1 Changed 2 years ago by teor

Keywords: 035-roadmap-proposed 034-must added

We must check if this issue affects Linux and BSD on 0.3.4 before releasing 0.3.4.

comment:2 Changed 2 years ago by nickm

What versions are the live authorities on the network currently running? I thought that they _did_ span the versions in question.

comment:3 Changed 2 years ago by nickm

IMO this is a blocker for 0.3.4 if it affects real authorities, and a "should fix" otherwise.

comment:4 in reply to:  2 Changed 2 years ago by dgoulet

Replying to nickm:

What versions are the live authorities on the network currently running? I thought that they _did_ span the versions in question.

http://tgnv2pssfumdedyw.onion/#authorityversions

comment:5 Changed 2 years ago by nickm

Milestone: changed from Tor: 0.3.5.x-final to Tor: 0.3.3.x-final

comment:6 Changed 2 years ago by nickm

Milestone: changed from Tor: 0.3.3.x-final to Tor: 0.3.5.x-final

So most authorities are on 0.3.3.x, but moria is on 0.3.4.6-rc.

If this is a problem, it's either showing up in Moria's logs but Roger isn't noticing... or it's master only.

comment:7 Changed 2 years ago by nickm

Milestone: changed from Tor: 0.3.5.x-final to Tor: 0.3.4.x-final

Teor says this affects 0.3.4 as well.

comment:8 Changed 2 years ago by teor

nickm discovered that this issue is happening because consensus timings are different in 0.3.3 and 0.3.4:

  • In 0.3.3, there is at least one second_elapsed_callback() between authorities asking for missing votes and creating the consensus
  • In 0.3.4, second_elapsed_callback() is removed, and there can be less than a millisecond between authorities asking for missing votes and creating the consensus

We should ensure that there are always 2 seconds between asking for missing votes and creating the consensus. This timing is required by 0.3.3 and earlier, because:

  • the remote authority responds in the first second, then
  • the local authority retrieves the vote and creates the consensus in the second second.
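A minimal sketch of the 2-second rule. It assumes times are plain seconds. It also assumes the fix delays consensus creation rather than asking for missing votes earlier. `earliest_consensus_time()` and `MIN_GAP_SECONDS` are hypothetical names, not tor functions or options.

```python
# Hypothetical constant: the gap that 0.3.3 and earlier need, per the list above.
MIN_GAP_SECONDS = 2.0


def earliest_consensus_time(missing_votes_requested_at, scheduled_at,
                            min_gap=MIN_GAP_SECONDS):
    """Return the earliest time (in seconds) at which to build the consensus:
    never before the schedule, and never less than min_gap seconds after
    this authority asked for missing votes."""
    return max(scheduled_at, missing_votes_requested_at + min_gap)


# 0.3.4-style timing: request and consensus creation 1 ms apart.
print(earliest_consensus_time(100.0, 100.001))  # 102.0
# The schedule already leaves enough time, so it is unchanged.
print(earliest_consensus_time(100.0, 105.0))    # 105.0
```

Delaying is only one option. The same gap could come from asking for missing votes earlier in the schedule.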

comment:9 Changed 2 years ago by teor

Description: modified

comment:10 Changed 2 years ago by teor

0.3.3 authorities (002a and 003a) will post votes that are valid for only a few milliseconds.

000a/notice.log:Aug 24 11:20:23.751 [notice] Choosing valid-after time in vote as 2018-08-24 11:20:25: consensus_set=0, last_interval=5
001a/notice.log:Aug 24 11:20:23.957 [notice] Choosing valid-after time in vote as 2018-08-24 11:20:25: consensus_set=0, last_interval=5
002a/notice.log:Aug 24 11:20:24.587 [notice] Choosing valid-after time in vote as 2018-08-24 11:20:25: consensus_set=0, last_interval=5
003a/notice.log:Aug 24 11:20:24.806 [notice] Choosing valid-after time in vote as 2018-08-24 11:20:25: consensus_set=0, last_interval=5

One way to fix this bug would be for authorities to refuse to post or accept votes that are about to expire.
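A sketch of the proposed check, applied to the timestamps in the log lines above. It treats a vote as "about to expire" when less than `MIN_REMAINING_SECONDS` remain before the valid-after time it targets. That reading, the 1-second threshold, and `vote_is_usable()` are assumptions for illustration, not tor code or options. It also treats each log timestamp as roughly when that authority made its vote.

```python
from datetime import datetime

# Hypothetical threshold, not a tor option.
MIN_REMAINING_SECONDS = 1.0


def vote_is_usable(now, valid_after, min_remaining=MIN_REMAINING_SECONDS):
    """Return False if fewer than min_remaining seconds are left before the
    valid-after time this vote targets, i.e. the vote is about to expire."""
    return (valid_after - now).total_seconds() >= min_remaining


valid_after = datetime(2018, 8, 24, 11, 20, 25)
# Times at which each authority chose its valid-after time (from notice.log).
chosen_at = {
    "000a": datetime(2018, 8, 24, 11, 20, 23, 751000),
    "001a": datetime(2018, 8, 24, 11, 20, 23, 957000),
    "002a": datetime(2018, 8, 24, 11, 20, 24, 587000),
    "003a": datetime(2018, 8, 24, 11, 20, 24, 806000),
}

for node, now in chosen_at.items():
    remaining = (valid_after - now).total_seconds()
    print(node, f"{remaining:.3f}s left, usable={vote_is_usable(now, valid_after)}")
```

With this threshold, 000a and 001a would post their votes. The two 0.3.3 authorities, 002a and 003a, would refuse.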

comment:11 in reply to:  10 Changed 2 years ago by teor

Replying to teor:

0.3.3 authorities (002a and 003a) will post votes that are valid for only a few milliseconds.
...

That's not a bug, because the valid-after time is the time of the *next* consensus, not the vote.

Edit: it might be a bug, because there's no time to create the next consensus.

Last edited 2 years ago by teor
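A sketch of how the valid-after times in the comment 10 log lines can be derived: round the current time up to the next voting-interval boundary, counted from midnight. It assumes a 5-second interval, which fits `last_interval=5` and every authority choosing 11:20:25. It also ignores any voting offset. `start_of_next_interval()` is a hypothetical helper, not tor's implementation.

```python
from datetime import datetime, timedelta


def start_of_next_interval(now, interval_seconds):
    """Round `now` up to the next multiple of interval_seconds since midnight.
    Always returns a time strictly after `now`. Ignores any voting offset."""
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    elapsed = (now - midnight).total_seconds()
    completed = int(elapsed // interval_seconds)
    return midnight + timedelta(seconds=(completed + 1) * interval_seconds)


now = datetime(2018, 8, 24, 11, 20, 24, 806000)  # 003a's log timestamp
valid_after = start_of_next_interval(now, 5)
print(valid_after)                          # 2018-08-24 11:20:25
print((valid_after - now).total_seconds())  # 0.194
```

For 003a, this leaves under 200 milliseconds before the next consensus is due, which is the concern in the edit above.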

comment:12 Changed 2 years ago by teor

Keywords: 034-must removed
Milestone: changed from Tor: 0.3.4.x-final to Tor: 0.3.5.x-final

I made some changes to chutney, and I think we're ok to fix these tor issues in 0.3.5 (or later).

comment:13 Changed 2 years ago by teor

For the record, my attempted fix was bug27146-034, but it created more problems than it solved.

comment:14 Changed 23 months ago by nickm

Sponsor: Sponsor8-can

Noting some tickets in 0.3.5 milestone as 8-can. These include tickets that are bugfixes on bugs caused by earlier sponsor8 work.

comment:15 Changed 23 months ago by nickm

Priority: changed from Medium to Very High

Mark all 035-must tickets as "very high"

comment:16 Changed 22 months ago by nickm

Cc: teor added

Teor, what did we decide for this one? I think we thought that it was something to fix inside chutney for now, but I'm not sure I remember properly.

comment:17 Changed 22 months ago by nickm

Owner: set to teor
Status: changed from new to assigned

comment:18 in reply to:  16 Changed 22 months ago by teor

Replying to nickm:

Teor, what did we decide for this one? I think we thought that it was something to fix inside chutney for now, but I'm not sure I remember properly.

This issue is part of a cluster of race conditions in the current dirvote_act() code.

We reduced the frequency of the issue by increasing the consensus interval in chutney (#27300). On the public network, the timing required to trigger the race doesn't happen very often. And dirauth operators avoid these races by starting their machines outside the voting interval hh:50-hh:00.
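A small sketch of that operator rule. It assumes the public network's hourly voting schedule and treats hh:00:00 itself as outside the window. `in_voting_window()` and `next_safe_start()` are hypothetical helpers, not part of tor or chutney.

```python
from datetime import datetime, timedelta


def in_voting_window(t):
    """True if t falls between hh:50 and the next hh:00 (exclusive)."""
    return t.minute >= 50


def next_safe_start(t):
    """Return t if it is outside the window, else the next top of the hour."""
    if not in_voting_window(t):
        return t
    top_of_hour = t.replace(minute=0, second=0, microsecond=0)
    return top_of_hour + timedelta(hours=1)


print(next_safe_start(datetime(2018, 10, 16, 9, 55)))  # 2018-10-16 10:00:00
print(next_safe_start(datetime(2018, 10, 16, 9, 5)))   # 2018-10-16 09:05:00
```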

I'd like to revise my bug27146-034 branch to solve this issue. But we need better tests to work out if we've solved it or not.

I opened #28036 so we can test dirvote_act() inside a dirauth instance. If we write failed votes, signatures, and consensuses to disk (#20625 and #4539), then it would be a lot easier to diagnose these issues.

comment:19 Changed 21 months ago by nickm

Keywords: 035-can added; 035-must removed

comment:20 Changed 19 months ago by gaba

Sponsor: Sponsor8-can

comment:21 Changed 17 months ago by teor

Keywords: teor-unreached-2019-03-08 added
Owner: teor deleted

I'd like to do these tickets, but not in the next few months.

comment:22 Changed 17 months ago by teor

Milestone: changed from Tor: 0.3.5.x-final to Tor: unspecified
Priority: changed from Very High to High

I still see this bug occasionally. But it's only in chutney. Maybe we'll fix it some day.

comment:23 Changed 6 months ago by teor

Description: modified

comment:24 Changed 6 months ago by teor

Parent ID: #4631
Resolution: fixed
Status: changed from assigned to closed

I think this bug is fixed by #4631.
