Opened 7 months ago

Last modified 2 weeks ago

#27146 assigned defect

Mismatched digest in 0.3.3.9 and master mixed chutney network

Reported by: teor
Owned by:
Priority: High
Milestone: Tor: unspecified
Component: Core Tor/Tor
Version:
Severity: Normal
Keywords: regression, tor-dirauth, macOS, 035-roadmap-proposed, 035-can, teor-unreached-2019-03-08
Cc: teor
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor:

Description (last modified by teor)

When I run master i386 and Tor 0.3.3.9 x86_64 in a mixed chutney network, I see the following error:

Detail: chutney/tools/warnings.sh /Users/USER/tor/chutney/tools/../net/nodes.1534263100
Warning: Unable to add signatures to consensus: Mismatched digest. Number: 102
Exit 255

A mixed chutney network of master x86_64 and Tor 0.3.3.9 x86_64 gets the same error:

Detail: chutney/tools/warnings.sh /Users/USER/tor/chutney/tools/../net/nodes.1534285790
Warning: Unable to add signatures to consensus: Mismatched digest. Number: 102
Exit 255
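
To compare runs, the "Warning: ... Number: N" summary lines above can be split into a message and a count. This is a minimal sketch that assumes the line layout shown in this ticket; `parse_warning_summary` is a hypothetical helper, not part of chutney.

```python
import re

# Assumed layout, taken from the output quoted in this ticket:
#   "Warning: <message> Number: <count>"
WARNING_RE = re.compile(r"^Warning: (?P<message>.*) Number: (?P<count>\d+)$")


def parse_warning_summary(line):
    """Return (message, count) for a warning summary line, else None."""
    match = WARNING_RE.match(line.strip())
    if match is None:
        return None
    return match["message"], int(match["count"])


if __name__ == "__main__":
    sample = ("Warning: Unable to add signatures to consensus: "
              "Mismatched digest. Number: 102")
    print(parse_warning_summary(sample))
```

Lines that do not match, such as the "Detail:" and "Exit" lines, return None and can be skipped.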

We should fix this issue if it affects Linux and BSD. If it doesn't, maybe we should downgrade our support for authorities on macOS.

If we want to diagnose this issue, implementing #20625 and #4539 would help.

Child Tickets

Ticket | Status | Owner | Summary | Component
#27298 | closed | teor | Remove obsoleted code for setting TestingV3AuthVotingStartOffset | Core Tor/Chutney
#27300 | closed | teor | Increase chutney timings to allow for Tor timing changes in 0.3.4 | Core Tor/Chutney
#27303 | closed | | Silence duplicate vote warning in chutney | Core Tor/Chutney
#27382 | assigned | | Bad valid-after time in 0.3.3 and 0.3.4 | Core Tor/Tor
#28036 | new | | Launch tests inside a single dirauth instance | Core Tor/Tor
#28135 | closed | | Bad CERTS cells in mixed chutney network | Core Tor/Tor

Attachments (2)

nodes.1534263100.zip (404.3 KB) - added by teor 7 months ago.
macOS i386 chutney nodes directory
nodes.1534285790.zip (389.0 KB) - added by teor 7 months ago.
macOS x86_64 chutney nodes directory


Change History (24)

Changed 7 months ago by teor

Attachment: nodes.1534263100.zip added

macOS i386 chutney nodes directory

Changed 7 months ago by teor

Attachment: nodes.1534285790.zip added

macOS x86_64 chutney nodes directory

comment:1 Changed 7 months ago by teor

Keywords: 035-roadmap-proposed 034-must added

We must check if this issue affects Linux and BSD on 0.3.4 before releasing 0.3.4.

comment:2 Changed 7 months ago by nickm

What versions are the live authorities on the network currently running? I thought that they _did_ span the versions in question.

comment:3 Changed 7 months ago by nickm

IMO this is a blocker for 0.3.4 if it affects real authorities, and a "should fix" otherwise.

comment:4 in reply to:  2 Changed 7 months ago by dgoulet

Replying to nickm:

What versions are the live authorities on the network currently running? I thought that they _did_ span the versions in question.

http://tgnv2pssfumdedyw.onion/#authorityversions

comment:5 Changed 7 months ago by nickm

Milestone: Tor: 0.3.5.x-final → Tor: 0.3.3.x-final

comment:6 Changed 7 months ago by nickm

Milestone: Tor: 0.3.3.x-final → Tor: 0.3.5.x-final

So most authorities are on 0.3.3.x, but moria is on 0.3.4.6-rc.

If this is a problem, it's either showing up in Moria's logs but Roger isn't noticing... or it's master only.

comment:7 Changed 7 months ago by nickm

Milestone: Tor: 0.3.5.x-final → Tor: 0.3.4.x-final

Teor says this affects 0.3.4 as well.

comment:8 Changed 7 months ago by teor

nickm discovered that this issue is happening because consensus timings are different in 0.3.3 and 0.3.4:

  • In 0.3.3, there is at least one second_elapsed_callback() between authorities asking for missing votes, and creating the consensus
  • In 0.3.4, second_elapsed_callback() is removed, and there can be less than a millisecond between authorities asking for votes, and creating the consensus

We should ensure that there are always 2 seconds between asking for missing votes, and creating the consensus. This timing is required by 0.3.3 and earlier, because:

  • the remote authority responds in the first second, then
  • the local authority retrieves the vote and creates the consensus in the second second.
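
The timing rule above can be sketched as a small model. This is only an illustration: `VotingSchedule` and `enforce_gap` are hypothetical names, not tor's scheduling code, and the 2-second value is the one proposed in this comment.

```python
from dataclasses import dataclass

# Minimum gap proposed in this comment, in seconds.
MIN_FETCH_TO_CONSENSUS_GAP = 2.0


@dataclass
class VotingSchedule:
    """Hypothetical model: times (seconds since the epoch) of two voting steps."""
    fetch_missing_votes_at: float
    compute_consensus_at: float


def enforce_gap(schedule, min_gap=MIN_FETCH_TO_CONSENSUS_GAP):
    """Return a schedule whose consensus step is at least min_gap after the fetch."""
    earliest_consensus = schedule.fetch_missing_votes_at + min_gap
    return VotingSchedule(
        fetch_missing_votes_at=schedule.fetch_missing_votes_at,
        compute_consensus_at=max(schedule.compute_consensus_at, earliest_consensus),
    )


if __name__ == "__main__":
    # A 0.3.4-style schedule: under a millisecond between the two steps.
    racy = VotingSchedule(fetch_missing_votes_at=100.0, compute_consensus_at=100.0005)
    print(enforce_gap(racy))
```

A schedule that already leaves 2 seconds or more is returned with its times unchanged, so the guard would only delay the consensus step in the racy case.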

comment:9 Changed 7 months ago by teor

Description: modified

comment:10 Changed 7 months ago by teor

0.3.3 authorities (002a and 003a) will post votes that are valid for only a few milliseconds.

000a/notice.log:Aug 24 11:20:23.751 [notice] Choosing valid-after time in vote as 2018-08-24 11:20:25: consensus_set=0, last_interval=5
001a/notice.log:Aug 24 11:20:23.957 [notice] Choosing valid-after time in vote as 2018-08-24 11:20:25: consensus_set=0, last_interval=5
002a/notice.log:Aug 24 11:20:24.587 [notice] Choosing valid-after time in vote as 2018-08-24 11:20:25: consensus_set=0, last_interval=5
003a/notice.log:Aug 24 11:20:24.806 [notice] Choosing valid-after time in vote as 2018-08-24 11:20:25: consensus_set=0, last_interval=5

One way to fix this bug would be for authorities to refuse to post or accept votes that are about to expire.
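
To make the margins in those log lines easier to see, here is a rough sketch that measures the time between each log entry and the valid-after time it announces, and flags votes that fall under a threshold. It assumes the grep-style "node/notice.log:" prefix shown above and that the log timestamp and valid-after time fall in the same year, since tor's log timestamps omit the year. `margin_seconds`, `is_about_to_expire`, and the 1-second threshold are hypothetical, not tor code.

```python
import re
from datetime import datetime

# Assumed layout, taken from the log lines quoted in this comment.
LINE_RE = re.compile(
    r"^(?P<node>[^/]+)/notice\.log:"
    r"(?P<logged>\w{3} +\d+ \d{2}:\d{2}:\d{2}\.\d{3}) \[notice\] "
    r"Choosing valid-after time in vote as "
    r"(?P<valid_after>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})"
)


def margin_seconds(line):
    """Return (node, seconds between the log entry and valid-after), or None."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    valid_after = datetime.strptime(match["valid_after"], "%Y-%m-%d %H:%M:%S")
    # Assumption: the log entry is in the same year as the valid-after time.
    logged = datetime.strptime(
        f"{valid_after.year} {match['logged']}", "%Y %b %d %H:%M:%S.%f"
    )
    return match["node"], (valid_after - logged).total_seconds()


def is_about_to_expire(margin, min_margin=1.0):
    """Hypothetical policy: refuse votes with less than min_margin seconds left."""
    return margin < min_margin


if __name__ == "__main__":
    sample = ("002a/notice.log:Aug 24 11:20:24.587 [notice] Choosing valid-after "
              "time in vote as 2018-08-24 11:20:25: consensus_set=0, last_interval=5")
    node, margin = margin_seconds(sample)
    print(node, margin, is_about_to_expire(margin))
```

On the four lines above, this should report margins of roughly 1.25 s (000a), 1.04 s (001a), 0.41 s (002a), and 0.19 s (003a).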

comment:11 in reply to:  10 Changed 7 months ago by teor

Replying to teor:

0.3.3 authorities (002a and 003a) will post votes that are valid for only a few milliseconds.
...

That's not a bug, because the valid-after time is the time of the *next* consensus, not the vote.

Edit: it might be a bug, because there's no time to create the next consensus.

Last edited 7 months ago by teor

comment:12 Changed 7 months ago by teor

Keywords: 034-must removed
Milestone: Tor: 0.3.4.x-final → Tor: 0.3.5.x-final

I made some changes to chutney, and I think we're ok to fix these tor issues in 0.3.5 (or later).

comment:13 Changed 7 months ago by teor

For the record, my attempted fix was bug27146-034, but it created more problems than it solved.

comment:14 Changed 6 months ago by nickm

Sponsor: Sponsor8-can

Noting some tickets in 0.3.5 milestone as 8-can. These include tickets that are bugfixes on bugs caused by earlier sponsor8 work.

comment:15 Changed 6 months ago by nickm

Priority: Medium → Very High

Mark all 035-must tickets as "very high"

comment:16 Changed 5 months ago by nickm

Cc: teor added

Teor, what did we decide for this one? I think we thought that it was something to fix inside chutney for now, but I'm not sure I remember properly.

comment:17 Changed 5 months ago by nickm

Owner: set to teor
Status: new → assigned

comment:18 in reply to:  16 Changed 5 months ago by teor

Replying to nickm:

Teor, what did we decide for this one? I think we thought that it was something to fix inside chutney for now, but I'm not sure I remember properly.

This issue is part of a cluster of race conditions in the current dirvote_act() code.

We reduced the frequency of the issue by increasing the consensus interval in chutney (#27300). On the public network, the timing required to trigger the race doesn't happen very often. And dirauth operators avoid these races by starting their machines outside the voting interval hh:50-hh:00.
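
The operator workaround can be expressed as a simple pre-start check. This is a sketch only: `in_voting_window` is a hypothetical helper, and it assumes the hourly schedule of the public network, where the window runs from hh:50 until the next hh:00.

```python
from datetime import datetime, timezone

# Assumption: voting occupies the last ten minutes of each hour (hh:50-hh:00).
VOTING_WINDOW_START_MINUTE = 50


def in_voting_window(now):
    """Return True if `now` falls between hh:50 and the following hh:00."""
    return now.minute >= VOTING_WINDOW_START_MINUTE


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    if in_voting_window(now):
        print("Inside the voting window; consider delaying the restart.")
    else:
        print("Outside the voting window.")
```

A chutney network with a shorter consensus interval would need a different window, so the constant here only matches the public-network case described above.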

I'd like to revise my bug27146-034 branch to solve this issue. But we need better tests to work out if we've solved it or not.

I opened #28036 so we can test dirvote_act() inside a dirauth instance. If we write failed votes, signatures, and consensuses to disk (#20625 and #4539), then it would be a lot easier to diagnose these issues.

comment:19 Changed 4 months ago by nickm

Keywords: 035-can added; 035-must removed

comment:20 Changed 2 months ago by gaba

Sponsor: Sponsor8-can

comment:21 Changed 2 weeks ago by teor

Keywords: teor-unreached-2019-03-08 added
Owner: teor deleted

I'd like to do these tickets, but not in the next few months.

comment:22 Changed 2 weeks ago by teor

Milestone: Tor: 0.3.5.x-final → Tor: unspecified
Priority: Very High → High

I still see this bug occasionally. But it's only in chutney. Maybe we'll fix it some day.
