Opened 4 years ago

Closed 4 years ago

#13823 closed defect (fixed)

chutney intervals are too short for successful bootstrap, particularly under high CPU load on OS X

Reported by: teor
Owned by: teor
Priority: Medium
Milestone:
Component: Core Tor/Tor
Version:
Severity:
Keywords: tor-auth chutney
Cc: nickm, rl1987
Actual Points:
Parent ID: #13718
Points:
Reviewer:
Sponsor:

Description

Split from #13718:

Occasionally, the CPU load on my test machine will increase (or some other condition affecting the scheduler will occur), and a bootstrap race condition will cause the test to fail 50-100% of the time for a few hours. Then it will start working again. The commands run are exactly the same each time. I'll be excluding these results from the tests, because they happen with or without the changes.

Perhaps lengthening some of the default intervals chutney uses would solve this.

Child Tickets

Change History (16)

comment:1 Changed 4 years ago by teor

Keywords: lorax added

comment:2 Changed 4 years ago by teor

Perhaps shortening the consensus intervals from 5-30 minutes to 20 seconds would help here, too.

comment:3 Changed 4 years ago by teor

This appears to be due to the fact that MIN_VOTE_INTERVAL is set to 300, and all the chutney scripts I have are set to wait for 18, 60, and 300 seconds. So the authorities only have one chance to build a sufficiently comprehensive and consistent consensus at around the 4-6 second mark, and that's it.

If they miss it, the network won't function for the first 5 minutes.

comment:4 Changed 4 years ago by teor

This is now directly affecting #13718, because we need to run two consensuses to test it: one with no exits, and one after the exits have determined their own reachability using internal paths built on the first consensus.

I'd like to do this in under 5 minutes, so I've defined MIN_VOTE_INTERVAL_TESTING 10 (which is greater than (MIN_VOTE_SECONDS + MIN_DIST_SECONDS) * 2 as required) and patched tor to use it based on TestingTorNetwork 1, or during direct comparisons to Testing* options.

I'll post a patch as part of the #13718 process.

comment:5 Changed 4 years ago by teor

Trying to run two consensuses in as short a period as possible is fraught with restrictions.

When trying to use the smallest allowable voting interval for testing, I have found the following restrictions for the new macro MIN_VOTE_INTERVAL_TESTING:

  • a minimum of (MIN_VOTE_SECONDS + MIN_DIST_SECONDS) * 2 + 1 = 9, based on the check V3AuthVoteDelay + V3AuthDistDelay < V3AuthVotingInterval/2 ("V3AuthVoteDelay plus V3AuthDistDelay must be less than half V3AuthVotingInterval") in options_validate()
  • a minimum of 16, based on min_sec_before_caching = interval/16, which is only greater than 0 when the interval is at least 16 ("slop factor in case clocks get desynchronized a little") in update_consensus_networkstatus_fetch_time_impl()
  • a minimum of 18, based on the check (30*60) % TestingV3AuthInitialVotingInterval != 0 ("[must] divide evenly into 30 minutes") in options_validate(); neither 16 nor 17 divides 30 minutes evenly
  • we may be able to get away with 9, 10, 12, or 15 (which all divide 30 minutes) if we allow min_sec_before_caching's "slop factor" to equal 0, which should be fine if we're running on the same host/clock

So I have set:

#define MIN_VOTE_INTERVAL_TESTING 9

But I have set the vote interval to 18 in the chutney templates, to be a little safer.
(Other options are 20, 24, 25, 30, 36, 40, 45, 50, 60, ...)
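
The divisibility and slop-factor restrictions above can be checked with a short script. The following is a minimal Python sketch, not tor's code: the helper names are hypothetical, and it only models the two checks as this comment describes them (the interval must divide evenly into 30 minutes, and interval/16 is assumed to be integer division).

```python
THIRTY_MINUTES = 30 * 60  # seconds


def divides_thirty_minutes(interval):
    """True if the interval divides evenly into 30 minutes."""
    return THIRTY_MINUTES % interval == 0


def slop_seconds(interval):
    """Slop factor as described above: interval/16 (assumed integer division)."""
    return interval // 16


def candidate_intervals(low, high, require_slop):
    """Intervals in [low, high] that divide 30 minutes evenly.

    If require_slop is true, also require a non-zero slop factor.
    """
    return [
        i for i in range(low, high + 1)
        if divides_thirty_minutes(i)
        and (not require_slop or slop_seconds(i) > 0)
    ]


# Hypothetical usage: compare the candidates with and without the slop factor.
print(candidate_intervals(9, 60, require_slop=False))
print(candidate_intervals(9, 60, require_slop=True))
```

With a non-zero slop factor required, the smallest candidate is 18; without it, 9, 10, 12, and 15 also qualify.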

Last edited 4 years ago by teor

comment:6 Changed 4 years ago by teor

Component: Chutney → Tor
Keywords: tor-auth chutney added; lorax removed

With this change, the consensus successfully runs every 18 seconds on my machine in a chutney network.

I have not tested an interval of 9 seconds, but it should work as long as the clocks are strictly synchronised. See #13718 for further details and an (eventual) branch.

comment:7 Changed 4 years ago by teor

Cc: rl1987 added

I have consensus intervals down to a minimum of 10 seconds, as the calculation is actually:
V3AuthVoteDelay + V3AuthDistDelay [<] V3AuthVotingInterval/2
(MIN_VOTE_SECONDS + MIN_DIST_SECONDS + 1) * 2 = 10

We won't be able to get it any lower without changing MIN_VOTE_SECONDS, MIN_DIST_SECONDS, or the V3AuthVoteDelay + V3AuthDistDelay [<] V3AuthVotingInterval/2 calculation.
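
The arithmetic behind the 10-second floor can be sketched in Python. This is an illustration rather than tor's implementation: the function names are hypothetical, both minimum delays are assumed to be 2 seconds (the ticket only implies that they sum to 4), and the halving is assumed to use integer division, as C does for integer operands.

```python
MIN_VOTE_SECONDS = 2  # assumed value; the ticket implies the two sum to 4
MIN_DIST_SECONDS = 2  # assumed value


def delays_fit(vote_delay, dist_delay, voting_interval):
    """Model of: V3AuthVoteDelay + V3AuthDistDelay < V3AuthVotingInterval/2.

    Integer division is an assumption about how the check is evaluated.
    """
    return vote_delay + dist_delay < voting_interval // 2


def smallest_voting_interval(vote_delay, dist_delay):
    """Smallest interval, in seconds, that satisfies delays_fit()."""
    interval = 1
    while not delays_fit(vote_delay, dist_delay, interval):
        interval += 1
    return interval


# Hypothetical usage with the assumed minimum delays.
print(delays_fit(MIN_VOTE_SECONDS, MIN_DIST_SECONDS, 9))
print(smallest_voting_interval(MIN_VOTE_SECONDS, MIN_DIST_SECONDS))
```

Under these assumptions an interval of 9 fails, because 9 // 2 is 4, which is not greater than the combined delay of 4, while an interval of 10 passes.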

The src/test/test-network.sh script allows 18 seconds for chutney to launch and do its tests, which is two consensuses.

Is 10 seconds sufficient for your purposes, rl1987?
(We spoke about this being annoying on IRC almost a week ago.)

comment:8 Changed 4 years ago by teor

Also, a relay can wait up to 60 seconds before it re-publishes its descriptor.
I've changed it so it uploads immediately when ORPort or DirPort changes, but only in a testing tor network.

comment:9 Changed 4 years ago by teor

Fixed in #13718:

A relay with AssumeReachable 0 now makes it into the consensus after around 30-40 seconds, even without using TestingDirAuthVoteExit (from #13161). This means that it correctly:

  • determines that no exits are available in the consensus
  • continues to bootstrap with internal paths only
  • successfully self-tests reachability with an internal path

Composing commits over the next week.

comment:10 Changed 4 years ago by teor

See also #13976, which would vastly simplify the configuration required to get rapid tor/chutney bootstraps to work.

comment:11 Changed 4 years ago by teor

Owner: changed from nickm to teor
Status: new → assigned

src/test/test-network.sh can still complete basic tests in 30 seconds, even while the machine is under heavy load. These fixes should resolve the original issue that triggered this report.

comment:12 Changed 4 years ago by teor

Status: assigned → needs_review

The changes to tor and chutney in #13718 have fixed this:

Bugs: #13718, #13814, maybe #13787, #13839, #13924, #13823, #13929, #13963
Branch: bug13718-fast-bootstrap
Note: There are 5 branches that start with bug13718, please choose the right one.
Repository: https://github.com/teor2345/tor.git

Bugs: #13823
Branch: bug13823-fast-bootstrap
Repository: https://github.com/teor2345/chutney.git

comment:13 Changed 4 years ago by nickm

The chutney branch looks reasonable, but one thing I'm not sure about: will merging these changes in chutney plus the changes for Tor 0.2.6 make it so that Tor 0.2.5 and earlier no longer bootstrap? Or will they just bootstrap as slowly as before?

comment:14 Changed 4 years ago by teor

tor changes committed as part of the bug13718-consensus-interval merge.
chutney changes have not yet been merged.

nickm: Tor 0.2.5 and earlier will bootstrap just as slowly as before. (Some of the torrc changes may speed earlier versions up a little.)

comment:15 Changed 4 years ago by teor

dgoulet has tested the chutney changes along with the draft tor changes in #13718.

Last edited 4 years ago by teor

comment:16 Changed 4 years ago by nickm

Resolution: fixed
Status: needs_review → closed

Merged the torspec changes too.
