Opened 2 years ago

Last modified 18 months ago

#24248 new defect

bwauth goes crazy in test network with no measured nodes

Reported by: Sebastian
Owned by: tom
Priority: Medium
Milestone:
Component: Core Tor/Torflow
Version:
Severity: Normal
Keywords:
Cc: teor
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor:

Description

I have a small test network with 3 authorities, 5 Exits, 6 Fast relays, 3 Guards, and 13 Running relays. Starting a bwauth produces the log below, and the last two log messages repeat infinitely, quickly filling my disk.

All nodes in the network live inside the 10.0.0.0/8 address space, but it is a real Tor network deployed over several machines.

The following options are set:
ExtendAllowPrivateAddresses 1
EnforceDistinctSubnets 0
ClientRejectInternalAddresses 0
ClientDNSRejectInternalAddresses 0

NOTICE[Sat Nov 11 16:13:10 2017]:TorFlow Version: master 75abb44bf73b3a061063145ebdb97029afe91fb7
NOTICE[Sat Nov 11 16:13:10 2017]:TorCtl Version: detached c8fcb25b079d52a20cafc7f7adf178e90ab76338
NOTICE[Sat Nov 11 16:13:10 2017]:Child Process Spawned...
NOTICE[Sat Nov 11 16:14:13 2017]:Starting slice for percentiles 0-100
NOTICE[Sat Nov 11 16:14:15 2017]:Ran out of choices in ExactUniformGenerator. Incrementing nodes
NOTICE[Sat Nov 11 16:14:15 2017]:Ran out of routers during buildpath..
NOTICE[Sat Nov 11 16:14:15 2017]:Ran out of choices in ExactUniformGenerator. Incrementing nodes
NOTICE[Sat Nov 11 16:14:15 2017]:Ran out of routers during buildpath..


Change History (9)

comment:1 Changed 2 years ago by teor

There are 8 scanners that partition the network between them by bandwidth percentile (plus one scanner for unmeasured nodes).
In a network this small, that makes it likely that at least one of those 8 partitions has 0 or 1 relays in it.
I have a local patch that is probably hiding this issue, because it cuts down the number of scanners to 2 (one scanner for all nodes, and one scanner for unmeasured nodes).
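To illustrate the arithmetic, here is a minimal sketch that counts how many relays land in each slice when relays are ranked by bandwidth and split into equal-width percentile slices. The rank-based split and the `slice_sizes` helper are assumptions for illustration, not TorFlow's actual code.

```python
from typing import List


def slice_sizes(num_relays: int, num_slices: int) -> List[int]:
    """Count relays per percentile slice (hypothetical helper, not TorFlow code).

    Assumes relays are ranked 0..num_relays-1 by bandwidth and that each
    slice covers an equal share of the 0-100 percentile range.
    """
    sizes = [0] * num_slices
    for rank in range(num_relays):
        # Integer arithmetic: index of the slice whose percentile range holds this rank.
        sizes[rank * num_slices // num_relays] += 1
    return sizes


print(slice_sizes(13, 8))  # 13 Running relays, 8 scanners
print(slice_sizes(5, 8))   # e.g. if only the 5 Exits were eligible
```

Under these assumptions, 13 relays give slice sizes of [2, 2, 1, 2, 2, 1, 2, 1], so three slices hold a single relay; 5 relays give [1, 1, 0, 1, 1, 0, 1, 0], so three slices are empty.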

There are a few ways we can fix this issue:

  • configure 1 or 2 scanners in small networks - this isn't great, because it hides bugs in the default config
  • make scanners sleep when they can't find enough nodes, like the unmeasured scanner - this may lead to small networks not getting scanned, because none of their scanners have enough nodes
  • when percentile restrictions result in not having enough nodes, remove the lower restriction (include higher-bandwidth nodes), then the upper restriction - this seems the best strategy for small networks
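The third option could look roughly like the sketch below. It assumes percentiles rank relays from fastest (0) to slowest (100), so dropping the lower bound pulls in higher-bandwidth relays; `widen_slice`, `count_in_slice`, and the `min_relays` threshold are hypothetical names, not TorFlow or TorCtl APIs.

```python
from typing import Tuple


def count_in_slice(num_relays: int, lo: float, hi: float) -> int:
    """Count relays whose rank percentile falls in [lo, hi)."""
    return sum(1 for rank in range(num_relays)
               if lo <= 100.0 * rank / num_relays < hi)


def widen_slice(num_relays: int, lo: float, hi: float,
                min_relays: int) -> Tuple[float, float]:
    """Relax percentile bounds until the slice has at least min_relays relays.

    Assumes rank 0 is the fastest relay, so percentile 0 is the top of the
    bandwidth range (hypothetical helper, not TorFlow code).
    """
    if count_in_slice(num_relays, lo, hi) >= min_relays:
        return lo, hi
    # Step 1: drop the lower bound, pulling in higher-bandwidth relays.
    if count_in_slice(num_relays, 0.0, hi) >= min_relays:
        return 0.0, hi
    # Step 2: drop the upper bound too; the slice now covers every relay.
    return 0.0, 100.0


print(widen_slice(13, 25.0, 37.5, min_relays=2))
```

With 13 relays, the 25-37.5 slice holds one relay, so the sketch widens it to (0.0, 37.5). Returning (0.0, 100.0) does not guarantee enough relays in a tiny network, so a caller would still need a fallback such as the sleeping behaviour from the second option.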

Making relays report bandwidths earlier (#16386) might also help get the percentiles right on short-lived networks, but we would need to be careful, because this information can be used to identify client guards (#23856).

comment:2 Changed 2 years ago by Sebastian

This is with your patch that reduces the number of scanners to two.

comment:3 Changed 2 years ago by teor

What are the bandwidths in your network? (relay bandwidths in votes)
How long has it been up?
Is scanner.1 (the standard scanner) logging these issues, or is it scanner.2 (the unmeasured nodes scanner)?

comment:4 Changed 2 years ago by Sebastian

It's been up for a couple of days, with the following bandwidths:

10000 x 3
0 x 6
75
13
7247
7243

I have no idea why some relays insist they have 0 bw, even though I have a script to push data through them.

comment:5 in reply to:  4 Changed 2 years ago by teor

Replying to Sebastian:

> It's been up for a couple of days, with the following bandwidths:
>
> 10000 x 3
> 0 x 6
> 75
> 13
> 7247
> 7243
>
> I have no idea why some relays insist they have 0 bw, even though I have a script to push data through them.

We should probably file a separate Core Tor bug for this. Please include all 3 bandwidth figures (average, burst, and observed) from the relay descriptors.
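Relay server descriptors carry these three figures on a single `bandwidth` line (average, burst, observed, in bytes per second). A minimal sketch for pulling them out of a descriptor's text follows; the `descriptor_bandwidths` helper and the sample descriptor are made up for illustration and are not taken from this test network.

```python
from typing import Dict, Optional


def descriptor_bandwidths(descriptor_text: str) -> Optional[Dict[str, int]]:
    """Return the average/burst/observed figures from a server descriptor.

    Looks for the 'bandwidth <avg> <burst> <observed>' line and returns None
    if the descriptor has no such line. Hypothetical helper.
    """
    for line in descriptor_text.splitlines():
        parts = line.split()
        if len(parts) == 4 and parts[0] == "bandwidth":
            average, burst, observed = (int(p) for p in parts[1:])
            return {"average": average, "burst": burst, "observed": observed}
    return None


# Made-up descriptor fragment; values are illustrative only.
SAMPLE = """router relay1 10.0.0.5 9001 0 0
bandwidth 1073741824 1073741824 0
"""
print(descriptor_bandwidths(SAMPLE))
```

In a report, an observed value of 0 next to non-zero average and burst values would be the detail worth highlighting.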

comment:6 Changed 2 years ago by Sebastian

Oh, and it was scanner.1 that was logging this. scanner.2 logged no errors.

comment:7 Changed 2 years ago by teor

How are the bandwidths split between exits/non-exits?
(That is, do all the exits or all the non-exits have zero bandwidths?)

I have a debugging patch for pytorctl that helped me diagnose a few issues like this.
Try my extra-logging branch at http://github.com/teor2345/pytorctl.git.

You might also like to try my pending-fixes branch, which might or might not solve this issue. But if you merge extra-logging and pending-fixes, you should get better logs.

comment:8 Changed 2 years ago by Sebastian

All relays are configured as exits, but those with bw=0 don't get the Exit flag. I'll try the logging branches later, thank you.

comment:9 Changed 18 months ago by teor

Cc: teor added; teor@… removed

Shorten useful CCs
