Opened 4 years ago

Last modified 5 months ago

#16559 assigned defect

bwauth code needs to be smarter about failed circuits

Reported by: TvdW
Owned by: juga
Priority: Medium
Milestone: sbws: unspecified
Component: Core Tor/sbws
Version:
Severity: Normal
Keywords: tor-bwauth, sbws, scanner
Cc: s7r@…, starlight.2015q2@…, juga, teor
Actual Points:
Parent ID: #29954
Points:
Reviewer:
Sponsor:

Description

In the current bandwidth authority code, when a fetch attempt fails, it will still be counted as a circuit that went through all of the nodes -- even if those nodes weren't responsible for the failure.

This has the potential of resulting in a relay not being measured sufficiently, or at all: the code will consider failures from unstable nodes to be relevant for nodes that are perfectly stable.

In slices where exits and entries aren't well-distributed (like, all of them) this can result in some nodes not being measured at all, and losing their consensus weight. This seems to affect exits a lot more than it does other relay types: people on tor-relays@ have mentioned that removing their exit policies gets their consensus weight back, and I have been able to reproduce this.
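To make the attribution problem concrete, here is a minimal sketch of a "blame every hop" tally. It is not taken from the bandwidth authority code; the function name, relay nicknames, and circuit data are all hypothetical and only illustrate how a stable relay can accumulate failures caused by its unstable circuit partners.

```python
from collections import Counter


def blame_all_hops(circuits):
    """Count each failed circuit against every relay on its path.

    `circuits` is an iterable of (path, succeeded) pairs, where `path`
    is a tuple of relay nicknames. This mirrors the behaviour described
    in the ticket, not any real bwauth function.
    """
    failures = Counter()
    for path, succeeded in circuits:
        if not succeeded:
            # Every hop is blamed, whether or not it caused the failure.
            failures.update(path)
    return failures


# Hypothetical data: "stable_exit" only ever shares circuits with flaky entries.
circuits = [
    (("flaky_entry_a", "stable_exit"), False),
    (("flaky_entry_b", "stable_exit"), False),
    (("flaky_entry_a", "other_exit"), False),
    (("good_entry", "other_exit"), True),
]
failures = blame_all_hops(circuits)
```

In this made-up data set, `stable_exit` ends up with as many recorded failures as `flaky_entry_a`, even though it never caused one.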

Change History (24)

comment:1 Changed 4 years ago by s7r

Cc: s7r@… added

comment:2 Changed 4 years ago by s7r

The problem does affect Exits more than middle relays, and a lot of operators reported that changing from exit to middle relay helped, but there have also been cases where even changing the ExitPolicy to reject *:* didn't bring the consensus weight back.

How exactly does a bwauth connect to an Exit and try to measure it? What can happen in between to make the bwauth think the Exit is misbehaving?

comment:3 Changed 4 years ago by starlight

Cc: starlight.2015q2@… added

comment:4 Changed 22 months ago by teor

Parent ID: #13630
Severity: Blocker

This is a feature that belongs in the new bwauth replacement project, see #13630.

comment:5 Changed 22 months ago by teor

Priority: High → Medium
Severity: Blocker → Normal

Priorities and Severities in torflow are meaningless, setting them all to Medium/Normal.

comment:6 Changed 22 months ago by teor

Owner: aagbsn deleted
Status: new → assigned

aagbsn was the default owner, unassigning

comment:7 Changed 21 months ago by teor

Status: assigned → new

Mark all tickets that are assigned to nobody as "new".

comment:8 Changed 16 months ago by juga

Cc: juga added

comment:9 Changed 16 months ago by juga

Parent ID: #13630 → 25925

comment:10 Changed 16 months ago by juga

Parent ID: 25925 → #25925

comment:11 Changed 16 months ago by juga

Owner: set to juga
Status: new → assigned

I'm not sure if this still needs to be fixed in Torflow.
I'm going to work on it in sbws.

comment:12 Changed 15 months ago by juga

Keywords: tor-dirauth sbws added

comment:13 Changed 14 months ago by juga

Cc: teor added

We are reporting errors in sbws, and I think it doesn't have the problem described here.
Are there other ideas for this ticket?

comment:14 Changed 14 months ago by juga

Keywords: tor-bwauth added; tor-dirauth removed

comment:15 in reply to:  13 Changed 14 months ago by teor

Replying to juga:

We are reporting errors in sbws, and I think it doesn't have the problem described here.
Are there other ideas for this ticket?

The description says:

In slices where exits and entries aren't well-distributed…

sbws doesn't use slices, so its entry and exit selection is more robust. That might be why it doesn't have this problem.

comment:16 Changed 9 months ago by teor

Component: Core Tor/Torflow → Core Tor/sbws
Milestone: sbws 1.1

comment:17 Changed 9 months ago by teor

Parent ID: #25925

comment:18 Changed 9 months ago by teor

Milestone: sbws 1.1 → sbws 1.2

Milestone renamed

comment:19 Changed 9 months ago by teor

Milestone: sbws 1.2 → sbws: 1.2.x

Milestone renamed

comment:20 Changed 9 months ago by teor

Milestone: sbws: 1.2.x → sbws: 1.2.x-final

Milestone renamed

comment:21 Changed 9 months ago by teor

Milestone: sbws: 1.2.x-final → sbws: unspecified

Milestone renamed

comment:22 Changed 6 months ago by juga

In sbws, when a relay is going to be measured, a second relay is selected at random from those that have double or equal the bandwidth of the relay being measured, so the measurement is unlikely to fail because of the second relay.
The next time the relay is measured, it will probably be paired with a different second relay.
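The selection rule described above could be sketched roughly as follows. This is not the sbws implementation: the function name, the dict-based relay records, and the reading of "double or equal" as "prefer at least double, fall back to at least equal" are all assumptions made for illustration. What happens when no relay qualifies (the fastest-relay case) is deliberately left out.

```python
import random


def pick_helper(target, relays, rng=random):
    """Pick a random helper relay that is at least as fast as `target`.

    Assumption (not confirmed by the ticket): helpers with at least
    double the target's bandwidth are preferred, and helpers with at
    least equal bandwidth are the fallback.
    Returns None when no other relay qualifies.
    """
    others = [r for r in relays if r["nick"] != target["nick"]]
    for factor in (2, 1):
        candidates = [r for r in others if r["bw"] >= factor * target["bw"]]
        if candidates:
            return rng.choice(candidates)
    return None


# Hypothetical relays; bandwidth units are arbitrary.
relays = [
    {"nick": "slow", "bw": 10},
    {"nick": "mid", "bw": 15},
    {"nick": "fast", "bw": 40},
]
helper_for_slow = pick_helper(relays[0], relays, rng=random.Random(0))
helper_for_fast = pick_helper(relays[2], relays, rng=random.Random(0))
```

Because the helper is drawn at random each time, a relay is unlikely to be paired with the same helper on consecutive measurements once more than one candidate qualifies.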

However, the fastest relay will be restricted to being measured with slower relays and a small set of possible relays. There's a scaling process after this, but maybe it's a good idea that it gets restricted anyway.

In version 1.0.2, sbws even prioritized measuring relays with a higher number of failures, but it was observed that it would then continuously try to measure unstable relays that would probably fail again.
This has been removed in the latest version, which only prioritizes relays based on how long ago they were last measured.
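A minimal sketch of prioritizing purely by age of the last measurement might look like this. The function name and the timestamp mapping are hypothetical, and putting never-measured relays first is an assumption of the sketch rather than something stated here.

```python
def measurement_order(relays, last_measured):
    """Return relay nicknames ordered so the longest-unmeasured come first.

    `last_measured` maps nickname -> Unix timestamp of the last
    measurement. Relays missing from the mapping are treated as never
    measured and sorted to the front (an assumption of this sketch).
    Failure counts are deliberately ignored.
    """
    return sorted(relays, key=lambda nick: last_measured.get(nick, float("-inf")))


# Hypothetical timestamps: "b" was measured before "a"; "c" never was.
order = measurement_order(["a", "b", "c"], {"a": 300.0, "b": 100.0})
```

Since failures play no part in the sort key, an unstable relay cannot monopolize the queue the way it could when failures raised its priority.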

Regarding exit policies, the policy only affects whether the relay being measured will be the first or the second hop, and sbws only checks that the policy allows exiting to port 443.
One reason an exit might always fail to be measured is when it retrieves the data from a CDN, the local resolver returns an IPv6 address, and the exit can exit to an IPv6 address. Maybe this is something to be monitored, but it would not happen once #28463 is implemented.
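The position rule in the first sentence above can be illustrated with a short sketch. The relay records, the `exit_ports` set standing in for a parsed exit policy, and the function name are all hypothetical; the real policy check in sbws is not shown in this ticket.

```python
MEASUREMENT_PORT = 443  # the only port the comment says is checked


def build_path(target, helper):
    """Return the two-hop path [first_hop, second_hop] for a measurement.

    If the target's policy allows exiting to port 443, it is placed as
    the second (exit) hop; otherwise it is placed first. The caller is
    assumed to have chosen a helper suitable for the remaining position.
    """
    if MEASUREMENT_PORT in target["exit_ports"]:
        return [helper["nick"], target["nick"]]
    return [target["nick"], helper["nick"]]


# Hypothetical relays with simplified policies (sets of allowed ports).
exit_relay = {"nick": "exit1", "exit_ports": {80, 443}}
middle_relay = {"nick": "middle1", "exit_ports": set()}

path_for_exit = build_path(exit_relay, middle_relay)
path_for_middle = build_path(middle_relay, exit_relay)
```

Under this rule, changing a relay's exit policy only moves it between the two positions; it does not change whether the relay gets measured.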

I think this ticket can be closed, but it'd be great to get opinions on whether sbws design solves this.

comment:23 Changed 5 months ago by teor

I'd like to see the number of failures that sbws reports for large relays in the first and second position in the circuit.

Then I will have an opinion.

comment:24 Changed 5 months ago by juga

Keywords: scanner added
Parent ID: #29954