Opened 12 months ago

Closed 3 days ago

#30719 closed defect (fixed)

Work out why 90% of sbws measurements fail

Reported by: teor
Owned by: juga
Priority: High
Milestone: sbws: 1.1.x-final
Component: Core Tor/sbws
Version: sbws: 1.1.0
Severity: Major
Keywords: sbws-roadmap
Cc: juga
Actual Points:
Parent ID: #33121
Points: 2
Reviewer:
Sponsor:

Description

longclaw's most recent vote shows that 90% of measurement attempts fail: recent_measurement_failure_count is 299K, and recent_measurement_attempt_count is 327K.

We should work out why sbws is doing so many failing measurements.

bandwidth-file-headers timestamp=1559442886 version=1.4.0 destinations_countries=ZZ earliest_bandwidth=2019-05-28T02:35:11 file_created=2019-06-02T02:35:03 generator_started=2019-05-19T14:04:34 latest_bandwidth=2019-06-02T02:34:46 minimum_number_eligible_relays=3934 minimum_percent_eligible_relays=60 number_consensus_relays=6556 number_eligible_relays=6287 percent_eligible_relays=96 recent_consensus_count=120 recent_measurement_attempt_count=327183 recent_measurement_failure_count=299072 recent_measurements_excluded_error_count=876 recent_measurements_excluded_few_count=678 recent_measurements_excluded_near_count=237 recent_measurements_excluded_old_count=0 recent_priority_list_count=991 recent_priority_relay_count=327183 scanner_country=US software=sbws software_version=1.1.0 time_to_report_half_network=225229
bandwidth-file-digest sha256=UkxK9KS5KZ5hKDiLI3bqGoMvpMW9gBjKGoYbD2bdZVE
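For anyone re-checking the figure, here is a minimal sketch that derives the failure ratio from the two counters in the header line quoted above. The helper names `parse_headers` and `failure_ratio` are made up for illustration and are not part of sbws; the only assumption is that the line is a keyword followed by space-separated `key=value` pairs, as it appears here.

```python
def parse_headers(line):
    """Split a 'bandwidth-file-headers' line into a dict of string values.

    Hypothetical helper for illustration; not an sbws function.
    """
    keyword, _, rest = line.partition(" ")
    if keyword != "bandwidth-file-headers":
        raise ValueError("not a bandwidth-file-headers line")
    # Each remaining token is key=value; values contain no spaces.
    return dict(pair.split("=", 1) for pair in rest.split())


def failure_ratio(headers):
    """Return failures / attempts, or 0.0 when there were no attempts."""
    attempts = int(headers["recent_measurement_attempt_count"])
    failures = int(headers["recent_measurement_failure_count"])
    return failures / attempts if attempts else 0.0


# Only the two relevant counters from the header line in this ticket.
line = (
    "bandwidth-file-headers "
    "recent_measurement_attempt_count=327183 "
    "recent_measurement_failure_count=299072"
)
headers = parse_headers(line)
ratio = failure_ratio(headers)
print(f"{ratio:.1%}")
```

With the counters reported here this comes out slightly above 90%, which matches the figure in the summary.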

Child Tickets

Ticket | Status | Owner | Summary | Component
#30905 | closed | juga | Maybe monitoring values in the state file should be reset when sbws is restarted | Core Tor/sbws
#33570 | closed | juga | Correct the relays to keep after retrieving new consensuses | Core Tor/sbws

Change History (10)

comment:1 Changed 12 months ago by teor

Milestone: sbws: unspecified → sbws: 1.1.x-final
Priority: Medium → Very High
Severity: Normal → Critical
Version: sbws: unspecified → sbws: 1.1.0

comment:2 Changed 12 months ago by teor

We're not seeing very many network errors (4%), so this bug mainly wastes CPU time.

comment:3 Changed 12 months ago by teor

Priority: Very High → High
Severity: Critical → Major

Not a critical bug any more.

comment:4 Changed 12 months ago by teor

We need to re-do these checks after #30905 is fixed, because that bug makes the statistics inaccurate.

comment:5 Changed 11 months ago by gaba

Keywords: sbws-roadmap-october added
Points: 2

comment:6 Changed 4 months ago by gaba

Keywords: sbws-roadmap added

Changing keyword of roadmapped open sbws tickets to a general sbws-roadmap one.

comment:7 Changed 4 months ago by gaba

Keywords: sbws-roadmap-october removed

comment:8 Changed 4 months ago by gaba

Parent ID: #29710 → #33121

The goal is to deploy sbws on all bandwidth authorities. We need to fix critical bugs to do this.

comment:9 Changed 6 weeks ago by juga

Owner: set to juga
Status: new → assigned

comment:10 Changed 3 days ago by juga

Resolution: fixed
Status: assigned → closed

Since longclaw switched to sbws 1.1.0+84.g3033421, and as noted in https://trac.torproject.org/projects/tor/ticket/30905#comment:8, the number of failures has been around 1000.
So the high failure count was not caused by measurements actually failing, but by miscounting of the errors.
I think we can close this ticket.
