Opened 5 months ago

Last modified 3 months ago

#30227 new defect

Work out why 3% of sbws measurements are excluded because the relay only has 1 measurement

Reported by: teor
Owned by:
Priority: Medium
Milestone: sbws: 1.2.x-final
Component: Core Tor/sbws
Version: sbws: unspecified
Severity: Normal
Keywords:
Cc: juga
Actual Points:
Parent ID: #29710
Points:
Reviewer:
Sponsor:

Child Tickets

Ticket | Status | Owner | Summary | Component
#29719 | new | | Consider to measure all the network at once | Core Tor/sbws
#30231 | new | | Allow a wider range of acceptable download times | Core Tor/sbws
#30232 | new | | Reduce the number of downloads for each measurement | Core Tor/sbws
#30233 | new | | Ask for more bytes in our initial request | Core Tor/sbws

Change History (7)

comment:1 Changed 5 months ago by teor

From comment 10 in #29710:

The scanner takes around 48h (I was wrong with my 24h estimate) to measure the unique relays in the consensus.

Let's do some analysis:

It takes sbws 48 * 60 * 60 = 172800 seconds to measure the network.
sbws runs 3 measurement threads, so the total measurement time is 3 * 172800 = 518400 thread-seconds.
Dividing by 7200 relays, an sbws thread takes 518400 / 7200 = 72 seconds to measure each relay.
Each relay must have 5 downloads, so each download takes 72 / 5 = 14.4 seconds.

But each download should take between 6 and 10 seconds, so sbws is spending 30-60% of its time not doing useful downloads. I opened #30230 to work out why.
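The arithmetic above can be reproduced with a short sketch. The constant and function names are illustrative and are not sbws settings; the inputs (48 hours, 3 threads, 7200 relays, 5 downloads, 6 to 10 seconds per download) are the figures used in the estimate above.

```python
# Back-of-the-envelope check of the scanner timing figures above.
# All names here are illustrative; none of them are sbws settings.
SCAN_HOURS = 48                   # time to measure the whole network once
THREADS = 3                       # measurement threads
RELAYS = 7200                     # divisor used in the estimate above
DOWNLOADS_PER_RELAY = 5           # downloads needed for each relay
EXPECTED_DOWNLOAD_SECS = (6, 10)  # expected duration of one download

scan_secs = SCAN_HOURS * 60 * 60
thread_secs = THREADS * scan_secs
secs_per_relay = thread_secs / RELAYS
secs_per_download = secs_per_relay / DOWNLOADS_PER_RELAY


def idle_fraction(expected_secs, observed_secs):
    """Fraction of the observed time not spent on the expected download."""
    return 1 - expected_secs / observed_secs


# The slowest expected download gives the lower bound on idle time,
# the fastest expected download gives the upper bound.
idle_low = idle_fraction(EXPECTED_DOWNLOAD_SECS[1], secs_per_download)
idle_high = idle_fraction(EXPECTED_DOWNLOAD_SECS[0], secs_per_download)

print(f"{secs_per_relay:.1f} s per relay, {secs_per_download:.1f} s per download")
print(f"idle time: {idle_low:.0%} to {idle_high:.0%}")
```

With these inputs the idle share comes out at roughly 31% to 58%, which is consistent with the 30-60% estimate.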

So it takes 4 days for each relay to have at least 2 measurements (and not be excluded for having too few), and we're only considering 5 days of measurements.
Fewer relays would be excluded if we accepted a single measurement as valid, or if we considered more days of measurements.
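To make the quoted reasoning concrete, here is a rough sketch. It assumes one measurement per relay per full pass over the network and a 5-day window of considered measurements; the names are illustrative and are not sbws configuration options.

```python
import math

# Rough model: one measurement per relay per complete pass of the scanner.
# Names are illustrative and are not sbws configuration options.
PASS_HOURS = 48        # time for the scanner to measure every relay once
WINDOW_DAYS = 5        # days of measurements considered
MIN_MEASUREMENTS = 2   # fewer than this and the relay is excluded


def full_passes(window_days, pass_hours):
    """Number of complete passes that fit in the window."""
    return math.floor(window_days * 24 / pass_hours)


def days_needed(min_measurements, pass_hours):
    """Days until every relay can have the minimum number of measurements."""
    return min_measurements * pass_hours / 24


passes = full_passes(WINDOW_DAYS, PASS_HOURS)
spare = passes - MIN_MEASUREMENTS
print(f"{days_needed(MIN_MEASUREMENTS, PASS_HOURS):.0f} days for {MIN_MEASUREMENTS} measurements")
print(f"{passes} full passes in {WINDOW_DAYS} days, {spare} to spare")
```

Under this simplified model only two full passes fit in the window, so a relay that misses or fails a single measurement falls below the minimum. Accepting a single measurement or widening the window would both add slack.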

It's hard to decide what to change, when we don't know why the scanner is so inefficient.
I have opened some child tickets for some things we could try: #30231, #30232, #30233.

comment:2 Changed 5 months ago by juga

These tickets look good for improving scanner performance, but maybe we could reduce complexity by implementing what was proposed in https://trac.torproject.org/projects/tor/ticket/29291#comment:4, which could be useful for onionperf too.

#29720 could also reduce the number of measurements required.

#29719 may or may not improve performance, but it would make it simpler to understand why some relays get more measurements than others, and it would simplify the monitoring KeyValues associated with the priority loops and with the time it takes to measure the whole network.

comment:3 in reply to:  2 Changed 5 months ago by juga

Replying to juga:

These tickets look good for improving scanner performance, but maybe we could reduce complexity by implementing what was proposed in https://trac.torproject.org/projects/tor/ticket/29291#comment:4, which could be useful for onionperf too.

I just don't know how much the bandwidth values would change if we created one request and waited for it to stabilize, or made several requests over a "keep-alive" connection.

comment:4 in reply to:  2 Changed 5 months ago by teor

Replying to juga:

Replying to juga:

These tickets look good for improving scanner performance, but maybe we could reduce complexity by implementing what was proposed in https://trac.torproject.org/projects/tor/ticket/29291#comment:4, which could be useful for onionperf too.

I just don't know how much the bandwidth values would change if we created one request and waited for it to stabilize, or made several requests over a "keep-alive" connection.

Changing the measurement method is a high-risk change. I don't think we have time to test it properly.

Replying to juga:

#29720 could also reduce the number of measurements required.

Including bandwidths from helper relays is a high-risk change, because helper relays have higher bandwidths than the measured relays they are paired with. I don't think we have time to test it properly.

#29719 may or may not improve performance, but it would make it simpler to understand why some relays get more measurements than others, and it would simplify the monitoring KeyValues associated with the priority loops and with the time it takes to measure the whole network.

I think #29719 is a good idea. It has some risk, but we can manage that risk by testing for a week or two before merging.

But I think we should do #30231, #30232, and #30233 first. They are simple, low-risk changes. And the tests will be much faster after we improve sbws measurement speed.

comment:5 Changed 4 months ago by teor

Summary: changed from "Work out why recent_measurements_excluded_few_count=733" to "Work out why 3% of sbws measurements are excluded because the relay only has 1 measurement"

comment:6 Changed 4 months ago by teor

Milestone: changed from "sbws: unspecified" to "sbws: 1.2.x-final"

It would be nice to make these changes in sbws 1.2.

comment:7 Changed 3 months ago by teor

We need to re-do these checks after #30905 is fixed, because it makes the statistics inaccurate.
