Opened 11 months ago

Closed 10 months ago

Last modified 8 months ago

#27338 closed task (implemented)

How long should sbws keep measured and observed bandwidths?

Reported by: teor
Owned by:
Priority: Medium
Milestone: sbws: 1.0.x-final
Component: Core Tor/sbws
Version:
Severity: Normal
Keywords: sbws-1.0-must-closed-moved-20181128
Cc: pastly, juga@…, teor, juga
Actual Points:
Parent ID: #27108
Points:
Reviewer:
Sponsor:

Description

In #27135, sbws starts keeping observed bandwidths for relays:

Taking the descriptor observed bandwidth only when the relay is measured and calculating the mean when there're several observed bandwidth values for the same relay

Here are some options:

  • use the latest measured and observed bandwidth
  • take the latest measured and observed bandwidth every hour, and
    • average the last N days of bandwidths
    • apply an exponentially decaying average to all bandwidths

We need to decide which strategy to use, update the bandwidth file spec, and implement this feature in sbws.
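
To make the options above concrete, here is a minimal sketch of the three strategies. The function names, the `(timestamp, bandwidth)` sample layout, and the `alpha` weight are assumptions for illustration only; none of this is taken from sbws or Torflow.

```python
from statistics import mean


def latest(values):
    """Option 1: use only the most recent value (values are ordered oldest first)."""
    return values[-1]


def windowed_mean(samples, now, max_age_secs):
    """Option 2a: average the samples taken during the last max_age_secs.

    samples is a list of (timestamp, bandwidth) tuples (hypothetical layout).
    Returns None when no sample is recent enough.
    """
    recent = [bw for ts, bw in samples if now - ts <= max_age_secs]
    return mean(recent) if recent else None


def decaying_average(values, alpha=0.5):
    """Option 2b: exponentially decaying average over a non-empty list.

    Values are ordered oldest first; alpha is the weight given to each newer
    value (an assumed parameter, not Torflow's actual decay factor).
    """
    avg = values[0]
    for bw in values[1:]:
        avg = alpha * bw + (1 - alpha) * avg
    return avg
```

With `alpha=0.5`, `decaying_average([100, 200])` weights both values equally; a larger `alpha` makes old bandwidths fade faster.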

Attachments (1)

20180905_092840_torflow_sbws.png (38.4 KB) - added by juga 11 months ago.

Change History (12)

comment:1 Changed 11 months ago by teor

Torflow uses the latest observed bandwidth, and uses a decaying average for measurements. (I couldn't work out the exact decay factor, because it's a complex feedback loop.)

https://gitweb.torproject.org/torflow.git/tree/NetworkScanners/BwAuthority/aggregate.py#n117

Oh, and I think we should always use the latest {Relay,}Bandwidth{Rate,Burst}.

comment:2 Changed 11 months ago by juga

We need to decide which strategy to use, update the bandwidth file spec, and implement this feature in sbws.

I'm not sure how we're going to decide this. Should we try each of the methods for 1 week, then graph the results and/or calculate the % differences with Torflow?

In case it's useful, the descriptors' observed bandwidth collected at the time of doing measurements in the results, for 2 days:

  • number of relays' descriptor observed bandwidth: 6462
  • mean of all relays' descriptor observed bandwidth taking the last for each relay: 5621550
  • mean of all relays' descriptor observed bandwidth taking the mean from the relay's results: 5609508
  • median of all relays' descriptor observed bandwidth taking the last for each relay: 2065215
  • mean of all relays' descriptor observed bandwidth taking the mean from the relay's results: 2060907
  • number of relays for which it was collected 1 descriptor observed bandwidth: 5087 (79%)
  • number of relays for which it was collected 1 descriptor observed bandwidth: 1368 (21%)
  • number of relays for which it was collected 1 descriptor observed bandwidth: 7 (0.11%)
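
For reference, a minimal sketch of how the "last value" and "mean of the relay's results" statistics could be computed. The input layout (a dict from relay fingerprint to a list of observed bandwidths, oldest first) and the function name are assumptions, not the script that produced the numbers above.

```python
from statistics import mean, median


def summarize(observed_by_relay):
    """Compare "last value" and "mean per relay" aggregation.

    observed_by_relay maps a relay fingerprint to its observed bandwidths,
    oldest first (hypothetical layout). At least one relay must have values.
    """
    per_relay = [values for values in observed_by_relay.values() if values]
    last = [values[-1] for values in per_relay]
    means = [mean(values) for values in per_relay]
    return {
        "relays": len(per_relay),
        "mean_of_last": mean(last),
        "mean_of_means": mean(means),
        "median_of_last": median(last),
        "median_of_means": median(means),
    }
```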

I've also been collecting descriptors' observed bandwidth every hour (in a separate script). Would it be useful to compare only the descriptors' observed bandwidth collected in these 3 different ways?

I have a lot of new code because of all the changes, tests and graphs. I could:

  1. continue with the experiments and make PR only when we have decided this
  2. keep the experiments code so that we can reproduce them in the future and start creating PRs with it.

Is option 2 ok?

For instance, if we collect descriptors' observed bandwidth, that's new code. I think it's fine if I keep the code that stores descriptors' observed bandwidth only at the time of doing measurements? I can make it configurable, so that the method to be used can be passed as a parameter.

Oh, and I think we should always use the latest {Relay,}Bandwidth{Rate,Burst}.

Do you mean descriptors' bandwidth burst [0]? We have not used it yet for anything. How should we use it?
We have only used descriptors' bandwidth average [1] to cap the measurements.

[0] https://gitweb.torproject.org/torspec.git/tree/dir-spec.txt#n427
[1] https://github.com/pastly/simple-bw-scanner/blob/master/sbws/lib/v3bwfile.py#L314

comment:3 in reply to:  2 ; Changed 11 months ago by juga

Replying to juga:

Oh, and I think we should always use the latest {Relay,}Bandwidth{Rate,Burst}.

Do you mean descriptors' bandwidth burst [0]? We have not used it yet for anything. How should we use it?
We have only used descriptors' bandwidth average [1] to cap the measurements.

Bandwidth in consensus[2] is min(observed bandwidth, bandwidth rate limit, 10MB/s)
I guess bandwidth rate limit here is bandwidth burst, right?
Should the torflow or sbws scaled bandwidth be limited to the bandwidth burst? AFAIK torflow is not doing that.

[0] https://gitweb.torproject.org/torspec.git/tree/dir-spec.txt#n427
[1] https://github.com/pastly/simple-bw-scanner/blob/master/sbws/lib/v3bwfile.py#L314

[2] https://gitweb.torproject.org/torspec.git/tree/dir-spec.txt#n2595

comment:4 in reply to:  3 ; Changed 11 months ago by teor

Replying to juga:

We need to decide which strategy to use, update the bandwidth file spec, and implement this feature in sbws.

I'm not sure how we're going to decide this. Should we try each of the methods for 1 week, then graph the results and/or calculate the % differences with Torflow?

No, we have a method that is good enough, because it is close enough to torflow.

So we need to use what we know about the tor network to make sure we have a good design. Let's make some rules for the minimum viable product. Then we can merge any design that fits those rules.

Here's what I suggest:

The minimum viable product must:

  • Use the latest descriptor bandwidth limit, because:
    • the latest descriptor contains the limit that the operator has asked for
  • Use at least 2 sbws measured bandwidths over at least 2 days, because:
    • tor relay usage varies on a daily cycle
    • each sbws measurement depends on the time of day
  • Use at least 2 descriptor observed bandwidths over at least 2 days, because:
    • a single download by a single client can increase the observed bandwidth
    • for security, we want results that don't depend on a single client's behaviour
  • Don't keep bandwidths for more than 1 week
    • old bandwidths do not help us work out current relay capacity

What do you think?
Can you implement something based on these suggestions?

If you want, I can write or review patches, or write a detailed spec.

I put some other suggestions in #27346. They are complicated. We don't need them for the MVP release.

In case it's useful, the descriptors' observed bandwidth collected at the time of doing measurements in the results, for 2 days:

Since most relays only observe bandwidth once per day, a 2 day collection is not long enough to be useful.

  • number of relays' descriptor observed bandwidth: 6462
  • mean of all relays' descriptor observed bandwidth taking the last for each relay: 5621550
  • mean of all relays' descriptor observed bandwidth taking the mean from the relay's results: 5609508
  • median of all relays' descriptor observed bandwidth taking the last for each relay: 2065215
  • mean of all relays' descriptor observed bandwidth taking the mean from the relay's results: 2060907
  • number of relays for which it was collected 1 descriptor observed bandwidth: 5087 (79%)
  • number of relays for which it was collected 1 descriptor observed bandwidth: 1368 (21%)
  • number of relays for which it was collected 1 descriptor observed bandwidth: 7 (0.11%)

Do you mean 1, 2, 3 on the last 3 lines?

I've also been collecting descriptors' observed bandwidth every hour (in a separate script). Would it be useful to compare only the descriptors' observed bandwidth collected in these 3 different ways?

It might be useful, but it is not essential. Let's focus on getting a minimal viable product. Then we can make small improvements later.

I have a lot of new code because of all the changes, tests and graphs. I could:

  1. continue with the experiments and make PR only when we have decided this
  2. keep the experiments code so that we can reproduce them in the future and start creating PRs with it.

Is option 2 ok?

Please create a PR that fits the minimum viable product rules above. Prefer code that is simple, fast to write, and easy to read.

For instance, if we collect descriptors' observed bandwidth, that's new code. I think it's fine if I keep the code that stores descriptors' observed bandwidth only at the time of doing measurements? I can make it configurable, so that the method to be used can be passed as a parameter.

Please implement one simple method for MVP 1.0.
We don't need alternative methods.

If you want, you can make the number of measured and observed bandwidths configurable. I suggest 2 measurements over 2 days is a good default.

Replying to juga:

Replying to juga:

Oh, and I think we should always use the latest {Relay,}Bandwidth{Rate,Burst}.

Do you mean descriptors' bandwidth burst [0]? We have not used it yet for anything. How should we use it?

The bandwidth burst can be ignored.

We have only used descriptors' bandwidth average [1] to cap the measurements.

That is ok.

Bandwidth in consensus[2] is min(observed bandwidth, bandwidth rate limit, 10MB/s)
I guess bandwidth rate limit here is bandwidth burst, right?

How does the consensus help us, when we are looking at relay descriptors?

If the "Measured=" bandwidth is available in the consensus, clients use it:
https://gitweb.torproject.org/torspec.git/tree/dir-spec.txt#n2601

The bandwidth in the "w" line in the consensus is only used if the network has less than 3 bandwidth authorities voting.

Should the torflow or sbws scaled bandwidth be limited to the bandwidth burst? AFAIK torflow is not doing that.

The descriptor has:

"bandwidth" bandwidth-avg bandwidth-burst bandwidth-observed NL

Torflow does:

bw_observed = min(bandwidth-avg, bandwidth-burst, bandwidth-observed)

https://gitweb.torproject.org/pytorctl.git/tree/TorCtl.py#n459

But that's redundant, because tor relays do:

"bandwidth" min(RelayBandwidthRate, RelayBandwidthBurst, BandwidthRate, BandwidthBurst, MaxAdvertisedBandwidth) min(RelayBandwidthBurst, BandwidthBurst) bandwidth-observed NL

See get_effective_bwrate() and get_effective_bwburst().

So you can use min(bandwidth-avg, bandwidth-burst) or just bandwidth-avg. The results will be the same.
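
A minimal sketch of why the extra min() is redundant, assuming the descriptor "bandwidth" line format quoted above. The function names are hypothetical; this is not the pytorctl or sbws code.

```python
def parse_bandwidth_line(line):
    """Parse a descriptor line of the form:
    "bandwidth" bandwidth-avg bandwidth-burst bandwidth-observed
    and return the three values as integers.
    """
    keyword, avg, burst, observed = line.split()
    if keyword != "bandwidth":
        raise ValueError("not a bandwidth line: %r" % line)
    return int(avg), int(burst), int(observed)


def torflow_style(avg, burst, observed):
    """What Torflow computes, per the description above."""
    return min(avg, burst, observed)


def simplified(avg, observed):
    """Equivalent result whenever avg <= burst, which the relay-side
    min() over the rate and burst options guarantees."""
    return min(avg, observed)
```

Whenever `avg <= burst` holds, `torflow_style(avg, burst, observed)` and `simplified(avg, observed)` return the same value.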

[0] https://gitweb.torproject.org/torspec.git/tree/dir-spec.txt#n427
[1] https://github.com/pastly/simple-bw-scanner/blob/master/sbws/lib/v3bwfile.py#L314

[2] https://gitweb.torproject.org/torspec.git/tree/dir-spec.txt#n2595

comment:5 in reply to:  4 ; Changed 11 months ago by juga

Replying to teor:

The minimum viable product must:

  • Use the latest descriptor bandwidth limit, because:
    • the latest descriptor contains the limit that the operator has asked for
  • Use at least 2 sbws measured bandwidths over at least 2 days, because:
    • tor relay usage varies on a daily cycle
    • each sbws measurement depends on the time of day
  • Use at least 2 descriptor observed bandwidths over at least 2 days, because:
    • a single download by a single client can increase the observed bandwidth
    • for security, we want results that don't depend on a single client's behaviour

Currently, it's possible that after 2 (or more) days we have collected fewer than 2 measurements and descriptor observed bandwidths for some relays.
It's possible that prioritization might need some changes, which I think might be related to https://github.com/pastly/simple-bw-scanner/issues/136.
If prioritization can't change that, it might not be possible to obtain 2 measurements in the last 2 days.

  • Don't keep bandwidths for more than 1 week
    • old bandwidths do not help us work out current relay capacity

For bandwidth files, that's the default. Raw measurements are kept for 90 days by default.

What do you think?
Can you implement something based on these suggestions?

Yes, except for the comments above

If you want, I can write or review patches, or write a detailed spec.

I already have the code, except for the comments above (I need to clean up the commits a bit). Reviews and a spec would help.

Do you mean 1, 2, 3 on the last 3 lines?

Yes, sorry, distracted copy & paste...

The descriptor has:

"bandwidth" bandwidth-avg bandwidth-burst bandwidth-observed NL

Torflow does:

bw_observed = min(bandwidth-avg, bandwidth-burst, bandwidth-observed)

https://gitweb.torproject.org/pytorctl.git/tree/TorCtl.py#n459

But that's redundant, because tor relays do:

"bandwidth" min(RelayBandwidthRate, RelayBandwidthBurst, BandwidthRate, BandwidthBurst, MaxAdvertisedBandwidth) min(RelayBandwidthBurst, BandwidthBurst) bandwidth-observed NL

See get_effective_bwrate() and get_effective_bwburst().

I've been collecting and documenting all these possible values and the different names they could have, so I don't get confused.
I just haven't put those notes online anywhere yet, but I intend to do so.

comment:6 in reply to:  5 Changed 11 months ago by teor

Replying to juga:

Replying to teor:

The minimum viable product must:

  • Use the latest descriptor bandwidth limit, because:
    • the latest descriptor contains the limit that the operator has asked for
  • Use at least 2 sbws measured bandwidths over at least 2 days, because:
    • tor relay usage varies on a daily cycle
    • each sbws measurement depends on the time of day
  • Use at least 2 descriptor observed bandwidths over at least 2 days, because:
    • a single download by a single client can increase the observed bandwidth
    • for security, we want results that don't depend on a single client's behaviour

Currently, it's possible that after 2 (or more) days we have collected fewer than 2 measurements and descriptor observed bandwidths for some relays.
It's possible that prioritization might need some changes, which I think might be related to https://github.com/pastly/simple-bw-scanner/issues/136.
If prioritization can't change that, it might not be possible to obtain 2 measurements in the last 2 days.

Ok, I think those rules are confusing.

Let's try to split them up:

If any of these things are true, do not put the relay in the bandwidth file:

  • there are less than 2 sbws measured bandwidths
  • all the sbws measured bandwidths are within 24 hours of each other
  • there are less than 2 descriptor observed bandwidths
  • all the descriptor observed bandwidths are within 24 hours of each other

We will need to make these settings configurable, so we can get test network results in less than 1 day.
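
The exclusion rules above can be sketched as follows. The function and parameter names are hypothetical, and treating values exactly 24 hours apart as far enough apart is an assumption; the rules above do not specify that boundary.

```python
DAY_SECS = 24 * 60 * 60


def spans_enough_time(timestamps, min_count=2, min_span_secs=DAY_SECS):
    """Return True when there are at least min_count values and they are
    not all within min_span_secs of each other.

    Assumption: values exactly min_span_secs apart count as far enough apart.
    """
    if len(timestamps) < min_count:
        return False
    return max(timestamps) - min(timestamps) >= min_span_secs


def include_relay(measured_ts, observed_ts,
                  min_count=2, min_span_secs=DAY_SECS):
    """Apply the same rule to the sbws measured bandwidths and to the
    descriptor observed bandwidths; both must pass for the relay to be
    put in the bandwidth file."""
    return (spans_enough_time(measured_ts, min_count, min_span_secs)
            and spans_enough_time(observed_ts, min_count, min_span_secs))
```

Keeping `min_count` and `min_span_secs` as parameters is what allows a test network to use a much shorter span, for example `include_relay(measured, observed, min_span_secs=60)`.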

  • Don't keep bandwidths for more than 1 week
    • old bandwidths do not help us work out current relay capacity

For bandwidth files, that's the default. Raw measurements are kept for 90 days by default.

Sorry, I meant:

  • Don't use sbws measured bandwidths that are older than 1 week
  • Don't use descriptor observed bandwidths that are older than 1 week
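
A minimal sketch of the age limit, assuming each bandwidth is stored with the Unix timestamp at which it was collected. The `(timestamp, bandwidth)` layout and the names are hypothetical, not the sbws data model.

```python
WEEK_SECS = 7 * 24 * 60 * 60


def drop_old(samples, now, max_age_secs=WEEK_SECS):
    """Keep only the (timestamp, bandwidth) samples that are at most
    max_age_secs old; the same filter applies to measured and observed
    bandwidths."""
    return [(ts, bw) for ts, bw in samples if now - ts <= max_age_secs]
```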

If you want, I can write or review patches, or write a detailed spec.

I already have the code, except for the comments above (I need to clean up the commits a bit). Reviews and a spec would help.

Ok, when you finish a ticket, let me know, and I will do the review and spec.

The descriptor has:

"bandwidth" bandwidth-avg bandwidth-burst bandwidth-observed NL

Torflow does:

bw_observed = min(bandwidth-avg, bandwidth-burst, bandwidth-observed)

https://gitweb.torproject.org/pytorctl.git/tree/TorCtl.py#n459

But that's redundant, because tor relays do:

"bandwidth" min(RelayBandwidthRate, RelayBandwidthBurst, BandwidthRate, BandwidthBurst, MaxAdvertisedBandwidth) min(RelayBandwidthBurst, BandwidthBurst) bandwidth-observed NL

See get_effective_bwrate() and get_effective_bwburst().

I've been collecting and documenting all these possible values and the different names they could have, so I don't get confused.
I just haven't put those notes online anywhere yet, but I intend to do so.

Thanks!

comment:7 Changed 11 months ago by juga

Status: new → needs_review

Attachment 20180905_092840_torflow_sbws.png added 11 months ago by juga

comment:8 Changed 11 months ago by juga

Graph generated scaling results using #27108, #27337, #27336 and this ticket:

  • there are less than 2 sbws measured bandwidths
  • all the sbws measured bandwidths are within 24 hours of each other
  • there are less than 2 descriptor observed bandwidths
  • all the descriptor observed bandwidths are within 24 hours of each other

comment:10 Changed 10 months ago by juga

Resolution: implemented
Status: needs_review → closed

Assign child #27346 to parent #27107, since this is implemented

comment:11 Changed 8 months ago by teor

Keywords: sbws-1.0-must-closed-moved-20181128 added
Milestone: sbws 1.0 (MVP must) → sbws: 1.0.x-final

Move all closed sbws 1.0 must tickets to sbws 1.0.x-final