Opened 16 months ago

Closed 15 months ago

Last modified 12 months ago

#27135 closed defect (implemented)

Write descriptor bandwidths average in raw results

Reported by: juga
Owned by: juga
Priority: Medium
Milestone: sbws: 1.0.x-final
Component: Core Tor/sbws
Version:
Severity: Normal
Keywords: sbws-1.0-must-closed-moved-20181128
Cc: pastly, juga@…, teor, juga
Actual Points:
Parent ID: #27108
Points:
Reviewer:
Sponsor:

Description

Generating bandwidth files the way Torflow does implies using the average of the descriptor bandwidths over the time it performs measurements.
sbws only obtains the descriptor bandwidth once, as commented in https://github.com/pastly/simple-bw-scanner/issues/182#issuecomment-412845969

Child Tickets

Attachments (2)

toy.py (631 bytes) - added by pastly 16 months ago.
20180826_081902.png (37.2 KB) - added by juga 16 months ago.
sbws bw scaled as torflow compared to torflow


Change History (29)

comment:1 Changed 16 months ago by pastly

Ahh great :/

So we need to get all the relays' descriptors every hour and save (at least) the bandwidth lines in them? This is doable, and honestly probably isn't that hard. It will probably have to be a separate thread that the sbws scanner runs to do this work periodically. It doesn't make sense to me to put this data in the results files.

comment:2 Changed 16 months ago by juga

I'm not sure we should get it every hour; I haven't found yet how frequently Torflow does.
Teor, do you know? I found where it reads it (https://gitweb.torproject.org/pytorctl.git/tree/SQLSupport.py#n421), but not where it stores it.

If we need to obtain the descriptor bandwidths more frequently than once a day, I would store them in the results files, as a list per relay, to keep the same logic we're using for the rest of the values and to avoid having to read/write yet another file with its locks and so on.
Wouldn't it be possible to just obtain it from somewhere like https://github.com/pastly/simple-bw-scanner/blob/master/sbws/core/scanner.py#L248?

Actually, I don't know right now why the relay doesn't have the updated descriptor there. Because we call both of them Relay in resultdump and relaylist, I'm not sure from where https://github.com/pastly/simple-bw-scanner/blob/master/sbws/lib/relaylist.py#L206 is being called.

Maybe it's enough to store it once per file, but for some reason I haven't figured out yet, all days have the same descriptor bandwidth.

As an example, these are the descriptor bandwidths from 4 files for 2 relays:
relay 1: [9216000, 9216000, 9216000]
relay 2: [1073741824, 1073741824, 1073741824]

Since you know this part of the code better, would you like to check why that is?

comment:3 Changed 16 months ago by juga

If you think you can implement this quickly and feel like it, feel free to reassign it to yourself and work on it, since I might be AFK tomorrow.

comment:4 in reply to:  2 Changed 16 months ago by juga

Replying to juga:

I'm not sure we should get it every hour; I haven't found yet how frequently Torflow does.

Though it would make sense, if Torflow is getting them every hour.

Wouldn't it be possible to just obtain it from somewhere like https://github.com/pastly/simple-bw-scanner/blob/master/sbws/core/scanner.py#L248?

Hmm, it seems to be correctly getting it from there, but we still only measure each relay around once a day.

Actually, I don't know right now why the relay doesn't have the updated descriptor there. Because we call both of them Relay in resultdump and relaylist, I'm not sure from where https://github.com/pastly/simple-bw-scanner/blob/master/sbws/lib/relaylist.py#L206 is being called.

Maybe it's just a coincidence that the descriptor bandwidths were the same for all the relays measured here, at the time they were measured.

comment:5 in reply to:  2 ; Changed 16 months ago by pastly

Replying to juga:

As an example, these are the descriptor bandwidths from 4 files for 2 relays:
relay 1: [9216000, 9216000, 9216000]
relay 2: [1073741824, 1073741824, 1073741824]

Since you know this part of the code better, would you like to check why that is?

I downloaded your sbws datadir and wrote a little script to print how often a relay has a list of saved relay_average_bandwidth containing more than one unique element.

It found that 1868/7999 relays had relay_average_bandwidth values that weren't all the same (118/7326 if you only look at success results). I'm guessing relays don't update this value in their descriptors very often.
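For reference, a check like that could look as follows. This is a hypothetical reconstruction, not necessarily what the attached toy.py does; it assumes the datadir file pattern and that each result line is a JSON object with fingerprint and relay_average_bandwidth keys:

import glob
import json

def count_changing_relays(datadir_glob):
    # fingerprint -> list of saved relay_average_bandwidth values
    bws = {}
    for path in glob.glob(datadir_glob):
        with open(path) as f:
            for line in f:
                result = json.loads(line)
                bws.setdefault(result['fingerprint'], []).append(
                    result.get('relay_average_bandwidth'))
    changed = sum(1 for vals in bws.values() if len(set(vals)) > 1)
    print('{}/{} relays had more than one unique value'.format(changed, len(bws)))

count_changing_relays('datadir/*.txt')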

Changed 16 months ago by pastly

Attachment: toy.py added

comment:7 in reply to:  5 ; Changed 16 months ago by teor

Replying to juga:

I'm not sure we should get it every hour; I haven't found yet how frequently Torflow does.
Teor, do you know?

We need to design sbws timings to match the Tor network.
(If torflow gets it wrong, or is outdated, we want to get it right.)

Relays running recent Tor releases (with #23856) update their descriptor bandwidths every 3-18 hours. After #24104 merges, most relays will update their descriptor bandwidths every 12-18 hours, but new relays will update every 3-18 hours.

   ORs SHOULD generate a new server descriptor and a new extra-info
   document whenever any of the following events have occurred:

      ...

      - Its uptime is less than 24h and bandwidth has changed by a factor of 2
        from the last time a descriptor was generated, and at least a given
        interval of time (3 hours by default) has passed since then.

      ...

   ORs SHOULD NOT publish a new server descriptor or extra-info document
   if none of the above events have occurred and not much time has passed
   (12 hours by default).

https://gitweb.torproject.org/torspec.git/tree/dir-spec.txt#n354

Once a relay uploads its descriptor, the authorities include it in the next consensus. Descriptor updates in the consensus can take 10 minutes (the voting period) to 18 hours (the descriptor expiry period for the last descriptor).

The consensus is produced every 30 minutes or 1 hour. Then clients download the consensus from directory mirrors, which download from authorities.

Each consensus may include some new descriptors. When a client gets a consensus with unknown descriptors, it can take a few minutes for it to download those descriptors from one of its directory mirrors.

So sbws can update descriptors:

  • as descriptors arrive, via control events (as starlight says, that's how torflow does it), or
  • a few minutes after every consensus arrives, or
  • every 30 minutes to 1 hour, at a randomly chosen time.

Do whatever is easiest. Using an old descriptor is ok here.
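For the first option, here's a minimal sketch using stem; the control port and the record_bandwidth() helper are assumptions for illustration, not existing sbws code:

import time

from stem import EventType
from stem.control import Controller

def record_bandwidth(fingerprint, observed_bandwidth):
    # Hypothetical helper: store the value wherever sbws decides to
    # keep its descriptor bandwidth history.
    print(fingerprint, observed_bandwidth)

def watch_descriptors():
    with Controller.from_port(port=9051) as controller:
        controller.authenticate()

        def on_newdesc(event):
            # NEWDESC events list the relays that have new descriptors
            for fingerprint, _nickname in event.relays:
                desc = controller.get_server_descriptor(fingerprint)
                record_bandwidth(fingerprint, desc.observed_bandwidth)

        controller.add_event_listener(on_newdesc, EventType.NEWDESC)
        while True:  # keep the control connection open for events
            time.sleep(60)

This could run in the separate scanner thread pastly mentioned in comment:1.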

Replying to pastly:

I downloaded your sbws datadir and wrote a little script to print how often a relay has a list of saved relay_average_bandwidth containing more than one unique element.

It found that 1868/7999 relays had relay_average_bandwidth values that weren't all the same (118/7326 if you only look at success results). I'm guessing relays don't update this value in their descriptors very often.

I don't know the period covered by juga's sbws datadir. But these figures seem reasonable for a few hours' consensuses: 1/18 * 7999 = 444.

If the datadir covers days, maybe sbws is losing descriptor updates?

comment:8 in reply to:  7 Changed 16 months ago by pastly

Replying to teor:

Replying to pastly:

I downloaded your sbws datadir and wrote a little script to print how often a relay has a list of saved relay_average_bandwidth containing more than one unique element.

It found that 1868/7999 relays had relay_average_bandwidth values that weren't all the same (118/7326 if you only look at success results). I'm guessing relays don't update this value in their descriptors very often.

I don't know the period covered by juga's sbws datadir. But these figures seem reasonable for a few hours' consensuses: 1/18 * 7999 = 444.

If the datadir covers days, maybe sbws is losing descriptor updates?

It covered days. Like maybe a week. I think the issue is that sbws only records a relay's current descriptor bandwidth when it is recording a measurement for it, which happens very roughly once a day (this is not a parameter that can be directly tuned; it depends on many things, like the number of measurement threads and the target download length). So yes, sbws is losing descriptor updates, because I/we/it never knew they were that important to begin with!

comment:9 Changed 16 months ago by teor

And that's fair enough, we weren't expecting to have to use descriptor bandwidths for anything.

Here's the simplest model we could use:

  • take the latest measurement, and scale by the descriptor bandwidth at the time of the measurement

Here's a more accurate model we could use:

  • take the decaying average of the measurements, and scale by the decaying average of the descriptor bandwidths

And we should probably think about:

  • how we handle zeroes (replace with 1 is a nice simple fix for a bunch of issues)
  • how we handle new relays (treat their old bandwidths as 1 is a nice simple way of rewarding stability)

comment:10 Changed 16 months ago by juga

take the decaying average of the measurements, and scale by the decaying average of the descriptor bandwidths

Could you point out what you mean by a decaying average? This [0]?

btw, it seems that the answer to my question in [1] is that 1/45 ~= 1/e

how we handle new relays (treat their old bandwidths as 1 is a nice simple way of rewarding stability)

You mean for their descriptor bandwidth? For what we write to the bandwidth file?

[0] https://en.wikipedia.org/wiki/Exponential_decay#Mean_lifetime
[1] https://github.com/pastly/simple-bw-scanner/issues/182#issuecomment-409341398

comment:11 in reply to:  10 ; Changed 16 months ago by teor

Replying to juga:

take the decaying average of the measurements, and scale by the decaying average of the descriptor bandwidths

Could you point out what you mean by a decaying average? This [0]?

Yes, I mean an exponentially decaying average. But that article was written by mathematicians, not programmers.

Here's a simple exponential decay algorithm that can be used for both descriptor bandwidths and measured bandwidths:

  1. Set the decay constant (R) to a number in (0, 1).

We get a descriptor every 12-18 hours, and there is a daily cycle in Tor bandwidths. So we want to average over a few days, to smooth out spikes. A good value for R is 0.98 every hour, because the decaying average for 15 hour bandwidths is 26% B1 + 19% B2 + 14% B3 + 11% B4 + 30% B5-10. (B11 is less than 1%.)

  2. For each relay, set the decaying average (D) to 1.
  3. At the start of each hour:
    a. multiply the decaying average by the decay constant: D *= R
    b. add the most recent bandwidth (B) to the decaying average: D += B
  4. When voting:
    a. Scale decaying averages before including them in votes (V): V = D * (1 - R)

If a relay has a constant bandwidth (B), then the decaying average will eventually be (strictly, will converge to) B * 1/(1 - R), because the geometric series B * (1 + R + R^2 + ...) sums to B / (1 - R). We probably want decaying average bandwidths to be comparable to bandwidths, so we should scale them.

    b. Scale each relay's weight so the total is approximately torflow's total weight.

We want the consensus weights to be comparable with Torflow's consensus weights.

    c. If the relay's weight is 0, vote 1.

We want all relays to have a small weight, to avoid zero-division bugs.
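As a minimal Python sketch of the steps above (assuming one bandwidth sample per relay per hour; step 4.b.'s rescaling to torflow's total is left out):

R = 0.98  # step 1: hourly decay constant, chosen in (0, 1)

class DecayingAverage:
    def __init__(self):
        self.d = 1.0  # step 2: new relays start at 1

    def hourly_update(self, bandwidth):
        # steps 3.a. and 3.b.: decay the old average, add the newest sample
        self.d = self.d * R + bandwidth

    def vote(self):
        # step 4.a.: scale by (1 - R); a constant bandwidth B makes d
        # converge to B / (1 - R), so the vote converges to B
        return max(1, round(self.d * (1 - R)))  # step 4.c.: never vote 0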

btw, it seems that the answer to my question in [1] is that 1/45 ~= 1/e

Can you explain what you mean by e?

The base of the natural logarithm is e ~= 2.72, so 1/e ~= 0.37.

how we handle new relays (treat their old bandwidths as 1 is a nice simple way of rewarding stability)

You mean for their descriptor bandwidth? For what we write to the bandwidth file?

Yes, for the initial decaying average for new relays, and the final vote bandwidth for all relays. See steps 2. and 4.c. above, where I use 1 instead of 0.

[0] https://en.wikipedia.org/wiki/Exponential_decay#Mean_lifetime
[1] https://github.com/pastly/simple-bw-scanner/issues/182#issuecomment-409341398

comment:12 in reply to:  11 ; Changed 16 months ago by juga

  1. Set the decay constant (R) to a number in (0, 1).

We get a descriptor every 12-18 hours, and there is a daily cycle in Tor bandwidths.

If I just get the descriptor every 12h, after 5 days I will have 10 values for it, and if it changes, then I think it would be enough to take the mean of them.

So we want to average over a few days, to smooth out spikes. A good value for R is 0.98 every hour, because the decaying average for 15 hour bandwidths is 26% B1 + 19% B2 + 14% B3 + 11% B4 + 30% B5-10. (B11 is less than 1%.)

Where are these numbers (0.98, 26, 19, ...) coming from? The mathematics might help.

  4. When voting:

By "when voting" do you mean when we generate the bandwidth list file?

btw, it seems that the answer to my question in [1] is that 1/45 ~= 1/e

Can you explain what you mean by e?

The base of the natural logarithm is e ~= 2.72, so 1/e ~= 0.37.

I meant the natural logarithm, exactly that.

comment:13 in reply to:  12 Changed 16 months ago by teor

Replying to juga:

  1. Set the decay constant (R) to a number in (0, 1).

We get a descriptor every 12-18 hours, and there is a daily cycle in Tor bandwidths.

If I just get the descriptor every 12h, after 5 days I will have 10 values for it, and if it changes, then I think it would be enough to take the mean of them.

Descriptors can change every hour, and bandwidths can change for new relays every 3 hours. So it would be better to get the descriptor every hour.

I don't think you should use the mean, because old values are much less relevant than newer values. (But if you want to write code that takes 10 values and calculates the mean, I can rewrite it to calculate a decaying weighted average.)

So we want to average over a few days, to smooth out spikes. A good value for R is 0.98 every hour, because the decaying average for 15 hour bandwidths is 26% B1 + 19% B2 + 14% B3 + 11% B4 + 30% B5-10. (B11 is less than 1%.)

Where are these numbers (0.98, 26, 19, ...) coming from? The mathematics might help.

0.98 is an arbitrarily chosen constant K that seems to produce a reasonable decay over a few days.

The percentages are a geometric progression based on K, over a week of 15-hour average descriptor times.

In python, they are:

>>> K = 0.98
>>> K15 = 0.98**15
>>> for i in xrange(0,(7*24)/15):
...   print "(1 - K15)*(K15**{}) =".format(i), (1 - K15)*(K15**i)
... 
(1 - K15)*(K15**0) = 0.261430897355
(1 - K15)*(K15**1) = 0.193084783263
(1 - K15)*(K15**2) = 0.142606455109
(1 - K15)*(K15**3) = 0.105324721581
(1 - K15)*(K15**4) = 0.0777895851047
(1 - K15)*(K15**5) = 0.0574529840659
(1 - K15)*(K15**6) = 0.0424329988859
(1 - K15)*(K15**7) = 0.0313397019097
(1 - K15)*(K15**8) = 0.0231465355166
(1 - K15)*(K15**9) = 0.0170953159659
(1 - K15)*(K15**10) = 0.0126260721723

  4. When voting:

By "when voting" do you mean when we generate the bandwidth list file?

Yes.

btw, it seems that the answer to my question in [1] is that 1/45 ~= 1/e

Can you explain what you mean by e?

The base of the natural logarithm is e ~= 2.72, so 1/e ~= 0.37.

I meant the natural logarithm, exactly that.

I still don't understand. But I don't think it's important.

comment:14 Changed 16 months ago by juga

With a separate Python script, I've collected descriptor bandwidths for 3 days, every hour.
I've tried the method you proposed, but it still doesn't smooth out spikes. I'm quite sure that's because the descriptor bandwidths don't change (much).

I assumed Torflow's desc_bw and avg_desc_bw were the descriptor average bandwidth, but looking at TorCtl.py [0] they seem to be the observed bandwidth, so I'm going to collect that instead and see how it changes the curve.

In any case, if Torflow is applying a decaying average, do you know where that is in the code?

[0] https://gitweb.torproject.org/pytorctl.git/tree/TorCtl.py#n489

comment:15 in reply to:  14 Changed 16 months ago by teor

Replying to juga:

With a separate Python script, I've collected descriptor bandwidths for 3 days, every hour.
I've tried the method you proposed, but it still doesn't smooth out spikes. I'm quite sure that's because the descriptor bandwidths don't change (much).

Please retry with the observed bandwidth.

I assumed Torflow's desc_bw and avg_desc_bw were the descriptor average bandwidth, but looking at TorCtl.py [0] they seem to be the observed bandwidth, so I'm going to collect that instead and see how it changes the curve.

Yes, we want the observed bandwidth. The bandwidth rate and bandwidth burst come from the torrc, so they don't change much.

In any case, if Torflow is applying a decaying average, do you know where that is in the code?

Torflow uses a complicated calculation when pid control is active.
This calculation attempts to apply a decaying weight to previous results, so that the feedback loop converges.
(But the feedback loop does not converge, because this part of the torflow design is broken.)
https://gitweb.torproject.org/torflow.git/tree/NetworkScanners/BwAuthority/aggregate.py#n117

sbws needs to be able to operate in two different modes:

  1. Ignore self-reported bandwidth, so sbws can be more secure
  2. Use observed bandwidth to scale measurements, so sbws can be more like torflow

Then, if sbws totals are different to torflow totals, we can apply a final linear scale step.

Right now, we need to focus on getting the code working, and comparing sbws with torflow.

Please use whatever observed bandwidth you have right now.

Any of the following methods are ok (sketched below):

  • the latest observed bandwidth
  • an average of recent observed bandwidths
  • a decaying average of recent observed bandwidths
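A small sketch of the three options, over a time-ordered list of recent observed bandwidths for one relay (the helper names are illustrative):

def latest(bws):
    # the most recent observed bandwidth
    return bws[-1]

def mean(bws):
    # plain average of recent observed bandwidths
    return sum(bws) / len(bws)

def decaying_average(bws, r=0.98):
    # exponentially decaying average: the oldest values decay the most
    d = 1.0  # new relays effectively start at 1
    for b in bws:  # bws is ordered oldest to newest
        d = d * r + b
    return d * (1 - r)  # scale so a constant input converges to itself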

We'll also need a short explanation of how sbws works, so we can explain sbws to authority and relay operators.

comment:16 Changed 16 months ago by juga

sbws needs to be able to operate in two different modes:

I'm taking this into account; yes, there'll be 2 modes.

Any of the following methods are ok:

I'm going to try them in the order you propose.

We'll also need a short explanation of how sbws works

I'll update the docs once this part of the code is working.

Changed 16 months ago by juga

Attachment: 20180826_081902.png added

sbws bw scaled as torflow compared to torflow

comment:17 Changed 16 months ago by juga

Taking the descriptor observed bandwidth only when the relay is measured, and calculating the mean when there are several observed bandwidth values for the same relay, the graph is now very close to Torflow's, with data from only 1 day.
Should I still take the observed bandwidth every hour and calculate the mean and/or decaying average?

comment:18 in reply to:  17 ; Changed 16 months ago by teor

Replying to juga:

Taking the descriptor observed bandwidth only when the relay is measured, and calculating the mean when there are several observed bandwidth values for the same relay, the graph is now very close to Torflow's, with data from only 1 day.

That's great news!
Thank you (and pastly) for all your hard work.

Should I still take the observed bandwidth every hour and calculate the mean and/or decaying average?

If the simple method works, then let's keep it simple.

How long do you keep old relay bandwidths before deleting them?
I suggest 1 week is a good time.

Next steps:

How should we deal with differences between sbws and torflow?

Here are some example rules:

  1. Any difference between sbws and torflow is a bug in sbws that should be fixed
  2. If a sbws deployment is within 50% of any existing bandwidth authority, sbws is ok (the existing bandwidth authorities are within 50% of each other)
  3. Let's choose an ideal bandwidth distribution for the Tor network, and modify sbws until we get that distribution

I suggest we use 2 as a transition rule, and 3 as a long-term rule. (Before we do rule 3 designs, we will need more research.)

Which mode do you think should be the default?

  1. Ignore self-reported bandwidth, so sbws can be more secure (and then scale linearly)
  2. Use observed bandwidth to scale measurements, so sbws can be more like torflow

I think we should ask the bandwidth authority operators and network team for feedback, so we can answer both these questions.

comment:19 in reply to:  18 ; Changed 16 months ago by juga

Should I still take the observed bandwidth every hour and calculate the mean and/or decaying average?

If the simple method works, then let's keep it simple.

Agree, I also looked at Torflow again and realized it is only taking the last one, so it's even simpler.

How long do you keep old relay bandwidths before deleting them?
I suggest 1 week is a good time.

If you mean the v3bw files, that's the default right now:

https://github.com/pastly/simple-bw-scanner/blob/master/sbws/config.default.ini#L77

However, they will be compressed after 1 day with the Debian default configuration:

https://salsa.debian.org/pkg-privacy-team/sbws/blob/debian/master/debian/sbws.cron.d#L1

How should we deal with differences between sbws and torflow?

Here are some example rules:

  1. Any difference between sbws and torflow is a bug in sbws that should be fixed
  2. If a sbws deployment is within 50% of any existing bandwidth authority, sbws is ok (the existing bandwidth authorities are within 50% of each other)
  3. Let's choose an ideal bandwidth distribution for the Tor network, and modify sbws until we get that distribution

I suggest we use 2 as a transition rule, and 3 as a long-term rule. (Before we do rule 3 designs, we will need more research.)

  2. sounds good. By 50%, do you mean each relay's bandwidth (or the mean or median)? How do we know that the existing bandwidth authorities are within 50% of each other? Is there any script or graph that shows that?

Regarding 3., I think that could be included in 1. below.

Which mode do you think should be the default?

  1. Ignore self-reported bandwidth, so sbws can be more secure (and then scale linearly)

Maybe it does not have to be linear; we could instead use an exponentially decaying average or another method over the sbws measurements (not self-reported).

  2. Use observed bandwidth to scale measurements, so sbws can be more like torflow

I was imagining that we'd start running it as 2., and after some more research on how to scale sbws, switch to 1.

I think we should ask the bandwidth authority operators and network team for feedback, so we can answer both these questions.

I think so too. All of this would be easier to explain and discuss at the next meeting. It's still 1 month away, but anyway it won't be possible to run sbws on the dirauths until then, since there won't be a new stem release until then (#26914#comment:3).

What I would propose is:

  1. make a PR with the code that stores descriptors' observed bandwidth
  2. make a PR with the code that scales as torflow does (previous graph), #27108
  3. update the sbws documentation with this (probably a new ticket)
  4. update the Debian package with a new release
  5. continue with #27107: obtain more measurements, make more graphs, think about 3. above, maybe update the bandwidth file specification
  6. continue with #21377 and other little-t-tor tickets until the meeting
  7. discuss in the meeting with the network team and dirauths how sbws should be scaled

comment:20 in reply to:  19 ; Changed 16 months ago by teor

Replying to juga:

...

How long do you keep old relay bandwidths before deleting them?
I suggest 1 week is a good time.

If you mean the v3bw files, that's the default right now:

https://github.com/pastly/simple-bw-scanner/blob/master/sbws/config.default.ini#L77

You said:

Taking the descriptor observed bandwidth only when the relay is measured, and calculating the mean when there are several observed bandwidth values for the same relay

How many observed bandwidth values does sbws use for the mean?
Does sbws stop using old observed bandwidths after 1 week?

I think 1 day or 1 measurement is too short. Most observed bandwidths are the maximum for 1 day, so they include the whole daily cycle. But the Tor network has a weekly cycle as well.

Also, if we want stable bandwidths, using a few days of activity is a good idea.

How should we deal with differences between sbws and torflow?

Here are some example rules:

  1. Any difference between sbws and torflow is a bug in sbws that should be fixed
  2. If a sbws deployment is within 50% of any existing bandwidth authority, sbws is ok (the existing bandwidth authorities are within 50% of each other)
  3. Let's choose an ideal bandwidth distribution for the Tor network, and modify sbws until we get that distribution

I suggest we use 2 as a transition rule, and 3 as a long-term rule. (Before we do rule 3 designs, we will need more research.)

  2. sounds good. By 50%, do you mean each relay's bandwidth (or the mean or median)? How do we know that the existing bandwidth authorities are within 50% of each other? Is there any script or graph that shows that?

Karsten made a graph between July 2017 and February 2018:
https://trac.torproject.org/projects/tor/ticket/25459#comment:5

In the graph, the lowest bwauth has 35M total consensus weight, and the highest has 70M.

Adding the graph to metrics.torproject.org is on the metrics roadmap.
Until then, we can ask Karsten to re-do the graph, or we can write a script that adds the bandwidths from each vote.

Regarding 3., I think that could be included in 1. below.

I agree.

Which mode do you think should be the default?

  1. Ignore self-reported bandwidth, so sbws can be more secure (and then scale linearly)

Maybe it does not have to be linear; we could instead use an exponentially decaying average or another method over the sbws measurements (not self-reported).

While sbws shares the network with torflow, we need three stages:

  1. produce measured and observed bandwidths for each relay, using:
    • the most recent result or
    • an average of recent results or
    • a decaying average
  2. scale the sbws results, using:
    • measured bandwidth only (option 1)
    • observed bandwidth and measured bandwidth (option 2)
  3. scale sbws results linearly so the total consensus weight is close to torflow's (sketched below)
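A rough sketch of stage 3 (scale_to_total and the input format are illustrative, not sbws code):

def scale_to_total(weights, torflow_total):
    # weights: relay fingerprint -> sbws weight; rescale linearly so
    # the total consensus weight is close to torflow's total
    factor = torflow_total / sum(weights.values())
    return {fp: max(1, round(w * factor)) for fp, w in weights.items()}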
  2. Use observed bandwidth to scale measurements, so sbws can be more like torflow

I was imagining that we'd start running it as 2., and after some more research on how to scale sbws, switch to 1.

I agree.

I think we should ask the bandwidth authority operators and network team for feedback, so we can answer both these questions.

I think so too. All of this would be easier to explain and discuss at the next meeting. It's still 1 month away, but anyway it won't be possible to run sbws on the dirauths until then, since there won't be a new stem release until then (#26914#comment:3).

Ok, let's make sure we have meetings for:

  • status update and overall plan
  • helping dirauths (and test network operators) install sbws
  • detailed designs and bug fixes

(Let's not have too many meetings!)

What I would propose is:

  1. make a PR with the code that stores descriptors' observed bandwidth
  2. make a PR with the code that scales as torflow does (previous graph), #27108
  3. update the sbws documentation with this (probably a new ticket)
  4. update the Debian package with a new release
  5. continue with #27107: obtain more measurements, make more graphs, think about 3. above, maybe update the bandwidth file specification

Yes, we need to update the scaling part of the bandwidth file specification. I can do that if you want. Please assign the spec ticket to me when the code is ready for review. Then we can do the spec and code and make sure they match before we merge.

  6. continue with #21377 and other little-t-tor tickets until the meeting

Let's catch up with pastly and prioritise the remaining tickets.

Please take a break if you want to.

  7. discuss in the meeting with the network team and dirauths how sbws should be scaled

Sounds good to me.

comment:21 in reply to:  20 Changed 16 months ago by teor

This has become a huge ticket, so I'm splitting it into child tickets of #27108.

Replying to teor:

Replying to juga:

...

How long do you keep old relay bandwidths before deleting them?
I suggest 1 week is a good time.

If you mean the v3bw files, that's the default right now:

https://github.com/pastly/simple-bw-scanner/blob/master/sbws/config.default.ini#L77

You said:

Taking the descriptor observed bandwidth only when the relay is measured, and calculating the mean when there are several observed bandwidth values for the same relay

How many observed bandwidth values does sbws use for the mean?
Does sbws stop using old observed bandwidths after 1 week?

I think 1 day or 1 measurement is too short. Most observed bandwidths are the maximum for 1 day, so they include the whole daily cycle. But the Tor network has a weekly cycle as well.

Also, if we want stable bandwidths, using a few days of activity is a good idea.

Let's talk about how long to keep bandwidths in #27338.

How should we deal with differences between sbws and torflow?

Here are some example rules:

  1. Any difference between sbws and torflow is a bug in sbws that should be fixed
  2. If a sbws deployment is within 50% of any existing bandwidth authority, sbws is ok (the existing bandwidth authorities are within 50% of each other)
  3. Let's choose an ideal bandwidth distribution for the Tor network, and modify sbws until we get that distribution

I suggest we use 2 as a transition rule, and 3 as a long-term rule. (Before we do rule 3 designs, we will need more research.)

  2. sounds good. By 50%, do you mean each relay's bandwidth (or the mean or median)? How do we know that the existing bandwidth authorities are within 50% of each other? Is there any script or graph that shows that?

Karsten made a graph between July 2017 and February 2018:
https://trac.torproject.org/projects/tor/ticket/25459#comment:5

In the graph, the lowest bwauth has 35M total consensus weight, and the highest has 70M.

Adding the graph to metrics.torproject.org is on the metrics roadmap.
Until then, we can ask Karsten to re-do the graph, or we can write a script that adds the bandwidths from each vote.

Regarding 3., I think that could be included in 1. below.

I agree.

Which mode do you think should be the default?

Let's discuss the policy for differences between torflow and sbws in #27339.

  1. Ignore self-reported bandwidth, so sbws can be more secure (and then scale linearly)

Maybe it does not have to be linear; we could instead use an exponentially decaying average or another method over the sbws measurements (not self-reported).

While sbws shares the network with torflow, we need three stages:

  1. produce measured and observed bandwidths for each relay, using:
    • the most recent result or
    • an average of recent results or
    • a decaying average

See #27338.

  2. scale the sbws results, using:
    • measured bandwidth only (option 1)
    • observed bandwidth and measured bandwidth (option 2)

See #27339.

  3. scale sbws results linearly so the total consensus weight is close to torflow's

Let's talk about final scaling in #27340.

After final scaling, we also need a rounding step. See #27337.

  2. Use observed bandwidth to scale measurements, so sbws can be more like torflow

I was imagining that we'd start running it as 2., and after some more research on how to scale sbws, switch to 1.

I agree.

I think we should ask the bandwidth authority operators and network team for feedback, so we can answer both these questions.

I will send an email to tor-dev now.

I think so too. All of this would be easier to explain and discuss at the next meeting. It's still 1 month away, but anyway it won't be possible to run sbws on the dirauths until then, since there won't be a new stem release until then (#26914#comment:3).

Ok, let's make sure we have meetings for:

  • status update and overall plan
  • helping dirauths (and test network operators) install sbws
  • detailed designs and bug fixes

(Let's not have too many meetings!)

What I would propose is:

  1. make a PR with the code that stores descriptors' observed bandwidth
  2. make a PR with the code that scales as torflow does (previous graph), #27108
  3. update the sbws documentation with this (probably a new ticket)
  4. update the Debian package with a new release
  5. continue with #27107: obtain more measurements, make more graphs, think about 3. above, maybe update the bandwidth file specification

Yes, we need to update the scaling part of the bandwidth file specification. I can do that if you want. Please assign the spec ticket to me when the code is ready for review. Then we can do the spec and code and make sure they match before we merge.

  6. continue with #21377 and other little-t-tor tickets until the meeting

Let's catch up with pastly and prioritise the remaining tickets.

Please take a break if you want to.

  7. discuss in the meeting with the network team and dirauths how sbws should be scaled

Sounds good to me.

comment:23 Changed 15 months ago by juga

Simple commented approach: write the descriptor observed bandwidth at the time of measuring the relay:
https://github.com/pastly/simple-bw-scanner/pull/247
I'll do another PR with the torflow scaling using it.
I should probably change the title, since here I'm not taking the average bandwidth more often, but the observed bandwidth.
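A rough sketch of what that approach records (build_result and relay.observed_bandwidth are illustrative names; PR #247 has the real change):

def build_result(relay, measured_bandwidth, duration):
    # Illustrative only, not the real sbws Result class.
    return {
        'fingerprint': relay.fingerprint,
        'bandwidth': measured_bandwidth,  # the scanner's measured bytes/s
        'duration': duration,             # seconds the download took
        # self-reported observed bandwidth from the relay's descriptor,
        # captured at measurement time, for torflow-style scaling
        'relay_observed_bandwidth': relay.observed_bandwidth,
    }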

comment:24 Changed 15 months ago by juga

Status: assigned → needs_review

comment:26 Changed 15 months ago by juga

Resolution: implemented
Status: needs_review → closed

comment:27 Changed 12 months ago by teor

Keywords: sbws-1.0-must-closed-moved-20181128 added
Milestone: sbws 1.0 (MVP must) → sbws: 1.0.x-final

Move all closed sbws 1.0 must tickets to sbws 1.0.x-final
