Opened 8 months ago

Last modified 7 weeks ago

#25687 new defect

over-report of observed / self-measure bandwidth on fast hardware -- important to torflow / peerflow

Reported by: starlight Owned by:
Priority: Medium Milestone: Tor: unspecified
Component: Core Tor/Tor Version: Tor: 0.2.6.10
Severity: Normal Keywords: tor-bwauth, needs-research, needs-proposal?
Cc: juga Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

Have observed that on fast hardware the maximum bandwidth estimate reported by rep_hist_bandwidth_assess() and published via ri->bandwidthcapacity is frequently overstated and have seen it go as high 160% of true physical bandwidth, stay there for days.

Connected today in my mind that this may be one of the larger causes of torflow misbehavior No control system will function correctly with bad input data and without question +60% qualifies as bad--GIGO.

Problem appears to have worsened with the arrival of KIST scheduler.

Have no idea why it occurs, but have see for years, with overrating appearing relative to absolute physical link speeds and relative to Linux tc-police rate limits.

Even peerflow when it arrives will require reliable measurement data to function properly.

Child Tickets

Attachments (1)

20180920_060844_sbws_2nd sbws.png (24.5 KB) - added by juga 2 months ago.

Download all attachments as: .zip

Change History (20)

comment:1 in reply to:  description Changed 8 months ago by arma

Replying to starlight:

Have observed that on fast hardware the maximum bandwidth estimate reported by rep_hist_bandwidth_assess() and published via ri->bandwidthcapacity is frequently overstated and have seen it go as high 160% of true physical bandwidth, stay there for days.

Well, from Tor's perspective I think it really is seeing you get that level of throughput.

That is, it was able to see a sustained 10 second period where it got that average rate.

Part of the explanation might be that the kernel is saying "yep, it's sent" when actually it's just queued inside the kernel. For those cases, in theory Kist should be doing *better*, because it has a chance of saying "ok, actually I checked with the kernel and there's some stuff queued so I'm not going to try to write more quite yet". That said, the way the kernel maintains good throughput is by having a bunch of waiting stuff queued, so it can always be sending the next bit as quickly as possible rather than having to wait for user space to decide to ask it to send some more.

comment:2 Changed 8 months ago by teor

Milestone: Tor: unspecified
Version: Tor: 0.3.3.3-alpha

comment:3 Changed 8 months ago by starlight

Version: Tor: 0.3.3.3-alpha

In my view the problem constitutes a critical measurement error. The most important consumer of this value is torflow and overstating true available bandwidth distorts consensus allocations. No chance overstatements are correct, especially in the traffic-control scenario where Linux caps bandwidth with fine-grained, reliable precision. Some ISPs permit traffic to burst briefly before enforcing bandwidth restrictions, but in the cases I've seen bandwidth self-measure is well in excess of the the line burst rate and anyway reporting burst-rate maximum capacity to torflow is counterproductive. The impact of the issue may be acute due to torflow's control-signal reliance on measurements of marginal unused bandwidth rather than actual capacity.

To venture a guess, any of queuing (as suggested), chunky event-loop processing delays, course time precision, misalignment of time determination vs. data accounting are likely culprits.

Setting milestone was unintended, was thinking of the latest reproducing version.

comment:4 Changed 8 months ago by starlight

Summary: extreme over-report of observed / self-measure bandwidth on fast hardware -- critical to torflow / peerflowover-report of observed / self-measure bandwidth on fast hardware -- important to torflow / peerflow

Realized the +60% number I threw down is flawed and went back over data and configurations carefully. Nonetheless the problem is, or was significant.

Found an example where, with a tc-police rate limit, reported self-measure was 19% in excess of the limit. Daemon version 0.2.6.10. At the time of the observation the relay was overrated in the consensus 50% and flooded with ingress data at about 150% of the cap. More typically have seen continuous overrating at +5% with tc and a reasonable consensus weight.

I retract the assertion that KIST increased the problem--admit that I do not know.

Also stand by the recommendation that self-measure should not include peaks resulting from adaptive ISP rate limits and am sure that I've seen this.

comment:5 Changed 8 months ago by starlight

Version: Tor: 0.3.3.3-alphaTor: 0.2.6.10

have to walk-back the report-version for now; will run an experiment with 0.3.3

comment:6 Changed 8 months ago by teor

The current variance between bandwidth authorities is 50%.
Perhaps this issue is a contributing factor.

We're designing a new bandwidth authority codebase that doesn't rely on self-reported bandwidths.
We'll be able to prioritise this issue better once it is deployed.

comment:7 Changed 8 months ago by starlight

thank you for the update, bwscanner I presume?

I won't spend time quantifying this issue with 0.3.3, at least for now, though I did come up with a plan. If peerflow happens then measurement quality is a concern; could spend some time on the aforementioned analysis.

comment:8 in reply to:  7 Changed 8 months ago by teor

Replying to starlight:

thank you for the update, bwscanner I presume?

No, simple bandwidth scanner.
bwscanner doesn't generate files that authorities can use yet, so I don't know if it will use self-reported bandwidths for scanning.

I won't spend time quantifying this issue with 0.3.3, at least for now, though I did come up with a plan. If peerflow happens then measurement quality is a concern; could spend some time on the aforementioned analysis.

I think there are some inaccuracies we will just have to live with, and this might be one of them.

Edit: alternately, we can measure inbound data from peers, or measure acknowledged sent data, rather than raw sent data. There are plenty of options.

Last edited 8 months ago by teor (previous) (diff)

comment:9 Changed 2 months ago by juga

Cc: juga added
Keywords: tor-bwauth added

This ticket is consistent with what we've observed in torflow: when it scale the raw measurements, it uses mostly the observed bandwidth. The distribution of the observed bandwidth is almost exponential while the one of raw measurements are almost linear, what makes the final bandwidth to be read by the dirauths to have an almost exponential distribution too.

comment:10 Changed 2 months ago by starlight

The concern of this ticket is with regard to the observed value published by the daemon rather than the Torflow calculated vote resulting from it. I suspect that changes to measurement logic in the daemon have likely reduced or eliminated the error last definitively observed in 2.6.10. I may yet write a monitor to evaluate the quality of daemon measurements.

Torflow adjusts self-report with the ratio of the measured bandwidth for a relay to the average of all measured bandwidths in a manner short on nuance. Wrote about possible ways to address that here:

https://github.com/torproject/sbws/issues/182#issuecomment-410745893

Have considered opening a Trac ticket encompassing the ideas, but am waiting for an opportune moment at a point where serious comparison work is evident.

comment:11 in reply to:  10 ; Changed 2 months ago by juga

Replying to starlight:

The concern of this ticket is with regard to the observed value published by the daemon rather than the Torflow calculated vote resulting from it.

if by daemon you mean little-t-tor, i know

I suspect that changes to measurement logic in the daemon have likely reduced or eliminated the error last definitively observed in 2.6.10. I may yet write a monitor to evaluate the quality of daemon measurements.


if you do, would be awesome you point us to the code and results

Torflow adjusts self-report with the ratio

exactly, and that ratio is a small number (0.8-8 in the samples i took), therefore observed / self-measured bandwidth affects a lot on what Torflow finally reports.

of the measured bandwidth for a relay to the average of all measured bandwidths in a manner short on nuance. Wrote about possible ways to address that here:

https://github.com/torproject/sbws/issues/182#issuecomment-410745893

sorry, i found that comment hard to read

FWIW, we're continuing that ticket in multiple others. Follow the links of https://trac.torproject.org/projects/tor/ticket/27107

Have considered opening a Trac ticket encompassing the ideas, but am waiting for an opportune moment at a point where serious comparison work is evident.

on what you would like to have a serious comparison?

comment:12 in reply to:  11 Changed 2 months ago by teor

Keywords: needs-research needs-proposal? added

Replying to juga:

Replying to starlight:

The concern of this ticket is with regard to the observed value published by the daemon rather than the Torflow calculated vote resulting from it.


of the measured bandwidth for a relay to the average of all measured bandwidths in a manner short on nuance. Wrote about possible ways to address that here:

https://github.com/torproject/sbws/issues/182#issuecomment-410745893

sorry, i found that comment hard to read

I found it hard to read, too.

If we can't understand what you want, it's hard to know what we can do to improve tor.

In tickets or comments, short, specific suggestions are most helpful.

For substantial changes, please follow the proposal process:
https://gitweb.torproject.org/torspec.git/tree/proposals/001-process.txt

For substantial analysis, let's work on questions to ask researchers or metrics.
Many short questions are best, rather than large blocks of text.

FWIW, we're continuing that ticket in multiple others. Follow the links of https://trac.torproject.org/projects/tor/ticket/27107

In particular, see #27739 for our draft policy for resolving differences between sbws and torflow for sbws 1.0:

If a sbws deployment is within X% of an existing bandwidth authority, sbws is ok. (The total consensus weights of the existing bandwidth authorities are within 50% of each other, see #25459.)

In later releases, we may modify sbws to achieve better network load-balancing.

We talked about choosing an ideal distribution, but that might not be the best idea.

Instead, see Mike Perry's email:

as you make

choices about how to load balance, please have a specific goal as to
what target load balancing equilibrium point you're actually going for.

we should definitely watch the change

in our average and quartile variance performance metrics when we first
switch to sbws.

https://lists.torproject.org/pipermail/tor-dev/2018-August/013419.html

We'll have a better idea how to achieve this goal after more network measurements and research.

Have considered opening a Trac ticket encompassing the ideas, but am waiting for an opportune moment at a point where serious comparison work is evident.

Perhaps a proposal and a ticket would be best?

on what you would like to have a serious comparison?

I'm not sure what sort of comparisons we'll be doing. Mike's email has some options.

comment:13 Changed 2 months ago by starlight

The essence of Torflow's active approach is that observed bandwidth capacity at each relay is the key measurement and that it can only be reliably determined locally but that it requires adjustment, principally to account for used vs unused capacity and secondly the relative performance of each node in the asymmetric domain of internet traffic routing. IMO indisputably correct. The Peerflow paper tacitly recognizes this.

However the simple linear adjustment algorithm cannot be fine-tuned for better results across the vast range of relay performance. IIRC polynomial equations of sufficient order can describe curves of near arbitrary complexity and therefore parameterized polynomials can be used interactively, in a gradual empirical search, to describe an improving set of adjustment biases for applying scanner measurements to advertised bandwidths. This link illustrates the general principal, though the idea is to design and construct bias curves with polynomials rather then to fit them somehow.

https://en.wikipedia.org/wiki/Polynomial_regression

Averaging all relay measurements to a single value appears too simple in the face of reality. My suggestion was and is to apply some variation of a moving average in such a way that relay biases are determined relative to relays of similar capacity rather than to all relays. Note this is not at all the same as the "slice" scheme used by Torflow to group relays for measurement and I agree eliminating it was a good idea.

https://en.wikipedia.org/wiki/Moving_average

Evaluating relays against others of the same class seems a good idea. (I.e. process guards, exits and unflagged middle relays as isolated collections.) Torflow implemented it when PID logic was active, but presently it is disabled.

I believe the "filtered" and "stream" logic in Torflow increases noise in the system rather than reducing it, and functions in opposition to the intent of the design. The "filtered" treatment option should be eliminated.

Central to all the above is iterative empirical adjustment possibly assisted by automated collection and analysis, with control knobs exposed as consensus parameters. The above can easily be implemented in a manner that allows the new system to start exactly where Torflow is and to cautiously refine, with the option to reverse course at any point.

Last edited 2 months ago by starlight (previous) (diff)

comment:14 in reply to:  13 ; Changed 2 months ago by teor

Replying to starlight:

The essence of Torflow's active approach is that observed bandwidth capacity at each relay is the key measurement and that it can only be reliably determined locally but that it requires adjustment, principally to account for used vs unused capacity and secondly the relative performance of each node in the asymmetric domain of internet traffic routing. IMO indisputably correct. The Peerflow paper tacitly recognizes this.

Our parent ticket for these kinds of adjustments is #27346.

However the simple linear adjustment algorithm cannot be fine-tuned for better results across the vast range of relay performance. IIRC polynomial equations of sufficient order can describe curves of near arbitrary complexity and therefore parameterized polynomials can be used interactively, in a gradual empirical search, to describe an improving set of adjustment biases for applying scanner measurements to advertised bandwidths. This link illustrates the general principal, though the idea is to design and construct bias curves with polynomials rather then to fit them somehow.

https://en.wikipedia.org/wiki/Polynomial_regression

I have opened #27790 for this idea.

Averaging all relay measurements to a single value appears too simple in the face of reality. My suggestion was and is to apply some variation of a moving average in such a way that relay biases are determined relative to relays of similar capacity rather than to all relays. Note this is not at all the same as the "slice" scheme used by Torflow to group relays for measurement and I agree eliminating it was a good idea.

https://en.wikipedia.org/wiki/Moving_average

The ticket for implementing moving averages is #27789.

Evaluating relays against others of the same class seems a good idea. (I.e. process guards, exits and unflagged middle relays as isolated collections.) Torflow implemented it when PID logic was active, but presently it is disabled.

I opened #27791 for this idea.

I believe the "filtered" and "stream" logic in Torflow increases noise in the system rather than reducing it, and functions in opposition to the intent of the design. The "filtered" treatment option should be eliminated.

We have no plans to implement this feature in sbws. We have no plans to modify torflow.

Central to all the above is iterative empirical adjustment possibly assisted by automated collection and analysis, with control knobs exposed as consensus parameters. The above can easily be implemented in a manner that allows the new system to start exactly where Torflow is and to cautiously refine, with the option to reverse course at any point.

There are quite a few tickets open with metrics, torflow, and sbws for this kind of analysis. For example, see #7177, #24045, #24834, #25459, #21994.

The existing performance and bandwidth distribution metrics are quite good, too:
https://metrics.torproject.org/torperf.html
https://metrics.torproject.org/rs.html#map

Edit: spacing

Last edited 2 months ago by teor (previous) (diff)

comment:15 Changed 2 months ago by starlight

Thank you. I appreciate the effort to codify the ideas as tickets greatly. Sadly I do not have much time available to contribute right now.

We have no plans to implement this feature in sbws. We have no plans to modify torflow.

Of course. Mentioned this because the SBWS patch candidates I read implement the behavior, though perhaps this has changed since I looked. If the logic is present suggest having a consensus parameter to determine whether or not the behavior is in force. Best of luck!

comment:16 in reply to:  15 ; Changed 2 months ago by teor

Replying to starlight:

Thank you. I appreciate the effort to codify the ideas as tickets greatly. Sadly I do not have much time available to contribute right now.

We have no plans to implement this feature in sbws. We have no plans to modify torflow.

Of course. Mentioned this because the SBWS patch candidates I read implement the behavior, though perhaps this has changed since I looked. If the logic is present suggest having a consensus parameter to determine whether or not the behavior is in force. Best of luck!

Oh right, maybe I'm wrong then.

Juga, does sbws implement torflow's "filtered"option?

comment:17 in reply to:  16 Changed 2 months ago by juga

Replying to teor:

Replying to starlight:

Of course. Mentioned this because the SBWS patch candidates I read implement the behavior, though perhaps this has changed since I looked. If the logic is present suggest having a consensus parameter to determine whether or not the behavior is in force. Best of luck!

Oh right, maybe I'm wrong then.

Juga, does sbws implement torflow's "filtered"option?

yes, i did [0], since we wanted sbws to be able to behave like torflow (and it does [1]) to later decide which scaling algorithm we should have.
The filtered option doesn't change much, since torflow is taking the maximum between that and the stream bandwidth. See the attached graph.
The point i was trying to make in #comment:11 is:

  • we need to come out with a better algorithm that takes into account differently observed bandwidth and measured bandwidth. Measured bandwidth has little value in Torflow.
  • we (/i) might need to review how observed bandwidth is obtained

I think i'll be able to explain this better in a week, since these comments get messy across tickets (my fault)

[0] https://github.com/torproject/sbws/blob/master/sbws/lib/v3bwfile.py#L762
[1] https://trac.torproject.org/projects/tor/attachment/ticket/27338/20180905_092840_torflow_sbws.png

Changed 2 months ago by juga

comment:18 in reply to:  14 Changed 2 months ago by starlight

Replying to teor:

Averaging all relay measurements to a single value appears too simple in the face of reality. My suggestion was and is to apply some variation of a moving average in such a way that relay biases are determined relative to relays of similar capacity rather than to all relays. Note this is not at all the same as the "slice" scheme used by Torflow to group relays for measurement and I agree eliminating it was a good idea.

https://en.wikipedia.org/wiki/Moving_average

The ticket for implementing moving averages is #27789.

BTW what I mean here is a moving average where the X-axis is not a time series of measurements for a single relay, but where the X-axis is, for a single point-in-time consensus calculation, the ordered set of "observed bandwidth" measurements. I believe this could improve the behavior of consensus voting dramatically. The idea is that superfast relays should have their self-measure adjusted relative to the performance of superfast relays, that medium speed relays be compared with other medium speed relays, and low-capacity relays likewise are compared with other similarly slow relays. The width of the averaging window could vary depending on the X value of observed bandwidth, the behavior easily parameterized any number of ways. Please note my emphasis on "observed bandwidth" specifically, rather than the effective self advertised bandwidth that might be limited by "maximum advertised" bandwidth. Certainly the max advertised value should be used in calculating the vote, but for comparing similar relays actual capacity seems more appropriate. But, this could also vary depending on a control parameter and it could be tried both ways.

Of course the above will bring the high-end of the measurement offset ratio down from 8.0 to +/- in the neighborhood of 1.0, but the the polynomial control bias curve can be used to restore the emphasis on fast relays over slower relays though perhaps at less than the current extreme. Per my initial comment in SBWS issue 182, I recommend two bias curves, one applied to positive offsets and the other applied to negative offsets.

I believe it is critical to look at individual relays when evaluating results as well as graphical comparisons of all relays. A few gnarly spread-sheets or python scripts similar in concept to the ones I have posted will help illuminate both the micro and macro outcomes of different approaches.

I am an engineer rather than a mathematician, and everything I've suggested falls into the category of "empirical trial-and error." The problem of balancing the Tor network is so complex I believe it defies deterministic analysis in the way that weather does. The crux of it is taking an idea that does work (Torflow spec section 2 logic) an refining it via cautious experiment while keeping it reasonably simple. Nothing I've suggested requires more than the sort of medium complex math learned in advanced high-school course work. I was taught the basics of polynomial expressions in middle school (by an unusually gifted instructor whom I vividly recall, no doubt). Balancing the consensus is a dynamic feedback-loop control system of the sort where prediction modeling with any semblance of precision is impossible.

Instead of writing proposals, why not spend two or three weeks just coding it all up and then try it out in various mix/match combinations of the three general suggestions along with other research currently under way? Similar to how most Internet protocols came into existence. More often than not RFCs were written after the code was working and was already deployed in limited production.

Finally, I should reiterate that everything suggested can be "turned off" simply by applying parameter values. The polynomial bias curves can be disabled with 1 * x 0 (i.e a line where y=1), the moving average disabled by setting the window width to infinity, and classful grouping of relays disabled in a fashion similar to Torflow with an on/off toggle. In particular the bias curves and moving averages can be tested gently in increments and hasty retreats made when outcomes arrive in a manner not desired.

Last edited 7 weeks ago by starlight (previous) (diff)

comment:19 Changed 2 months ago by starlight

After writing the above I realized that perhaps the recommendation should be that three separate bias curves be constructed:

a) curve to affect the degree to which positive offsets of a relay's measurement to the average measurement is applied to the effective self-measure

b) curve to affect the degree to which negarive offsets of a relay's measurement to the average measurement is applied to the effective self-measure

c) curve to weight bias the result once either (a) or (b) have been applied to effective self measure

(a) and (b) are a fancier version of Torflow's Kp value, and (c) is intended to allow a designed rather than accidental biasing of faster over slower relays that results from the ratio offset approach. Since a moving average should remove the current behavior where fast relays have very high multiples over the all-relay average measurement, the (c) curve allows thoughtful restoration of a similar effect and IMO the combination of a moving average and the (c) bias curve can be used to solve the problem of outlier prone scoring w/r/t to individual relays.

Perhaps (a) and (b) are overkill, and will end up as 0.95, 1.00 or 1.05 * x 0 or the like, but since it's easy to code putting it in entails no downside and it could be useful.

Last edited 7 weeks ago by starlight (previous) (diff)
Note: See TracTickets for help on using tickets.