Opened 7 years ago

Closed 7 years ago

#5755 closed enhancement (implemented)

Atlas could show "fraction of Tor network by weight" graphs over time?

Reported by: arma
Owned by: karsten
Priority: Medium
Component: Metrics/Relay Search
Cc: ln5, hellais, mikeperry
Parent ID: #6460

Description

Atlas currently has "bandwidth carried by the relay" over time graphs, which are great.

How about a "fraction of the weight for that relay compared to total weights" over time graph?

I expect most relays will be ~0%, but some of the big ones will be 1% or 3% or something. And whether they go from 1% to 3% or back could help to debug issues like the one Andy asks about in http://archives.seul.org/tor/relays/May-2012/msg00001.html


Attachments (3)

consensus-weights-Amunet1.png (119.1 KB) - added by karsten 7 years ago.
Fraction of Amunet1's consensus weights
consensus-weights-noiseexit01a.png (96.0 KB) - added by karsten 7 years ago.
Fraction of noiseexit01a's consensus weights
path-selection-weights-2012-07-17.png (236.8 KB) - added by karsten 7 years ago.


Change History (22)

comment:1 Changed 7 years ago by arma

We might also want, for relays with the Exit flag, to know what fraction of the Exit capacity they are over time.

comment:2 Changed 7 years ago by karsten

Component: Atlas → Analysis
Owner: changed from hellais to karsten
Status: new → assigned

I'm going to make a graph for noisetor-01 and a few others manually to see if they'd be useful to have in Atlas. Just to be clear, you're referring to consensus weights here, not bandwidth histories, right?

If the graphs turn out to be useful, the next step will be to make Onionoo provide the data for Atlas. Once we have that, we can make graphs in Atlas.

comment:3 in reply to: 1 Changed 7 years ago by karsten

Replying to arma:

> We might also want, for relays with the Exit flag, to know what fraction of the Exit capacity they are over time.

If these graphs are all about consensus weights, we can have a) fraction of all weights, b) fraction of Exit weights, c) fraction of Guard weights, d) fraction of non-Guard and non-Exit weights, and so on.
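A minimal sketch of how the subset fractions listed above could be computed from a single consensus, assuming each relay is reduced to its consensus weight and flag set. The Relay type, the subset names, and the example relays are hypothetical, not Onionoo code:

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List


@dataclass(frozen=True)
class Relay:
    """Hypothetical, minimal view of one consensus entry."""
    nickname: str
    consensus_weight: int  # value from the relay's "w Bandwidth=" line
    flags: FrozenSet[str] = frozenset()


def weight_fraction(relay: Relay, relays: List[Relay],
                    in_subset: Callable[[Relay], bool]) -> float:
    """Return the relay's share of the total consensus weight of the subset
    selected by in_subset, or 0.0 if the relay is not in that subset."""
    if not in_subset(relay):
        return 0.0
    total = sum(r.consensus_weight for r in relays if in_subset(r))
    return relay.consensus_weight / total if total else 0.0


# Subsets a) to d) from the comment above, all restricted to Running relays.
SUBSETS: Dict[str, Callable[[Relay], bool]] = {
    "all": lambda r: "Running" in r.flags,
    "exit": lambda r: {"Running", "Exit"} <= r.flags,
    "guard": lambda r: {"Running", "Guard"} <= r.flags,
    "neither": lambda r: ("Running" in r.flags
                          and not ({"Exit", "Guard"} & r.flags)),
}
```

For example, with A (weight 300, Running/Guard/Exit), B (100, Running/Guard), and C (600, Running), A's fraction is 0.3 against all relays, 0.75 against Guard relays, and 1.0 against Exit relays: the smaller the comparison set, the larger the fraction.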

comment:4 in reply to: 2 Changed 7 years ago by arma

Replying to karsten:

> Just to be clear, you're referring to consensus weights here, not bandwidth histories, right?

Right.

comment:5 Changed 7 years ago by ln5

Cc: ln5 added

Changed 7 years ago by karsten

Attachment consensus-weights-Amunet1.png added: Fraction of Amunet1's consensus weights

comment:6 Changed 7 years ago by karsten

Here's a sample graph for Amunet1 (I didn't find a relay with nickname noisetor*; what's the correct nickname of that relay?):

Fraction of Amunet1's consensus weights

The three lines show Amunet1's fraction of consensus weights compared to different subsets of relays. The first two lines are what you asked for in this ticket.

The third line shows what happens if we further reduce the subset of relays to compare Amunet1 with by also requiring the Guard flag. The fraction goes up, because the subset is smaller.

Likewise, if Amunet1 didn't have the Exit flag, we could draw lines for "Running & !Exit" and for "Running & Guard & !Exit". Not sure how relevant the graphs with three flags are, but it's possible to make them. We should decide early if we want them or not.

comment:7 in reply to: 6 Changed 7 years ago by arma

Replying to karsten:

> (I didn't find a relay with nickname noisetor*; what's the correct nickname of that relay?):

r noiseexit01c OkFUc4VPnwgvFu7N/kNiGP4Raeo EK8fXk8HYg/rrJo+HhkYMM4AkTk 2012-05-02 19:25:28 173.254.216.68 443 80
r noiseexit01d nJizj+JwVGxpIF4WBHuNRru7BEc x6OA4GZvKSPO7ZbkqYYT8ghfbDQ 2012-05-02 19:25:24 173.254.216.69 443 80
r noiseexit01a +X87FT/tZgQjDNSXo9HpgVsAdjY 5OJJhPvFKVHSgABNPnA200JUNVY 2012-05-02 19:25:25 173.254.216.66 443 80
r noiseexit01b /mgwcEq5UxiXia4ExbIt40yiudg O1sdvdEpaIf4RpTgkh5ic6/AWeA 2012-05-02 19:25:27 173.254.216.67 443 80

Changed 7 years ago by karsten

Attachment consensus-weights-noiseexit01a.png added: Fraction of noiseexit01a's consensus weights

comment:8 in reply to: 7 Changed 7 years ago by karsten

Replying to arma:

> Replying to karsten:

>> (I didn't find a relay with nickname noisetor*; what's the correct nickname of that relay?):

> r noiseexit01c OkFUc4VPnwgvFu7N/kNiGP4Raeo EK8fXk8HYg/rrJo+HhkYMM4AkTk 2012-05-02 19:25:28 173.254.216.68 443 80
> r noiseexit01d nJizj+JwVGxpIF4WBHuNRru7BEc x6OA4GZvKSPO7ZbkqYYT8ghfbDQ 2012-05-02 19:25:24 173.254.216.69 443 80
> r noiseexit01a +X87FT/tZgQjDNSXo9HpgVsAdjY 5OJJhPvFKVHSgABNPnA200JUNVY 2012-05-02 19:25:25 173.254.216.66 443 80
> r noiseexit01b /mgwcEq5UxiXia4ExbIt40yiudg O1sdvdEpaIf4RpTgkh5ic6/AWeA 2012-05-02 19:25:27 173.254.216.67 443 80

Aha. There we go. Let me know if you want graphs for the other noiseexits, too.

comment:9 Changed 7 years ago by karsten

Cc: hellais added
Component: Analysis → Onionoo

It seems the graph was at least somewhat useful, right? Moving forward by making this an Onionoo task to provide the required data for Atlas.

What bandwidth-based graphs would we want Atlas to present? I'm asking because I'd like to implement all bandwidth-related extensions at once. Note that each graph requires up to 5 KB of data per relay in bandwidth documents. We should only add data for graphs that we think will be quite useful. We'll also have to think about Atlas' interface, because it shouldn't end up displaying 6 x 6 graphs on a relay's details page.

a) Written and read bytes (we already have those, and they're actually 2 x 5 KB per relay in size).

b) Written and read directory request bytes.

c) Advertised bandwidth: the minimum of the bandwidth rate, burst, and observed bandwidth as reported by the relay in its server descriptor.

d) Consensus weight fraction as compared to all other relays in the network.

e) Consensus weight fraction as compared to all other relays in the network having the same Exit and Guard flags as the relay in question. If we provide this graph, we should use all flags that are relevant for clients to make path-selection decisions (are Exit and Guard the correct ones?). Having graphs for single flags (e.g., comparing all relays with the Exit flag in a single graph) is probably too much.

f) Consensus weight as absolute number instead of a fraction.
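For c), a sketch of computing advertised bandwidth from the "bandwidth" line of a server descriptor (average rate, burst, and observed bandwidth, in bytes per second) and turning it into a network-wide fraction. It assumes the descriptor lines have already been fetched and keyed by fingerprint; the function names are made up for illustration:

```python
from typing import Dict, Tuple


def parse_bandwidth_line(line: str) -> Tuple[int, int, int]:
    """Parse a server descriptor line of the form
    "bandwidth <avg> <burst> <observed>" (bytes per second)."""
    keyword, *values = line.split()
    if keyword != "bandwidth" or len(values) != 3:
        raise ValueError(f"not a bandwidth line: {line!r}")
    rate, burst, observed = (int(v) for v in values)
    return rate, burst, observed


def advertised_bandwidth(rate: int, burst: int, observed: int) -> int:
    """Advertised bandwidth as defined in c): the minimum of the three."""
    return min(rate, burst, observed)


def advertised_fractions(lines: Dict[str, str]) -> Dict[str, float]:
    """Map each relay (keyed by fingerprint) to its share of the total
    advertised bandwidth of all relays passed in."""
    adv = {fp: advertised_bandwidth(*parse_bandwidth_line(line))
           for fp, line in lines.items()}
    total = sum(adv.values())
    return {fp: (bw / total if total else 0.0) for fp, bw in adv.items()}
```

For example, "bandwidth 1000 2000 500" and "bandwidth 1500 1500 3000" give advertised bandwidths of 500 and 1500, so the two relays get fractions of 0.25 and 0.75.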

These graphs would use the same intervals as the current graphs: 3 days, 1 week, 1 month, 3 months, 1 year, 5 years, unless some of these intervals don't make sense for some graphs, of course.

comment:10 Changed 7 years ago by karsten

Roger and I today talked about the list of possible bandwidth data to be provided by Onionoo. We came up with these three new graphs:

  1. Fraction of advertised bandwidth in the network (based on c above). This metric is similar to the consensus weight fraction, except that current consensus weights are based on measured bandwidths; it shows what the fraction would look like if we returned to self-reported bandwidths.
  2. Consensus weight fraction as compared to all other relays in the network (same as d above).
  3. Approximate probability for a relay to be selected by clients (based on e above). This metric is based on consensus weights and the Exit and Guard flags. I'm not yet sure how this metric would be calculated. Maybe it's something like 1/3 * (P(selected as guard) + P(selected as middle node) + P(selected as exit)).

Maybe we should make sample graphs for a few relays first and then decide whether these are the graphs we want Onionoo/Atlas to provide.

Arturo and I also talked about improving Atlas' interface to display different types of bandwidth graphs. Instead of having six graphs for different time periods of the same data, we could have a single graph with six buttons to switch between the intervals. That would also leave room for adding new bandwidth graphs.

Changed 7 years ago by karsten

Attachment path-selection-weights-2012-07-17.png added

comment:11 Changed 7 years ago by karsten

Here's a sample graph of nine relays in June 2012. This graph contains five lines for every relay:

  • advbw_frac is the relative advertised bandwidth of a relay compared to the total advertised bandwidth in the network. This is case 1 in the comment above.
  • cw_frac is the fraction of a relay's consensus weight compared to the sum of all consensus weights in the network. This is case 2 above.
  • P_guard is the approximate probability of a relay to be selected for the guard position. Consensus weights are weighted by Wgd and Wgg, whereas Wge and Wgm are both set to 0. This is part of case 3 above.
  • P_middle is the approximate probability of a relay to be selected for the middle position. Consensus weights are weighted by Wmd, Wmg, Wme, and Wmm. This is part of case 3 above.
  • P_exit is the approximate probability of a relay to be selected for the exit position. Consensus weights are weighted by Wed and Wee, whereas Weg and Wem are both set to 0. This is part of case 3 above.
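Here's a re-derivation of the three P_* bullets as a sketch, not the committed Java code: every relay's consensus weight is multiplied by the bandwidth weight matching its Guard/Exit flags and the position, with the weights listed above as 0 left out, and then normalized over all relays. It assumes the Wxy values are read from the consensus "bandwidth-weights" line; since each position is normalized separately, their common scale factor cancels out:

```python
from typing import Dict, FrozenSet, Optional, Tuple

POSITIONS = ("guard", "middle", "exit")


def weight_key(flags: FrozenSet[str], position: str) -> Optional[str]:
    """Pick the bandwidth-weight name for a relay with these flags in the
    given position, or None where this sketch treats the weight as 0
    (Wge/Wgm for guard, Weg/Wem for exit, as described above)."""
    guard, exit_ = "Guard" in flags, "Exit" in flags
    if position == "guard":
        return "Wgd" if guard and exit_ else "Wgg" if guard else None
    if position == "middle":
        if guard and exit_:
            return "Wmd"
        return "Wmg" if guard else "Wme" if exit_ else "Wmm"
    if position == "exit":
        return "Wed" if guard and exit_ else "Wee" if exit_ else None
    raise ValueError(f"unknown position: {position}")


def selection_probabilities(
        relays: Dict[str, Tuple[int, FrozenSet[str]]],
        weights: Dict[str, int]) -> Dict[str, Dict[str, float]]:
    """relays maps fingerprint -> (consensus weight, flags); weights holds
    the consensus bandwidth-weights. Returns fingerprint -> position -> P."""
    probs: Dict[str, Dict[str, float]] = {fp: {} for fp in relays}
    for position in POSITIONS:
        weighted = {}
        for fp, (cw, flags) in relays.items():
            key = weight_key(flags, position)
            weighted[fp] = cw * weights[key] if key else 0
        total = sum(weighted.values())
        for fp, value in weighted.items():
            probs[fp][position] = value / total if total else 0.0
    return probs
```

With toy weights Wgg=Wmg=Wme=Wmm=Wee=10000 and Wgd=Wmd=Wed=5000, and relays A (weight 100, Guard+Exit), B (100, Guard), C (200, Exit), D (100, no flags), P_guard is 1/3 for A and 2/3 for B, P_exit is 0.2 for A and 0.8 for C, and P_middle is 4/9 for C. These numbers only illustrate the arithmetic; they are not taken from a real consensus.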

So, the first two lines are easy to compute. I'm less sure about the probabilities, though. I found that path selection is a lot more complex than I thought. For example, the subset of relays that are suitable for a given position in a given circuit varies from circuit to circuit, which can cut the subset in half. I tried to compute an "average" probability of selecting a given relay for a given position. I'm not sure how close to reality these probabilities are.
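To illustrate why only an "average" probability falls out of the consensus, here's a sketch of how one relay's chance changes when a particular circuit rules out part of the candidate set. It assumes plain weighted random selection over whatever remains; why relays get excluded is left to the caller, and the function name is hypothetical:

```python
from typing import Dict, Iterable


def conditional_probability(weighted: Dict[str, float], relay: str,
                            excluded: Iterable[str] = ()) -> float:
    """Chance of picking `relay` by weighted random choice after removing
    the relays in `excluded` from the candidate set for this circuit."""
    excluded = set(excluded)
    if relay in excluded:
        return 0.0
    total = sum(w for fp, w in weighted.items() if fp not in excluded)
    return weighted[relay] / total if total else 0.0
```

With weights A=1, B=1, C=2, A's chance is 0.25; if C, holding half of the total weight, is excluded for this circuit, it doubles to 0.5. A number computed from the consensus alone can't know which relays a given circuit excludes.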

comment:12 Changed 7 years ago by karsten

Cc: mikeperry added

Mike, can you comment on how closely the probabilities above (P_guard, P_middle, and P_exit) resemble reality?

comment:13 Changed 7 years ago by mikeperry

It sounds like you're doing it right, though if you hardcoded Wge and Wgm at 0, be aware that they're only 0 now because exits are scarce relative to Guard capacity.

I'd probably need to see the equations you used to produce those P_* values to be sure, actually.

Also, during his guard research, Tariq discovered that Family lines have a rather substantial effect on node selection probabilities. The conditional probability of choosing additional nodes once certain families are selected for other nodes varies dramatically by family, apparently... But that's perhaps not worth worrying about for this ticket.

comment:14 in reply to: 13 Changed 7 years ago by karsten

Replying to mikeperry:

> It sounds like you're doing it right, though if you hardcoded Wge and Wgm at 0, be aware that they're only 0 now because exits are scarce relative to Guard capacity.

Hmm, can we even pick a relay without the Guard flag for the guard position?

It took me a bit longer to decide on hard-coding Weg and Wem to 0. There might be relays with weird exit policies which don't have the Exit flag, but which could be selected for the exit position. On the other hand, it's just too weird to see a relay with reject *:* that has a non-zero probability for being picked as exit. But I don't know if hard-coding the weights to 0 is a good idea here. It's a simple solution, though. :)

> I'd probably need to see the equations you used to produce those P_* values to be sure, actually.

I just cleaned up and committed the Java code that I used to produce the P_* values. You'll probably be interested in the part beginning in line 174.

> Also, during his guard research, Tariq discovered that Family lines have a rather substantial effect on node selection probabilities. The conditional probability of choosing additional nodes once certain families are selected for other nodes varies dramatically by family, apparently... But that's perhaps not worth worrying about for this ticket.

I noticed that there are quite a few influences on path selection which we can't model. I ran a modified client that prints out some details about its path-selection decisions. At times, it picked a relay from fewer than half of the relays in the network (though it had descriptors for all of them). I didn't investigate the reasons for disregarding all the other relays. It could be families, /16's, or anything else. I guess what I'm looking for here is an average probability, preferably one that can be computed rather easily from looking at the consensus only. Also, I'd like to use whatever P_guard and P_exit we come up with for the network diversity calculation in #6232.

Thanks!

comment:15 in reply to: 14 Changed 7 years ago by mikeperry

Replying to karsten:

> Replying to mikeperry:

>> It sounds like you're doing it right, though if you hardcoded Wge and Wgm at 0, be aware that they're only 0 now because exits are scarce relative to Guard capacity.

> Hmm, can we even pick a relay without the Guard flag for the guard position?

Err duh, not normally. I misbrained the special case hack we do to allow bridges to be weighted like guards (Wgm=Wgg), but that doesn't matter to you.

> It took me a bit longer to decide on hard-coding Weg and Wem to 0. There might be relays with weird exit policies which don't have the Exit flag, but which could be selected for the exit position. On the other hand, it's just too weird to see a relay with reject *:* that has a non-zero probability for being picked as exit. But I don't know if hard-coding the weights to 0 is a good idea here. It's a simple solution, though. :)

Yeah, you're right. I think you can leave it as-is, because even for the weird exit policy case, it's not representative of the total traffic flow.

>> I'd probably need to see the equations you used to produce those P_* values to be sure, actually.

> I just cleaned up and committed the Java code that I used to produce the P_* values. You'll probably be interested in the part beginning in line 174.

This looks good to me.

comment:16 in reply to: 15 Changed 7 years ago by karsten

Replying to mikeperry:

> Replying to karsten:

>> I just cleaned up and committed the Java code that I used to produce the P_* values. You'll probably be interested in the part beginning in line 174.

> This looks good to me.

Thanks for taking a look!

comment:17 Changed 7 years ago by karsten

Parent ID: #6460

This ticket is related to metrics to measure the safety of the Tor network, too.

I'm also making good progress on it. The code is implemented and tested, but it might still be too slow for deployment. Hoping to fix that this week. Once Onionoo has the data, Atlas needs to graph it.

comment:18 Changed 7 years ago by karsten

Component: Onionoo → Atlas

The changes to Onionoo are now implemented and deployed. Onionoo offers a new GET /weights resource that returns weights (advertised bandwidth fraction, consensus weight fraction, guard/middle/exit probability) to be graphed by Atlas. Making this an Atlas ticket again. hellais, want to grab this ticket?
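For whoever picks this up on the Atlas side, a sketch of turning one weights history into plottable points. It assumes the graph-history layout from the Onionoo protocol documentation ("first" and "last" UTC timestamps, "interval" in seconds, and normalized integer "values" that are multiplied by "factor", with null for missing data). Field and period names may differ between protocol versions, so check the spec for the version being queried; decode_history is a hypothetical helper:

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


def decode_history(history: dict) -> List[Tuple[datetime, Optional[float]]]:
    """Turn one graph-history object into (timestamp, value) pairs.

    Assumes "first" is formatted as "YYYY-MM-DD hh:mm:ss" (UTC), values are
    spaced "interval" seconds apart, and each non-null value must be
    multiplied by "factor" to recover the original number."""
    first = datetime.strptime(history["first"], "%Y-%m-%d %H:%M:%S")
    step = timedelta(seconds=history["interval"])
    factor = history["factor"]
    return [(first + i * step, None if v is None else v * factor)
            for i, v in enumerate(history["values"])]
```

The input would be one history object from a relay's weights document, for example the object stored under a period key inside "guard_probability"; the exact keys are assumptions here. Missing values stay None so that a graph can show gaps instead of drawing zeros.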

comment:19 Changed 7 years ago by karsten

Resolution: implemented
Status: assigned → closed

Atlas now contains weights graphs at the bottom of details pages. That concludes this ticket. Closing.
