We should write a script to parse past consensuses and display the ratio values computed for different speeds and types of nodes. (The ratio is the "w Bandwidth=" line divided by the descriptor's observed bandwidth value.) This would provide us with output similar to the statsplitter.py script (https://ides.fscked.org/transient/stats.log).
It would be useful to plot this data over time.
We can also export more raw stats from the scanners themselves, including measured stream data and time pairs, and circuit failure rates.
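A minimal sketch of what such a script could look like in Python. The file paths and the nickname-based matching are placeholders (a real script should match on identity digests), and the unit comments are assumptions worth double-checking against dir-spec:

```python
# Minimal sketch (not the final script): compute the ratio of consensus
# bandwidth weight to descriptor-observed bandwidth for each relay.
import re

def consensus_weights(path):
    """Map relay nickname -> 'w Bandwidth=' value from a consensus file.
    (A real script should key on the identity digest, not the nickname.)"""
    weights, nick = {}, None
    with open(path) as f:
        for line in f:
            if line.startswith("r "):
                nick = line.split()[1]
            elif line.startswith("w ") and nick:
                m = re.search(r"Bandwidth=(\d+)", line)
                if m:
                    weights[nick] = int(m.group(1))  # kB/s per dir-spec
    return weights

def observed_bandwidth(path):
    """Read the 'bandwidth <average> <burst> <observed>' line of a
    server descriptor; values are in bytes per second."""
    with open(path) as f:
        for line in f:
            if line.startswith("bandwidth "):
                return int(line.split()[3])
    return None

# ratio = weight_kBps / (observed_Bps / 1000.0), assuming 1 kB = 1000 B
```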
> We should write a script to parse past consensuses and display the ratio values computed for different speeds and types of nodes. (The ratio is the "w Bandwidth=" line divided by the descriptor's observed bandwidth value.) This would provide us with output similar to the statsplitter.py script (https://ides.fscked.org/transient/stats.log).
> It would be useful to plot this data over time.
We already have metrics-db which parses past consensuses and server descriptors and puts the described information into a database. We also have metrics-web which uses R to query the database and generate CSV files or graphs. If we want these data on a regular basis and without having to care about yet another script, let's extend metrics-db and metrics-web.
> We can also export more raw stats from the scanners themselves, including measured stream data and time pairs, and circuit failure rates.
We can extend the metrics-db database schema to hold these measurements, too.
Maybe we should discuss the required changes on IRC and add a summary here?
Right now I am thinking it would be illustrative to graph ratio history for Guard, Middle, Exit, and Guard+Exit for the whole network, as well as for the top 10% and the bottom 10%. We can do that with the data in the consensus documents+descriptors already.
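A rough sketch of how that bucketing could work, assuming we already have (flags, observed, weight) triples from parsing (the tuple layout is made up for illustration; both bandwidth values should use the same unit before taking ratios):

```python
# Sketch: bucket relays into Guard / Middle / Exit / Guard+Exit and compare
# mean ratios for the bottom and top 10% by observed bandwidth. `relays` is
# an assumed list of (flags, observed_bw, consensus_weight) tuples.

def category(flags):
    guard, exit_flag = "Guard" in flags, "Exit" in flags
    if guard and exit_flag:
        return "Guard+Exit"
    if guard:
        return "Guard"
    if exit_flag:
        return "Exit"
    return "Middle"

def decile_ratios(relays):
    by_cat = {}
    for flags, observed, weight in relays:
        if observed > 0:
            by_cat.setdefault(category(flags), []).append((observed, weight))
    for cat, vals in sorted(by_cat.items()):
        vals.sort()  # ascending by observed bandwidth
        n = max(1, len(vals) // 10)
        bottom = [w / o for o, w in vals[:n]]
        top = [w / o for o, w in vals[-n:]]
        yield cat, sum(bottom) / len(bottom), sum(top) / len(top)
```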
Adding new stats from the individual bw authorities is a longer term task. We should possibly make a different ticket for that one.
I'm not exactly sure what history graphs you have in mind. Also, maybe we should look closer at a single consensus before aggregating everything over time. Please have a look at the attached PDF which is based on the consensus published at 2011-01-24 00:00:00.
Page 1 shows a scatterplot of measured bandwidth as reported in consensuses and observed bandwidth as reported in descriptors. Note that observed really means observed, not advertised = min(observed, average). The graph shows that we prefer fast relays by giving them weights of more than 5 times their observed bandwidth (dashed line). There's no apparent preference for fast relays with or without the Guard/Exit flags.
Page 2 zooms in on the lower-left part of page 1. We can see that we're discriminating against slow Exit and Middle relays, maybe even against all slow relays.
Page 3 shows cumulative distribution functions of the ratios. The point where a line crosses ratio 1 is where relays turn from being rated down to being rated up. For example, we rate only 1/3 of Guard and Guard+Exit nodes down, but more than 3/4 of the Exit and Middle nodes. Also, about 5% of Middle and 10% of Exit relays have a ratio of exactly 1.
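For anyone reproducing page 3, a sketch of the ECDF computation; the `ratios` dict mapping category name to a list of ratio values is an assumed input:

```python
# Sketch: empirical CDFs of the ratios per category, plus the fraction of
# relays "rated down" (ratio < 1).

import numpy as np
import matplotlib.pyplot as plt

def plot_ecdfs(ratios):
    for cat, values in sorted(ratios.items()):
        xs = np.sort(values)
        ys = np.arange(1, len(xs) + 1) / len(xs)
        plt.step(xs, ys, where="post", label=cat)
        print("%s: %.0f%% rated down" % (cat, 100 * np.mean(xs < 1)))
    plt.axvline(1.0, linestyle="--")  # ratio 1 = neither up- nor down-rated
    plt.xscale("log")
    plt.xlabel("measured / observed bandwidth ratio")
    plt.ylabel("cumulative fraction of relays")
    plt.legend()
    plt.show()
```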
Did you already open a ticket for new stats from bw authorities? If so, we can reassign this ticket to component Metrics.
I'm not sure if this ticket should be a child of #2532 (moved). The original task description had two ideas in it: 1) compare self-reported to measured bandwidth and 2) export more raw data from the bandwidth scanners. We already have the data for 1) and only need to come up with useful graphs (see comments above). But in order to implement 2), we'll have to find a good data format, start collecting the data, analyze them to see what we can learn from them, and finally come up with useful graphs. I'd like to keep 1) and 2) separate tasks.
I'm changing this ticket to component Metrics and removing the parent ticket relation.
The next step for this ticket would be to discuss the attached graph or what graphs in general we want for comparing self-reported to measured bandwidth.
Trac: Component: Torflow to Metrics; Owner: mikeperry to karsten; Parent: #2532 (moved) to N/A
Karsten, while you're working on the system to generate this data for the consensus, can you also try to make your code easily tunable to produce it for the votes too? Seeing that data in comparison with the consensus would also be very informative.
Karsten: Re your PDF, I think you're right. Let's just snapshot each vote and consensus individually for now, but archive about a week's worth.
I think that graph 3 is most useful for the consensus itself, but can we also produce a version of the CDF that weights itself by Fraction of Consensus Bandwidth?
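Something like this sketch is what I have in mind: each relay contributes its share of total consensus bandwidth instead of 1/n (the input pair list is hypothetical):

```python
# Sketch: a CDF weighted by fraction of consensus bandwidth. `pairs` is an
# assumed list of (ratio, consensus_weight) tuples for one relay category.

import numpy as np

def weighted_ecdf(pairs):
    pairs = sorted(pairs)  # ascending by ratio
    xs = np.array([r for r, _ in pairs])
    ws = np.array([w for _, w in pairs], dtype=float)
    ys = np.cumsum(ws) / ws.sum()  # cumulative bandwidth fraction
    return xs, ys  # plot with plt.step(xs, ys, where="post")
```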
For the votes, it may also be useful to include those labeled scatterplots from graphs 1 and 3, so we can spot-check how the different bw authorities are voting for each relay. (Unless you can think of a better way to visualize this?)
Oh, we also want a separate category on each graph for relays that run the default exit policy. Maybe call them SuperExits, so we would also have Guard+SuperExits.
> observation 1 is that there are basically no unmeasured guards
<mikeperry> is that spike at 1 due to unmeasured exits?
> yes, i think so.
> so more than half of the guards are measured at better than their advertised
> 60% or so
> whereas more than half the exits and middles are measured at less
> i wonder if the guards that are measured as better are that way because they recently became guards, so they don't have as much load yet
I focused on the data part first and left out the visualization part for the moment.
In the attachment you'll find a CSV file with relays contained in consensuses on January 24, 2011 (the same day that I also used in the graphs above). This CSV file contains 6 bandwidth values for each relay: observed bandwidth from the descriptor, bandwidth weight from the consensus, and the bandwidth weights as measured by the 4 bandwidth authorities. Also, there are 6 categories for each relay for the combinations of (Guard, !Guard) x (Exit with default exit policy, Exit with custom exit policy, !Exit).
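As a starting point for processing it, a sketch like the following could compute vote and consensus ratios per relay. The column names below are hypothetical placeholders, since the actual header isn't reproduced in this comment:

```python
# Sketch for reading the CSV; column names are hypothetical placeholders.

import csv

def relay_ratios(path):
    with open(path) as f:
        for row in csv.DictReader(f):
            observed = float(row["observed"])  # descriptor value
            if observed <= 0:
                continue
            consensus_ratio = float(row["consensus"]) / observed
            vote_ratios = [float(row[k]) / observed
                           for k in ("auth1", "auth2", "auth3", "auth4")
                           if row.get(k)]  # not every auth measures everything
            yield row["category"], consensus_ratio, vote_ratios
```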
The code to process descriptor tarballs and generate such a CSV file is in the metrics-tasks repository here.
I'm going to look into visualizations next week. If you have any suggestions, please let me know.
There are three new graphs attached to this ticket:
The first graph shows empirical cumulative distribution functions of ratios of measured to self-reported bandwidth for six categories of relays in the consensus from January 24, 2011 at 00:00:00 UTC.
The second graph compares ratios in votes to the ratios in the consensus for the six categories. The black line is the consensus, the gray lines are the four votes.
The third graph uses measured bandwidth weights on the y axis rather than absolute relay numbers. This graph is somewhat misleading, because it over-represents, well, relays with high measured bandwidth (surprise!). I wonder if there's a better weight to use here. The second graph is probably more useful than this one.
Can we have a refresh of graphs 2 and 3? We're just about to upgrade the bw scanners. I am particularly interested in the votes from urras. If those could be drawn in a distinct color, that would help us notice anything strange from the new grouping process (#3444 (moved)) and file sizes (#1975 (moved)).
See the updated code. There are still a few manual steps and you still need to download and extract server descriptor tarballs, but at least you don't have to download 2G vote tarballs anymore.
Is it possible to provide a way to get all descriptors with a certain range of dates for "published"? That might make automating the graphing process possible for a given date range. The monthly server descriptors are still a little painful to handle.
We should then showcase these URLs and scripts on the data.html page for other people who want to study bw auths.
> Is it possible to provide a way to get all descriptors with a certain range of dates for "published"? That might make automating the graphing process possible for a given date range. The monthly server descriptors are still a little painful to handle.
Hmm, I don't like the idea of giving out custom selections of server descriptors covering a few days. But I could imagine providing a URL to download all server descriptors referenced from a given consensus. That URL could be https://metrics.torproject.org/serverdesc?valid-after=2011-07-13-05-00-00 (DOES NOT WORK YET). In this case these are exactly the descriptors you want to have. I'll put it on my TODO list.
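Client-side, usage could then be as simple as this sketch (again, the URL scheme above doesn't work yet, so this is purely illustrative):

```python
# Purely illustrative client for the proposed URL above (which, as noted,
# DOES NOT WORK YET): fetch all server descriptors referenced from the
# consensus with the given valid-after time.

import urllib.request

def fetch_descriptors(valid_after="2011-07-13-05-00-00"):
    url = ("https://metrics.torproject.org/serverdesc"
           "?valid-after=" + valid_after)
    with urllib.request.urlopen(url) as resp:
        return resp.read()
```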
> We should then showcase these URLs and scripts on the data.html page for other people who want to study bw auths.
data.html is probably not the place where people would go to find out they want to study bw auths. Is there a bwauth page on the Tor homepage where we could put the URLs and scripts?