Opened 9 months ago

Closed 7 months ago

Last modified 6 months ago

#33255 closed task (fixed)

Review existing graphing code

Reported by: karsten Owned by: metrics-team
Priority: Medium Milestone:
Component: Metrics Version:
Severity: Normal Keywords: metrics-team-roadmap-2020Q1
Cc: metrics-team Actual Points:
Parent ID: #33327 Points: 1
Reviewer: Sponsor: Sponsor59-must


We're going to modify and/or extend the graphing code in OnionPerf. Therefore we first need to review the existing graphing code. In particular, we need to document:

  • external dependencies like plotting libraries,
  • internal interdependencies with other OnionPerf code parts,
  • user interface with possible parameters,
  • input data requirements, and
  • all produced output files.

Child Tickets

Change History (7)

comment:1 Changed 9 months ago by karsten

Keywords: metrics-team-roadmap-2020Q1 added

comment:2 Changed 9 months ago by gaba

Parent ID: #33327

comment:3 Changed 8 months ago by gaba

Sponsor: Sponsor59

comment:4 Changed 8 months ago by karsten

Status: new → needs_review

Here's my review of OnionPerf commit a64b0e6, authored on 2019-10-24, still the latest commit in master as of 2020-02-25.

1. External dependencies like plotting libraries

The main Python requirements for the visualize subcommand are scipy, numpy, and matplotlib. The current versions as installed in my buster VM are:

ii  python-numpy                         1:1.16.2-1                  amd64        Numerical Python adds a fast array facility to the Python language
ii  python-scipy                         1.1.0-7                     amd64        scientific tools for Python
ii  python-matplotlib                    2.2.3-6                     amd64        Python based plotting system in a style similar to Matlab
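As a quick sanity check that this plotting stack is available in a given Python environment, one could test importability of the modules (a sketch; note that the Debian package names above map to the Python module names numpy, scipy, and matplotlib):

```python
import importlib.util

# Check that the plotting stack needed by `onionperf visualize` is
# importable, without actually importing it. Uses Python module names,
# which differ from the Debian package names (python-numpy etc.) above.
def check_plotting_deps(names=("numpy", "scipy", "matplotlib")):
    return {name: importlib.util.find_spec(name) is not None for name in names}

print(check_plotting_deps())
```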

2. Internal interdependencies with other OnionPerf code parts

Most of the visualization code is in onionperf/, with a tiny part in onionperf/onionperf for parsing arguments to the visualize subcommand and calling code in onionperf/.

3. User interface with possible parameters

The visualize subcommand has the following arguments:

$ onionperf visualize -h
usage: onionperf visualize [-h] -d PATH LABEL [-p STRING] [-f LIST]

Loads an OnionPerf json file, e.g., one produced with the `analyze` subcommand,
and plots various interesting performance metrics to PDF files.

optional arguments:
  -h, --help            show this help message and exit
  -d PATH LABEL, --data PATH LABEL
                        Append a PATH to a onionperf.analysis.json analysis
                        results file, and a LABEL that we should use for the
                        graph legend for this dataset (default: None)
  -p STRING, --prefix STRING
                        a STRING filename prefix for graphs we generate
                        (default: None)
  -f LIST, --format LIST
                        A comma-separated LIST of color/line format strings to
                        cycle to matplotlib's plot command (see
                        matplotlib.pyplot.plot) (default: k-,r-,b-,g-,c-,m-,y-)

It's worth noting that the -d PATH LABEL argument can be given multiple times to plot multiple data sets as different CDFs or time series.

For example, the following command produces visualizations of measurements performed on 2019-01-11, 2019-01-21, and 2019-01-31 as three different data sets:

onionperf visualize \
  -d 2019-01-11.onionperf.analysis.json.xz 2019-01-11 \
  -d 2019-01-21.onionperf.analysis.json.xz 2019-01-21 \
  -d 2019-01-31.onionperf.analysis.json.xz 2019-01-31

4. Input data requirements

Input data consists of one or more data sets. Each data set uses the values from exactly one OnionPerf analysis document in the JSON format, which typically contains 1 UTC day of measurements from a single OnionPerf instance with different requested file sizes and server types (public, v2 onion, v3 onion).
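Loading one such data set amounts to reading a single (possibly xz-compressed) JSON document, as in the example filenames above. A minimal sketch, assuming only that the file is JSON, optionally wrapped in xz:

```python
import json
import lzma

# Minimal sketch: load one data set's input, a single OnionPerf analysis
# document in JSON, optionally xz-compressed (e.g.
# 2019-01-11.onionperf.analysis.json.xz from the example above).
def load_analysis(path):
    opener = lzma.open if path.endswith(".xz") else open
    with opener(path, "rt") as f:
        return json.load(f)
```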

5. All produced output files

The visualize subcommand produces two PDF files as output:

The first output file is called tgen.onionperf.viz.$timestamp.pdf and contains:

  • time to download first byte, all clients
  • mean time to download first of {51200,1048576,5242880} bytes, all clients over time
  • time to download {51200,1048576,5242880} bytes, all downloads
  • median time to download {51200,1048576,5242880} bytes, each client
  • mean time to download {51200,1048576,5242880} bytes, each client
  • max time to download {51200,1048576,5242880} bytes, each client
  • mean time to download last of {51200,1048576,5242880} bytes, all clients over time
  • number of {51200,1048576,5242880} byte downloads completed, each client
  • number of {51200,1048576,5242880} byte downloads completed, all clients over time
  • number of transfer {PROXY,READ} errors, each client
  • number of transfer {PROXY,READ} errors, all clients over time
  • bytes transferred before {PROXY,READ} error, all downloads
  • median bytes transferred before {PROXY,READ} error, each client
  • mean bytes transferred before {PROXY,READ} error, each client

The second output file is called tor.onionperf.viz.$timestamp.pdf and contains:

  • 60 second moving average throughput, read, all relays
  • 1 second throughput, read, all relays
  • 1 second throughput, read, each relay
  • 60 second moving average throughput, write, all relays
  • 1 second throughput, write, all relays
  • 1 second throughput, write, each relay
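The "60 second moving average throughput" graphs smooth the per-second series. The averaging step can be sketched in plain Python (the real code plots with matplotlib; this sketch assumes one byte-count sample per second):

```python
# Sketch of the smoothing behind the "60 second moving average throughput"
# plots: average each 60-sample window of a per-second byte-count series.
def moving_average(per_second_bytes, window=60):
    out = []
    total = 0.0
    for i, value in enumerate(per_second_bytes):
        total += value
        if i >= window:
            total -= per_second_bytes[i - window]
        if i >= window - 1:
            out.append(total / window)
    return out
```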


There is some good news:

  • The plotting libraries are pretty much standard and therefore a good basis for making more and better graphs.
  • The visualization code is nicely separated from the analysis and the measurement code in OnionPerf.
  • The user interface is very simple but also extensible towards adding more and better graphs.
  • There can be multiple input data sets per visualization, which is going to be useful.

There are also some challenges:

  • Input data sets are limited to a single analysis file each. This makes it difficult to plot several days of measurements before/during/after an experiment. In theory, it would be possible to process several days of logs into a single analysis document with the analyze subcommand with minimal code changes. But that requires having raw tgen and Tor controller logs around for creating a visualization, which is not very practical.
  • It's also not yet possible to filter measurements in the visualize subcommand. In theory, these changes could be made in the analyze subcommand to only include measurements of interest in the analysis file. But that's also not very practical. It would be easier to make the visualize subcommand more powerful by filtering measurements in each or all data sets.
  • Another aspect worth noting is that current visualizations are either based on logs from the tgen process or the tor process running at the client. Visualizations do not combine these two data sources, nor do they consider logs from server-side processes.
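The filtering idea in the second point could look roughly like the following. This is a hypothetical sketch: the record structure and the "server" and "filesize" field names are illustrative, not OnionPerf's actual schema.

```python
# Hypothetical sketch of filtering measurements inside the visualize step:
# keep only records matching the given criteria before plotting. The
# "server" and "filesize" field names are illustrative, not OnionPerf's
# real analysis-document schema.
def filter_measurements(records, server=None, filesize=None):
    def keep(r):
        return ((server is None or r.get("server") == server) and
                (filesize is None or r.get("filesize") == filesize))
    return [r for r in records if keep(r)]
```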

There we are. What did I miss? Setting to needs_review to hear what other parts need (closer) review. If this covers everything to be reviewed, we can resolve this ticket.

comment:5 Changed 7 months ago by acute

This was a comprehensive review, and your conclusions are spot on.

mean time to download first of {51200,1048576,5242880} bytes...
number of transfer {PROXY,READ} errors...

To add to this, if we were to add different transfer sizes to our tgen models, they would automatically be added as keys for the analysis/visualisation as well.

Similarly, all types of errors encountered by tgen will be present in the final visualization, so any {PROXY,READ,WRITE,AUTH,TIMEOUT,STALLOUT,MISC} errors can appear in the plots, if they were encountered.

I believe this covers everything.

comment:6 Changed 7 months ago by acute

Resolution: fixed
Status: needs_review → closed

comment:7 Changed 6 months ago by karsten

Sponsor: Sponsor59 → Sponsor59-must

Moving this to Sponsor59-must, because it has been an important prerequisite for working on the other Sponsor59 tasks.

Note: See TracTickets for help on using tickets.