Opened 3 years ago

Last modified 22 months ago

#19183 assigned enhancement

Add sybilhunter's visualisations to the Metrics website

Reported by: phw
Owned by: metrics-team
Priority: Medium
Component: Metrics/Website
Severity: Normal
Keywords: sybilhunter, visualization, churn, uptime
Cc: karsten, phw

Description

It would be great to have sybilhunter's churn and uptime visualisations on the Metrics website. The churn plots are time series, just like the ones we already have on Metrics. The uptime visualisations are JPEG images. We could have weekly or monthly uptime images, and daily churn diagrams.

Sybilhunter is a Go program that takes as input files structured like CollecTor's archives. It should be straightforward to run it as a cron job.

Karsten, I don't know ggplot2. Could you help with plotting the churn values? The format is quite simple: each line represents the churn of one consensus. It starts with a timestamp, followed by flag-specific churn values in the interval [0, 1].

As I understand it, at least the following two steps are necessary to incorporate both visualisations:

  • Modify ./website/etc/metrics.json.
  • Write a shell script for the cron job to run.

Is there anything else we need?


Attachments (2)

networkchurn-on-metrics-2016-07-13.jpg (184.2 KB) - added by karsten 3 years ago.
uptimes-on-metrics-2016-07-13.jpg (161.3 KB) - added by karsten 3 years ago.


Change History (11)

comment:1 Changed 3 years ago by karsten

Can you give me a sample input file for the churn values, so that I can write some R/ggplot2 code for it?

And can you include a link to the Go code that's supposed to run on the metrics machine?

comment:2 Changed 3 years ago by phw

Here's a CSV file for the churn values: https://nymity.ch/sybilhunting/churn-values/churn-all.csv.bz2 (3.6 MiB). For each flag, there are four columns, two of which are of interest to us: NewFLAG and GoneFLAG. NewFLAG denotes the churn caused by new relays, while GoneFLAG denotes the churn caused by relays that left the network. If this is difficult for you to process, I'm happy to change the output format.
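To make the NewFLAG/GoneFLAG distinction concrete, here is a minimal Go sketch of one common way to compute both values for a single flag from two consecutive consensuses. It assumes relays are identified by fingerprint and that NewFLAG is the fraction of relays in the current consensus that were absent from the previous one, while GoneFLAG is the fraction of relays in the previous consensus that are absent from the current one. This definition and the helper names churn and set are illustrative assumptions; sybilhunter's actual formula may differ.

```go
package main

import "fmt"

// set builds a fingerprint set from its arguments.
func set(fprs ...string) map[string]bool {
	s := make(map[string]bool, len(fprs))
	for _, f := range fprs {
		s[f] = true
	}
	return s
}

// churn returns the new and gone churn for one flag between two consecutive
// consensuses. prev and curr contain the fingerprints of the relays that
// carry the flag in the previous and the current consensus.
//
// Assumed definitions (illustrative, not taken from sybilhunter's source):
//
//	newChurn  = |curr \ prev| / |curr|
//	goneChurn = |prev \ curr| / |prev|
//
// Both values lie in [0, 1]. If curr (or prev) is empty, newChurn (or
// goneChurn) is NaN, because 0/0 in floating point yields NaN.
func churn(prev, curr map[string]bool) (newChurn, goneChurn float64) {
	joined, left := 0, 0
	for fpr := range curr {
		if !prev[fpr] {
			joined++
		}
	}
	for fpr := range prev {
		if !curr[fpr] {
			left++
		}
	}
	newChurn = float64(joined) / float64(len(curr))
	goneChurn = float64(left) / float64(len(prev))
	return newChurn, goneChurn
}

func main() {
	prev := set("A", "B", "C", "D")
	curr := set("B", "C", "D", "E", "F")
	n, g := churn(prev, curr)
	fmt.Printf("new=%.2f gone=%.2f\n", n, g) // prints "new=0.40 gone=0.25"
}
```

Normalising each value by the size of its own consensus keeps both numbers in [0, 1] even when the network grows or shrinks between consensuses.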

The Go code is available here:
https://gitweb.torproject.org/user/phw/sybilhunter.git/
Go compiles directly to statically linked ELF binaries, so we can build a binary elsewhere and then copy it to the metrics machines. To build sybilhunter, run:

go get git.torproject.org/user/phw/sybilhunter.git

To create churn values, run:

sybilhunter -data path/to/collector/archive/ -churn -startdate 2016-06-01 -enddate 2016-06-02 2>/dev/null

comment:3 Changed 3 years ago by phw

Cc: phw added

comment:4 Changed 3 years ago by karsten

Hmm, I had some trouble fetching that .csv file. The server seems quite overloaded, and the downloaded file was partially corrupt. But I think I got the overall picture.

However, I noticed that you didn't implement the wide-to-long suggestion I mentioned a few months ago on metrics-team@, and I think that would make the graphing code somewhat easier. How likely is it that you'll find the time to work on that issue?

But we probably shouldn't block on that for putting stuff on Tor Metrics. How about this approach:

  • We add a new page type to Metrics called "gallery" which displays image files from a local directory directly on the Tor Metrics site. We need this type anyway for the uptime visualizations, even if we replace the churn visualizations with something more interactive (see below). We'd produce these images exactly the way you currently produce them on your server, but on the metrics server. Once we deploy these gallery pages, we'll replace the corresponding link pages, though we'd keep the URLs unchanged.
  • We write some R/ggplot2 code to make the churn visualizations somewhat more interactive by letting users select start and end date, flag type, and displayed metric (absolute numbers, fractions, etc.).

comment:5 in reply to:  4 Changed 3 years ago by phw

Replying to karsten:

> However, I noticed that you didn't implement the wide-to-long suggestion I mentioned a few months ago on metrics-team@, and I think that would make the graphing code somewhat easier. How likely is it that you'll find the time to work on that issue?

I just added that feature. It's in the following branch:

git clone -b long-format https://git.torproject.org/user/phw/sybilhunter.git

The default output is now the long format. Here's an example:

Date,Authority,BadExit,Exit,Fast,Guard,HSDir,Named,Running,Stable,Unnamed,V2Dir,Valid,NewChurn,GoneChurn
2016-05-31T01:00:00Z,T,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,0.00000,0.00000
2016-05-31T01:00:00Z,NA,T,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,0.00000,0.00000
2016-05-31T01:00:00Z,NA,NA,T,NA,NA,NA,NA,NA,NA,NA,NA,NA,0.00457,0.00457
2016-05-31T01:00:00Z,NA,NA,NA,T,NA,NA,NA,NA,NA,NA,NA,NA,0.00480,0.00315
2016-05-31T01:00:00Z,NA,NA,NA,NA,T,NA,NA,NA,NA,NA,NA,NA,0.00552,0.00184
2016-05-31T01:00:00Z,NA,NA,NA,NA,NA,T,NA,NA,NA,NA,NA,NA,0.00300,0.00030
2016-05-31T01:00:00Z,NA,NA,NA,NA,NA,NA,T,NA,NA,NA,NA,NA,NaN,NaN
2016-05-31T01:00:00Z,NA,NA,NA,NA,NA,NA,NA,T,NA,NA,NA,NA,0.00643,0.00514
2016-05-31T01:00:00Z,NA,NA,NA,NA,NA,NA,NA,NA,T,NA,NA,NA,0.00118,0.00067
2016-05-31T01:00:00Z,NA,NA,NA,NA,NA,NA,NA,NA,NA,T,NA,NA,NaN,NaN
2016-05-31T01:00:00Z,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,T,NA,0.00349,0.00501
2016-05-31T01:00:00Z,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,T,0.00643,0.00514

Is that something you can work with?

Here's what I would add to website/etc/metrics.json:

  {
    "id": "uptimes",
    "title": "Monthly uptime of Tor relays",
    "tags": [
      "Relays"
    ],
    "type": "Graph",
    "level": "Advanced",
    "description": "<p>The following image illustrates the uptime of Tor relays for the past month.  Each row of pixels denotes one consensus (that is, one hour), and each column denotes one relay.  Black pixels mean that a relay was online, and white means offline.  So, each pixel denotes whether a given relay was online or offline at a given hour.  We use red pixels to highlight relays with identical uptime patterns.</p>",
    "function": "plot_uptimes",
    "parameters": [
      "start",
      "end"
    ],
    "data": [
      "servers-data"
    ],
    "related": [
      "networkchurn"
    ]
  },
  {
    "id": "networkchurn",
    "title": "Network churn rate by relay flag",
    "tags": [
      "Relays"
    ],
    "type": "Graph",
    "level": "Advanced",
    "description": "<p>The following graph shows the churn rate of the Tor network by <a href=\"about.html#relay\">relay</a> flag. The churn rate, a value in the interval [0,1], captures the rate of relays joining and leaving the network.</p>",
    "function": "plot_networkchurn",
    "parameters": [
      "start",
      "end"
    ],
    "data": [
      "servers-data"
    ],
    "related": [
      "uptimes",
      "networksize",
      "relayflags"
    ]
  },
> • We add a new page type to Metrics called "gallery" which displays image files from a local directory directly on the Tor Metrics site. We need this type anyway for the uptime visualizations, even if we replace the churn visualizations with something more interactive (see below). We'd produce these images exactly the way you currently produce them on your server, but on the metrics server. Once we deploy these gallery pages, we'll replace the corresponding link pages, though we'd keep the URLs unchanged.
> • We write some R/ggplot2 code to make the churn visualizations somewhat more interactive by letting users select start and end date, flag type, and displayed metric (absolute numbers, fractions, etc.).

Sounds good to me. Please let me know if there's anything I can do to help.

Changed 3 years ago by karsten

Changed 3 years ago by karsten

comment:6 Changed 3 years ago by karsten

Replying to phw:

> [...]
> Is that something you can work with?

Neat! Yes, looks great! I haven't started writing code for this yet, but I don't see any problems with your data format right now.

> [...] Here's what I would add to website/etc/metrics.json:

I rewrote your text a bit to fit more seamlessly into the rest of Metrics (well, I hope). Please take a look at my task-19183 branch.

I also attached two screenshots of the new pages (which are not yet deployed on the main Metrics instance):



Please let me know if you spot any problems or want me to change something. Like, want me to pick a different month as example? Happy to make such changes.

Oh, would you be able to update your image galleries? The latest graphs there are from 2016-01, and I bet people will ask for recent months when these pages go online.

comment:7 Changed 3 years ago by phw

> Please let me know if you spot any problems or want me to change something. Like, want me to pick a different month as example? Happy to make such changes.

It looks good to me. Thanks for your work.

> Oh, would you be able to update your image galleries? The latest graphs there are from 2016-01, and I bet people will ask for recent months when these pages go online.

I did it for now, for the uptime images, but I don't have plans to do that in the future. I'm just providing code and past analyses, but I don't want to sign up for providing continuous visualisations.

comment:8 in reply to:  7 Changed 3 years ago by karsten

Owner: changed from phw to karsten
Status: changed from new to assigned

Replying to phw:

>> Please let me know if you spot any problems or want me to change something. Like, want me to pick a different month as example? Happy to make such changes.

> It looks good to me. Thanks for your work.

Thanks for looking. Pushed to master and deployed. Leaving this ticket open for the next steps.

>> Oh, would you be able to update your image galleries? The latest graphs there are from 2016-01, and I bet people will ask for recent months when these pages go online.

> I did it for now, for the uptime images, but I don't have plans to do that in the future. I'm just providing code and past analyses, but I don't want to sign up for providing continuous visualisations.

Fair enough. Yet one more reason to get the next steps here done soon. :)

comment:9 Changed 22 months ago by karsten

Owner: changed from karsten to metrics-team

Handing over to metrics-team, because I'm not currently working on this.
