#25382 closed enhancement (fixed)

Make all graph data available as CSV

Reported by: karsten
Owned by: karsten
Priority: High
Milestone:
Component: Metrics/Website
Version:
Severity: Normal
Keywords:
Cc: metrics-team
Actual Points:
Parent ID:
Points:
Reviewer: iwakeh
Sponsor:

Description

Right now, we provide links to the CSV files that graphs are based on. But in some cases it takes non-trivial data wrangling skills to get from those files to the data shown in a graph, which makes them less usable than they could be.

What we could do instead is generate CSV files based on the graph and selected parameters. This will enable users to quickly obtain the data in a graph and further process it using tools of their choice.
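As a rough illustration of the idea (not the code on the task-25382 branch), a per-graph CSV can be produced by applying the graph's parameters to the underlying data and writing out only the matching rows. The function name `graph_csv` and the in-memory row layout are assumptions made for this sketch; the sample values are the networksize rows quoted later in this ticket.

```python
import csv
import io
from datetime import date


def graph_csv(rows, columns, start, end):
    """Return CSV text for the rows whose 'date' lies in [start, end]."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        # Keep only rows inside the date range selected for the graph.
        if start <= date.fromisoformat(row["date"]) <= end:
            writer.writerow({column: row[column] for column in columns})
    return out.getvalue()


# Sample rows taken from the networksize example in comment:5.
sample = [
    {"date": "2017-11-30", "relays": 6512, "bridges": 1955},
    {"date": "2017-12-01", "relays": 6629, "bridges": 1959},
    {"date": "2017-12-02", "relays": 6647, "bridges": 1963},
    {"date": "2017-12-03", "relays": 6650, "bridges": 1976},
]

csv_text = graph_csv(
    sample, ["date", "relays", "bridges"], date(2017, 12, 1), date(2017, 12, 2)
)
print(csv_text)
```

The point of the design is that the user downloads exactly what the graph shows, so no further filtering is needed on their side.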

I already wrote this code and deployed it but did not merge it yet. I'm going to post a branch in a minute. My plan is to merge later today. Rushing this a bit, because it's something that should still go into the February report.

Change History (12)

comment:1 Changed 13 months ago by karsten

Status: assigned → needs_review

Here's my task-25382 branch that implements this feature.

Changes are already deployed and can be tried out on https://metrics.torproject.org/ (just look out for "Download data as CSV." on the graph pages).

Happy to make fixes before this goes into master later today.

comment:2 Changed 13 months ago by iwakeh

Reviewer: iwakeh

Taking a look :-)

comment:3 Changed 13 months ago by karsten

Suggested by iwakeh and briefly discussed before opening this ticket: let's look into adding comments to these CSV files with a short copyright notice and minimal specification. The format should be self-explanatory in most cases. But it wouldn't hurt to write a sentence or two about that. Doesn't have to happen before merging, though.

comment:4 Changed 13 months ago by iwakeh

Did the 'possible censorship events' get lost (example)?

comment:5 in reply to:  3 Changed 13 months ago by iwakeh

Replying to karsten:

> Suggested by iwakeh and briefly discussed before opening this ticket: let's look into adding comments to these CSV files with a short copyright notice and minimal specification. The format should be self-explanatory in most cases. But it wouldn't hurt to write a sentence or two about that. Doesn't have to happen before merging, though.

Timing is fine; the discussion about format and explanatory text might take a while.

Starting suggestion:

##
## The Tor Project
##
# URL:
#  https://metrics.torproject.org/networksize.html?start=2017-11-30&end=2018-02-28
# Parameters:
#  networksize: start=2017-11-30 end=2018-02-28
#
# Legend:
#  date: UTC date (YYYY-MM-DD) when relays or bridges have been listed as running.
#  relays: average number on the given day.
#  bridges: average number on the given day.
#
date,relays,bridges
2017-11-30,6512,1955
2017-12-01,6629,1959
2017-12-02,6647,1963
2017-12-03,6650,1976
...
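One consequence of a commented header like the one suggested above is that consumers have to skip the `#` lines before parsing. A minimal sketch of how that could look in Python, assuming every comment line starts with `#` and the first non-comment line is the column header; `read_commented_csv` is a hypothetical helper name, not an existing tool.

```python
import csv


def read_commented_csv(text):
    """Parse CSV text, ignoring any line that starts with '#'."""
    data_lines = [line for line in text.splitlines() if not line.startswith("#")]
    # csv.DictReader accepts any iterable of lines and uses the first as header.
    return list(csv.DictReader(data_lines))


example = """\
##
## The Tor Project
##
# Legend:
#  relays: average number on the given day.
#  bridges: average number on the given day.
#
date,relays,bridges
2017-11-30,6512,1955
2017-12-01,6629,1959
"""

records = read_commented_csv(example)
print(records[0]["relays"])
```

Note that all values come back as strings; converting them to numbers is left to the caller.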

comment:6 in reply to:  4 Changed 13 months ago by iwakeh

Replying to iwakeh:

> Did the 'possible censorship events' get lost (example)?

All fine. We don't display these for 'all users', only for individual countries, and there they show up as expected.

comment:7 Changed 13 months ago by karsten

The suggestion there looks like a fine start. Without going into the details yet,

  • we'll have to see how to generate those lines in R using write.csv and
  • we'll be able to re-use quite some content from the current stats.html.

comment:8 in reply to:  7 Changed 13 months ago by iwakeh

Replying to karsten:

> The suggestion there looks like a fine start. Without going into the details yet,
>
>   • we'll have to see how to generate those lines in R using write.csv and

We need to use write.table, for example:

## the data summary
summary(y)
         date        users       downturns        upturns            lower           upper     
 2018-01-01: 1   Min.   :716.0   Mode :logical   Mode :logical   Min.   :165.0   Min.   :1039  
 2018-01-02: 1   1st Qu.:755.0   FALSE:58        FALSE:58        1st Qu.:403.5   1st Qu.:1106  
 2018-01-03: 1   Median :780.0                                   Median :455.0   Median :1173  
 2018-01-04: 1   Mean   :778.9                                   Mean   :444.1   Mean   :1179  
 2018-01-05: 1   3rd Qu.:802.5                                   3rd Qu.:496.2   3rd Qu.:1225  
 2018-01-06: 1   Max.   :858.0                                   Max.   :593.0   Max.   :1584  
 (Other)   :52                                                                                 


# writing
write("# some comments", file="data.csv")
write("# some more comments", file="data.csv", append = TRUE)
write.table(y, file="data.csv", append = TRUE, quote=FALSE, sep=",", row.names = FALSE)

Yields:

# some comments
# some more comments
date,users,downturns,upturns,lower,upper
2018-01-01,787,FALSE,FALSE,369,1319
2018-01-02,754,FALSE,FALSE,427,1268
2018-01-03,791,FALSE,FALSE,485,1107
2018-01-04,823,FALSE,FALSE,452,1119
...
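For readers who don't use R, the same pattern (comment lines first, then the table) looks roughly like this in Python. This is a translation sketch only, not part of the proposed change; `write_commented_csv` is a hypothetical helper name, and the rows are the first two from the output above.

```python
import csv
import io


def write_commented_csv(out, comments, header, rows):
    """Write '#'-prefixed comment lines, then a header and data rows as CSV."""
    for comment in comments:
        out.write(f"# {comment}\n")
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)


buf = io.StringIO()
write_commented_csv(
    buf,
    ["some comments", "some more comments"],
    ["date", "users", "downturns", "upturns", "lower", "upper"],
    [
        ("2018-01-01", 787, "FALSE", "FALSE", 369, 1319),
        ("2018-01-02", 754, "FALSE", "FALSE", 427, 1268),
    ],
)
print(buf.getvalue())
```

Writing to any file-like object keeps the helper usable both for files on disk and for in-memory responses.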
>   • we'll be able to re-use quite some content from the current stats.html.

True, shortened versions of these explanations.

comment:9 Changed 13 months ago by karsten

Priority: Medium → High

There, I moved the comments discussion to a separate ticket: #25387.

If I don't hear otherwise, I'll merge changes to master in an hour or so.

comment:10 Changed 13 months ago by iwakeh

Status: needs_review → merge_ready

Changes look ok and the CSV files are very useful. Currently, as all links to stats.html are gone, it might be hard for users to find explanations for the data in the files. But this will only be temporary (thanks for adding #25387!).

comment:11 Changed 13 months ago by karsten

Thanks for looking! I'll merge later today.

Regarding explanations, I agree that users will need to make sense of these CSV files themselves, at least for now. But I made sure they're all in "wide" format, which I believe is what most Excel users would expect. And R users can easily reformat files. And yes, #25387 will fix that.
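To make the "wide" versus "long" distinction concrete: a wide file has one column per series, while a long file has one row per (date, series) pair. A minimal sketch of the conversion in Python, assuming the first column is named `date` as in the networksize example; `wide_to_long` is a hypothetical helper name.

```python
import csv
import io


def wide_to_long(text, id_column="date"):
    """Turn each non-id column of a wide CSV into (id, variable, value) rows."""
    long_rows = []
    for row in csv.DictReader(io.StringIO(text)):
        for name, value in row.items():
            if name != id_column:
                long_rows.append((row[id_column], name, value))
    return long_rows


wide = "date,relays,bridges\n2017-11-30,6512,1955\n2017-12-01,6629,1959\n"
long_rows = wide_to_long(wide)
for long_row in long_rows:
    print(",".join(long_row))
```

Going from wide to long is mechanical, which is one reason to publish the wide form and leave reshaping to users who need it.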

comment:12 Changed 13 months ago by karsten

Resolution: fixed
Status: merge_ready → closed

Merged. We'll take care of the rest in other tickets. Closing. Thanks!
