Make all graph data available as CSV

added component::metrics/website owner::karsten priority::high resolution::fixed reviewer::iwakeh severity::normal status::closed type::enhancement labels

Here's my task-25382 branch that implements this feature.

Changes are already deployed and can be tried out on https://metrics.torproject.org/ (just look out for "Download data as CSV." on the graph pages).

Happy to make fixes before this goes into master later today.

Trac:
Status: assigned to needs_review

Taking a look :-)

Trac:
Reviewer: N/A to iwakeh

Suggested by iwakeh and briefly discussed before opening this ticket: let's look into adding comments to these CSV files with a short copyright notice and minimal specification. The format should be self-explanatory in most cases. But it wouldn't hurt to write a sentence or two about that. Doesn't have to happen before merging, though.

Did the 'possible censorship events' get lost (example)?

Replying to karsten:

Suggested by iwakeh and briefly discussed before opening this ticket: let's look into adding comments to these CSV files with a short copyright notice and minimal specification. The format should be self-explanatory in most cases. But it wouldn't hurt to write a sentence or two about that. Doesn't have to happen before merging, though.

Timing is fine; the discussion about format and explanatory text might take a while.

Starting suggestion:

##
## The Tor Project
##
# URL:
#  https://metrics.torproject.org/networksize.html?start=2017-11-30&end=2018-02-28
# Parameters:
#  networksize: start=2017-11-30 end=2018-02-28
#
# Legend:
#  date: UTC date (YYYY-MM-DD) when relays or bridges have been listed as running.
#  relays: average number on the given day.
#  bridges: average number on the given day.
#
date,relays,bridges
2017-11-30,6512,1955
2017-12-01,6629,1959
2017-12-02,6647,1963
2017-12-03,6650,1976
...

Replying to iwakeh:

Did the 'possible censorship events' get lost (example)?

All fine. We don't display these for 'all users', only for individual countries, and there they still show up as before.

The suggestion there looks like a fine start. Without going into the details yet,

we'll have to see how to generate those lines in R using write.csv and
we'll be able to re-use quite some content from the current stats.html.

Replying to karsten:

The suggestion there looks like a fine start. Without going into the details yet,

we'll have to see how to generate those lines in R using write.csv and

We need to use write.table, because write.csv ignores the append argument. For example:

## the data summary
summary(y)
         date        users       downturns        upturns            lower           upper     
 2018-01-01: 1   Min.   :716.0   Mode :logical   Mode :logical   Min.   :165.0   Min.   :1039  
 2018-01-02: 1   1st Qu.:755.0   FALSE:58        FALSE:58        1st Qu.:403.5   1st Qu.:1106  
 2018-01-03: 1   Median :780.0                                   Median :455.0   Median :1173  
 2018-01-04: 1   Mean   :778.9                                   Mean   :444.1   Mean   :1179  
 2018-01-05: 1   3rd Qu.:802.5                                   3rd Qu.:496.2   3rd Qu.:1225  
 2018-01-06: 1   Max.   :858.0                                   Max.   :593.0   Max.   :1584  
 (Other)   :52                                                                                 


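# writing: comment lines first, then the table appended below them
# (write.table warns about appending column names; the file still comes out as shown below)
write("# some comments", file = "data.csv")
write("# some more comments", file = "data.csv", append = TRUE)
write.table(y, file = "data.csv", append = TRUE, quote = FALSE, sep = ",", row.names = FALSE)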

Yields:

# some comments
# some more comments
date,users,downturns,upturns,lower,upper
2018-01-01,787,FALSE,FALSE,369,1319
2018-01-02,754,FALSE,FALSE,427,1268
2018-01-03,791,FALSE,FALSE,485,1107
2018-01-04,823,FALSE,FALSE,452,1119
...
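
A minimal sketch of the same idea wrapped in a function, plus reading the file back. The helper name write_commented_csv is hypothetical and not part of the branch; the two data rows are taken from the output above, reduced to two columns. Writing through a single open connection sidesteps the append warning, and read.csv needs comment.char set explicitly because it does not skip comment lines by default.

```r
# Hypothetical helper (not from the branch): write comment lines, then the data as CSV.
write_commented_csv <- function(data, path, comments) {
  con <- file(path, open = "w")
  on.exit(close(con))
  # Prefix every comment with "# " so readers can skip these lines.
  writeLines(paste("#", comments), con)
  # Writing to the already open connection continues after the comments.
  write.table(data, file = con, quote = FALSE, sep = ",", row.names = FALSE)
}

# Two rows copied from the example output, reduced to two columns.
y <- data.frame(
  date = c("2018-01-01", "2018-01-02"),
  users = c(787, 754),
  stringsAsFactors = FALSE
)

path <- tempfile(fileext = ".csv")
write_commented_csv(y, path, c("some comments", "some more comments"))

# read.csv() defaults to comment.char = "", so set it to skip the header comments.
z <- read.csv(path, comment.char = "#", stringsAsFactors = FALSE)
```

Users of other tools would need an equivalent option for skipping lines that start with "#", which is one thing to weigh when deciding on the comment format.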

we'll be able to re-use quite some content from the current stats.html.

True, shortened versions of these explanations.

There, I moved the comments discussion to a separate ticket: #25387 (moved).

If I don't hear otherwise, I'll merge changes to master in an hour or so.

Trac:
Priority: Medium to High

Changes look ok and the CSV files are very useful. Currently, as all links to stats.html are gone, it might be hard for users to find explanations for the data in the files. But this will only be temporary (thanks for adding #25387 (moved)!).

Trac:
Status: needs_review to merge_ready

Thanks for looking! I'll merge later today.

Regarding explanations, I agree that users will need to make sense of these CSV files themselves, at least for now. But at least I made sure they're all in "wide" format, which I believe is what most Excel users would expect, and R users can easily reformat the files. And yes, #25387 (moved) will fix the missing explanations.
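
As an illustration of that reformatting step, here is a sketch that turns a "wide" file into "long" format with base R's reshape(). The rows are taken from the networksize sample earlier in this ticket; the column names type and count are made up for this example.

```r
# Two rows in "wide" format, as in the networksize sample above.
wide <- data.frame(
  date = c("2017-11-30", "2017-12-01"),
  relays = c(6512, 6629),
  bridges = c(1955, 1959),
  stringsAsFactors = FALSE
)

# Stack the relays and bridges columns into one value column.
# "type" and "count" are illustrative names, not used by the website.
long <- reshape(
  wide,
  direction = "long",
  varying = c("relays", "bridges"),
  v.names = "count",
  timevar = "type",
  times = c("relays", "bridges"),
  idvar = "date"
)
rownames(long) <- NULL
```

The result has one row per date and type, which is the shape that plotting packages such as ggplot2 typically expect.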

Merged. We'll take care of the rest in other tickets. Closing. Thanks!

Trac:
Status: merge_ready to closed
Resolution: N/A to fixed

closed

mentioned in issue #25383 (moved)

mentioned in issue #25387 (moved)
