Right now, we provide links to the CSV files that graphs are based on. But in some cases it takes non-trivial data wrangling skills to get from those files to the data shown in the graph, which makes the data less usable than it could be.
What we could do instead is generate CSV files based on the graph and selected parameters. This will enable users to quickly obtain the data in a graph and further process it using tools of their choice.
I already wrote this code and deployed it but did not merge it yet. I'm going to post a branch in a minute. My plan is to merge later today. Rushing this a bit, because it's something that should still go into the February report.
Suggested by iwakeh and briefly discussed before opening this ticket: let's look into adding comments to these CSV files with a short copyright notice and minimal specification. The format should be self-explanatory in most cases. But it wouldn't hurt to write a sentence or two about that. Doesn't have to happen before merging, though.
Timing is fine, the discussion about format and explanatory text might take a while.
Starting suggestion:
    #
    #
    #
    # The Tor Project
    #
    #
    # URL:
    # https://metrics.torproject.org/networksize.html?start=2017-11-30&end=2018-02-28
    # Parameters:
    # networksize: start=2017-11-30 end=2018-02-28
    #
    # Legend:
    # date: UTC date (YYYY-MM-DD) when relays or bridges have been listed as running.
    # relays: average number on the given day.
    # bridges: average number on the given day.
    #
    date,relays,bridges
    2017-11-30,6512,1955
    2017-12-01,6629,1959
    2017-12-02,6647,1963
    2017-12-03,6650,1976
    ...
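One way such header lines might be assembled in R is sketched below. This is only an illustration: the helper name `csv_header` and its arguments are hypothetical and not part of the deployed code, and the exact layout of blank comment lines is an assumption.

```r
# Hypothetical helper (not from the actual code base): builds the comment
# header as a character vector, one element per line.
csv_header <- function(url, parameters, legend) {
  c("#",
    "# The Tor Project",
    "#",
    "# URL:",
    paste0("#   ", url),
    "# Parameters:",
    paste0("#   ", parameters),
    "#",
    "# Legend:",
    # legend is a named character vector: column name -> description
    paste0("#   ", names(legend), ": ", legend),
    "#")
}

header <- csv_header(
  url = "https://metrics.torproject.org/networksize.html?start=2017-11-30&end=2018-02-28",
  parameters = "networksize: start=2017-11-30 end=2018-02-28",
  legend = c(
    date = "UTC date (YYYY-MM-DD) when relays or bridges have been listed as running.",
    relays = "average number on the given day.",
    bridges = "average number on the given day."
  )
)

# Print the header lines to the console.
writeLines(header)
```

Keeping the legend as a named vector means the descriptions stay next to the column names they describe, which should make it easier to reuse existing explanatory text.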
The suggestion there looks like a fine start. Without going into the details yet, we'll have to see how to generate those lines in R using write.csv.
write.csv ignores the append argument (with a warning), so we need to use write.table, for example:
    ## the data summary
    summary(y)
             date           users         downturns        upturns           lower           upper
     2018-01-01: 1   Min.   :716.0   Mode :logical   Mode :logical   Min.   :165.0   Min.   :1039
     2018-01-02: 1   1st Qu.:755.0   FALSE:58        FALSE:58        1st Qu.:403.5   1st Qu.:1106
     2018-01-03: 1   Median :780.0                                   Median :455.0   Median :1173
     2018-01-04: 1   Mean   :778.9                                   Mean   :444.1   Mean   :1179
     2018-01-05: 1   3rd Qu.:802.5                                   3rd Qu.:496.2   3rd Qu.:1225
     2018-01-06: 1   Max.   :858.0                                   Max.   :593.0   Max.   :1584
     (Other)   :52

    # writing
    write("# some comments", file="data.csv")
    write("# some more comments", file="data.csv", append = TRUE)
    write.table(y, file="data.csv", append = TRUE, quote=FALSE, sep=",", row.names = FALSE)
Yields:
    # some comments
    # some more comments
    date,users,downturns,upturns,lower,upper
    2018-01-01,787,FALSE,FALSE,369,1319
    2018-01-02,754,FALSE,FALSE,427,1268
    2018-01-03,791,FALSE,FALSE,485,1107
    2018-01-04,823,FALSE,FALSE,452,1119
    ...
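For users on the receiving end, a file like this should still load in R, as long as the reader is told to skip the comment lines. The following is a minimal sketch with made-up data and a temporary file; the variable names are only for illustration.

```r
# Illustrative data and a temporary file; not the real metrics data.
csv_file <- tempfile(fileext = ".csv")
y <- data.frame(
  date = c("2018-01-01", "2018-01-02"),
  users = c(787, 754)
)

# Write the comment header first, then append the data below it.
writeLines(c("# some comments", "# some more comments"), csv_file)
# write.table warns when it appends column names to an existing file;
# that warning is expected here, so it is suppressed.
suppressWarnings(
  write.table(y, file = csv_file, append = TRUE, quote = FALSE,
              sep = ",", row.names = FALSE)
)

# read.csv does not treat "#" as a comment character by default
# (its default is comment.char = ""), so it has to be set explicitly.
z <- read.csv(csv_file, comment.char = "#")
print(z)
```

Note that read.table, unlike read.csv, already defaults to comment.char = "#", so only the separator and header arguments would need to be set there.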
We'll be able to re-use quite a bit of content from the current stats.html.
Changes look ok and the CSV files are very useful. Currently, as all links to stats.html are gone, it might be hard for users to find explanations for the data in the files. But this will only be temporary (thanks for adding #25387 (moved)!).
Regarding explanations, I agree that users will need to make sense of these CSV files themselves, at least for now. But I made sure they're all in "wide" format, which I believe is what most Excel users would expect, and R users can easily reshape the files. And yes, #25387 (moved) will fix that.
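For R users who prefer "long" format, the reshaping could look roughly like the sketch below. It uses base R's reshape on a few rows in the shape of the network size file; the names `type` and `count` for the long-format columns are arbitrary choices, not anything the CSV files define.

```r
# A few rows in "wide" format, shaped like the network size CSV.
wide <- data.frame(
  date = c("2017-11-30", "2017-12-01"),
  relays = c(6512, 6629),
  bridges = c(1955, 1959)
)

# Convert to "long" format: one row per date and series.
long <- reshape(
  wide,
  direction = "long",
  varying = c("relays", "bridges"),  # wide columns to stack
  v.names = "count",                 # name of the stacked value column
  timevar = "type",                  # column holding the series name
  times = c("relays", "bridges"),
  idvar = "date"
)
rownames(long) <- NULL
print(long)
```

Packages such as tidyr offer the same conversion with less ceremony, but base R is enough for files this simple.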