Opened 2 years ago

Closed 17 months ago

#24217 closed enhancement (duplicate)

Specify data format and aggregation process of statistics offered by metrics.tp.o

Reported by: karsten
Owned by: karsten
Priority: Medium
Milestone:
Component: Metrics/Statistics
Version:
Severity: Normal
Keywords:
Cc: metrics-team
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor:

Description

As soon as we know what graphs we should make for #23761, we'll have to 1) specify the data format of the CSV file we need to produce for that.

And for Sponsor 13 we'll need to 2) specify the aggregation process, which will enable others to reproduce our results.

I'll start with 1) as soon as we have green light from #23761, but I'd appreciate help with 2).


Change History (22)

comment:1 Changed 2 years ago by karsten

Including notes from yesterday's team meeting:

  • it might make sense to write the specification based on the prototype and comments, while our memory of why decisions were made is still fresh
  • we still need to decide on the format (txt, LaTeX, HTML, XML) and maybe prototype an HTML export

comment:2 Changed 2 years ago by iwakeh

See #24218 (attachment) for a first content structure.

comment:3 Changed 23 months ago by karsten

Priority: changed from Medium to High
Status: changed from new to needs_review

I made a start here by writing a draft for graphs in the Servers section of Tor Metrics. I did not worry about the format yet and simply picked Markdown (.md) for now, with a script to convert that to .html for easier reading.

I also came up with a different content structure, based on thinking a bit more about what would be doable and what would actually help the intended audience. It's supposed to be a start, with a lot of room left for future improvement. Don't be shy, I'm open to suggestions.

Please review this first draft. The source .md file is available for download here and under version control in my metrics-web task-24217 branch.

Setting priority to high, because we'd ideally publish a draft of this by the end of the month to show that we're making real progress on this sponsor deliverable.

comment:4 Changed 23 months ago by karsten

Owner: changed from metrics-team to karsten
Status: changed from needs_review to accepted

I guess I should accept the ticket, as I'm likely going to be working on it over the next week.

comment:5 Changed 23 months ago by karsten

Status: changed from accepted to needs_review

comment:6 Changed 23 months ago by iwakeh

Reviewer: iwakeh

Cool!
Adding myself to the reviewer field for reference (and assuming that others can add their usernames there, too).
I'll try to get first feedback done before tomorrow's team meeting.

comment:7 Changed 23 months ago by karsten

Thanks, iwakeh!

comment:8 Changed 23 months ago by iwakeh

Please find two commits on my branch.

Thanks for supplying the script! I converted it to an Ant task and will create a new ticket (#25022) for the other scripts in this product. The aim is to have all build steps in build.xml; whatever final format we choose, the Ant task should be adapted accordingly.

Usage: ant generate-reproducible-spec
The output file will be in generated/spec/reproducible-metrics.html.

The second commit adds comments and ideas to the spec file. I'll add some questions to our meeting agenda too.

Last edited 23 months ago by iwakeh

comment:9 Changed 23 months ago by karsten

Thanks for the quick feedback!

I figured it's going to be much easier to discuss this content when it's on a Google Doc and not in Git, especially when we want to ask folks like t0mmy or our target audience to help us with comments, edits, or suggestions.

I put everything in the following Google Doc, including your comments and my comments to yours:

https://docs.google.com/document/d/19A98ymeidOcmHKl-3OUXkM8ZC7_VuooPTEZSW8DV7SI/edit?usp=sharing

Please take another look. There are some suggested next steps at the end. Thanks!

comment:10 in reply to:  9 Changed 23 months ago by iwakeh

Replying to karsten:

> Thanks for the quick feedback!
>
> I figured it's going to be much easier to discuss this content when it's on a Google Doc and not in Git, especially when we want to ask folks like t0mmy or our target audience to help us with comments, edits, or suggestions.
>
> I put everything in the following Google Doc, including your comments and my comments to yours:
>
> https://docs.google.com/document/d/19A98ymeidOcmHKl-3OUXkM8ZC7_VuooPTEZSW8DV7SI/edit?usp=sharing

Taking a look. (Weird: I just checked my inbox, and the last mail I received about this ticket is about comment 3.)

> Please take another look. There are some suggested next steps at the end. Thanks!

Last edited 23 months ago by iwakeh

comment:11 in reply to:  9 Changed 23 months ago by iwakeh

Replying to karsten:

> Thanks for the quick feedback!
>
> I figured it's going to be much easier to discuss this content when it's on a Google Doc and not in Git, especially when we want to ask folks like t0mmy or our target audience to help us with comments, edits, or suggestions.

Hmm, I added this as the first next step (there, too):

Agree on a format: the G-doc is very cumbersome to edit and read. Let’s use a common format like LaTeX, Markdown, or whatever, especially if we want reviewers to comment rather than edit (as suggested below). The more common formats will also be easier to transform, and we should keep the Git versioning.

If we want comments and edits in the doc, let's stay with Markdown. I find keeping Git essential. The editors and commenters don't need to use it, but it will make it easier for us to keep track of versions.

The 'shortcuts' vs. 'complete journey' discussion should be resolved, too. (I commented in the G-doc.)

comment:12 Changed 23 months ago by karsten

I commented on the Google Doc. Do you receive notifications for that?

Independent of these discussions, do you mind if I publish a PDF of the current state to metrics-team@ by end of day, so that we have something to reference in the January report for Sponsor 13?

comment:13 Changed 22 months ago by iwakeh

I commented, and I resolved comments that seemed to reflect agreement, to reduce some clutter there.
From one comment:

I think a storm-etherpad is best:

  1. our potential reviewers know how to handle it;
  2. access can be fine-grained;
  3. it has comments that allow providing replacement suggestions;
  4. it clearly marks the author;
  5. login doesn't require 'weird' accounts;
  6. the final product can be exported to Markdown and other formats; and
  7. last but not least, it is on Tor infrastructure.

I can help moving there before the first language reviewer works with the doc.

Another topic that needs more discussion outside of tiny comment-boxes:

Another viewpoint derived from the discussion here:
I think all the 'shortcuts' should actually be a tutorial collection: "hands-on" data processing.
The 'complete journey' would then be 'How Metrics processes the data'. The two would live in different sections, connected only by cross-references.

Maybe we should just list the options we have so far and let the reviewers help. They might even come up with a totally different, neat idea here.

comment:14 Changed 22 months ago by karsten

Okay, let's move to a storm-etherpad. Or maybe a storm-markdown-thing. In any case a storm-something.

Regarding shortcuts vs. complete journeys, I agree that we should ask reviewers what they'd find most useful. Another aspect that we're not covering very well yet is background. The current description says what we're doing, but not why. I could imagine that some questions are left unanswered by the current format.

I want to move forward here. I'll copy over what we have on the Google Doc to storm, and then I'll add more content. If you have ideas about what specifically I should add next, just let me know.

comment:15 Changed 22 months ago by karsten

Quick update: we moved to storm, and I added more content. Then I realized that the shortcuts would be better written as code, so I wrote the "download data as CSV" feature that is now deployed on all graph pages on Tor Metrics. The next step will be to add more content for the complete journeys. There's not much to see yet for that part.

I wonder if we should generalize the scope of this ticket. We're writing documentation for all data available on Tor Metrics, not just IPv6 relay statistics. But we can change the summary when we move forward here.

comment:16 Changed 22 months ago by iwakeh

Summary: changed from Specify data format and aggregation process of new IPv6 relay statistics to Specify data format and aggregation process of statistics offered by metrics.tp.o

Yes, the summary ought to be changed. I attempted a generalization.

comment:17 Changed 21 months ago by iwakeh

Is the status needs_review still current?

comment:18 in reply to:  17 Changed 21 months ago by karsten

Status: changed from needs_review to accepted

Replying to iwakeh:

> Is the status needs_review still current?

Ah, no, there's nothing to review at the moment.

comment:19 Changed 21 months ago by irl

Cc: metrics-team added

Adding metrics-team to cc

comment:20 Changed 19 months ago by iwakeh

Reviewer: iwakeh

comment:21 Changed 17 months ago by karsten

Priority: changed from High to Medium

This ticket is about the same priority as most other Metrics/* tickets. Setting priority back to medium.

comment:22 Changed 17 months ago by karsten

Resolution: duplicate
Status: changed from accepted to closed

Looks like I opened #26857 when I could just as well have revived the discussion on this ticket. Anyway, I looked through the comments above and did not find anything left unaddressed. Closing as a near duplicate.
