Opened 4 months ago

Closed 2 months ago

#26857 closed enhancement (implemented)

Add new page with specifications for reproducing graphs and tables

Reported by: karsten Owned by: karsten
Priority: Medium Milestone:
Component: Metrics/Website Version:
Severity: Normal Keywords:
Cc: metrics-team Actual Points:
Parent ID: Points:
Reviewer: irl Sponsor: Sponsor13

Description

This ticket is (finally) going to implement Sponsor 13 objective 2.

Child Tickets

Change History (12)

comment:1 Changed 4 months ago by karsten

Status: assignedneeds_review

irl, would you be able to commit a43a2b8 in my task-26857 branch? (Just in case you care about history how this page was put together, look at my repro branch; but I'd say that's not really required.)

Yes, it's a huge commit with lots of text. This review is just to make sure we're not publishing nonsense. It's going to be a work-in-progress page for the next weeks, and we're going to ask other folks to review sections they care most about.

And as soon as we're confident that the page is okay, we're going to link it from other places on Tor Metrics. Just not yet.

Thanks!

comment:2 Changed 4 months ago by irl

Reviewer: irl

Will take a look.

comment:3 Changed 4 months ago by irl

Status: needs_reviewneeds_revision

There is inconsistent use of anchors with headings.

  is used where I think &emdash; was intended (in any case,   looks weird).

Inconsistent use of spaces between digits and percent signs.

s/equivalant/equivalent/
s/suceeded/succeeded/
s/section fo the consensus/section of the consensus/

Inconsistent use of inter-quartile vs. interquartile.

skip this line to avoid including statistics in the aggregation that have very likely been reported and processed before

It's not clear to me in the users section, especially in the above quote, how time works. Is the assumption that you are always calculating the latest statistics, or can you use this method to calculate statistics at any point in time?

what transports they use

Can we instead say "pluggable transports" and link to the pluggable transport entry in the glossary? There are other terms that should probably be linked to the glossary too.

comment:4 in reply to:  3 ; Changed 4 months ago by karsten

Replying to irl:

There is inconsistent use of anchors with headings.

Hmm, can you give an example? Not exactly sure what you mean.

  is used where I think &emdash; was intended (in any case,   looks weird).

Changed to &emdash;.

Inconsistent use of spaces between digits and percent signs.

Removed spaces.

s/equivalant/equivalent/
s/suceeded/succeeded/
s/section fo the consensus/section of the consensus/

Fixed.

Inconsistent use of inter-quartile vs. interquartile.

Changed to interquartile.

skip this line to avoid including statistics in the aggregation that have very likely been reported and processed before

It's not clear to me in the users section, especially in the above quote, how time works. Is the assumption that you are always calculating the latest statistics, or can you use this method to calculate statistics at any point in time?

It's independent of when you're calculating statistics. The idea is just to avoid processing statistics that have been written long before the descriptor is published, because the same statistics were likely included in earlier descriptors. Practically, this is relevant, because we're only touching dates in the database covering the past week. It's to some extent an implementation detail, but I fear that people might come up with slightly different results if a relay first includes statistics after a downtime of over 1 week.

Does this make sense? Any idea how to phrase it more clearly?

what transports they use

Can we instead say "pluggable transports" and link to the pluggable transport entry in the glossary? There are other terms that should probably be linked to the glossary too.

Ah, the glossary. Yes, I'll link terms.

Regarding "transports" vs. "pluggable transports", I guess that "transports" includes all "pluggable transports" along with the default OR protocol. I'll take another look while linking terms to the glossary which one we should be using. But even if we're talking about "transports", we could easily link to the "pluggable transports" glossary entry.

Working on the glossary links now and making more changes as soon as I know better what to improve. Thanks!

comment:5 in reply to:  4 Changed 4 months ago by irl

Replying to karsten:

Regarding "transports" vs. "pluggable transports", I guess that "transports" includes all "pluggable transports" along with the default OR protocol. I'll take another look while linking terms to the glossary which one we should be using. But even if we're talking about "transports", we could easily link to the "pluggable transports" glossary entry.

Maybe we should add a "transports" entry to the glossary that is an umbrella term that describes PTs and OR, linking to PT and OR terms.

comment:6 Changed 4 months ago by karsten

Sure, we can add a glossary entry for "transports". Want to quickly write one?

And did you see my two other questions above regarding 1) anchors/headings and 2) my attempted clarification regarding time?

comment:7 Changed 4 months ago by karsten

Status: needs_revisionneeds_review

There, added glossary links and made a few more smaller changes. All in my task-26857 branch. Thanks!

comment:8 Changed 4 months ago by irl

Status: needs_reviewneeds_revision

The anchors/headings, I mean:

<h4>Step 3: Estimate fraction of reported directory-request statistics
<a href="#relay-users-fraction" name="relay-users-fraction" class="anchor">#</a></h4>

While other <h4> headings don't have the anchors.

to avoid including statistics in the aggregation that have very likely been reported and processed before

Does it make sense to say instead "to avoid including statistics in the aggregation that are likely to not be relevant to the time period we are calculating for"?

Other than that it is all looking OK.

comment:9 in reply to:  8 ; Changed 4 months ago by karsten

Status: needs_revisionneeds_information

Replying to irl:

The anchors/headings, I mean:

<h4>Step 3: Estimate fraction of reported directory-request statistics
<a href="#relay-users-fraction" name="relay-users-fraction" class="anchor">#</a></h4>

While other <h4> headings don't have the anchors.

Ah, the reason was that we were only referring to those two <h4> headings, not to any others. But I see the point of trying to be consistent. I removed those anchors and adapted the links in commit 62e60f0.

to avoid including statistics in the aggregation that have very likely been reported and processed before

Does it make sense to say instead "to avoid including statistics in the aggregation that are likely to not be relevant to the time period we are calculating for"?

While that would work, too, I tried a slightly different change (new text in italics): "that have very likely been reported in earlier descriptors and processed before." See commit 22c304d.

Other than that it is all looking OK.

Okay, great, I went ahead and rebased/squashed these commits into master and deployed them. This page is certainly not "done", but I think it's done enough now to be shared publicly.

Another part that remains to be done is to fully integrate this page into the rest of Tor Metrics by adding links. Let's first ask more folks to review what's there, though. Setting to needs_information until then.

Thanks!

comment:10 in reply to:  9 Changed 2 months ago by karsten

Status: needs_informationneeds_review

Replying to karsten:

Another part that remains to be done is to fully integrate this page into the rest of Tor Metrics by adding links. Let's first ask more folks to review what's there, though. Setting to needs_information until then.

We did ask a few folks but did not receive as many reviews as we had hoped. Maybe there will be more folks reviewing if they find the new pages (Statistics and Reproducible Metrics) more easily. Please review commit 4082219 in my task-26857-2 branch. Language tweaks or simplifications appreciated! Thanks!

comment:11 Changed 2 months ago by irl

Status: needs_reviewneeds_revision

I think I would make this change:

-graph data <a href="/stats.html#${id}">format</a>
+CSV data <a href="/stats.html#${id}">format</a>

I think that some may be confused to think this is a link to learn more about interpreting the plot than the CSV file.

Alternatively:

-graph data <a href="/stats.html#${id}">format</a>
+graph source data <a href="/stats.html#${id}">format</a>

After this change this is merge_ready.

comment:12 Changed 2 months ago by karsten

Resolution: implemented
Status: needs_revisionclosed

Great! I went with your first suggestion. Merged and deployed. Closing. Thanks!

Note: See TracTickets for help on using tickets.