Opened 6 years ago

Closed 6 years ago

#10712 closed defect (fixed)

Mention the time period more explicitly

Reported by: infinity0
Owned by: karsten
Priority: Medium
Milestone:
Component: Metrics/Website
Version:
Severity:
Keywords:
Cc:
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor:

Description

It would be useful if the metrics website mentioned the aggregation time period more explicitly. The only thing on the page itself that mentions it is an implicit "mean daily users", and dir-spec.txt is not exactly an obvious place for people new to the metrics page to look.

A suggested patch is attached. I think it's better to put this on each graph (rather than in a single place on the page) so that the graphs are self-contained and can be included elsewhere (e.g. as on http://crypto.stanford.edu/flashproxy).

Child Tickets

Attachments (1)

time-period.patch (1.8 KB) - added by infinity0 6 years ago.
Add the aggregation time period to each graph

Change History (9)

Changed 6 years ago by infinity0

Attachment: time-period.patch added

Add the aggregation time period to each graph

comment:1 Changed 6 years ago by infinity0

Owner: set to karsten
Status: new → assigned

Trac didn't add an owner to the ticket and someone mentioned you as the person responsible for this. :)

comment:2 Changed 6 years ago by karsten

Actually, users "per day" is misleading here. The metrics graphs show the average number of users that are connected at any time on a given day. But it's just a coincidence that the graphs (and the raw CSV data behind them) use 1 day as the smallest unit. The graphs would still look the same if we used 1 week as the smallest unit on the x axis, and in that case we wouldn't want to say users "per week" either.

This (probably frequently asked) question is also answered in the Questions and Answers section. Any idea how we can make this more explicit and easier to understand?

comment:3 Changed 6 years ago by infinity0

I'm a bit confused by what you mean by "average" - averaged over what? Once you count those point-events ("directory requests") in any given 24-hour period, what else do you do to them?

comment:4 Changed 6 years ago by karsten

Average as in average users connected to the Tor network at the same time. Concurrent users, averaged over the day.

We divide the total directory requests in a UTC day by 10, based on the assumption that a user who is connected all day needs to refresh their network status 10 times per day.
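
For readers who want the arithmetic spelled out, here is a minimal sketch of the division described above. The function name and the request count are hypothetical and only for illustration; the factor of 10 is the assumption stated in this comment, not a measured value.

```python
# Assumption taken from the comment above: a client that is connected
# all day refreshes its network status about 10 times per day.
REQUESTS_PER_CLIENT_PER_DAY = 10


def estimate_concurrent_users(daily_requests: int) -> float:
    """Estimate average concurrent users from one UTC day's directory requests."""
    return daily_requests / REQUESTS_PER_CLIENT_PER_DAY


# Hypothetical request count, not real metrics data.
print(estimate_concurrent_users(5_000_000))
```

The result is an average number of users connected at the same time, not a count of distinct users seen during the day.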

comment:5 Changed 6 years ago by karsten

Status: assigned → needs_information

Not sure what to do with this ticket. Setting to needs_information and planning to close in a few weeks. Unless I should actually change some text somewhere.

comment:6 Changed 6 years ago by infinity0

If you "divide total directory requests in a UTC day by 10", this is still "users per day" (i.e. you observed x requests, or x/10 users, during the course of that day), so I think my patch should still be applied.

However, I am confused by your earlier comment that "The graphs would still look the same if we used 1 week as smallest unit on the x axis".

  • The default graph uses 1 month as the smallest display unit. You are right that if 1 week were used instead, the graph would still look the same. But that's not what my patch is about.
  • OTOH, if you instead "divide total requests in a UTC week by 10", then the graphs would not look the same, and you would want to apply my patch but say "per week" instead of "per day".

This last point is what my patch attempts to clarify. Without saying "per x", the user has no obvious visual indicator what the aggregation period is.

comment:7 in reply to:  6 ; Changed 6 years ago by infinity0

Replying to infinity0:

>   • OTOH, if you instead "divide total requests in a UTC week by 10", then the graphs would not look the same, and you would want to apply my patch but say "per week" instead of "per day".

Ah, I suppose you would divide by 70, if you were counting over the course of 1 week. I understand your point better now. You are basically treating each point-event (a directory request) as an event that lasts for 2.4 hours, then counting "how many concurrent events" exist over the course of a day on average. In this context, there is not really an "aggregation" period, more of an "averaging" period.
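
The "each request lasts 2.4 hours" reading can be sketched as follows. This is an illustration under the assumptions in this thread (10 requests per client per day), with hypothetical names and made-up request counts; it is not the code used by the metrics website.

```python
HOURS_PER_DAY = 24
# Assumption from this thread: an always-on client makes 10 requests per day.
REQUESTS_PER_CLIENT_PER_DAY = 10
# So each request is treated as covering 24 / 10 = 2.4 hours of presence.
HOURS_PER_REQUEST = HOURS_PER_DAY / REQUESTS_PER_CLIENT_PER_DAY


def average_concurrent_users(requests: int, period_days: int) -> float:
    """Total client-hours covered by the requests, divided by the period length in hours."""
    covered_hours = requests * HOURS_PER_REQUEST
    return covered_hours / (period_days * HOURS_PER_DAY)


# Hypothetical counts: 100 requests in one day and 700 requests in one week
# both correspond to about 10 concurrent users on average.
print(average_concurrent_users(100, 1))
print(average_concurrent_users(700, 7))
```

Written this way, dividing a day's requests by 10 and a week's requests by 70 are the same calculation with a different averaging period.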

I don't think "per day" is misleading though - it lets the user know what the resolution of the data is. You would still get a different graph if you averaged per-week - it would have a similar scale to the per-day graph, but the actual precise output would be different.
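
To illustrate the difference in resolution, here is a small sketch comparing per-day and per-week averaging on the same data. The request counts are invented and the function names are hypothetical; only the factor of 10 comes from this thread.

```python
# Assumption from this thread: an always-on client makes 10 requests per day.
REQUESTS_PER_CLIENT_PER_DAY = 10


def daily_estimates(daily_requests):
    """One concurrent-user estimate per day."""
    return [r / REQUESTS_PER_CLIENT_PER_DAY for r in daily_requests]


def weekly_estimate(daily_requests):
    """A single estimate averaged over all the days given (divide by 70 for 7 days)."""
    days = len(daily_requests)
    return sum(daily_requests) / (REQUESTS_PER_CLIENT_PER_DAY * days)


# Invented request counts for one week, with a spike on the fifth day.
week = [100, 120, 90, 110, 400, 95, 105]

print(daily_estimates(week))   # the spike is visible as one high value
print(weekly_estimate(week))   # the spike is smoothed into the weekly mean
```

Both outputs are on the same scale (concurrent users), but the per-week figure hides the single-day spike that the per-day series shows.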

Alternatively, you could say "concurrent users" or "est. concurrent users" instead of "users per day".

Some suggestions for the FAQ:

 Q: How do you get from these directory requests to user numbers?
-A: [etc] the average client makes 10 such requests per day. [etc]
+A: [etc] the average client makes 10 such requests per day. Another way of looking at it, is that we assume that each request represents a client that stays online for 2h24m.

 Q: So, are these distinct users per day, average number of users connected over the day, or what?
-A: Average number of users connected over the day. [etc]
+A: Average number of concurrent users, estimated from data collected over a day. [etc]

comment:8 in reply to:  7 Changed 6 years ago by karsten

Resolution: fixed
Status: needs_information → closed

Replying to infinity0:

> Replying to infinity0:
>
> >   • OTOH, if you instead "divide total requests in a UTC week by 10", then the graphs would not look the same, and you would want to apply my patch but say "per week" instead of "per day".
>
> Ah, I suppose you would divide by 70, if you were counting over the course of 1 week. I understand your point better now. You are basically treating each point-event (a directory request) as an event that lasts for 2.4 hours, then counting "how many concurrent events" exist over the course of a day on average. In this context, there is not really an "aggregation" period, more of an "averaging" period.

Yes, this describes the approach pretty well.

> I don't think "per day" is misleading though - it lets the user know what the resolution of the data is. You would still get a different graph if you averaged per-week - it would have a similar scale to the per-day graph, but the actual precise output would be different.

Yes.

> Alternatively, you could say "concurrent users" or "est. concurrent users" instead of "users per day".

I agree.

> Some suggestions for the FAQ:

Thanks, those are good suggestions. Applied!

Closing as fixed. Thanks!
