Opened 9 years ago

Closed 7 years ago

#3822 closed enhancement (fixed)

Should we not publish data points when the ratio of stat-reporting relays is too low?

Reported by: arma
Owned by: karsten
Priority: Medium
Component: Metrics/Website

Description

In #3338 we discovered that some of the big spikes on the usage graphs are due to dips in the fraction of directory mirrors reporting stats.

Perhaps we should go through and take out the data points for the most extreme dips? Right now we show usage spikes that are false positives, which distract from the usage spikes that actually reflect an increase in users.

If we should, what fraction of reporting nodes should be the cutoff?


Attachments (1)

daily-users-2009-2011.pdf (49.7 KB) - added by karsten 9 years ago.
Estimate users from 2009 until today


Change History (3)

Changed 9 years ago by karsten

Attachment: daily-users-2009-2011.pdf added

Estimate users from 2009 until today

comment:1 in reply to: description Changed 9 years ago by karsten

Replying to arma:

Perhaps we should go through and take out the data points for the most extreme dips? Right now we show usage spikes that are false positives, which distract from the usage spikes that actually reflect an increase in users.

Cutting off values based on the fraction is going to be difficult. The analysis in #3338 only looked at 2011, and from those data it looks like excluding everything with a fraction of 10% or less leads to "better" results. But before 2011, we had even fewer directories reporting statistics, and we had fine results, too. See the attached graph.

I wonder if there are better ways to smooth the graph. For example, for every day X, we could take the median of days X-1, X, and X+1 and plot that.
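A minimal sketch of the three-day running median suggested above, assuming the daily estimates are a plain list ordered by date. The function name and the sample numbers are hypothetical, and leaving the first and last day unchanged is an assumption (they have only one neighbour).

```python
from statistics import median


def smooth_median3(values):
    """Replace each day's value with the median of days X-1, X, and X+1.

    The first and last days have only one neighbour, so this sketch
    leaves them unchanged (an assumed edge rule, not from the ticket).
    """
    if len(values) < 3:
        return list(values)
    smoothed = [values[0]]
    for i in range(1, len(values) - 1):
        # Median of the previous, current, and next day.
        smoothed.append(median(values[i - 1:i + 2]))
    smoothed.append(values[-1])
    return smoothed


# Hypothetical daily user estimates with a one-day spike on the fourth day.
daily_users = [200000, 205000, 198000, 420000, 202000, 207000]
print(smooth_median3(daily_users))
# prints [200000, 200000, 205000, 202000, 207000, 207000]
```

A median rather than a mean is what makes a single-day outlier disappear entirely: one extreme value in a three-day window can never be the middle value. The trade-off is that a genuine increase lasting only one day is also suppressed.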

comment:2 Changed 7 years ago by karsten

Resolution: fixed
Status: new → closed

I think this ticket is obsolete, because we changed the estimation method on https://metrics.torproject.org/users.html.

The new data come with a frac column that gives the fraction of statistics (in percent) that each estimate is based on. We could easily require a minimum threshold on that column. For example, this row is based on a fraction of 29%:

date,node,country,transport,version,frac,users
2013-12-01,relay,,,,29,3264691
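A minimal sketch of requiring a threshold on the frac column, assuming the CSV format shown above (here inlined as a string rather than fetched from the website). The cutoff value and the function name are hypothetical; the ticket does not choose one. This mirrors the "exclude 10% or less" idea from comment:1 by dropping rows whose frac is at or below the cutoff.

```python
import csv
import io

# Hypothetical cutoff in percent: rows with frac at or below it are dropped.
CUTOFF_FRAC = 10

# The example row from the ticket, in the same CSV format.
SAMPLE = """date,node,country,transport,version,frac,users
2013-12-01,relay,,,,29,3264691
"""


def rows_above_cutoff(csv_text, cutoff_frac=CUTOFF_FRAC):
    """Yield (date, users) for rows whose frac is strictly above cutoff_frac.

    Rows with an empty frac field are skipped, since there is nothing
    to compare against the cutoff.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        frac = row["frac"]
        if frac and int(frac) > cutoff_frac:
            yield row["date"], int(row["users"])


print(list(rows_above_cutoff(SAMPLE)))
# prints [('2013-12-01', 3264691)]
```

With a cutoff of 29 or higher, the example row would be dropped instead.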

Closing as fixed, because I don't think we have this problem with the new graphs.
