Opened 3 weeks ago

Last modified 5 days ago

#28555 new task

Assess methodology for modern privcount Tor user counts

Reported by: arma
Owned by: metrics-team
Priority: Medium
Milestone:
Component: Metrics/Analysis
Version:
Severity: Normal
Keywords:
Cc:
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor: Sponsor19

Description

With our traditional user count methods, based on extrapolating from consensus fetch counts -- and assuming each client is online all day -- we see approximately 2 million daily users.

The IMC 2018 paper estimates closer to 8 million daily users:
https://www.ohmygodel.com/publications/tor-usage-imc18.pdf

Understanding how many users we have matters for three reasons. First, it is a critical building block for Isabela's "user retention" cycle. Second, it really affects how other large organizations view us: from Mozilla's perspective, 8 million daily users is wildly more attractive than 2 million daily users, and similarly with larger numbers we're in a better position to negotiate funding for a spot in the search box. Third, understanding our user base helps us understand the capacity of the Tor network, by making us better able to predict how Tor would handle an influx of n million new users (from Brave, from Firefox private browsing mode, or from other apps that integrate Tor and then become popular).

We should figure out what we think about this paper's counting methodology, and either (1) identify follow-up research questions that we need to investigate to convince ourselves that this newer number is more accurate, or (2) decide that the methodology is solid, in which case we should (a) tell the world about it in a blog post or similar, (b) update our various documentation and metrics graphs, and (c) figure out a way to deploy ongoing user count measurements with this new approach.

Change History (3)

comment:1 Changed 3 weeks ago by irl

Whenever I've been talking about the current user statistics from Metrics, I've been calling them daily concurrent users. We don't seem to use the word concurrent in the relay-userstats description, though.

comment:2 Changed 5 days ago by arma

I've been avoiding the word concurrent, because to me it implies that two users are concurrent if they are both interacting with the Tor network at the same time, whereas this daily user count is the total number of people who, at some time during the day, have had a Tor client online.

comment:3 Changed 5 days ago by teor

I just wrote about this in #tor-project:

The 8 million user figure was measured using PSC, which does unique counts across multiple relays. We won't be implementing unique counts in PrivCount in Tor, because PSC is an entirely different protocol, which takes ~24 hours to aggregate the counts.

Pages 8-11 of https://export.arxiv.org/pdf/1809.08481 give a few more details about the methods we used to calculate the 8 million figure.

The best way to get a better estimate out of metrics is to measure how many consensuses a client downloads in 24 hours, and update metrics' estimate accordingly. (We can't really do anything to measure clients that move IP addresses, or clients that are only online for part of a day.)
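
To illustrate why the fetches-per-client figure matters so much, here is a minimal sketch of the extrapolation, assuming the estimate is simply total consensus fetches divided by the average number of fetches one client makes per day. The function name and all the numbers below are hypothetical and chosen only to make the arithmetic visible; they are not measurements.

```python
def estimate_daily_users(consensus_fetches: float,
                         fetches_per_client_per_day: float) -> float:
    """Extrapolate a daily user count from consensus fetch counts.

    Assumption: every client fetches the same average number of
    consensuses per day, so users = fetches / fetches-per-client.
    """
    if fetches_per_client_per_day <= 0:
        raise ValueError("fetches_per_client_per_day must be positive")
    return consensus_fetches / fetches_per_client_per_day


# Hypothetical inputs: the same fetch total under two different
# assumptions about how often one client fetches a consensus.
total_fetches = 20_000_000
always_online = estimate_daily_users(total_fetches, 10)    # client online all day
part_time = estimate_daily_users(total_fetches, 2.5)       # client online part of the day
print(always_online, part_time)
```

With these made-up inputs, lowering the assumed fetches per client from 10 to 2.5 raises the estimate fourfold, which is why measuring the real average would directly change the published number.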

Maybe we could measure the average number of consensuses per day per IP address using PrivCount? But then we'd be keeping sensitive info in RAM (IP addresses).
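
A rough sketch of what that per-IP average would involve, written as plain Python rather than as anything PrivCount actually does. The function name and the input format (one IP string per observed consensus fetch) are assumptions for illustration. The point is that the naive approach needs a table keyed by client IP address, which is exactly the sensitive state mentioned above.

```python
from collections import Counter
from typing import Iterable


def mean_fetches_per_ip(fetch_ips: Iterable[str]) -> float:
    """Average number of consensus fetches per distinct IP address.

    Hypothetical input: one entry per fetch seen in a 24-hour window.
    Note that the Counter holds every client IP in memory until the
    window ends, which is the privacy cost of this naive approach.
    """
    per_ip = Counter(fetch_ips)
    if not per_ip:
        return 0.0
    return sum(per_ip.values()) / len(per_ip)


# Documentation-range addresses (RFC 5737), not real clients.
sample = ["192.0.2.1"] * 3 + ["192.0.2.2"]
print(mean_fetches_per_ip(sample))
```

This ignores clients that change IP address during the day and several clients sharing one address, so even a private version of it would only approximate fetches per client.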
