Opened 5 years ago

Closed 6 months ago

#10675 closed task (wontfix)

Let's graph an estimate of the number of Tor Browser users

Reported by: lunar
Owned by:
Priority: Medium
Milestone:
Component: Archived/general
Version:
Severity: Normal
Keywords: archived-closed-2018-07-04
Cc: karsten, mikeperry, gk, mcs, arthuredelstein
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor:

Description

We already have a graph that estimates the number of Tor clients, and that's great. But with the botnet, it became quite hard during summer 2013 to tell whether the number of Tor Browser users was stable or increasing.

Tor Browser regularly fetches https://check.torproject.org/RecommendedTBBVersions to see if a new version is available. Knowing how many times this file is hit could help us figure out trends in Tor Browser usage. It would only give us trends, though, not an estimate of the number of Tor Browser users, given the complexity of browser usage patterns.

I think the day and the user agent are enough information. With Apache, they can
be logged by adding the following to the main configuration:

LogFormat "%{%Y-%m-%d}t %{User-Agent}i" tbbversions

And then to the check.tpo VirtualHost:

        SetEnvIf Request_URI ^/RecommendedTBBVersions$ tbbversions
        CustomLog ${APACHE_LOG_DIR}/tbbversions.log tbbversions env=tbbversions
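A log in that format could then be summarised with standard tools. This is only a sketch: the function names are made up for illustration, and it assumes every line starts with the `%Y-%m-%d` date followed by a space and the user agent, as the LogFormat above would produce.

```shell
# Count hits per day in a log written with
#   LogFormat "%{%Y-%m-%d}t %{User-Agent}i"
# Reads the named files (or stdin) and prints "<day> <hits>", sorted by day.
hits_per_day() {
    awk '{ hits[$1]++ } END { for (day in hits) print day, hits[day] }' "$@" |
        LC_ALL=C sort
}

# Same idea, but keeping the user agent:
# prints "<hits> <day> <user agent>" for each distinct line.
hits_per_day_and_agent() {
    LC_ALL=C sort "$@" | uniq -c | sed 's/^ *//'
}
```

For example, `hits_per_day /var/log/apache2/tbbversions.log` (the path is an assumption and depends on what APACHE_LOG_DIR is set to on the host).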

Child Tickets

Attachments (2)

tbbhits.txt (3.3 KB) - added by phobos 5 years ago.
hits to tbb version file by day
tb-stats.png (47.1 KB) - added by lunar 3 years ago.
Hits on RecommendedTBBVersions each day between 2015-09-21 and 2016-03-21

Change History (25)

comment:1 Changed 5 years ago by karsten

Cc: karsten added

comment:2 Changed 5 years ago by phobos

We already have this data in the check webserver logs.

comment:3 Changed 5 years ago by phobos

grep -c RecommendedTBBVersions check.torproject.org-access.log
283649

today alone
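To get a per-day breakdown rather than a single total, the date can be pulled out of each matching line. A sketch, assuming the log uses Apache's common or combined format, where the timestamp looks like `[20/Jan/2014:00:00:00 +0000]`; `hits_by_date` is a made-up name:

```shell
# Print "<dd/Mon/yyyy> <hits>" for requests mentioning RecommendedTBBVersions.
# Reads the named access logs (or stdin). Output is sorted as plain text,
# so it is only chronological within a single month.
hits_by_date() {
    grep -h 'RecommendedTBBVersions' "$@" |
        sed 's/^[^[]*\[\([^:]*\):.*/\1/' |
        LC_ALL=C sort | uniq -c | awk '{ print $2, $1 }'
}
```

It would be called as `hits_by_date check.torproject.org-access.log`.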

comment:4 Changed 5 years ago by lunar

What's the easiest way to get the data in sanitized form so that it could be graphed somewhere? (metrics maybe)

Changed 5 years ago by phobos

Attachment: tbbhits.txt added

hits to tbb version file by day

comment:5 Changed 5 years ago by phobos

attached hits by day

comment:6 Changed 5 years ago by lunar

It's great to have a month of data already, but what I would really like to see is a regularly updated graph so we can get an idea of the trends over the months. I am not sure how the data could be exported to a system that would create such a graph.

comment:7 Changed 5 years ago by karsten

On reflection, I'd rather not add this graph to the metrics website.

The reason is that this takes way more than a few lines of code to draw the graph, and I'm worried about the maintenance effort in the long run. To give you an idea, here's what it would take to put this graph on the metrics website:

  1. Define a data format of the new log files and put it on https://metrics.torproject.org/formats.html.
  2. Extend metrics-db to collect the new logs and make them available via rsync and/or in monthly tarballs.
  3. Extend metrics-lib to parse the new file format.
  4. Extend metrics-web to process the new data and produce a new CSV file similar to the ones on https://metrics.torproject.org/stats.html, and specify the new CSV file format on that page.
  5. Extend metrics-web to draw a graph and put it on the website.

I understand that most people only care about step 5 here. But the real purpose of metrics is to collect and archive interesting data about the Tor network and make them available to researchers and interested people. The graphs are just one output of metrics, and certainly the most visible, but not the most important.

Note that I kicked out GetTor package downloads a while ago for the same reasons I don't want to add RecommendedTBBVersions hits now. Somebody has to maintain all this code.

Can these new graphs live on check.tpo maybe?

comment:8 Changed 5 years ago by lunar

I can script a graph using rrdtool on check, lemmonii or somewhere else. Maybe this can also be done using AWStats or Visitors directly on the logs. I don't know enough about the infrastructure and policies to make more informed suggestions.

comment:9 Changed 5 years ago by lunar

Summary: Please create a very restricted logs of hits on the RecommendedTBBVersions file → Let's make a graph of hits on the RecommendedTBBVersions file

comment:10 Changed 5 years ago by cypherpunks

Component: Tor Sysadmin Team → general

comment:11 Changed 5 years ago by arma

Would we get most of the way there if somebody, e.g. lunar, scripted it and ran it from a cron job on his people.torproject.org account?

Then it isn't an official thing that we promise to support, but it could catch the eye of more people and gain momentum.

comment:12 Changed 5 years ago by lunar

I'm not able to do anything without access to data.

comment:13 Changed 4 years ago by lunar

Cc: mikeperry gk added
Summary: Let's make a graph of hits on the RecommendedTBBVersions file → Let's graph an estimate of the number of Tor Browser users

Now that Tor Browser checks whether an upgrade is available, we could use the hits on the update feeds to compute an estimate of the number of users.

comment:14 Changed 4 years ago by mcs

Cc: mcs added

comment:15 Changed 3 years ago by lunar

Severity: Normal

Anonymized website statistics are now available at https://webstats.torproject.org/
Tor Browser now queries https://www.torproject.org/projects/torbrowser/RecommendedTBBVersions
So I guess we could now make those graphs!

Changed 3 years ago by lunar

Attachment: tb-stats.png added

Hits on RecommendedTBBVersions each day between 2015-09-21 and 2016-03-21

comment:16 Changed 3 years ago by lunar

I extracted hits on RecommendedTBBVersions using the following small script:

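# For each distinct rotated log name, print its date stamp and the number of successful hits.
find . -name 'www.torproject.org-access.log-*.xz' -printf '%f\n' | sort -u | while read -r logfile; do
	# Extract the date stamp from the file name and print it without a newline.
	echo -n "$logfile" | sed 's/.*-\([0-9]*\)\..*/\1 /'
	# The same file name may exist in several directories; count matches across all copies.
	find . -name "$logfile" | xargs xzgrep '"GET /projects/torbrowser/RecommendedTBBVersions HTTP/1.1" 200' | wc -l
done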

A quick plot later using LibreOffice:

(Figure: tb-stats.png, hits on RecommendedTBBVersions each day between 2015-09-21 and 2016-03-21)

comment:17 Changed 3 years ago by arma

Neat!

How many times per day does the average Tor Browser hit this url?

comment:18 in reply to:  17 Changed 3 years ago by gk

Replying to arma:

> Neat!
>
> How many times per day does the average Tor Browser hit this url?

The check is made when the user opens a new tab or a new window AND more than two hours have passed since the last check. So "Tor Browser usage hours / 2" might be something to start with, assuming that users are opening tabs and doing NEWNYM things.
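To see how that rule turns browsing activity into hits, here is a small simulation. It is a sketch of the rule as described in this comment, not Tor Browser's actual code: `count_checks` is a made-up name, times are in seconds, and it assumes the first tab or window opened always triggers a check.

```shell
# Read tab/window open times (seconds, ascending, one per line) on stdin
# and print how many update checks the "more than two hours" rule triggers.
count_checks() {
    awk -v interval=7200 '
        # Assumption: the very first open always checks; later opens check
        # only if more than `interval` seconds passed since the last check.
        NR == 1 || $1 - last > interval { checks++; last = $1 }
        END { print checks + 0 }'
}
```

For instance, `printf '%s\n' 0 1800 7200 7201 15000 | count_checks` gives 3: the opens at 0, 7201 and 15000 seconds each trigger a check, while 1800 and 7200 fall within two hours of the previous one.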

comment:19 Changed 3 years ago by arma

To paraphrase Georg's answer, if we describe an "active session" as "up to 2 hours of browsing with tor browser", then we can say that there's one active session per hit on this url.

So we are looking at approximately 1M active sessions per day.

(And of course, maybe some users produce more than one session in a day, but we can't know how many, because anonymity. But it is plausible to imagine that at that volume of users, most of them don't do more than 1 session per day. Also, 1 user probably does not produce a session *every* day, so we also can't make a guess here about how many total deployed tor browsers there are.)
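To make the dependence on the unknown explicit: the number of users active on a given day is roughly the number of sessions divided by the average sessions per user. The helper below is a made-up illustration, and the sessions-per-user values passed to it are pure assumptions, not measurements.

```shell
# estimate_daily_users SESSIONS SESSIONS_PER_USER
# Prints SESSIONS / SESSIONS_PER_USER, truncated to an integer.
estimate_daily_users() {
    awk -v sessions="$1" -v per_user="$2" \
        'BEGIN { printf "%d\n", sessions / per_user }'
}
```

With about 1M sessions per day, `estimate_daily_users 1000000 1` prints 1000000 and `estimate_daily_users 1000000 2` prints 500000, so the session count behaves as an upper bound on that day's users.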

comment:20 Changed 3 years ago by arma

Do we know if Orbot et al hit this page too? Or if Whonix does? Or any other apps?

comment:21 Changed 3 years ago by arma

Is it also the case that Lunar's graph represents an *upper bound* on the number of users from https://metrics.torproject.org/userstats-relay-country.html who are using (a recent) Tor Browser?

comment:22 Changed 2 years ago by arthuredelstein

Cc: arthuredelstein added

comment:23 Changed 6 months ago by teor

Keywords: archived-closed-2018-07-04 added
Resolution: wontfix
Status: new → closed

Close all tickets in archived components
