We already have a graph estimating the number of Tor clients, and that's great. But because of the botnet, it became quite hard during summer 2013 to tell whether the number of Tor Browser users was stable or increasing.
Tor Browser regularly fetches https://check.torproject.org/RecommendedTBBVersions to see whether a new version is available. Knowing how many times this file is fetched could help us figure out trends in Tor Browser usage. It would only give us trends, though, and not an estimate of the number of Tor Browser users, given the complexity of browser usage patterns.
I think the day and the user agent are enough information. With Apache, that can be kept by adding the following to the main configuration:
It's great to have a month of data already, but what I would really like to see is a regularly updated graph so we can get an idea of the trends over the months. I am not sure how the data could be exported to a system that would create such a graph.
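Whatever tool ends up drawing the graph, the data-processing side can stay small. A minimal sketch in Python, assuming a hypothetical restricted log with one `YYYY-MM-DD<TAB>User-Agent` line per hit (the actual log format is not given here); `daily_hits` and `to_csv_rows` are illustrative names, not part of any existing metrics code. It counts hits per day and emits `date,hits` rows that a plotting tool could consume:

```python
from collections import Counter
from datetime import date


def daily_hits(lines):
    """Count hits per day in a hypothetical restricted log.

    Assumed format (illustrative only): 'YYYY-MM-DD<TAB>User-Agent' per line,
    i.e. only the day and the user agent are kept.
    """
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        day, _, _agent = line.partition("\t")
        counts[date.fromisoformat(day)] += 1
    return counts


def to_csv_rows(counts):
    """Return 'date,hits' rows sorted by day, with a header row first."""
    rows = ["date,hits"]
    for day in sorted(counts):
        rows.append(f"{day.isoformat()},{counts[day]}")
    return rows


log = ["2013-09-01\tMozilla/5.0 (A)", "2013-09-01\tMozilla/5.0 (B)", "2013-09-02\tMozilla/5.0 (A)"]
print("\n".join(to_csv_rows(daily_hits(log))))
```

Running this daily over the previous day's log and appending the row to a CSV file would be enough to feed a regularly updated graph.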
On reflection, I'd rather not have this graph added to the metrics website.
The reason is that drawing this graph takes far more than a few lines of code, and I'm worried about the maintenance effort in the long run. To give you an idea, here's what it would take to put this graph on the metrics website:
Extend metrics-db to collect the new logs and make them available via rsync and/or in monthly tarballs.
Extend metrics-lib to parse the new file format.
Extend metrics-web to process the new data and produce a new CSV file similar to the ones on https://metrics.torproject.org/stats.html, and specify the new CSV file format on that page.
Extend metrics-web to draw a graph and put it on the website.
I understand that most people only care about the last step here, drawing the graph. But the real purpose of metrics is to collect and archive interesting data about the Tor network and make it available to researchers and other interested people. The graphs are just one output of metrics; they are certainly the most visible, but not the most important.
Note that I kicked out GetTor package downloads a while ago for the same reasons I don't want to add RecommendedTBBVersions hits now: somebody has to maintain all this code.
I can script a graph using rrdtool on check, lemmonii, or somewhere else. Maybe this can also be done by running AWStats or Visitors directly on the logs. I lack the knowledge about the infrastructure and policies to make more informed suggestions.
Trac: Summary: Please create a very restricted logs of hits on the RecommendedTBBVersions file to Let's make a graph of hits on the RecommendedTBBVersions file
Now that Tor Browser checks whether an upgrade is available, we could use the hits on the update feeds to compute an estimate of the number of users.
Trac: Cc: karsten to karsten, mikeperry, gk Summary: Let's make a graph of hits on the RecommendedTBBVersions file to Let's graph an estimate of the number of Tor Browser users
How many times per day does the average Tor Browser hit this URL?
The check is made when the user opens a new tab or a new window AND more than two hours have passed since the last check. So Tor Browser usage hours / 2 might be something to start with, assuming that users are opening tabs and doing NEWNYM things.
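The check rule described above can be simulated to see how hits relate to browsing time. A minimal sketch in Python; `count_update_checks` is a hypothetical helper that models only the rule as stated (a check on opening a tab or window, if more than two hours have passed since the last check), not Tor Browser's actual code:

```python
from datetime import datetime, timedelta

# Interval taken from the rule described above.
CHECK_INTERVAL = timedelta(hours=2)


def count_update_checks(open_events, interval=CHECK_INTERVAL):
    """Count RecommendedTBBVersions fetches for one browser (modelled rule).

    open_events: datetimes at which a new tab or window was opened.
    A check fires on an open event if there was no earlier check, or if
    more than `interval` has passed since the last one.
    """
    checks = 0
    last_check = None
    for t in sorted(open_events):
        if last_check is None or t - last_check > interval:
            checks += 1
            last_check = t
    return checks


# Example: a 5-hour session (10:00 to 15:00) with a new tab every 10 minutes.
start = datetime(2013, 9, 1, 10, 0)
events = [start + timedelta(minutes=10 * i) for i in range(31)]
print(count_update_checks(events))  # 3
```

Here the checks fire at 10:00, 12:10 and 14:20, so five hours of browsing produce three hits, close to the "usage hours / 2" rule of thumb. A user who browses for hours without opening tabs would produce fewer hits than that rule suggests.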
To paraphrase Georg's answer: if we define an "active session" as "up to 2 hours of browsing with Tor Browser", then we can say that there's one active session per hit on this URL.
So we are looking at approximately 1M active sessions per day.
(And of course, some users may produce more than one session in a day, but we can't know how many, because of anonymity. Still, it is plausible that at that volume of users, most of them don't produce more than one session per day. Also, a given user probably does not produce a session every day, so we can't use this to guess how many Tor Browser installations are deployed in total.)
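The arithmetic in the last few comments can be written down as a small helper. A minimal sketch in Python; `estimate_from_hits` is a hypothetical name, and the two-hour figure comes from the check interval described above. It only restates the model from this thread: one hit per active session, roughly two hours per session, and at least one session per user on any day that user browses. It says nothing about the total number of installations.

```python
def estimate_from_hits(hits_per_day, hours_per_session=2.0):
    """Rough daily figures from RecommendedTBBVersions hits (model, not data).

    Assumptions taken from the discussion:
    - one hit corresponds to one active session;
    - an active session is about `hours_per_session` hours of browsing;
    - a user who browses on a given day produces at least one session,
      so daily users are roughly at most the number of sessions.
    """
    active_sessions = hits_per_day
    approx_usage_hours = hits_per_day * hours_per_session
    max_daily_users = active_sessions
    return {
        "active_sessions": active_sessions,
        "approx_usage_hours": approx_usage_hours,
        "max_daily_users": max_daily_users,
    }


print(estimate_from_hits(1_000_000))
```

For about 1M hits per day this gives about 1M active sessions, on the order of 2M hours of browsing, and roughly at most 1M distinct users browsing that day.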