Opened 3 years ago

Closed 2 years ago

#18768 closed task (fixed)

Calculate the fraction of dist.torproject.org traffic for Tor Browser downloads and updates

Reported by: karsten
Owned by:
Priority: Medium
Milestone:
Component: Metrics/Analysis
Version:
Severity: Normal
Keywords:
Cc: arma, gk, boklm, brade, mcs, mrphs
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor:

Description

We'd like to know what fraction of dist.torproject.org traffic is caused by updates, because it would be easy to move that traffic elsewhere.

Here's one way to find out: We go through sanitized web server logs from dist.torproject.org written by aroides in March 2016. This time frame covers a large and a small (in terms of incremental update size) Tor Browser release on March 8 and 18 respectively. We sum up response bytes by file extension, where .exe, .tar.xz, and .dmg are counted as Tor Browser downloads, .mar as Tor Browser updates, and other extensions as being unrelated to Tor Browser.

$ cat dist.torproject.org-access.log-201603?? | cut -d" " -f7,10 | grep " [0-9]*$" > dist.torproject.org-access.log-201603-part
$ cat dist.torproject.org-access.log-201603-part | grep "\.exe " | cut -d" " -f2 | paste -sd+ - | bc
62650080788327
$ cat dist.torproject.org-access.log-201603-part | grep "\.tar\.xz " | cut -d" " -f2 | paste -sd+ - | bc
12497704216145
$ cat dist.torproject.org-access.log-201603-part | grep "\.dmg " | cut -d" " -f2 | paste -sd+ - | bc
8352205765328
$ cat dist.torproject.org-access.log-201603-part | grep "\.mar " | cut -d" " -f2 | paste -sd+ - | bc
29958084372393
$ cat dist.torproject.org-access.log-201603-part | cut -d" " -f2 | paste -sd+ - | bc
113689403444481

Results:

Extension  Bytes            Fraction
.exe       62650080788327   55%
.tar.xz    12497704216145   11%
.dmg       8352205765328    7%
.mar       29958084372393   26%
other      231328302288     0%
total      113689403444481  100%

In words, 73% of dist.torproject.org traffic is caused by downloads, 26% by updates.
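
The per-extension sums above can also be produced in a single pass over the extracted file. The following is a minimal sketch, not the commands that were actually run: it assumes the same two-column input (request path, response bytes) as dist.torproject.org-access.log-201603-part, and the function name bytes_by_extension is made up for illustration.

```shell
# Sum response bytes per file extension in one pass.
# Input (stdin): lines of "<path> <bytes>", as in the -part file above.
# bytes_by_extension is a hypothetical helper name, not an existing tool.
bytes_by_extension() {
  awk '
    {
      if      ($1 ~ /\.exe$/)     class = ".exe"
      else if ($1 ~ /\.tar\.xz$/) class = ".tar.xz"
      else if ($1 ~ /\.dmg$/)     class = ".dmg"
      else if ($1 ~ /\.mar$/)     class = ".mar"
      else                        class = "other"
      bytes[class] += $2
      total += $2
    }
    END {
      # Fixed output order, because awk array iteration order is unspecified.
      n = split(".exe .tar.xz .dmg .mar other", order, " ")
      for (i = 1; i <= n; i++) {
        c = order[i]
        share = (total > 0) ? 100 * bytes[c] / total : 0
        # %.0f rather than %d: some awk implementations clamp %d to 32 bits.
        printf "%s %.0f %.0f%%\n", c, bytes[c], share
      }
      printf "total %.0f 100%%\n", total
    }
  '
}
```

It would be called as `bytes_by_extension < dist.torproject.org-access.log-201603-part`. The sums stay exact as long as they fit in a double's 53-bit integer range, which holds for the totals reported here.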

Thoughts?

Change History (11)

comment:1 Changed 3 years ago by boklm

Cc: gk boklm added; GeKo removed

comment:2 Changed 3 years ago by gk

Could we get the number of requests causing these downloads? This might help us to interpret the data better.

comment:3 Changed 3 years ago by karsten

Sure, we can include number of requests. But does a single Tor Browser instance only make exactly 1 request to update to a new version, or does it make several of those? That might make it difficult to compare the numbers.

$ cat dist.torproject.org-access.log-201603-part | grep -c "\.exe "
2360824
$ cat dist.torproject.org-access.log-201603-part | grep -c "\.tar\.xz "
227004
$ cat dist.torproject.org-access.log-201603-part | grep -c "\.dmg "
156397
$ cat dist.torproject.org-access.log-201603-part | grep -c "\.mar "
43530110
$ cat dist.torproject.org-access.log-201603-part | wc -l
74797001

Extension  Number of Bytes  Fraction of Bytes  Number of Requests  Fraction of Requests
.exe       62650080788327   55%                2360824             3%
.tar.xz    12497704216145   11%                227004              0%
.dmg       8352205765328    7%                 156397              0%
.mar       29958084372393   26%                43530110            58%
other      231328302288     0%                 28522666            38%
total      113689403444481  100%               74797001            100%

comment:4 Changed 3 years ago by mcs

Cc: brade mcs added

comment:5 in reply to:  3 ; Changed 3 years ago by gk

Replying to karsten:

Sure, we can include number of requests. But does a single Tor Browser instance only make exactly 1 request to update to a new version, or does it make several of those? That might make it difficult to compare the numbers.

It checks periodically whether there is a new update available. But once it determines there is indeed one out there then it should fetch just one .mar file in the vast majority of cases. That's why I was explicitly asking for the requests responsible for the downloads, e.g. those requesting the .mar files.

Do we understand the nature of the other requests? 38% sounds like a non-negligible fraction. Could they come from things like https://www.torproject.org/dist/torbrowser/update_2/hardened/Linux_x86_64-gcc3/6.0a4-hardened/ALL which are the update checks I talked about above (and which should get an update.xml file back as an answer)?

comment:6 in reply to:  5 ; Changed 3 years ago by cypherpunks

Replying to gk:

Do we understand the nature of the other requests? 38% sounds like a non-negligible fraction. Could they come from things like https://www.torproject.org/dist/torbrowser/update_2/hardened/Linux_x86_64-gcc3/6.0a4-hardened/ALL which are the update checks I talked about above (and which should get an update.xml file back as an answer)?

38 is still less than 58.

Some time ago I looked at this part of the code and found that update checks happen:

  • at startup;
  • if online, every app.update.interval seconds (12 hours);
  • on offline->online transitions;
  • when opening Help>About Tor Browser;
  • when opening chrome://mozapps/content/update/updates.xul

If this is correct, I would expect the update checks to far far outnumber the update downloads.
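
To put a rough number on that expectation, here is a small sketch that counts only the timer-driven checks from the list above. It assumes a client that stays online for the whole period and uses the 12-hour app.update.interval quoted above as the default; the function name min_periodic_checks is hypothetical, and startup, offline->online, and About-dialog checks would only add to the result.

```shell
# Lower bound on periodic update checks for a client that stays online.
# min_periodic_checks is a hypothetical helper name.
# Usage: min_periodic_checks <days_online> [interval_seconds]
min_periodic_checks() {
  days=$1
  interval=${2:-43200}   # 12 hours, the app.update.interval value quoted above
  echo $(( days * 86400 / interval ))
}

min_periodic_checks 31   # all of March 2016; prints 62
```

Under these assumptions such a client makes at least 62 checks in March, while the two releases in that month (March 8 and 18) would normally trigger about two update downloads.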

comment:7 in reply to:  6 ; Changed 3 years ago by gk

Replying to cypherpunks:

Replying to gk:

Do we understand the nature of the other requests? 38% sounds like a non-negligible fraction. Could they come from things like https://www.torproject.org/dist/torbrowser/update_2/hardened/Linux_x86_64-gcc3/6.0a4-hardened/ALL which are the update checks I talked about above (and which should get an update.xml file back as an answer)?

38 is still less than 58.

Some time ago I looked at this part of the code and found that update checks happen:

  • at startup;
  • if online, every app.update.interval seconds (12 hours);
  • on offline->online transitions;
  • when opening Help>About Tor Browser;
  • when opening chrome://mozapps/content/update/updates.xul

If this is correct, I would expect the update checks to far far outnumber the update downloads.

I think you are right here. One admittedly far-fetched scenario I had in mind while writing my last comment was our incremental .mar file update failing badly, which could lead to up to 2 .mar file requests (one for the incremental one and one for the full one) per one update ping. But that would imply a weird usage pattern of almost all of our users. So, yes, these requests are something else.

comment:8 in reply to:  7 Changed 3 years ago by cypherpunks

Replying to gk:

One admittedly far-fetched scenario I had in mind while writing my last comment was our incremental .mar file update failing badly, which could lead to up to 2 .mar file requests (one for the incremental one and one for the full one)

Actually I remember seeing that the updater can also resume interrupted background downloads (one kind of interruption could be terminating the browser, for example). So one could imagine increasing that figure some.

per one update ping. But that would imply a weird usage pattern of almost all of our users.

Yes, it would imply that the non-incremental patch is also failing, every time... It has to be something else.

I had thought that maybe there's something else out there, other than Tor Browser's updater, pulling a bunch of .mar files. I thought maybe if the server hosts a large number of those and a few mirrors are very eagerly syncing (over http) that could pad the ".mar requests" figure.

In fact, if we look at this: 29958084372393 / 43530110 ≅ 688215.22
That's ".mar bytes" divided by ".mar requests", which results in about 670 KiB. That's significantly less than what I would have expected for an average update. (Notice that similar calculations for the other items all result in reasonable values.)
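
The same division can be repeated for any row of the table. The following is only an illustrative sketch; mean_bytes_per_request is a made-up helper name, and the two arguments are the byte and request totals taken from comment:3.

```shell
# Mean response size per request, in bytes and in KiB.
# mean_bytes_per_request is a hypothetical helper name.
# Usage: mean_bytes_per_request <total_bytes> <requests>
mean_bytes_per_request() {
  awk -v bytes="$1" -v requests="$2" 'BEGIN {
    if (requests <= 0) { print "no requests"; exit 1 }
    mean = bytes / requests
    printf "%.2f B (%.0f KiB)\n", mean, mean / 1024
  }'
}

# .mar totals from comment:3; prints 688215.22 B (672 KiB)
mean_bytes_per_request 29958084372393 43530110
```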

comment:9 Changed 3 years ago by mrphs

Cc: mrphs added

comment:10 Changed 3 years ago by karsten

Here's a new row for /update_2/ which contains almost all of those 38% of requests that were previously contained in the "other" row and an additional column with the mean number of bytes per request:

Pattern     Number of Bytes  Fraction of Bytes  Number of Requests  Fraction of Requests  Mean Number of Bytes per Request
\.exe$      62650080788327   55%                2360824             3%                    25 MiB
\.tar\.xz$  12497704216145   11%                227004              0%                    53 MiB
\.dmg$      8352205765328    7%                 156397              0%                    51 MiB
\.mar$      29958084372393   26%                43530110            58%                   672 KiB
/update_2/  2930515856       0%                 28256673            38%                   103 B
other       228397786432     0%                 265993              0%                    839 KiB
total       113689403444481  100%               74797001            100%                  1 MiB
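
For reference, a table of this shape could be derived from the two-column extract in one pass. This is a sketch under assumptions, not necessarily how the numbers above were computed: the function name traffic_by_pattern is hypothetical, the extension patterns are tested before /update_2/, and the mean is printed in plain bytes rather than scaled units.

```shell
# Per-pattern bytes, request count, and mean bytes per request.
# Input (stdin): lines of "<path> <bytes>".
# traffic_by_pattern is a hypothetical helper name; the pattern order
# (extensions first, then /update_2/) is an assumption.
traffic_by_pattern() {
  awk '
    {
      if      ($1 ~ /\.exe$/)       p = ".exe"
      else if ($1 ~ /\.tar\.xz$/)   p = ".tar.xz"
      else if ($1 ~ /\.dmg$/)       p = ".dmg"
      else if ($1 ~ /\.mar$/)       p = ".mar"
      else if ($1 ~ /\/update_2\//) p = "/update_2/"
      else                          p = "other"
      bytes[p] += $2; reqs[p]++
      total_bytes += $2; total_reqs++
    }
    END {
      # Fixed output order, because awk array iteration order is unspecified.
      n = split(".exe .tar.xz .dmg .mar /update_2/ other", order, " ")
      for (i = 1; i <= n; i++) {
        p = order[i]
        mean = reqs[p] ? bytes[p] / reqs[p] : 0
        printf "%s %.0f %d %.0f\n", p, bytes[p], reqs[p], mean
      }
      mean = total_reqs ? total_bytes / total_reqs : 0
      printf "total %.0f %d %.0f\n", total_bytes, total_reqs, mean
    }
  '
}
```

Each output line holds the pattern, the byte sum, the request count, and the mean bytes per request, separated by single spaces.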

comment:11 Changed 2 years ago by karsten

Resolution: fixed
Status: new → closed

I believe this ticket is obsolete with the application statistics added to Tor Metrics earlier this year. Closing.
