Opened 14 months ago

Last modified 3 months ago

#22346 new defect

Investigate drop in Tor Browser update pings in early 2017 and 2018

Reported by: cypherpunks Owned by: metrics-team
Priority: Medium Milestone:
Component: Metrics/Statistics Version:
Severity: Normal Keywords:
Cc: gk, boklm, brade, mcs, arthuredelstein Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

https://mastodon.potager.org/@lunar/13977
https://metrics.torproject.org/webstats-tb.html

Did you fix this retroactively? And is the drop that is _currently_ still shown (2017-01-24 to 2017-04-04) real, or was it caused by the update URL change?

Child Tickets

Change History (18)

comment:1 Changed 14 months ago by boklm

In version 6.5, which was published on January 24, we changed the update URL with ticket #19481 from https://www.torproject.org/dist/torbrowser/update_2/ to https://aus1.torproject.org/dist/torbrowser/update_2/. The www.torproject.org/dist/ URL was redirected to dist.torproject.org, which then returned the result, whereas with the aus1.torproject.org URL the result is returned directly without a redirect. A possible explanation for the drop from 2017-01-24 is that the update pings were counted twice before the URL change because of the redirect.

In version 6.5.2 we changed the update_2 part in the URL to update_3, with ticket #19316. Initially metrics didn't count the update_3 requests as update pings, so this caused a drop in the update pings graph, but this has now been fixed. What we can see now is an increase in the update pings around the 4th and 5th of April, but it does not seem related to the URL change as there was no release around that time. I don't know the reason for this increase in update pings.
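To illustrate how a change from update_2 to update_3 can make pings vanish from a graph, here is a minimal sketch. It assumes, purely for illustration, that the earlier filter was pinned to update_2; the actual metrics code is not shown in this ticket. The request paths are hypothetical examples, not taken from real logs.

```python
import re

# Hypothetical request paths; the real resource strings may look different.
paths = [
    "/torbrowser/update_2/release/Linux_x86_64-gcc3/6.5.1/en-US",
    "/torbrowser/update_3/release/Linux_x86_64-gcc3/6.5.2/en-US",
]

# A filter pinned to update_2 (an assumption made for this sketch).
narrow = re.compile(r"/torbrowser/update_2/")
# A filter that accepts any single character after "update_".
wide = re.compile(r"/torbrowser/update_./")

narrow_hits = [p for p in paths if narrow.search(p)]
wide_hits = [p for p in paths if wide.search(p)]

print(len(narrow_hits), len(wide_hits))
```

The wider pattern plays the same role as the single-character wildcard in a SQL LIKE pattern such as '%/torbrowser/update\__/%'.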

comment:2 Changed 14 months ago by karsten

Fine question. Looks like we're missing some other resource string used for update pings, but I don't know which. Here are the requests we're including, by month, site, and resource_part up to update_[23]/:

webstats=> SELECT '2017-0' || date_part('month', log_date) AS month, site,
webstats->     substr(resource_string, 1,
webstats(>       strpos(resource_string, 'update_') + 8) AS resource_part,
webstats->     SUM(count) AS count
webstats->   FROM files NATURAL JOIN requests NATURAL JOIN resources
webstats->   WHERE resource_string LIKE '%/torbrowser/update\__/%'
webstats->   AND resource_string NOT LIKE '%.xml'
webstats->   AND response_code = 200
webstats->   AND method = 'GET'
webstats->   AND log_date >= '2017-01-01'
webstats->   GROUP BY month, site, resource_part
webstats->   ORDER BY month, count DESC;
  month  |          site          |                  resource_part                   |  count   
---------+------------------------+--------------------------------------------------+----------
 2017-01 | dist.torproject.org    | /torbrowser/update_2/                            | 48888500
 2017-01 | aus1.torproject.org    | /torbrowser/update_2/                            |  3576134
 2017-01 | archive.torproject.org | /tor-package-archive/torbrowser/update_2/        |      119
 2017-01 | dist.torproject.org    | https://dist.torproject.org/torbrowser/update_2/ |        2
 2017-02 | aus1.torproject.org    | /torbrowser/update_2/                            | 17695061
 2017-02 | dist.torproject.org    | /torbrowser/update_2/                            |  2827113
 2017-02 | archive.torproject.org | /tor-package-archive/torbrowser/update_2/        |      536
 2017-03 | aus1.torproject.org    | /torbrowser/update_2/                            | 19250809
 2017-03 | dist.torproject.org    | /torbrowser/update_2/                            |  1977765
 2017-03 | archive.torproject.org | /tor-package-archive/torbrowser/update_2/        |      616
 2017-04 | aus1.torproject.org    | /torbrowser/update_2/                            | 31079925
 2017-04 | aus1.torproject.org    | /torbrowser/update_3/                            | 16694038
 2017-04 | dist.torproject.org    | /torbrowser/update_2/                            |  1469608
 2017-04 | archive.torproject.org | /tor-package-archive/torbrowser/update_2/        |      386
 2017-05 | aus1.torproject.org    | /torbrowser/update_3/                            | 39459138
 2017-05 | aus1.torproject.org    | /torbrowser/update_2/                            |   991946
 2017-05 | dist.torproject.org    | /torbrowser/update_2/                            |   982639
 2017-05 | archive.torproject.org | /tor-package-archive/torbrowser/update_2/        |      529
(18 rows)
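For a quick view of the overall monthly totals, the rows above can be summed per month. This is only a small sketch that re-adds the numbers printed in the query output; the resource_part column is left out for brevity, and the variable names are made up for this example.

```python
from collections import defaultdict

# (month, site, count) rows copied from the query output above.
rows = [
    ("2017-01", "dist.torproject.org", 48888500),
    ("2017-01", "aus1.torproject.org", 3576134),
    ("2017-01", "archive.torproject.org", 119),
    ("2017-01", "dist.torproject.org", 2),
    ("2017-02", "aus1.torproject.org", 17695061),
    ("2017-02", "dist.torproject.org", 2827113),
    ("2017-02", "archive.torproject.org", 536),
    ("2017-03", "aus1.torproject.org", 19250809),
    ("2017-03", "dist.torproject.org", 1977765),
    ("2017-03", "archive.torproject.org", 616),
    ("2017-04", "aus1.torproject.org", 31079925),
    ("2017-04", "aus1.torproject.org", 16694038),
    ("2017-04", "dist.torproject.org", 1469608),
    ("2017-04", "archive.torproject.org", 386),
    ("2017-05", "aus1.torproject.org", 39459138),
    ("2017-05", "aus1.torproject.org", 991946),
    ("2017-05", "dist.torproject.org", 982639),
    ("2017-05", "archive.torproject.org", 529),
]

# Sum the counts of all sites and resource parts per month.
totals = defaultdict(int)
for month, _site, count in rows:
    totals[month] += count

for month in sorted(totals):
    print(month, totals[month])
```

The sums say nothing about the cause of the drop; they only make the month-to-month change easier to see than the per-site rows do.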

boklm, any ideas which other resource string we should be including?

comment:3 Changed 14 months ago by gk

Cc: gk added

comment:4 Changed 14 months ago by karsten

Cc: boklm added

boklm, gk, any ideas what we're missing?

comment:5 in reply to:  4 ; Changed 14 months ago by gk

Replying to karsten:

boklm, gk, any ideas what we're missing?

Not yet. How are we dealing with redirects we have/had in place? Do/did we double-count requests that get/got redirected?

comment:6 in reply to:  4 ; Changed 14 months ago by boklm

Replying to karsten:

boklm, gk, any ideas what we're missing?

I don't see what is missing, or if something is missing.

Would it be possible to run the same request, for the days January 24, 25, 26 (when the update pings dropped), and April 4, 5, 6 (when they increased), to try to understand what changed? Maybe seeing which type of URL dropped or increased on those days can tell us if we are missing something.

comment:7 in reply to:  5 Changed 14 months ago by karsten

Replying to gk:

Not yet. How are we dealing with redirects we have/had in place? Do/did we double-count requests that get/got redirected?

We're disregarding redirects (code 302) and only counting successful requests (code 200). Should we do this differently?
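As a rough sketch of what this filter means for a redirected request, the following mirrors the conditions of the WHERE clause in the query from comment:2 (GET, code 200, matching resource string, not ending in .xml). The LogEntry class, the is_update_ping helper, and the sample entries are all hypothetical and only serve this example.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class LogEntry:
    site: str
    method: str
    resource: str
    code: int

# Same shape as the LIKE pattern '%/torbrowser/update\__/%'.
UPDATE_PATTERN = re.compile(r"/torbrowser/update_./")

def is_update_ping(entry: LogEntry) -> bool:
    """Apply the same conditions as the WHERE clause in the query."""
    return (
        entry.method == "GET"
        and entry.code == 200
        and UPDATE_PATTERN.search(entry.resource) is not None
        and not entry.resource.endswith(".xml")
    )

# Hypothetical log entries: one redirected request, one direct request.
entries = [
    LogEntry("www.torproject.org", "GET", "/dist/torbrowser/update_2/release/x", 302),
    LogEntry("dist.torproject.org", "GET", "/torbrowser/update_2/release/x", 200),
    LogEntry("aus1.torproject.org", "GET", "/torbrowser/update_2/release/y", 200),
]

counted = [e for e in entries if is_update_ping(e)]
print(len(counted))
```

In this sketch the 302 entry is dropped, so a request that is redirected once and then answered with a 200 contributes a single counted ping.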

comment:8 in reply to:  6 ; Changed 14 months ago by karsten

Replying to boklm:

Replying to karsten:

boklm, gk, any ideas what we're missing?

I don't see what is missing, or if something is missing.

Would it be possible to run the same request, for the days January 24, 25, 26 (when the update pings dropped), and April 4, 5, 6 (when they increased), to try to understand what changed? Maybe seeing which type of URL dropped or increased on those days can tell us if we are missing something.

Sure, here's the output:

webstats=> SELECT log_date, site,
webstats->     substr(resource_string, 1,
webstats(>       strpos(resource_string, 'update_') + 8) AS resource_part,
webstats->     SUM(count) AS count
webstats->   FROM files NATURAL JOIN requests NATURAL JOIN resources
webstats->   WHERE resource_string LIKE '%/torbrowser/update\__/%'
webstats->   AND resource_string NOT LIKE '%.xml'
webstats->   AND response_code = 200
webstats->   AND method = 'GET'
webstats->   AND (log_date = '2017-01-24'
webstats(>     OR log_date = '2017-01-25'
webstats(>     OR log_date = '2017-01-26'
webstats(>     OR log_date = '2017-04-04'
webstats(>     OR log_date = '2017-04-05'
webstats(>     OR log_date = '2017-04-06')
webstats->   GROUP BY log_date, site, resource_part
webstats->   ORDER BY log_date, count DESC;
  log_date  |          site          |               resource_part               |  count  
------------+------------------------+-------------------------------------------+---------
 2017-01-24 | dist.torproject.org    | /torbrowser/update_2/                     | 2025386
 2017-01-24 | aus1.torproject.org    | /torbrowser/update_2/                     |   33549
 2017-01-24 | archive.torproject.org | /tor-package-archive/torbrowser/update_2/ |       1
 2017-01-25 | dist.torproject.org    | /torbrowser/update_2/                     |  692113
 2017-01-25 | aus1.torproject.org    | /torbrowser/update_2/                     |  151832
 2017-01-26 | aus1.torproject.org    | /torbrowser/update_2/                     |  381621
 2017-01-26 | dist.torproject.org    | /torbrowser/update_2/                     |  362971
 2017-01-26 | archive.torproject.org | /tor-package-archive/torbrowser/update_2/ |       2
 2017-04-04 | aus1.torproject.org    | /torbrowser/update_2/                     |  655434
 2017-04-04 | dist.torproject.org    | /torbrowser/update_2/                     |   50278
 2017-04-04 | archive.torproject.org | /tor-package-archive/torbrowser/update_2/ |       8
 2017-04-05 | aus1.torproject.org    | /torbrowser/update_2/                     | 1488508
 2017-04-05 | dist.torproject.org    | /torbrowser/update_2/                     |   51111
 2017-04-05 | archive.torproject.org | /tor-package-archive/torbrowser/update_2/ |      23
 2017-04-06 | aus1.torproject.org    | /torbrowser/update_2/                     | 1847522
 2017-04-06 | dist.torproject.org    | /torbrowser/update_2/                     |   50576
 2017-04-06 | archive.torproject.org | /tor-package-archive/torbrowser/update_2/ |      11
(17 rows)
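To see which site dominates on each of these days, the rows above can be reduced to a daily total and the share served by aus1.torproject.org. This is a small sketch over the numbers printed in the output; the resource_part column is omitted and the variable names are invented for this example.

```python
from collections import defaultdict

# (log_date, site, count) rows copied from the query output above.
rows = [
    ("2017-01-24", "dist.torproject.org", 2025386),
    ("2017-01-24", "aus1.torproject.org", 33549),
    ("2017-01-24", "archive.torproject.org", 1),
    ("2017-01-25", "dist.torproject.org", 692113),
    ("2017-01-25", "aus1.torproject.org", 151832),
    ("2017-01-26", "aus1.torproject.org", 381621),
    ("2017-01-26", "dist.torproject.org", 362971),
    ("2017-01-26", "archive.torproject.org", 2),
    ("2017-04-04", "aus1.torproject.org", 655434),
    ("2017-04-04", "dist.torproject.org", 50278),
    ("2017-04-04", "archive.torproject.org", 8),
    ("2017-04-05", "aus1.torproject.org", 1488508),
    ("2017-04-05", "dist.torproject.org", 51111),
    ("2017-04-05", "archive.torproject.org", 23),
    ("2017-04-06", "aus1.torproject.org", 1847522),
    ("2017-04-06", "dist.torproject.org", 50576),
    ("2017-04-06", "archive.torproject.org", 11),
]

daily_total = defaultdict(int)
daily_aus1 = defaultdict(int)
for log_date, site, count in rows:
    daily_total[log_date] += count
    if site == "aus1.torproject.org":
        daily_aus1[log_date] += count

# Fraction of each day's counted requests that were served by aus1.
aus1_share = {d: daily_aus1[d] / daily_total[d] for d in daily_total}

for d in sorted(daily_total):
    print(d, daily_total[d], f"{aus1_share[d]:.1%}")
```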

Would you want to play with the database yourself? It's ~3G uncompressed, so it shouldn't be that hard to dump and compress it. You'd have to create a local PostgreSQL database and import that file, and then you could run requests like this yourself. (I'd still be around to help with the schema as needed!)

comment:9 Changed 10 months ago by karsten

Component: Metrics/Website → Metrics/Statistics

Moving all tickets to Metrics/Statistics that are more related to the data-aggregating modules than to the website parts of metrics-web.

comment:10 Changed 10 months ago by karsten

Summary: tor browser update URL change and the update ping metrics → Investigate drop in Tor Browser update pings in early 2017, possibly caused by update URL change

Tweak summary.

comment:11 Changed 5 months ago by gk

Summary: Investigate drop in Tor Browser update pings in early 2017, possibly caused by update URL change → Investigate drop in Tor Browser update pings in early 2017 and 2018

Interestingly, it seems this is happening again with an X.5 release. It seems we need a better theory, assuming both incidents can be explained by the same underlying cause.

comment:12 Changed 5 months ago by boklm

The drop from 2018-01-24 seems to be related to the release of Tor Browser 7.5. However, I can't find any change between 7.0.11 and 7.5 that could explain that. The app.update.* prefs seem to be the same in both versions.

comment:13 in reply to:  8 ; Changed 5 months ago by boklm

Replying to karsten:

Would you want to play with the database yourself? It's ~3G uncompressed, so it shouldn't be that hard to dump and compress it. You'd have to create a local PostgreSQL database and import that file, and then you could run requests like this yourself. (I'd still be around to help with the schema as needed!)

Yes, if you can send me a dump of this database, I will look more closely at the numbers from the drop around 2018-01-24 to try to understand it.

comment:14 Changed 5 months ago by mcs

Cc: brade mcs added

comment:15 in reply to:  13 Changed 5 months ago by karsten

Replying to boklm:

Yes, if you can send me a dump of this database, I will look more closely at the numbers from the drop around 2018-01-24 to try to understand it.

Great! I just created a database dump and sent you the link via private mail.

comment:16 Changed 3 months ago by boklm

On April 6, 2018, we had again a big increase in the number of pings:
https://metrics.torproject.org/webstats-tb.html?start=2018-04-01&end=2018-04-17

From March 26 to April 10, we also had an increase in downloads and signature downloads:
https://metrics.torproject.org/webstats-tb.html?start=2018-04-01&end=2018-04-17

comment:17 in reply to:  16 Changed 3 months ago by boklm

Replying to boklm:

On April 6, 2018, we had again a big increase in the number of pings:
https://metrics.torproject.org/webstats-tb.html?start=2018-04-01&end=2018-04-17

The last release was on March 26, so this does not seem to be related to a new release.

From March 26 to April 10, we also had an increase in downloads and signature downloads:
https://metrics.torproject.org/webstats-tb.html?start=2018-04-01&end=2018-04-17

This one seems related to the new release. However it is surprising to see the number increasing in 2 days, staying stable for around 12 days, then decreasing back to the previous level in 4 days. It is also the first time we see a big increase in signature downloads. It seems signatures were downloaded around 1.2M times in 12 days.

comment:18 Changed 3 months ago by arthuredelstein

Cc: arthuredelstein added