Could this be due to the HTTPS-Everywhere update in 5.0.4? I wonder if something about how it is being unpacked is causing the incremental updates to fail and force a full update.
(The grep -v for the 4.0.8 update is due to another issue; we have a lot of users also trying to download 4.0.8 MARs for some reason.)
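For reference, a minimal sketch of the kind of log filtering described above; the log path and MAR filename pattern are hypothetical:

```
# Count incremental-MAR requests, excluding the unrelated 4.0.8 downloads.
# Log path and filename pattern are hypothetical; adjust for the real mirror.
grep '\.incremental\.mar' /var/log/apache2/access.log \
  | grep -v '4\.0\.8' \
  | wc -l
```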
Unfortunately it is very difficult to tell why the incremental updates might be failing. Is this happening across all platforms and languages? Kathy and I will try some more 4.0.8 -> 4.5.1 updates (with updates to HTTPS-Everywhere, etc.) to see if we can reproduce this problem.
At a glance, it does not appear to be specific to any one locale. I see at least 6 locales in this list.
I think the fact that this ratio is getting worse points strongly to a concurrent update. Maybe test upgrading vs not upgrading HTTPS-Everywhere as your first experiment?
We are still trying to reproduce the problem (trying on Windows 7 at the moment). It does not seem to matter whether we update H-E before, after, or concurrently with the browser update.
Are the failures with 4.0.8 -> 4.5.1 updates?
Are many of the 4.5 -> 4.5.1 or 4.5 -> 5.0a1 incremental updates failing?
Do the Apache logs tell us whether the entire incremental MAR file was downloaded?
If I have access to the Apache logs I can check the above myself.
Since the 4.0.8 -> 4.5.1 incremental MAR effectively does rm -rf .../extensions/https-everywhere@eff.org and then adds the new files, Kathy and I do not think the root cause of this ticket is the H-E update.
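(As a rough sketch, one way to double-check what the incremental MAR does is to inspect its update manifest. This assumes the mar tool from Mozilla's mar-tools, a hypothetical MAR filename, and that this MAR uses updatev3.manifest; older MARs use updatev2.manifest.)

```
# Extract the incremental MAR into a scratch directory (filename hypothetical).
mkdir /tmp/mar-check && cd /tmp/mar-check
mar -x /path/to/tor-browser-incremental.mar

# Entries inside a MAR are individually bzip2-compressed, so decompress the
# manifest before reading it.
mv updatev3.manifest updatev3.manifest.bz2
bunzip2 updatev3.manifest.bz2

# Look for the remove/add instructions that touch the H-E extension directory.
grep 'https-everywhere@eff.org' updatev3.manifest
```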
Kathy and I have come up with a few scenarios that may be causing the TB updater to fall back to a full MAR:
1. The user makes changes to torrc-defaults (the 4.0.8 -> 4.5.1 incremental MAR tries to patch that file). There are a bunch of files that the incremental MAR tries to patch, any of which would cause this same problem; torrc-defaults just seems more likely to be modified by users than the others.
2. A network failure of the wrong kind occurs during download of the incremental MAR. In our testing on Windows, New Identity triggered a complete MAR download.
3. The user exits the browser (or a crash occurs) during the download. When this happens, the update service is supposed to resume the incremental MAR download when the browser is restarted, but we have seen it treat this as a network error in some cases.
We should be able to distinguish scenario 1 above from 2 and 3, because a failure while applying the incremental MAR implies it was completely downloaded (whereas it would be only partially downloaded in the other two situations). So maybe check the size field within the Apache logs if we have that info.
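A rough sketch of that size check, assuming the common combined log format (request path in field 7, status in field 9, bytes sent in field 10) and a hypothetical MAR size:

```
# Expected size of the incremental MAR in bytes (hypothetical value).
EXPECTED=12345678

# Print 200-status requests for the incremental MAR whose byte count is
# short, ignoring 206 (partial content) responses.
awk -v size="$EXPECTED" \
    '$7 ~ /incremental\.mar/ && $9 == 200 && $10 < size' \
    /var/log/apache2/access.log
```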
Wrt the size question, if I ignore 206 requests, I am not seeing any evidence of partial downloads of incrementals. This probably rules out 2 and 3, unless there is an issue in the 206 behavior that I can't see (we scrub IP addresses in logs, so even if concurrent exit-IP usage weren't common, I still couldn't total 206 byte counts across requests). Oddly, there are a few 416 responses (range not satisfiable), but fewer than a hundred per day.
There does appear to be at least one crawler involved. It is setting the Referer header to the containing directory. It is also performing fewer than a hundred requests per day.
One more datapoint: the total count of full update downloads exceeded the incremental download count on May 17th, and there have been more full update downloads than incremental downloads ever since. I am now wondering if we may actually be seeing users failing the full update and retrying repeatedly, perhaps due to #15857 (moved) or some other issue? Here are the counts from today on one mirror:
Unless we have other ideas, my next plan is to create some munin scripts to monitor this for future releases, so we can get an idea of what happens when we don't update the torrc or startup scripts, and don't have #15857 (moved) in the mix.
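A minimal munin plugin along these lines might look like the following; the log path, filename patterns, and field positions are all assumptions:

```
#!/bin/sh
# Minimal munin plugin: count completed (HTTP 200) full vs. incremental MAR
# downloads in the Apache log. Path and filename patterns are hypothetical.
LOG=/var/log/apache2/access.log

if [ "$1" = "config" ]; then
    echo 'graph_title Tor Browser update downloads'
    echo 'graph_vlabel requests'
    echo 'graph_category network'
    echo 'full.label full MARs'
    echo 'full.type DERIVE'
    echo 'full.min 0'
    echo 'incremental.label incremental MARs'
    echo 'incremental.type DERIVE'
    echo 'incremental.min 0'
    exit 0
fi

echo "full.value $(awk '$7 ~ /\.mar/ && $7 !~ /incremental/ && $9 == 200' "$LOG" | wc -l)"
echo "incremental.value $(awk '$7 ~ /incremental\.mar/ && $9 == 200' "$LOG" | wc -l)"
```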
> One more datapoint: the total count of full update downloads exceeded the incremental download count on May 17th, and there have been more full update downloads than incremental downloads ever since. I am now wondering if we may actually be seeing users failing the full update and retrying repeatedly, perhaps due to #15857 (moved) or some other issue?
#15857 (moved) is possible, although one would think that if #15857 (moved) were a common problem, Mozilla would have noticed and fixed it by now (but a TB installation does have a lot more files / a deeper hierarchy than Firefox).
> Unless we have other ideas, my next plan is to create some munin scripts to monitor this for future releases, so we can get an idea of what happens when we don't update the torrc or startup scripts, and don't have #15857 (moved) in the mix.
That sounds like a good idea. Another thing we could do is to build a prompt and upload mechanism into TB to allow users to send us the updater log after failed incremental updates (but maybe most of our users would just click "No Thanks").