Opened 2 years ago

Closed 2 years ago

#21886 closed defect (fixed)

Downloading of binary files stalls at 0/0 bytes in Tor Browser based on ESR52 with e10s off

Reported by: gk Owned by: gk
Priority: High Milestone:
Component: Applications/Tor Browser Version:
Severity: Normal Keywords: tbb-usability, ff52-esr, TorBrowserTeam201705, GeorgKoppen201705, tbb-7.0-must
Cc: mcs, brade, arthuredelstein Actual Points:
Parent ID: Points:
Reviewer: Sponsor: Sponsor4

Description

A user reported in #21340that downloading of zip file stalls reliably at 0 bytes with

[04-07 10:07:18] Torbutton INFO: External app requested
http channel Listener OnDataAvailable contract violation

Stalling happens as well for other binaries that need the external helper dialog, e.g. https://dist.torproject.org/torbrowser/6.5.1/TorBrowser-6.5.1-osx64_ar.dmg.

A workaround is to cancel the download and restart it on the about:downloads page.

Child Tickets

Attachments (2)

tor-21886.patch (1.3 KB) - added by mcs 2 years ago.
a possible fix (work in progress)
stack-with-e10s-and-patch.txt.gz (1.7 KB) - added by mcs 2 years ago.
stack with e10s enabled and patch applied (gzip)

Download all attachments as: .zip

Change History (31)

comment:1 Changed 2 years ago by gk

Component: ApplicationsApplications/Tor Browser
Owner: set to tbb-team
Status: newassigned

comment:2 Changed 2 years ago by cypherpunks

esr52 made #18778 more obvious.

comment:3 Changed 2 years ago by gk

Cc: mcs brade added
Summary: Downloading of binary filles stalls at 0/0 bytes in Tor Browser based on ESR52Downloading of binary files stalls at 0/0 bytes in Tor Browser based on ESR52

Might be related to #21766 (as comment:14:ticket:21340 mentions).

comment:4 Changed 2 years ago by gk

Keywords: tbb-7.0-must-alpha added; tbb-7.0-must removed
Priority: MediumHigh

comment:5 Changed 2 years ago by gk

Keywords: GeorgKoppen201704 added

comment:6 Changed 2 years ago by gk

Status: assignedneeds_information

After looking a bit closer at it I think this is highly related to #21766. mcs/brade: would it be helpful to deal with that bug by looking closer at this issue? Or do you feel you have a solution for #21766? If so you could test whether my assumption is right and this bug gets fixed with it, too.

comment:7 in reply to:  6 ; Changed 2 years ago by mcs

Replying to gk:

After looking a bit closer at it I think this is highly related to #21766. mcs/brade: would it be helpful to deal with that bug by looking closer at this issue?

We will look at this one at the same time as #21766.

Or do you feel you have a solution for #21766? If so you could test whether my assumption is right and this bug gets fixed with it, too.

We do not have a solution yet but are planning to work on that bug tomorrow.

comment:8 Changed 2 years ago by gk

Keywords: GeorgKoppen201704 removed
Owner: changed from tbb-team to mcs
Status: needs_informationassigned

Oh, I did not mean to put that on your plate. I was just trying to avoid different folks working on the same issue. :) But I am fine if you look at that one as well while wrapping your head around #21766.

comment:9 Changed 2 years ago by mcs

Kathy and I can not reproduce this bug when Tor Browser is running in multiprocess mode. When multiprocess mode is enabled, we always see a tab crash, which is #21766. Also, if the pref extensions.torbutton.launch_warning is set to false, neither this bug nor #21766 occur.

I wanted to mention our experience in case someone is seeing different behavior.

comment:10 in reply to:  9 Changed 2 years ago by gk

Summary: Downloading of binary files stalls at 0/0 bytes in Tor Browser based on ESR52Downloading of binary files stalls at 0/0 bytes in Tor Browser based on ESR52 with e10s off

Replying to mcs:

Kathy and I can not reproduce this bug when Tor Browser is running in multiprocess mode. When multiprocess mode is enabled, we always see a tab crash, which is #21766. Also, if the pref extensions.torbutton.launch_warning is set to false, neither this bug nor #21766 occur.

I wanted to mention our experience in case someone is seeing different behavior.

No, this ticket got filed before we had e10s enabled in Tor Browser. And I only see the bug once I have e10s off.

Last edited 2 years ago by gk (previous) (diff)

comment:11 Changed 2 years ago by gk

Keywords: TorBrowserTeam201705 added

Moving more tickets on our May 2015 radar.

comment:12 in reply to:  7 ; Changed 2 years ago by gk

Keywords: GeorgKoppen201705 added

Replying to mcs:

Replying to gk:

After looking a bit closer at it I think this is highly related to #21766. mcs/brade: would it be helpful to deal with that bug by looking closer at this issue?

We will look at this one at the same time as #21766.

Or do you feel you have a solution for #21766? If so you could test whether my assumption is right and this bug gets fixed with it, too.

We do not have a solution yet but are planning to work on that bug tomorrow.

mcs/brade: This bug is not related to #21766 it seems. I'll put it back on my plate, feel free to focus (again) on #21766. I think I can figure our what is going wrong here tomorrow.

comment:13 Changed 2 years ago by gk

Owner: changed from mcs to gk

comment:14 in reply to:  12 Changed 2 years ago by mcs

Replying to gk:

mcs/brade: This bug is not related to #21766 it seems. I'll put it back on my plate, feel free to focus (again) on #21766. I think I can figure our what is going wrong here tomorrow.

That's fine. Kathy and I will be interested to see what you have learned. We just came up with a patch that seems to fix this ticket but not #21766. I will post it as an attachment for reference, although you may have a different fix in mind.

Changed 2 years ago by mcs

Attachment: tor-21886.patch added

a possible fix (work in progress)

comment:15 Changed 2 years ago by mcs

Update: Kathy and I have a patch that fixes both this ticket and #21766, at least on OSX. I am going to test it on Linux before I post it here. We will also need to test it on Windows, although I will probably wait for a new nightly build.

comment:16 Changed 2 years ago by mcs

Unfortunately, the patch does not seem to fix either problem on Linux, at least not without introducing other problems :(
Here it is in case someone else wants to take a look:

https://gitweb.torproject.org/user/brade/tor-browser.git/commit/?h=bug21766-01&id=e8ed33bbb5ad632acc173ab5be3880fc08eab681

comment:17 Changed 2 years ago by cypherpunks

FWIW: modal dialogs are forbidden in e10s. And what's happened with https://dxr.mozilla.org/mozilla-esr52/source/uriloader/exthandler/nsExternalHelperAppService.cpp#753?

Last edited 2 years ago by cypherpunks (previous) (diff)

comment:18 Changed 2 years ago by gk

Just to give some small update as this is a surprisingly big PITA: A crucial part of the issue plays NoScript (although I don't think this is a NoScript bug): if one disables it the problem in this bug goes away. If one leaves it enabled then it is safe to say the problem got introduced between FIREFOX_AURORA_52_BASE and FIREFOX_BETA_52_BASE. The exact commit is not clear yet as I get hard to reproduce crashes in the area of interest which are related to the problem I think.

Last edited 2 years ago by gk (previous) (diff)

comment:19 Changed 2 years ago by gk

Status: assignedneeds_information

Alright, I think I have figured out all the pieces involved. The problem starts with https://bugzilla.mozilla.org/show_bug.cgi?id=1321528 being fixed in https://hg.mozilla.org/releases/mozilla-beta/rev/1c4863a4ff98 AND NoScript 2.9.5.2rc3 (commit 9eab569b1d443456325e2b253f2cca65eb453c30 in the git repo at https://github.com/avian2/noscript). The whole suspend/resume mechanism got introduced there to fix the mediasource blocking for us. ;)

While I don't have assembled all of the pieces to a coherent picture yet, the proposed attached fix seems to be a good idea to me. mcs/brade: That one works across different platforms, right? If so, do you think you want to propose that one for review?

Last edited 2 years ago by gk (previous) (diff)

comment:20 in reply to:  19 ; Changed 2 years ago by mcs

Replying to gk:

Alright, I think I have figured out all the pieces involved. The problem starts with https://bugzilla.mozilla.org/show_bug.cgi?id=1321528 being fixed in https://hg.mozilla.org/releases/mozilla-beta/rev/1c4863a4ff98 AND NoScript 2.9.5.2rc3 (commit 9eab569b1d443456325e2b253f2cca65eb453c30 in the git repo at https://github.com/avian2/noscript). The whole suspend/resume mechanism got introduced there to fix the mediasource blocking for us. ;)

Interesting. NoScript's role is confusing to me. Are you saying that NoScript relies on the fix for https://bugzilla.mozilla.org/show_bug.cgi?id=1321528 and that the same Mozilla fix causes this bug?

While I don't have assembled all of the pieces to a coherent picture yet, the proposed attached fix seems to be a good idea to me. mcs/brade: That one works across different platforms, right? If so, do you think you want to propose that one for review?

No, the fix we proposed does not work on Linux (at least not for us). See comment:16.

comment:21 in reply to:  20 ; Changed 2 years ago by gk

Replying to mcs:

Replying to gk:

Alright, I think I have figured out all the pieces involved. The problem starts with https://bugzilla.mozilla.org/show_bug.cgi?id=1321528 being fixed in https://hg.mozilla.org/releases/mozilla-beta/rev/1c4863a4ff98 AND NoScript 2.9.5.2rc3 (commit 9eab569b1d443456325e2b253f2cca65eb453c30 in the git repo at https://github.com/avian2/noscript). The whole suspend/resume mechanism got introduced there to fix the mediasource blocking for us. ;)

Interesting. NoScript's role is confusing to me. Are you saying that NoScript relies on the fix for https://bugzilla.mozilla.org/show_bug.cgi?id=1321528 and that the same Mozilla fix causes this bug?

No, NoScript does not rely on the fix. But it does channel suspense/resumse things starting with 2.9.5 (and the fixup in 2.9.5.2rc3) which is causing the problems. The Mozilla bugfix is the necessery bit on Mozilla's side to trigger the problem: if either that or Noscript is missing (or NoScript before 2.9.5.2rc3 is installed) the problem in this bug goes away.

While I don't have assembled all of the pieces to a coherent picture yet, the proposed attached fix seems to be a good idea to me. mcs/brade: That one works across different platforms, right? If so, do you think you want to propose that one for review?

No, the fix we proposed does not work on Linux (at least not for us). See comment:16.

Well, that's not the same as in the attachment which is why I was asking.

comment:22 in reply to:  21 ; Changed 2 years ago by mcs

Replying to gk:

Well, that's not the same as in the attachment which is why I was asking.

Ah. I forgot that we posted a patch in an attachment (work in progress) and then later we posted a pointer to what we thought was a better patch. The main difference between the two patches is where our hook is added (nsExternalHelperAppService::DoContent() vs. nsExternalAppHandler::OnStartRequest()). But you probably already knew that...

Do you know if the patch we posted in the attachment fixes this bug on Linux? We did not try it because we were hoping to find a fix for #21766 as well, on all platforms.

comment:23 in reply to:  22 Changed 2 years ago by gk

Replying to mcs:

Replying to gk:

Well, that's not the same as in the attachment which is why I was asking.

Ah. I forgot that we posted a patch in an attachment (work in progress) and then later we posted a pointer to what we thought was a better patch. The main difference between the two patches is where our hook is added (nsExternalHelperAppService::DoContent() vs. nsExternalAppHandler::OnStartRequest()). But you probably already knew that...

Do you know if the patch we posted in the attachment fixes this bug on Linux? We did not try it because we were hoping to find a fix for #21766 as well, on all platforms.

Yes, it works for me on Linux with a custom build. And your patch in the attachment makes the difference between a working and broken download (all other things are equal). And, probably of equal importance, it stops the "http channel Listener OnDataAvailable contract violation" message in the console as well. So, I am slightly optimistic for this ticket... I am doing some nightlies right now for better testing, having your fix on top of tor-browser-52.1.0esr-7.0-2-build1.

comment:24 Changed 2 years ago by gk

I uploaded testbuilds to people-tpo. I ran the Linux builds on three different systems and the Windows build on a Windows 7 machine. Everything worked fine. I had no chance to test macOS yet.

https://people.torproject.org/~gk/testbuilds/21886/ has the bundles.

I am still trying to understand why the fix is working and what the underlying issue of this ticket is.

comment:25 Changed 2 years ago by gk

Cc: arthuredelstein added
Keywords: TorBrowserTeam201705R added; TorBrowserTeam201705 removed
Status: needs_informationneeds_review

Okay, I am not fully done here as I seem to still have some details missing (I need more detailed logs at that point) but so far the patch in the attachment fixes things for me and looks good according to my logs I took for the debugging purposes. I have a branch bug_21886_v3 (https://gitweb.torproject.org/user/gk/tor-browser.git/commit/?h=bug_21886_v3&id=96a9fe10689345b0410b6ee25134de4e27c7c883) for review.

The problematic NoScript part is:

  if (IOUtil.isMediaDocOrFrame(req, contentType)) {
    IOUtil.suspendChannel(req);
    Thread.delay(() => IOUtil.resumeParentChannel(req), 100);
  }

  isMediaDocOrFrame(channel, contentType) {
    try {
      let cpType = channel.loadInfo.externalContentPolicyType;
      if (cpType === 7 || (cpType === 6 &&
          /^(?:video|audio|application)\//i.test(contentType === undefined ? channel.contentType : contentType))) {
        try {
          return !/^attachment\b/i.test(req.getResponseHeader("Content-disposition"));
        } catch(e) {
        }
        return true;
      }
    } catch (e) {
    }
    return false;
  },

I don't really understand what hread.delay(() => IOUtil.resumeParentChannel(req), 100); but I suspect that the problems we are seeing are caused by it.

What happens deeper down in the stack is the request being closed after about 32768 bytes are handled because calling onDataAvailable is resulting in an error that gets back to OnStateStop (NS_ERROR_UNEXPECTED):

[Main Thread]: D/nsStreamPump nsInputStreamPump::OnInputStreamReady [this=7f4591466180]
[Main Thread]: D/nsStreamPump   OnStateTransfer [this=7f4591466180]
[Main Thread]: D/nsStreamPump   Available returned [stream=89904d00 rv=0 avail=32768]
[Main Thread]: D/nsStreamPump   calling OnDataAvailable [offset=0 count=32768(32768)]
[Main Thread]: D/nsHttp nsHttpChannel::OnDataAvailable [this=7f459a4da000 request=7f4591466180 offset=0 count=32768]
[Main Thread]: D/nsHttp sending progress and status notification [this=7f459a4da000 status=804b0006 progress=32768/60562619]
[Main Thread]: D/nsStreamPump   OnStateStop [this=7f4591466180 status=8000ffff]
[Main Thread]: D/nsHttp nsHttpChannel::OnStopRequest [this=7f459a4da000 request=7f4591466180 status=8000ffff]

I'd like to keep this ticket open until we understand in detail what is going wrong but I think having the patch in the alpha can't hurt.

comment:26 Changed 2 years ago by mcs

On OSX, the patch seems to fix the problem of this ticket (e10s off). However, after I cancel from the prompt that is presented by Torbutton the page is left in the loading state (spinner still going inside the tab area). This is a nuisance but probably something we can live with for now.

A much worse problem is that when e10s is enabled this patch causes the main browser process to crash due to an assertion failure (I am testing with a debug build). I will attach a stack, but if we cannot find a solution for #21766 we should consider skipping the prompt if e10s is enabled.

Changed 2 years ago by mcs

stack with e10s enabled and patch applied (gzip)

comment:27 Changed 2 years ago by gk

Keywords: TorBrowserTeam201705 added; TorBrowserTeam201705R removed
Status: needs_reviewneeds_revision

Good finding. I did not test with e10s enabled as I was only focused on the non-e10s case. :(

comment:28 Changed 2 years ago by gk

Keywords: tbb-7.0-must added; tbb-7.0-must-alpha removed

We are beyond the alpha testing. Moving tickets for tbb-7.0-must.

comment:29 Changed 2 years ago by gk

Resolution: fixed
Status: needs_revisionclosed

I tested this on different machines and operating systems. It seems the patch for #21766 has fixed this one as well. Closing.

Note: See TracTickets for help on using tickets.