Opened 3 years ago

Closed 2 years ago

Last modified 2 years ago

#21766 closed defect (fixed)

Tor Browser based on ESR52 with e10s enabled crashed while trying to download a file

Reported by: gk Owned by: mcs
Priority: Very High Milestone:
Component: Applications/Tor Browser Version:
Severity: Critical Keywords: ff52-esr, tbb-e10s, tbb-crash, TorBrowserTeam201706, tbb-7.0-must
Cc: mcs, brade, arthuredelstein, qbi, torland Actual Points:
Parent ID: Points:
Reviewer: Sponsor: Sponsor4

Description

I tried to download https://dist.torproject.org/torbrowser/6.5.1/TorBrowser-6.5.1-osx64_ar.dmg and got the notice that my tab crashed. The console has the following output:

[03-17 08:35:52] Torbutton INFO: External app requested
###!!! [Parent][DispatchAsyncMessage] Error: PExternalHelperApp::Msg_DivertToParentUsing Route error: message sent to unknown actor ID
[Parent 3552] WARNING: pipe error (51): Connection reset by peer: file /home/debian/build/tor-browser/ipc/chromium/src/chrome/common/ipc_channel_posix.cc, line 322
[Parent 3552] WARNING: pipe error (53): Connection reset by peer: file /home/debian/build/tor-browser/ipc/chromium/src/chrome/common/ipc_channel_posix.cc, line 322
[Parent 3552] WARNING: pipe error (47): Connection reset by peer: file /home/debian/build/tor-browser/ipc/chromium/src/chrome/common/ipc_channel_posix.cc, line 322
###!!! [Parent][MessageChannel] Error: (msgtype=0x2C0073,name=PBrowser::Msg_RealMouseMoveEvent) Channel error: cannot send/recv
###!!! [Parent][MessageChannel] Error: (msgtype=0x2C0074,name=PBrowser::Msg_SynthMouseMoveEvent) Channel error: cannot send/recv
###!!! [Parent][MessageChannel] Error: (msgtype=0x2C0073,name=PBrowser::Msg_RealMouseMoveEvent) Channel error: cannot send/recv
###!!! [Parent][MessageChannel] Error: (msgtype=0x2C0073,name=PBrowser::Msg_RealMouseMoveEvent) Channel error: cannot send/recv
###!!! [Parent][MessageChannel] Error: (msgtype=0x46001C,name=PContent::Msg_PreferenceUpdate) Channel error: cannot send/recv
###!!! [Parent][MessageChannel] Error: (msgtype=0x2C0073,name=PBrowser::Msg_RealMouseMoveEvent) Channel error: cannot send/recv
###!!! [Parent][MessageChannel] Error: (msgtype=0x2C006E,name=PBrowser::Msg_ParentActivated) Channel error: cannot send/recv
###!!! [Parent][MessageChannel] Error: (msgtype=0x2C006D,name=PBrowser::Msg_Deactivate) Channel error: cannot send/recv
###!!! [Parent][MessageChannel] Error: (msgtype=0x2C0070,name=PBrowser::Msg_StopIMEStateManagement) Channel error: cannot send/recv
###!!! [Parent][MessageChannel] Error: (msgtype=0x2C0073,name=PBrowser::Msg_RealMouseMoveEvent) Channel error: cannot send/recv
###!!! [Parent][MessageChannel] Error: (msgtype=0x2C0073,name=PBrowser::Msg_RealMouseMoveEvent) Channel error: cannot send/recv
###!!! [Parent][MessageChannel] Error: (msgtype=0x2C0073,name=PBrowser::Msg_RealMouseMoveEvent) Channel error: cannot send/recv
###!!! [Parent][MessageChannel] Error: (msgtype=0x2C0073,name=PBrowser::Msg_RealMouseMoveEvent) Channel error: cannot send/recv
###!!! [Parent][MessageChannel] Error: (msgtype=0x2C0073,name=PBrowser::Msg_RealMouseMoveEvent) Channel error: cannot send/recv
###!!! [Parent][MessageChannel] Error: (msgtype=0x2C0073,name=PBrowser::Msg_RealMouseMoveEvent) Channel error: cannot send/recv
###!!! [Parent][MessageChannel] Error: (msgtype=0x2C0073,name=PBrowser::Msg_RealMouseMoveEvent) Channel error: cannot send/recv
###!!! [Parent][MessageChannel] Error: (msgtype=0x2C0073,name=PBrowser::Msg_RealMouseMoveEvent) Channel error: cannot send/recv
###!!! [Parent][MessageChannel] Error: (msgtype=0x2C0073,name=PBrowser::Msg_RealMouseMoveEvent) Channel error: cannot send/recv
###!!! [Parent][MessageChannel] Error: (msgtype=0x2C0073,name=PBrowser::Msg_RealMouseMoveEvent) Channel error: cannot send/recv
###!!! [Parent][MessageChannel] Error: (msgtype=0x2C0073,name=PBrowser::Msg_RealMouseMoveEvent) Channel error: cannot send/recv
###!!! [Parent][MessageChannel] Error: (msgtype=0x2C0073,name=PBrowser::Msg_RealMouseMoveEvent) Channel error: cannot send/recv
###!!! [Parent][MessageChannel] Error: (msgtype=0x2C0073,name=PBrowser::Msg_RealMouseMoveEvent) Channel error: cannot send/recv
###!!! [Parent][MessageChannel] Error: (msgtype=0x2C0073,name=PBrowser::Msg_RealMouseMoveEvent) Channel error: cannot send/recv
###!!! [Parent][MessageChannel] Error: (msgtype=0x2C0073,name=PBrowser::Msg_RealMouseMoveEvent) Channel error: cannot send/recv
###!!! [Parent][MessageChannel] Error: (msgtype=0x2C0073,name=PBrowser::Msg_RealMouseMoveEvent) Channel error: cannot send/recv
###!!! [Parent][MessageChannel] Error: (msgtype=0x2C0073,name=PBrowser::Msg_RealMouseMoveEvent) Channel error: cannot send/recv
###!!! [Parent][MessageChannel] Error: (msgtype=0x2C0073,name=PBrowser::Msg_RealMouseMoveEvent) Channel error: cannot send/recv
###!!! [Parent][MessageChannel] Error: (msgtype=0x2C0073,name=PBrowser::Msg_RealMouseMoveEvent) Channel error: cannot send/recv
###!!! [Parent][MessageChannel] Error: (msgtype=0x2C0073,name=PBrowser::Msg_RealMouseMoveEvent) Channel error: cannot send/recv
###!!! [Parent][MessageChannel] Error: (msgtype=0x2C0073,name=PBrowser::Msg_RealMouseMoveEvent) Channel error: cannot send/recv
###!!! [Parent][MessageChannel] Error: (msgtype=0x2C0073,name=PBrowser::Msg_RealMouseMoveEvent) Channel error: cannot send/recv
###!!! [Parent][MessageChannel] Error: (msgtype=0x2C0073,name=PBrowser::Msg_RealMouseMoveEvent) Channel error: cannot send/recv
###!!! [Parent][MessageChannel] Error: (msgtype=0x2C0073,name=PBrowser::Msg_RealMouseMoveEvent) Channel error: cannot send/recv
###!!! [Parent][MessageChannel] Error: (msgtype=0x2C0073,name=PBrowser::Msg_RealMouseMoveEvent) Channel error: cannot send/recv
###!!! [Parent][MessageChannel] Error: (msgtype=0x2C0073,name=PBrowser::Msg_RealMouseMoveEvent) Channel error: cannot send/recv
###!!! [Parent][MessageChannel] Error: (msgtype=0x2C0073,name=PBrowser::Msg_RealMouseMoveEvent) Channel error: cannot send/recv
###!!! [Parent][MessageChannel] Error: (msgtype=0x2C0073,name=PBrowser::Msg_RealMouseMoveEvent) Channel error: cannot send/recv
###!!! [Parent][MessageChannel] Error: (msgtype=0x2C0073,name=PBrowser::Msg_RealMouseMoveEvent) Channel error: cannot send/recv
###!!! [Parent][MessageChannel] Error: (msgtype=0x2C0073,name=PBrowser::Msg_RealMouseMoveEvent) Channel error: cannot send/recv
###!!! [Parent][MessageChannel] Error: (msgtype=0x2C0073,name=PBrowser::Msg_RealMouseMoveEvent) Channel error: cannot send/recv
###!!! [Parent][MessageChannel] Error: (msgtype=0x2C0073,name=PBrowser::Msg_RealMouseMoveEvent) Channel error: cannot send/recv
###!!! [Parent][MessageChannel] Error: (msgtype=0x2C0073,name=PBrowser::Msg_RealMouseMoveEvent) Channel error: cannot send/recv
###!!! [Parent][MessageChannel] Error: (msgtype=0x2C0073,name=PBrowser::Msg_RealMouseMoveEvent) Channel error: cannot send/recv
###!!! [Parent][MessageChannel] Error: (msgtype=0x2C0073,name=PBrowser::Msg_RealMouseMoveEvent) Channel error: cannot send/recv
###!!! [Parent][MessageChannel] Error: (msgtype=0x2C0073,name=PBrowser::Msg_RealMouseMoveEvent) Channel error: cannot send/recv
###!!! [Parent][MessageChannel] Error: (msgtype=0x2C0073,name=PBrowser::Msg_RealMouseMoveEvent) Channel error: cannot send/recv
###!!! [Parent][MessageChannel] Error: (msgtype=0x2C0073,name=PBrowser::Msg_RealMouseMoveEvent) Channel error: cannot send/recv
###!!! [Parent][MessageChannel] Error: (msgtype=0x2C0073,name=PBrowser::Msg_RealMouseMoveEvent) Channel error: cannot send/recv
###!!! [Parent][MessageChannel] Error: (msgtype=0x2C0073,name=PBrowser::Msg_RealMouseMoveEvent) Channel error: cannot send/recv
###!!! [Parent][MessageChannel] Error: (msgtype=0x2C0073,name=PBrowser::Msg_RealMouseMoveEvent) Channel error: cannot send/recv
###!!! [Parent][MessageChannel] Error: (msgtype=0x2C0073,name=PBrowser::Msg_RealMouseMoveEvent) Channel error: cannot send/recv
###!!! [Parent][MessageChannel] Error: (msgtype=0x2C0075,name=PBrowser::Msg_RealMouseButtonEvent) Channel error: cannot send/recv
###!!! [Parent][MessageChannel] Error: (msgtype=0x2C0075,name=PBrowser::Msg_RealMouseButtonEvent) Channel error: cannot send/recv
###!!! [Parent][MessageChannel] Error: (msgtype=0x2C0073,name=PBrowser::Msg_RealMouseMoveEvent) Channel error: cannot send/recv
###!!! [Parent][MessageChannel] Error: (msgtype=0x2C0073,name=PBrowser::Msg_RealMouseMoveEvent) Channel error: cannot send/recv

...

###!!! [Parent][MessageChannel] Error: (msgtype=0x2C0085,name=PBrowser::Msg_Destroy) Channel error: cannot send/recv

Might be related to our external app handler patch.

Child Tickets

Attachments (1)

stack.txt.gz (4.0 KB) - added by mcs 3 years ago.
call stack at the point where the chrome process kills the content process

Download all attachments as: .zip

Change History (51)

comment:1 Changed 3 years ago by gk

Cc: mcs brade arthuredelstein added
Priority: HighVery High
Severity: MajorCritical

Oh, and this is reproducible it seems: I am using the up-to-date branch in #20680 and compile it for Linux64 bit. I use Torbutton with all the e10s fixes that are currently available and enable e10s by setting browser.tabs.remote.autostart and browser.tabs.remote.force-enable to true.

Last edited 3 years ago by gk (previous) (diff)

comment:2 Changed 3 years ago by arthuredelstein

Might be related to our external app handler patch.

I tried reverting our #19273 patch in tor-browser.git and the crash indeed went away. So it looks like something in that patch or the torbutton patch needs to be changed to be compatible with e10s.

Version 0, edited 3 years ago by arthuredelstein (next)

comment:3 Changed 3 years ago by arthuredelstein

If I comment out the lines

    var result = prompts.confirmEx(chrome, title, app+note+suggest+" ",
                                   flags, launch, cancel, "", dontask, check);

from external-app-blocker.js then there is no crash (and no dialog of course).

I'm not sure why this is crashing, but I have a hunch it is running in the child process and wouldn't crash in the parent process.

comment:4 Changed 3 years ago by gk

Keywords: tbb-7.0-must-nightly added

We want those tickets for our first ESR52 nightlies.

comment:5 Changed 3 years ago by mcs

Kathy and I tried to debug this problem. The main thing we learned is that we are not yet good at understanding e10s-related problems :)

What we do know is that the main (chrome) process is killing the content process (which is associated with the tab where the download is initiated). Code inside the KillProcess() function within ipc/chromium/src/base/process_util_posix.cc sends a SIGTERM to the content process . But why? The key hint seems to be this log message:

###!!! [Parent][DispatchAsyncMessage] Error: PExternalHelperApp::Msg_DivertToParentUsing Route error: message sent to unknown actor ID

I will attach a call stack that shows how the code gets there, but Kathy and I don't know the answer to the really important question: "Why?". Our best guess is that we should not be trying to use the prompt service from the chrome process at all, but that seems surprising.

Does anyone know how we can learn more about what APIs can safely be used from which kind of process? Or maybe we should just ask some Mozilla engineers?

comment:6 in reply to:  5 ; Changed 3 years ago by arthuredelstein

Replying to mcs:

For some reason I can't open the stack.txt.gz. It unzips but I don't get readable text.

comment:7 in reply to:  6 ; Changed 3 years ago by cypherpunks

Replying to arthuredelstein:

Replying to mcs:

For some reason I can't open the stack.txt.gz. It unzips but I don't get readable text.

It's an April Fool's day joke ;)
Trac's preview of gz too :))
Actually, it's an archive in archive, so you need some tricks to "decypher" it ;)))
Unfortunately, Trac's captcha fails with error when trying to submit this call stack in comment, because Tor Browser is broken :(

comment:8 Changed 3 years ago by cypherpunks

Process 26651 stopped
* thread #1: tid = 0x7801d1, 0x0000000108aecfa3 XUL`base::KillProcess(process_id=26709, exit_code=1, wait=false) + 35 at process_util_posix.cc:71, queue = 'com.apple.main-thread', stop reason = breakpoint 1.2
    frame #0: 0x0000000108aecfa3 XUL`base::KillProcess(process_id=26709, exit_code=1, wait=false) + 35 at process_util_posix.cc:71
   68  	// entry structure.  Ignores specified exit_code; posix can't force that.
   69  	// Returns true if this is successful, false otherwise.
   70  	bool KillProcess(ProcessHandle process_id, int exit_code, bool wait) {
-> 71  	fprintf(stderr, "\n\nKillProcess %d\n\n", process_id);
   72  	  bool result = kill(process_id, SIGTERM) == 0;
   73  	
   74  	  if (result && wait) {

comment:9 Changed 3 years ago by cypherpunks

Hmm, one spam template can be overcome.

Changed 3 years ago by mcs

Attachment: stack.txt.gz added

call stack at the point where the chrome process kills the content process

comment:10 in reply to:  7 ; Changed 3 years ago by mcs

Replying to cypherpunks:

Actually, it's an archive in archive, so you need some tricks to "decypher" it ;)))

You are right. I uploaded the attachment again, but the result is the same. Must be a Trac "feature."

comment:11 in reply to:  10 Changed 3 years ago by cypherpunks

Replying to mcs:

Replying to cypherpunks:

Actually, it's an archive in archive, so you need some tricks to "decypher" it ;)))

You are right. I uploaded the attachment again, but the result is the same. Must be a Trac "feature."

for cypherpunks only ;)

comment:12 Changed 3 years ago by gk

Keywords: TorBrowserTeam201704 added; TorBrowserTeam201703 removed

Moving tickets over to April

comment:13 Changed 3 years ago by gk

Keywords: tbb-7.0-must removed

No need to use somewhat duplicated keywords.

comment:14 Changed 3 years ago by gk

Keywords: tbb-7.0-must-alpha added; tbb-7.0-must-nightly removed

Things to tag for the alphas (as the remaining tasks in the tickets are no nightly bloclers).

comment:15 Changed 3 years ago by gk

Keywords: tbb-e10s added

Adding e10s keyword to track multiprocess related bugs.

comment:16 Changed 3 years ago by gk

Summary: Tor Browser based on ESR52 with e10s enabled crashed whle trying to download a fileTor Browser based on ESR52 with e10s enabled crashed while trying to download a file

comment:17 in reply to:  description Changed 2 years ago by cypherpunks

Replying to gk:

I tried to download [...] and got the notice that my tab crashed.

And to get buttons on that page work you need to allow JS on about:tabcrashed.

comment:18 Changed 2 years ago by gk

Keywords: tbb-crash added

comment:19 Changed 2 years ago by qbi

Cc: qbi added

comment:20 Changed 2 years ago by mcs

Owner: changed from tbb-team to mcs
Status: newassigned

comment:21 Changed 2 years ago by gk

Keywords: TorBrowserTeam201705 added; TorBrowserTeam201704 removed

Moving our tickets to May 2017.

comment:22 Changed 2 years ago by mcs

see ticket:21886#comment:16 for a patch that tries but fails to fix this problem.

comment:23 Changed 2 years ago by mcs

Cc: torland added

#22195 is a duplicate.

comment:24 Changed 2 years ago by gk

While pondering #21886 I was wondering whether it would help to just get rid of the observer idea and leave the extension out of the picture: could we just show a dialog at places where we are currently using the observer mechanism?

comment:25 in reply to:  24 Changed 2 years ago by mcs

Replying to gk:

While pondering #21886 I was wondering whether it would help to just get rid of the observer idea and leave the extension out of the picture: could we just show a dialog at places where we are currently using the observer mechanism?

We could try that, but I think the main problem is that a modal dialog is displayed (as a cypherpunk mentioned somewhere). With e10s enabled, it seems that while a modal dialog is open events related to the IPC mechanism are not handled correctly. This causes the chrome process to kill the content process (to the user, this appears as a tab crash). Therefore I think we will still have a problem if we move the dialog into the browser/C++ code.

Mozilla does show dialogs during a download, but they are not modal and they do not block the network activity (so, for example, the download proceeds while the user is answering the "What would you like to do with this file? open/save" prompt. I assume we want to stop the network activity as in Tor Browser 6.5.2.

The most promising approach seems to be to suspend and then resume the request, but when e10s is enabled that does not work reliably when we try it from any of the obvious locations, e.g., not from DoContent() inside uriloader/exthandler/nsExternalHelperAppService.cpp, not from OnStartRequest(), not inside the first call to OnDataAvailable().

comment:26 Changed 2 years ago by mcs

Another small tidbit of information: this bug occurs even when NoScript is disabled (which is not true for #21886).

comment:27 Changed 2 years ago by gk

As time is running short we might want to get creative and think about ways to solve this bug by working around this bug.

One option proposed is to get rid of our external app blocker as it is today assuming e10s enabled is more valuable. Another one is to fix #21886 properly and disable e10s until we find a good solution for this bug. A third one would be...

comment:28 in reply to:  27 Changed 2 years ago by mcs

Replying to gk:

A third one would be...

  1. Move the Tor Browser specific warning text into one of Mozilla's existing dialogs. A problem with this is that their dialogs do not pause the download, so bytes are being written to temporary files even though the user probably thinks everything is paused.
  1. Something else that is more clever.

comment:29 Changed 2 years ago by gk

Keywords: tbb-7.0-must added; tbb-7.0-must-alpha removed

We are beyond the alpha testing. Moving tickets for tbb-7.0-must.

comment:30 in reply to:  22 ; Changed 2 years ago by arthuredelstein

Replying to mcs:

see ticket:21886#comment:16 for a patch that tries but fails to fix this problem.

I cherry-picked this patch onto tor-browser-52.1.1esr-7.0-1-build1 (454a231f2b76a8857748d93f666bb93199f4b963), built tor-browser.git, and added NoScript and torbutton. The patch worked for me (with e10s active) -- I saw the confirmation prompt and was able to cancel or download files on Linux. I used gdb to check the return values of Suspend() and Resume() and they both returned NS_OK.

So I'm wondering if something might have been patched in the latest version of 52ESR? I haven't checked on Windows yet but it seems like this patch might actually be OK.

Last edited 2 years ago by arthuredelstein (previous) (diff)

comment:31 in reply to:  30 ; Changed 2 years ago by mcs

Replying to arthuredelstein:

Replying to mcs:

see ticket:21886#comment:16 for a patch that tries but fails to fix this problem.

I cherry-picked this patch onto tor-browser-52.1.1esr-7.0-1-build1 (454a231f2b76a8857748d93f666bb93199f4b963), built tor-browser.git, and added NoScript and torbutton. The patch worked for me (with e10s active) -- I saw the confirmation prompt and was able to cancel or download files on Linux. I used gdb to check the return values of Suspend() and Resume() and they both returned NS_OK.

Interesting. Did you test with an optimized build or a debug build?

I am creating a debug build now with the patch applied to the tip of tor-browser-52.1.1esr-7.0-1 (7bbd69956bd1a30051599a1dd4bf615bc40e55e8, aka tor-browser-52.1.1esr-7.0-1-build2). I want to be sure to test with the #22254 fix in place. But you must have fixed that manually?

comment:32 Changed 2 years ago by mcs

Update: The fix works for Kathy and me on Linux as well, unless the file being downloaded is so small that the network traffic is finished before we try to call Suspend() (Kathy and I happen to have a tiny 344 byte zip file that we were testing with among other larger files). We are now trying to modify the patch so that it only suspends if necessary.

Any takers for testing on Windows, or should we simply wait for a nightly build? Kathy and I don't have a good standlone (non-gitian) Windows build setup, or at least we have not tried to build an ESR52-based browser there. But we should be able to do a gitian-based build if necessary.

comment:33 in reply to:  31 Changed 2 years ago by arthuredelstein

Replying to mcs:

I cherry-picked this patch onto tor-browser-52.1.1esr-7.0-1-build1 (454a231f2b76a8857748d93f666bb93199f4b963), built tor-browser.git, and added NoScript and torbutton. The patch worked for me (with e10s active) -- I saw the confirmation prompt and was able to cancel or download files on Linux. I used gdb to check the return values of Suspend() and Resume() and they both returned NS_OK.

Interesting. Did you test with an optimized build or a debug build?

This was a debug build. I can try an optimized build, but it sounds like you have maybe tracked down the issue in comment:32.

I am creating a debug build now with the patch applied to the tip of tor-browser-52.1.1esr-7.0-1 (7bbd69956bd1a30051599a1dd4bf615bc40e55e8, aka tor-browser-52.1.1esr-7.0-1-build2). I want to be sure to test with the #22254 fix in place. But you must have fixed that manually?

Actually, I didn't. I don't know offhand why I didn't run into that problem. But I was able to manually install torbutton and NoScript. I didn't do a full rbm build though.

comment:34 in reply to:  32 Changed 2 years ago by arthuredelstein

Replying to mcs:

Update: The fix works for Kathy and me on Linux as well, unless the file being downloaded is so small that the network traffic is finished before we try to call Suspend() (Kathy and I happen to have a tiny 344 byte zip file that we were testing with among other larger files). We are now trying to modify the patch so that it only suspends if necessary.

Any takers for testing on Windows, or should we simply wait for a nightly build? Kathy and I don't have a good standlone (non-gitian) Windows build setup, or at least we have not tried to build an ESR52-based browser there. But we should be able to do a gitian-based build if necessary.

I can run a test on Windows today -- let me know when you have the modified patch.

comment:35 Changed 2 years ago by mcs

Keywords: TorBrowserTeam201705R added; TorBrowserTeam201705 removed
Status: assignedneeds_review

Determining when to call Suspend() was a little more difficult than we expected, but here is the new patch:
https://gitweb.torproject.org/user/brade/tor-browser.git/commit/?h=bug21766-02&id=ead6d27079482708ddd3a1c748effe408a817a97

Kathy and I tested it on Linux64 only. Based on previous experience, I expect it to work fine on OSX (we are out of time for today to test it there, unfortunately). Windows is the big unknown, but I am optimistic!

comment:36 Changed 2 years ago by mcs

This patch seems to fix both this ticket and #21886 on OSX as well. One caveat is that sometimes after starting to download a very small file and clicking "Cancel" on the Torbutton-provided warning prompt, subsequent attempts to download the very small file hang (the page stays in the loading state). That could be a different problem though, so for now Kathy and I are going to wait for feedback about Windows and for a code review.

comment:37 Changed 2 years ago by arthuredelstein

I built and manually tested on Windows (with both large files and very small files) and it works! I wasn't able to reproduce the problem seen in OSX in comment:36.

In the patch, I notice there are a few calls in nsExternalAppHandler::OnStartRequest where you get a return value, assign it to rv and ignore it. I wonder what should be done if Suspend() or ShouldCancel() or Resume() returns an error code? Or potentially the Unused << Function(); construct could be used instead:
https://dxr.mozilla.org/mozilla-esr52/source/mfbt/Unused.h

Otherwise the patch looks good to me.

comment:38 in reply to:  37 ; Changed 2 years ago by mcs

Replying to arthuredelstein:

I built and manually tested on Windows (with both large files and very small files) and it works! I wasn't able to reproduce the problem seen in OSX in comment:36.

Thanks a lot for testing on Windows.

In the patch, I notice there are a few calls in nsExternalAppHandler::OnStartRequest where you get a return value, assign it to rv and ignore it. I wonder what should be done if Suspend() or ShouldCancel() or Resume() returns an error code? Or potentially the Unused << Function(); construct could be used instead:
https://dxr.mozilla.org/mozilla-esr52/source/mfbt/Unused.h

Kathy and I intended to ignore failures when calling Suspend() and Resume(). We had the rv = there to make debugging easier, so we will try using Unused <<.

I think we should add restore the NS_ENSURE_SUCCESS(rv, rv); after the call to shouldCance() though; that was removed by mistake.

Georg, do you have any comments on the patch?

comment:39 in reply to:  38 ; Changed 2 years ago by gk

Replying to mcs:

Kathy and I intended to ignore failures when calling Suspend() and Resume(). We had the rv = there to make debugging easier, so we will try using Unused <<.

I think we should add restore the NS_ENSURE_SUCCESS(rv, rv); after the call to shouldCance() though; that was removed by mistake.

Both sounds good to me. Do we understand why your previous patch (bug21766-01) was not enough?

comment:40 in reply to:  39 Changed 2 years ago by mcs

Replying to gk:

Both sounds good to me. Do we understand why your previous patch (bug21766-01) was not enough?

Because the first test I ran on Linux with that patch was an attempt to download a very small file (344 bytes). Basically, comment:32.

We will create a revised patch that fixes the issues raised in comment:37.

comment:42 Changed 2 years ago by gk

Status: needs_reviewneeds_information

Another thought: Do we care about FTP downloads? It seems your patch does not address those but they are affected as well (just tested with ftp://ftp.gnu.org/gnu/gcc/). Leaving this ticket open for that question for now. We can solve this in a fixup commit if needed.

I applied the latest patch to tor-browser-52.1.1esr-7.0-1 (commit 4a4285cbd06a825e5f2277d12a2e5165eab59837) and tor-browser-52.1.0esr-7.0-2 (commit 1d1fdea85218586354a1294cb94026279e74a0e3).

comment:43 in reply to:  42 ; Changed 2 years ago by mcs

Replying to gk:

Another thought: Do we care about FTP downloads? It seems your patch does not address those but they are affected as well (just tested with ftp://ftp.gnu.org/gnu/gcc/). Leaving this ticket open for that question for now. We can solve this in a fixup commit if needed.

I think we should care about FTP, at least a little. So far I cannot connect to ftp://ftp.gnu.org/gnu/gcc/ over Tor (I consistently get a "425 Bad IP Connecting" error). Kathy and I did test the patch last week with a large FTP download (ftp://speedtest.tele2.net/50MB.zip). Aside from sometimes getting a "Bad IP" error with that server as well, it worked okay. But I just tried with a small file (ftp://speedtest.tele2.net/1KB.zip) and got an assertion failure while running a debug build on OSX:

Assertion failure: ((bool)(__builtin_expect(!!(!NS_FAILED_impl(rv)), 1))) || rv == NS_ERROR_NOT_AVAILABLE, at /.../netwerk/protocol/ftp/FTPChannelParent.cpp:646
#01: mozilla::net::FTPChannelParent::SuspendForDiversion()[/Users/.../Desktop/tb.app/Contents/MacOS/XUL +0x80503e]
...

We will need to debug this. Georg, what behavior do you see?

comment:44 in reply to:  43 Changed 2 years ago by gk

Replying to mcs:

Replying to gk:

Another thought: Do we care about FTP downloads? It seems your patch does not address those but they are affected as well (just tested with ftp://ftp.gnu.org/gnu/gcc/). Leaving this ticket open for that question for now. We can solve this in a fixup commit if needed.

I think we should care about FTP, at least a little. So far I cannot connect to ftp://ftp.gnu.org/gnu/gcc/ over Tor (I consistently get a "425 Bad IP Connecting" error). Kathy and I did test the patch last week with a large FTP download (ftp://speedtest.tele2.net/50MB.zip). Aside from sometimes getting a "Bad IP" error with that server as well, it worked okay. But I just tried with a small file (ftp://speedtest.tele2.net/1KB.zip) and got an assertion failure while running a debug build on OSX:

Assertion failure: ((bool)(__builtin_expect(!!(!NS_FAILED_impl(rv)), 1))) || rv == NS_ERROR_NOT_AVAILABLE, at /.../netwerk/protocol/ftp/FTPChannelParent.cpp:646
#01: mozilla::net::FTPChannelParent::SuspendForDiversion()[/Users/.../Desktop/tb.app/Contents/MacOS/XUL +0x80503e]
...

We will need to debug this. Georg, what behavior do you see?

I had not tested a build with your patch applied. I just encountered a crash while testing a 7.0a4 bundle downloading binutils via FTP and realized that your code is not touching anything in netwerk/protocol/ftp while it does in a bunch of other protocols. So, it's more a hunch/question I had. :)

comment:45 Changed 2 years ago by gk

I thought about testing #17933 for #20685 and I realized that the former is affected by this bug as well: while your patch is working for me when downloading the *.dmg mentioned in the description I can't download PDF files by clicking on the download button in the PDF viewer. While the tab is not crashing it shows the same symptoms as #21886 (but e10s is enabled).

comment:46 Changed 2 years ago by gk

Keywords: TorBrowserTeam201706 added; TorBrowserTeam201705R removed
Status: needs_informationneeds_revision

comment:47 Changed 2 years ago by gk

FWIW: the patch in comment:41 made it into tor-browser-52.1.1esr-7.0-1 as commit 4a4285cbd06a825e5f2277d12a2e5165eab59837 and into tor-browser-52.1.0est-7.0-2 as commit 1d1fdea85218586354a1294cb94026279e74a0e3. The items mentioned in comment:45 could be separate patches.

comment:48 Changed 2 years ago by mcs

We have not found a fix for the pdfjs download problem, but I think the reason our earlier fix is ineffective is because a blobURI is used for the download, and therefore the channel associated with the download cannot be suspended (suspend and resume are only implemented for some channel types). So in this case the suspend/resume solution won't work.

comment:49 in reply to:  48 Changed 2 years ago by gk

Resolution: fixed
Status: needs_revisionclosed

Replying to mcs:

We have not found a fix for the pdfjs download problem, but I think the reason our earlier fix is ineffective is because a blobURI is used for the download, and therefore the channel associated with the download cannot be suspended (suspend and resume are only implemented for some channel types). So in this case the suspend/resume solution won't work.

Okay, this makes sense. Let's close this bug and #21886 and file remaining issues on new tickets.

comment:50 Changed 2 years ago by gk

Those tickets are #22471 and #22472.

Note: See TracTickets for help on using tickets.