#9531 closed defect (fixed)
More Torbutton hangs on New Identity control port access
Reported by: | mikeperry | Owned by: | gk |
---|---|---|---|
Priority: | High | Milestone: | |
Component: | Applications/Tor Browser | Version: | |
Severity: | Keywords: | tbb-usability, tbb-newnym, TorBrowserTeam201408, tbb-torbutton, MikePerry201408R | |
Cc: | gk, mcs, brade, rex_4539 | Actual Points: | |
Parent ID: | Points: | ||
Reviewer: | Sponsor: |
Description (last modified by )
skruffy is reporting hangs on Torbutton's New Identity. Oddly, the hangs seem to be more common if you are loading a bunch of websites at the time of hitting New Identity, but that is not a full repro case.
The hang actually appears to be happening while attempting to open a control port connection. Perhaps another race introduced by our SOCKS optimistic data hack in nsITransport? Or perhaps our hack for #8642 is insufficient in some cases, and/or during heavy network activity.
Child Tickets
Change History (48)
comment:1 Changed 6 years ago by
Description: | modified (diff) |
---|
comment:2 Changed 6 years ago by
comment:3 Changed 6 years ago by
Modified torbutton_do_new_identity() so it calls only torbutton_send_ctrl_cmd() and nothing else. Hangs still happens in the same place.
comment:4 Changed 6 years ago by
Socket is never alive during repro case that leads to hangs. Tested with modified torbutton_send_ctrl_cmd() so it waited alive socket after opening stream, like so:
var input = socket.openInputStream(3, 1, 1); // 3 == OPEN_BLOCKING|OPEN_UNBUFFERED var output = socket.openOutputStream(3, 1, 1); // 3 == OPEN_BLOCKING|OPEN_UNBUFFERED while (!socket.isAlive()) {} torbutton_log(4, "Alive!");
If hangs happens then it never reaches "Alive!" and browser ask about unresponsive script later.
comment:5 Changed 6 years ago by
I got reproducable hang using modified torbutton_send_ctrl_cmd(), on startup without any pages loaded.
var socketTransportService = Components.classes["@mozilla.org/network/socket-transport-service;1"] .getService(Components.interfaces.nsISocketTransportService); var sockets = new Array(); var max=540; // 550 hangs, 545 bypasses for(var i = 0; i < max; i++) { var s = socketTransportService.createTransport(null, 0, m_tb_control_host, m_tb_control_port, null); var input = s.openInputStream(3, 1, 1); // 3 == OPEN_BLOCKING|OPEN_UNBUFFERED var output = s.openOutputStream(3, 1, 1); // 3 == OPEN_BLOCKING|OPEN_UNBUFFERED sockets.push(s); } for(var i = 0; i < max-1; i++) { sockets[i].close; } var socket = sockets[max-2]; torbutton_log(4, "Browser will hang now!");
Those number for tested box, it's depends platform in some way.
I was unable to find any correlation with local limits like max descriptors or something. Probably missed something.
comment:6 Changed 6 years ago by
var socket = sockets[max-2];
Should be var socket = sockets[max-1];
comment:9 Changed 6 years ago by
Cc: | g.koppen@… added |
---|
comment:10 Changed 6 years ago by
Status: | new → needs_review |
---|
This code helps to avoid conditions that leads to hang if no connection able to create.
--- torbutton.js.original +++ torbutton.js @@ -1385,6 +1385,20 @@ } } +function torbutton_process_even_queue() { + // Get the current thread. + var thread = Components.classes["@mozilla.org/thread-manager;1"]. + getService(Components.interfaces.nsIThreadManager).currentThread; +/* +Warning: Calling nsIThread.processNextEvent allows network +and UI events to run which can modify data structures that +your code isn't expecting to be modified during a synchronous +method call. This can cause random crashes and other bugs that +may be hard to find and fix. Consider use asynchronous callbacks instead. +*/ + while (thread.processNextEvent(false)) {} +} + /* The "New Identity" implementation does the following: * 1. Disables Javascript and plugins on all tabs * 2. Clears state: @@ -1580,7 +1594,8 @@ var prefService = Components.classes["@mozilla.org/preferences-service;1"] .getService(Components.interfaces.nsIPrefService); prefService.savePrefFile(null); - + + torbutton_process_even_queue(); torbutton_log(3, "New Identity: Sending NEWNYM"); // We only support TBB for newnym.
comment:12 follow-up: 13 Changed 6 years ago by
Why do you think that warning can be ignored in the current case?
comment:13 Changed 6 years ago by
Replying to gk:
Why do you think that warning can be ignored in the current case?
Do you think that warning can't be ignored?
No any modified data should prevent: access control port, open new window, close current window. Before "Sending NEWNYM" it actually assumed everything finished: tabs closed, conns destroyed, prefs synced, etc. With processing all events in queue you can guarantee that.
comment:14 Changed 6 years ago by
The only thing is UI. User can open new tab, or press "NEWNYM" yet while it process queue? or something evil yet.
Ok lets fix browser code instead, it's bug anyway. Or migrate away.
comment:15 Changed 6 years ago by
Status: | needs_review → new |
---|
comment:16 Changed 6 years ago by
https://bugzilla.mozilla.org/show_bug.cgi?id=698882
Is it reason of hangs?
comment:17 Changed 6 years ago by
Here comes some gdb output as a starting point:
#4 0xb628579a in mozilla::ReentrantMonitor::Wait (this=0x9b2814e0, interval=4294967295) at /home/ubuntu/build/tor-browser/obj-i686-pc-linux-gnu/xpcom/build/BlockingResourceBase.cpp:313 #5 0xb4a8f352 in mozilla::ReentrantMonitorAutoEnter::Wait (this=0xbfffb7ac, interval=4294967295) at ../../dist/include/mozilla/ReentrantMonitor.h:192 #6 0xb62caa37 in nsPipeOutputStream::Wait (this=0x9b2814bc) at /home/ubuntu/build/tor-browser/xpcom/io/nsPipe3.cpp:980 #7 0xb62cae45 in nsPipeOutputStream::WriteSegments (this=0x9b2814bc, reader=0xb62cafea <nsReadFromRawBuffer(nsIOutputStream*, void*, char*, uint32_t, uint32_t, uint32_t*)>, closure=0x9b5e0be0, count=46, writeCount=0xbfffb898) at /home/ubuntu/build/tor-browser/xpcom/io/nsPipe3.cpp:1090 #8 0xb62cb077 in nsPipeOutputStream::Write (this=0x9b2814bc, fromBuf=0x9b5e0be0 "AUTHENTICATE 422f587d45483e5b737b2f7863234042\r\n", bufLen=47, writeCount=0xbfffb898) at /home/ubuntu/build/tor-browser/xpcom/io/nsPipe3.cpp:1150 #9 0xb62bc86d in nsBinaryOutputStream::Write (this=0x9b6d1980, aBuf=0x9b5e0be0 "AUTHENTICATE 422f587d45483e5b737b2f7863234042\r\n", aCount=47, aActualBytes=0xbfffb898) at /home/ubuntu/build/tor-browser/xpcom/io/nsBinaryStream.cpp:51 #10 0xb62bd0e7 in nsBinaryOutputStream::WriteBytes (this=0x9b6d1980, aString=0x9b5e0be0 "AUTHENTICATE 422f587d45483e5b737b2f7863234042\r\n", aLength=47) at /home/ubuntu/build/tor-browser/xpcom/io/nsBinaryStream.cpp:216
It is pretty reliably reproducible in my debug build if I open a bunch of boomkarks (say 5) simultaneously and choose the "New Identity" option immediately afterwards.
comment:18 Changed 6 years ago by
Keywords: | tbb-newnym added |
---|
comment:19 Changed 6 years ago by
Cc: | gk mcs brade added; g.koppen@… removed |
---|
comment:21 Changed 6 years ago by
Another user reported similar symptoms (beginning with 3.5.2): https://blog.torproject.org/blog/tor-browser-3521-released#comment-47522
comment:22 Changed 6 years ago by
Okay, I've filed https://bugzilla.mozilla.org/show_bug.cgi?id=978690 to get things started. Let's see what Mozilla folks are saying.
comment:23 Changed 6 years ago by
TL;DR: We can't use blocking sockets the way we are doing that right now. They are responsible for the hangs.
If we only have non-TLS connections everything is fine. The picture changes as soon as TLS connections are processed while we click on New Identity or start the browser. Then the following may happen (this is the case when we start-up; hitting New Identity gives a slightly different stacktrace which boils down to the same problem, though):
#1 0xb772cc5b in pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/i386/i686/../i486/pthread_cond_wait.S:187 #2 0xb71ec899 in PR_WaitCondVar (cvar=0xa1a07d00, timeout=4294967295) at /home/ubuntu/build/tor-browser/nsprpub/pr/src/pthreads/ptsynch.c:385 #3 0xb5e53402 in Wait (this=<optimized out>, interval=<optimized out>) at ../../../../dist/include/mozilla/CondVar.h:71 #4 Wait (interval=<optimized out>, this=<optimized out>) at ../../../../dist/include/mozilla/Monitor.h:47 #5 Wait (interval=<optimized out>, this=<optimized out>) at ../../../../dist/include/mozilla/Monitor.h:102 #6 mozilla::psm::SyncRunnableBase::DispatchToMainThreadAndWait ( this=0xa1a2f6e0) at /home/ubuntu/build/tor-browser/security/manager/ssl/src/PSMRunnable.cpp:25 #7 0xb5e45b6d in nsNSSSocketInfo::GetPreviousCert (this=0xab3c8580, _result=0xb28feef4) at /home/ubuntu/build/tor-browser/security/manager/ssl/src/nsNSSIOLayer.cpp:427 #8 0xb5e31670 in HandshakeCallback (fd=0xab318280, client_data=0xab3c8580) at /home/ubuntu/build/tor-browser/security/manager/ssl/src/nsNSSCallbacks.cpp:901 #9 0xb709f871 in ssl_FinishHandshake (ss=0xfffffe00) at sslsecur.c:141 #10 0xb708cd65 in ssl3_FinishHandshake (ss=0xaa08a000) at ssl3con.c:10637 #11 0xb708e609 in ssl3_AuthCertificateComplete (ss=0xaa08a000, error=0) at ssl3con.c:10040 #12 0xb709faec in SSL_AuthCertificateComplete (fd=0xaa0760e0, error=0) at sslsecur.c:1566 #13 0xb5e486ea in nsNSSSocketInfo::SetCertVerificationResult (this=0xab3c8580, errorCode=0, errorMessageType=mozilla::psm::PlainErrorMessage) at /home/ubuntu/build/tor-browser/security/manager/ssl/src/nsNSSIOLayer.cpp:456 #14 0xb5e53ecf in mozilla::psm::(anonymous namespace)::SSLServerCertVerificationResult::Run (this=0xa1a2f6a0) at /home/ubuntu/build/tor-browser/security/manager/ssl/src/SSLServerCertVerification.cpp:1357 #15 0xb6366498 in nsThread::ProcessNextEvent (this=0xb2907a60, mayWait=true, result=0xb28ff1ef) at /home/ubuntu/build/tor-browser/xpcom/threads/nsThread.cpp:626 #16 0xb63307a7 in NS_ProcessNextEvent (thread=0xfffffe00, mayWait=true) at /home/ubuntu/build/tor-browser/obj-i686-pc-linux-gnu/xpcom/build/nsThreadUtils.cpp:238 #17 0xb543f43e in nsSocketTransportService::Run (this=0xb29079a0) at /home/ubuntu/build/tor-browser/netwerk/base/src/nsSocketTransportService2.cpp:648
A SSLServerCertVerificationResult is dispatched to the socket transport thread (see: https://mxr.mozilla.org/mozilla-esr24/source/security/manager/ssl/src/SSLServerCertVerification.cpp#88 for a reason why this has to happen). Processing this event leads to dispatching PreviousCertRunnable to the main thread. Now, if that is processed on the main thread *before* our
outputStream.writeBytes(command, command.length);
everything is fine. If not we see something like the following in the log:
-1344275648[b722a540]: nsSocketOutputStream::Write [this=9d128794 count=0] -1344275648[b722a540]: calling PR_Write [count=0] -1399850176[acb3b680]: STS dispatch [ab6821f0] -1220368640[b722a300]: STS dispatch [a9dca700] -1220368640[b722a300]: STS dispatch [aa76cce0] -1220368640[b722a300]: STS dispatch [aa76cd00] -1220368640[b722a300]: creating nsSocketTransport @9d128e00 -1220368640[b722a300]: nsSocketTransport::Init [this=9d128e00 host=127.0.0.1:9151 proxy=:0] -1220368640[b722a300]: nsSocketTransport::PostEvent [this=9d128e00 type=3 status=0 param=0] -1220368640[b722a300]: STS dispatch [aa76cde0] -1220368640[b722a300]: nsSocketTransport::OpenInputStream [this=9d128e00 flags=3] -1220368640[b722a300]: STS dispatch [aa46c2e8] -1220368640[b722a300]: nsSocketTransport::PostEvent [this=9d128e00 type=0 status=0 param=0] -1220368640[b722a300]: STS dispatch [aa76ce00] -1220368640[b722a300]: nsSocketTransport::OpenOutputStream [this=9d128e00 flags=3] -1220368640[b722a300]: STS dispatch [aa46c338] -1220368640[b722a300]: nsSocketTransport::PostEvent [this=9d128e00 type=0 status=0 param=0] -1220368640[b722a300]: STS dispatch [aa76ce40] -1220368640[b722a300]: OOO WriteSegments [this=a18ed2ac count=47] -1220368640[b722a300]: OOO appended new segment -1220368640[b722a300]: OOO rolling back write cursor 0 bytes -1220368640[b722a300]: OOO SegmentLen is 1 -1220368640[b722a300]: OOO ReadCount is 1 -1220368640[b722a300]: OOO SegmentLen is now 0 and originalLen is 1 -1220368640[b722a300]: OOO advancing write cursor by 1 -1220368640[b722a300]: OOO pipe output: waiting for space -1690313920[aa456940]: STS dispatch [aa49a3a0] -1673528512[aa456640]: STS dispatch [aa48fac0] -1681921216[aa4567c0]: STS dispatch [aa496ec0]
Calling PR_Write actually leads to dispatching the TLS stuff to the socket thread as explained above (note though that this log documents a hang on "New Identity"). While the TLS code is running on that thread Torbutton creates a new socket transport and is opening an input and an output stream which leads to dispatching things to the socket thread, too. But this one is till blocked by the TLS code, thus, the stream setup is not properly done yet. Nevertheless, execution of Torbutton code is proceeding with
outputStream.writeBytes(command, command.length);
which results in a blocking main thread with a still blocked socket thread: we have a deadlock.
comment:24 Changed 6 years ago by
blocking sockets
I tried to use non blocking for test purpose in that code months ago, without any luck. Connection just never launched. No deadlock and no connection.
You can to try too, maybe it was my wrong, and it will work.
comment:25 follow-up: 26 Changed 6 years ago by
Great work tracking this down. To some extent this is still a Mozilla problem in that users of the socket transport service need to avoid using it in certain ways.
Does the deadlock still occur if we use buffered sockets? The combination of blocking + unbuffered is unusual. I don't remember why buffering causes problems, but Mike probably does.
comment:26 Changed 6 years ago by
Replying to mcs:
Does the deadlock still occur if we use buffered sockets? The combination of blocking + unbuffered is unusual. I don't remember why buffering causes problems, but Mike probably does.
https://trac.torproject.org/projects/tor/ticket/8642#comment:1 ? It seems we get buffering for free if we have blocking sockets :) So, the comments in torbuttons.js are wrong. Well, they are not wrong but they just don't match reality.
comment:27 Changed 6 years ago by
Keywords: | MikePerry201404R added |
---|
Delaying the execution with setTimout() and friends does not help as we are still on the same thread and using workers seems not ideal either but I have not tried the latter yet. So, we need probably a smarter approach.
comment:28 follow-up: 29 Changed 6 years ago by
Keywords: | MikePerry201405R added; MikePerry201404R removed |
---|
What am I supposed to review, test, or investigate here?
comment:29 Changed 6 years ago by
Replying to mikeperry:
What am I supposed to review, test, or investigate here?
Knowing whether you tested (and dismissed) other options than using blocking sockets for communicating with tor back then when you wrote the code might be helpful. And, of course, any ideas on how to solve this bug are appreciated as well. :)
comment:30 follow-up: 32 Changed 6 years ago by
We went with blocking sockets because New Identity needed to ensure that it actually got a new circuit before allowing the user to do anything else with the browser.
I am wondering if we should try clearing the event queue as cypherpunks suggested. It may be a good idea for a few reasons. We may be able to use https://developer.mozilla.org/en-US/docs/Mozilla/Tech/XPCOM/Reference/Interface/nsIDOMWindowUtils#suppressEventHandling%28%29 to suppress UI events in the meantime, if need be.
Alternatively, we could attempt to do an async New Identity and also use the event suppressor to suppress UI event handling until the New Identity onOutputStreamReady+onInputStream callbacks come back...
comment:31 Changed 6 years ago by
Keywords: | MikePerry201405R removed |
---|
comment:32 follow-up: 37 Changed 6 years ago by
Replying to mikeperry:
We went with blocking sockets because New Identity needed to ensure that it actually got a new circuit before allowing the user to do anything else with the browser.
I am wondering if we should try clearing the event queue as cypherpunks suggested. It may be a good idea for a few reasons. We may be able to use https://developer.mozilla.org/en-US/docs/Mozilla/Tech/XPCOM/Reference/Interface/nsIDOMWindowUtils#suppressEventHandling%28%29 to suppress UI events in the meantime, if need be.
Yes, sounds reasonable. Alas, this idea is not working for the hang during start-up. There a blank Tor Launcher window is visible for a short amount of time until the usual messages appear giving a weird user experience. So I wonder whether we should separate the two problems and treat them differently: fixing the hang on New Identity with the processNextEvent/suppressEventHandling idea and trying the other one for the hang during start-up.
comment:34 Changed 5 years ago by
Keywords: | TorBrowserTeam201406 added |
---|
comment:35 Changed 5 years ago by
Keywords: | TorBrowserTeam201407 added; TorBrowserTeam201406 removed |
---|
comment:36 Changed 5 years ago by
Component: | TorBrowserButton → Tor Browser |
---|---|
Keywords: | tbb-torbutton added |
Owner: | changed from mikeperry to tbb-team |
comment:37 Changed 5 years ago by
Keywords: | MikePerry201407R added |
---|---|
Status: | new → needs_review |
Replying to gk:
So I wonder whether we should separate the two problems and treat them differently: fixing the hang on New Identity with the processNextEvent/suppressEventHandling idea and trying the other one for the hang during start-up.
I convinced myself that this might be a good approach and have a fix for this bug in my bug_9531_v2 branch in my public Torbutton repo.
comment:38 Changed 5 years ago by
Keywords: | TorBrowserTeam201408 MikePerry201408R added; TorBrowserTeam201407 MikePerry201407R removed |
---|
comment:39 Changed 5 years ago by
Owner: | changed from tbb-team to gk |
---|---|
Status: | needs_review → assigned |
comment:40 Changed 5 years ago by
Status: | assigned → needs_review |
---|
Trac cannot change assignment without resetting ticket state...
comment:41 Changed 5 years ago by
Resolution: | → fixed |
---|---|
Status: | needs_review → closed |
Ok, this is merged for Torbutton 1.6.12.0, which we'll put in 4.0-alpha-1. TBB 3.6 will stay on Torbutton 1.6.11.1.
I am going to tempt fate and call this fixed.
comment:42 follow-up: 45 Changed 5 years ago by
Then the following may happen (this is the case when we start-up; hitting New Identity gives a slightly different stacktrace which boils down to the same problem, though)
If not we see something like the following in the log
Was it log for start-up case or for hitting New Identity?
comment:43 follow-up: 46 Changed 5 years ago by
There a blank Tor Launcher window is visible for a short amount of time until the usual messages appear giving a weird user experience.
Blank windows with your patch?
comment:44 Changed 5 years ago by
I think it's better to place this patch to torbutton_send_ctrl_cmd
. Which is called yet by torbutton_local_tor_check
on start-up, it's better to exclude any possible hangs could be triggered by Torbutton, and investigate #10804 from scratch.
comment:45 Changed 5 years ago by
Replying to cypherpunks:
Then the following may happen (this is the case when we start-up; hitting New Identity gives a slightly different stacktrace which boils down to the same problem, though)
If not we see something like the following in the log
Was it log for start-up case or for hitting New Identity?
start-up case. The issue when hitting New Identity shows basically the same log with minor differences. I'd say it is definitely the same underlying bug we hit here.
comment:46 Changed 5 years ago by
Replying to cypherpunks:
There a blank Tor Launcher window is visible for a short amount of time until the usual messages appear giving a weird user experience.
Blank windows with your patch?
If I apply the patch idea to the Tor Launcher related issue, yes.
comment:47 Changed 5 years ago by
There a blank Tor Launcher window is visible for a short amount of time until the usual messages appear giving a weird user experience.
Tor Launcher could to finish every stuff for progress document before starting socket transport operations.
This specially crafted workaround reducing negative user experience.
diff --git a/src/components/tl-process.js b/src/components/tl-process.js index cc5d654..31d4e45 100644 --- a/src/components/tl-process.js +++ b/src/components/tl-process.js @@ -91,6 +91,7 @@ TorProcessService.prototype = if ("profile-after-change" == aTopic) { this.mObsSvc.addObserver(this, "quit-application-granted", false); + this.mObsSvc.addObserver(this, "document-shown", false); this.mObsSvc.addObserver(this, kOpenNetworkSettingsTopic, false); this.mObsSvc.addObserver(this, kUserQuitTopic, false); this.mObsSvc.addObserver(this, kBootstrapStatusTopic, false); @@ -169,6 +170,17 @@ TorProcessService.prototype = } } } + else if ("document-shown" == aTopic) + { + if (aSubject.getElementById("progressMeter")) + ++this.mDocumentProgressShown; + else + this.mDocumentProgressShown = 0; + if (!this.mDocumentProgressShown || this.mDocumentProgressShown == 3) { + this._monitorTorProcessStartup(); + this.mObsSvc.removeObserver(this, "document-shown"); + } + } else if ("timer-callback" == aTopic) { if (aSubject == this.mControlConnTimer) @@ -296,6 +308,7 @@ TorProcessService.prototype = mRestartWithQuit: false, mLastTorWarningPhase: null, mLastTorWarningReason: null, + mDocumentProgressShown: 0, // Private Methods ///////////////////////////////////////////////////////// @@ -426,7 +439,6 @@ TorProcessService.prototype = { try { - this._monitorTorProcessStartup(); var bridgeConfigIsBad = (this._defaultBridgesStatus == this.kDefaultBridgesStatus_BadConfig);
With more knowledge it could be less kludgy fix.
comment:48 Changed 5 years ago by
Torbutton 1.6.12.3 should be free of such hangs, it draining queue for every attempts to access Tor's control port . Next patches and research of Tor Launcher's hangs going to #10804.
Hang happens exactly after call of "outputStream.writeBytes(auth_cmd, auth_cmd.length)".
Tcpdump shows no one packet, no syn, no connection established.