So far I haven't been able to isolate this problem, but basically:
Use Tor Browser 5.0.2 (current at time of writing), GNU/Linux,
64-bit.
Stock, no additional extensions or addons of any sort; security
slider set to high; very strict/restrictive NoScript configuration;
some other minor preference tweaks.
Have Tor Browser running for a relatively long period, say days.
"Eventually," the circuit visualization is no longer there.
The TorButton menu looks and acts normally except for the fact that
the right half of it no longer shows the circuit graph, the whole
menu is now made only of the 5 buttons (from "New identity" to "Check
for Tor Browser update").
This is the case for any web page, regardless of domain, scheme, etc.
From this point on, the only thing that seems to restore the circuit \
visualization functionality is restarting the browser. I tried "New \
identity" and "New Tor circuit for this site" with no change.
I'll try to follow up with a comment when/if I manage to identify more
precisely "when/how" the issue manifests itself.
BTW, I'm pretty sure I read a comment on the blog about this (may have
been months ago), though I cannot find it now.
To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information
Child items ...
Show closed items
Linked items 0
Link issues together to show that they're related.
Learn more.
I've seen this, too, exactly once while debugging #15493 (moved) (and it is mentioned in that ticket). While I have a debug log it did not go deep enough to find the culprit. :(
I think I've tracked that one down. The bug is in info.stringToValue() in tor-control-port.js. More exactly
string.match(/250[ -].+?=(.*?)$/mi)
is wrong. There are (or could be) exit relays out there whose nickname ends with "250", e.g. https://atlas.torproject.org/#details/B486925DC901969CCE2B371E93740CF98C30539D. When this is happening and someone is picking this relay as an exit relay the above mentioned code kicks in (although it is not supposed to do that) and something like
gets passed to utils.listMapData() (via info.circuitStatusParser()) resulting in circID being NEED_CAPACITY. This in turn lets getCircuitStatusByID() return null leading to the missing circuit display.
r=mcs
I also looked for other similar errors in the Torbutton and Tor Launcher code, but I did not find any. Very good debugging work to find this problem!
Why the non-greedy globbing (the "?" modifier added to "+" or "*") when you are matching all the rest of the line anyway?
Compare:
520 - let matchResult = string.match(/^250[ -].+?=(.*?)$/mi) ||520 + let matchResult = string.match(/^250[ -].+?=(.*)$/mi) ||
You do this in several other places in this file. Just wondering, I don't think it makes a difference.
Also, based on the regex above, shouldn't the comment in line 515 be like so?
515 - // or `250 circuit-status...`515 + // or `250 circuit-status=...`
To match the rest of the comment.
Also, I do not know what the assumptions/pre-conditions are on the input string, but you do not check that the first line of the multiline case ("!^250+") ends with "=", as exemplified in the commentary for "info.keyValueStringsFromMessage".
Anyway, sorry for wasting your time, I'm not familiar with this code.
Why the non-greedy globbing (the "?" modifier added to "+" or "*") when you are matching all the rest of the line anyway?
Compare:
{{{
520 - let matchResult = string.match(/^250[ -].+?=(.?)$/mi) ||
520 + let matchResult = string.match(/^250[ -].+?=(.)$/mi) ||
}}}
You do this in several other places in this file. Just wondering, I don't think it makes a difference.
Also, based on the regex above, shouldn't the comment in line 515 be like so?
{{{
515 - // or 250 circuit-status...
515 + // or 250 circuit-status=...
}}}
To match the rest of the comment.
Also, I do not know what the assumptions/pre-conditions are on the input string, but you do not check that the first line of the multiline case ("!^250+") ends with "=", as exemplified in the commentary for "info.keyValueStringsFromMessage".
Anyway, sorry for wasting your time, I'm not familiar with this code.
Thanks for the very helpful comments. I added them to #17568 (moved).
Click on Detailed Bug Information and enter [tor] in the Whiteboard row
Click on Search and after you got your results checking the circuit display does not show any curcuit anymore
Reloading does not help either. I am not sure if this is caused by our fixup done in this bug yet or whether that is an additional issue we overlooked so far.
Click on Detailed Bug Information and enter [tor] in the Whiteboard row
Click on Search and after you got your results checking the circuit display does not show any curcuit anymore
Reloading does not help either. I am not sure if this is caused by our fixup done in this bug yet or whether that is an additional issue we overlooked so far.
For each loaded document, the tor circuit display looks up the SOCKS user name and password. Normally it calls getSOCKSCredentials(browser) which does the equivalent of:
It turns out that that the offending URL causes an nsIMultiPartChannel instead of an nsIHttpChannel to be associated with the document, so that currentDocumentChannel.QueryInterface(Ci.nsIProxiedChannel) fails.
But the main nsIHttpChannel can be found at nsIMultiPartChannel.baseChannel. So in that situation we need to use something like
Click on Detailed Bug Information and enter [tor] in the Whiteboard row
Click on Search and after you got your results checking the circuit display does not show any curcuit anymore
Reloading does not help either. I am not sure if this is caused by our fixup done in this bug yet or whether that is an additional issue we overlooked so far.
This is interesting, it is 100% reproducible for any site that serves Content-Type "multipart/x-mixed-replace", like bugzilla does for the "loading" animation.
However, note that in this case the circuit display doesn't get screwed up for the whole browser, only for that single document on that tab. But I think what I observed in 5.0.6 was the same behavior described in the OP (though now I'm doubting). Unfortunately I wouldn't know how to reproduce that.
Why getSOCKSCredentialsForBrowser returns 2 values as a single string, "user:pass" i.e. "host:nonce"?
That host can include a port which then results in "domain:port:nonce". Afterwards there's a
let domain = credentials.split(":")[0];
to get the domain, which may be deliberate because later there's
domain.endsWith(".onion")
so whoever wrote that expects not to have the ":port" part in "domain". But before there's
which results in the UI only showing the "domain" part and not the ":port" part. But the port is part of the isolation! So shouldn't the user see "domain:port" in the UI?
In any case I think it would be clearer to just return a structure not a compound string from getSOCKSCredentialsForBrowser.
Unrelated to that: the comment for setupDisplay says "Returns a function..." but that's not true, it doesn't return anything. I guess that's documentation bitrot.
Why getSOCKSCredentialsForBrowser returns 2 values as a single string, "user:pass" i.e. "host:nonce"?
That host can include a port which then results in "domain:port:nonce". Afterwards there's a
While I don't think the host here will include a port, I agree it's safer not to use a colon. So I have changed it to a "|" character.
In any case I think it would be clearer to just return a structure not a compound string from getSOCKSCredentialsForBrowser.
Agreed.
Unrelated to that: the comment for setupDisplay says "Returns a function..." but that's not true, it doesn't return anything. I guess that's documentation bitrot.
Yes, fixing that.
Here's an additional "code cleanup" patch with these changes, on the same branch:
While I don't think the host here will include a port, [...]
You are right, sorry about the stupid rant.
PS: I now tried it and of course it doesn't include the port. Why was I so sure? I read the C++ definition of getFirstPartyHostForIsolation and saw GetHost being called, this accessor seems to be automatically generated by Mozilla's C++-JS magic glue. So I read the documentation for "host" here: https://developer.mozilla.org/en-US/docs/Mozilla/Tech/XPCOM/Reference/Interface/nsIURI. Now, I usually prefer to browse with CSS disabled, especially when reading. If you disable CSS in that page you loose the table borders and so I ended up reading the description for "hostPort" instead. -_-"
ln5 mentions that just using "New Tor Circuit for this Site" is working fine for him to reproduce the problem in the description. I wonder whether we overlooked something in our fix, hrm...
ln5 mentions that just using "New Tor Circuit for this Site" is working fine for him to reproduce the problem in the description. I wonder whether we overlooked something in our fix, hrm...
I tried this several times but I couldn't reproduce it. ln5, what platform and TBB version are you using? Any other clues?
Trac: Cc: gk, arthuredelstein to gk, arthuredelstein, ln5
This is interesting, it is 100% reproducible for any site that serves Content-Type "multipart/x-mixed-replace", like bugzilla does for the "loading" animation.
However, note that in this case the circuit display doesn't get screwed up for the whole browser, only for that single document on that tab. But I think what I observed in 5.0.6 was the same behavior described in the OP (though now I'm doubting). Unfortunately I wouldn't know how to reproduce that.
Alright, no more doubts: Right now on 5.0.7 the circuit display is borked exactly like described in the OP: For all tabs, for any document, "New identity" doesn't help, neither does "New circuit for this site".
Edit: I forgot to say, again I have no idea how it happened.
So the same problem exists and the one related to "multipart/x-mixed-replace" discovered by gk is a different one.
I will leave this browser like this in case you want me to try some live experiments. Note, though, that this is not a debug build nor anything like that. It's a normal 5.0.7 release version which (I think) has been auto-updated since 5.0.4 (or maybe even earlier).
I will leave this browser like this in case you want me to try some live experiments. Note, though, that this is not a debug build nor anything like that. It's a normal 5.0.7 release version which (I think) has been auto-updated since 5.0.4 (or maybe even earlier).
Could you set extensions.torbutton.loglevel to 3 and open a simple site like https://people.torproject.org/~gk/ and send me the output of the browser console (Ctrl + Shift + J)? My mail address is gk [at] torproject.org. I fear, though, that we need much more detailed debug logs in Torbutton. Btw. what is the Torbutton version you have in this 5.0.7 instance (about:addons should show it).
Just tried to reproduce this bug on 5.5a6 (High) to log usual disappearing, but caught rare error (Secure connection failed), which was hard to classify (for making separate ticket).
Log:
[01-17 20:51:21] Torbutton INFO: tor SOCKS: https://bugzilla.mozilla.org/buglist.cgi?a via mozilla.org:0
getFirstPartyURI failed for chrome://browser/content/browser.xul: 0x80070057
[01-17 20:51:21] Torbutton INFO: controlPort >> 650 STREAM 27 NEW 0 bugzilla.mozilla.org:443 SOURCE_ADDR=127.0.0.1:53786 PURPOSE=USER
[01-17 20:51:21] Torbutton INFO: controlPort >> 650 STREAM 27 SENTCONNECT 8 bugzilla.mozilla.org:443
getFirstPartyURI failed for chrome://browser/content/browser.xul: 0x80070057
[01-17 20:51:25] Torbutton INFO: controlPort >> 650 STREAM 27 REMAP 8 63.245.215.80:443 SOURCE=EXIT
[01-17 20:51:25] Torbutton INFO: controlPort >> 650 STREAM 27 SUCCEEDED 8 63.245.215.80:443
[01-17 20:51:25] Torbutton INFO: controlPort >> 650 STREAM 27 CLOSED 8 63.245.215.80:443 REASON=END REMOTE_REASON=DONE
NS_ERROR_NOT_AVAILABLE: Component returned failure code: 0x80040111 (NS_ERROR_NOT_AVAILABLE) [nsIMIMEService.getTypeFromExtension] external-app-blocker.js line 131 > eval:1:0
What was showed:
Secure connection failed
Send report to Mozilla and bla bla...
When tried to open source of this js:
File not found
Firefox can't find the file at view-jar:file:///C:/%REMOVED%/Tor Browser/Browser/TorBrowser/Data/Browser/profile.default/extensions/torbutton@torproject.org.xpi!/components/external-app-blocker.js line 131 > eval.
Log:
getFirstPartyURI failed for chrome://browser/content/browser.xul: 0x80070057
getFirstPartyURI failed for chrome://global/content/viewSource.xul: 0x80070057
NS_ERROR_FILE_NOT_FOUND: Component returned failure code: 0x80520012 (NS_ERROR_FILE_NOT_FOUND) [nsIWebNavigation.loadURI] viewSource.js:178:0
getFirstPartyURI failed for about:blank: 0x80070057
The same config on 5.0.7 gives the same results.
P.S. usually loading this page leads only to disappearing of visual circuit.
Log shows unexpected renewal of circuit (?) with this:
[01-17 21:48:42] Torbutton INFO: streamEvent.CircuitID: 6
I applied c3cb05997d0ac0f54f431d71ae8852d79b865756. Wrt 41ce4aa0ba211f610a7a64315d3abf95cddaf16f. Looks mostly good. I just stumbled over "and stopped when it set to false". "is set to false"? Oh, you carefully worked above and below
// Check if we have anything to show for these credentials.
without adapting the indentation of this comment. :) Could you fix that, too, while fixing the typo above?
I applied c3cb05997d0ac0f54f431d71ae8852d79b865756. Wrt 41ce4aa0ba211f610a7a64315d3abf95cddaf16f. Looks mostly good. I just stumbled over "and stopped when it set to false". "is set to false"? Oh, you carefully worked above and below
{{{
// Check if we have anything to show for these credentials.
}}}
without adapting the indentation of this comment. :) Could you fix that, too, while fixing the typo above?
I applied c3cb05997d0ac0f54f431d71ae8852d79b865756. Wrt 41ce4aa0ba211f610a7a64315d3abf95cddaf16f. Looks mostly good. I just stumbled over "and stopped when it set to false". "is set to false"? Oh, you carefully worked above and below
{{{
// Check if we have anything to show for these credentials.
}}}
without adapting the indentation of this comment. :) Could you fix that, too, while fixing the typo above?
But I'll remember to grab the log next time it happens.
So it happened again. I have a log (*) of visiting blog.tp.o while the display is borked but it doesn't look too useful to me, relevant messages (if there were any) occurred earlier.
However something is interesting: The display seems to have shut itself down. Notice that there are no control port messages in the log. ISTM as if TorButton entered one of the codepaths that lead to stop() in setupDisplay() and never start()ed again.
(*) I actually set loglevel to 0, so as to try and grab everything.
Worse news: Tor project programmers apparently can't write parsers for their own protocols.
Better news: I think I discovered at least one trigger for this bug.
I you don't want me to spoil your bug hunting, then just have this and go dig at the code:
[01-29 09:48:25] Torbutton INFO: controlPort << getinfo circuit-status[01-29 09:48:25] Torbutton INFO: controlPort >> 250+circuit-status=243 BUILT $xxxxxx~nnnn,$xxxxxx~nnnn,$xxxxxx~nnnn BUNCH=OF KEY=VALUE PAIRS=HERE249 BUILT $xxxxxx~nnnn,$xxxxxx~nnnn,$xxxxxx~nnnn BUNCH=OF KEY=VALUE PAIRS=HERE247 BUILT $xxxxxx~nnnn,$xxxxxx~nnnn,$xxxxxx~nnnn BUNCH=OF KEY=VALUE PAIRS=HERE248 BUILT $xxxxxx~nnnn,$xxxxxx~nnnn,$xxxxxx~nnnn BUNCH=OF KEY=VALUE PAIRS=HERE250 BUILT $xxxxxx~nnnn BUNCH=OF KEY=VALUE PAIRS=HERE
I found that in my logs sometime after I noticed that the display was gone. Why, yes those are 2 complete log messages, I didn't fuck up my copypaste. You are right though, the second looks incomplete. Aaaand now you see it, yes it starts and ends with 250.
This is the last message in the logs from the circuit display controller. It went silent after this.
I had some time yesterday so I looked at the code. Narrating my adventures would take too much so I'm just going to paste a diff with the notes I took while reading. I point at a few places where I think bugs exist; some of the comments are there just to keep track of the lovely chained chains of functional filigrane arthuredelstein seems so infatuated with (I understand you arthur, I like functional programming too, but man, this is javascript, you know, it's kinda disgusting), so breadcrumbs basically; also I comment on a couple of unrelated issues. All my comments are prefixed with "punk:". The diff is against Torbutton as shipped in 5.5, so 1.9.4.3.
Notice that I didn't live-debug the code, just read, so take what I say with a boulder of salt.
--- a/torbutton/modules/tor-control-port.js+++ b/torbutton/modules/tor-control-port.js@@ -113,7 +113,10 @@ io.asyncSocket = function (host, port, o let totalString = pendingWrites.join(""); try { outputStream.write(totalString, totalString.length);- log("controlPort << " + aString + "\n");+ // punk: unrelated: for Firefox's console output, log()+ // will write each message in a separate line; for stderr+ // output, log() already appends '\n'+ log("controlPort << " + aString.trim()); } catch (err) { onError(err); }@@ -162,6 +165,21 @@ io.onLineFromOnMessage = function (onMes // Add to the list of pending lines. pendingLines.push(line); // If line is the last in a message, then pass on the full multiline message.+ //+ // punk: bug: this will cut the message short; it happens when one the+ // reply lines starts with the same 3 digits as the first one; in the case+ // of a successful "circuit-status" command, this happens when one of the+ // circuits in the reply has a circuit id of 250+ //+ // punk: bug: "pendingLines" can grow indefinitely if the lines pushed into+ // it do not align with the expected pattern: think what happens if the+ // first line pushed is ".", as could happen thanks to the other bug just+ // mentioned; this effectively disables the whole control-port dispatcher+ // contraption: no more messages will be received through this controller+ //+ // -> how dangerous is this? what else is this controller used for?+ //+ // -> apparently nothing else; good if (line.match(/^\d\d\d /) && (pendingLines.length === 1 || pendingLines[0].substring(0,3) === line.substring(0,3))) {@@ -218,6 +236,21 @@ io.matchRepliesToCommands = function (as }; // Watch for responses (replies or error messages) dispatcher.addCallback(/^[245]\d\d/, function (message) {+ // punk: bug: underflow if the callback triggers before the sendCommand()+ // in the following Promise is called (and, therefore, a triple is pushed+ // into "commandQueue")+ //+ // -> wait! apparently this is not actually an error (wat?): [].shift()+ // just returns undefined+ //+ // -> but then the _assignment_ fails: throws TypeError+ //+ // -> this may be happening in the case of the circuit display fuck-up: the+ // reply message is cut short, but the rest of the message (some circuit+ // status lines and then '.' and then '250 OK') will still be delivered by+ // Firefox through the socket, will be received by the smattering of+ // chained funclets, and could then be made into a new message that will be+ // passed into this function let [command, replyCallback, errorCallback] = commandQueue.shift(); if (message.match(/^2/) && replyCallback) replyCallback(message); if (message.match(/^[45]/) && errorCallback) {@@ -273,6 +306,9 @@ io.controlSocket = function (host, port, // ## utils // A namespace for utility functions+//+// punk: unrelated: why the fuck are these here? there's even a "utils.js" over+// there... let utils = utils || {}; // __utils.identity(x)__.@@ -398,6 +434,7 @@ info.keyValueStringsFromMessage = utils. // and applies transformFunction to each line. info.applyPerLine = function (transformFunction) { return function (text) {+ // punk: text is null, null.trim() throws TypeError return utils.splitLines(text.trim()).map(transformFunction); }; };@@ -433,6 +470,7 @@ info.routerStatusParser = function (valu // __info.circuitStatusParser(line)__. // Parse the output of a circuit status line. info.circuitStatusParser = function (line) {+ // punk: unrelated: "circuit" here is called "path" in the spec let data = utils.listMapData(line, ["id","status","circuit"]), circuit = data.circuit; // Parse out the individual circuit IDs and names.@@ -505,6 +543,8 @@ info.stringToValue = function (string) { if (key === null) return null; // matchResult finds a single-line result for `250-` or `250 `, // or a multi-line one for `250+`.+ //+ // punk: won't match any of these let matchResult = string.match(/^250[ -].+?=(.*)$/) || string.match(/^250\+.+?=([\s\S]*?)^\.$/m), // Retrieve the captured group (the text of the value in the key-value pair)@@ -521,6 +561,9 @@ info.stringToValue = function (string) { // __info.getMultipleResponseValues(message)__. // Process multiple responses to a GETINFO or GETCONF request. info.getMultipleResponseValues = function (message) {+ // punk: keyValueStringsFromMessage() won't match the message, the result+ // will be an empty array, so all thre rest is a no-op, and the return value+ // is also an empty array return info.keyValueStringsFromMessage(message) .map(info.stringToValue) .filter(utils.identity);@@ -534,6 +577,12 @@ info.getInfo = function (aControlSocket, } return aControlSocket .sendCommand("getinfo " + key)+ // punk: overflow; getMultipleResponseValues returns an empty array+ //+ // -> lol, apparently an overflow is not an error in this ridiculous+ // language, the result is just undefined+ //+ // -> so then info.getInfo returns undefined .then(response => info.getMultipleResponseValues(response)[0]); };@@ -639,4 +688,3 @@ let controller = function (host, port, p // Export the controller function for external use. var EXPORTED_SYMBOLS = ["controller"];---- a/torbutton/chrome/content/tor-circuit-display.js+++ b/torbutton/chrome/content/tor-circuit-display.js@@ -113,6 +113,7 @@ let nodeDataForCircuit = function* (cont // Returns the circuit status for the circuit with the given ID. let getCircuitStatusByID = function* (aController, circuitID) { let circuitStatuses = yield aController.getInfo("circuit-status");+ // punk: this test fails: circuitStatuses is undefined if (circuitStatuses) { for (let circuitStatus of circuitStatuses) { if (circuitStatus.id === circuitID) {@@ -120,6 +121,7 @@ let getCircuitStatusByID = function* (aC } } }+ // punk: so null is returned return null; };@@ -139,16 +141,20 @@ let collectIsolationData = function (aCo if (!knownCircuitIDs[streamEvent.CircuitID]) { logger.eclog(3, "streamEvent.CircuitID: " + streamEvent.CircuitID); knownCircuitIDs[streamEvent.CircuitID] = true;+ // punk: circuitStatus is null, so credentials is null as well let circuitStatus = yield getCircuitStatusByID(aController, streamEvent.CircuitID), credentials = circuitStatus ? (trimQuotes(circuitStatus.SOCKS_USERNAME) + "|" + trimQuotes(circuitStatus.SOCKS_PASSWORD)) : null;+ // punk: so the update doesn't happen if (credentials) { let nodeData = yield nodeDataForCircuit(aController, circuitStatus); credentialsToNodeDataMap[credentials] = nodeData; updateUI(); }+ // punk: but then nothing else, no other visible breakage... shouldn't+ // you be screaming something here? } }).then(null, Cu.reportError)); };@@ -356,4 +362,3 @@ return setupDisplay; // Finish createTorCircuitDisplay() })();-
Take-away points:
The control protocol doesn't have a formal grammar (no, control-spec.txt doesn't qualify, it, at the very least, is incomplete). Result: objectively-correct parsers can't be generated.
Tor and the controller in Torbutton do not speak the same protocol. Expect more breakage.
Compartmentalising controllers is a good idea. Keep doing that.
I found that in my logs sometime after I noticed that the display was gone. Why, yes those are 2 complete log messages, I didn't fuck up my copypaste. You are right though, the second looks incomplete. Aaaand now you see it, yes it starts and ends with 250.
Thanks for catching this bug. I'm posting a fix for that specific issue here, for review:
Your "multilineCommandInProgress" is really a multi line response. Since multi line commands (the commands you are sending to Tor) are allowed in the control spec, your variable is poorly named.
Your "multilineCommandInProgress" is really a multi line response. Since multi line commands (the commands you are sending to Tor) are allowed in the control spec, your variable is poorly named.
Maybe you actually want it like that, but then you should notice that this behaves differently than the complementary ">>" log. You removed the suggested .trim() but this is basically what the ">>" log does, in effect. It actually splits the message in lines, only to join them together again, but the result is the same: the logged message does not contain the trailing "\r\n". Here you do include the trailing "\r\n". The result is that in stderr you will see an extra line feed (so 2; currently without this fix there are 3!).
So I've modified the log function to remove trailing whitespace as well as converting "\r\n" -> "\n". This hopefully removes all the extraneous extra lines we see when we copy/paste the text.
+ // punk: bug: underflow if the callback triggers before the sendCommand()+ // in the following Promise is called (and, therefore, a triple is pushed+ // into "commandQueue")+ //+ // -> wait! apparently this is not actually an error (wat?): [].shift()+ // just returns undefined+ //+ // -> but then the _assignment_ fails: throws TypeError+ //+ // -> this may be happening in the case of the circuit display fuck-up: the+ // reply message is cut short, but the rest of the message (some circuit+ // status lines and then '.' and then '250 OK') will still be delivered by+ // Firefox through the socket, will be received by the smattering of+ // chained funclets, and could then be made into a new message that will be+ // passed into this function}}}
You were right.
My display disappeared and I found this log
{{{
[02-14 12:23:18] Torbutton INFO: controlPort << getinfo circuit-status
[02-14 12:23:18] Torbutton INFO: controlPort >> 250+circuit-status=
248 BUILT [redacted]
247 BUILT [redacted]
249 BUILT [redacted]
250 BUILT [redacted]
[02-14 12:23:18] Torbutton INFO: controlPort >> 251 EXTENDED [redacted]
[02-14 12:23:18] Torbutton WARN: TypeError: commandQueue.shift(...) is undefined
[02-14 12:23:18] Torbutton WARN: Disabling tor display circuit because of an error.