#25600 closed defect (fixed)
Tor eventually hits "No running bridges" with Snowflake
Reported by: | cypherpunks | Owned by: | |
---|---|---|---|
Priority: | Medium | Milestone: | |
Component: | Circumvention/Snowflake | Version: | |
Severity: | Normal | Keywords: | |
Cc: | dcf, arlolra | Actual Points: | |
Parent ID: | Points: | ||
Reviewer: | Sponsor: |
Description
Steps to reproduce:
- Use the fresh testbuilds by dcf
https://people.torproject.org/~dcf/pt-bundle/snowflake/20180321-8.0a4-4a5889af2891/
- Start a download
https://mirrors.edge.kernel.org/centos/7.4.1708/isos/x86_64/CentOS-7-x86_64-Everything-1708.iso
and watchabout:downloads
or your Task Manager. - Open another instance of
snowflake-client
(for example in Ricochet edit thetorrc
in./local/share/Ricochet/ricochet/tor
with the relevant config). - Watch the download failing, or the receiving/s in the Task Manager go down to 0.
Doesn't seem to be reproducible with the earlier version of snowflake-client
.
Child Tickets
Attachments (2)
Change History (23)
comment:1 Changed 21 months ago by
comment:2 follow-up: 3 Changed 21 months ago by
Thanks for this.
I tried it, but I wasn't able to reproduce it. On Debian, I used a browser from https://people.torproject.org/~dcf/pt-bundle/snowflake/20180321-8.0a4-4a5889af2891/ and a command line snowflake-client from 07291a0136.
- Started a download of https://mirrors.edge.kernel.org/centos/7.4.1708/isos/x86_64/CentOS-7-x86_64-Everything-1708.iso. Was getting about 400 KB/s.
- In the snowflake/client directory, ran
tor -f torrc SOCKSPort 10000
. It finished bootstrapping and the Browser download was still running. - Download through the command-line client:
curl -O --proxy socks5h://127.0.0.1:10000/ https://mirrors.edge.kernel.org/centos/7.4.1708/isos/x86_64/CentOS-7-x86_64-Everything-1708.iso
. I was getting about 500 KB/s on this download and the browser was still about 400 KB/s.
Maybe it was a coincidence that one of the clients died when the other one started? I'm currently testing some proxy changes from comment:63:ticket:21312, and it seems there is still a problem with proxies hanging up occasionally. Does it happen for you repeatedly?
Or, there could be some weird interaction, like maybe it only happens when both of the clients are using the same proxy instance.
comment:3 Changed 21 months ago by
Resolution: | → not a bug |
---|---|
Status: | new → closed |
Replying to dcf:
Thanks for this.
I tried it, but I wasn't able to reproduce it. On Debian, I used a browser from https://people.torproject.org/~dcf/pt-bundle/snowflake/20180321-8.0a4-4a5889af2891/ and a command line snowflake-client from 07291a0136.
- Started a download of https://mirrors.edge.kernel.org/centos/7.4.1708/isos/x86_64/CentOS-7-x86_64-Everything-1708.iso. Was getting about 400 KB/s.
- In the snowflake/client directory, ran
tor -f torrc SOCKSPort 10000
. It finished bootstrapping and the Browser download was still running.- Download through the command-line client:
curl -O --proxy socks5h://127.0.0.1:10000/ https://mirrors.edge.kernel.org/centos/7.4.1708/isos/x86_64/CentOS-7-x86_64-Everything-1708.iso
. I was getting about 500 KB/s on this download and the browser was still about 400 KB/s.Maybe it was a coincidence that one of the clients died when the other one started? I'm currently testing some proxy changes from comment:63:ticket:21312, and it seems there is still a problem with proxies hanging up occasionally. Does it happen for you repeatedly?
Interesting, I had actually repeated the steps twice at different times (I left the download going for quiet some bit at one time, probably 7 min) and I always had the same result. I was experiencing repeated shortages but I thought that this may have been due to the other snowflake-client process that I had running. But the good news is that I can't reproduce it now which means you're right :)
FWIW the hangs still happen for me from time to time.
comment:4 follow-up: 5 Changed 21 months ago by
I'm definitely noticing that after dozens of minutes of inactivity tor eventually hits:
Delaying directory fetches: No running bridges Application request when we haven't received a consensus with exits. Optimistically trying known bridges again.
Restarting the affected program solves the issue. But I'm not sure if this is a consequence of the proxy changes that you were mentioning.
comment:5 follow-up: 6 Changed 21 months ago by
Replying to cypherpunks:
I'm definitely noticing that after dozens of minutes of inactivity tor eventually hits:
Delaying directory fetches: No running bridges Application request when we haven't received a consensus with exits. Optimistically trying known bridges again.Restarting the affected program solves the issue. But I'm not sure if this is a consequence of the proxy changes that you were mentioning.
I've seen this too, not only with snowflake. I suspect it's actually a tor bug. Sometimes a loading tab in Tor Browser will stall, and I'll see this message. Then, I open another tab and go to a domain that doesn't have an existing circuit, and it will "un-stall" the first tab. It's like tor sometimes starts believing that none of its bridges are working, and refuses to even try again, until you ask for a new circuit.
It may happen more often with snowflake because of #25429 (disconnects after 30 s). Some other tor-related tickets are #20379 (fixed in November 2016) and #24392 (fixed in November 2018). I'm not sure if #24392 is fixed in a released Tor Browser yet.
comment:6 Changed 21 months ago by
Replying to dcf:
I've seen this too, not only with snowflake. I suspect it's actually a tor bug. Sometimes a loading tab in Tor Browser will stall, and I'll see this message. Then, I open another tab and go to a domain that doesn't have an existing circuit, and it will "un-stall" the first tab. It's like tor sometimes starts believing that none of its bridges are working, and refuses to even try again, until you ask for a new circuit.
It may happen more often with snowflake because of #25429 (disconnects after 30 s). Some other tor-related tickets are #20379 (fixed in November 2016) and #24392 (fixed in November 2018). I'm not sure if #24392 is fixed in a released Tor Browser yet.
So I did a methodology to know for sure whether it's due to the switch to the newer snowflake-client
or not (in which case it would primarily be an issue with tor, FWIW I'm using 0.3.3.x-a in all of this).
- First, I switched everything back to the older
snowflake-client
(sha256sum ff94f6ed041263eb4e3bc8c215d9469a3a8cca7957ba4bccea617dc220294a74
), including the one in my/usr/bin
. - I restarted the two instances of
tor
that I had running, and I opened the Tor Browser and went to two sites then clicked on New Identity. - I stopped from any activity for 40 minutes.
- I went back and first looked at the task manager and saw that the receiving/sending rates were varying around
0.9-4KiB/s - 2-5KiB/s
with no dips and can go higher than that when snowflake rotates to another snowflake proxy (I don't have any other significant application running that could affect the network activity). Then, I opened check.torproject.org in the Tor Browser, it took a bit of time but it eventually worked. I didapt-get update
(I'm torifying my apt-gets so it uses thedebian-tor
service which was setup to use the oldersnowflake-client
) and it worked as well.
Afterwards I switched everything back to the newer snowflake-client
(44ab0a493573b4f95a3b01f7495676cca6cd7ea2ae2afe1b69c4591ae4dbba2a
), and I did the same step (2) above.
- I stopped from any activity for 30 minutes.
- Looking at the task manager one clearly sees that everything stopped working,
And looking at wireshark one mostly sees STUN packets with
Binding Request
andBinding Success Response XOR-MAPPED-ADDRESS: ...
Opening the console of the Tor Browser one seesDelaying directory fetches: No running bridges
and opening up check.torproject.org eventually times out. Going to another site yields another timeout. Same for runningapt-get update
, and one seesDelaying directory fetches: No running bridges
as well in the/var/log/syslog
.
I hope you can reproduce this so that you can then narrow down the problem to a specific commit. (Note that in my testing I had 3 instances of tor running in total all of them with snowflake, but the issue may be present with just one running it's just that I didn't have time to test)
Edit: By the way this is reproducible with the latest fa2b6d0581df756f700c53d79ea40a141f6a96962b8de52c65d6817e6456af15
from TB 8.0a5.
#25347 might also be related.
Changed 21 months ago by
Attachment: | newer-snowflake-client-task-manager-test.png added |
---|
comment:7 Changed 21 months ago by
Since #25347 might be related, I'll wait until it's fixed and there's a new TB alpha release with it to see if solves the issue.
Edit: I built my own tor with asn's patch in #25347#comment:15 (./autogen.sh && ./configure --disable-asciidoc && make
, sha256sum 197af294a5abd93fcf9ed9f58f9093f1dfee0a6413bf445f478af306255a0091
) but, alas, the problem still exists.
Edit: Reproducible with tor nightly 0.3.4.0-alpha-dev-20180404T175310Z
.
comment:8 Changed 20 months ago by
Resolution: | not a bug |
---|---|
Status: | closed → reopened |
comment:9 Changed 20 months ago by
Update: Unfortunately it still happens (used the new domains that dcf made in #22782 to get it working),
Apr 21 08:39:23 debian Tor[1274]: Tried for 120 seconds to get a connection to [scrubbed]:80. Giving up. (waiting for circuit) Apr 21 08:39:23 debian Tor[1274]: Tried for 120 seconds to get a connection to [scrubbed]:80. Giving up. (waiting for rendezvous desc) Apr 21 08:39:23 debian Tor[1274]: Tried for 120 seconds to get a connection to [scrubbed]:80. Giving up. (waiting for circuit) Apr 21 08:39:23 debian Tor[1274]: Tried for 120 seconds to get a connection to [scrubbed]:443. Giving up. (waiting for circuit) Apr 21 08:39:25 debian Tor[1274]: Delaying directory fetches: No running bridges
Another new thing, I see a Tor has not observed any network activity
notice when this happens (it doesn't appear always).
05:20.000 [notice] Tor has not observed any network activity for the past 74 seconds. Disabling circuit build timeout recording. 05:21.000 [notice] Delaying directory fetches: No running bridges 05:39.000 [notice] Application request when we haven't received a consensus with exits. Optimistically trying known bridges again.
comment:10 Changed 19 months ago by
Summary: | Running 2 instances of snowflake-client leads to the former one stopping → Tor eventually hits "No running bridges" with Snowflake |
---|
comment:11 follow-up: 12 Changed 19 months ago by
I was able to record detailed debug logs of tor when this happens,
https://privatebin.net/?55627cf4b879756c#E5s4afjMzPUnZ7tMMdWtvyKGdfdYbMF+jxyMPcyQtbw=
(I sent dcf the password from a throwaway guerrillamail since I don't have time to redact all the sensitive info from the logs)
comment:12 follow-up: 13 Changed 19 months ago by
Replying to cypherpunks:
I was able to record detailed debug logs of tor when this happens,
Thanks, I got it.
I have encountered this "No running bridges" problem too. I wonder if it happens when a snowflake proxy comes online but it behind a kind of NAT, or something, such that it can't actually do WebRTC. One way to debug this would be to activate the snowflake log (-log snowflake.log
) and see what is the final proxy it connects to before "No running bridges". 37.218.242.151 is the fallback proxies; anything else is a wild proxy.
comment:13 follow-up: 14 Changed 19 months ago by
Replying to dcf:
I have encountered this "No running bridges" problem too. I wonder if it happens when a snowflake proxy comes online but it behind a kind of NAT, or something, such that it can't actually do WebRTC.
I don't think that's the case since I almost always hit the fallback proxy (the 37... one) and this problem always happens when I use snowflake.
But I think I finally nailed it: When I followed your suggestion to use -log snowflake.log
I no longer had that problem (it's been a couple of hours now of testing, hopefully there will be no other surprise). So I think there's definitely some issue in the patch for #25471. This also confirms my results in comment:6 where I didn't have this problem with the earlier build of snowflake.
comment:14 follow-up: 15 Changed 19 months ago by
Status: | reopened → needs_review |
---|
Replying to cypherpunks:
Replying to dcf:
I have encountered this "No running bridges" problem too. I wonder if it happens when a snowflake proxy comes online but it behind a kind of NAT, or something, such that it can't actually do WebRTC.
I don't think that's the case since I almost always hit the fallback proxy (the 37... one) and this problem always happens when I use snowflake.
But I think I finally nailed it: When I followed your suggestion to use
-log snowflake.log
I no longer had that problem (it's been a couple of hours now of testing, hopefully there will be no other surprise). So I think there's definitely some issue in the patch for #25471. This also confirms my results in comment:6 where I didn't have this problem with the earlier build of snowflake.
My guess: tor is not reading from the client transport plugin's stderr pipe. In the absence of the -log
option, snowflake-client writes log messages to stderr. Eventually the pipe buffer fills up, and snowflake-client hangs while writing a log message.
The reason it didn't happen in earlier versions of snowflake-client is that earlier versions always logged to a file by default. 12922a232b made it log to stderr by default and added a -log
option. The change would have happened in Tor Browser 8.0a5, because that's when #21312 caused an upgrade to the version of snowflake-client that doesn't log by default.
Why this affects snowflake and not other PTs: probably just because snowflake-client writes more to the log (because of #25429 and other causes). Other PTs are probably affected; they just don't fill up the pipe buffer as fast.
Could you try this patch for snowflake-client?
--- a/client/snowflake.go +++ b/client/snowflake.go @@ -150,6 +150,8 @@ func main() { } defer logFile.Close() log.SetOutput(logFile) + } else { + log.SetOutput(ioutil.Discard) } log.Println("\n\n\n --- Starting Snowflake Client ---")
comment:15 Changed 19 months ago by
Replying to dcf:
Could you try this patch for snowflake-client?
I can't for now but I'll definitely inform you when I test the next alpha release when this gets merged :) (By the way many thanks for investigating the issue!)
Changed 18 months ago by
Attachment: | 0001-Log-to-io.Discard-if-no-log-file-is-set.patch added |
---|
comment:16 Changed 18 months ago by
Status: | needs_review → merge_ready |
---|
The hunch and patch seem reasonable to me.
comment:17 follow-up: 19 Changed 18 months ago by
Resolution: | → fixed |
---|---|
Status: | merge_ready → closed |
I tested the patch myself, copying the patched snowflake-client into a Tor Browser 8.0a8.
The first time I tried, I got "No running bridges" after 11 minutes :/
06/11/18 06:07:56.600 [NOTICE] DisableNetwork is set. Tor will not make or accept non-control network connections. Shutting down all existing connections. 06/11/18 06:07:56.600 [NOTICE] Opening Socks listener on 127.0.0.1:9150 06/11/18 06:07:57.500 [NOTICE] Bootstrapped 80%: Connecting to the Tor network 06/11/18 06:07:57.500 [NOTICE] Bootstrapped 85%: Finishing handshake with first hop 06/11/18 06:07:59.700 [NOTICE] Bootstrapped 90%: Establishing a Tor circuit 06/11/18 06:08:00.600 [NOTICE] new bridge descriptor 'flakey' (fresh): $2B280B23E1107BB62ABFC40DDCC8824814F80A72~flakey at 0.0.3.0 06/11/18 06:08:01.200 [WARN] Your Guard $2B280B23E1107BB62ABFC40DDCC8824814F80A72 ($2B280B23E1107BB62ABFC40DDCC8824814F80A72) is failing an extremely large amount of circuits. This could indicate a route manipulation attack, extreme network overload, or a bug. Success counts are 56/251. Use counts are 44/44. 247 circuits completed, 0 were unusable, 191 collapsed, and 2 timed out. For reference, your timeout cutoff is 60 seconds. 06/11/18 06:08:01.800 [NOTICE] Tor has successfully opened a circuit. Looks like client functionality is working. 06/11/18 06:08:01.800 [NOTICE] Bootstrapped 100%: Done 06/11/18 06:08:03.900 [NOTICE] New control connection opened from 127.0.0.1. 06/11/18 06:08:04.100 [NOTICE] New control connection opened from 127.0.0.1. 06/11/18 06:18:37.900 [WARN] Your Guard $2B280B23E1107BB62ABFC40DDCC8824814F80A72 ($2B280B23E1107BB62ABFC40DDCC8824814F80A72) is failing a very large amount of circuits. Most likely this means the Tor network is overloaded, but it could also mean an attack against you or potentially the guard itself. Success counts are 87/288. Use counts are 48/48. 284 circuits completed, 0 were unusable, 197 collapsed, and 2 timed out. For reference, your timeout cutoff is 60 seconds. 06/11/18 06:18:59.400 [NOTICE] Delaying directory fetches: No running bridges 06/11/18 06:18:59.400 [NOTICE] Application request when we haven't received a consensus with exits. Optimistically trying known bridges again. 06/11/18 06:24:02.100 [NOTICE] Delaying directory fetches: No running bridges 06/11/18 06:24:22.200 [NOTICE] Application request when we haven't received a consensus with exits. Optimistically trying known bridges again.
But I tried again, left it running overnight, restarted the next day, and didn't have another problem. So I'm going to go ahead with the theory that the first failure was a fluke or is some other bug.
Anyway, I merged the patch in 6077141f4affdab9b7ce97a9b1c6859825eaaa29.
comment:19 Changed 18 months ago by
Replying to dcf:
The first time I tried, I got "No running bridges" after 11 minutes :/
There's still an unfixed little-t-tor bug that leads to that, so that may be it.
comment:20 follow-up: 21 Changed 18 months ago by
Will there be a "Bump snowflake/go-webrtc again for trac 25600" ticket for getting this fix into TB?
comment:21 Changed 17 months ago by
Replying to cypherpunks:
Will there be a "Bump snowflake/go-webrtc again for trac 25600" ticket for getting this fix into TB?
Some more details: This was tested on Debian. Also the
snowflake-client
used for Ricochet is a copy placed in/usr/bin
.