I have spent some time watching circuit and stream events while connecting to different sites. I telnet into tor's control port with the following command (piping through ts to add timestamps):
telnet localhost 9151 | ts
I open the Browser Console and get the control port password by entering
m_tb_control_pass
Then I paste the result like this:
authenticate [value of m_tb_control_pass]
Finally I enter
setevents circ stream
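The manual telnet session above could also be scripted, which helps when repeating the experiment many times. Below is a minimal standard-library sketch. It assumes tor's control port is at 127.0.0.1:9151 (the port used above), that the m_tb_control_pass value can be passed to AUTHENTICATE exactly as it is typed in the telnet session, and that `%b %d %H:%M:%S` approximates the timestamps ts prints. The function names are hypothetical.

```python
import socket
import time


def send_command(sock, reader, command):
    """Send one control-port command and require a single-line "250" reply."""
    sock.sendall(command.encode("ascii") + b"\r\n")
    reply = reader.readline().decode("ascii", "replace").rstrip("\r\n")
    if not reply.startswith("250"):
        raise RuntimeError(f"{command.split()[0]} rejected: {reply}")
    return reply


def watch_events(sock, password, events=("CIRC", "STREAM"), emit=print):
    """Authenticate, subscribe to events, and emit each event line with a timestamp."""
    reader = sock.makefile("rb")
    # Mirrors the manual "authenticate [value of m_tb_control_pass]" step;
    # assumes the value is already in a form tor accepts (e.g. hex or quoted).
    send_command(sock, reader, f"AUTHENTICATE {password}")
    send_command(sock, reader, "SETEVENTS " + " ".join(events))
    for raw in reader:  # blocks until tor sends an event or closes the connection
        line = raw.decode("ascii", "replace").rstrip("\r\n")
        emit(time.strftime("%b %d %H:%M:%S") + " " + line)
```

Usage, under the same assumptions: `with socket.create_connection(("127.0.0.1", 9151)) as s: watch_events(s, password)`. Passing a list's `append` as `emit` collects the lines for later analysis instead of printing them.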
I have noticed that a significant fraction of connections to new sites hit at least one 10-second timeout. (Tor Browser's CircuitStreamTimeout is set to 0, which results in a timeout equal to MIN_CIRCUIT_STREAM_TIMEOUT, or 10 seconds.) Here's what a timeout looks like:
After a timeout occurs, the tor client closes the circuit, builds a new circuit and attempts to connect to the same site again. This repeats at least 3 times.
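Counting these timeouts by eye gets tedious, so the captured STREAM events could be parsed instead. This is a sketch, not a verified tool. It assumes the event layout from tor's control-spec (`650 STREAM <StreamID> <StreamStatus> <CircuitID> <Target> [KEY=VALUE ...]`), and it assumes each 10-second timeout shows up as a `DETACHED` event with `REASON=TIMEOUT` carrying the ID of the abandoned circuit. Both assumptions should be checked against a real log before trusting the counts. The function names are hypothetical.

```python
def parse_stream_event(line):
    """Split a STREAM event line, optionally prefixed by a ts timestamp.

    Assumed layout: 650 STREAM <StreamID> <Status> <CircuitID> <Target> [KEY=VALUE ...]
    """
    start = line.find("650 STREAM ")
    if start < 0:
        return None
    tokens = line[start:].split()
    if len(tokens) < 6:
        return None
    stream_id, status, circ_id, target = tokens[2:6]
    keywords = dict(tok.split("=", 1) for tok in tokens[6:] if "=" in tok)
    return stream_id, status, circ_id, target, keywords


def summarize_streams(lines):
    """Record, per stream, the circuits it timed out on and its final outcome."""
    streams = {}  # dicts keep insertion order, i.e. order of first appearance
    for line in lines:
        event = parse_stream_event(line)
        if event is None:
            continue
        stream_id, status, circ_id, target, keywords = event
        info = streams.setdefault(
            stream_id, {"target": target, "timed_out_circuits": [], "outcome": None})
        # Assumption: a stream timeout is reported as DETACHED with REASON=TIMEOUT.
        if status == "DETACHED" and keywords.get("REASON") == "TIMEOUT":
            info["timed_out_circuits"].append(circ_id)
        elif status in ("SUCCEEDED", "FAILED") and info["outcome"] is None:
            info["outcome"] = status
    return streams


def timeout_digits(streams):
    """One digit per stream: how many timeouts it hit (capped at 9)."""
    return "".join(str(min(len(s["timed_out_circuits"]), 9)) for s in streams.values())
```

A page load can open several streams, so one digit per stream is not necessarily one digit per "New Tor Circuit for this Site" reload. Filtering on the target first would bring the output closer to the per-connection results.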
I did an experiment where I connected to https://people.torproject.org/~arthuredelstein (a page with hardly any content) and then repeatedly selected "New Tor Circuit for this Site" 50 times.
Here are the results for 50 reloads. Each digit represents the number of 10-second stream timeouts observed before a given connection succeeded.
20020000000000000000002010000000000001000100000103
In other words, 8 out of 50 connections hit at least one timeout. Interestingly, four of these connections hit a double or triple timeout (20 or 30 seconds of delay).
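The tally above can be reproduced mechanically from the digit string. The sketch below treats every timeout as costing exactly 10 seconds and ignores the circuit-build time between retries, so its added-delay figure is a rough lower bound rather than a measurement. `summarize_timeouts` is a hypothetical helper name.

```python
def summarize_timeouts(digits, timeout_seconds=10):
    """Summarize a per-connection timeout string (one digit per connection)."""
    counts = [int(d) for d in digits]
    return {
        "connections": len(counts),
        "with_timeout": sum(1 for n in counts if n >= 1),
        "multiple_timeouts": sum(1 for n in counts if n >= 2),
        # Rough lower bound: ignores circuit-build time between retries.
        "extra_seconds_at_least": timeout_seconds * sum(counts),
    }


RESULTS = "20020000000000000000002010000000000001000100000103"
summary = summarize_timeouts(RESULTS)
```

For the string above this gives 8 connections with a timeout, 4 of them with more than one, and 13 timeouts in total. That is roughly 130 seconds of extra waiting across the 50 loads, or about 2.6 seconds per load on average, concentrated in a few very slow loads.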
I think this may be a big part of the perception of Tor Browser as "slow". Actual loading of pages doesn't seem drastically slow to me, and once I have successfully connected to a new site, following links to other pages on the same site (i.e., the same circuit) is usually acceptable.
(I also did another quick test on another site and 5/25 connections had at least 1 timeout.)
So here are some questions for further investigation:
Why are there so many timeouts? Are any of these timeouts due to silent errors in a Tor node? (If such errors could be promptly reported back to the client, maybe we could avoid waiting for the long timeout.)
What's the reason for MIN_CIRCUIT_STREAM_TIMEOUT being 10 seconds? Would it do any harm to make this shorter, say 5 seconds or 2 seconds?
So many double or triple timeouts are suspicious, because each timeout in a double or triple is reported for a different circuit. Could this mean the connection error is caused by the client or guard rather than a connection failure at the exit node?
> What's the reason for MIN_CIRCUIT_STREAM_TIMEOUT being 10 seconds? Would it do any harm to make this shorter, say 5 seconds or 2 seconds?
This one is straightforward: if it's 5 seconds or 2 seconds, then people on crummy (slow, lossy) internet connections will forever be giving up on the circuit before they even get the connected cell.
The underlying problem is that this is a static number for all users, not an adaptive number like the CircuitBuildTimeout.
I feel like I had a ticket long ago for making the stream timeout adaptive too, but maybe that never found its way into being a ticket.
(Even 10 seconds is too short for some people on crummy networks, which causes them to forever be abandoning circuits right before they work, and moving to new ones which they then abandon, and their Tor experience is no fun.)
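To make the static-versus-adaptive point concrete, here is a hypothetical sketch of an adaptive stream timeout. It sets the timeout to a high quantile of recently observed begin-to-connected times, clamped to a range. This is not tor's algorithm: CircuitBuildTimeout fits a Pareto distribution to circuit build times, which this sketch does not attempt. The class name and every parameter value (fallback 10 s, floor 2 s, ceiling 120 s, 95th percentile, 100-sample window, 20-sample minimum) are illustrative assumptions.

```python
import math
from collections import deque


class AdaptiveStreamTimeout:
    """Hypothetical adaptive stream timeout (not tor's CBT algorithm).

    Uses the nearest-rank quantile of recent begin-to-connected times,
    clamped to [floor, ceiling], and falls back to a fixed value until
    enough samples exist.
    """

    def __init__(self, default=10.0, floor=2.0, ceiling=120.0,
                 quantile=0.95, window=100, min_samples=20):
        self.default = default
        self.floor = floor
        self.ceiling = ceiling
        self.quantile = quantile
        self.min_samples = min_samples
        self.samples = deque(maxlen=window)  # keeps only the most recent observations

    def record(self, seconds):
        """Record how long a successful stream waited for its CONNECTED cell."""
        self.samples.append(seconds)

    def current(self):
        """Return the timeout to use for the next stream attempt, in seconds."""
        if len(self.samples) < self.min_samples:
            return self.default  # not enough data yet: use the fixed value
        ordered = sorted(self.samples)
        rank = max(0, math.ceil(self.quantile * len(ordered)) - 1)  # nearest rank
        return min(self.ceiling, max(self.floor, ordered[rank]))
```

On a fast connection this would shrink the timeout toward the floor, and on a slow, lossy one it would grow toward the ceiling. The sketch ignores one design problem: streams that time out never produce a sample, so on a lossy network the recorded times are biased toward the fast successes.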
I had a conversation with arma on IRC and he made many good suggestions on how to go about investigating this further (reprinted with permission):
16:49 < arthuredelstein> In general, do connection timeout errors come from the exit node, or from the client?
16:50 < armadev> it means you sent your begin cell, and then you didn't get an end cell or a connected cell after 10 seconds
16:50 < armadev> it could be that you don't really have a tls connection to your guard at all, you just think you do
16:51 < armadev> it could be that the exit receives the begin cell and quietly drops it
16:51 < armadev> or maybe it gets the begin cell and starts its dns resolve and that takes a while
16:51 < armadev> one way to investigate further might be to see if you ever get a connected or end cell if you waited longer
16:52 < arthuredelstein> Ah, that's a good idea.
16:54 < arthuredelstein> Do you have an hypothesis why there are so many timeouts? Do you think exits are dropping cells?
16:54 < armadev> i am wondering if it has to do with the ipv6 thing
16:54 < armadev> we have a bunch of bugs in ipv6 handling
16:55 < arthuredelstein> that's interesting
16:56 < arthuredelstein> in other words, handling at the exit?
16:57 < armadev> yes
16:57 < armadev> is there some pattern with which exits are on problem circuits?
16:57 < armadev> you have the circuit events i hope so you can do the stats?
16:57 < armadev> it is also possible that some exits, or even really just a few but really big ones, and running out of file descriptors or something
16:58 < arthuredelstein> another good idea. I will look into that.
16:58 < armadev> s/and running/are running/
16:59 < armadev> people.tp.o has an ipv4 and ipv6 address. can you pick something simple and static that's only v4, and is that different?
17:01 < arthuredelstein> makes sense
17:02 < arthuredelstein> Something that made me wonder if it's something closer to the client or guard is that in my first batch of tests (to people.torproject.org) half of the attempted connections were double timeouts, meaning two circuits with different exits failed before a successful connection was made.
17:03 < arthuredelstein> it's -> the cause of the timeouts is
17:08 < armadev> another thing to explore is sending cells end-to-end on the circuit that we know should elicit an immediate response
17:08 < armadev> like a begin to 127.0.0.1
17:08 < armadev> which should immediately reply with 'end, exitpolicy'
17:08 < armadev> and bypass any attempts by the exit to do a dns resolve, open a socket, make a tcp connection, etc
17:16 < arthuredelstein> What's easiest way to send a begin cell?
17:17 < armadev> make a socks request?
17:17 < armadev> there might be something on the client side that tries to block a request to a destination it knows will fail
17:17 < armadev> and also tor browser does isolation by socks parameters so the new socks request will be isolated to a different circuit
17:18 < armadev> but i bet fixing those will still be more fun than my other answer, which is to check out how to call connection_ap_handshake_send_begin()
17:19 < arthuredelstein> Right. I think Tor Browser is blocking connections to 127.0.0.1.
17:19 < armadev> heck, the browser itself might be blocking those too
17:19 < arthuredelstein> or possibly not making a socks connection
17:19 < armadev> and the tor client will be blocking them even if the browser isn't
17:19 < armadev> i guess that's yet another experiment:
17:19 < armadev> do this same experiment with your tor client, no browser involved
17:20 < armadev> and no weird socks isolation
17:20 < arthuredelstein> Yes.
17:20 < armadev> and no weird preferipv6 socksport flag
17:21 < arthuredelstein> aha
17:24 < arthuredelstein> I guess I can also try connecting to port 80 of the exit's IP address as an alternative to 127.0.0.1.
17:25 < armadev> good idea
17:25 < armadev> (though then you have to guess the exit already)
17:25 < arthuredelstein> Yeah, I would need to turn off socks isolation.
17:25 < arthuredelstein> Or maybe do this outside the browser
17:26 < arthuredelstein> maybe I need to get acquainted with stem so I can automate these tests
17:27 < arthuredelstein> assuming the browser isn't causing the problem somehow
17:29 < armadev> having it automated would be extra cool because then it could be done again later without redoing all the work
17:33 < armadev> let me hunt down a ticket you'll find fun and related (though alas not the same)
17:35 < armadev> #5830
17:40 < arthuredelstein> And I see you also mention the possibility of instrumenting a browser.
19:49 < armadev> yet another thought: if this happens pretty consistently, can you collude with an exit relay to get debug-level logs at the time of the failure? to see what it sees and what it doesn't see? safelogging might make that harder.
19:49 < arthuredelstein> yeah, that would be great
19:50 < armadev> the precursor to that idea is: can you induce this behavior in a chutney network?
19:50 < armadev> i would assume no, because it requires real users, real load, real broken exits. but who knows!
19:50 < armadev> oh, and another: if you're curious if it's your guard, do the experiment again with a different guard!
19:51 < arthuredelstein> yeah, I should definitely do that!
19:52 < armadev> if your guard is overloaded, you could easily be seeing a delay there
19:52 < armadev> or the intermediate node too, for that matter
19:52 < arthuredelstein> right
19:52 < armadev> where you have to wait for somebody's freight train of packets to move before you can get your connected cell
19:54 < armadev> i guess category 1 of problem, you send your begin and it vanishes. you'll never get an answer.
19:55 < armadev> category 2, everything's working, it's just slow/congested, and you need more patience than the hard-coded 10s timeout.
19:55 < armadev> cranking up the timeout should help distinguish, for starters.
19:55 < arthuredelstein> yes
20:03 < arthuredelstein> Are there cases where a properly-behaving exit is expected to have category 1 behavior? Or should it always return an error message to the client if a tcp connection fails?
20:10 < armadev> every non-response is a bug
20:10 < armadev> are there bugs? there used to be! we don't know of any now.
20:11 < armadev> but of course, weird tcp stacks, and firewalls with rules that drop packets, can induce long timeouts
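Two of the suggestions above are to check whether problem circuits share an exit and to repeat the experiment with a different guard. Both could start from the same CIRC and STREAM log. The following is a sketch under stated assumptions. It assumes the control-spec layouts `650 CIRC <CircuitID> BUILT $FP~nick,...` (guard first, exit last) and `650 STREAM <StreamID> <Status> <CircuitID> <Target> ...`. It also assumes a `DETACHED`/`REASON=TIMEOUT` event carries the ID of the circuit the stream was abandoned on. `tally_by_relay` is a hypothetical name.

```python
from collections import defaultdict


def _event_tokens(line, kind):
    """Return the tokens of a '650 <kind> ...' event line, or None."""
    start = line.find(f"650 {kind} ")
    return line[start:].split() if start >= 0 else None


def tally_by_relay(lines):
    """Count stream timeouts and successes per guard and per exit relay."""
    paths = {}  # circuit ID -> relays in path order (guard first, exit last)
    tally = {
        "guard": defaultdict(lambda: {"timeouts": 0, "succeeded": 0}),
        "exit": defaultdict(lambda: {"timeouts": 0, "succeeded": 0}),
    }
    for line in lines:
        circ = _event_tokens(line, "CIRC")
        if circ and len(circ) >= 5 and circ[3] == "BUILT" and circ[4].startswith("$"):
            paths[circ[2]] = circ[4].split(",")
            continue
        stream = _event_tokens(line, "STREAM")
        if not stream or len(stream) < 6:
            continue
        status, circ_id = stream[3], stream[4]
        keywords = dict(t.split("=", 1) for t in stream[6:] if "=" in t)
        if status == "DETACHED" and keywords.get("REASON") == "TIMEOUT":
            field = "timeouts"
        elif status == "SUCCEEDED":
            field = "succeeded"
        else:
            continue
        path = paths.get(circ_id)
        if path:  # skip streams on circuits whose BUILT event was not captured
            tally["guard"][path[0]][field] += 1
            tally["exit"][path[-1]][field] += 1
    return tally
```

If the timeouts cluster on a handful of exits, the problem probably lies at the exit. If they are spread evenly across exits but all go through one guard, repeating the run with a different guard becomes the obvious next experiment.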
This happened to me with Tor Browser 6.5.1 after my machine had been asleep for 5 minutes. After it went to sleep and woke up again, everything worked fine.
Guys, the number of timeouts in recent versions of Tor is really unacceptable.
Worse, even circuits that already succeeded show timeouts after a while (e.g. when browsing Trac tickets slowly).
So we need to do something about it now.
where each digit represents one connection attempt and gives the number of timeouts before that connection succeeded. So to me it doesn't look like IPv6 is the (only) problem. 15/50 attempts included at least one timeout. About half had more than one timeout.
So I thought I should check IPv6 as well. I found the IPv6 address for perdulce.torproject.org:
and the timeouts returned (20/50). That made me think this has something to do with the DNS resolve. To check this, I tried another site, example.com, in four ways: https with the domain name, http with the domain name, bare IPv4, and bare IPv6:
Indeed, I got 9/50 timeouts for the domain with http or https, but no timeouts for IPv4 and only a single timeout for IPv6.
Does this ring any bells for Tor core experts? What might be happening with DNS here? Again I think the multiple timeouts are a little suspicious, and I don't quite understand how that jibes with it being a (pure) exit node problem.
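The domain-versus-bare-IP comparison could also be run outside the browser, as suggested in the IRC discussion. Here is a standard-library sketch that times a SOCKS5 CONNECT through tor. It assumes a standalone tor with its SocksPort on 127.0.0.1:9050 (Tor Browser's bundled tor listens on 9150 instead). It also assumes tor's IsolateSOCKSAuth, which is on by default, so that fresh random SOCKS credentials on each attempt put that attempt on a different circuit, roughly like "New Tor Circuit for this Site". The function names are hypothetical.

```python
import os
import socket
import struct
import time


def build_auth_request(username, password):
    """RFC 1929 username/password sub-negotiation message."""
    user, pw = username.encode(), password.encode()
    return bytes([1, len(user)]) + user + bytes([len(pw)]) + pw


def build_connect_request(host, port):
    """RFC 1928 CONNECT request; literal IPs skip the DNS resolve at the exit."""
    for family, atyp in ((socket.AF_INET, 1), (socket.AF_INET6, 4)):
        try:
            return bytes([5, 1, 0, atyp]) + socket.inet_pton(family, host) + struct.pack("!H", port)
        except OSError:
            continue  # not a literal address of this family
    name = host.encode("idna")
    return bytes([5, 1, 0, 3, len(name)]) + name + struct.pack("!H", port)


def _recv_exact(sock, n):
    """Read exactly n bytes or fail."""
    data = b""
    while len(data) < n:
        chunk = sock.recv(n - len(data))
        if not chunk:
            raise ConnectionError("SOCKS server closed the connection")
        data += chunk
    return data


def timed_connect(host, port, socks_port=9050, timeout=180.0):
    """Return (SOCKS reply code, seconds until tor answered the CONNECT)."""
    # Fresh random credentials per attempt; with IsolateSOCKSAuth (assumed on)
    # tor should not reuse a circuit from an earlier attempt.
    username = os.urandom(8).hex()
    start = time.monotonic()
    with socket.create_connection(("127.0.0.1", socks_port), timeout=timeout) as s:
        s.sendall(bytes([5, 1, 2]))  # offer only username/password auth
        if _recv_exact(s, 2) != bytes([5, 2]):
            raise ConnectionError("SOCKS server refused username/password auth")
        s.sendall(build_auth_request(username, "x"))
        if _recv_exact(s, 2)[1] != 0:
            raise ConnectionError("SOCKS authentication failed")
        s.sendall(build_connect_request(host, port))
        reply = _recv_exact(s, 4)  # VER, REP, RSV, ATYP; bound address not needed
        return reply[1], time.monotonic() - start
```

For example, one could call `timed_connect("example.com", 443)` 50 times and then make the same number of calls with example.com's IPv4 and IPv6 literals. A reply code of 0 means success. Because tor retries internally before answering the SOCKS request, elapsed times just over 10, 20, or 30 seconds would be consistent with one, two, or three stream timeouts, but that reading is an inference rather than something the SOCKS reply reports.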