Opened 23 months ago

Last modified 4 weeks ago

#21394 merge_ready defect

connection timeouts are affecting Tor Browser usability

Reported by: arthuredelstein Owned by:
Priority: Very High Milestone: Tor: 0.2.9.x-final
Component: Core Tor/Tor Version:
Severity: Normal Keywords: 029-backport, tbb-performance, tbb-usability, performance, tbb-needs, 033-triage-20180320, 033-included-20180320, 031-unreached-backport, 032-unreached-backport
Cc: gk, brade, mcs, arthuredelstein, tom, dhalgren.tor@… Actual Points:
Parent ID: Points:
Reviewer: mikeperry Sponsor:

Description (last modified by arthuredelstein)

I have spent some time watching circuit and stream events while connecting to different sites. I telnet into tor's control port using the following command (using ts to add time stamps):

telnet localhost 9151 | ts

I open the browser console and get the tor control password by entering
m_tb_control_pass
Then I authenticate by pasting the result like this:
authenticate [value of m_tb_control_pass]
Finally I subscribe to circuit and stream events by entering
setevents circ stream
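
The same event stream can also be watched with a short stem script instead of raw telnet. This is a sketch, not part of the original procedure: it assumes the control port is on 9151 and that stem can discover or be given the credentials (for Tor Browser's password auth, pass the m_tb_control_pass value to authenticate()):

from stem.control import Controller

# Minimal sketch: print CIRC and STREAM events with timestamps, like
# `telnet localhost 9151 | ts`. Port and auth are assumptions; adjust
# for your setup (e.g. authenticate(password=...) for Tor Browser).
with Controller.from_port(port=9151) as controller:
    controller.authenticate()
    controller.add_event_listener(lambda e: print(e.arrived_at, e),
                                  'CIRC', 'STREAM')
    input('Listening for events; press enter to stop.\n')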

What I have noticed is that a significant fraction of new site connections result in at least 1 timeout of 10 seconds. (Tor Browser's CircuitStreamTimeout is set to 0, which results in a timeout equal to MIN_CIRCUIT_STREAM_TIMEOUT, or 10 seconds.) Here's what a timeout looks like:

Feb 03 19:00:03 650 STREAM 868 NEW 0 people.torproject.org:443 SOURCE_ADDR=127.0.0.1:50318 PURPOSE=USER
Feb 03 19:00:03 650 CIRC 149 LAUNCHED BUILD_FLAGS=NEED_CAPACITY PURPOSE=GENERAL TIME_CREATED=2017-02-04T03:00:03.738597
Feb 03 19:00:03 650 CIRC 149 EXTENDED [...] BUILD_FLAGS=NEED_CAPACITY PURPOSE=GENERAL TIME_CREATED=2017-02-04T03:00:03.738597 SOCKS_USERNAME="torproject.org" SOCKS_PASSWORD="7d8ea4ccf4ba6345846e0fccacd4d941"
Feb 03 19:00:04 650 CIRC 149 EXTENDED [...] BUILD_FLAGS=NEED_CAPACITY PURPOSE=GENERAL TIME_CREATED=2017-02-04T03:00:03.738597 SOCKS_USERNAME="torproject.org" SOCKS_PASSWORD="7d8ea4ccf4ba6345846e0fccacd4d941"
Feb 03 19:00:04 650 CIRC 149 EXTENDED [...] BUILD_FLAGS=NEED_CAPACITY PURPOSE=GENERAL TIME_CREATED=2017-02-04T03:00:03.738597 SOCKS_USERNAME="torproject.org" SOCKS_PASSWORD="7d8ea4ccf4ba6345846e0fccacd4d941"
Feb 03 19:00:04 650 CIRC 149 BUILT [...] BUILD_FLAGS=NEED_CAPACITY PURPOSE=GENERAL TIME_CREATED=2017-02-04T03:00:03.738597 SOCKS_USERNAME="torproject.org" SOCKS_PASSWORD="7d8ea4ccf4ba6345846e0fccacd4d941"
Feb 03 19:00:04 650 STREAM 868 SENTCONNECT 149 people.torproject.org:443
[...]
Feb 03 19:00:14 650 STREAM 868 DETACHED 149 people.torproject.org:443 REASON=TIMEOUT
Feb 03 19:00:14 650 CIRC 150 LAUNCHED BUILD_FLAGS=NEED_CAPACITY PURPOSE=GENERAL TIME_CREATED=2017-02-04T03:00:14.588714

After a timeout occurs, the tor client closes the circuit, builds a new circuit and attempts to connect to the same site again. This repeats at least 3 times.

I did an experiment where I connected to https://people.torproject.org/~arthuredelstein (a page with hardly any content) and then repeatedly selected "New Tor Circuit for this Site" 50 times.

Here are the results for 50 reloads. Each digit represents the number of 10-second stream timeouts observed before a given connection succeeded.
20020000000000000000002010000000000001000100000103

In other words, 8 out of 50 connections showed a timeout. And interestingly, four of these connections exhibited a double or triple timeout (a 20- or 30-second delay).
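
These tallies can be checked mechanically from the digit string above (a trivial sketch):

# Tally the 50-reload result string quoted above; each digit is the
# number of 10-second stream timeouts before that connection succeeded.
results = "20020000000000000000002010000000000001000100000103"
timeouts = [int(d) for d in results]
print(len(timeouts), "connections")                            # 50
print(sum(1 for t in timeouts if t >= 1), "with a timeout")    # 8
print(sum(1 for t in timeouts if t >= 2), "with 2+ timeouts")  # 4
print(sum(timeouts) * 10, "seconds spent waiting in total")    # 130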

I think this may be a big part of the perception of Tor Browser as "slow". Actual loading of pages doesn't seem drastically slow to me, and once I have successfully connected to a new site, following links to other pages on the same site (i.e., the same circuit) is usually acceptable.

(I also did another quick test on another site and 5/25 connections had at least 1 timeout.)

So here are some questions for further investigation:

  • Why are there so many timeouts? Are any of these timeouts due to silent errors in a Tor node? (If such errors could be promptly reported back to the client, maybe we could avoid the waiting for the long timeout.)
  • What's the reason for MIN_CIRCUIT_STREAM_TIMEOUT being 10 seconds? Would it do any harm to make this shorter, say 5 seconds or 2 seconds?
  • So many double or triple timeouts are suspicious, because each timeout in a double or triple is reported for a different circuit. Could this mean the connection error is caused by the client or guard rather than a connection failure at the exit node?

Child Tickets

Ticket   Status  Owner    Summary                                                     Component
#24010   closed  aagbsn   Make bandwidth authorities use DNS, not IP addresses        Core Tor/Torflow
#24022   closed           Make clients avoid retrying slow exits when they time out   Core Tor/Tor

Attachments (1)

bug21394_touch_up.patch (2.3 KB) - added by Dhalgren 12 months ago.
recommended touch-up patch


Change History (99)

comment:1 Changed 23 months ago by arthuredelstein

Description: modified (diff)

comment:2 Changed 23 months ago by arthuredelstein

Description: modified (diff)

comment:3 Changed 23 months ago by arthuredelstein

Description: modified (diff)

comment:4 Changed 22 months ago by gk

Cc: gk added

comment:5 Changed 22 months ago by arthuredelstein

Description: modified (diff)

comment:6 Changed 22 months ago by mcs

Cc: brade mcs added

comment:7 in reply to:  description Changed 22 months ago by arma

Replying to arthuredelstein:

  • What's the reason for MIN_CIRCUIT_STREAM_TIMEOUT being 10 seconds? Would it do any harm to make this shorter, say 5 seconds or 2 seconds?

This one is straightforward: if it's 5 seconds or 2 seconds, then people on crummy (slow, lossy) internet connections will forever be giving up on the circuit before they even get the connected cell.

The underlying problem is that this is a static number for all users, not an adaptive number like the CircuitBuildTimeout.

I feel like I had a ticket long ago for making the stream timeout adaptive too, but maybe that never found its way into being an actual ticket.

(Even 10 seconds is too short for some people on crummy networks, which causes them to forever be abandoning circuits right before they work, and moving to new ones which they then abandon, and their Tor experience is no fun.)

comment:8 Changed 22 months ago by arthuredelstein

I had a conversation with arma on IRC and he made many good suggestions on how to go about investigating this further (reprinted with permission):

16:49 < arthuredelstein> In general, do connection timeout errors come from the exit node, or from the client?
16:50 < armadev> it means you sent your begin cell, and then you didn't get an end cell or a connected cell after 10 seconds
16:50 < armadev> it could be that you don't really have a tls connection to your guard at all, you just think you do
16:51 < armadev> it could be that the exit receives the begin cell and quietly drops it
16:51 < armadev> or maybe it gets the begin cell and starts its dns resolve and that takes a while
16:51 < armadev> one way to investigate further might be to see if you ever get a connected or end cell if you waited longer
16:52 < arthuredelstein> Ah, that's a good idea.
16:54 < arthuredelstein> Do you have an hypothesis why there are so many timeouts? Do you think exits are dropping cells?
16:54 < armadev> i am wondering if it has to do with the ipv6 thing
16:54 < armadev> we have a bunch of bugs in ipv6 handling
16:55 < arthuredelstein> that's interesting
16:56 < arthuredelstein> in other words, handling at the exit?
16:57 < armadev> yes
16:57 < armadev> is there some pattern with which exits are on problem circuits?
16:57 < armadev> you have the circuit events i hope so you can do the stats?
16:57 < armadev> it is also possible that some exits, or even really just a few but really big ones, and running out of file descriptors or something
16:58 < arthuredelstein> another good idea. I will look into that.
16:58 < armadev> s/and running/are running/
16:59 < armadev> people.tp.o has an ipv4 and ipv6 address. can you pick something simple and static that's only v4, and is that different?
17:01 < arthuredelstein> makes sense
17:02 < arthuredelstein> Something that made me wonder if it's something closer to the client or guard is that in my first batch of tests (to people.torproject.org) half of the attempted connections were double timeouts, meaning two circuits with different exits failed before a successful connection was made.
17:03 < arthuredelstein> it's -> the cause of the timeouts is
17:08 < armadev> another thing to explore is sending cells end-to-end on the circuit that we know should elicit an immediate response
17:08 < armadev> like a begin to 127.0.0.1
17:08 < armadev> which should immediately reply with 'end, exitpolicy'
17:08 < armadev> and bypass any attempts by the exit to do a dns resolve, open a socket, make a tcp connection, etc
17:16 < arthuredelstein> What's easiest way to send a begin cell?
17:17 < armadev> make a socks request?
17:17 < armadev> there might be something on the client side that tries to block a request to a destination it knows will fail
17:17 < armadev> and also tor browser does isolation by socks parameters so the new socks request will be isolated to a different circuit
17:18 < armadev> but i bet fixing those will still be more fun than my other answer, which is to check out how to call connection_ap_handshake_send_begin()
17:19 < arthuredelstein> Right. I think Tor Browser is blocking connections to 127.0.0.1.
17:19 < armadev> heck, the browser itself might be blocking those too
17:19 < arthuredelstein> or possibly not making a socks connection
17:19 < armadev> and the tor client will be blocking them even if the browser isn't
17:19 < armadev> i guess that's yet another experiment:
17:19 < armadev> do this same experiment with your tor client, no browser involved
17:20 < armadev> and no weird socks isolation
17:20 < arthuredelstein> Yes.
17:20 < armadev> and no weird preferipv6 socksport flag
17:21 < arthuredelstein> aha
17:24 < arthuredelstein> I guess I can also try connecting to port 80 of the exit's IP address as an alternative to 127.0.0.1.
17:25 < armadev> good idea
17:25 < armadev> (though then you have to guess the exit already)
17:25 < arthuredelstein> Yeah, I would need to turn off socks isolation.
17:25 < arthuredelstein> Or maybe do this outside the browser
17:26 < arthuredelstein> maybe I need to get acquainted with stem so I can automate these tests
17:27 < arthuredelstein> assuming the browser isn't causing the problem somehow
17:29 < armadev> having it automated would be extra cool because then it could be done again later without redoing all the work
17:33 < armadev> let me hunt down a ticket you'll find fun and related (though alas not the same)
17:35 < armadev> #5830
17:40 < arthuredelstein> And I see you also mention the possibility of instrumenting a browser.
19:49 < armadev> yet another thought: if this happens pretty consistently, can you collude with an exit relay to get debug-level logs at the time of the failure? to see what it sees and what it doesn't see? safelogging might make that harder.
19:49 < arthuredelstein> yeah, that would be great
19:50 < armadev> the precursor to that idea is: can you induce this behavior in a chutney network?
19:50 < armadev> i would assume no, because it requires real users, real load, real broken exits. but who knows!
19:50 < armadev> oh, and another: if you're curious if it's your guard, do the experiment again with a different guard!
19:51 < arthuredelstein> yeah, I should definitely do that!
19:52 < armadev> if your guard is overloaded, you could easily be seeing a delay there
19:52 < armadev> or the intermediate node too, for that matter
19:52 < arthuredelstein> right
19:52 < armadev> where you have to wait for somebody's freight train of packets to move before you can get your connected cell
19:54 < armadev> i guess category 1 of problem, you send your begin and it vanishes. you'll never get an answer.
19:55 < armadev> category 2, everything's working, it's just slow/congested, and you need more patience than the hard-coded 10s timeout.
19:55 < armadev> cranking up the timeout should help distinguish, for starters.
19:55 < arthuredelstein> yes
20:03 < arthuredelstein> Are there cases where a properly-behaving exit is expected to have category 1 behavior? Or should it always return an error message to the client if a tcp connection fails?
20:10 < armadev> every non-response is a bug
20:10 < armadev> are there bugs? there used to be! we don't know of any now.
20:11 < armadev> but of course, weird tcp stacks, and firewalls with rules that drop packets, can induce long timeouts

comment:9 Changed 22 months ago by dgoulet

Milestone: Tor: unspecified

comment:10 Changed 20 months ago by teor

This happened to me with Tor Browser 6.5.1 after my machine was asleep for 5 minutes. Then, after it went to sleep and woke again, everything worked fine.

comment:11 Changed 20 months ago by cypherpunks

16:51 < armadev> or maybe it gets the begin cell and starts its dns resolve and that takes a while

This.

Last edited 20 months ago by cypherpunks (previous) (diff)

comment:12 Changed 20 months ago by cypherpunks

Last edited 20 months ago by cypherpunks (previous) (diff)

comment:13 Changed 20 months ago by cypherpunks

Last edited 20 months ago by cypherpunks (previous) (diff)

comment:14 Changed 20 months ago by cypherpunks

Opened: 2017-02-04
Changes in version 0.3.0.2-alpha - 2017-01-23

comment:15 Changed 19 months ago by cypherpunks

Keywords: performance added
Milestone: Tor: unspecifiedTor: 0.3.1.x-final

Guys, the number of timeouts in recent versions of Tor is really unacceptable.
Worse, even circuits that initially succeeded show timeouts after a period of time (e.g. when browsing Trac tickets slowly).
So we need to do something about it now.

comment:16 Changed 17 months ago by nickm

Milestone: Tor: 0.3.1.x-finalTor: 0.3.2.x-final

comment:17 Changed 17 months ago by nickm

Priority: MediumVery High

comment:18 Changed 16 months ago by cypherpunks

[08-04 17:35:24] Torbutton INFO: controlPort >> 650 STREAM 6240 DETACHED 798 trac.torproject.org:443 REASON=TIMEOUT
[08-04 17:35:31] Torbutton INFO: controlPort >> 650 STREAM 6240 DETACHED 802 trac.torproject.org:443 REASON=END REMOTE_REASON=RESOLVEFAILED
[08-04 17:35:32] Torbutton INFO: controlPort >> 650 STREAM 6241 DETACHED 798 trac.torproject.org:443 REASON=TIMEOUT
[08-04 17:35:34] Torbutton INFO: controlPort >> 650 STREAM 6241 REMAP 804 [2a01:4f8:172:39ca:0:dad3:3:1]:443 SOURCE=EXIT

It looks like trac.tpo is always resolved to an IPv6 address, and Tor switches through the exit nodes until one with IPv6 support is found.

comment:19 in reply to:  18 Changed 15 months ago by teor

Replying to cypherpunks:

[08-04 17:35:24] Torbutton INFO: controlPort >> 650 STREAM 6240 DETACHED 798 trac.torproject.org:443 REASON=TIMEOUT
[08-04 17:35:31] Torbutton INFO: controlPort >> 650 STREAM 6240 DETACHED 802 trac.torproject.org:443 REASON=END REMOTE_REASON=RESOLVEFAILED
[08-04 17:35:32] Torbutton INFO: controlPort >> 650 STREAM 6241 DETACHED 798 trac.torproject.org:443 REASON=TIMEOUT
[08-04 17:35:34] Torbutton INFO: controlPort >> 650 STREAM 6241 REMAP 804 [2a01:4f8:172:39ca:0:dad3:3:1]:443 SOURCE=EXIT

It looks like trac.tpo is always resolved to an IPv6 address, and Tor switches through the exit nodes until one with IPv6 support is found.

#21310 and #21311 may fix this, but we might also need a fix on the client side.

comment:20 Changed 14 months ago by arthuredelstein

I did some more experiments:

First I tried https://torpat.ch. This does not have an IPv6 address, as far as I can tell. I got the following result:

https://torpat.ch:
00000000040000400022002100020100002100000102011100

where each digit represents one connection attempt, giving the number of timeouts before that connection succeeded. So to me it doesn't look like IPv6 is the (only) problem. 15/50 attempts included at least one timeout, and about half of those had more than one timeout.

So I thought I should check IPv6 as well. I found the IPv6 address for perdulce.torproject.org:

http://[2a01:4f8:172:1b46:0:abba:11:1]/ (perdulce.torproject.org ipv6): 
000000000000000000000000000000000000000000000000000000000000000

To my surprise, there were no timeouts at all. So I tried IPv4:

http://138.201.14.203/ (perdulce.torproject.org ipv4)
000000000000000000000000000000000000000000000000000000000000000

Again no timeouts. Then I tried a site on perdulce:

https://people.torproject.org/~arthuredelstein/ :
011000310000000111110000000010130100110122001010000

and the timeouts returned (20/50). That made me think this has something to do with the DNS resolve. To check this, I tried another site, example.com, including https with domain, http with domain, bare IPv4, and bare IPv6:

https://example.com
00000010000010000000020010010210100000000000000003

http://example.com
00100000021000002001001001000000001000000000001000

http://93.184.216.34/
00000000000000000000000000000000000000000000000000

http://[2606:2800:220:1:248:1893:25c8:1946]/
00000000000000000000000000000000000000000000000010

Indeed I got 9/50 timeouts for the domain with http or https, but no timeouts for IPv4 and only a single timeout for IPv6.

Does this ring any bells for Tor core experts? What might be happening with DNS here? Again I think the multiple timeouts are a little suspicious, and I don't quite understand how that jibes with it being a (pure) exit node problem.

comment:21 Changed 14 months ago by gk

Keywords: tbb-needs added

comment:22 in reply to:  20 ; Changed 14 months ago by teor

Replying to arthuredelstein:

I did some more experiments:

...
Indeed I got 9/50 timeouts for the domain with http or https, but no timeouts for IPv4 and only a single timeout for IPv6.

Does this ring any bells for Tor core experts? What might be happening with DNS here?

Some exits may be overloading their resolvers. Or our code may be buggy. It would be helpful to identify the particular exits that are experiencing these timeouts, and work out if they are in the same AS or using the same resolvers.

I also wonder if we should ask bandwidth authorities to use DNS whenever possible, so they see DNS timeouts, and downgrade exits that have them. See #24010.

Again I think the multiple timeouts are a little suspicious, and I don't quite understand how that jibes with it being a (pure) exit node problem.

The only node in a tor path that uses DNS is an exit, so if DNS breaks, it causes issues at the exit.

comment:23 in reply to:  22 ; Changed 14 months ago by arthuredelstein

Replying to teor:

Replying to arthuredelstein:

I did some more experiments:

...
Indeed I got 9/50 timeouts for the domain with http or https, but no timeouts for IPv4 and only a single timeout for IPv6.

Does this ring any bells for Tor core experts? What might be happening with DNS here?

Some exits may be overloading their resolvers. Or our code may be buggy. It would be helpful to identify the particular exits that are experiencing these timeouts, and work out if they are in the same AS or using the same resolvers.

Makes sense. If the DNS resolve fails at an exit, does the exit send an error message back to the client? Or does it silently fail, meaning the client has to wait for the full 10-second timeout?

I also wonder if we should ask bandwidth authorities to use DNS whenever possible, so they see DNS timeouts, and downgrade exits that have them. See #24010.

Nice idea. Would it also be feasible to have exits periodically run diagnostics to see if their DNS resolution is working properly, and if not, report the problem to bandwidth authorities and notify their relay operator?

The only node in a tor path that uses DNS is an exit, so if DNS breaks, it causes issues at the exit.

That seems sensible. I'm only a little puzzled that it seems more common than I would expect that I saw not a single timeout, but a double, triple or quadruple timeout (see instances of 2,3,4 in my raw data). Presumably it's switching to a new exit node after each individual timeout, so why do I frequently see multiple timeouts for a single connection? Maybe it's just bad luck, but it made me wonder if I'm seeing something that goes wrong for the whole connection attempt and not just individual circuits.

comment:24 in reply to:  23 ; Changed 14 months ago by teor

Milestone: Tor: 0.3.2.x-finalTor: 0.3.3.x-final

Replying to arthuredelstein:

Replying to teor:

Replying to arthuredelstein:

I did some more experiments:

...
Indeed I got 9/50 timeouts for the domain with http or https, but no timeouts for IPv4 and only a single timeout for IPv6.

Does this ring any bells for Tor core experts? What might be happening with DNS here?

Some exits may be overloading their resolvers. Or our code may be buggy. It would be helpful to identify the particular exits that are experiencing these timeouts, and work out if they are in the same AS or using the same resolvers.

Makes sense. If the DNS resolve fails at an exit, does the exit send an error message back to the client? Or does it silently fail, meaning the client has to wait for the full 10-second timeout?

It depends on how it fails.
If the resolve times out at the exit, it also times out at the client.
If the resolve fails fast, an error cell is sent to the client.
I don't think we can make this faster.

I also wonder if we should ask bandwidth authorities to use DNS whenever possible, so they see DNS timeouts, and downgrade exits that have them. See #24010.

Nice idea. Would it also be feasible to have exits periodically run diagnostics to see if their DNS resolution is working properly

Yes. Exits already check DNS at startup, and turn off exit traffic if it fails. I opened #24014 in 0.3.3 to make them check periodically.

and if not, report the problem to bandwidth authorities

There's no way for relays to report anything directly to the bandwidth authorities.
Instead, relays modify their descriptors in response to self-checks.
In this case, the relay would disable exit traffic until a DNS check succeeds, and clients would find out about it when they next download its (micro)descriptor after the next consensus.

and notify their relay operator?

Yes, this would be part of #24014: we will log a warning when we disable exit traffic.

The only node in a tor path that uses DNS is an exit, so if DNS breaks, it causes issues at the exit.

That seems sensible. I'm only a little puzzled that it seems more common than I would expect that I saw not a single timeout, but a double, triple or quadruple timeout (see instances of 2,3,4 in my raw data). Presumably it's switching to a new exit node after each individual timeout, so why do I frequently see multiple timeouts for a single connection? Maybe it's just bad luck, but it made me wonder if I'm seeing something that goes wrong for the whole connection attempt and not just individual circuits.

You could also have a slow guard, or a site that has slow DNS.

But the most likely explanation is that some exits are massively overloaded, and DNS bears the brunt of that overloading.
We could encourage relay operators to use a local DNS cache, but threads on this come up every month or two on tor-relays, so I'm not sure starting another would be useful.

Another task that's in progress is to shift exit bandwidth away from the US east coast and Western Europe, because there's an over-allocation in that area at the moment. (It is where most bandwidth authorities have their HTTPS servers.)

I would suggest that we find a way of monitoring this, so we can check if our fixes make a difference.
This might be a task for metrics, I'll leave it to you to open a ticket, because you know what needs to be done to test for timeouts.

There's nothing in this ticket that core tor can bugfix in 0.3.2, so I'm moving it to 0.3.3.

comment:25 in reply to:  24 Changed 14 months ago by arthuredelstein

Replying to teor:

It depends on how it fails.
If the resolve times out at the exit, it also times out at the client.
If the resolve fails fast, an error cell is sent to the client.
I don't think we can make this faster.

OK, good to know a fast fail is reported back through the circuit. I only wonder if it's possible there could be a bug in sending or receiving that error that makes it fail sometimes.

Yes. Exits already check DNS at startup, and turn off exit traffic if it fails. I opened #24014 in 0.3.3 to make them check periodically.

Excellent, thanks.

You could also have a slow guard, or a site that has slow DNS.

I believe I've seen the same failure rate on a couple of different guards but I should investigate that further.

I would suggest that we find a way of monitoring this, so we can check if our fixes make a difference.
This might be a task for metrics, I'll leave it to you to open a ticket, because you know what needs to be done to test for timeouts.

Good idea -- I opened #24018 for that.

comment:26 Changed 14 months ago by arthuredelstein

Cc: arthuredelstein added

comment:27 Changed 14 months ago by arthuredelstein

I wrote a script using atagar's very nice stem library to try to test DNS behavior across all exits:
https://github.com/arthuredelstein/tor_dns_survey/tree/04176dbf9af1ec4d8f4753f504dcc6652bd257c7

The script creates 2-hop circuits. Each exit was tested 5 times for example.com and 5 times for '93.184.216.34', the bare IPv4 for example.com.

Here's the raw-ish data:
https://arthuredelstein.net/tor/21394/exit_results_20171029.json
The format is

{ address : { exit : [ [ attempt1_result, attempt1_date_time, attempt1_duration ], [ attempt2... ] ... ] } }

with about 5 attempts made per address/exit combination.
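
For reference, here is a sketch of summarizing that JSON into the per-exit categories used below. The result value "timeout" is an assumption standing in for whatever the script actually records:

import json

# Summarize the survey data: for each address, count exits that always,
# sometimes, or never timed out. "timeout" as the result value is an
# assumption; substitute the script's actual encoding.
with open("exit_results_20171029.json") as f:
    data = json.load(f)

for address, exits in data.items():
    always = sometimes = never = 0
    for fingerprint, attempts in exits.items():
        n = sum(1 for result, when, duration in attempts if result == "timeout")
        if n == len(attempts):
            always += 1
        elif n > 0:
            sometimes += 1
        else:
            never += 1
    print(address, "always:", always, "sometimes:", sometimes, "never:", never)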

I did a quick analysis of the data and found the following.

  • For the raw IP address, the overall timeout frequency was 1.39%. 3 exits always timed out, 29 exits sometimes did, and 752 always succeeded (the rest were assorted fast failures).
  • For example.com, the overall timeout frequency was 9.03%: 41 exits always timed out, 74 sometimes did, and 653 always succeeded (again the remainder were other failures).

It's interesting that the observed timeout frequency here was only half that observed in comment:20. Perhaps that's because here I tested all exits equally, instead of using consensus weights to choose exits as Tor Browser does. But I would have expected more timeouts, not fewer, if I used the low-bandwidth exits more, so I'm surprised by that.

In any case, if the timeout probability were uniformly 9% for all exits, I would expect essentially no cases of an exit failing 5 out of 5 times. So it's clear that some exits have a much higher failure probability than others. Therefore we have "bad" and "flaky" exits.
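
The "essentially no cases" expectation is easy to verify (a sketch using the figures above; the ~800-exit total is approximate):

# If every exit had a uniform 9% timeout probability, how many exits
# would we expect to fail all 5 attempts? (~800 exits is approximate.)
p, attempts, exits = 0.09, 5, 800
expected = exits * p ** attempts
print(f"{expected:.3f} expected always-timeout exits")  # ~0.005, vs 41 observed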

Last edited 14 months ago by arthuredelstein (previous) (diff)

comment:28 Changed 14 months ago by arthuredelstein

Here's a table of the same data as comment:27. The second column, timeout_rate_21394, shows the fraction of connections for each exit that timed out (after 10 seconds). I also included onionoo relay metadata so we can look for any patterns:

https://docs.google.com/spreadsheets/d/16-SCjR7dTjEN_O3MOX4ZhlR47tvJYONmXICNJkGYnvw/edit?usp=sharing

One thing I noticed is that bad relays have higher than average consensus weights. Here are the average consensus weights for relays with different DNS behaviors:

  • Bad (always timed out): 38612
  • Flaky (sometimes timed out): 10440
  • Good (never timed out): 7042

That trend explains why I see higher timeout frequency with Tor Browser than with my DNS timeout survey script.
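
As a rough illustration of the size of that effect, one can combine the group counts from comment:27 with the average weights above, under the simplifying assumption that every exit in a group carries its group's average weight:

# How often does a weight-proportional client (like Tor Browser) land on
# a "bad" exit versus a uniform survey? Counts are the example.com groups
# from comment:27; weights are the group averages above. Assigning each
# exit its group's average weight is a simplification.
groups = {"bad": (41, 38612), "flaky": (74, 10440), "good": (653, 7042)}
total_exits = sum(n for n, w in groups.values())
total_weight = sum(n * w for n, w in groups.values())
for name, (n, w) in groups.items():
    print(f"{name}: uniform {n / total_exits:.1%}, "
          f"weighted {n * w / total_weight:.1%}")
# bad exits: ~5.3% of uniform picks but ~22.8% of weight-proportional picks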

comment:29 Changed 14 months ago by teor

Replying to arthuredelstein:

...
One thing I noticed is that bad relays have higher than average consensus weights. Here are the average consensus weights for relays with different DNS behaviors:

  • Bad (always timed out): 38612
  • Flaky (sometimes timed out): 10440
  • Good (never timed out): 7042

That trend explains why I see higher timeout frequency with Tor Browser than with my DNS timeout survey script.

This isn't surprising to me, but it's good to have some data.

I've noticed that guard relays I run in particular locations (east coast US, west coast EU) are overloaded, and others have told me that their exits in similar locations are also overloaded.

If you want to address this issue, we have to either:

  • penalise timeouts more on bandwidth authorities (we need a test environment to make changes like this), or
  • distribute bandwidth authority clients and servers outside EU/US (we're working on this)

We could also find and map overloaded relays, or relays with higher than average consensus weight to bandwidth ratios. That would help us find out if our changes are making a difference. I opened #24045 for this.

comment:30 Changed 14 months ago by Sebastian

I like the first suggestion much more than the second, as I still haven't seen a convincing model where our median voting for BW will actually produce a more balanced network.

comment:31 Changed 14 months ago by arthuredelstein

I ran another test, this time with a CircuitStreamTimeout of 100 (seconds):

Overall timeout counts:

  • Always timed out: 15
  • Sometimes timed out: 61
  • Never timed out: 706

The number of stream timeout events was substantially reduced relative to comment:27. It's not a perfect comparison, because this was done on a different day and the total test took substantially longer. But it does seem to suggest that DNS services for some of the exits are very slow, but not disabled. Next I can look at the distribution of DNS response times. Then I plan to contact some exit relay operators with poor DNS response and see if they can investigate why their DNS is slow to respond or broken.

Direct IP timeouts were again much lower:

  • Always timed out: 2
  • Sometimes timed out: 18
  • Never timed out: 765
Last edited 14 months ago by arthuredelstein (previous) (diff)

comment:32 Changed 14 months ago by arthuredelstein

I looked more closely at timeouts for example.com:

  • 100-second timeouts: 7.13%
  • 10 seconds -> 100 seconds successes: 5.21%
  • Total responses later than 10 seconds: 12.34%

That's actually worse than the 9.03% seen in comment:27. (BTW, it's important to note this is again not weighted by consensus weights but treats all exits equally.)

For bare IP address:

  • 100-second timeouts: 1.00%
  • 10 seconds -> 100 seconds successes: 2.45%
  • Total responses later than 10 seconds: 3.45%.

Again, this is worse than comment:27, but the discrepancy shows the importance of DNS to delayed responses.

Last edited 14 months ago by arthuredelstein (previous) (diff)

comment:33 Changed 14 months ago by Sebastian

#18580 could be very relevant here?

comment:34 Changed 13 months ago by Sebastian

So yes, I think that bug report is very relevant here. I think this is a libevent bug, or an unfortunate issue with how libevent and unbound interact. By default, libevent only allows 64 DNS requests to be in flight at the same time. When unbound is asked to resolve something and the DNS server in question nullroutes traffic from unbound, it takes longer than named to reply with a SERVFAIL (named's timeout matches what Tor uses). So if a bigger DNS server operator decides it has gotten too many queries from a relay, reaching that limit of 64 in-flight queries happens in absolutely no time, and all other DNS requests get queued.

There may be an additional issue where we're marking our local unbound as down because it didn't reply quickly enough to a query we made, because it's still trying to answer it. Combined, these would easily explain why DNS gets wedged completely.

We REALLY need a high-volume exit operator to help us debug these assumptions.
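
A back-of-the-envelope sketch of how quickly that 64-slot limit wedges. The ~10-second slot occupancy is an assumption based on tor's pre-patch single-resolver timeout (visible in the diff in comment:45); retries would make it worse:

# Little's-law estimate: how many blackholed queries per second keep all
# of libevent's default 64 in-flight slots busy? Assumes each dead query
# holds a slot for ~10 s (tor's old single-resolver timeout); retries
# would lengthen this and wedge the queue even faster.
slots = 64
occupancy_s = 10
print(f"~{slots / occupancy_s:.1f} blackholed queries/s fill all {slots} slots")
# A busy exit easily exceeds ~6 dead queries/s once a resolver starts
# dropping its traffic, so all other DNS requests queue behind them.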

comment:35 Changed 13 months ago by tom

Cc: tom added

comment:36 Changed 13 months ago by arthuredelstein

gamambel and pastly are helping us with experiments. Following Sebastian's suggestion (derived from Dhalgren's work in #18580), they added

options timeout:5 attempts:2 max-inflight:8192 max-timeouts:1000000

to their resolv.conf, and restarted their exits.

gamambel: HaveHeart, CriticalMass, sofia
pastly: StanMarsh, KyleBroflovski

I ran my test script again on their exits plus a few more, and here are the results:
https://arthuredelstein.net/tor/21394/exit_results_20171101_2311.json
https://docs.google.com/spreadsheets/d/1yRwTo0OhOCOb48CYFU1eGceX3Iydonzu5uGeia1c6aE/edit?usp=sharing

Unfortunately gamambel's relays seem to have gone down entirely so they're missing from these results -- not sure if the reason is related to the new settings. I will run the test again when they're back up.

I think it's at least promising that DNS is working on pastly's exits. It will be especially convincing that we have found the cause if we revert the resolv.conf settings and DNS stops working again.

Last edited 13 months ago by arthuredelstein (previous) (diff)

comment:37 Changed 13 months ago by arthuredelstein

I ran another round of tests. (I set the timeout back to 10 seconds, because that's a more realistic comparison of what will be seen in Tor Browser.) Results:
https://arthuredelstein.net/tor/21394/exit_results_20171102_0602.json
https://docs.google.com/spreadsheets/d/1TjueUpGvsdXPf22tSZAAQ862e_MPlse8NCGIVD-MAYo/edit?usp=sharing

gamambel's exit relays are back up! And interestingly, two of them (HaveHeart and sofia) are showing bad DNS, and one of them (CriticalMass) is showing good DNS. Both of pastly's exits are also showing good DNS, as they did in the previous comment.

Last edited 13 months ago by arthuredelstein (previous) (diff)

comment:38 Changed 13 months ago by arthuredelstein

11:06 <+gamambel> arthuredelstein: after the reboot yesterday, resolvconf reset the resolv.conf, so it did not contain the patched lines
11:06 <+gamambel> i guess i should add them somewhere in the resolvconf configuration instead of editing that file manually
11:06 <+gamambel> all the 3 exits run on the same machine using the same resolv.conf
11:10 <+gamambel> i added the option back now, so you can run the tests again

OK, I have just run the tests again. Results:
https://arthuredelstein.net/tor/21394/exit_results_20171102_1548.json
https://docs.google.com/spreadsheets/d/1rxXJnXCFP4CgUtNAC7L8x4wyDZMOyLq3zz5Aqhui6eE/edit?usp=sharing

All 5 exits (HaveHeart, CriticalMass, sofia, StanMarsh, KyleBroflovski) had no timeouts! That's quite promising. I'm going to keep running the test today to see if that good behavior persists. And maybe after that we can revert resolv.conf and see if the timeouts reappear.

comment:39 Changed 13 months ago by arthuredelstein

I ran three more rounds of tests today, with hours in between each. Results:
https://arthuredelstein.net/tor/21394/exit_results_20171102_2209.json
https://docs.google.com/spreadsheets/d/1p9HLXl-pMPuXOWsr5ZfpiIAAvs1QwljcksnuI8PDtEQ/edit?usp=sharing

In short, our 5 exits showed absolutely no timeouts. In the three rounds in this comment and one round in the previous comment, we had 5 tests per exit connecting to example.com. So that's a total of 100 tests with no timeouts.

Next, I would like to try reverting resolv.conf for these exits and see if DNS starts causing timeouts again.

Last edited 13 months ago by arthuredelstein (previous) (diff)

comment:40 Changed 13 months ago by arthuredelstein

gamambel has reverted resolv.conf to the old settings, and pastly has kept the new settings for now. I ran another test:

https://arthuredelstein.net/tor/21394/exit_results_20171102_2348.json
https://docs.google.com/spreadsheets/d/1CdEObqAKqjobNf0pO-_rr8SLxBlMVW-Rdm7VwOHK0zA/edit?usp=sharing

We see that two of gamambel's exits (HaveHeart and sofia) show significant timeouts, while CriticalMass (as we saw before in comment:37) does not. pastly's exits (new settings) again show no timeouts.

Last edited 13 months ago by arthuredelstein (previous) (diff)

comment:41 Changed 13 months ago by arthuredelstein

I ran another test right after gamambel did a full reboot of his machine, keeping the old settings:
https://arthuredelstein.net/tor/21394/exit_results_20171103_0014.json
https://docs.google.com/spreadsheets/d/1AY0jfbBXaKwDNQ7odiMNctAke9A-mLKAhVTaUgg-eDk/edit?usp=sharing

exit timeouts/connections:
HaveHeart 2/4
sofia 1/5
CriticalMass 0/5

So the timeouts start very soon after startup.

comment:42 Changed 13 months ago by Dhalgren

Allow me to assert my original tuning recommendation is well considered and advisable:

resolv.conf -- final recommendation

options timeout:5 attempts:1 max-inflight:16384 max-timeouts:1000000
nameserver 127.0.0.1

https://unbound.net/pipermail/unbound-users/2016-April/004301.html

Where timeout:5 is the usual value appropriate for a Tor daemon (Tor
clients shift to another relay and retry on DNS failures). Attempts:1
assumes that the resolver is a local Unbound instance where Unbound
will handle all timeout retry processing and no UDP loss is possible
between the 'tor' process and the local Unbound, so it's best to give
up directly after five seconds. Max-inflight:4096 {revised to 16384}
both mitigates the DOS scenario experienced and maximizes DNS
performance of the exit relay. Max-timeouts:100 should prevent
eventdns from marking the dedicated local resolver as "down" unless
it really is down. Perhaps max-timeouts:1000000 is better in order
to completely inhibit the timed-out "down resolver" logic.

but per

https://trac.torproject.org/projects/tor/ticket/18580#comment:14

where it turns out that max-timeouts is capped at 255 by eventdns.c.
Will create a patch to remove the 255 limit on next Tor daemon update. . .

Would include the patch but I appear to have overlooked writing it.

comment:43 Changed 13 months ago by Dhalgren

Cc: dhalgren.tor@… added

comment:45 Changed 13 months ago by arthuredelstein

Sebastian created an experimental tor patch (note: not the final patch for this ticket) to apply settings based on Dhalgren's:

index 078bde3ef..08ed219d2 100644
--- a/src/or/dns.c
+++ b/src/or/dns.c
@@ -1439,13 +1439,19 @@ configure_nameservers(int force)
 #define SET(k,v)  evdns_base_set_option(the_evdns_base, (k), (v))
 
   if (evdns_base_count_nameservers(the_evdns_base) == 1) {
-    SET("max-timeouts:", "16");
-    SET("timeout:", "10");
+    SET("max-timeouts:", "255");
+    SET("timeout:", "5");
   } else {
     SET("max-timeouts:", "3");
     SET("timeout:", "5");
   }
 
+  // Elongate the queue of maximum inflight dns requests, so if a bunch
+  // time out at the resolver (happens commonly with unbound) we won't
+  // stall every other DNS request
+  SET("max-inflight:", "8192");
+  SET("attempts:", "1");
+
   if (options->ServerDNSRandomizeCase)
     SET("randomize-case:", "1");
   else

gamambel applied this patch and ran tor. I ran my test and found no timeouts:
https://arthuredelstein.net/tor/21394/exit_results_20171103_0137.json
https://docs.google.com/spreadsheets/d/1ZORq3WlV8-VaTiqoQ4bn3dEN3wkb0z6HkOcrkrG8z7A/edit?usp=sharing

Last edited 13 months ago by arthuredelstein (previous) (diff)

comment:46 Changed 13 months ago by Sebastian

Status: newneeds_review

Branch bug21394 in my repo

comment:47 Changed 13 months ago by Sebastian

Note that the patch *is not* the diff above.

comment:48 Changed 13 months ago by arma

Milestone: Tor: 0.3.3.x-finalTor: 0.3.2.x-final

I like the branch, am happy it got some testing from moritz, and look forward to seeing it in our upcoming alpha.

comment:49 Changed 13 months ago by arma

Status: needs_reviewmerge_ready

switching to merge-ready on the theory that nickm is going to glance over it no matter which state the ticket is in :)

Last edited 13 months ago by arma (previous) (diff)

comment:51 Changed 13 months ago by arthuredelstein

comment:52 Changed 13 months ago by arma

Sounds great.

I think the next steps are:

  • Get it into the upcoming alpha
  • Tell tor-relays that they can either switch to the new alpha, or apply the workaround to their resolv.conf manually, or sit tight to await the results of the broader deployment.

comment:53 in reply to:  42 ; Changed 13 months ago by Dhalgren

Replying to Dhalgren:

Allow me to assert my original tuning recommendation is well considered and advisable:

resolv.conf -- final recommendation

options timeout:5 attempts:1 max-inflight:16384 max-timeouts:1000000
nameserver 127.0.0.1

At the time I came up with the above, my assumption was that DNS timeouts are propagated back to clients, but I see per comment:24

Makes sense. If the DNS resolve fails at an exit, does the exit send an error message back to the client? Or does it silently fail, meaning the client has to wait for the full 10-second timeout?

It depends on how it fails.
If the resolve times out at the exit, it also times out at the client.
If the resolve fails fast, an error cell is sent to the client.

that in this case nothing is sent and the client times out at 10 seconds independently; therefore for Unbound-specific configurations it should be

options timeout:10 attempts:1 max-inflight:16384 max-timeouts:1000000
nameserver 127.0.0.1

but concede that in the general case where the resolver is either named or Unbound, that

options timeout:5 attempts:2 . . .

is a better choice

Last edited 13 months ago by Dhalgren (previous) (diff)

comment:54 in reply to:  53 ; Changed 13 months ago by arma

Replying to Dhalgren:

that in this case nothing is sent and the client times-out at 10 seconds independently

For added fun, the client starts its 10 second timer when it sends the begin cell, so if it takes x seconds for the begin cell to make it to the exit, and y seconds for a response (failure *or* success) to make it back to the client, then the exit needs to do its thing in 10-x-y seconds or it won't happen in time. Typical values of x and y are 0.5 seconds. So, approximating things as 10 seconds is not crazy, but don't plan to be doing anything new at the 8 or 9 second mark and expect it to have as much impact.

comment:55 in reply to:  54 ; Changed 13 months ago by arma

Replying to arma:

the client starts its 10 second timer when it sends the begin cell

I should also point out that the client's *first* stream attempt is 10 seconds, but subsequent attempts (once it's given up on the first one and tried a new circuit) are 15 seconds each.

comment:56 in reply to:  55 Changed 13 months ago by Dhalgren

Replying to arma:

Replying to arma:

the client starts its 10 second timer when it sends the begin cell

I should also point out that the client's *first* stream attempt is 10 seconds, but subsequent attempts (once it's given up on the first one and tried a new circuit) are 15 seconds each.

Interesting, thank you for the detail. The number of successful resolves arriving between 5 and 15 seconds is on the order of 0.45%, and the difference between discarding them and forwarding them is not severe enough to cause me regret over configuring this for 18 months. Here is a recent set of Unbound statistics to illustrate. The data seems unlikely to be of actionable use to adversaries, and hopefully it's not improper to share publicly:

unbound: [:0] info: server stats for thread 0: 1748646 queries, 368913 answers from cache, 1379733 recursions, 0 prefetch
unbound: [:0] info: server stats for thread 0: requestlist max 331 avg 134.014 exceeded 0 jostled 0
unbound: [:0] info: average recursion processing time 6.735093 sec
unbound: [:0] info: histogram of recursion processing times
unbound: [:0] info: [25%]=0.0130326 median[50%]=0.0450708 [75%]=0.127054
unbound: [:0] info: lower(secs) upper(secs) recursions
unbound: [:0] info:    0.000000    0.000001 39515
unbound: [:0] info:    0.000032    0.000064 1
unbound: [:0] info:    0.000128    0.000256 1
unbound: [:0] info:    0.000256    0.000512 23408
unbound: [:0] info:    0.000512    0.001024 64670
unbound: [:0] info:    0.001024    0.002048 33392
unbound: [:0] info:    0.002048    0.004096 19250
unbound: [:0] info:    0.004096    0.008192 47427
unbound: [:0] info:    0.008192    0.016384 198450
unbound: [:0] info:    0.016384    0.032768 159053
unbound: [:0] info:    0.032768    0.065536 278833
unbound: [:0] info:    0.065536    0.131072 181936
unbound: [:0] info:    0.131072    0.262144 207508
unbound: [:0] info:    0.262144    0.524288 45953
unbound: [:0] info:    0.524288    1.000000 14731
unbound: [:0] info:    1.000000    2.000000 7530
unbound: [:0] info:    2.000000    4.000000 8677
unbound: [:0] info:    4.000000    8.000000 2740
unbound: [:0] info:    8.000000   16.000000 3461
unbound: [:0] info:   16.000000   32.000000 11938
unbound: [:0] info:   32.000000   64.000000 13751
unbound: [:0] info:   64.000000  128.000000 4671
unbound: [:0] info:  128.000000  256.000000 3268
unbound: [:0] info:  256.000000  512.000000 6127
unbound: [:0] info:  512.000000 1024.000000 2086
unbound: [:0] info: 1024.000000 2048.000000 626
unbound: [:0] info: 2048.000000 4096.000000 480
unbound: [:0] info: 4096.000000 8192.000000 200
unbound: [:0] info: 8192.000000 16384.000000 20
unbound: [:0] info: 16384.000000 32768.000000 6
unbound: [:0] info: 32768.000000 65536.000000 1

Unbound displays impressive patience w/r/t completing queries. I arrived at 0.46% by taking the 4-16 second resolves as a percentage of all successful resolves completing in under 16 seconds; it is 0.36% relative to queries resolved in under 16 seconds plus answers from cache. 2.47% of all queries take over 16 seconds to resolve and are lost.
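
Those figures can be reproduced from the histogram (a quick sketch; the 4-8 s and 8-16 s buckets approximate the 5-15 second window of interest):

# Reproduce the percentages above from the posted unbound statistics.
queries = 1748646      # total queries (= recursions + answers from cache)
recursions = 1379733
cache = 368913
slow = 2740 + 3461     # successful resolves in the 4-8 s and 8-16 s buckets
over16 = (11938 + 13751 + 4671 + 3268 + 6127 + 2086
          + 626 + 480 + 200 + 20 + 6 + 1)
under16 = recursions - over16   # recursions completing within 16 s
print(f"{slow / under16:.2%} of resolves under 16 s")         # ~0.46%
print(f"{slow / (under16 + cache):.2%} incl. cache answers")  # ~0.36%
print(f"{over16 / queries:.2%} of all queries are lost")      # ~2.47%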

Last edited 13 months ago by Dhalgren (previous) (diff)

comment:57 Changed 13 months ago by Dhalgren

Just noticed the handy Unbound statistic "requestlist max X avg Y".

Going back I see typically the maximum pending DNS query list runs around 600 with an average length of around 300. Max has been steady but average tripled over 12 months to the present value. Occasional extreme case is 3000 for a day, sometimes two.

comment:58 Changed 13 months ago by Dhalgren

Reviewed the prospective patch and see issues:

1) it appears these settings override ones present in resolv.conf; if options are set in resolv.conf, those should take precedence

2) recommend incorporating recent commentary from above: when one resolver is present and there is no explicit setting in resolv.conf, set attempts:3 in light of comment:55, or at a minimum attempts:2, due to the reality that named might be present and attempts:1 will eliminate retries in that scenario

comment:59 in reply to:  58 Changed 13 months ago by Dhalgren

Replying to Dhalgren:

Or simply eliminate overrides for attempts: and timeout: as the defaults are acceptable in both the named and Unbound cases. Key elements here are max-inflight: and max-timeouts:.

comment:60 Changed 13 months ago by nickm

I'm fine with merging Sebastian's patch, though I see there is more discussion following.

Dhalgren, how would it be if we took Sebastian's patch in 0.3.2 now, and then discussed further fixes as changes on top of that?

As an alternative, I'd also look at another patch, but I'm hoping to get this fix tested in another 0.3.2 alpha to be released early this week.

comment:61 Changed 13 months ago by Sebastian

I'm not sure that retrying is a good idea in general. Clients can just try a different exit node. This should be a rare occurrence anyway. As for overwriting resolv.conf values - yes, that happens, but libevent doesn't expose an API to do it a different way afaict. We do it already for some stuff. Maybe an acceptable alternative is to allow overwriting these values via torrc parameters?

comment:62 in reply to:  60 Changed 13 months ago by Dhalgren

Replying to nickm:

Dhalgren, how would it be if we took Sebastian's patch in 0.3.2 now, and then discussed further fixes as changes on top of that?

Yes, that's fine.

Replying to Sebastian:

Maybe an acceptable alternative is to allow overwriting these values via torrc parameters?

Sure.

comment:63 Changed 13 months ago by nickm

Keywords: 029-backport 030-backport 031-backport added

Backported Sebastian's branch to 0.2.9 as bug21394_029. Now merging it to 0.3.2 and forward. If it works out, let's backport some more.

comment:64 Changed 13 months ago by nickm

Milestone: Tor: 0.3.2.x-finalTor: 0.3.1.x-final

(Marking for 0.3.1 backport. Could we please have another ticket for discussion of further improvements here?)

comment:65 in reply to:  61 ; Changed 13 months ago by Dhalgren

Keywords: 029-backport 030-backport 031-backport removed
Milestone: Tor: 0.3.1.x-finalTor: 0.3.2.x-final

Replying to Sebastian:

. . . Clients can just try a different exit node

retry here is in regard to DNS query retries within the 10 second tor client timeout window; idea is to have named retry DNS at five seconds -- Unbound performs these automatically but named relies on the requester doing so

comment:66 in reply to:  64 Changed 13 months ago by Dhalgren

Replying to nickm:

Could we please have another ticket for discussion of further improvements here?)

I'm for continuing on #18580 since the background research resides there

comment:67 in reply to:  65 Changed 13 months ago by teor

Keywords: 029-backport 030-backport 031-backport added
Milestone: Tor: 0.3.2.x-finalTor: 0.3.1.x-final

Replying to Dhalgren:

Replying to Sebastian:

. . . Clients can just try a different exit node

retry here is in regard to DNS query retries within the 10 second tor client timeout window; idea is to have named retry DNS at five seconds -- Unbound performs these automatically but named relies on the requester doing so

Restoring keywords and release that were reset in this comment.

comment:68 Changed 13 months ago by arthuredelstein

Thanks to Dhalgren, Sebastian, arma, gamambel and pastly for your work on this!

comment:69 Changed 13 months ago by arthuredelstein

I have a script running every 24 hours to check exits for DNS timeouts. I'll be posting the results at https://arthuredelstein.net/exits

comment:70 Changed 13 months ago by tom

https://ritter.vg/misc/stuff/tor-bwauth-dns-comparison-01.png

I ran two bwauths that had no differences between them, and calculated the average difference between two sets of relays for each consensus: the set of all relays, and the set of relays where the bw measurement was above 100 in both bwauths.

That data (the blue and green lines) is the control.

Then I changed one bwauth to connect to the final destination using an IP address and left the other one connecting using a DNS name. That data is the orange and red lines.

I don't see much of a difference. So it seems that bwauth measurements were not affected by this issue.

comment:71 Changed 13 months ago by nickm

Backported this to 0.2.9 and forward.

comment:72 Changed 13 months ago by nickm

I'd like to close this as fixed; can we un-parent the children?

comment:73 Changed 13 months ago by teor

Resolution: fixed
Status: merge_readyclosed

comment:74 Changed 13 months ago by Dhalgren

Resolution: fixed
Status: closedreopened

I object.

The solution is not complete per comment:58. I am willing to work on it if the devs are willing to bring any issues they have to me rather than hacking on my patch, as past experience tends to indicate.

comment:75 in reply to:  74 Changed 13 months ago by teor

Replying to Dhalgren:

I object.

Solution is not complete per comment:58.

Hi Dhalgren,

We try to make backported fixes small and obviously correct.
Is there a critical issue that stops this patch working on most machines?
If so, please continue on this ticket.

If not, please open another ticket targeting 0.3.3 for further improvements.

I am willing to work on it if the devs are willing to bring any issues the have to me rather than hacking on my patch as past experience tends to indicate.

I'm sorry, I'm not sure how we failed here, but it's clear that we made some mistakes.

Here's how our process usually works:
We discuss patches on the bug tracker and #tor-dev.
For backported emergency fixes, whoever is available does the work, and we try to get it done as soon as possible.
Sometimes this means that things get forgotten or people get left out. I'm sorry about that.

If you want to tune the parameters, please open another ticket, and we will have more time to consider it.

comment:76 Changed 13 months ago by Dhalgren

A correct patch will be only slightly larger than the existing patch. Adding tuneable parameters is trivial, requires one line for each in or.h and in config.c, and one expression to obtain the value.

The current patch discards successful DNS replies that arrive between 5 and 15 seconds, and while the percentage of queries affected is small, the behavior is incorrect. If an operator configures a single named resolver, no retries whatsoever are attempted. 0.2.9 will be around for two more years, and a minor incremental effort towards correctness seems in order.

My experience with open source is core developers generally feel compelled to change everything contributed to "make it theirs," without involving the contributor; has left me nonplussed.

comment:77 in reply to:  76 Changed 13 months ago by teor

Replying to Dhalgren:

A correct patch will be only slightly larger than the existing patch. Adding tuneable parameters is trivial, requires one line for each in or.h and in config.c, and one expression to obtain the value.

The bugfix patch in this ticket has been backported and merged, so any new patch has to be built on top of it.

We try to avoid tuneable parameters, because we have to test each option that we put in a torrc. Also, tuneable parameters are a new feature, so they won't get backported.

The current patch discards successful DNS replies that arrive between 5 and 15 seconds and while the percentage of queries affected is small, the behavior is incorrect. If an operator configures a single named resolver, no retries whatsoever are attempted. 0.2.9 will be around for two more years and a minor incremental effort towards correctness seems in order.

Ok, thank you for describing the issue.

Please submit any bugfix patches based on maint-0.2.9, and we'll test it in master, then consider backporting them.

Please submit any feature patches that add new options based on master.

And please use separate tickets for feature and bugfix patches. Otherwise, people are more likely to get confused, and they won't see all of your work.

My experience with open source is core developers generally feel compelled to change everything contributed to "make it theirs," without involving the contributor; has left me nonplussed.

I'm sorry that this happened.
I know I do it myself sometimes, but I try not to.

I don't know exactly what happened in this case, and it's hard to tell by reading the ticket history. Please also understand that sometimes we change things so they are more maintainable.

comment:78 Changed 13 months ago by Dhalgren

I will submit a minimal revision this coming weekend to correct the second-paragraph issue, and open an enhancement ticket for tuneable support then as well. The change is simply to have the one-resolver case set three attempts with a five-second timeout, or perhaps five attempts with a three-second timeout, and I don't feel it has to be completed by me if it is expedient for someone else to do so. Requires changing two characters, literally.

Changed 12 months ago by Dhalgren

Attachment: bug21394_touch_up.patch added

recommended touch-up patch

comment:79 Changed 12 months ago by Dhalgren

I uploaded my suggested revision. Targets maint-0.2.9 as requested.

comment:80 Changed 12 months ago by arthuredelstein

Status: reopenedneeds_review

Setting to needs review for Dhalgren's touch-up patch in the previous comment.

comment:81 Changed 12 months ago by Sebastian

I am not sure that retrying in the case of named is actually beneficial, which is why I didn't include it in the patch. Even if we succeed on the later retry it's unlikely that the client will actually benefit from that, because it will also give up on the request.

I had no intention to slight you in any way. I felt that I pushed very hard to actually get your work here recognized, and a proper fix merged, after your contributions had unfortunately been ignored for a very long time. I am just a volunteer myself, and am spending this time because I care about having a good outcome for the network.

I feel that it is unfair to suggest that anything I did on this bug I did to make this "mine" or to receive undue credit. I made sure the release notes mention your work and have tried to make everyone aware of who did the analysis. We happen to disagree on a technical point, but that should not be conflated with social issues.

comment:82 in reply to:  81 Changed 12 months ago by Dhalgren

Replying to Sebastian:

I am not sure that retrying in the case of named is actually beneficial, which is why I didn't include it in the patch. Even if we succeed on the later retry it's unlikely that the client will actually benefit from that, because it will also give up on the request.

I'm not following the line of thought. If evdns.c (in libevent) re-attempts the request at five seconds and succeeds, the client knows nothing about it. All the client knows is that it required six-seven-whatever seconds for the circuit connection to complete. In the retried-circuit scenario, it could take twelve, thirteen, on the outside fourteen seconds for success.

Admittedly this will rarely matter (my Unbound histograms say about 0.5% of requests), but if attempts=1 timeout=5 is set, then the exit will literally throw away responses arriving after five seconds and the client will twiddle its thumbs for the remaining five or ten seconds of its timeout interval with no chance of success. My understanding is that evdns all-retries-exhausted events are _not_ relayed back to the client.

I had no intention to slight you in any way, I felt that I pushed very hard to actually get your work here recognized and a proper fix merged after your contributions had unfortunately been ignored for a very long time. I am just a volunteer myself and spending this time because I care about having a good outcome for the network.

Very sorry for any misunderstanding! I was not referring to you or anyone on this ticket above. I was expressing my trepidation at spending time contributing in general because several times in the past I submitted carefully thought-out code changes (to Tor and other projects) that were summarily hacked on for no obvious reason, and without involving me. Took much of the enjoyment out of it.

In regard to this issue I am rather liking the experience. I spent about six weeks understanding the problem, working on the fix, following up the results and documenting what I found on the wiki, and it's a pleasure to have the work recognized and for it to effect a critical improvement to the network--though my original motive was simply to tame a single berserk relay.

Last edited 12 months ago by Dhalgren (previous) (diff)

comment:83 Changed 10 months ago by nickm

Keywords: 030-backport removed

Remove 030-backport from all open tickets that have it: 0.3.0 is now deprecated.

comment:84 in reply to:  83 Changed 10 months ago by cypherpunks

Keywords: 030-backport added
Resolution: fixed
Status: needs_reviewclosed

Replying to nickm:

Remove 030-backport from all open tickets that have it: 0.3.0 is now deprecated.

Except those that should not have been reopened.

comment:85 Changed 10 months ago by teor

Keywords: 030-backport removed
Milestone: Tor: 0.3.1.x-finalTor: 0.3.3.x-final
Resolution: fixed
Status: closedreopened

There are still some changes here in this patch file:
https://trac.torproject.org/projects/tor/attachment/ticket/21394/bug21394_touch_up.patch

They still need review, but they won't be backported to 0.3.0.

Sorry, they didn't get reviewed, because they were assigned to the wrong release.

comment:86 Changed 9 months ago by nickm

Status: reopenedneeds_review

comment:87 Changed 9 months ago by nickm

Keywords: 033-triage-20180320 added

Marking all tickets reached by current round of 033 triage.

comment:88 Changed 9 months ago by nickm

Keywords: 033-included-20180320 added

Mark needs_review tickets as included for 033.

We can revisit this if they turn out to need much more review and/or revision than expected.

comment:89 Changed 9 months ago by dgoulet

Reviewer: nickm

Assign Reviewer to nickm for the 26/03/18 week.

comment:90 Changed 9 months ago by mikeperry

Reviewer: nickmmikeperry

comment:91 Changed 9 months ago by mikeperry

Status: needs_reviewneeds_information

Ok this is a long ticket (with great analysis by all!). I want to clarify a few things here to make sure I understand the situation, since this was closed and reopened and there was already some confusion about target versions.

Let me know if I get any of this wrong:

  1. https://trac.torproject.org/projects/tor/attachment/ticket/21394/bug21394_touch_up.patch is the only thing under consideration for merge, and we're targeting it on 0.3.3. (sha256: f81fbbbdae31dcf643ab87feb70386d2209d4c73bc38479d7a4c798a0d51115b since attachments can get changed).
  2. That patch sets the "attempts" parameter to 3, since a timeout after 5 seconds can be retried once more when the connection timeout is 10 seconds, and twice more when the connection timeout is 15 seconds (on retries), totaling 3 attempts in the 15-second case (see the sketch below).
  3. We're also raising max-timeouts in the multiple resolver case to be more patient with resolvers there, too.

All of this seems sane to me.
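
A sketch of the retry arithmetic in point 2 (the 10-second and 15-second client windows are from comment:54 and comment:55):

# With timeout:5 and attempts:3, evdns (re)sends a query at 0 s, 5 s and
# 10 s. Two attempts fit the client's 10 s first-try window; all three
# fit the 15 s window used on client retries.
timeout_s, attempts = 5, 3
send_times = [i * timeout_s for i in range(attempts)]
print("query (re)sent at", send_times, "seconds")  # [0, 5, 10]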

I have one question, since Dhalgren was concerned about credit. Dhalgren -- do you have a preferred name and email address for git for the patch (or can you attach a git format-patch with your settings?)

comment:92 Changed 9 months ago by Dhalgren

1) yes

2) yes

3) yes, particularly because with multiple resolvers all requests pending on one marked down are requeued to the other resolvers--possibly hundreds of requests; in the single-resolver case no action is taken and the down-state is advisory: it clears on arrival of a successful answer and eventdns carries on

please use Dhalgren <dhalgren.tor@…> [dhalgren dot tor at gmail dot com]

thank you!

comment:93 Changed 8 months ago by mikeperry

Status: needs_informationmerge_ready

Ok, the patch has been applied in mikeperry/bug21394_033. Branch is off of maint-0.3.3. Author string has been set as requested.

comment:94 Changed 8 months ago by nickm

Milestone: Tor: 0.3.3.x-finalTor: 0.3.2.x-final

Thanks, everybody!

I've rebased mike's branch onto maint-0.2.9, in case we decide this is a good idea to backport. The new branch is called bug21394_029_redux.

I've merged that branch into 0.3.3 and forward.

comment:95 Changed 7 months ago by teor

Keywords: 032-backport added

comment:96 Changed 5 months ago by teor

Keywords: 031-unreached-backport added; 031-backport removed

0.3.1 is end of life, there are no more backports.
Tagging with 031-unreached-backport instead.

comment:97 Changed 4 weeks ago by teor

Keywords: 032-unreached-backport added; 032-backport removed

0.3.2 is end of life, so 032-backport is now 032-unreached-backport.

comment:98 Changed 4 weeks ago by teor

Milestone: Tor: 0.3.2.x-finalTor: 0.2.9.x-final

These tickets can't be backported to 0.3.2, because it is end of life.
But they can still be backported to 0.2.9.
