In #22341 (moved) we are thinking about picking up LZMA- and Zstandard-support for consensus diffs.
For LZMA-compressed diffs I often encounter boostrap delays of at least 20s like this:
Dec 26 21:01:18.000 [debug] HTTP body from server 'XX.XX.XX.XX:9001' was labeled as LZMA compressed, and it seems to be LZMA compressed.Dec 26 21:01:38.000 [info] handle_response_fetch_consensus(): Applied consensus diff (size 482897) from server 'XX.XX.XX.XX:9001', resulting in a new consensus document (size 1903167).
But, worse, it might even break Tor Browser bootstrap entirely when blocking a Tor Launcher GETCONF call (throwing an exception as Tor Launcher thinks tor is not working) like so:
Dec 30 23:54:54.000 [debug] HTTP body from server 'XX.XX.XX.XX:9001' was labeled as LZMA compressed, and it seems to be LZMA compressed.Illegal AddressMatcher: [xpconnect wrapped nsIPrefBranch] -- TypeError: s.split is not a functionIllegal AddressMatcher: [xpconnect wrapped nsIPrefBranch] -- TypeError: s.split is not a function[12-30 22:54:55] TorLauncher DBUG: readTorSettings ----------------------------------------------[12-30 22:54:55] TorLauncher DBUG: Sending Tor command: GETCONF UseBridges[12-30 22:55:10] TorLauncher NOTE: Exception on control port [Exception... "Component returned failure code: 0x804b000e (NS_ERROR_NET_TIMEOUT) [nsIBinaryInputStream.readBytes]" nsresult: "0x804b000e (NS_ERROR_NET_TIMEOUT)" location: "JS frame :: jar:file:///home/thomas/Arbeit/TBB/tor-browser_en-US/Browser/TorBrowser/Data/Browser/profile.default/extensions/tor-launcher@torproject.org.xpi!/components/tl-protocol.js :: TorProtocolService.prototype._torReadLine :: line 920" data: no]
To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information
Child items
0
Show closed items
No child items are currently assigned. Use child items to break down this issue into smaller parts.
Linked items
0
Link issues together to show that they're related.
Learn more.
I've observed such load on my relays. Depending on the available CPU, this can take quite a bit of time, I've seen more than a minute on a very loaded machine.
7.0.11 sometimes gives
{{{
[01-09 05:25:15] Torbutton NOTE: Exception on control port [Exception... "Component returned failure code: 0x804b000e (NS_ERROR_NET_TIMEOUT) [nsIBinaryInputStream.readBytes]" nsresult: "0x804b000e (NS_ERROR_NET_TIMEOUT)" location: "JS frame :: chrome://torbutton/content/torbutton.js :: torbutton_socket_readline :: line 824" data: no]
}}}
on a slow Windows machine (didn't you forget to set the compression thread's priority below normal?).
That's a different issue. There is no released Tor Browser that supports LZMA compression at the moment.
7.0.11 sometimes gives
{{{
[01-09 05:25:15] Torbutton NOTE: Exception on control port [Exception... "Component returned failure code: 0x804b000e (NS_ERROR_NET_TIMEOUT) [nsIBinaryInputStream.readBytes]" nsresult: "0x804b000e (NS_ERROR_NET_TIMEOUT)" location: "JS frame :: chrome://torbutton/content/torbutton.js :: torbutton_socket_readline :: line 824" data: no]
}}}
on a slow Windows machine (didn't you forget to set the compression thread's priority below normal?).
That's a different issue. There is no released Tor Browser that supports LZMA compression at the moment.
Not so different as you think. Without LZMA it stalls for ~15 sec there and breaks Torbutton when more. And it's not funny that Tor periodically causes glitches in all other programs (check with 1 CPU VM). BTW, shouldn't Tor respond to Tor Launcher even when it's parsing diffs in the background? (Is timeout set to 15 sec as your logs show?)
I've observed such load on my relays. Depending on the available CPU, this can take quite a bit of time, I've seen more than a minute on a very loaded machine.
Wuh!
Was this in generating the diff, or in applying it?
The control port read timeout that is used by Tor Launcher is 15 seconds (used when reading Tor configuration via the control port, to issue commands during startup such as TAKEOWNERSHIP, and so on).
The control port read timeout that is used by Torbutton is 2 seconds (used when sending control port commands for the local SOCKS port check during browser startup and for new identity).
Should these timeouts be increased? What value should we use?
As someone pointed out, in an ideal world tor would be responsive to control port commands at all times.
Turns out this is affecting Zstandard-compressed diffs as well:
Jan 10 07:55:33.000 [debug] HTTP body from server 'XX.XX.XX.XX:9001' was labeled as Zstandard compressed, and it seems to be Zstandard compressed.Jan 10 07:55:53.000 [info] handle_response_fetch_consensus(): Applied consensus diff (size 522839) from server 'XX.XX.XX.XX:9001', resulting in a new consensus document (size 1866898).
This time it hit the 2 second Torbutton timeout breaking the local SOCKS port check.
Trac: Summary: LZMA-compressed consensus diffs stall Tor Browser launch for at least 20s or break it entirely to LZMA- and Zstandard compressed consensus diffs stall Tor Browser launch for at least 20s or break it entirely Description: In #22341 (moved) we are thinking about picking up LZMA- and zstd-support for consensus diffs.
For LZMA-compressed diffs I often encounter boostrap delays of at least 20s like this:
Dec 26 21:01:18.000 [debug] HTTP body from server 'XX.XX.XX.XX:9001' was labeled as LZMA compressed, and it seems to be LZMA compressed.Dec 26 21:01:38.000 [info] handle_response_fetch_consensus(): Applied consensus diff (size 482897) from server 'XX.XX.XX.XX:9001', resulting in a new consensus document (size 1903167).
But, worse, it might even break Tor Browser bootstrap entirely when blocking a Tor Launcher GETCONF call (throwing an exception as Tor Launcher thinks tor is not working) like so:
Dec 30 23:54:54.000 [debug] HTTP body from server 'XX.XX.XX.XX:9001' was labeled as LZMA compressed, and it seems to be LZMA compressed.Illegal AddressMatcher: [xpconnect wrapped nsIPrefBranch] -- TypeError: s.split is not a functionIllegal AddressMatcher: [xpconnect wrapped nsIPrefBranch] -- TypeError: s.split is not a function[12-30 22:54:55] TorLauncher DBUG: readTorSettings ----------------------------------------------[12-30 22:54:55] TorLauncher DBUG: Sending Tor command: GETCONF UseBridges[12-30 22:55:10] TorLauncher NOTE: Exception on control port [Exception... "Component returned failure code: 0x804b000e (NS_ERROR_NET_TIMEOUT) [nsIBinaryInputStream.readBytes]" nsresult: "0x804b000e (NS_ERROR_NET_TIMEOUT)" location: "JS frame :: jar:file:///home/thomas/Arbeit/TBB/tor-browser_en-US/Browser/TorBrowser/Data/Browser/profile.default/extensions/tor-launcher@torproject.org.xpi!/components/tl-protocol.js :: TorProtocolService.prototype._torReadLine :: line 920" data: no]
to
In #22341 (moved) we are thinking about picking up LZMA- and Zstandard-support for consensus diffs.
For LZMA-compressed diffs I often encounter boostrap delays of at least 20s like this:
Dec 26 21:01:18.000 [debug] HTTP body from server 'XX.XX.XX.XX:9001' was labeled as LZMA compressed, and it seems to be LZMA compressed.Dec 26 21:01:38.000 [info] handle_response_fetch_consensus(): Applied consensus diff (size 482897) from server 'XX.XX.XX.XX:9001', resulting in a new consensus document (size 1903167).
But, worse, it might even break Tor Browser bootstrap entirely when blocking a Tor Launcher GETCONF call (throwing an exception as Tor Launcher thinks tor is not working) like so:
Dec 30 23:54:54.000 [debug] HTTP body from server 'XX.XX.XX.XX:9001' was labeled as LZMA compressed, and it seems to be LZMA compressed.Illegal AddressMatcher: [xpconnect wrapped nsIPrefBranch] -- TypeError: s.split is not a functionIllegal AddressMatcher: [xpconnect wrapped nsIPrefBranch] -- TypeError: s.split is not a function[12-30 22:54:55] TorLauncher DBUG: readTorSettings ----------------------------------------------[12-30 22:54:55] TorLauncher DBUG: Sending Tor command: GETCONF UseBridges[12-30 22:55:10] TorLauncher NOTE: Exception on control port [Exception... "Component returned failure code: 0x804b000e (NS_ERROR_NET_TIMEOUT) [nsIBinaryInputStream.readBytes]" nsresult: "0x804b000e (NS_ERROR_NET_TIMEOUT)" location: "JS frame :: jar:file:///home/thomas/Arbeit/TBB/tor-browser_en-US/Browser/TorBrowser/Data/Browser/profile.default/extensions/tor-launcher@torproject.org.xpi!/components/tl-protocol.js :: TorProtocolService.prototype._torReadLine :: line 920" data: no]
Jan 10 14:48:55.000 [debug] HTTP body from server 'XX.XX.XX.XX:9001' was labeled as gzipped, and it seems to be gzipped.Jan 10 14:49:12.000 [info] handle_response_fetch_consensus(): Applied consensus diff (size 179832) from server 'XX.XX.XX.XX:9001', resulting in a new consensus document (size 1885627).
This was on an idle laptop which is usually pretty fast with 8GB RAM. So, I wonder whether something else changed that is causing these delays...
If you'd like to try too, I've put the extra log messages in a branch extra_consdiff_logs against maint-0.3.2. They're mostly at debug level, and it's mostly the timestamps that should be interesting.
Jan 10 10:17:51.000 [debug] HTTP body from server '154.35.175.225:443' was labeled as LZMA compressed, and it seems to be LZMA compressed.Jan 10 10:17:51.000 [debug] dir_client_decompress_response_body(): Successfully decompressed body using LZMA compressedJan 10 10:17:51.000 [debug] consensus_diff_apply(): About to apply consensus diff.Jan 10 10:17:51.000 [debug] consensus_diff_apply(): Computed digest-as-signed for applying consensus diff.Jan 10 10:17:51.000 [debug] consdiff_apply_diff(): Extracted digests for consensus diff.Jan 10 10:17:51.000 [debug] consdiff_apply_diff(): Digests match; applying diff.Jan 10 10:17:51.000 [debug] consdiff_apply_diff(): Diff applied.Jan 10 10:17:51.000 [debug] consdiff_apply_diff(): Computing digest of resulting consensus.Jan 10 10:17:51.000 [debug] consdiff_apply_diff(): Digests are equal; diff application succeeded.Jan 10 10:17:51.000 [info] handle_response_fetch_consensus(): Applied consensus diff (size 101910) from server '154.35.175.225:443', resulting in a new consensus document (size 1884843).
Jan 10 20:20:31.000 [debug] HTTP body from server 'XX.XX.XX.XX:9001' was labeled as LZMA compressed, and it seems to be LZMA compressed.Jan 10 20:20:31.000 [debug] dir_client_decompress_response_body(): Successfully decompressed body using LZMA compressedJan 10 20:20:31.000 [debug] consensus_diff_apply(): About to apply consensus diff.Jan 10 20:20:31.000 [debug] consensus_diff_apply(): Computed digest-as-signed for applying consensus diff.Jan 10 20:20:48.000 [debug] consdiff_apply_diff(): Extracted digests for consensus diff.Jan 10 20:20:48.000 [debug] consdiff_apply_diff(): Digests match; applying diff.Jan 10 20:20:48.000 [debug] consdiff_apply_diff(): Diff applied.Jan 10 20:20:48.000 [debug] consdiff_apply_diff(): Computing digest of resulting consensus.Jan 10 20:20:48.000 [debug] consdiff_apply_diff(): Digests are equal; diff application succeeded.Jan 10 20:20:48.000 [info] handle_response_fetch_consensus(): Applied consensus diff (size 235301) from server 'XX.XX.XX.XX:9001', resulting in a new consensus document (size 1897672).
I compile tor with --enable-fragile-hardening but it seems to me that does not explain the 17 seconds for extracting digests for consensus diff. But I might be wrong.
After some testing it does indeed seem to be the case that --enable-expensive-hardening is causing the issue here.
Trac: Summary: LZMA- and Zstandard compressed consensus diffs stall Tor Browser launch for at least 20s or break it entirely to LZMA- and Zstandard compressed consensus diffs stall Tor Browser launch for at least 20s or break it entirely if compiled with --enable-expensive-hardening
Okay, the plot grows thicker: It seems that we already are compiling the sha3 module without the hardening flags. So maybe the problem here isn't the hardening in sha3, but somewhere else instead.
It could be either in consensus_split_lines, or in consdiff_get_digests. I'll try them both out. I suspect it might be split_lines, though.
In my testing, it helped to change the strchr() in that function to use memchr() instead. I don't know why, but maybe it will help you too. Branch is bug24826_031.
In my testing, it helped to change the strchr() in that function to use memchr() instead. I don't know why, but maybe it will help you too. Branch is bug24826_031.
Trac: Summary: LZMA- and Zstandard compressed consensus diffs stall Tor Browser launch for at least 20s or break it entirely if compiled with --enable-expensive-hardening to consensus diffs stall Tor Browser launch for at least 20s or break it entirely if compiled with --enable-expensive-hardening