running the tor deamon with static openssl 1.0.1d led to masses of
[warn] 45 connections have failed:
[warn] 32 connections died in state handshaking (Tor, v3 handshake) with SSL state SSL negotiation finished successfully in OPEN
[warn] 13 connections died in state renegotiating (TLS, v2 handshake) with SSL state SSLv3 read server hello A in RENEGOTIATE
while bootstraping the node. please see attached excerpt of the debug-log.
what's odd looking to my untrained eye there is:
[debug] tor_tls_debug_state_callback(): SSL 0x7f70e1390720 is now in state before accept initialization [type=16,val=1].
[debug] tor_tls_debug_state_callback(): SSL 0x7f70e1390720 is now in state before accept initialization [type=8193,val=1].
[debug] tor_tls_debug_state_callback(): SSL 0x7f70e1390720 is now in state SSLv3 read client hello B [type=16392,val=522].
[debug] tor_tls_debug_state_callback(): SSL 0x7f70e1390720 is now in state SSLv3 read client hello B [type=8194,val=-1].
[debug] TLS error while reading with [scrubbed]: unexpected message (in SSL routines:SSL3_GET_MESSAGE:SSLv3 read client hello B)
[debug] tor_tls_read(): read returned r=-1, err=-9
[debug] connection_read_to_buf(): tls error [misc error]. breaking (nickname not set, address xx.xxx.xx.xx).
compiling tor with 1.0.0k seems to fix this.
Trac: Username: ruebezahl
To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information
Child items ...
Show closed items
Linked items 0
Link issues together to show that they're related.
Learn more.
I've attached a patch that instruments tor_tls_read to dump the buffers returned by SSL_read. Check this out:
Feb 06 17:37:37.000 [notice] Read 1/16384: 05Feb 06 17:37:37.000 [notice] Read 1479/16384: 000007000200030000810391020101C3308201BF30820128A003020102020900E269B986FFE675BE300D06092A864886F70D0101050500302431223020060355040313197777772E677A766169796C6D34356B6273627370752E636F6D301E170D3133303230363232333231315A170D3134303230363232333231315A301F311D301B060355040313147777772E34766D35347776326A6B686D2E6E657430819F300D06092A864886F70D010101050003818D0030818902818100DECB9CE4EEFF12A6DC0B823E63ACE01E2A82655A860C40F96E742BD7656F6B4EABCA5B76BC6B71581D19D46F52EEF2E0038F7E53A59EBB856C22AF3335A45347A355ADFA1B1CFC16F90DB85AC7B186FDD97554F254D9431E11238DB2DB0B9DF3D4CE8E5D43B53D71D085CD0B01FE5AC89566742D74A5D211005AEA6039E30D310203010001300D06092A864886F70D010105050003818100C14C5C4B8621C8200EFD77183CFAD1D01CF948E21EB6DE8B002693061CD9F8B9B54819A1D0F512126D50C681FF38C908D50538223272F32C5C98D08AC9DDE085F92AC430B0FDDDED6E101AD592487E0A591CA66F98C4DE043847F01163295CE0DC7E8293778495A648615C93454EA2E85417FF694B0505E6413687B44BDF09B60201C7308201C33082012CA003020102020814D49014F3BB0D84300D06092A864886F70D0101050500302431223020060355040313197777772E677A766169796C6D34356B6273627370752E636F6D301E170D3133303230363232333231315A170D3134303230363232333231315A302431223020060355040313197777772E677A766169796C6D34356B6273627370752E636F6D30819F300D06092A864886F70D010101050003818D0030818902818100C8CF869D1000DF1641E3FCB0383E9A733B8673C4C5C5D339FB26B7B4A5DA4DDF1DC1DE0099D8BB5D15FE16FA44186E6E2D47460E9F605E4FC1958EA69C518FDD165F79F624471087CF3EE2DB90350DA33B15D0FE1064C9FD30925AF1E8587C6A76B5557F905C89DF456E73DDE680357AAF742BC31B98DF52BCFAA63F1618FAF10203010001300D06092A864886F70D0101050500038181007F82439B33424FB325B4436B12FCB7BC1C1EBDC679A048414888260F5E64008C7F78DE48BF366EEECC6B4A4B33EECCCD94203687EAA6002B8939D1A189DFF5935699E47B479462B311B7DCDE100F0D1B79EA578A88E004C3BEA2AB8C86B07EF473246515495BAD0475380AF1E535D3F219D33CAE6A4249FC228FD3B19EC8FF7A00008200248CC93EEF558A84E1D4B763DED473FE693BE09485EA03A6C0594D1CB599D54A4F000100010000085112DB3104046C07E048010404D470F5AA00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000015Feb 06 17:37:37.000 [info] channel_tls_handle_cell(): Received unexpected cell command 0 in chan state opening / conn state handshaking (Tor, v3 handshake); closing the connection.
It looks like there's an extraneous byte coming out of the first SSL_read there -- the next bytes are a perfectly good VERSIONS cell, followed by what seems to be at least the start of a good CERTS cell.
I note that the extraneous byte is different each time, and it appears that openssl really is setting it (since a memset before the ssl_read has no obvious effect on the extraneous byte)
If I tell it to drop the first 1-byte read on each connection, it gets up to "bootstrapped 15%" but no farther. So this is affecting more than the first byte.
ruebezahl and I have both been testing this on 64-bit hosts; I wonder if the architecture matters.
The SSL-Errors seem to disapear when building with no-asm, and bootstraping is 100% complete.
But now I got this after the first Bootstrap:
[notice] Bootstrapped 100%: Done.
[warn] Your Guard AccessNowKromyon03=6557396CF0EE5B72563A22BCAA0FF26E77FA3D08 is failing a very large amount of circuits. Most likely this means the Tor network is overloaded, but it could also mean an attack against you or potentially the guard itself. Success counts are 62/177. Use counts are 0/0. 62 circuits completed, 0 were unusable, 0 collapsed, and 0 timed out. For reference, your timeout cutoff is 131 seconds.
[warn] Your Guard jalopy=35BDC6486420EFD442C985D8D3C074988BFE544B is failing an extremely large amount of circuits. This could indicate a route manipulation attack, extreme network overload, or a bug. Success counts are 51/192. Use counts are 0/0. 51 circuits completed, 0 were unusable, 0 collapsed, and 0 timed out. For reference, your timeout cutoff is 131 seconds.
[warn] Your Guard lilith=6BE0C165B88EBE0371597F9E2230D3F253A299EF is failing an extremely large amount of circuits. This could indicate a route manipulation attack, extreme network overload, or a bug. Success counts are 48/195. Use counts are 0/0. 48 circuits completed, 0 were unusable, 0 collapsed, and 0 timed out. For reference, your timeout cutoff is 131 seconds.
[notice] Self-testing indicates your DirPort is reachable from the outside. Excellent.
I restarted the daemon two more times and did not get this warning again. Not sure if related.
The SSL-Errors seem to disapear when building with no-asm, and bootstraping is 100% complete.
But now I got this after the first Bootstrap:
[notice] Bootstrapped 100%: Done.
[warn] Your Guard AccessNowKromyon03=6557396CF0EE5B72563A22BCAA0FF26E77FA3D08 is failing a very large amount of circuits. Most likely this means the Tor network is overloaded, but it could also mean an attack against you or potentially the guard itself. Success counts are 62/177. Use counts are 0/0. 62 circuits completed, 0 were unusable, 0 collapsed, and 0 timed out. For reference, your timeout cutoff is 131 seconds.
I wouldn't worry about this one in this case: Tor is just complaining about all of the circuits it tried to launch to your guard node and failed to do so back when it was building with a busted openssl.
Does your CPU have aesni instruction? ("cat /proc/cpuinfo |grep aes" will tell you on Linuxy systems.)
Does your CPU have aesni instruction? ("cat /proc/cpuinfo |grep aes" will tell you on Linuxy systems.)
Yes, but I don't use it in my static OpenSSL/TOR Setup.
(Had some issues with the dynamic loading of the AESNI Engine at tor startup with static linked OpenSSL - since CPU is not (yet) the bottleneck on the Nodes, I didn't bother so far. This may be specific to my Setup)
Does your CPU have aesni instruction? ("cat /proc/cpuinfo |grep aes" will tell you on Linuxy systems.)
Yes, but I don't use it in my static OpenSSL/TOR Setup.
(Had some issues with the dynamic loading of the AESNI Engine at tor startup with static linked OpenSSL - since CPU is not (yet) the bottleneck on the Nodes, I didn't bother so far. This may be specific to my Setup)
I don't think aesni is a separate engine in openssl 1.0.1 -- I think you get it by default unless it's specifically disabled.
I don't think aesni is a separate engine in openssl 1.0.1 -- I think you get it by default unless it's specifically disabled.
after setting "AccelName aesni" in torrc and starting the no-asm build:
[notice] Tor v0.2.4.10-alpha (git-36abdfe7dcff3c65) running on Linux with Libevent 2.1.2-alpha and OpenSSL 1.0.1d.
[notice] Tor 0.2.4.10-alpha (git-36abdfe7dcff3c65) opening log file.
[warn] Unable to load dynamic OpenSSL engine "aesni".
I believe that this is a bug in the AES-NI, stitched code. I'm looking at the diff now but I didn't write this code and it's complex. I'll let the OpenSSL team know about the problem tonight, whether or not I'm able to pin down the issue.
I can confirm that, for me, the attached "disable-stitched-aes.diff" patch makes Tor work again, by disabling those ciphersuites.
This isn't the right fix though, since:
It's an openssl bug.
We should try to detect when stitched AES is broken and disable it then, and not just do it by trying to detect whether the version is 1.0.1d specifically.
Okay, there's a possible set of workaround in my repository as branches "bug8179_022", "bug8179_023", and "bug8179_024". They don't fix the underlying problem -- they just tell OpenSSL it's not allowed to use those stitched ciphers.
It would be great to have a fix in openssl; some understanding of how this broke without getting caught; and some code Tor can call at runtime to see whether it has a broken openssl or not.
Adam Langley has investigated and encouraged the OpenSSL team to do so as well: It appears that the code for using AEAD CBC ciphers with TLS is broken in OpenSSL 1.0.1d. Right now, the stitched aesni-cbc-hmac-sha1 cipher is the only such cipher.
Since this is a pretty bad problem (and will break all commonly used AES ciphers when used with AESNI), I'd hope that a fix will come out soon. To detect this at runtime, we'll have to try doing a TLS connection with ourselves: testing the cipher implementation itself won't work.
It appears that openssl 1.0.1e solves these issues. Cool! With any luck, nobody will try to build Tor with 1.0.1d at this point. If they do, we can add workaround code.
Trac: Status: new to closed Resolution: N/Ato worksforme