Opened 6 years ago

Closed 6 years ago

#8179 closed defect (worksforme)

stitched aes-ni ciphers in openssl 1.0.1d seems to break SSL Handshakes/Renegotiations

Reported by: ruebezahl Owned by:
Priority: Very High Milestone: Tor: 0.2.4.x-final
Component: Core Tor/Tor Version: Tor: 0.2.4.10-alpha
Severity: Keywords: openssl tor-client backport-022 backport-023
Cc: Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

running the tor deamon with static openssl 1.0.1d led to masses of

[warn] 45 connections have failed:
[warn] 32 connections died in state handshaking (Tor, v3 handshake) with SSL state SSL negotiation finished successfully in OPEN
[warn] 13 connections died in state renegotiating (TLS, v2 handshake) with SSL state SSLv3 read server hello A in RENEGOTIATE

while bootstraping the node. please see attached excerpt of the debug-log.

what's odd looking to my untrained eye there is:

[debug] tor_tls_debug_state_callback(): SSL 0x7f70e1390720 is now in state before accept initialization [type=16,val=1].
[debug] tor_tls_debug_state_callback(): SSL 0x7f70e1390720 is now in state before accept initialization [type=8193,val=1].
[debug] tor_tls_debug_state_callback(): SSL 0x7f70e1390720 is now in state SSLv3 read client hello B [type=16392,val=522].
[debug] tor_tls_debug_state_callback(): SSL 0x7f70e1390720 is now in state SSLv3 read client hello B [type=8194,val=-1].
[debug] TLS error while reading with [scrubbed]: unexpected message (in SSL routines:SSL3_GET_MESSAGE:SSLv3 read client hello B)
[debug] tor_tls_read(): read returned r=-1, err=-9
[debug] connection_read_to_buf(): tls error [misc error]. breaking (nickname not set, address xx.xxx.xx.xx).

compiling tor with 1.0.0k seems to fix this.

Child Tickets

Attachments (3)

debuglog.txt (7.6 KB) - added by ruebezahl 6 years ago.
debug output
dump_plaintext.patch (868 bytes) - added by nickm 6 years ago.
debugging patch for tortls.c
disable-stitched-aes.diff (2.1 KB) - added by nickm 6 years ago.

Download all attachments as: .zip

Change History (24)

Changed 6 years ago by ruebezahl

Attachment: debuglog.txt added

debug output

Changed 6 years ago by nickm

Attachment: dump_plaintext.patch added

debugging patch for tortls.c

comment:1 Changed 6 years ago by nickm

I've attached a patch that instruments tor_tls_read to dump the buffers returned by SSL_read. Check this out:

Feb 06 17:37:37.000 [notice] Read 1/16384: 05
Feb 06 17:37:37.000 [notice] Read 1479/16384: 000007000200030000810391020101C3308201BF30820128A003020102020900E269B986FFE675BE300D06092A864886F70D0101050500302431223020060355040313197777772E677A766169796C6D34356B6273627370752E636F6D301E170D3133303230363232333231315A170D3134303230363232333231315A301F311D301B060355040313147777772E34766D35347776326A6B686D2E6E657430819F300D06092A864886F70D010101050003818D0030818902818100DECB9CE4EEFF12A6DC0B823E63ACE01E2A82655A860C40F96E742BD7656F6B4EABCA5B76BC6B71581D19D46F52EEF2E0038F7E53A59EBB856C22AF3335A45347A355ADFA1B1CFC16F90DB85AC7B186FDD97554F254D9431E11238DB2DB0B9DF3D4CE8E5D43B53D71D085CD0B01FE5AC89566742D74A5D211005AEA6039E30D310203010001300D06092A864886F70D010105050003818100C14C5C4B8621C8200EFD77183CFAD1D01CF948E21EB6DE8B002693061CD9F8B9B54819A1D0F512126D50C681FF38C908D50538223272F32C5C98D08AC9DDE085F92AC430B0FDDDED6E101AD592487E0A591CA66F98C4DE043847F01163295CE0DC7E8293778495A648615C93454EA2E85417FF694B0505E6413687B44BDF09B60201C7308201C33082012CA003020102020814D49014F3BB0D84300D06092A864886F70D0101050500302431223020060355040313197777772E677A766169796C6D34356B6273627370752E636F6D301E170D3133303230363232333231315A170D3134303230363232333231315A302431223020060355040313197777772E677A766169796C6D34356B6273627370752E636F6D30819F300D06092A864886F70D010101050003818D0030818902818100C8CF869D1000DF1641E3FCB0383E9A733B8673C4C5C5D339FB26B7B4A5DA4DDF1DC1DE0099D8BB5D15FE16FA44186E6E2D47460E9F605E4FC1958EA69C518FDD165F79F624471087CF3EE2DB90350DA33B15D0FE1064C9FD30925AF1E8587C6A76B5557F905C89DF456E73DDE680357AAF742BC31B98DF52BCFAA63F1618FAF10203010001300D06092A864886F70D0101050500038181007F82439B33424FB325B4436B12FCB7BC1C1EBDC679A048414888260F5E64008C7F78DE48BF366EEECC6B4A4B33EECCCD94203687EAA6002B8939D1A189DFF5935699E47B479462B311B7DCDE100F0D1B79EA578A88E004C3BEA2AB8C86B07EF473246515495BAD0475380AF1E535D3F219D33CAE6A4249FC228FD3B19EC8FF7A00008200248CC93EEF558A84E1D4B763DED473FE693BE09485EA03A6C0594D1CB599D54A4F000100010000085112DB3104046C07E048010404D470F5AA00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000015
Feb 06 17:37:37.000 [info] channel_tls_handle_cell(): Received unexpected cell command 0 in chan state opening / conn state handshaking (Tor, v3 handshake); closing the connection.

It looks like there's an extraneous byte coming out of the first SSL_read there -- the next bytes are a perfectly good VERSIONS cell, followed by what seems to be at least the start of a good CERTS cell.

comment:2 Changed 6 years ago by nickm

Keywords: tor-client backport-022 backport-023 added
Milestone: Tor: 0.2.4.x-final
Priority: normalcritical

comment:3 Changed 6 years ago by nickm

I note that the extraneous byte is different each time, and it appears that openssl really is setting it (since a memset before the ssl_read has no obvious effect on the extraneous byte)

comment:4 Changed 6 years ago by nickm

If I tell it to drop the first 1-byte read on each connection, it gets up to "bootstrapped 15%" but no farther. So this is affecting more than the first byte.

ruebezahl and I have both been testing this on 64-bit hosts; I wonder if the architecture matters.

comment:5 Changed 6 years ago by nickm

The plot grows thicker!

When I build openssl 1.0.1d with the no-asm Configure option, it appears to work.

comment:6 Changed 6 years ago by nickm

(could somebody confirm that building 1.0.1d with no-asm fixes it for them when 1.0.1d without no-asm works?)

comment:7 Changed 6 years ago by nickm

To be clear, openssl 1.0.1c also works fine.

comment:8 Changed 6 years ago by ruebezahl

The SSL-Errors seem to disapear when building with no-asm, and bootstraping is 100% complete.

But now I got this after the first Bootstrap:

[notice] Bootstrapped 100%: Done.
[warn] Your Guard AccessNowKromyon03=6557396CF0EE5B72563A22BCAA0FF26E77FA3D08 is failing a very large amount of circuits. Most likely this means the Tor network is overloaded, but it could also mean an attack against you or potentially the guard itself. Success counts are 62/177. Use counts are 0/0. 62 circuits completed, 0 were unusable, 0 collapsed, and 0 timed out. For reference, your timeout cutoff is 131 seconds.
[warn] Your Guard jalopy=35BDC6486420EFD442C985D8D3C074988BFE544B is failing an extremely large amount of circuits. This could indicate a route manipulation attack, extreme network overload, or a bug. Success counts are 51/192. Use counts are 0/0. 51 circuits completed, 0 were unusable, 0 collapsed, and 0 timed out. For reference, your timeout cutoff is 131 seconds.
[warn] Your Guard lilith=6BE0C165B88EBE0371597F9E2230D3F253A299EF is failing an extremely large amount of circuits. This could indicate a route manipulation attack, extreme network overload, or a bug. Success counts are 48/195. Use counts are 0/0. 48 circuits completed, 0 were unusable, 0 collapsed, and 0 timed out. For reference, your timeout cutoff is 131 seconds.
[notice] Self-testing indicates your DirPort is reachable from the outside. Excellent.

I restarted the daemon two more times and did not get this warning again. Not sure if related.

comment:9 in reply to:  8 ; Changed 6 years ago by nickm

Replying to ruebezahl:

The SSL-Errors seem to disapear when building with no-asm, and bootstraping is 100% complete.

But now I got this after the first Bootstrap:

[notice] Bootstrapped 100%: Done.
[warn] Your Guard AccessNowKromyon03=6557396CF0EE5B72563A22BCAA0FF26E77FA3D08 is failing a very large amount of circuits. Most likely this means the Tor network is overloaded, but it could also mean an attack against you or potentially the guard itself. Success counts are 62/177. Use counts are 0/0. 62 circuits completed, 0 were unusable, 0 collapsed, and 0 timed out. For reference, your timeout cutoff is 131 seconds.

I wouldn't worry about this one in this case: Tor is just complaining about all of the circuits it tried to launch to your guard node and failed to do so back when it was building with a busted openssl.

Does your CPU have aesni instruction? ("cat /proc/cpuinfo |grep aes" will tell you on Linuxy systems.)

comment:10 in reply to:  9 ; Changed 6 years ago by ruebezahl

Replying to nickm:

Does your CPU have aesni instruction? ("cat /proc/cpuinfo |grep aes" will tell you on Linuxy systems.)

Yes, but I don't use it in my static OpenSSL/TOR Setup.

(Had some issues with the dynamic loading of the AESNI Engine at tor startup with static linked OpenSSL - since CPU is not (yet) the bottleneck on the Nodes, I didn't bother so far. This may be specific to my Setup)

comment:11 in reply to:  10 ; Changed 6 years ago by nickm

Replying to ruebezahl:

Replying to nickm:

Does your CPU have aesni instruction? ("cat /proc/cpuinfo |grep aes" will tell you on Linuxy systems.)

Yes, but I don't use it in my static OpenSSL/TOR Setup.

(Had some issues with the dynamic loading of the AESNI Engine at tor startup with static linked OpenSSL - since CPU is not (yet) the bottleneck on the Nodes, I didn't bother so far. This may be specific to my Setup)

I don't think aesni is a separate engine in openssl 1.0.1 -- I think you get it by default unless it's specifically disabled.

comment:12 in reply to:  11 Changed 6 years ago by ruebezahl

Replying to nickm:

I don't think aesni is a separate engine in openssl 1.0.1 -- I think you get it by default unless it's specifically disabled.

after setting "AccelName aesni" in torrc and starting the no-asm build:

[notice] Tor v0.2.4.10-alpha (git-36abdfe7dcff3c65) running on Linux with Libevent 2.1.2-alpha and OpenSSL 1.0.1d.
[notice] Tor 0.2.4.10-alpha (git-36abdfe7dcff3c65) opening log file.
[warn] Unable to load dynamic OpenSSL engine "aesni".

comment:13 Changed 6 years ago by ruebezahl

patching openssl 1.0.1d with http://paste.ubuntu.com/1618317/ (removing aesni) seems to fix it.

comment:14 Changed 6 years ago by cypherpunks

I believe that this is a bug in the AES-NI, stitched code. I'm looking at the diff now but I didn't write this code and it's complex. I'll let the OpenSSL team know about the problem tonight, whether or not I'm able to pin down the issue.

comment:15 Changed 6 years ago by nickm

Summary: openssl 1.0.1d seems to break SSL Handshakes/Renegotiationsstitched aes-ni ciphers in openssl 1.0.1d seems to break SSL Handshakes/Renegotiations

comment:16 Changed 6 years ago by nickm

I can confirm that, for me, the attached "disable-stitched-aes.diff" patch makes Tor work again, by disabling those ciphersuites.

This isn't the right fix though, since:

  • It's an openssl bug.
  • We should try to detect when stitched AES is broken and disable it then, and not just do it by trying to detect whether the version is 1.0.1d specifically.
  • It'll need a backport to 0.2.2 and 0.2.3.

comment:17 Changed 6 years ago by nickm

err, s/by disabling those ciphersuites/by disabling EVP_aes_128_cbc_hmac_sha1 and EVP_aes_256_cbc_hmac_sha1/ .

Changed 6 years ago by nickm

Attachment: disable-stitched-aes.diff added

comment:18 Changed 6 years ago by nickm

Just replaced that patch with one that should be less crashy in the cases where it isn't needed.

comment:19 Changed 6 years ago by nickm

Okay, there's a possible set of workaround in my repository as branches "bug8179_022", "bug8179_023", and "bug8179_024". They don't fix the underlying problem -- they just tell OpenSSL it's not allowed to use those stitched ciphers.

It would be great to have a fix in openssl; some understanding of how this broke without getting caught; and some code Tor can call at runtime to see whether it has a broken openssl or not.

comment:20 Changed 6 years ago by nickm

Adam Langley has investigated and encouraged the OpenSSL team to do so as well: It appears that the code for using AEAD CBC ciphers with TLS is broken in OpenSSL 1.0.1d. Right now, the stitched aesni-cbc-hmac-sha1 cipher is the only such cipher.

Since this is a pretty bad problem (and will break all commonly used AES ciphers when used with AESNI), I'd hope that a fix will come out soon. To detect this at runtime, we'll have to try doing a TLS connection with ourselves: testing the cipher implementation itself won't work.

comment:21 Changed 6 years ago by nickm

Resolution: worksforme
Status: newclosed

It appears that openssl 1.0.1e solves these issues. Cool! With any luck, nobody will try to build Tor with 1.0.1d at this point. If they do, we can add workaround code.

Note: See TracTickets for help on using tickets.