Tor clients warn when they use the wrong ntor onion key

changed milestone to %Tor: 0.2.6.x-final

added 025-backport 026-triaged-1 component::core tor/tor milestone::Tor: 0.2.6.x-final nickm-patch priority::medium resolution::fixed spammy-warnings status::closed tor-bridge type::defect version::tor unspecified labels

I saw one of these today too.

I assume it's a codepath rarely used, for when we give up on a circuit we're making because it's been a while since we sent a create cell, that is triggered more commonly during our "million new users" crisis.

The warning is nothing to worry about, but it's something we should track down and fix in the code.

Trac:
Milestone: Tor: 0.2.4.x-final to Tor: 0.2.5.x-final
Keywords: N/A deleted, tor-bridge 024-backport added

Here's the bonus info-level message that happens after the above three and gives us our clue:

Sep 10 18:33:52.724 [info] command_process_relay_cell(): circuit_receive_relay_
cell (backward) failed. Closing.

Looks like the call chain is that command_process_relay_cell() receives an extend or extend2 cell, passes it to circuit_receive_relay_cell(), which passes it to connection_edge_process_relay_cell() (in the CELL_DIRECTION_IN case), which sees RELAY_COMMAND_EXTENDED or RELAY_COMMAND_EXTENDED2 and calls circuit_finish_handshake() which pulls out the 'created' cell and calls onion_skin_client_handshake(), which returns -1 but doesn't say why.

So if I were to suggest a fix, I'd suggest that onion_skin_client_handshake() should have an *errmsg parameter which it sets if it wants to complain about something. Then the various subfunctions it calls can set it, or we can choose not to complain if a failure isn't worth telling the user about.

skruffy describes a plausible explanation for why these client-side warns are happening: the client is using an old ntor onion key, and in the new ntor design, there's no way for the relay to know what onion key the create request is intended for. So the relay answers anyway, and the client is shocked to find that it can't complete the handshake.

I guess it would be nice if the relay could realize that the handshake is going to fail, so it can help the client understand what's happening. But failing that, I guess we should make the client not complain when it fails?

Trac:
Summary: Tor fails handshakes, onion_skin_client_handshake, circuit_finish_handshake, connection_edge_process_relay_cell to Tor clients warn when they use the wrong ntor onion key

Replying to arma:

skruffy describes a plausible explanation for why these client-side warns are happening: the client is using an old ntor onion key, and in the new ntor design, there's no way for the relay to know what onion key the create request is intended for. So the relay answers anyway, and the client is shocked to find that it can't complete the handshake.

What is KEYID for then?

rransom is correct that there is a perfectly good way (KEYID) for the relay to learn which ntor key was supposed to get used.

But, if for some weird reason, the client has an onion key so ancient that the relay doesn't recognize it any longer, then rather than send a DESTROY cell, the relay will pick a junk onion key and use that to complete the handshake anyway, on the theory that it's better not to leak anything in timing information. Could that be what's going on here?

For more data, on moria1,

$ grep "\[warn\] onion_skin_client_handshake failed." moria1-notice|grep ^Sep|wc -l          
162

So this is a real thing that seems to be happening quite often to lots of people.

Replying to nickm:

But, if for some weird reason, the client has an onion key so ancient that the relay doesn't recognize it any longer, then rather than send a DESTROY cell, the relay will pick a junk onion key and use that to complete the handshake anyway, on the theory that it's better not to leak anything in timing information. Could that be what's going on here?

If the client has an invalid ntor onion key, then it looks like this would be a symptom of it because the AUTH value sent by the server will not match the client's computed auth_input. This seems possible, and the warning will be meaningless to the user, so a possible patch is to simply zero Y, before sending it back to the client, if the server chooses the junk key. This will be the point at infinity in the exponentiation, so current implementations will still fail and warn, but new implementations can check for this value and not warn on failure. Is it worth reducing the security level of the handshake just so we can handle this exception?

Alternatively, maybe just demote the message from warning to notice or info level. In either case, because there are subsequent log messages, we'll need to be able to inform the calling function that we had an error but not to emit the warning. We could return -2 and catch that up the stack, I guess. What's choice 3?

I think at this point the right choice for 0.2.5 is a better log message.

Trac:
Keywords: tor-bridge 024-backport deleted, tor-bridge 024-backport 025-triaged added

I've done a "better log message" branch as "bug9635". I think it's fine to merge it in 0.2.5.

We aren't actually seeing a lot of reports of these IRL, though, so I'll suggest that deferring to 0.2.6 could be fine too.

(Grepping an authority's logs isn't representative, since we'd expect an authority to see a larger-than-usual share of nodes that are trying to configure themselves.)

Trac:
Status: new to needs_review

Trac:
Milestone: Tor: 0.2.5.x-final to Tor: 0.2.6.x-final

merged the 'better log message' bit to 0.2.5.

Trac:
Status: needs_review to new

Someone said this in IRC today:

Seeing a few of these in my log:- [warn] Invalid result from curve25519 handshake: 4
Always followed by:-
Jun 19 07:51:24.000 [warn] onion_skin_client_handshake failed.
Jun 19 07:51:24.000 [warn] circuit_finish_handshake failed.
Jun 19 07:51:24.000 [warn] connection_edge_process_relay_cell (at origin) failed.

Is this the same bug? I haven't seen the Invalid result from... message before.

Replying to asn:

Is this the same bug? I haven't seen the Invalid result from... message before.

Yes. That log message is from Nickm's "merged the 'better log message' bit to 0.2.5." comment above. See also git commit df03e9b7 from May 2014.

So, "4" in that context means the auth value was wrong, which could indeed correspond to the case discussed above where the client asks for an onion key the server doesn't believe in, so the server uses one that it does, knowing that it will generate a bogus result.

Trac:
Keywords: tor-bridge 024-backport 025-triaged deleted, tor-bridge 026-triaged-1 025-backport added

Trac:
Keywords: N/A deleted, spammy-warnings added

just in case it makes sense to report:

Oct 01 22:18:58.000 [warn] Invalid result from curve25519 handshake: 4 Oct 01 22:18:58.000 [warn] onion_skin_client_handshake failed. Oct 01 22:18:58.000 [warn] circuit_finish_handshake failed. Oct 01 22:18:58.000 [warn] connection_edge_process_relay_cell (at origin) failed.

on

Oct 01 22:12:46.000 [notice] Tor 0.2.5.8-rc (git-a64f3ab3ee5c433c) opening log file.

from homebrew on osx 10.7.5

Trac:
Username: siekiera

Still getting a lot of these with v0.2.5.8-rc (git-eaa9ca1011e73a9d)

Oct 14 09:36:25.000 [warn] Invalid result from curve25519 handshake: 4
Oct 14 09:36:25.000 [warn] onion_skin_client_handshake failed.
Oct 14 09:36:25.000 [warn] circuit_finish_handshake failed.
Oct 14 09:36:25.000 [warn] connection_edge_process_relay_cell (at origin) failed.
Oct 14 09:36:26.000 [warn] Invalid result from curve25519 handshake: 4
Oct 14 09:36:26.000 [warn] onion_skin_client_handshake failed.
Oct 14 09:36:26.000 [warn] circuit_finish_handshake failed.
Oct 14 09:36:26.000 [warn] connection_edge_process_relay_cell (at origin) failed.

Trac:
Username: kargig

Turning these into info-level logs makes sense to me -- it's not anything the user can do anything about, and it does indeed happen often enough that people keep reporting it.

That said, is there a patch? I think it might be a bit complex to downgrade the log severity here without downgrading the general "a circuit build attempt failed" case.

My "if I were to suggest a fix" paragraph above still makes sense, but it's sure not a one-liner.

Ug. Yeah, that's not a one-liner at all. See branch bug9635_warnings_025. It's a 400-line diff touching over a dozen modules. It's against 0.2.5, but despite the name I think maybe this one should go into 0.2.6.

Of especial interest when reviewing this is the change in relay.c, which makes a circuit_mark_for_close happen earlier and changes a return value.

Trac:
Status: new to needs_review

Trac:
Keywords: tor-bridge 026-triaged-1 025-backport deleted, tor-bridge 026-triaged-1 025-backport nickm-patch added

commit fb91d647acdf0560fc7479d72eeea52e4e6ff41d

circuit_finish_handshake() +1284: adding log_warn(LD_BUG, "Couldn't initialize cpath crypto"); seems redondant because circuit_init_cpath_crypto() already has logging for every possible failure.
onion_skin_ntor_client_handshake() +291: So this means that if msg_out is NULL, there are no log at all. Seems reasonable for bad & 4 because it was downgraded but bad & 3 is still a warning thus should it be logged even if msg_out is NULL?

The rest looks good to me! ACK.

Trac:
Status: needs_review to needs_revision

I compiled openssl and Tor with following command on CentOS 6.6 2.6.32-431.el6.x86_64

cd ./openssl-1.0.2
./Configure shared enable-ec_nistp_64_gcc_128 linux-x86_64

cd ./tor-0.2.6.2-alpha
./configure --with-openssl-dir=/usr/local/ssl --with-zlib-dir=/lib --with-libevent-dir=/usr/local/lib

after a while, I got a warn log when runninng relay

Jan 29 12:09:51.828 [notice] Tor v0.2.6.2-alpha running on Linux with Libevent 2.0.22-stable, OpenSSL 1.0.2 and Zlib 1.2
.8.
...
Jan 29 12:11:02.000 [warn] Invalid result from curve25519 handshake: 4
Jan 29 12:11:02.000 [warn] onion_skin_client_handshake failed.
Jan 29 12:11:02.000 [warn] circuit_finish_handshake failed.
Jan 29 12:11:02.000 [warn] connection_edge_process_relay_cell (at origin) failed.

In my case, this problem solved with zlib.

cd ./openssl-1.0.2
./Configure shared zlib enable-ec_nistp_64_gcc_128 linux-x86_64

Trac:
Username: engrish

Made dgoulet's changes, and merged the patch. Thanks!

Trac:
Resolution: N/A to fixed
Status: needs_revision to closed

closed

mentioned in issue #16706 (moved)

moved to tpo/core/tor#9635 (closed)

mentioned in issue tpo/core/tor#16706 (closed)

Tor clients warn when they use the wrong ntor onion key

Child items ...

Activity