Opened 6 years ago

Closed 4 years ago

#9635 closed defect (fixed)

Tor clients warn when they use the wrong ntor onion key

Reported by: bastik Owned by:
Priority: Medium Milestone: Tor: 0.2.6.x-final
Component: Core Tor/Tor Version: Tor: unspecified
Severity: Keywords: spammy-warnings, tor-bridge 026-triaged-1 025-backport nickm-patch
Cc: Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

I got this warnings (exactly once, for the first time I'm aware of) on my 0.2.4.16-rc bridge on Windows.

Strangely I found no tickets for any of these failures.

Aug 31 09:37:10.793 [Warning] onion_skin_client_handshake failed.
Aug 31 09:37:10.793 [Warning] circuit_finish_handshake failed.
Aug 31 09:37:10.794 [Warning] connection_edge_process_relay_cell (at origin) failed.

I'm unsure what _client means. Did my client (yes, I use my bridge's client functions) fail or did the client of an connecting bridge user fail?

Client functionality is not affected.

Child Tickets

Change History (27)

comment:1 Changed 6 years ago by arma

I saw one of these today too.

I assume it's a codepath rarely used, for when we give up on a circuit we're making because it's been a while since we sent a create cell, that is triggered more commonly during our "million new users" crisis.

The warning is nothing to worry about, but it's something we should track down and fix in the code.

comment:2 Changed 6 years ago by nickm

Keywords: tor-bridge 024-backport added
Milestone: Tor: 0.2.4.x-finalTor: 0.2.5.x-final

comment:3 Changed 6 years ago by arma

Here's the bonus info-level message that happens after the above three and gives us our clue:

Sep 10 18:33:52.724 [info] command_process_relay_cell(): circuit_receive_relay_
cell (backward) failed. Closing.

Looks like the call chain is that command_process_relay_cell() receives an extend or extend2 cell, passes it to circuit_receive_relay_cell(), which passes it to connection_edge_process_relay_cell() (in the CELL_DIRECTION_IN case), which sees RELAY_COMMAND_EXTENDED or RELAY_COMMAND_EXTENDED2 and calls circuit_finish_handshake() which pulls out the 'created' cell and calls onion_skin_client_handshake(),
which returns -1 but doesn't say why.

So if I were to suggest a fix, I'd suggest that onion_skin_client_handshake() should have an *errmsg parameter which it sets if it wants to complain about something. Then the various subfunctions it calls can set it, or we can choose not to complain if a failure isn't worth telling the user about.

comment:4 Changed 6 years ago by arma

Summary: Tor fails handshakes, onion_skin_client_handshake, circuit_finish_handshake, connection_edge_process_relay_cellTor clients warn when they use the wrong ntor onion key

skruffy describes a plausible explanation for why these client-side warns are happening: the client is using an old ntor onion key, and in the new ntor design, there's no way for the relay to know what onion key the create request is intended for. So the relay answers anyway, and the client is shocked to find that it can't complete the handshake.

I guess it would be nice if the relay could realize that the handshake is going to fail, so it can help the client understand what's happening. But failing that, I guess we should make the client not complain when it fails?

comment:5 in reply to:  4 Changed 6 years ago by rransom

Replying to arma:

skruffy describes a plausible explanation for why these client-side warns are happening: the client is using an old ntor onion key, and in the new ntor design, there's no way for the relay to know what onion key the create request is intended for. So the relay answers anyway, and the client is shocked to find that it can't complete the handshake.

What is KEYID for then?

comment:6 Changed 6 years ago by nickm

rransom is correct that there is a perfectly good way (KEYID) for the relay to learn which ntor key was supposed to get used.

But, if for some weird reason, the client has an onion key so ancient that the relay doesn't recognize it any longer, then rather than send a DESTROY cell, the relay will pick a junk onion key and use that to complete the handshake anyway, on the theory that it's better not to leak anything in timing information. Could that be what's going on here?

comment:7 Changed 6 years ago by arma

For more data, on moria1,

$ grep "\[warn\] onion_skin_client_handshake failed." moria1-notice|grep ^Sep|wc -l          
162

So this is a real thing that seems to be happening quite often to lots of people.

comment:8 in reply to:  6 Changed 6 years ago by sysrqb

Replying to nickm:

But, if for some weird reason, the client has an onion key so ancient that the relay doesn't recognize it any longer, then rather than send a DESTROY cell, the relay will pick a junk onion key and use that to complete the handshake anyway, on the theory that it's better not to leak anything in timing information. Could that be what's going on here?

If the client has an invalid ntor onion key, then it looks like this would be a symptom of it because the AUTH value sent by the server will not match the client's computed auth_input. This seems possible, and the warning will be meaningless to the user, so a possible patch is to simply zero Y, before sending it back to the client, if the server chooses the junk key. This will be the point at infinity in the exponentiation, so current implementations will still fail and warn, but new implementations can check for this value and not warn on failure. Is it worth reducing the security level of the handshake just so we can handle this exception?

Alternatively, maybe just demote the message from warning to notice or info level. In either case, because there are subsequent log messages, we'll need to be able to inform the calling function that we had an error but not to emit the warning. We could return -2 and catch that up the stack, I guess. What's choice 3?

comment:9 Changed 5 years ago by nickm

Keywords: 025-triaged added

I think at this point the right choice for 0.2.5 is a better log message.

comment:10 Changed 5 years ago by nickm

Status: newneeds_review

I've done a "better log message" branch as "bug9635". I think it's fine to merge it in 0.2.5.

We aren't actually seeing a lot of reports of these IRL, though, so I'll suggest that deferring to 0.2.6 could be fine too.

(Grepping an authority's logs isn't representative, since we'd expect an authority to see a larger-than-usual share of nodes that are trying to configure themselves.)

comment:11 Changed 5 years ago by nickm

Milestone: Tor: 0.2.5.x-finalTor: 0.2.6.x-final

comment:12 Changed 5 years ago by nickm

merged the 'better log message' bit to 0.2.5.

comment:13 Changed 5 years ago by nickm

Status: needs_reviewnew

comment:14 Changed 5 years ago by asn

Someone said this in IRC today:

Seeing a few of these in my log:- [warn] Invalid result from curve25519 handshake: 4
Always followed by:-
Jun 19 07:51:24.000 [warn] onion_skin_client_handshake failed.
Jun 19 07:51:24.000 [warn] circuit_finish_handshake failed.
Jun 19 07:51:24.000 [warn] connection_edge_process_relay_cell (at origin) failed.

Is this the same bug? I haven't seen the Invalid result from... message before.

comment:15 in reply to:  14 Changed 5 years ago by arma

Replying to asn:

Is this the same bug? I haven't seen the Invalid result from... message before.

Yes. That log message is from Nickm's "merged the 'better log message' bit to 0.2.5." comment above. See also git commit df03e9b7 from May 2014.

comment:16 Changed 5 years ago by nickm

So, "4" in that context means the auth value was wrong, which could indeed correspond to the case discussed above where the client asks for an onion key the server doesn't believe in, so the server uses one that it does, knowing that it will generate a bogus result.

comment:17 Changed 5 years ago by nickm

Keywords: 026-triaged-1 025-backport added; 024-backport 025-triaged removed

comment:18 Changed 5 years ago by Sebastian

Keywords: spammy-warnings added

comment:19 Changed 5 years ago by siekiera

just in case it makes sense to report:

Oct 01 22:18:58.000 [warn] Invalid result from curve25519 handshake: 4
Oct 01 22:18:58.000 [warn] onion_skin_client_handshake failed.
Oct 01 22:18:58.000 [warn] circuit_finish_handshake failed.
Oct 01 22:18:58.000 [warn] connection_edge_process_relay_cell (at origin) failed.

on

Oct 01 22:12:46.000 [notice] Tor 0.2.5.8-rc (git-a64f3ab3ee5c433c) opening log file.

from homebrew on osx 10.7.5

comment:20 Changed 5 years ago by kargig

Still getting a lot of these with v0.2.5.8-rc (git-eaa9ca1011e73a9d)

Oct 14 09:36:25.000 [warn] Invalid result from curve25519 handshake: 4
Oct 14 09:36:25.000 [warn] onion_skin_client_handshake failed.
Oct 14 09:36:25.000 [warn] circuit_finish_handshake failed.
Oct 14 09:36:25.000 [warn] connection_edge_process_relay_cell (at origin) failed.
Oct 14 09:36:26.000 [warn] Invalid result from curve25519 handshake: 4
Oct 14 09:36:26.000 [warn] onion_skin_client_handshake failed.
Oct 14 09:36:26.000 [warn] circuit_finish_handshake failed.
Oct 14 09:36:26.000 [warn] connection_edge_process_relay_cell (at origin) failed.

comment:21 Changed 5 years ago by arma

Turning these into info-level logs makes sense to me -- it's not anything the user can do anything about, and it does indeed happen often enough that people keep reporting it.

That said, is there a patch? I think it might be a bit complex to downgrade the log severity here without downgrading the general "a circuit build attempt failed" case.

My "if I were to suggest a fix" paragraph above still makes sense, but it's sure not a one-liner.

comment:22 Changed 5 years ago by nickm

Status: newneeds_review

Ug. Yeah, that's not a one-liner at all. See branch bug9635_warnings_025. It's a 400-line diff touching over a dozen modules. It's against 0.2.5, but despite the name I think maybe this one should go into 0.2.6.

Of especial interest when reviewing this is the change in relay.c, which makes a circuit_mark_for_close happen earlier and changes a return value.

comment:23 Changed 5 years ago by nickm

Keywords: nickm-patch added

comment:24 Changed 5 years ago by dgoulet

commit fb91d647acdf0560fc7479d72eeea52e4e6ff41d

  • circuit_finish_handshake() +1284: adding log_warn(LD_BUG, "Couldn't initialize cpath crypto"); seems redondant because circuit_init_cpath_crypto() already has logging for every possible failure.
  • onion_skin_ntor_client_handshake() +291: So this means that if msg_out is NULL, there are no log at all. Seems reasonable for bad & 4 because it was downgraded but bad & 3 is still a warning thus should it be logged even if msg_out is NULL?

The rest looks good to me! ACK.

comment:25 Changed 5 years ago by nickm

Status: needs_reviewneeds_revision

comment:26 Changed 4 years ago by engrish

I compiled openssl and Tor with following command on CentOS 6.6 2.6.32-431.el6.x86_64

cd ./openssl-1.0.2
./Configure shared enable-ec_nistp_64_gcc_128 linux-x86_64

cd ./tor-0.2.6.2-alpha
./configure --with-openssl-dir=/usr/local/ssl --with-zlib-dir=/lib --with-libevent-dir=/usr/local/lib

after a while, I got a warn log when runninng relay

Jan 29 12:09:51.828 [notice] Tor v0.2.6.2-alpha running on Linux with Libevent 2.0.22-stable, OpenSSL 1.0.2 and Zlib 1.2
.8.
...
Jan 29 12:11:02.000 [warn] Invalid result from curve25519 handshake: 4
Jan 29 12:11:02.000 [warn] onion_skin_client_handshake failed.
Jan 29 12:11:02.000 [warn] circuit_finish_handshake failed.
Jan 29 12:11:02.000 [warn] connection_edge_process_relay_cell (at origin) failed.

In my case, this problem solved with zlib.

cd ./openssl-1.0.2
./Configure shared zlib enable-ec_nistp_64_gcc_128 linux-x86_64

comment:27 Changed 4 years ago by nickm

Resolution: fixed
Status: needs_revisionclosed

Made dgoulet's changes, and merged the patch. Thanks!

Note: See TracTickets for help on using tickets.