Opened 9 months ago

Closed 8 months ago

#24898 closed defect (fixed)

We have two conflicting notions of channel_is_client()

Reported by: arma Owned by:
Priority: Medium Milestone: Tor: 0.2.9.x-final
Component: Core Tor/Tor Version: Tor: 0.3.1.1-alpha
Severity: Normal Keywords: tor-relay, 032-backport, 031-backport, 030-backport-maybe-with-21406, 029-backport-maybe-with-21406
Cc: mikeperry Actual Points:
Parent ID: #24902 Points: 0.5
Reviewer: teor Sponsor:

Description

In channel_tls_process_netinfo_cell() we have a section that says

      if (!(chan->conn->handshake_state->authenticated)) {
        tor_assert(tor_digest_is_zero(
                  (const char*)(chan->conn->handshake_state->
                      authenticated_rsa_peer_id)));
        tor_assert(tor_mem_is_zero(
                  (const char*)(chan->conn->handshake_state->
                                authenticated_ed25519_peer_id.pubkey), 32));
        /* If the client never authenticated, it's a tor client or bridge
         * relay, and we must not use it for EXTEND requests (nor could we, as
         * there are no authenticated peer IDs) */
        channel_mark_client(TLS_CHAN_TO_BASE(chan));

That's great and awesome -- it's a clear solid definition of client: a client is somebody who did the client handshake with us, not the relay handshake.

But then in command.c we have this weird clause:

  if (connection_or_digest_is_known_relay(chan->identity_digest)) {
    rep_hist_note_circuit_handshake_requested(create_cell->handshake_type);
    // Needed for chutney: Sometimes relays aren't in the consensus yet, and
    // get marked as clients. This resets their channels once they appear.
    // Probably useful for normal operation wrt relay flapping, too.
    channel_clear_client(chan);
  } else {
    channel_mark_client(chan);
  }

This definition of client is much murkier: a client is anybody who, at this moment of receiving a create cell, had earlier presented an identity digest that isn't in our current consensus and that we don't have a cached descriptor for.

I think we should standardize on one definition, and I think we should pick the first one: it's clean and not full of false positives.

Child Tickets

TicketStatusOwnerSummaryComponent
#21406closedteorThe channel is_client flag is inaccurateCore Tor/Tor
#22805closednickmRemove or_circuit_t.is_first_hop, because it's not accurate any moreCore Tor/Tor

Change History (25)

comment:1 in reply to:  description Changed 9 months ago by arma

Replying to arma:

full of false positives

I discovered this bug while working on the current circuit / conn overload patches, considering blacklisting client IP addresses that misbehave in certain ways. I found that my fast relay has a *lot* of conns that are marked as channel_is_client() that are very clearly connections to/from relays.

I assume the conns started off not marked as clients, and then the clause in command.c declared them as clients.

comment:2 Changed 9 months ago by arma

Right now, in channel_tls_process_netinfo_cell(), for the OR_CONN_STATE_OR_HANDSHAKING_V3 case, we call channel_mark_client() properly.

I propose that we call everything a client that handshakes with tor_tls_used_v1_handshake() or OR_CONN_STATE_OR_HANDSHAKING_V2, since relays should never use those legacy handshakes with other relays at this point.

And then we can happily remove the dangerous clause from command.c.

Last edited 9 months ago by arma (previous) (diff)

comment:3 Changed 9 months ago by teor

Sounds good to me.

(I wrote the first definition to fix an earlier issue where we used CREATE_FAST to mark clients. I wasn't aware of the second.)

comment:4 Changed 9 months ago by teor

Keywords: tor-relay 032-backport 031-backport 030-backport-maybe-with-21406 029-backport-maybe-with-21406 added
Points: 0.5
Reviewer: teor
Status: newmerge_ready
Version: Tor: 0.3.1.1-alpha

This works fine for me in the latest chutney, that comment it outdated.

But please don't remove these lines, they are used by rep_hist:

  if (connection_or_digest_is_known_relay(chan->identity_digest)) {
    rep_hist_note_circuit_handshake_requested(create_cell->handshake_type);

I restored this code in my branch bug24898 on https://github.com/teor2345/tor.git

Now we just have to decide how far to backport it.
If we want it to go to 0.3.0 or 0.2.9, we need to backport #21406 as well.

comment:5 Changed 9 months ago by arma

Comment written in parallel to teor's above comment:

I pushed a bug24898 branch that I think fixes everything.

It's on master though, and we might want to backport this. I'm not sure how far back to backport it -- we could go back to maint-0.3.1 pretty easily (there's a conflict but it's just because of a comment), but going back to 0.2.9 would be quite tricky because #21406 went into 0.3.1.

The question of whether to backport more depends on how much we want these fixes in 0.2.9 -- this one is a cornerstone of some of the DoS mitigations that might come next.

comment:6 Changed 9 months ago by arma

Great, I made a bug24898-2 branch that goes on master and includes teor's fix.

I also just made a bug24898-031 branch that goes on maint-0.3.1 if we want to merge it to there.

comment:7 Changed 9 months ago by teor

I am happy with the latest branch, let's merge it to master and consider back to 0.3.1.

I added #22805 as a child, because those changes are related as well.
But 0.2.9 and 0.3.0 have been fine without any of this other code.

If we want to go back earlier than 0.3.1, let's only backport the parts of channel_tls_process_netinfo_cell() in #21406 that we need to get this working, and leave out everything else.

comment:8 Changed 9 months ago by teor

(This code sets chan->is_client = 0; directly in 0.3.0, and it does the rep_hist thing in another place, so we'll need another branch if we want to backport there. The good news is that the create cell bug was introduced in 0.3.1 with the netflow padding code, so we only need to backport the v1/2 fix to 0.3.0 and earlier.)

Last edited 9 months ago by teor (previous) (diff)

comment:9 Changed 9 months ago by arma

Commit f5ff9f23 in my bug24898-more branch has a bunch of fixes to replace connection_or_digest_is_known_relay() with channel_is_client().

Feel free to merge the commit straight-up, or feel free to strip it down for parts and take pieces of it for your own commits for all these related tickets.

comment:10 in reply to:  9 Changed 9 months ago by teor

Replying to arma:

Commit f5ff9f23 in my bug24898-more branch has a bunch of fixes to replace connection_or_digest_is_known_relay() with channel_is_client().

Feel free to merge the commit straight-up, or feel free to strip it down for parts and take pieces of it for your own commits for all these related tickets.

Let's deal with that in #23423, because all the details are there. It's not something we're going to backport.

If we want to backport arma's bug24898-031 to 0.2.9 and 0.3.0, we need to backport:

  • the channel_mark_client() in connection_or_set_state_open() from commit af8cadf3a9 in branch bug24898-031 in this ticket
  • the channel_mark_client() in channel_tls_process_netinfo_cell() from commit 46fe353f25 in master in #21406

I am still happy with merging arma's bug24898-031 to 0.3.1 and arma's bug24898-2 to master.

comment:11 Changed 9 months ago by teor

This patch sometimes causes mixed chutney networks to fail log warnings. That's a bug in the unpatched version, not the patch.

Last edited 9 months ago by teor (previous) (diff)

comment:12 Changed 9 months ago by nickm

Cc: mikeperry added

Will this cause warnings on the real network too?

Why (did we introduce the create-cell check in the first place? ISTR that we thought we had a reason to do so. It looks like Mike introduced it in b0e92634d85a3bf7612a6ce0339b96e4aad1e0bb, and moved it in 02a5835c27780e45f705fc1c044b9c471b929dbe. Adding mike to cc.

We could skip the 0.3.0 backport, I think, since 0.3.0 is EOL at the end of this month. But we should do an 0.2.9 backport if we think this is important: 0.2.9 is supported till 2020.

comment:13 Changed 9 months ago by nickm

Keywords: review-group-30 added

comment:14 in reply to:  12 Changed 9 months ago by teor

Replying to nickm:

Will this cause warnings on the real network too?

Why (did we introduce the create-cell check in the first place? ISTR that we thought we had a reason to do so. It looks like Mike introduced it in b0e92634d85a3bf7612a6ce0339b96e4aad1e0bb, and moved it in 02a5835c27780e45f705fc1c044b9c471b929dbe. Adding mike to cc.

Mike's comments say that it led to chutney failures.
I haven't observed any failures, but I have observed warnings in 0.3.2.9 when we apply this patch to master in a mixed network.
That really is a bug in 0.3.1 and 0.3.2, because Mike's 0.3.1 fix is identifying clients incorrectly.
There's nothing we can do about these warnings, except backport the fix.

We could skip the 0.3.0 backport, I think, since 0.3.0 is EOL at the end of this month. But we should do an 0.2.9 backport if we think this is important: 0.2.9 is supported till 2020.

comment:15 Changed 9 months ago by mikeperry

It does appear to me that if we use the handshake patch from #21406 that we're going to be much better about identifying well-behaved clients correctly, and this should be better for the netflow code too (and not break it). I am a little worried about misbehaving clients - for those it feels smarter to check the digest in cases where that matters for DoS. But that should be a separate bug.

Thanks for this.

Edit: fix ticket number

Last edited 9 months ago by teor (previous) (diff)

comment:16 Changed 9 months ago by nickm

Keywords: review-group-30 removed
Milestone: Tor: 0.3.3.x-finalTor: 0.2.9.x-final

I've merged this to 0.3.1 and forward. I'm marking it for a potential 0.2.9 backport.

comment:17 Changed 9 months ago by arma

Status: merge_readyneeds_review

I have made a bug24898-029 branch, designed for inclusion in 029 and 030. It will conflict with 031 because stuff is already fixed there, so you'll want to do the right version of 'merge -ours' after 030.

We will want this branch merged when we put the #24902 DoS patch into 029.

comment:18 Changed 9 months ago by teor

Parent ID: #24902
Status: needs_reviewmerge_ready

This looks fine to me, I've been running a similar patch on 030 for a while now.

Do we still use CREATE_FAST to mark clients in 029? Or did we backport that fix as well?

comment:19 in reply to:  18 ; Changed 9 months ago by arma

Replying to teor:

Do we still use CREATE_FAST to mark clients in 029? Or did we backport that fix as well?

We still do it. That is, anybody who sends you a create-fast cell when you're running 029 or 030 now gets counted as a client, meaning the DoS defenses will be triggered against them (once #24902 gets backported), and meaning when we consider whether a channel can satisfy an extend request, this one can't.

Why, are we going to regret that? :) See also the discussions in #22805.

comment:20 in reply to:  19 Changed 9 months ago by teor

Status: merge_readyneeds_revision

Replying to arma:

Replying to teor:

Do we still use CREATE_FAST to mark clients in 029? Or did we backport that fix as well?

We still do it. That is, anybody who sends you a create-fast cell when you're running 029 or 030 now gets counted as a client, meaning the DoS defenses will be triggered against them (once #24902 gets backported), and meaning when we consider whether a channel can satisfy an extend request, this one can't.

Why, are we going to regret that? :) See also the discussions in #22805.

I think we are going to regret that, and we should backport the parts of #22805 that remove the CREATE_FAST mark client.

We should also take them out on general principles, because it reduces the number of connections between relays.

comment:21 Changed 9 months ago by arma

Status: needs_revisionmerge_ready

Ok, I added a second commit to my bug24898-029 branch that backports that part from #22805. I think we are good to go now.

To be clear, that means commit 4bd208b wants to make it into 0.2.9 and 0.3.0, and its changes are already present in 0.3.1. Whereas commit 2076db4 wants to make it into 0.2.9 and 0.3.0 and 0.3.1, and its changes are already present in 0.3.2.

Sorry in advance for the mess Nick. I would just merge it all myself but I think I would screw up the "ours" aspects of the conflict handling. :)

comment:22 Changed 9 months ago by nickm

Remove 030-backport from all open tickets that have it: 0.3.0 is now deprecated.

comment:23 Changed 8 months ago by nickm

Is this still a must-backport-as-part-of-0.2.9 thing?

comment:24 in reply to:  21 Changed 8 months ago by teor

Replying to arma:

Ok, I added a second commit to my bug24898-029 branch that backports that part from #22805. I think we are good to go now.

To be clear, that means commit 4bd208b wants to make it into 0.2.9 and 0.3.0, and its changes are already present in 0.3.1. Whereas commit 2076db4 wants to make it into 0.2.9 and 0.3.0 and 0.3.1, and its changes are already present in 0.3.2.

Sorry in advance for the mess Nick. I would just merge it all myself but I think I would screw up the "ours" aspects of the conflict handling. :)

Yes, we must backport 4bd208b and 2076db4 to 0.2.9, and backport 2076db4 to 0.3.1, so that the DoS backport identifies clients correctly.

comment:25 Changed 8 months ago by dgoulet

Resolution: fixed
Status: merge_readyclosed

Everything has been backported it appears. Closing.

Note: See TracTickets for help on using tickets.