Opened 23 months ago

Closed 5 months ago

#17604 closed defect (fixed)

Try to use only one canonical connection

Reported by: mikeperry Owned by: mikeperry
Priority: High Milestone: Tor: 0.3.1.x-final
Component: Core Tor/Tor Version:
Severity: Normal Keywords: TorCoreTeam-postponed-201604, nickm-deferred-20160905, review-group-9, nickm-deferred-20161005, sponsor2
Cc: nickm, arma Actual Points:
Parent ID: #16861 Points: see-parent
Reviewer: nickm Sponsor:

Description

For #16861, I would like to experiment with padding between relays, as well as generally keep relay-to-relay orconns open much longer. This will be beneficial against attacks like Torscan (https://eprint.iacr.org/2012/432.pdf), as well as related netflow-based attacks that attempt to determine the guard node of a connection by using netflow data to accomplish the same thing as the Torscan attack.

Unfortunately, the logic for preferring orconns (is_canonical and channel_is_better()) is a mind-bender, but it appears to be the case that we may have situations where multiple orconns can be opened between relays where each side disagrees over which connection is canonical. This can happen because the source port won't match the orport in the descriptor of the remote relay for incoming connections. It will also be the case for nodes that make outgoing connections from different IP address than those in their descriptor.

I asked on #tor-dev, and two relay operators reported cases of such multiple connections to relays.

I think the following changes will improve the situation:

  1. Mark all authenticated connections as canonical (everything with link proto v3 and higher will authenticate, yes?)
  2. Alter channel_is_better() to prefer older orconns in the case of multiple canonical connections, and use the orconn with more circuits on it in case of age ties.

I think age is more important than number of circuits because we want to avoid giving an attacker the ability to switch an orconn between relays by creating a new one and opening a bunch of circuits on it. channel_is_better() is doing the opposite of this behavior right now, on both fronts.

Child Tickets

Change History (35)

comment:1 Changed 23 months ago by mikeperry

I'm also tempted to patch channel_tls_matches_target_method() so that it allows extend cells to be sent on an orconn if they match either the descriptor address or the actual originating address of an orconn. This would also help converge on a single orconn for relays that have outbound traffic from different IPs as their inbound traffic.

However, it will also mean that it becomes possible to steal a relay's keys and start making TLS connections to all other relays from anywhere on the Internet, and wait for those connections to become old enough to be chosen for extends. This issue may outweigh the corner case. It probably does, in fact. Happy to hear thoughts, though. Maybe there are other things that would prevent this attack?

comment:2 Changed 23 months ago by mikeperry

Status: newneeds_review

FYI: This commit implements the ideas in the description: https://gitweb.torproject.org/mikeperry/tor.git/commit/?h=netflow_padding-v3-squashed&id=86f950da4675a4c247236dc352bbeb3408f040eb

I chose not to mess with channel_tls_matches_target_method() because of the issues in comment:1.

I've tested it a bit in chutney. It seems to be behaving sanely. If this looks ok to do, I will bang on it harder and do some more thorough chutney testing.

comment:3 Changed 22 months ago by mikeperry

Status: needs_reviewneeds_revision

Roger suggested that we have some kind of check in the relays themselves for multiple connections so we can see how bad this is still in practice, in case this doesn't solve it all the way. I think such a check may be expensive, but we could do it on SIGUSR1, perhaps? Or once per day?

comment:4 Changed 22 months ago by mikeperry

Status: needs_revisionneeds_review

Ok, after implementing the periodic check that Roger suggested, and after much chutney testing and code spelunking, I changed strategies here. Instead of granting canonical status to *more* things, I decided to add some checks so that relays are more likely to *agree* on their canonical status (inspired in part by Roger's comment at https://trac.torproject.org/projects/tor/ticket/6799#comment:14). For this, I use NETINFO peer address information to compare against what we are advertising for our router address, and if they disagree, the other side probably won't think we are canonical.

I then changed channel_is_better() to not only prefer older connections, but also prefer connections where we think the peer will decide we are canonical. With these updates to channel_is_better(), connection_or_set_bad_connections() will mark all of these "half-canonical" orcons as bad for circs if we ever have a "full-canonical" option available for use instead. It will also mark younger orcons as bad for circs, as it is actually better to prefer old orcons when defending against Torscan attacks. Orcons will still live for a max of 1 week regardless, though. I did not change that.

Here is the commit:
https://gitweb.torproject.org/mikeperry/tor.git/commit/?h=netflow_padding-v4&id=d0a3ddd7814745a0760cc38b7d86e113e9be8b51

Oh, it also turns out that we're already vulnerable to the attack in comment:1, because all a rogue node has to do is list its rogue address in its NETINFO cells, and it gets marked canonical. It is only non-canonical connections that get their real_addr checked by channel_tls_matches_target_method(). Do we care about that? I did not change that behavior in this patch at all. I merely noted the issue with an XXX in the source.

comment:5 in reply to:  4 ; Changed 22 months ago by teor

This patch looks good overall.

Just a few questions:

channel_check_for_duplicates() says:

This function is similar to connection_or_set_bad_connections(),
and probably could be adapted to replace it, if it was modified to actually
take action on any of these connections.

Are we waiting to see what it logs before using it to replace connection_or_set_bad_connections()?

Replying to mikeperry:

Oh, it also turns out that we're already vulnerable to the attack in comment:1, because all a rogue node has to do is list its rogue address in its NETINFO cells, and it gets marked canonical. It is only non-canonical connections that get their real_addr checked by channel_tls_matches_target_method(). Do we care about that? I did not change that behavior in this patch at all. I merely noted the issue with an XXX in the source.

Can we check real_addr for all connections?
Will it take a long time to code up?
Does it impact performance?

And a nitpick:

In check_canonical_channels_callback:

  • I think public_server_mode(options) is the standard way of saying !options->BridgeRelay && server_mode(options). I think they do the same thing, but it might be worth checking.

comment:6 Changed 22 months ago by teor

How do these changes affect the attack described in #13155?
"I can use an extend cell to remotely determine whether two relays have a connection open"

comment:7 in reply to:  5 ; Changed 22 months ago by mikeperry

Replying to teor:

This patch looks good overall.

Just a few questions:

channel_check_for_duplicates() says:

This function is similar to connection_or_set_bad_connections(),
and probably could be adapted to replace it, if it was modified to actually
take action on any of these connections.

Are we waiting to see what it logs before using it to replace connection_or_set_bad_connections()?

I think so. That or a switch to a datagram transport, or some other wider effort to completely remove all of the connection_or layer.

Replying to mikeperry:

Oh, it also turns out that we're already vulnerable to the attack in comment:1, because all a rogue node has to do is list its rogue address in its NETINFO cells, and it gets marked canonical. It is only non-canonical connections that get their real_addr checked by channel_tls_matches_target_method(). Do we care about that? I did not change that behavior in this patch at all. I merely noted the issue with an XXX in the source.

Can we check real_addr for all connections?
Will it take a long time to code up?
Does it impact performance?

I think the main problem is that if we don't allow this netinfo mechanism, we need to find a different way for IPv6 connections to become 'canonical'. If we do care about this (and maybe we do), I think it should probably be a different ticket to change this behavior. The right way to do it probably means checking that the netinfo cell stuff matches at least *something* from the descriptor. But maybe that will have other issues? Nick or Andrea probably need to chime in on that topic.

And a nitpick:

In check_canonical_channels_callback:

  • I think public_server_mode(options) is the standard way of saying !options->BridgeRelay && server_mode(options). I think they do the same thing, but it might be worth checking.

Fixed in another fixup commit.

comment:8 in reply to:  7 Changed 22 months ago by teor

Replying to mikeperry:

Replying to teor:

Replying to mikeperry:

Oh, it also turns out that we're already vulnerable to the attack in comment:1, because all a rogue node has to do is list its rogue address in its NETINFO cells, and it gets marked canonical. It is only non-canonical connections that get their real_addr checked by channel_tls_matches_target_method(). Do we care about that? I did not change that behavior in this patch at all. I merely noted the issue with an XXX in the source.

Can we check real_addr for all connections?
Will it take a long time to code up?
Does it impact performance?

I think the main problem is that if we don't allow this netinfo mechanism, we need to find a different way for IPv6 connections to become 'canonical'. If we do care about this (and maybe we do), I think it should probably be a different ticket to change this behavior. The right way to do it probably means checking that the netinfo cell stuff matches at least *something* from the descriptor. But maybe that will have other issues? Nick or Andrea probably need to chime in on that topic.

I'm working on IPv6 client support at the moment. These sort of complexities are one of the reasons I don't even want to touch the IPv6 server code.

And a nitpick:

In check_canonical_channels_callback:

  • I think public_server_mode(options) is the standard way of saying !options->BridgeRelay && server_mode(options). I think they do the same thing, but it might be worth checking.

Fixed in another fixup commit.

The fixups all look good.

Are we going to merge this? (Pending comment from Nick or Andrea on canonical IPv6 connections.)

comment:9 Changed 22 months ago by nickm

Milestone: Tor: 0.2.8.x-final

oh crud, wasn't on my review queue because it didn't have a milestone.

comment:10 Changed 21 months ago by nickm

I've squashed the three major commits here into a new "netflow_padding-v4_squashed" branch so I can review them one-by-one.

comment:11 Changed 20 months ago by nickm

  • "Your relay has a very large number of connections" should probably emphasize that it has a large number of connections _per relay_.

Otherwise this looks pretty nice...

Though going to "prefer new connections" means that rotating connections is going to be nigh-impossible, right?

comment:12 Changed 20 months ago by mikeperry

The log message and some comments for the "prefer new connection" connection rotation behavior are in 8736b01c7fe0d599214067e20fe00d78c9f9de81 of mikeperry/netflow_padding-v4_squashed+rebased.

comment:13 Changed 19 months ago by nickm

Milestone: Tor: 0.2.8.x-finalTor: 0.2.9.x-final

These seem like features, or like other stuff unlikely to be possible this month. Bumping them to 0.2.9

comment:14 Changed 18 months ago by nickm

Points: see-parent

comment:15 Changed 18 months ago by nickm

Keywords: TorCoreTeam201604 added

Every postponed needs_review ticket should get a review in April

comment:16 Changed 18 months ago by nickm

Reviewer: nickm

comment:17 Changed 17 months ago by nickm

Keywords: TorCoreTeam-postponed-201604 TorCoreTeam201605 added; TorCoreTeam201604 removed

I will not be getting these revised and reviewed this week. I hold out hope for May. Sorry mike. Please let me know whether you want to revise them wrt my handles/timing patches, or whether I should. I'm happy either way.

comment:18 Changed 17 months ago by nickm

Status: needs_reviewneeds_revision

My current understanding here is that mike means to revise this branch based on other merges we're doing. Moving these to needs_revision in the meantime. Please let me know if I'm incorrect.

comment:19 Changed 16 months ago by nickm

Keywords: TorCoreTeam201605 removed

Remove "TorCoreTeam201605" keyword. The time machine is broken.

comment:20 Changed 13 months ago by nickm

Keywords: nickm-deferred-20160905 added
Milestone: Tor: 0.2.9.x-finalTor: 0.2.???

Deferring many tickets that are in needs_revision with no progress noted for a while, where I think they could safely wait till 0.3.0 or later.

Please feel free to move these back to 0.2.9 if you finish the revisions soon.

comment:21 Changed 13 months ago by mikeperry

Milestone: Tor: 0.2.???Tor: 0.2.9.x-final
Status: needs_revisionneeds_review

Alright. I switched the code over to using the new handle, monotonic timer, and timer wheel abstractions. All unit tests pass without leaks from this code (though the unit tests have grown new memory leaks of their own).

mikeperry/netflow_padding-v6. The commit specific to this bug is 1ce02c50b5accd5059922e980745a43ca4ecfee5.

comment:22 Changed 13 months ago by nickm

Keywords: review-group-9 added

comment:23 Changed 12 months ago by nickm

Status: needs_reviewneeds_revision

comment:24 Changed 12 months ago by mikeperry

Status: needs_revisionneeds_review

comment:25 Changed 12 months ago by nickm

Keywords: nickm-deferred-20161005 added
Milestone: Tor: 0.2.9.x-finalTor: 0.3.0.x-final

Deferring big/risky-feature things (even the ones I really love!) to 0.3.0. Please argue if I'm wrong.

comment:26 Changed 11 months ago by nickm

Keywords: review-group-11 added

comment:27 Changed 10 months ago by nickm

Keywords: review-group-12 added; review-group-11 removed

comment:28 Changed 10 months ago by nickm

Keywords: review-group-13 added; review-group-12 removed

comment:29 Changed 9 months ago by nickm

Keywords: review-group-14 added; review-group-13 removed

And that's all for review-group-13.

comment:30 Changed 9 months ago by nickm

Milestone: Tor: 0.3.0.x-finalTor: 0.3.1.x-final

comment:31 Changed 9 months ago by nickm

Keywords: review-group-14 removed

comment:32 Changed 7 months ago by nickm

Keywords: review-group-16 added

comment:33 Changed 7 months ago by nickm

Keywords: sponsor2 added
Priority: MediumHigh

comment:34 Changed 6 months ago by nickm

Keywords: review-group-16 removed

comment:35 Changed 5 months ago by nickm

Resolution: fixed
Status: needs_reviewclosed

merging parent

Note: See TracTickets for help on using tickets.