For #16861 (moved), I would like to experiment with padding between relays, as well as generally keep relay-to-relay orconns open much longer. This will be beneficial against attacks like Torscan (https://eprint.iacr.org/2012/432.pdf), as well as related netflow-based attacks that attempt to determine the guard node of a connection by using netflow data to accomplish the same thing as the Torscan attack.
Unfortunately, the logic for preferring orconns (is_canonical and channel_is_better()) is a mind-bender, but it appears that two relays can end up with multiple orconns open between them, with each side disagreeing over which connection is canonical. This can happen because, for incoming connections, the source port won't match the ORPort in the remote relay's descriptor. The same applies to relays that make outgoing connections from a different IP address than the one in their descriptor.
I asked on #tor-dev, and two relay operators reported cases of such multiple connections to relays.
I think the following changes will improve the situation:
Mark all authenticated connections as canonical (everything with link proto v3 and higher will authenticate, yes?)
Alter channel_is_better() to prefer older orconns in the case of multiple canonical connections, and use the orconn with more circuits on it in case of age ties.
I think age is more important than number of circuits because we want to avoid giving an attacker the ability to switch an orconn between relays by creating a new one and opening a bunch of circuits on it. channel_is_better() is doing the opposite of this behavior right now, on both fronts.
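As a sketch of the ordering proposed above (all names here are invented for illustration; the real comparison lives in channel_is_better() in channel.c and is more involved), the comparison could look something like:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sketch of the proposed preference order; these names do
 * not match the real channel.c code. Prefer the older of two canonical
 * orconns, and use circuit count only to break exact age ties. */
typedef struct {
  uint64_t timestamp_created; /* when the orconn was opened (seconds) */
  unsigned n_circuits;        /* circuits currently attached */
} orconn_sketch_t;

/* Return true if `a` should be preferred over `b`. */
static bool
orconn_sketch_is_better(const orconn_sketch_t *a, const orconn_sketch_t *b)
{
  if (a->timestamp_created != b->timestamp_created)
    return a->timestamp_created < b->timestamp_created; /* older wins */
  return a->n_circuits > b->n_circuits; /* age tie: busier wins */
}
```

Putting age before circuit count means an attacker cannot displace an established orconn simply by opening a new connection and flooding it with circuits.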
I'm also tempted to patch channel_tls_matches_target_method() so that it allows extend cells to be sent on an orconn if they match either the descriptor address or the actual originating address of an orconn. This would also help converge on a single orconn for relays that have outbound traffic from different IPs as their inbound traffic.
However, it will also mean that it becomes possible to steal a relay's keys and start making TLS connections to all other relays from anywhere on the Internet, and wait for those connections to become old enough to be chosen for extends. This issue may outweigh the corner case. It probably does, in fact. Happy to hear thoughts, though. Maybe there are other things that would prevent this attack?
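For concreteness, the relaxed matching rule under discussion might look like this (a sketch with invented names; the real check is channel_tls_matches_target_method(), and as noted this relaxation may be unsafe):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Sketch of the relaxed extend-cell matching discussed above. Names and
 * types are invented for illustration: `advertised_addr` stands for the
 * address in the relay's descriptor, and `real_addr` for the address the
 * TLS connection actually originated from. */
typedef struct {
  uint32_t advertised_addr;
  uint32_t real_addr;
} conn_addrs_sketch_t;

static bool
matches_extend_target_relaxed(const conn_addrs_sketch_t *conn,
                              uint32_t target_addr)
{
  /* Accept either the descriptor address or the observed source address. */
  return conn->advertised_addr == target_addr ||
         conn->real_addr == target_addr;
}
```

Note that the second branch is exactly what would enable the key-theft attack described above, since the connection's source address is attacker-controlled.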
I chose not to mess with channel_tls_matches_target_method() because of the issues in comment:1.
I've tested it a bit in chutney. It seems to be behaving sanely. If this looks ok to do, I will bang on it harder and do some more thorough chutney testing.
Roger suggested that we have some kind of check in the relays themselves for multiple connections, so we can see how bad this still is in practice in case this doesn't solve it all the way. I think such a check may be expensive, but we could do it on SIGUSR1, perhaps? Or once per day?
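A minimal sketch of such a scan (invented names; a real check would walk the actual connection list) shows why it is quadratic in the number of connections, which is why running it only on a signal or once per day is attractive:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DIGEST_LEN_SKETCH 20

/* Count pairs of connections that share an identity digest. This O(n^2)
 * version is only an illustration; sorting the digests first would make
 * it O(n log n), but either way it is work we would rather not do on
 * every event-loop pass. */
static unsigned
count_duplicate_pairs(uint8_t digests[][DIGEST_LEN_SKETCH], size_t n)
{
  unsigned dups = 0;
  for (size_t i = 0; i < n; i++)
    for (size_t j = i + 1; j < n; j++)
      if (memcmp(digests[i], digests[j], DIGEST_LEN_SKETCH) == 0)
        dups++;
  return dups;
}
```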
Ok, after implementing the periodic check that Roger suggested, and after much chutney testing and code spelunking, I changed strategies here. Instead of granting canonical status to more things, I decided to add some checks so that relays are more likely to agree on their canonical status (inspired in part by Roger's comment at https://trac.torproject.org/projects/tor/ticket/6799#comment:14). For this, I use NETINFO peer address information to compare against what we are advertising for our router address, and if they disagree, the other side probably won't think we are canonical.
I then changed channel_is_better() to not only prefer older connections, but also to prefer connections where we think the peer will decide we are canonical. With these updates to channel_is_better(), connection_or_set_bad_connections() will mark all of these "half-canonical" orconns as bad for circs if we ever have a "full-canonical" option available for use instead. It will also mark younger orconns as bad for circs, as it is actually better to prefer old orconns when defending against Torscan attacks. Orconns will still live for a max of 1 week regardless, though. I did not change that.
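The revised ordering can be sketched roughly like this (invented names; the real logic spans channel_is_better() and connection_or_set_bad_connections()):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Sketch of the revised preference order, with invented names.
 * `peer_agrees_canonical` stands for "the address the peer reported for
 * us in its NETINFO cell matches the router address we advertise", i.e.
 * the peer will probably also treat this connection as canonical. */
typedef struct {
  bool peer_agrees_canonical;
  uint64_t timestamp_created;
} chan_sketch_t;

static bool
chan_sketch_is_better(const chan_sketch_t *a, const chan_sketch_t *b)
{
  /* A "full-canonical" connection beats a "half-canonical" one... */
  if (a->peer_agrees_canonical != b->peer_agrees_canonical)
    return a->peer_agrees_canonical;
  /* ...and among equals, older wins (the Torscan defense). */
  return a->timestamp_created < b->timestamp_created;
}
```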
Oh, it also turns out that we're already vulnerable to the attack in comment:1, because all a rogue node has to do is list its rogue address in its NETINFO cells, and it gets marked canonical. It is only non-canonical connections that get their real_addr checked by channel_tls_matches_target_method(). Do we care about that? I did not change that behavior in this patch at all. I merely noted the issue with an XXX in the source.
> Oh, it also turns out that we're already vulnerable to the attack in comment:1, because all a rogue node has to do is list its rogue address in its NETINFO cells, and it gets marked canonical. It is only non-canonical connections that get their real_addr checked by channel_tls_matches_target_method(). Do we care about that? I did not change that behavior in this patch at all. I merely noted the issue with an XXX in the source.
Can we check real_addr for all connections?
Will it take a long time to code up?
Does it impact performance?
And a nitpick:
In check_canonical_channels_callback:
I think public_server_mode(options) is the standard way of saying !options->BridgeRelay && server_mode(options). I think they do the same thing, but it might be worth checking.
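The claimed equivalence can be checked with stubbed versions of the two predicates (stub types only, invented for illustration; the real functions live in router.c and take the full or_options_t):

```c
#include <assert.h>
#include <stdbool.h>

/* Stubbed illustration of the equivalence claim above; this is NOT the
 * real Tor code. `ORPort_set` stands in for whatever condition makes
 * server_mode() return true. */
typedef struct {
  bool BridgeRelay;
  bool ORPort_set;
} opts_sketch_t;

static bool
server_mode_sketch(const opts_sketch_t *o)
{
  return o->ORPort_set;
}

static bool
public_server_mode_sketch(const opts_sketch_t *o)
{
  /* A public server is a server that is not a bridge. */
  return server_mode_sketch(o) && !o->BridgeRelay;
}
```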
How do these changes affect the attack described in #13155 (moved)?
"I can use an extend cell to remotely determine whether two relays have a connection open"
channel_check_for_duplicates() says:
{{{
This function is similar to connection_or_set_bad_connections(),
and probably could be adapted to replace it, if it was modified to actually
take action on any of these connections.
}}}
Are we waiting to see what it logs before using it to replace connection_or_set_bad_connections()?
I think so. That or a switch to a datagram transport, or some other wider effort to completely remove all of the connection_or layer.
> Oh, it also turns out that we're already vulnerable to the attack in comment:1, because all a rogue node has to do is list its rogue address in its NETINFO cells, and it gets marked canonical. It is only non-canonical connections that get their real_addr checked by channel_tls_matches_target_method(). Do we care about that? I did not change that behavior in this patch at all. I merely noted the issue with an XXX in the source.
> Can we check real_addr for all connections?
> Will it take a long time to code up?
> Does it impact performance?
I think the main problem is that if we don't allow this NETINFO mechanism, we need to find a different way for IPv6 connections to become 'canonical'. If we do care about this (and maybe we do), changing this behavior should probably be a different ticket. The right way to do it probably means checking that the NETINFO-reported address matches at least something from the descriptor. But maybe that will have other issues? Nick or Andrea probably need to chime in on that topic.
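One possible stricter rule, sketched with invented names: only treat the NETINFO-reported address as evidence of canonicity if it matches at least one address (IPv4 or IPv6) that we actually list in our descriptor.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Sketch of the stricter NETINFO check floated above; names are invented.
 * Addresses are represented as strings purely for illustration, where
 * real code would compare tor_addr_t values. */
static bool
netinfo_addr_is_plausible(const char *netinfo_addr,
                          const char *const descriptor_addrs[],
                          size_t n_addrs)
{
  for (size_t i = 0; i < n_addrs; i++) {
    if (strcmp(netinfo_addr, descriptor_addrs[i]) == 0)
      return true; /* matches something we actually advertise */
  }
  return false; /* a rogue address fails this check */
}
```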
> And a nitpick:
> In check_canonical_channels_callback:
> I think public_server_mode(options) is the standard way of saying !options->BridgeRelay && server_mode(options). I think they do the same thing, but it might be worth checking.
> Oh, it also turns out that we're already vulnerable to the attack in comment:1, because all a rogue node has to do is list its rogue address in its NETINFO cells, and it gets marked canonical. It is only non-canonical connections that get their real_addr checked by channel_tls_matches_target_method(). Do we care about that? I did not change that behavior in this patch at all. I merely noted the issue with an XXX in the source.
> Can we check real_addr for all connections?
> Will it take a long time to code up?
> Does it impact performance?
> I think the main problem is that if we don't allow this netinfo mechanism, we need to find a different way for IPv6 connections to become 'canonical'. If we do care about this (and maybe we do), I think it should probably be a different ticket to change this behavior. The right way to do it probably means checking that the netinfo cell stuff matches at least something from the descriptor. But maybe that will have other issues? Nick or Andrea probably need to chime in on that topic.
I'm working on IPv6 client support at the moment. These sort of complexities are one of the reasons I don't even want to touch the IPv6 server code.
> And a nitpick:
> In check_canonical_channels_callback:
> I think public_server_mode(options) is the standard way of saying !options->BridgeRelay && server_mode(options). I think they do the same thing, but it might be worth checking.
Fixed in another fixup commit.
The fixups all look good.
Are we going to merge this? (Pending comment from Nick or Andrea on canonical IPv6 connections.)
The log message and some comments for the "prefer new connection" connection rotation behavior are in 8736b01c7fe0d599214067e20fe00d78c9f9de81 of mikeperry/netflow_padding-v4_squashed+rebased.
I will not be getting these revised and reviewed this week. I hold out hope for May. Sorry, Mike. Please let me know whether you want to revise them wrt my handles/timing patches, or whether I should. I'm happy either way.
My current understanding here is that mike means to revise this branch based on other merges we're doing. Moving these to needs_revision in the meantime. Please let me know if I'm incorrect.
Alright. I switched the code over to using the new handle, monotonic timer, and timer wheel abstractions. All unit tests pass without leaks from this code (though the unit tests have grown new memory leaks of their own).
mikeperry/netflow_padding-v6. The commit specific to this bug is 1ce02c50b5accd5059922e980745a43ca4ecfee5.
Trac: Milestone: Tor: 0.2.??? to Tor: 0.2.9.x-final Status: needs_revision to needs_review