Hi! I have some comments on the code, but before I get to them, we should talk about the approach.
The new algorithm seems to be:
Always keep OR-to-OR connections; always keep directory connections. Only inspect client-to-guard and exit connections.
When discarding connections, discard those that were created most recently.
Is that right? If so, I wonder if there is some way that an attacker can exploit this by making a bunch of directory connections, if our directory port is open. Maybe we should consider CONN_TYPE_DIR as well.
I also wonder if the attacker can reduce our number of available sockets by simply attempting a socket exhaustion attack. We'll kill off some of their connections, but we won't kill them all. If the attacker preserves the ones that we don't kill, they will always survive instead of any newer connections that we receive in the future. Can we do any better than this?
(Once we're in agreement here, we should describe the algorithm we want to follow in a patch to tor-spec.txt, so that the correct behavior is documented.)
> Is that right? If so, I wonder if there is some way that attacker can exploit this by making a bunch of directory connections, if our directory port is open. Maybe we should consider CONN_TYPE_DIR as well.
Good point. I could rework it to sort OR-to-OR connections into a low-priority band and everything else into a high-priority band, without excluding any category except listeners, and sort by connection age within each band. Open to your thoughts on this.
> I also wonder if the attacker can reduce our number of available sockets by simply attempting a socket exhaustion attack. We'll kill off some of their connections, but we won't kill them all. If the attacker preserves the ones that we don't kill, they will always survive instead of any newer connections that we receive in the future. Can we do any better than this?
I have further work in progress where the underlying circuits are killed rather than the connections. Do you see that as improving this issue? I also have dynamic configuration of the limits: the threshold min, max, and the soft nofile limit.
> (Once we're in agreement here, we should describe the algorithm we want to follow in a patch to tor-spec.txt, so that the correct behavior is documented.)
Two bands: [OR-OR] and [client-OR, exit, dir]; listeners are exempt.
OR-OR connections with a weight lower than a configurable threshold
(set in the consensus and locally; perhaps default 100) go in the
non-OR-OR band, i.e. new/trivial relays do not qualify.

OR-OR band (lower kill priority):
  sorted by connection age * weight, with lower values at higher kill priority.
Non-OR-OR band (higher kill priority):
  sorted by connection age, with newer connections at higher kill priority.
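A minimal sketch of that band assignment and ordering, assuming hypothetical names and threshold (this is illustrative, not the actual patch):

```c
/* Illustrative sketch of the two-band kill-priority ordering; the names
 * and the threshold default are hypothetical, not Tor's code. */
#include <assert.h>
#include <stdlib.h>

#define WEIGHT_THRESHOLD 100   /* consensus-weight floor for the OR-OR band */

typedef struct {
  int is_or_or;   /* nonzero for an OR-to-OR connection */
  long age_sec;   /* seconds since the connection was opened */
  long weight;    /* consensus weight of the peer relay, 0 if unknown */
} oos_cand_t;

/* Band 0 = everything else (killed first); band 1 = well-weighted OR-OR. */
static int band_of(const oos_cand_t *c)
{
  return (c->is_or_or && c->weight >= WEIGHT_THRESHOLD) ? 1 : 0;
}

/* qsort comparator: the best victim sorts first. Band 0 precedes band 1;
 * within band 0, newer connections first; within band 1, smaller
 * age*weight first. */
static int victim_cmp(const void *a_, const void *b_)
{
  const oos_cand_t *a = a_, *b = b_;
  int ba = band_of(a), bb = band_of(b);
  if (ba != bb)
    return ba - bb;                 /* lower band = higher kill priority */
  if (ba == 0) {
    if (a->age_sec != b->age_sec)   /* newer (smaller age) killed first */
      return a->age_sec < b->age_sec ? -1 : 1;
  } else {
    long pa = a->age_sec * a->weight, pb = b->age_sec * b->weight;
    if (pa != pb)                   /* lower age*weight killed first */
      return pa < pb ? -1 : 1;
  }
  return 0;
}
```

Victims would then be taken from the front of the qsort()ed array until enough sockets are regained.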
I'm thinking that this sounds like it's getting closer to plausible, but I'd want to see pseudocode to be sure. I don't understand how killing circuits instead of connections would help with socket exhaustion, though.
I'm still wondering about the attack where the attacker reduces the number of available sockets. Not sure how bad that would actually be.
Instead of writing pseudocode, I implemented the algorithm described. I'm down to neatness-counts items: replacing hard-coded constants with configs, etc. I'm busy at work and it might take a couple more weeks; I can post the current state if desired.
Enhancements to connection limit logic:
? support for random upper threshold in a range
- configurable log level for both "Recomputed OOS thresholds" and OOS event messages
- emit warnings when config thresholds are ignored (don't change)
- enhanced sort ~core logic [complete]
- report configurable faint relay consensus threshold
~ conn stats on one line [complete]
~ enhance eligible-to-kill count stats, include all categories [complete]
- possibly redundant commit "eliminate OOS kill duplicate circuit mark-closed warnings", written before push of oos_victim set via circuit connection iterate, i.e. "if (c->oos_victim) continue;"
The git-am branch in the attachment is based off ff9313340; I rebased it last weekend. 0.4.2.5 is slightly different now, with an alternate "configurable parameters apply/revert logic for OOS handler".
I still have a few minor loose ends to tie up (per the todo above). Rather than a comprehensive pre-merge review at this stage, an examination of the pick_oos_victims() algorithm rewrite would be helpful and appreciated.
So as I understand it, the proposed new algorithm is:
Consider only edge connections and OR connections with no identity set.
Close the newest N such connections, until we have regained enough sockets.
There are some problems here that we should think about. They all stem from the fact that an attacker is not required to do the kind of DoS attack that we expect: the attacker will know our algorithm, so they can adjust their attack to work around it.
If we have a DirPort open, the attacker can open connections to our DirPort: so we should also consider DirPort connections that have sockets set. (The fix for this one is easy: just check Directory connections too.)
Checking whether a connection's identity_digest is zero will not always do what we want. First, bridges do not set their identity digest, even though a bridge may have circuits from multiple users. Second, any client can pretend to be a relay and provide authentication when it connects to us, thereby setting an identity digest. (This one is harder to fix: we could look for relays that are in the consensus, but a relay that is not in the consensus might just be a new one that we don't know about yet. I don't know a supported way to detect bridges -- there isn't supposed to be one, really. We could look at the number of circuits, perhaps?)
The attacker is not required to flood us with connections: they can send a trickle instead. Instead of opening a whole bunch of connections at once, the attacker can open a new connection every 5 minutes. This will still eat up all of our sockets over time, but when we go to close the newest ones, the attacker will still have a bunch of our capacity. (I do not know the right fix for this. We could randomize the algorithm, I guess?)
> So as I understand it, the proposed new algorithm is:
> Consider only edge connections and OR connections with no identity set.
> Close the newest N such connections, until we have regained enough sockets.
> There are some problems here that we should think about. They all stem from the fact that an attacker is not required to do the kind of DoS attack that we expect: the attacker will know our algorithm, so they can adjust their attack to work around it.
> If we have a DirPort open, the attacker can open connections to our DirPort: so we should also consider DirPort connections that have sockets set. (The fix for this one is easy: just check Directory connections too.)
> Checking whether a connection's identity_digest is zero will not always do what we want. First, bridges do not set their identity digest, even though a bridge may have circuits from multiple users. Second, any client can pretend to be a relay and provide authentication when it connects to us, thereby setting an identity digest. (This one is harder to fix: we could look for relays that are in the consensus, but a relay that is not in the consensus might just be a new one that we don't know about yet. I don't know a supported way to detect bridges -- there isn't supposed to be one, really. We could look at the number of circuits, perhaps?)
It's also worth thinking about onion services and single onion services here. A busy onion service may look similar to a bridge, from the perspective of the upstream hop: both open lots of circuits.
Also, bridges and onion services can experience a socket DoS, too. We should think about how this algorithm might work for them, even if we don't activate it right now.
> The attacker is not required to flood us with connections: they can send a trickle instead. Instead of opening a whole bunch of connections at once, the attacker can open a new connection every 5 minutes. This will still eat up all of our sockets over time, but when we go to close the newest ones, the attacker will still have a bunch of our capacity. (I do not know the right fix for this. We could randomize the algorithm, I guess?)
I think randomising the sockets we close is the hardest algorithm to exploit, because the attacker can't know which sockets were going to close next.
We may want to assign a lower probability to sockets that we have recently opened to fetch directory documents, and connections on which we are currently fetching directory documents. (Attackers can occupy these sockets using a slowloris attack, so we should still be prepared to close them, if we have a lot of them open.)
We should also assign a threshold value, so we keep a few directory sockets. (150 seems like a good threshold for relays: with no cached descriptors, they make approximately (7000 relays / 96 descriptors per request) * 2 requests for descriptors, or about 146 requests.)
Remember, relays can use remote DirPorts and ORPorts for directory fetches; the code should handle both.
We should also try to think of any other kinds of essential sockets, that we don't want to close.
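The randomization plus directory-socket floor could look roughly like this; everything here (the names, the DIR_SOCKET_FLOOR constant, the use of rand()) is an illustrative sketch under assumptions, not Tor's implementation, and a real version would use a cryptographically seeded shuffle:

```c
/* Illustrative sketch: close sockets in randomized order, but keep a
 * minimum number of directory sockets open. Hypothetical names; a real
 * implementation would use a cryptographic RNG, not rand(). */
#include <assert.h>
#include <stdlib.h>

#define DIR_SOCKET_FLOOR 150   /* keep at least this many dir sockets */

typedef struct {
  int is_dir;   /* socket used for directory fetches */
  int closed;
} sock_t;

/* Fisher-Yates shuffle, so an attacker cannot predict the kill order. */
static void shuffle_socks(sock_t **v, int n, unsigned seed)
{
  srand(seed);
  for (int i = n - 1; i > 0; i--) {
    int j = rand() % (i + 1);
    sock_t *tmp = v[i]; v[i] = v[j]; v[j] = tmp;
  }
}

/* Close up to n_to_close sockets in random order, never letting the count
 * of open directory sockets drop below DIR_SOCKET_FLOOR. Returns the
 * number actually closed. */
static int close_victims(sock_t **v, int n, int n_to_close,
                         int n_dir_open, unsigned seed)
{
  int closed = 0;
  shuffle_socks(v, n, seed);
  for (int i = 0; i < n && closed < n_to_close; i++) {
    if (v[i]->is_dir) {
      if (n_dir_open <= DIR_SOCKET_FLOOR)
        continue;               /* preserve essential directory capacity */
      n_dir_open--;
    }
    v[i]->closed = 1;
    closed++;
  }
  return closed;
}
```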
> If we have a DirPort open, the attacker can open connections to our DirPort: so we should also consider DirPort connections that have sockets set. (The fix for this one is easy: just check Directory connections too.)
Hi! This is implemented... I can add more comments to pick_oos_victims() if desired.
> Checking whether a connection's identity_digest is zero will not always do what we want. First, bridges do not set their identity digest, even though a bridge may have circuits from multiple users.
OK; how do we tell if it's a bridge?
> Second, any client can pretend to be a relay and provide authentication when it connects to us, thereby setting an identity digest. (This one is harder to fix: we could look for relays that are in the consensus, but a relay that is not in the consensus might just be a new one that we don't know about yet. I don't know a supported way to detect bridges -- there isn't supposed to be one, really. We could look at the number of circuits, perhaps?)
But can "any client" set the digest and also be in the consensus with a Stable flag and a cbw of 500 or higher? Perhaps I should add some more comments.
> The attacker is not required to flood us with connections: they can send a trickle instead. Instead of opening a whole bunch of connections at once, the attacker can open a new connection every 5 minutes. This will still eat up all of our sockets over time, but when we go to close the newest ones, the attacker will still have a bunch of our capacity. (I do not know the right fix for this. We could randomize the algorithm, I guess?)
Adding randomness, while retaining some degree of time priority in band A and age*cbw priority in band B, makes sense to me.
> It's also worth thinking about onion services and single onion services here. A busy onion service may look similar to a bridge, from the perspective of the upstream hop: both open lots of circuits.
I'd appreciate some big-picture help on how to identify bridges and single onion services; I can drill into the details on my own once I have a general picture.
> Also, bridges and onion services can experience a socket DoS, too. We should think about how this algorithm might work for them, even if we don't activate it right now.
ok
> I think randomising the sockets we close is the hardest algorithm to exploit, because the attacker can't know which sockets were going to close next.
Sure, agreed; see my comment above.
> We may want to assign a lower probability to sockets that we have recently opened to fetch directory documents, and connections on which we are currently fetching directory documents. (Attackers can occupy these sockets using a slowloris attack, so we should still be prepared to close them, if we have a lot of them open.)
Something like a time-decaying fetch rate as a negative kill-priority factor, with a countervailing positive factor for longer-horizon, higher-consumption connections?
> We should also assign a threshold value, so we keep a few directory sockets. (150 seems like a good threshold for relays, because they do approximately 7000 relays / 96 descriptors per request * 2 requests for descriptors, when they don't have any cached descriptors.)
I have some hard data suggesting this may be unnecessary; I can share it privately.
> Remember, relays can use remote DirPorts and ORPorts for directory fetches, the code should handle both.
Again, a few big-picture hints on how to detect these would help.
> We should also try to think of any other kinds of essential sockets, that we don't want to close.
In the current implementation, only OR, DIR and EXIT connection types are considered--all other types are exempt.
Please take fifteen minutes to read the one function, pick_oos_victims().
I wrote this as a quick mitigation to an issue, and it works (very well) in combination with some other mitigations. Having thought about it, I don't necessarily see it as particularly great, just way, way better than what it replaces, and I don't advocate activating OOS by default. The supporting changes to permit dynamic configuration of the limits are nice.
I have a bunch of much better and more important ideas I want to pursue and don't want to spend a whole lot more effort on this one. Please keep this in mind. I'm willing to improve it marginally, but if it turns into a time sink you've lost me.
> It's also worth thinking about onion services and single onion services here. A busy onion service may look similar to a bridge, from the perspective of the upstream hop: both open lots of circuits.
> Will appreciate some big picture help on how to figure bridges and single onions services, can drill into the details on my own if I have a general picture.
Bridges and onion services try to look like clients, for anonymity reasons. If you find a reliable distinguisher, we'll try to fix it, because it's a security issue:
> Checking whether a connection's identity_digest is zero will not always do what we want. First, bridges do not set their identity digest, even though a bridge may have circuits from multiple users.
However, busy bridges and onion services should only have one connection to your relay. So they shouldn't be taking up very many sockets at all.
I think it's ok to have a bucket that's [client-OR (including onion services, bridges), exit, dir, OR-OR low consensus weight]. We just need to document where the onion services and bridges go, so people don't assume they're protected (like [OR-OR good consensus weight]).
> Also, bridges and onion services can experience a socket DoS, too. We should think about how this algorithm might work for them, even if we don't activate it right now.
Bridges can use two buckets: [bridge-OR outbound] and [OR-bridge inbound, clients, onion services]. Bridges don't support exiting or DirPorts. OR-bridge inbound connections are reachability circuits, or a DoS via another relay. So they are not important.
Onion services shouldn't have socket issues, because they use guards.
Single onion services could also use two buckets: [long-term intro, directory fetches, HSDir posts] and [rendezvous]. Rendezvous connections are a big DoS risk. Keeping the long-term intro connections, directory fetches, and HSDir posts is important to keep the service online.
You don't have to make these changes, but the code should be designed so it's easy to change the way we filter connections. (You don't have to do a redesign, either; we can do that if we decide to merge.)
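One way to keep the filter easy to change is to pass the bucket policy in as a callback; this is an illustrative sketch under assumed names, not the patch's actual structure:

```c
/* Illustrative sketch of a pluggable filter: the OOS picker takes a
 * classifier callback, so relay, bridge, and single-onion-service bucket
 * policies can be swapped without touching the core loop. Hypothetical
 * names throughout. */
#include <assert.h>
#include <stddef.h>

typedef enum {
  BUCKET_PROTECTED = 0,   /* never killed, e.g. well-weighted OR-OR */
  BUCKET_LOW = 1,         /* killed only under heavy pressure */
  BUCKET_HIGH = 2         /* killed first */
} oos_bucket_t;

typedef struct {
  int is_or_or;
  int is_outbound_dir;   /* outbound directory fetch */
  long weight;           /* peer's consensus weight, 0 if unknown */
} conn_info_t;

/* A classifier maps one connection to a bucket. */
typedef oos_bucket_t (*oos_classifier_fn)(const conn_info_t *);

/* Relay policy per the buckets discussed above: well-weighted OR-OR links
 * and outbound directory fetches are protected; everything else (clients,
 * onion services, bridges, exits, inbound DirPort) is fair game. */
static oos_bucket_t relay_classifier(const conn_info_t *c)
{
  if ((c->is_or_or && c->weight >= 100) || c->is_outbound_dir)
    return BUCKET_PROTECTED;
  return BUCKET_HIGH;
}

/* Count how many of n connections a given policy would expose to killing. */
static int count_killable(const conn_info_t *v, size_t n,
                          oos_classifier_fn classify)
{
  int killable = 0;
  for (size_t i = 0; i < n; i++)
    if (classify(&v[i]) != BUCKET_PROTECTED)
      killable++;
  return killable;
}
```

A bridge or single-onion-service build would supply its own classifier with the buckets described above, leaving the picker loop unchanged.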
> We may want to assign a lower probability to sockets that we have recently opened to fetch directory documents, and connections on which we are currently fetching directory documents. (Attackers can occupy these sockets using a slowloris attack, so we should still be prepared to close them, if we have a lot of them open.)
> something like a time-decaying rate as a negative priority factor, with a countervailing longer-horizon-and-higher-consumption positive priority factor?
Directory fetches will either be OR-OR, or be an outbound directory fetch.
So we could do:
[OR-OR good consensus weight, outbound directory fetches], and [client-OR (including onion services, bridges), exit, inbound DirPort, OR-OR low consensus weight].
> Remember, relays can use remote DirPorts and ORPorts for directory fetches, the code should handle both.
> Again, a few big picture hints on how to figure will help.
I think dir_connection_t.dirconn_direct is pretty much what you want here: