RFE: additional DOS mitigations for exits
A relay I operate recently experienced a DOS condition caused by intense scanning. The scanner opened so many outbound connections through the exit that the interface's configured maximum socket count (62k) was fully consumed and normal client traffic dropped to zero. The load was so heavy that it was difficult to SSH in, NTP complained that it could not reach its time servers, and numerous attempts were needed to open a daemon control socket (over loopback; not sure why that was affected). I was able to mitigate the attack without restarting any daemons, nothing broke, and the node resumed normal operation. This is clearly a recoverable resource exhaustion scenario.
To limit the impact of this category of activity, two relatively simple mitigations come to mind:
- Create a configurable limit on the number of OR + DIR + exit_edge connections on each interface, which may be set lower than the absolute resource limits. This would prevent a DOS situation from rendering the whole system inaccessible and, hopefully, allow daemon control port connections to be opened unimpaired. The setting will interact with the maximum number of in-flight DNS queries when a local resolver is configured, and this ought to be documented.
- Create an outbound exit_edge connection rate limit, set to some reasonable value, to constrain scanning.
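To make the two proposals concrete, here is a rough Python sketch of how they might fit together. It is only an illustration and not how the daemon does or should implement this: the class name `ExitEdgeLimiter`, its parameters, and the token-bucket approach to rate limiting are all my assumptions. The connection cap applies to every connection kind, while the rate limit applies only to new exit_edge connections.

```python
import time


class ExitEdgeLimiter:
    """Hypothetical per-interface limiter (illustration only).

    - max_conns caps OR + DIR + exit_edge connections combined.
    - A token bucket (rate_per_sec, burst) limits how fast new
      exit_edge connections may be opened.
    """

    def __init__(self, max_conns, rate_per_sec, burst, clock=time.monotonic):
        self.max_conns = max_conns
        self.rate = rate_per_sec
        self.burst = burst
        self.clock = clock            # injectable so the sketch can be tested
        self.open_conns = 0
        self.tokens = float(burst)    # start with a full bucket
        self.last = clock()

    def _refill(self):
        # Add tokens for the time elapsed since the last refill, up to burst.
        now = self.clock()
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def try_open(self, kind):
        """Return True if a new connection of this kind may be opened."""
        # Hard cap first: applies to every connection kind.
        if self.open_conns >= self.max_conns:
            return False
        # Rate limit: applies only to outbound exit_edge connections.
        if kind == "exit_edge":
            self._refill()
            if self.tokens < 1.0:
                return False
            self.tokens -= 1.0
        self.open_conns += 1
        return True

    def close(self):
        """Account for a connection being closed."""
        if self.open_conns > 0:
            self.open_conns -= 1
```

A caller would check `try_open("exit_edge")` before connecting and call `close()` when the connection ends. Whether a refused connection should be dropped or queued is left open here.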
NOTES:
file handle limit 128k
nf_conntrack_max = 65536
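
For what it is worth, a starting value for the connection limit could be derived from the figures above by taking the tightest resource limit and leaving some headroom. The sketch below is only arithmetic under assumptions: the function name and the 10% headroom are made up for illustration, and I am reading "62k" as 62000 and "128k" as 128 * 1024.

```python
def suggested_conn_cap(limits, headroom_pct=10):
    """Return the tightest limit reduced by headroom_pct percent.

    Hypothetical helper: integer arithmetic keeps the result exact.
    """
    binding = min(limits.values())
    return binding * (100 - headroom_pct) // 100


# Figures from this ticket; the exact readings of "62k" and "128k" are assumptions.
limits = {
    "interface_sockets": 62_000,
    "nf_conntrack_max": 65_536,
    "file_handles": 128 * 1024,
}

cap = suggested_conn_cap(limits)
print(cap)  # 55800
```

With these numbers the interface socket count is the binding limit, so the cap would sit below it and leave room for SSH, NTP, and control connections.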