Opened 4 years ago

Closed 7 weeks ago

Last modified 10 days ago

#15516 closed enhancement (implemented)

Consider rate-limiting INTRODUCE2 cells when under load

Reported by: special Owned by: dgoulet
Priority: Medium Milestone: Tor: unspecified
Component: Core Tor/Tor Version:
Severity: Normal Keywords: tor-dos, tor-hs, network-team-roadmap-july, nickm-merge
Cc: yawning Actual Points:
Parent ID: #29999 Points: 10
Reviewer: asn Sponsor: Sponsor27-must

Description

In #15463, we're seeing an effective denial of service against a HS with a flood of introductions. The service falls apart trying to build rendezvous circuits, resulting in 100% CPU usage, many failed circuits, and impact on the guard.

We should consider dropping INTRODUCE2 cells when the HS is under too much load to build rendezvous circuits successfully. It's much better if the HS response in this situation is predictable, instead of hammering at the guard until something falls down.

One option is to add a HSMaxConnectionRate(?) option defining the number of INTRODUCE2 we would accept per 10(?) minutes, maybe with some bursting behavior. It's unclear what a useful default value would be.

We could try to use a heuristic based on when rend circuits start failing, but it's not obvious to me how that would work.

Child Tickets

TicketTypeStatusOwnerSummary
#30687enhancementclosedImplement a generic counter token bucket

Change History (56)

comment:1 Changed 4 years ago by asn

Cc: sjmurdoch yawning added

We should probably queue INTRODUCE2 cells, and act on them the best we can. If the queue grows too big (we are under DoS), we should drop cells enough so that we (and our guard) can handle the load.

This seems like queuing theory stuff, and specifically active queue management. Yawning suggested looking into algorithms like Stochastic Fair Blue and CoDeL .

comment:2 Changed 4 years ago by arma

I'd actually like to some exploration of initial throttling or dropping or queueing at the intro point as well. That was originally meant to be the first line of defense here.

(In a related design, the hs might consider which intro points the intro2 cells are arriving from, and if they're all arriving from one intro point, take that into account.)

comment:3 Changed 4 years ago by nickm

Milestone: Tor: 0.2.7.x-final
Priority: normalmajor

comment:4 Changed 4 years ago by dgoulet

Milestone: Tor: 0.2.7.x-finalTor: 0.2.8.x-final

comment:5 Changed 4 years ago by nickm

Sponsor: SponsorU

Assigning to SponsorU as a likely anti-dos measure. Could also be SponsorR

comment:6 Changed 4 years ago by nickm

Points: medium

comment:7 Changed 4 years ago by nickm

Milestone: Tor: 0.2.8.x-finalTor: 0.2.9.x-final

It is impossible that we will fix all 226 currently open 028 tickets before 028 releases. Time to move some out. This is my second pass through the "new" and tickets, looking for things to move to 0.2.9.

comment:8 Changed 4 years ago by isabela

Sponsor: SponsorUSponsorU-can

comment:9 Changed 4 years ago by nickm

Priority: HighMedium

comment:10 Changed 3 years ago by isabela

Milestone: Tor: 0.2.9.x-finalTor: 0.2.???

tickets market to be removed from milestone 029

comment:11 Changed 3 years ago by nickm

Parent ID: #15463#17293

comment:12 Changed 3 years ago by nickm

Keywords: SponsorU-deferred added
Sponsor: SponsorU-can

Remove the SponsorU status from these items, which we already decided to defer from 0.2.9. add the SponsorU-deferred tag instead in case we ever want to remember which ones these were.

comment:13 Changed 3 years ago by nickm

Keywords: tor-dos added

comment:14 Changed 3 years ago by nickm

Parent ID: #17293

Unparenting these from #17293; holding for future work.

comment:15 Changed 3 years ago by teor

Milestone: Tor: 0.2.???Tor: 0.3.???

Milestone renamed

comment:16 Changed 3 years ago by nickm

Keywords: tor-03-unspecified-201612 added
Milestone: Tor: 0.3.???Tor: unspecified

Finally admitting that 0.3.??? was a euphemism for Tor: unspecified all along.

comment:17 Changed 2 years ago by nickm

Keywords: tor-03-unspecified-201612 removed

Remove an old triaging keyword.

comment:18 Changed 2 years ago by nickm

Keywords: tor-hs added
Severity: Normal

comment:19 Changed 2 years ago by sjmurdoch

Cc: sjmurdoch removed

comment:20 Changed 2 years ago by nickm

Sponsor: SponsorV-can

comment:21 Changed 6 months ago by asn

Points: medium10
Sponsor: SponsorV-canSponsor27-can

Stealing from V to 27 due to relation with the DoS objective.

comment:22 Changed 5 months ago by asn

Parent ID: #26294

comment:23 Changed 5 months ago by asn

Closed #15463 (and children #13738, #13739 #15540) in favor of this ticket.

comment:24 Changed 5 months ago by asn

Summary: Consider dropping INTRODUCE2 cells when under loadConsider rate-limiting INTRODUCE2 cells when under load

comment:25 Changed 5 months ago by asn

Parent ID: #26294

comment:26 Changed 5 months ago by asn

Parent ID: #29999

comment:27 Changed 5 months ago by gaba

Keywords: network-team-roadmap-2019-Q1Q2 added

Add keyword to tickets in network team's roadmap.

comment:28 in reply to:  2 Changed 5 months ago by arma

Replying to arma:

I'd actually like to some exploration of initial throttling or dropping or queueing at the intro point as well. That was originally meant to be the first line of defense here.

Here's my concrete proposal on this one: the intro point should see if the package window for the intro circuit is empty, and if so, it should nack the intro1 cell. That way there are at most 1000 intro2 cells in flight at once from that intro point.

This design is reasonable because it takes a long while for an onion service to process 1000 intro2 cells, so if we queue later ones and send them 'eventually', they're going to arrive much later, and the client will likely have timed out and moved on from that rendezvous point. So we're not harming legitimate clients who end up in this situation, because the current behavior is already harming them plenty.

The benefits are that (a) the onion service doesn't receive the excess intro2 cells that it wasn't going to be able to rendezvous with anyway, (b) clients get a much faster feedback that things aren't going to work so they can move to another intro point, and (c) when a DoS stops, the pain stops soon after: there isn't a huge queue of waiting intro2 cells that have to slowly drain, for no value.

We could imagine an extension on this idea, where the intro point silently drops the excess intro1 cells, rather than explicitly nacking them. This variant will force the client to time out rather than immediately try the next intro point, thus slowing down attacks by clients that follow the protocol. (Modified clients could still use a smaller timeout, or not even care whether they get a response.)

comment:29 Changed 5 months ago by arma

Another idea I was considering here, but ultimately abandoned as more complex than we need, was to somehow timestamp the intro1 cell when it gets received at the intro point, which would allow the onion service to examine how many seconds have passed and discard it if it's more than n seconds ago. That would essentially mean that we have n *seconds* of valid intro2 cells in flight, rather than at-most-n *circwindows* of intro2 cells in flight. This approach would handle congestion that happens inside the network (between the intro point and the service), in that if it takes a long time for the intro2 cell to make it from the intro point to the onion service, it's less likely that the client is still around and waiting for the connect-back.

But how exactly to do the timestamp, and how and whether we need to synchronize clocks, made this too klunky an idea.

comment:30 Changed 5 months ago by asn

Owner: set to dgoulet
Status: newassigned

comment:31 Changed 5 months ago by dgoulet

Implementing comment:28 proposal: ticket15516_042_01. Unfortunately, this can't work without #30440...

comment:32 Changed 5 months ago by dgoulet

Because #30440 won't be a mature thing in the network for many years to come, we can only use the "package_window" proposal once it is.

So until then, we'll use a token bucket system, add knobs in the consensus (like the dos.c subsystem) and go on from there. Not sure how we are going to come up with the values but they need to be large enough so it doesn't affect legit busy HS.

comment:33 Changed 4 months ago by dgoulet

Plan B activated. Here is the development branch: ticket15516_042_02.

So far working on the testing side. Requires unit tests and most likely better values for the rate/burst of the INTRO2.

I still want to explore the idea of putting these knobs in the ESTABLISH_INTRO cell so an operator can tweak them at the intro point.

comment:34 Changed 4 months ago by dgoulet

Reviewer: asn
Status: assignedneeds_review

The rate and burst values are very arbitrary here. See this thread for the discussion:

https://lists.torproject.org/pipermail/tor-dev/2019-May/013837.html

NOTE: *IGNORE* the token-bucket: commit since this comes from #30687.

Branch: ticket15516_042_02
PR: https://github.com/torproject/tor/pull/1061

comment:35 Changed 4 months ago by asn

I did a review on the code without considering the higher-level design here. I will think more about the numbers and such and reply to the tor-dev thread today or tomorrow.

comment:36 Changed 4 months ago by asn

Status: needs_reviewneeds_revision

comment:37 Changed 3 months ago by gaba

Owner: dgoulet deleted
Status: needs_revisionassigned

dgoulet will assign himself to the ones he is working on right now.

comment:38 Changed 3 months ago by dgoulet

Owner: set to dgoulet
Status: assignedaccepted

comment:39 Changed 3 months ago by dgoulet

I've pushed a series of fixup and squashed them together along with a new commit that adds the consensus parameters discussed prior.

Branch: ticket15516_042_02

Last thing that will need to be confirmed is the rate/burst values accepted from proposal 305.

comment:40 Changed 3 months ago by dgoulet

Status: acceptedneeds_review

comment:41 Changed 3 months ago by asn

Status: needs_reviewneeds_revision

Thanks for the revisions!

I left some more comments to the PR!

comment:42 Changed 3 months ago by dgoulet

Status: needs_revisionneeds_review

Addressed everything I hope!

Note that much of this will get complemented by #30924 (which introduces the torrc options for instance).

comment:43 Changed 3 months ago by asn

Thanks for the updates David! Only a single nit remains on the GH (and maybe also open the tokenbucket ticket so that we don't forget?).

As a further thing: I lost track of the experimental results of this ticket when I went to AllHands. I now don't rememember exactly how this ticket affects (a) the health of the network and (b) the availability of the service. Any chance you could update us on these two thigns in the tor-dev mailing list? I think it would be great to have this documented so that we know what exactly we are doing by merging this patch.

Marking as needs_revision for these last bits of action.

Thanks! :)

comment:44 Changed 3 months ago by asn

Status: needs_reviewneeds_revision

comment:45 in reply to:  43 Changed 3 months ago by dgoulet

Status: needs_revisionneeds_review

Replying to asn:

Thanks for the updates David! Only a single nit remains on the GH (and maybe also open the tokenbucket ticket so that we don't forget?).

Fixed.

New ticket: #31062

As a further thing: I lost track of the experimental results of this ticket when I went to AllHands. I now don't rememember exactly how this ticket affects (a) the health of the network and (b) the availability of the service. Any chance you could update us on these two thigns in the tor-dev mailing list? I think it would be great to have this documented so that we know what exactly we are doing by merging this patch.

Yes I can do this!

With #30924, we'll have a more complete feature and we should at that point probably blog post this entire new defense and how to leverage it.

comment:46 Changed 3 months ago by asn

Status: needs_reviewneeds_revision

Fixes and ticket looks good to me.

Let's get the tor-dev thread going and we can move forward here.

I'm marking this as needs_revision until we get the tor-dev thread so that it does not clobber my review queue.

comment:47 Changed 2 months ago by asn

Sponsor: Sponsor27-canSponsor27-must

comment:48 Changed 2 months ago by gaba

Keywords: network-team-roadmap-july added; SponsorU-deferred network-team-roadmap-2019-Q1Q2 removed

comment:49 Changed 2 months ago by dgoulet

Status: needs_revisionneeds_review

Post Stockholm meeting. Back in needs_review for asn's "go".

comment:50 Changed 7 weeks ago by asn

Status: needs_reviewneeds_revision

Looks good to me!

Unfortunately, I think this is gonna need some revision because of the latest practracker changes that caused conflicts. Marking as needs_revision, and feel free to put to merge_ready after this.

comment:51 Changed 7 weeks ago by dgoulet

Status: needs_revisionmerge_ready

Ok, rebased on master. Final PR: https://github.com/torproject/tor/pull/1208

Branch: ticket15516_042_04

Travis is feeding. But this should be ready for merge.

comment:52 Changed 7 weeks ago by dgoulet

Keywords: nickm-merge added

comment:53 Changed 7 weeks ago by nickm

Resolution: implemented
Status: merge_readyclosed

Merged!

comment:54 Changed 5 weeks ago by cypherpunks

I have deep concerns about this. It may not help against DoS at all, and NACKing in reply rather than dropping may make it worse. Although there are many of of them, the bandwidth consumed by INTRODUCE2 cells is not the main problem. The best defense in practice would likely be as described in https://lists.torproject.org/pipermail/tor-dev/2019-May/013849.html, or that, but modified so it's the service that drops them rather than the intro point. That would allow current unmodified relays to be used as intro points.

comment:55 Changed 5 weeks ago by cypherpunks

Last edited 10 days ago by cypherpunks (previous) (diff)

comment:56 in reply to:  54 Changed 5 weeks ago by asn

Replying to cypherpunks:

I have deep concerns about this. It may not help against DoS at all, and NACKing in reply rather than dropping may make it worse. Although there are many of of them, the bandwidth consumed by INTRODUCE2 cells is not the main problem. The best defense in practice would likely be as described in https://lists.torproject.org/pipermail/tor-dev/2019-May/013849.html, or that, but modified so it's the service that drops them rather than the intro point. That would allow current unmodified relays to be used as intro points.

Hello,

as you say, we doubt that this attack will help restore availability to DoSed onion services. More about this on this old thread: https://lists.torproject.org/pipermail/tor-dev/2019-April/013790.html

I also doubt that the NACK will make things worse for the health of the network since intro points were already sending an ACK anyway. And it will have no impact on the availability of the service either.

Please see ticket #31223 for approaches that will improve availability of the service.

Personally, while I'm cautiously open to PoW approaches, I doubt that they will help against a motivated adversary with a couple of GPUs, except if you also want only GPU clients to be able to visit the service. People who are experts on PoW have told me that they pretty inelegant when it comes to DoS resistance. If you feel the opposite feel free to run the numbers and let us know how it would work. In particular, tell us which PoW scheme, and what kind of users it could support, and what kind of adversaries it could stop, since from what I've seen there is no good combination here. Also, please use the mailing list for such discussions.

In any case if you don't believe in this defence you can still disable it using #30924.

Thanks! :)

Last edited 5 weeks ago by asn (previous) (diff)
Note: See TracTickets for help on using tickets.