Opened 4 years ago

Last modified 5 days ago

#15516 needs_revision enhancement

Consider rate-limiting INTRODUCE2 cells when under load

Reported by: special Owned by: dgoulet
Priority: Medium Milestone: Tor: unspecified
Component: Core Tor/Tor Version:
Severity: Normal Keywords: SponsorU-deferred, tor-dos, tor-hs, network-team-roadmap-2019-Q1Q2
Cc: yawning Actual Points:
Parent ID: #29999 Points: 10
Reviewer: asn Sponsor: Sponsor27-must

Description

In #15463, we're seeing an effective denial of service against a HS with a flood of introductions. The service falls apart trying to build rendezvous circuits, resulting in 100% CPU usage, many failed circuits, and impact on the guard.

We should consider dropping INTRODUCE2 cells when the HS is under too much load to build rendezvous circuits successfully. It's much better if the HS response in this situation is predictable, instead of hammering at the guard until something falls down.

One option is to add a HSMaxConnectionRate(?) option defining the number of INTRODUCE2 we would accept per 10(?) minutes, maybe with some bursting behavior. It's unclear what a useful default value would be.

We could try to use a heuristic based on when rend circuits start failing, but it's not obvious to me how that would work.

Child Tickets

TicketTypeStatusOwnerSummary
#30687enhancementclosedImplement a generic counter token bucket

Change History (47)

comment:1 Changed 4 years ago by asn

Cc: sjmurdoch yawning added

We should probably queue INTRODUCE2 cells, and act on them the best we can. If the queue grows too big (we are under DoS), we should drop cells enough so that we (and our guard) can handle the load.

This seems like queuing theory stuff, and specifically active queue management. Yawning suggested looking into algorithms like Stochastic Fair Blue and CoDeL .

comment:2 Changed 4 years ago by arma

I'd actually like to some exploration of initial throttling or dropping or queueing at the intro point as well. That was originally meant to be the first line of defense here.

(In a related design, the hs might consider which intro points the intro2 cells are arriving from, and if they're all arriving from one intro point, take that into account.)

comment:3 Changed 4 years ago by nickm

Milestone: Tor: 0.2.7.x-final
Priority: normalmajor

comment:4 Changed 4 years ago by dgoulet

Milestone: Tor: 0.2.7.x-finalTor: 0.2.8.x-final

comment:5 Changed 4 years ago by nickm

Sponsor: SponsorU

Assigning to SponsorU as a likely anti-dos measure. Could also be SponsorR

comment:6 Changed 4 years ago by nickm

Points: medium

comment:7 Changed 3 years ago by nickm

Milestone: Tor: 0.2.8.x-finalTor: 0.2.9.x-final

It is impossible that we will fix all 226 currently open 028 tickets before 028 releases. Time to move some out. This is my second pass through the "new" and tickets, looking for things to move to 0.2.9.

comment:8 Changed 3 years ago by isabela

Sponsor: SponsorUSponsorU-can

comment:9 Changed 3 years ago by nickm

Priority: HighMedium

comment:10 Changed 3 years ago by isabela

Milestone: Tor: 0.2.9.x-finalTor: 0.2.???

tickets market to be removed from milestone 029

comment:11 Changed 3 years ago by nickm

Parent ID: #15463#17293

comment:12 Changed 3 years ago by nickm

Keywords: SponsorU-deferred added
Sponsor: SponsorU-can

Remove the SponsorU status from these items, which we already decided to defer from 0.2.9. add the SponsorU-deferred tag instead in case we ever want to remember which ones these were.

comment:13 Changed 3 years ago by nickm

Keywords: tor-dos added

comment:14 Changed 3 years ago by nickm

Parent ID: #17293

Unparenting these from #17293; holding for future work.

comment:15 Changed 3 years ago by teor

Milestone: Tor: 0.2.???Tor: 0.3.???

Milestone renamed

comment:16 Changed 3 years ago by nickm

Keywords: tor-03-unspecified-201612 added
Milestone: Tor: 0.3.???Tor: unspecified

Finally admitting that 0.3.??? was a euphemism for Tor: unspecified all along.

comment:17 Changed 2 years ago by nickm

Keywords: tor-03-unspecified-201612 removed

Remove an old triaging keyword.

comment:18 Changed 2 years ago by nickm

Keywords: tor-hs added
Severity: Normal

comment:19 Changed 2 years ago by sjmurdoch

Cc: sjmurdoch removed

comment:20 Changed 2 years ago by nickm

Sponsor: SponsorV-can

comment:21 Changed 4 months ago by asn

Points: medium10
Sponsor: SponsorV-canSponsor27-can

Stealing from V to 27 due to relation with the DoS objective.

comment:22 Changed 3 months ago by asn

Parent ID: #26294

comment:23 Changed 3 months ago by asn

Closed #15463 (and children #13738, #13739 #15540) in favor of this ticket.

comment:24 Changed 3 months ago by asn

Summary: Consider dropping INTRODUCE2 cells when under loadConsider rate-limiting INTRODUCE2 cells when under load

comment:25 Changed 3 months ago by asn

Parent ID: #26294

comment:26 Changed 3 months ago by asn

Parent ID: #29999

comment:27 Changed 3 months ago by gaba

Keywords: network-team-roadmap-2019-Q1Q2 added

Add keyword to tickets in network team's roadmap.

comment:28 in reply to:  2 Changed 3 months ago by arma

Replying to arma:

I'd actually like to some exploration of initial throttling or dropping or queueing at the intro point as well. That was originally meant to be the first line of defense here.

Here's my concrete proposal on this one: the intro point should see if the package window for the intro circuit is empty, and if so, it should nack the intro1 cell. That way there are at most 1000 intro2 cells in flight at once from that intro point.

This design is reasonable because it takes a long while for an onion service to process 1000 intro2 cells, so if we queue later ones and send them 'eventually', they're going to arrive much later, and the client will likely have timed out and moved on from that rendezvous point. So we're not harming legitimate clients who end up in this situation, because the current behavior is already harming them plenty.

The benefits are that (a) the onion service doesn't receive the excess intro2 cells that it wasn't going to be able to rendezvous with anyway, (b) clients get a much faster feedback that things aren't going to work so they can move to another intro point, and (c) when a DoS stops, the pain stops soon after: there isn't a huge queue of waiting intro2 cells that have to slowly drain, for no value.

We could imagine an extension on this idea, where the intro point silently drops the excess intro1 cells, rather than explicitly nacking them. This variant will force the client to time out rather than immediately try the next intro point, thus slowing down attacks by clients that follow the protocol. (Modified clients could still use a smaller timeout, or not even care whether they get a response.)

comment:29 Changed 3 months ago by arma

Another idea I was considering here, but ultimately abandoned as more complex than we need, was to somehow timestamp the intro1 cell when it gets received at the intro point, which would allow the onion service to examine how many seconds have passed and discard it if it's more than n seconds ago. That would essentially mean that we have n *seconds* of valid intro2 cells in flight, rather than at-most-n *circwindows* of intro2 cells in flight. This approach would handle congestion that happens inside the network (between the intro point and the service), in that if it takes a long time for the intro2 cell to make it from the intro point to the onion service, it's less likely that the client is still around and waiting for the connect-back.

But how exactly to do the timestamp, and how and whether we need to synchronize clocks, made this too klunky an idea.

comment:30 Changed 3 months ago by asn

Owner: set to dgoulet
Status: newassigned

comment:31 Changed 2 months ago by dgoulet

Implementing comment:28 proposal: ticket15516_042_01. Unfortunately, this can't work without #30440...

comment:32 Changed 2 months ago by dgoulet

Because #30440 won't be a mature thing in the network for many years to come, we can only use the "package_window" proposal once it is.

So until then, we'll use a token bucket system, add knobs in the consensus (like the dos.c subsystem) and go on from there. Not sure how we are going to come up with the values but they need to be large enough so it doesn't affect legit busy HS.

comment:33 Changed 7 weeks ago by dgoulet

Plan B activated. Here is the development branch: ticket15516_042_02.

So far working on the testing side. Requires unit tests and most likely better values for the rate/burst of the INTRO2.

I still want to explore the idea of putting these knobs in the ESTABLISH_INTRO cell so an operator can tweak them at the intro point.

comment:34 Changed 7 weeks ago by dgoulet

Reviewer: asn
Status: assignedneeds_review

The rate and burst values are very arbitrary here. See this thread for the discussion:

https://lists.torproject.org/pipermail/tor-dev/2019-May/013837.html

NOTE: *IGNORE* the token-bucket: commit since this comes from #30687.

Branch: ticket15516_042_02
PR: https://github.com/torproject/tor/pull/1061

comment:35 Changed 6 weeks ago by asn

I did a review on the code without considering the higher-level design here. I will think more about the numbers and such and reply to the tor-dev thread today or tomorrow.

comment:36 Changed 6 weeks ago by asn

Status: needs_reviewneeds_revision

comment:37 Changed 5 weeks ago by gaba

Owner: dgoulet deleted
Status: needs_revisionassigned

dgoulet will assign himself to the ones he is working on right now.

comment:38 Changed 5 weeks ago by dgoulet

Owner: set to dgoulet
Status: assignedaccepted

comment:39 Changed 4 weeks ago by dgoulet

I've pushed a series of fixup and squashed them together along with a new commit that adds the consensus parameters discussed prior.

Branch: ticket15516_042_02

Last thing that will need to be confirmed is the rate/burst values accepted from proposal 305.

comment:40 Changed 3 weeks ago by dgoulet

Status: acceptedneeds_review

comment:41 Changed 3 weeks ago by asn

Status: needs_reviewneeds_revision

Thanks for the revisions!

I left some more comments to the PR!

comment:42 Changed 3 weeks ago by dgoulet

Status: needs_revisionneeds_review

Addressed everything I hope!

Note that much of this will get complemented by #30924 (which introduces the torrc options for instance).

comment:43 Changed 2 weeks ago by asn

Thanks for the updates David! Only a single nit remains on the GH (and maybe also open the tokenbucket ticket so that we don't forget?).

As a further thing: I lost track of the experimental results of this ticket when I went to AllHands. I now don't rememember exactly how this ticket affects (a) the health of the network and (b) the availability of the service. Any chance you could update us on these two thigns in the tor-dev mailing list? I think it would be great to have this documented so that we know what exactly we are doing by merging this patch.

Marking as needs_revision for these last bits of action.

Thanks! :)

comment:44 Changed 2 weeks ago by asn

Status: needs_reviewneeds_revision

comment:45 in reply to:  43 Changed 2 weeks ago by dgoulet

Status: needs_revisionneeds_review

Replying to asn:

Thanks for the updates David! Only a single nit remains on the GH (and maybe also open the tokenbucket ticket so that we don't forget?).

Fixed.

New ticket: #31062

As a further thing: I lost track of the experimental results of this ticket when I went to AllHands. I now don't rememember exactly how this ticket affects (a) the health of the network and (b) the availability of the service. Any chance you could update us on these two thigns in the tor-dev mailing list? I think it would be great to have this documented so that we know what exactly we are doing by merging this patch.

Yes I can do this!

With #30924, we'll have a more complete feature and we should at that point probably blog post this entire new defense and how to leverage it.

comment:46 Changed 2 weeks ago by asn

Status: needs_reviewneeds_revision

Fixes and ticket looks good to me.

Let's get the tor-dev thread going and we can move forward here.

I'm marking this as needs_revision until we get the tor-dev thread so that it does not clobber my review queue.

comment:47 Changed 5 days ago by asn

Sponsor: Sponsor27-canSponsor27-must
Note: See TracTickets for help on using tickets.