Opened 6 months ago

Closed 4 months ago

Last modified 4 months ago

#25226 closed defect (implemented)

Circuit cell queue can fill up memory

Reported by: dgoulet Owned by: dgoulet
Priority: Medium Milestone: Tor: 0.3.3.x-final
Component: Core Tor/Tor Version:
Severity: Normal Keywords: tor-cell, tor-relay, tor-dos, 033-must, review-group-34, security, 033-triage-20180320, 033-included-20180320
Cc: sjcjonker@…, mikeperry Actual Points:
Parent ID: Points:
Reviewer: arma Sponsor:

Description

A relay operator just reported this on 0.3.3.2-alpha:

https://lists.torproject.org/pipermail/tor-relays/2018-February/014496.html

In a nutshell, the OOM fired up with these logs:

Feb 12 18:54:55 tornode2 Tor[6362]: We're low on memory (cell queues total alloc: 1602579792 buffer total alloc: 1388544, tor compress total alloc: 1586784 rendezvous cache total alloc: 489909). Killing circuits with over-long queues. (This behavior is controlled by MaxMemInQueues.)
Feb 12 18:54:56 tornode2 Tor[6362]: Removed 1599323088 bytes by killing 1 circuits; 39546 circuits remain alive. Also killed 0 non-linked directory connections.

Notice the ~1.5GB of cells for one single circuit? Somehow, there is an issue in tor that makes it possible to fill up the circuit cell queue while the scheduler is just not emptying that queue.

This really looks like the Sniper Attack: http://www.robgjansen.com/publications/sniper-ndss2014.pdf

Child Tickets

Change History (38)

comment:1 Changed 6 months ago by dgoulet

Status: new → needs_information

This is *not* trivial so I'll try to explain what I can see:

In append_cell_to_circuit_queue(), there is a check on the cell queue for a maximum size which, if reached, makes the circuit stop reading on the connection:

  /* If we have too many cells on the circuit, we should stop reading from
   * the edge streams for a while. */
  if (!streams_blocked && queue->n >= CELL_QUEUE_HIGHWATER_SIZE)
    set_streams_blocked_on_circ(circ, chan, 1, 0); /* block streams */

The very first thing set_streams_blocked_on_circ() does is this undocumented, non-trivial if statement:

  if (circ->n_chan == chan) {
    circ->streams_blocked_on_n_chan = block;
    if (CIRCUIT_IS_ORIGIN(circ))
      edge = TO_ORIGIN_CIRCUIT(circ)->p_streams;
  } else {
    circ->streams_blocked_on_p_chan = block;
    tor_assert(!CIRCUIT_IS_ORIGIN(circ));
    edge = TO_OR_CIRCUIT(circ)->n_streams;
  }

  [stop reading on all the "edge" streams]

Let's use the example where we are a Guard and we have an or_circuit_t with the "p_chan" being the client connection and the "n_chan" being the connection to the middle node.

If we set the block on n_chan, we would only stop the read() if the circuit is an origin circuit, because p_streams is only set on a tor client.

Does this mean that if we try to deliver a cell forward (n_chan), even though we've reached the high limit, we'll still queue it, because there is never a time where we check if we are blocked on "n_chan" at the relay level?

comment:2 Changed 6 months ago by sjcjonker

Cc: sjcjonker@… added

comment:3 Changed 6 months ago by cypherpunks

grep Killing results:

Jan 28 00:02:49 We're low on memory.  Killing circuits with over-long queues. (This behavior is controlled by MaxMemInQueues.)
Jan 28 00:02:49 Removed 182051760 bytes by killing 1 circuits; 36129 circuits remain alive. Also killed 0 non-linked directory connections.
Jan 27 09:03:08 We're low on memory.  Killing circuits with over-long queues. (This behavior is controlled by MaxMemInQueues.)
Jan 27 09:03:08 Removed 95846784 bytes by killing 9 circuits; 36814 circuits remain alive. Also killed 0 non-linked directory connections.
Jan 26 20:28:13 We're low on memory.  Killing circuits with over-long queues. (This behavior is controlled by MaxMemInQueues.)
Jan 26 20:28:13 Removed 178708464 bytes by killing 2 circuits; 38905 circuits remain alive. Also killed 0 non-linked directory connections.
Jan 23 07:00:33 We're low on memory.  Killing circuits with over-long queues. (This behavior is controlled by MaxMemInQueues.)
Jan 23 07:00:33  Removed 72245184 bytes by killing 18 circuits; 20919 circuits remain alive. Also killed 0 non-linked directory connections.

comment:4 Changed 6 months ago by dgoulet

Owner: set to dgoulet
Status: needs_information → accepted

comment:5 Changed 6 months ago by dgoulet

Per a discussion with armadev on IRC, this situation is unfortunately possible and the only defense we have in place is the OOM handler.

We should think about having an upper limit on any given circuit queue, above which we know we queued things we weren't supposed to if the tor client was acting normally.

comment:6 Changed 6 months ago by sjcjonker

Just an FYI, the frequency seems to be increasing, which I think has a negative effect on the entire network: in the lead-up to each event, CPU usage doubles (possibly more, but it hits the VM's maximum), the node becomes sluggish, and so does network performance through this host.

Possibly it's just phrasing, but the status and the previous comment seem to indicate this won't be fixed? Should the operator then tune MaxMemInQueues to the number of expected circuits times X KiB to ensure this doesn't have such an impact?

Feb 12 18:54:56 tornode2 Tor[6362]: Removed 1599323088 bytes by killing 1 circuits; 39546 circuits remain alive. Also killed 0 non-linked directory connections.
Feb 14 18:46:29 tornode2 Tor[13637]: Removed 519278496 bytes by killing 1 circuits; 122786 circuits remain alive. Also killed 0 non-linked directory connections.
Feb 14 19:04:21 tornode2 Tor[13637]: Removed 785149728 bytes by killing 1 circuits; 111745 circuits remain alive. Also killed 0 non-linked directory connections.
Feb 14 19:46:44 tornode2 Tor[13637]: Removed 1510173456 bytes by killing 1 circuits; 98098 circuits remain alive. Also killed 0 non-linked directory connections.
Feb 15 14:14:18 tornode1 Tor[23044]: Removed 1360147536 bytes by killing 1 circuits; 58922 circuits remain alive. Also killed 0 non-linked directory connections.
Feb 16 01:58:55 tornode1 Tor[23044]: Removed 509234880 bytes by killing 4 circuits; 57780 circuits remain alive. Also killed 0 non-linked directory connections.
Feb 16 02:07:22 tornode1 Tor[23044]: Removed 699879840 bytes by killing 1 circuits; 53007 circuits remain alive. Also killed 0 non-linked directory connections.
Feb 16 03:05:34 tornode1 Tor[23044]: Removed 1593724176 bytes by killing 2 circuits; 52509 circuits remain alive. Also killed 0 non-linked directory connections.

comment:7 Changed 6 months ago by nickm

Keywords: 033-maybe-must added

Mark some tickets as possibly belonging in 033-must.

comment:8 Changed 6 months ago by nickm

Keywords: 033-must added; 033-maybe-must removed

move 033-maybe-must into 033-must

comment:9 Changed 6 months ago by dgoulet

Status: accepted → needs_review

An upper limit on the circuit queue bound with the SENDME logic would look like this: bug25226_033_01

The gist is that if the queue size goes above the circuit window start maximum limit (1000), the circuit is closed with TORPROTOCOL reason. Assuming we ever reach that limit, it means something is wrong in the path and the edge connection keeps sending stuff even though it shouldn't.

comment:10 Changed 6 months ago by nickm

Note from IRC: I like the idea here as a band-aid. I think that the maximum should be configured via a consensus param, though, and default to something a bit higher than 1000?

Also I really want to know what Roger thinks of that idea.

In the long term, we're not going to get long-term fixes for the performance issues caused by being low on memory of this until we can better identify memory pressure: either by setting MaxMemInQueues more aggressively, working with the OS/allocator somehow, asking the user to configure MaxMemInQueues to a lower value, or something like that.

comment:11 in reply to:  9 ; Changed 6 months ago by cypherpunks

Replying to dgoulet:

An upper limit on the circuit queue bound with the SENDME logic would look like this: bug25226_033_01

The gist is that if the queue size goes above the circuit window start maximum limit (1000), the circuit is closed with TORPROTOCOL reason. Assuming we ever reach that limit, it means something is wrong in the path and the edge connection keeps sending stuff even though it shouldn't.

tor-spec.txt:

To control a circuit's bandwidth usage, each OR keeps track of two 'windows', consisting of how many RELAY_DATA cells it is allowed to originate (package for transmission), and how many RELAY_DATA cells it is willing to consume (receive for local streams). These limits do not apply to cells that the OR receives from one host and relays to another.

tor-design.pdf:

Leaky-pipe circuit topology: Through in-band signaling within the circuit, Tor initiators can direct traffic to nodes partway down the circuit. This novel approach allows traffic to exit the circuit from the middle — possibly frustrating traffic shape and volume attacks based on observing the end of the circuit. (It also allows for long-range padding if future research shows this to be worthwhile.)

comment:12 in reply to:  11 Changed 6 months ago by teor

Replying to cypherpunks:

Replying to dgoulet:

An upper limit on the circuit queue bound with the SENDME logic would look like this: bug25226_033_01

The gist is that if the queue size goes above the circuit window start maximum limit (1000), the circuit is closed with TORPROTOCOL reason. Assuming we ever reach that limit, it means something is wrong in the path and the edge connection keeps sending stuff even though it shouldn't.

tor-spec.txt:

To control a circuit's bandwidth usage, each OR keeps track of two 'windows', consisting of how many RELAY_DATA cells it is allowed to originate (package for transmission), and how many RELAY_DATA cells it is willing to consume (receive for local streams). These limits do not apply to cells that the OR receives from one host and relays to another.

We can change this part of the spec.

tor-design.pdf:

Leaky-pipe circuit topology: Through in-band signaling within the circuit, Tor initiators can direct traffic to nodes partway down the circuit. This novel approach allows traffic to exit the circuit from the middle — possibly frustrating traffic shape and volume attacks based on observing the end of the circuit. (It also allows for long-range padding if future research shows this to be worthwhile.)

Standard Tor clients do not use this feature, but they get close:

  • connection netflow padding is at link level, not circuit level
  • client introduce circuits send a single INTRODUCE1 cell, then EXTEND if the introduction fails

comment:13 Changed 6 months ago by cypherpunks

We can change this part of the spec.

¯\_(ツ)_/¯

comment:14 Changed 5 months ago by nickm

Keywords: review-group-34 added

comment:15 Changed 5 months ago by dgoulet

Status: needs_review → assigned

comment:16 Changed 5 months ago by nickm

Keywords: security added

comment:17 Changed 5 months ago by nickm

Keywords: 033-triage-20180320 added

Marking all tickets reached by current round of 033 triage.

comment:18 Changed 5 months ago by nickm

Keywords: 033-included-20180320 added

Mark 033-must tickets as triaged-in for 0.3.3

comment:19 Changed 5 months ago by dgoulet

This has been discussed at the Rome meeting with arma and Robjansen.

The gist is that tor does keep a rolling count of circuit-level SENDMEs at each hop on the circuit, because the tor protocol allows the client to exit at any hop on the circuit.

However, under normal circumstances, it is end-to-end from client to Exit. Considering the top limit of 8 hops per circuit, the circuit cell queue should be at most 8000 cells (circuit window start * 8 hops).

I'll submit a new version of the branch for this.

comment:20 Changed 5 months ago by dgoulet

Datapoint: #9072

comment:21 Changed 5 months ago by dgoulet

Reviewer: armadev
Status: assignedneeds_review

As an attempt, see branch: bug25226_033_01.

I think we still need to figure out possibly a better default value or at the very least a consensus parameter that makes sense.

comment:22 Changed 5 months ago by dgoulet

Reviewer: armadev → arma

comment:23 in reply to:  1 Changed 5 months ago by arma

Replying to dgoulet:

This is *not* trivial so I'll try to explain what I can see:

In append_cell_to_circuit_queue(), there is a check on the cell queue for a maximum size which, if reached, makes the circuit stop reading on the connection:

Careful here! You're right, but by "connection" you mean edge connection, or stream. Not TLS ("OR") connection.

Let's use the example where we are a Guard and we have an or_circuit_t with the "p_chan" being the client connection and the "n_chan" being the connection to the middle node.

If we set the block on n_chan, we would only stop the read() if the circuit is an origin circuit, because p_streams is only set on a tor client.

Correct. That whole set_streams_blocked_on_circ() business is for *streams*, i.e. edge connections. Specifically, origin streams when we're a client, or exit streams when we're an exit.

Does this mean that if we try to deliver a cell forward (n_chan), even though we've reached the high limit, we'll still queue it, because there is never a time where we check if we are blocked on "n_chan" at the relay level?

Yes.

comment:24 in reply to:  20 ; Changed 5 months ago by arma

Cc: mikeperry added

Replying to dgoulet:

Datapoint: #9072

This old ticket is really important here, because there was an earlier ticket (#9063) that proposed to limit the number of cells in-flight on a circuit, and #9072 is arguing that it opens up a bad guard discovery attack.

Mind you, when I opened #9072, it was simply about how our protocol actually allows a huge number of cells in-flight, because non-data cells don't count in the sendme windows, so there is no easy small number where if a circuit goes over you know it's violating protocol.

Mike was the one who retitled my ticket to be about guard discovery attacks. He says:

The attack enabled by #9063 is extremely similar to the Guard discovery attack from rpw's paper.

I wonder what the guard discovery attack actually is? Nobody says what it is on that ticket.

So it would seem smart to reconstruct what it was we were worried about, to figure out if we should still be worried.

I am bringing Mikeperry back into this ticket so he can channel his five-year-earlier self and tell us what the danger is.

comment:25 in reply to:  24 Changed 5 months ago by arma

Replying to arma:

our protocol actually allows a huge number of cells in-flight, because non-data cells don't count in the sendme windows

One of the proposed longer term fixes here is to make all kinds of relay cells count in the sendme windows. Then you wouldn't have these weird side channel issues.

The first problem to solve if we want to do that is the deadlock question: if both sides have exhausted their package window, then nobody can acknowledge anything, and they're stuck. This one could be solved by making only sendme cells be an exception to sendme windows, which still lets us keep to a quite limited number of in-flight cells.

After that one, there might be other issues, like "I can't send these end cells until I've gotten a sendme from you", or "I can't send this begin cell until I've gotten a sendme", which aren't the end of the world but might produce some surprising usability issues.

It's worth exploring more in case we can convince ourselves the benefits outweigh the costs!

comment:26 in reply to:  24 Changed 5 months ago by mikeperry

Replying to arma:

Replying to dgoulet:

Datapoint: #9072

This old ticket is really important here, because there was an earlier ticket (#9063) that proposed to limit the number of cells in-flight on a circuit, and #9072 is arguing that it opens up a bad guard discovery attack.

Mind you, when I opened #9072, it was simply about how our protocol actually allows a huge number of cells in-flight, because non-data cells don't count in the sendme windows, so there is no easy small number where if a circuit goes over you know it's violating protocol.

Mike was the one who retitled my ticket to be about guard discovery attacks. He says:

The attack enabled by #9063 is extremely similar to the Guard discovery attack from rpw's paper.

I wonder what the guard discovery attack actually is? Nobody says what it is on that ticket.

So it would seem smart to reconstruct what it was we were worried about, to figure out if we should still be worried.

I am bringing Mikeperry back into this ticket so he can channel his five-year-earlier self and tell us what the danger is.

The attack in #9072 came from allowing a small, fixed number of in-flight stream creates to kill the circuit. If this number is small and predictable, a malicious website can create that many streams to cause the circuit to fail, at which point Tor will rebuild a new one on a new path. The website gets to rinse and repeat, creating enough circuits until one of its middles is chosen next to the guard.

comment:27 in reply to:  21 Changed 5 months ago by arma

Replying to dgoulet:

As an attempt, see branch: bug25226_033_01.

I think we still need to figure out possibly a better default value or at the very least a consensus parameter that makes sense.

Big picture review: I think we should proceed with doing this feature, even though we can't really pick a low threshold yet.

I see three benefits for putting this feature in:

  • We should pick a really high threshold for the consensus, like 50000 cells or 100000 cells, which is essentially at the "oom attempt" level, and now we're killing circuits when they overload us a lot, without needing to wait until we're actually running out of memory, and without needing to have our reaction be a function of how much memory the relay has.

I was originally going to say "I don't think there's any number where we should set this in the consensus right now on the main Tor network," but I think at the 50k or 100k cell mark, even if somebody is following the protocol, we could still kill the circuit "because fairness".

  • If things go to shit in the future and people start doing bad things to the network that we're not expecting right now, then this would be another available tool for letting relays defend themselves. Shipping it out now will mean it's in place if we decide we need it.
  • The test networks, where they know the client and website traffic behaviors, can set it to a much lower value, and use it for debugging when they hit the threshold.

For that last one, there are really two things we want to understand here. First, what are the limits on acceptable behavior by "honest" users? That is, what is the threshold above which we say "no honest user would attempt that". And second, are there bugs or surprises in our current design that cause us to hit a higher threshold than we meant to? And it's that second one that a good network testing harness, plus this ticket, can discover.

comment:28 Changed 5 months ago by dgoulet

To summarize arma: we should get this in, keep the current default in the code as is, but use a much larger one network-wide (consensus). Also, you seem to be suggesting that it would be useful for relay operators to control that value somehow, which means it would also need to become a torrc option? (The patch doesn't have that.)

50k cells is ~25MB, so very little; 100k cells would be 50MB. I've seen 8001 cells *once* on my fast relay in the last 7 days, so 50k cells seems plenty.

comment:29 in reply to:  28 Changed 5 months ago by arma

Replying to dgoulet:

But also, you seem to be suggesting that it would be useful for relay operator to maybe control that value somehow which means it needs to become also a possible torrc option? (the patch doesn't have that).

Ah, no, I think the way we'd let relays defend themselves is just by turning the knob down on the network-wide consensus value, so relays automatically defend themselves more aggressively.

comment:30 Changed 5 months ago by arma

Status: needs_review → needs_revision

Ok, review of the actual branch:

+  if (PREDICT_UNLIKELY(queue->n > max_circuit_cell_queue_size)) {

This wants to be >=, right? Otherwise we let it grow one past the maximum.

and then in the log message, s/cell/cells/.

The default if we don't have a consensus param is 9000? I'm a bit nervous about edge cases where we somehow fail to set a consensus param, or when relays act before they know a consensus param, especially in the future once we've rolled out Mike's "defensive dropping" website fingerprinting design. What would you say to having a default of 50000? Since that's what we're imagining to set as the default for now anyway.

Also, the minimum settable number is 8000? It seems like we want to allow a lower number for chutney experiments where we control all of the traffic and we're trying to check for bugs. I think 1000 might be a little bit too low, for example because if you ask for two streams and they finish, you'll get 500+500+1+1=1002 cells coming at you from the exit. Or if you're using optimistic data, you'll get the CONNECTED cells back first, bumping it up to more like 1004. But still, I think setting the minimum to 1000 is a legitimate choice, since that makes it easy for experimentors to experiment however they see fit. We should just avoid setting the minimum possible value in the consensus, but that's already true for other consensus params, e.g. we should avoid setting DOS_CC_CIRCUIT_BURST_DEFAULT to 1 even though it's allowed by the code.

And, now that we understand more what counts in the sendme window and what doesn't, it might be smart to revise the huge comment to be more accurate.

And, a changes file. :)

comment:31 in reply to:  30 Changed 4 months ago by dgoulet

Status: needs_revision → needs_review

Replying to arma:

Ok, review of the actual branch:

+  if (PREDICT_UNLIKELY(queue->n > max_circuit_cell_queue_size)) {

This wants to be >=, right? Otherwise we let it grow one past the maximum.

and then in the log message, s/cell/cells/.

Fixup commit ed963d7604

The default if we don't have a consensus param is 9000? I'm a bit nervous about edge cases where we somehow fail to set a consensus param, or when relays act before they know a consensus param, especially in the future once we've rolled out Mike's "defensive dropping" website fingerprinting design. What would you say to having a default of 50000? Since that's what we're imagining to set as the default for now anyway.

I'm fine with it. 50k cells is 25MB so ~40 circuits like that would start putting memory pressure at 1GB. Not many circuits but even at 9k limit, the number of circuits would be a bit less than 250 which is not much either.

Fixup commit 7d25850415

Also, the minimum settable number is 8000? It seems like we want to allow a lower number for chutney experiments where we control all of the traffic and we're trying to check for bugs. I think 1000 might be a little bit too low, for example because if you ask for two streams and they finish, you'll get 500+500+1+1=1002 cells coming at you from the exit. Or if you're using optimistic data, you'll get the CONNECTED cells back first, bumping it up to more like 1004. But still, I think setting the minimum to 1000 is a legitimate choice, since that makes it easy for experimentors to experiment however they see fit. We should just avoid setting the minimum possible value in the consensus, but that's already true for other consensus params, e.g. we should avoid setting DOS_CC_CIRCUIT_BURST_DEFAULT to 1 even though it's allowed by the code.

Fixup commit 497954459d

And, now that we understand more what counts in the sendme window and what doesn't, it might be smart to revise the huge comment to be more accurate.

Fixup commit d4ee02a714

And, a changes file. :)

Fixup commit f87c2ff62a

All the above have been pushed to my _01 branch. I have a _02 branch that squashes everything but also changes one commit message to reflect the changes made above.

Branch (with fixup): bug25226_033_01
Branch (squashed): bug25226_033_02

comment:32 Changed 4 months ago by cypherpunks

Branch (with fixup): bug25226_033_01
Branch (squashed): bug25226_033_02

#define RELAY_CIRC_CELL_QUEUE_SIZE_MIN CIRCWINDOW_START_MAX
/* Default value is set to a large value so we can handle padding cells
 * properly which aren't accounted for in the SENDME window. Default is 50000
 * allowed cells in the queue resulting in ~25MB. */
#define RELAY_CIRC_CELL_QUEUE_SIZE_DEFAULT \
  (50 + RELAY_CIRC_CELL_QUEUE_SIZE_MIN)

(50 + RELAY_CIRC_CELL_QUEUE_SIZE_MIN)

comment:33 Changed 4 months ago by dgoulet

Whoa, GREAT catch! Thanks for this, cypherpunks.

I've pushed a fixup in the _01 and _02 has been rebased squashed:

Branch (with fixup): bug25226_033_01
Branch (squashed): bug25226_033_02

comment:34 Changed 4 months ago by arma

Status: needs_review → needs_revision

Tiny but potentially fun bug:

+static int32_t max_circuit_cell_queue_size;

That starts it out at 0, right?

So until the relay gets (or has) a consensus (e.g. because it just restarted and it's trying to bootstrap but somebody starts using it anyway), any circuits get closed if they get a cell?

Maybe we should initialize it to RELAY_CIRC_CELL_QUEUE_SIZE_DEFAULT at the very start?

Other than this one, looks great!

comment:35 in reply to:  34 Changed 4 months ago by dgoulet

Status: needs_revision → merge_ready

Replying to arma:

Tiny but potentially fun bug:

+static int32_t max_circuit_cell_queue_size;

That starts it out at 0, right?

Well that is a great catch! lol...

So until the relay gets (or has) a consensus (e.g. because it just restarted and it's trying to bootstrap but somebody starts using it anyway), any circuits get closed if they get a cell?

Maybe we should initialize it to RELAY_CIRC_CELL_QUEUE_SIZE_DEFAULT at the very start?

Fixup commit: 3d1243f6f2a42324

Again:

Branch (with fixup): bug25226_033_01
Branch (squashed): bug25226_033_02

Going in merge_ready for nickm's eyes.

comment:36 Changed 4 months ago by nickm

Resolution: implemented
Status: merge_ready → closed

LGTM; merging to 0.3.3 and forward.

comment:37 Changed 4 months ago by dgoulet

Spec change: ticket25226_01.

It mentions 0.3.3.6-rc which isn't released yet so we can either merge it now or when we release. No strong opinion.

comment:38 Changed 4 months ago by nickm

merged that too; thanks!
