Opened 2 months ago

Last modified 47 minutes ago

#29494 needs_revision enhancement

Optimize interaction between circuitmux and circuitpadding

Reported by: mikeperry Owned by: mikeperry
Priority: High Milestone: Tor: 0.4.1.x-final
Component: Core Tor/Tor Version:
Severity: Normal Keywords: wtf-pad, network-team-roadmap-2019-Q1Q2
Cc: mikeperry Actual Points:
Parent ID: #28634 Points: 5
Reviewer: Sponsor: Sponsor2


In #29204, we realized that we needed to inspect the circuit queue to prevent the circuit padding system from queuing too much padding. For that ticket, we're going to do the simple fix -- we don't schedule padding if the circuit's cell queue already has a window full of cells on it. In that case, we just lie to circpad and tell the machine we sent padding even though we did not.

However, we should determine if we want to have a more complicated interaction, such as delaying our padding until the queue drains. We should also examine creating alternate machine events for when cells are dequeued from circuitmux vs when they are sent locally. This has pluses and minuses for machine interaction.

Child Tickets

Change History (8)

comment:1 Changed 8 weeks ago by mikeperry

Ok, so for simplicity, lying to circpad seems like the right move. If the queue is full, its going to be wrong anyway..

However, the second piece of this ticket is about when to tell circpad that a padding cell or normal cell was sent. Right now, we tell circpad that a padding cell was sent as soon as its callback to send it fires (via circpad_cell_event_padding_sent() from circpad_send_padding_cell_for_callback()). With respect to the wire, this is incorrect because the cell may still wait around in a circuitmux queue for a long time before being sent. This delay may cause the padding system to decide to send another padding packet before the one it just sent even hits the wire.

Similarly, we call circpad_cell_event_nonpadding_sent() from circpad_deliver_sent_relay_cell_events() (via relay_send_command_from_edge_()) and from circpad_deliver_unrecognized_cell_events() (via circuit_receive_relay_cell()). These calls are *also* before the cell is added to the circuitmux queue. This means that circuitpadding may schedule padding timers for additional padding that fire before the actual data packet makes it through the queue.

In both cases, this will lead to more padding added to the circuit than we wanted to. Worse, it means more padding added to *already busy, saturated, and/or stalled circuits*, which is...not a good look.

I think the right solution is to make these circpad_cell_event_*_sent() callbacks from channel_flush_from_first_active_circuit(), possibly right after the cell is popped from the queue. This will be about as accurate as we can get, because right after this it goes into the outbuf and should get sent by KIST within ~10ms, give or take TCP congestion.

The difficulty with this is that circpad needs to know if this cell is padding or not. Since the cell has already been encrypted, we can't just inspect the relay command field anymore. We need to add an additional bitfield to packed_cell_t. 1 bit should suffice, though. All we need to know is is_padding or not.. But we need to set that bitfield in the relay_send_command_from_edge_() call, where the relay cell padding command is set. Otherwise the bitfield flag should remain 0.

comment:2 Changed 8 weeks ago by mikeperry

Draft branch: Tests fail due to event changes. I want to mull this over a little more, but figured I'd post it for reference.

comment:3 Changed 7 weeks ago by mikeperry

Thoughts since Friday:

  1. Because we decrement tokens early but don't check for bins empty until the padding event is sent, we can have a situation where another padding or non-padding cell causes us to decrement tokens on the same bin before a check, eventually leading to a negative token count in that bin. Right now this will trigger a backtrace/fragile assert.
  2. A further optimization is not to send padding if there is at least one cell in the cell queue, because the idea of padding is to append cells to a burst, not place them in the middle of one that is sitting in the queue. However, we'll have to figure out how to update token counts properly in this case, as well. And then there is the question as to if it does make sense to allow a string of padding cells to get queued up. It actually might. But then how much? Perhaps this should be a third ticket.

comment:4 Changed 7 weeks ago by mikeperry

Priority: MediumHigh

comment:5 Changed 7 weeks ago by dgoulet

Ok two comments on the commit. No show stopper, overall the approach makes sense. I have no idea what the overhead here can be at each cell so if you have a clue, maybe noting it here or adding a comment so we have an idea how much work that is adding at each cell.

comment:6 Changed 6 weeks ago by gaba

Keywords: network-team-roadmap-2019-Q1Q2 added

comment:7 Changed 9 days ago by nickm

Cc: mikeperry added
Milestone: Tor: 0.4.1.x-final
Status: assignedneeds_revision

Tentatively calling this 0.4.1, needs_revision? Mike, please adjust if it isn't.

comment:8 Changed 47 minutes ago by mikeperry

Parent ID: #28634

This is not strictly needed for #28634 or 041, but it will change all of our padding vs non-padding relative timing delays, which may affect classifier accuracy. For apples-to-apples comparisons for future attacks+defenses wrt #28634. we should get this in at the same time as it lands.

So yes, I agree. 041.

Note: See TracTickets for help on using tickets.