Opened 9 months ago

Last modified 3 months ago

#23712 accepted defect

sched: DESTROY cell on a circuit bypasses the scheduler

Reported by: dgoulet Owned by: dgoulet
Priority: Medium Milestone: Tor: unspecified
Component: Core Tor/Tor Version:
Severity: Normal Keywords: tor-sched, 034-triage-20180328, 034-removed-20180328
Cc: Actual Points:
Parent ID: #23993 Points:
Reviewer: Sponsor: SponsorV

Description

If you look at circuitmux_append_destroy_cell(), it is the one appending a DESTROY cell to the cmux queue and then calls channel_flush_from_first_active_circuit() if no writes are pending that is if the outbuf is empty (also looks at the out queue but that is always empty #23709).

In the case the flush is triggered, the cell is immediately put in the outbuf and written to kernel by libevent which completely bypasses the scheduler. Maybe it is what we want that is go as fast as we can in destroying a circuit? Don't know but it has this effect on the scheduler where the channel is scheduled with a "wants_to_write" event from the connection subsystem and ultimately the channel gets scheduled with nothing in the queue because it is already on the outbuf. For KIST, this is not ideal because KIST should control the flow of data to the kernel.

It seems there are two places we queue cells into a cmux queue: circuitmux_append_destroy_cell() and append_cell_to_circuit_queue(). The latter triggers a "has waiting cells" for the scheduler which is what we want but the former just bypasses it.

I think it should simply trigger that notify to the scheduler instead of flushing it by itself.

Child Tickets

Change History (16)

comment:1 Changed 9 months ago by dgoulet

Status: newneeds_review

See branch: bug23712_032_01.

Ok this might look like a very simple fix but it is really not.

For KIST, this is what we _need_ to do else this process of appending a DESTROY cell bypasses the scheduler and can trigger KIST to try to flush an empty channel which could be one of the source of #23676. The following lists the possible channel state it can be when the destroy cell is appended and the scheduler notified:

  1. if in WAITING_FOR_CELLS, all good, we will get scheduled
  2. KIST can't let unscheduled channel in WAITING_TO_WRITE so we can't be calling "waiting for cells" on a "waiting to write" channel.
  3. KIST doesn't put channel in IDLE state
  4. PENDING state, can't escape the scheduler, like (ii).

For the Vanilla kernel:

  1. if in WAITING_FOR_CELLS, all good, we will get scheduled
  2. if in WAITING_TO_WRITE, it means we have data in the queue but can't put them in the outbuf because it is full. The flushed_some() function will call the "wants to write" event when the outbuf is flushed to the network and thus the DESTROY cell will ultimately get drained.
  3. PENDING state, can't escape the scheduler.
  4. if in the IDLE state, it means we have no more cell in the queue and the outbuf is full, the channel is put in WAITING_TO_WRITE state so (ii) applies to this case that is the "wants to write" event will trigger a flush of the destroy cell at some point when the outbuf is drained.

comment:2 Changed 9 months ago by dgoulet

One quick note here. When a DESTROY cell is sent, it is from circuit_about_to_free() which means that the "circuit_t" will get freed momentarily. However, we put a cell on the circuit queue for which we should be updating our cmux cell count (circuitmux_set_num_cells()) but that function takes a circuit object that we can't use... The circ ID is available but we can't do much with it with the current interface.

That is "OK" because for the scheduler to know if more needs to be flushed, it uses circuitmux_num_cells() which counts the cell + destroy cell.

comment:3 Changed 9 months ago by nickm

Owner: set to dgoulet
Status: needs_reviewassigned

setting owner

comment:4 Changed 9 months ago by nickm

Status: assignedneeds_review

comment:5 Changed 8 months ago by nickm

Keywords: review-group-24 added

review-group-24 is now open.

comment:6 Changed 8 months ago by nickm

Milestone: Tor: 0.3.2.x-finalTor: 0.3.3.x-final

deferring after talking with dgoulet: this probably isn't critical for 0.3.2

comment:7 Changed 8 months ago by dgoulet

Sponsor: SponsorV

Tagging all scheduler (and KIST) related tickets for SponsorV

comment:8 Changed 8 months ago by nickm

Status: needs_reviewmerge_ready

I'm okay with this change, but first: has this been tested enough to make sure that these properties hold?

1) In the case where the channel has no currently queued cells, the DESTROY cell should get sent soon, and not wait for some other activity to trigger the scheduler.
2) DESTROY cells should really get sent.

It looks to me like they should, given the code, but it would be good to be sure, since these are the kind of issues where if we got them wrong, we might not notice for a while.

comment:9 Changed 8 months ago by dgoulet

Many things needs to be confirmed in terms of cell ordering properties and assumptions from channels to the scheduler.

I plan to do this as part of a broader set of changes that needs to happen since KIST has been added. I'm scheduled by November to do this so expect it in 033.

comment:10 Changed 8 months ago by dgoulet

Parent ID: 23993

comment:11 Changed 8 months ago by dgoulet

Parent ID: 23993#23993

comment:12 Changed 8 months ago by dgoulet

Status: merge_readyneeds_revision

Not merge_ready at all... I thought I had change its status!...

comment:13 Changed 7 months ago by dgoulet

Keywords: review-group-24 removed
Milestone: Tor: 0.3.3.x-finalTor: 0.3.4.x-final
Status: needs_revisionaccepted

The proposed branch won't work properly so putting this one back in accepted state. The solution is much more involving and possibly requires a bit of research. Moving this one to 034 as there is little chance to fix this in 033.

comment:14 Changed 3 months ago by nickm

Keywords: 034-triage-20180328 added

comment:15 Changed 3 months ago by nickm

Keywords: 034-removed-20180328 added

Per our triage process, these tickets are pending removal from 0.3.4.

comment:16 Changed 3 months ago by nickm

Milestone: Tor: 0.3.4.x-finalTor: unspecified

These tickets, tagged with 034-removed-*, are no longer in-scope for 0.3.4. We can reconsider any of them, if time permits.

Note: See TracTickets for help on using tickets.