Opened 3 weeks ago

Last modified 3 weeks ago

#23711 new defect

sched: KIST writes to kernel and get a "wants to write" notification right after

Reported by: dgoulet Owned by:
Priority: Medium Milestone: Tor: 0.3.3.x-final
Component: Core Tor/Tor Version:
Severity: Normal Keywords: tor-sched
Cc: Actual Points:
Parent ID: Points:
Reviewer: Sponsor:


KIST scheduler does call a write to kernel contrary to the vanilla scheduler. This is done through channel_write_to_kernel() which calls connection_handle_write().

That last function will ultimately call connection_or_flushed_some() which triggers a scheduler_channel_wants_writes() because of this condition:

  datalen = connection_get_outbuf_len(TO_CONN(conn));
  if (datalen < OR_CONN_LOWWATER) {

That is OK if datalen > 0 but useless if datalen == 0. For KIST, it makes the channel go back in pending state and scheduled because it wants to write. But then if the outbuf or the cmux queue is empty, we end up scheduling a channel that actually does NOT need to write at all.

Could be the fix here is probably simple as:

  if (datalen > 0 && datalen < OR_CONN_LOWWATER) {

I suspect with KIST, the datalen will always be 0 because KIST in theory controls exactly what goes in the outbuf and what can be written to the kernel so when it triggers a connection write(), the entire outbuf should be drained (in theory). So the effect of this is that every write to the kernel from KIST triggers a useless "wants to write" event rescheduling the channel. Note that this only happens if the channel is in SCHED_CHAN_WAITING_TO_WRITE state.

Child Tickets

Change History (2)

comment:1 Changed 3 weeks ago by dgoulet

After some analysis, the Vanilla scheduler needs this call for the case where the connection outbuf was full and the circuit queue still has cells in it.

This way, once the outbuf is flushed onto the network, if we ever are under the low watermark (16k), the channel is rescheduled so to try to flush the cells into the outbuf which has room at that point.

This is a big house of cards because the channel *has* to be in SCHED_CHAN_WAITING_TO_WRITE if we ever want to drain the circuit queue because the scheduler_channel_has_waiting_cells() will never do anything unless the channel is in state SCHED_CHAN_WAITING_FOR_CELLS.

For KIST, this "wants to write" is useless outside of the scheduler because KIST actually schedules things and always put back a channel in state "waiting to write" in the channels pending list to be re-assess at the next tick and writes to kernel what it just flushed.

comment:2 Changed 3 weeks ago by dgoulet

Milestone: Tor: 0.3.2.x-finalTor: 0.3.3.x-final

This won't really get fixed until #23744 so postponing.

For now, I think it is fine, KIST recovers properly from that.

Note: See TracTickets for help on using tickets.