Opened 9 months ago

Last modified 3 months ago

#23711 accepted defect

sched: KIST writes to kernel and get a "wants to write" notification right after

Reported by: dgoulet Owned by: dgoulet
Priority: Medium Milestone: Tor: unspecified
Component: Core Tor/Tor Version:
Severity: Normal Keywords: tor-sched, 034-triage-20180328, 034-removed-20180328
Cc: Actual Points:
Parent ID: #23993 Points:
Reviewer: Sponsor: SponsorV

Description

KIST scheduler does call a write to kernel contrary to the vanilla scheduler. This is done through channel_write_to_kernel() which calls connection_handle_write().

That last function will ultimately call connection_or_flushed_some() which triggers a scheduler_channel_wants_writes() because of this condition:

  datalen = connection_get_outbuf_len(TO_CONN(conn));
  if (datalen < OR_CONN_LOWWATER) {
    scheduler_channel_wants_writes(TLS_CHAN_TO_BASE(conn->chan));

That is OK if datalen > 0 but useless if datalen == 0. For KIST, it makes the channel go back in pending state and scheduled because it wants to write. But then if the outbuf or the cmux queue is empty, we end up scheduling a channel that actually does NOT need to write at all.

Could be the fix here is probably simple as:

  if (datalen > 0 && datalen < OR_CONN_LOWWATER) {

I suspect with KIST, the datalen will always be 0 because KIST in theory controls exactly what goes in the outbuf and what can be written to the kernel so when it triggers a connection write(), the entire outbuf should be drained (in theory). So the effect of this is that every write to the kernel from KIST triggers a useless "wants to write" event rescheduling the channel. Note that this only happens if the channel is in SCHED_CHAN_WAITING_TO_WRITE state.

Child Tickets

Change History (10)

comment:1 Changed 9 months ago by dgoulet

After some analysis, the Vanilla scheduler needs this call for the case where the connection outbuf was full and the circuit queue still has cells in it.

This way, once the outbuf is flushed onto the network, if we ever are under the low watermark (16k), the channel is rescheduled so to try to flush the cells into the outbuf which has room at that point.

This is a big house of cards because the channel *has* to be in SCHED_CHAN_WAITING_TO_WRITE if we ever want to drain the circuit queue because the scheduler_channel_has_waiting_cells() will never do anything unless the channel is in state SCHED_CHAN_WAITING_FOR_CELLS.

For KIST, this "wants to write" is useless outside of the scheduler because KIST actually schedules things and always put back a channel in state "waiting to write" in the channels pending list to be re-assess at the next tick and writes to kernel what it just flushed.

comment:2 Changed 9 months ago by dgoulet

Milestone: Tor: 0.3.2.x-finalTor: 0.3.3.x-final

This won't really get fixed until #23744 so postponing.

For now, I think it is fine, KIST recovers properly from that.

comment:3 Changed 8 months ago by dgoulet

Sponsor: SponsorV

Tagging all scheduler (and KIST) related tickets for SponsorV

comment:4 Changed 8 months ago by dgoulet

Owner: set to dgoulet
Status: newaccepted

Taking the scheduler related tickets.

comment:5 Changed 8 months ago by dgoulet

Parent ID: 23993

comment:6 Changed 8 months ago by dgoulet

Parent ID: 23993#23993

comment:7 Changed 5 months ago by dgoulet

Milestone: Tor: 0.3.3.x-finalTor: 0.3.4.x-final

Move 033 ticket I own to 034

comment:8 Changed 3 months ago by nickm

Keywords: 034-triage-20180328 added

comment:9 Changed 3 months ago by nickm

Keywords: 034-removed-20180328 added

Per our triage process, these tickets are pending removal from 0.3.4.

comment:10 Changed 3 months ago by nickm

Milestone: Tor: 0.3.4.x-finalTor: unspecified

These tickets, tagged with 034-removed-*, are no longer in-scope for 0.3.4. We can reconsider any of them, if time permits.

Note: See TracTickets for help on using tickets.