Opened 2 years ago
Last modified 6 months ago
#23711 assigned defect
sched: KIST writes to kernel and get a "wants to write" notification right after
Reported by: | dgoulet | Owned by: | |
---|---|---|---|
Priority: | Medium | Milestone: | Tor: unspecified |
Component: | Core Tor/Tor | Version: | |
Severity: | Normal | Keywords: | tor-sched, 034-triage-20180328, 034-removed-20180328 |
Cc: | Actual Points: | ||
Parent ID: | #23993 | Points: | |
Reviewer: | Sponsor: |
Description
KIST scheduler does call a write to kernel contrary to the vanilla scheduler. This is done through channel_write_to_kernel()
which calls connection_handle_write()
.
That last function will ultimately call connection_or_flushed_some()
which triggers a scheduler_channel_wants_writes()
because of this condition:
datalen = connection_get_outbuf_len(TO_CONN(conn)); if (datalen < OR_CONN_LOWWATER) { scheduler_channel_wants_writes(TLS_CHAN_TO_BASE(conn->chan));
That is OK if datalen > 0
but useless if datalen == 0
. For KIST, it makes the channel go back in pending state and scheduled because it wants to write. But then if the outbuf or the cmux queue is empty, we end up scheduling a channel that actually does NOT need to write at all.
Could be the fix here is probably simple as:
if (datalen > 0 && datalen < OR_CONN_LOWWATER) {
I suspect with KIST, the datalen will always be 0 because KIST in theory controls exactly what goes in the outbuf and what can be written to the kernel so when it triggers a connection write(), the entire outbuf should be drained (in theory). So the effect of this is that every write to the kernel from KIST triggers a useless "wants to write" event rescheduling the channel. Note that this only happens if the channel is in SCHED_CHAN_WAITING_TO_WRITE
state.
Child Tickets
Change History (13)
comment:1 Changed 2 years ago by
comment:2 Changed 2 years ago by
Milestone: | Tor: 0.3.2.x-final → Tor: 0.3.3.x-final |
---|
This won't really get fixed until #23744 so postponing.
For now, I think it is fine, KIST recovers properly from that.
comment:3 Changed 2 years ago by
Sponsor: | → SponsorV |
---|
Tagging all scheduler (and KIST) related tickets for SponsorV
comment:4 Changed 2 years ago by
Owner: | set to dgoulet |
---|---|
Status: | new → accepted |
Taking the scheduler related tickets.
comment:5 Changed 2 years ago by
Parent ID: | → 23993 |
---|
comment:6 Changed 2 years ago by
Parent ID: | 23993 → #23993 |
---|
comment:7 Changed 23 months ago by
Milestone: | Tor: 0.3.3.x-final → Tor: 0.3.4.x-final |
---|
Move 033 ticket I own to 034
comment:8 Changed 21 months ago by
Keywords: | 034-triage-20180328 added |
---|
comment:9 Changed 21 months ago by
Keywords: | 034-removed-20180328 added |
---|
Per our triage process, these tickets are pending removal from 0.3.4.
comment:10 Changed 20 months ago by
Milestone: | Tor: 0.3.4.x-final → Tor: unspecified |
---|
These tickets, tagged with 034-removed-*, are no longer in-scope for 0.3.4. We can reconsider any of them, if time permits.
comment:11 Changed 6 months ago by
Removing sponsor V as we do not have more time to include this tickets in the sponsor.
comment:12 Changed 6 months ago by
Sponsor: | SponsorV |
---|
Removing sponsor from tickets that we do not have time to fit in the remain of this sponsorship.
comment:13 Changed 6 months ago by
Owner: | dgoulet deleted |
---|---|
Status: | accepted → assigned |
dgoulet will assign himself to the ones he is working on right now.
After some analysis, the Vanilla scheduler needs this call for the case where the connection outbuf was full and the circuit queue still has cells in it.
This way, once the outbuf is flushed onto the network, if we ever are under the low watermark (16k), the channel is rescheduled so to try to flush the cells into the outbuf which has room at that point.
This is a big house of cards because the channel *has* to be in
SCHED_CHAN_WAITING_TO_WRITE
if we ever want to drain the circuit queue because thescheduler_channel_has_waiting_cells()
will never do anything unless the channel is in stateSCHED_CHAN_WAITING_FOR_CELLS
.For KIST, this "wants to write" is useless outside of the scheduler because KIST actually schedules things and always put back a channel in state "waiting to write" in the channels pending list to be re-assess at the next tick and writes to kernel what it just flushed.