Opened 11 months ago

Last modified 7 months ago

#24668 new defect

sched: scheduler_compare_channels() function will never pick a channel with no active circuits.

Reported by: dgoulet Owned by:
Priority: Medium Milestone: Tor: unspecified
Component: Core Tor/Tor Version:
Severity: Normal Keywords: tor-sched, tor-channel, tor-cell, 034-triage-20180328, 034-removed-20180328
Cc: Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

In the schedulers, scheduler_compare_channels() is used to decide which channel is 'best' to write data from. It delegates to circuitmux_compare_muxes(), which delegates to ewma_cmp_cmux().

But ewma_cmp_cmux() will never prefer a cmux with no active circuits on it! So a channel without active circuits will never be picked by the scheduler to flush from a circuit, which is what triggers flushing from its destroy queue. So the channel will stay around forever, never flushing.

To fix this one, we probably have to fix ewma_cmp_cmux() to look at destroy cells too (somehow). And we still need to make sure that the scheduler's position in the heap changes when the data considered by scheduler_compare_channels() changes [*].

[*] I'm not convinced that we're even doing this right with the current scheduler_compare_channels() code. :(

Child Tickets

Change History (8)

comment:1 in reply to:  description ; Changed 11 months ago by arma

Replying to dgoulet:

To fix this one, we probably have to fix ewma_cmp_cmux() to look at destroy cells too (somehow).

Yes, agreed.

I assume this bug was introduced by our destroy cell queue fix from #7912 (when we separated queued destroy cells from queued other-types-of-cells).

How hard would it be to add the circuit to the active_circuit_pqueue when it has a non-empty destroy queue? Then it would at least get a chance to be considered, even if we aren't load balancing in an optimal way (whatever that means) between sending data cells and destroy cells.

Looking more at #7912, it seems like we think we already did this. So maybe there is some simple bug somewhere in the current code where it doesn't get triggered correctly, rather than a fundamentally broken design?

comment:2 in reply to:  description Changed 11 months ago by arma

Replying to dgoulet:

And we still need to make sure that the scheduler's position in the heap changes when the data considered by scheduler_compare_channels() changes

Can you explain more about what you mean here?

comment:3 in reply to:  1 Changed 11 months ago by dgoulet

Replying to arma:

Replying to dgoulet:

To fix this one, we probably have to fix ewma_cmp_cmux() to look at destroy cells too (somehow).

[snip]

How hard would it be to add the circuit to the active_circuit_pqueue when it has a non-empty destroy queue? Then it would at least get a chance to be considered, even if we aren't load balancing in an optimal way (whatever that means) between sending data cells and destroy cells.

Actually, we can't because it is possible for the circuit_t object to be freed or about to be freed. We often send DESTROY cells from the about_to_free() circuit.

Thus this DESTROY queue keeps what needs to be sent while tor get rid of the circuit object.

It is imo a hard problem because an EWMA circuit priority is a moving value over time for a specific circuit while DESTROY cells, once put in the queue, can't have a priority that relates to the moving one or even to the current circuit alive. Thus the FIFO design maybe?

Because we could decide that when we send a DESTROY cell, we flush the entire circuit queue (like we do now in all cases), append the DESTROY and then keep the circuit object alive until it gets scheduled by the EWMA policy and only then we free(). That is quite an added complexity but would finally make the scheduler handle those cells instead of being bypassed.

Else we need to go in crazier algorithm with multiple priority queues because prioritizing DESTROY cells is important for many reasons.

comment:4 Changed 11 months ago by arma

Summary: sched: scheduler_compare_channels() function will never pick a channel with no active circuis.sched: scheduler_compare_channels() function will never pick a channel with no active circuits.

comment:5 Changed 10 months ago by dgoulet

Milestone: Tor: 0.3.3.x-finalTor: 0.3.4.x-final

Moving a bunch of tickets from 033 to 034.

comment:6 Changed 8 months ago by nickm

Keywords: 034-triage-20180328 added

comment:7 Changed 8 months ago by nickm

Keywords: 034-removed-20180328 added

Per our triage process, these tickets are pending removal from 0.3.4.

comment:8 Changed 7 months ago by nickm

Milestone: Tor: 0.3.4.x-finalTor: unspecified

These tickets, tagged with 034-removed-*, are no longer in-scope for 0.3.4. We can reconsider any of them, if time permits.

Note: See TracTickets for help on using tickets.