We shouldn't have to rescale the EWMA for circuit selection
From relay.c comments on the EWMA algorithm:
If 'double' had infinite precision, we could do this simply by counting a cell sent at startup as having weight 1.0, and a cell sent N seconds later as having weight F^-N. This way, we would never need to re-scale any already-sent cells.
To prevent double from overflowing, we could count a cell sent now as having weight 1.0 and a cell sent N seconds ago as having weight F^N. This, however, would mean we'd need to re-scale ALL old circuits every time we wanted to send a cell.
So as a compromise, we divide time into 'ticks' (currently, 10-second increments) and say that a cell sent at the start of a current tick is worth 1.0, a cell sent N seconds before the start of the current tick is worth F^N, and a cell sent N seconds after the start of the current tick is worth F^-N. This way we don't overflow, and we don't need to constantly rescale.
(end quote from relay.c comments)
I don't think it's true that we would need to re-scale on every send to do this continuously. Conceptually, every circuit gets a weight which is the sum over all cells C_i they've transmitted at time t_i in the past of exp(-(T-t_i)lambda), where lambda is the decay constant and T is the present time. Equivalently, we can factor out an exp(-Tlambda) everywhere and get time-independent weights, which would be the double-overflowing representation those comments mentioned.
However, if we have a priority queue sorted on the summed weight of each circuit, the decay over time can't change the sort order, as long as the time constants are the same for every circuit. The selection of which cell to transmit happens in channel_flush_from_first_active_circuit() (or connection_or_flush_from_first_active_circuit() in the pre-channels world); the algorithm to find a cell to send should go like:
1.) Pick the lowest-weighted active circuit (this is O(1) if we have the sorted circuit list ready, as with the current priority queue implementation), and get the first cell from its queue. 2.) Send the cell and update the EWMA for that circuit (this calculation is O(1), again) 3.) If the circuit has no more cells to send, remove it from the active circuit list (for a reasonable sorted circuit list implementation, this is no worse than O(log(n))). 4.) Otherwise, if it still has cells, move it to its new position in the active circuit list (again, this shouldn't be worse than O(log(n)). 5.) If we want more cells, go back to step 1, possibly picking a new circuit. Otherwise, we're done.
Now, there's a slight hitch there: the comparison function for the weights depends the current time, T - except, as we saw, it can be factored out, and the decay never changes the order. Thus, instead of rescaling stored weights, we should store the time each circuit EWMA was last updated (i.e,, the last time we sent a cell on behalf of that circuit) and the value of sum(exp(-(T-t_i)*lambda) at that time T, and then redefine the comparison function to compensate.
Specifically, given two circuits C_a and C_b, with last updated times T_a and T_b and stored weights W_a = sum(exp(-(T_a-t_i)*lambda),{i for all cells sent on C_a}) and W_b = sum(exp(-(T_b-t_i)*lambda),{i for all cells sent on C_b}), then we can just rescale W_b to T_a by multiplying by exp(T_b-T_a), so we cancel out the exp(-T_b) in W_b. That is, circuit C_a should take precedence over C_b iff W_a < exp(T_b-T_a)*W_b.
Notice that W_a, W_b, T_a, and T_b are all static values which only ever need to be updated when a circuit actually sends a cell; we can define a comparison function on them rather than actually storing up-to-date weights and needing to periodically rescale, and thus obtain the precision of a continuous calculation without the compromise of rescaling only on ticks, without the computational cost of having to rescale at all.
(I believe the current implementation in relay.c is actually incorrect anyway in the case that it wants to send more than one cell and the lowest-weighted circuit changes from one cell to the next in the loop; see the assert that came up in testing for channel_t mentioned in ticket 6465: https://trac.torproject.org/projects/tor/ticket/6465#comment:6 )