When the exit relay gets a circuit-level sendme cell, it starts reading on the exit streams, even if we have 500 cells queued in our circuit queue already. So our circuit queue just grows and grows in some cases.
Reported by Mashael, Ian's grad student. She might even have a patch, but I haven't seen it yet so opening a ticket.
To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information
Child items ...
Show closed items
Linked items 0
Link issues together to show that they're related.
Learn more.
When the exit relay gets a circuit-level sendme cell, it starts reading on the exit streams, even if we have 500 cells queued in our circuit queue already. So our circuit queue just grows and grows in some cases.
Reported by Mashael, Ian's grad student. She might even have a patch, but I haven't seen it yet so opening a ticket.
The patch looks good to me. I'd change the implementation on circuit_queue_high() so that instead of checking for "queue->n >= HIGHWATER" it checks for "streams_blocked && queue->n >= LOWWATER". Being above the high-water mark is the condition to block, but the condition to resume is getting under the low-water mark again.
See branch "bug1653" in my public repo; it is based on "yetonetime"'s original patch, and tries to fix the new-streams issue too.
It rebases cleanly onto maint-0.2.1 if we think it should go there. I think it should; this might also be the solution (or a solution) for bug #1298 (moved) .
As noted on the #1298 (moved) comments by yetonetime, this fix in 8782dcf6a291b isn't right. To consider why, look at the two warning cases in 8782dcf6a291b:
+ log_info(LD_BUG, "Got a cell added to a cell queue when streams were "+ "supposed to be blocked; found that %d streams weren't.", n);
This can happen when we get a new connection on a blocked circuit: the first CONNECTED relay cell will get added, since connections don't start out blocked (nor should they).
+ log_info(LD_BUG, "Got a cell added to a cell queue when streams were "+ "all blocked. We should figure out why.");
This happens on every relay when a cell queue is high: we can't block incoming circuits (only edge connections), so we let them fill the circuit queue as much as they want. We sure don't want to be logging for every cell on a busy circuit, and we don't want every cell to make us do a loop over the streams.
I've tried to fix both of these on the latest commit on branch bug1653 (f89323afda).
Re f89323afdadadb8d, did you know we have a streamid_t type? We should either use it consistently or not have it anymore. (It looks like we're already not using it consistently, before this patch.)
bug1653 (296a7d83880d75) branch looks good to me. I don't know whether we missed any more edge cases, though. Sounds like we should merge it and find out if anything breaks? It probably also wants to be merged into maint-0.2.1 at some point.
Seems sensible to me. I've make a backport branch that we can merge into 0.2.1: see bug1653_for_021 in my public. For now, I'm just merging this to master and reassigning this ticket to the 0.2.1.x-final milestone.
Trac: Milestone: Tor: 0.2.2.x-final to Tor: 0.2.1.x-final
I think we should focus on getting 0.2.2 stable rather than trying to backport #1653 (moved) and #1298 (moved) to 0.2.1 and hoping they get enough testing (they won't).
Or said another way, I think 0.2.1 is frozen enough that this doesn't count as a 'critical' bugfix -- stuff still works after all.