If circuit_deliver_create_cell() failed Tor still assumes circuit was attached to n_chan and tries to mess with it later.
While marking circuit by circuit_mark_for_close() it calls:
if (circ->n_chan) { circuit_clear_cell_queue(circ, circ->n_chan); /* Only send destroy if the channel isn't closing anyway */ if (!(circ->n_chan->state == CHANNEL_STATE_CLOSING || circ->n_chan->state == CHANNEL_STATE_CLOSED || circ->n_chan->state == CHANNEL_STATE_ERROR)) { channel_send_destroy(circ->n_circ_id, circ->n_chan, reason); } circuitmux_detach_circuit(circ->n_chan->cmux, circ); circuit_set_n_circid_chan(circ, 0, NULL); }
It is useless, at least, or probably corrupting internal structures.
To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information
Child items
0
Show closed items
No child items are currently assigned. Use child items to break down this issue into smaller parts.
Linked items
0
Link issues together to show that they're related.
Learn more.
Trac: Summary: Tor wrongly process circuit as connected to n_chan even if create cell delivery not even tried to Tor wrongly process circuit as connected to n_chan if create cell delivery failed
Yet better to not assign circ->n_chan before call of circuit_set_n_circid_chan(), it's incorrect to call with circ->n_chan assigned because it counted like old_chan. This will require more refactoring.
This bug can to explain huge number of num_n_circuits reported in #12184 (moved). channel_send_destroy inserts pair of chan,id to conn-circid map with zero id, then circuit_set_n_circid_chan() removes it and decrements number of (old_chan->num_n_circuits). Depends number of real circuits and failed extend requests it will underflow num_n_circuits.
"Discrepancy in counts for queued destroy cells" explained by this bug too. channel_send_destroy appends destroy cell to queue with zero id, but traces from conn-circid map removed as soon as cell appends. Total number of destroy cells in queue more than number of ids in use because cells with zero id in queue.
It appears that this bug dates back to 0.2.4 at least, and probably 0.2.3 and earlier. It's probably causing the most trouble in 0.2.5 because of the destroy cell queue, though.
I looked for other places where we set circ->n_chan early, and found one in circuit_handle_first_hop() right before it calls circuit_send_next_onion_skin(). If onion_skin_create() fails there, then n_chan will still be set when circuit_send_next_onion_skin() returns. We should probably fix that too.
Maybe we should also add a check in channel_send_destroy() to make sure we don't send a destroy cell with circuit ID 0, in case it is possible to do that for some other reason.
I've put these all in branch "bug12848_024" in my public repository. I'm a bit concerned about the one in 981e037fd3b9e20b6e58e9c1470999a0f3a1ef0e . What do you think?
How did you find this?
By reading code, after I saw it in my last night dream. I assumed "No unused circIDs found on channel" can be result of real out of ID space without any software bug, but then triggers some another looking a like memory corruption bugs.
What do you think?
I'm worrying about 0044d74b3c51cf5824435e76eca2a675b51a14bc actually.
If circuit_send_next_onion_skin() failed after call of circuit_set_n_circid_chan() by circuit_deliver_create_cell() then pointer to circuit will stuck in chan-circid map.
If circuit_send_next_onion_skin() failed after call of circuit_set_n_circid_chan() by circuit_deliver_create_cell() then pointer to circuit will stuck in chan-circid map.
It's impossible it seems, then no worries.
Right. After circuit_send_next_onion_skin calls circuit_deliver_create_cell, it will either return an error because circuit_deliver_create_cell failed, or it will return 0.
I'm a bit concerned about the one in 981e037fd3b9e20b6e58e9c1470999a0f3a1ef0e
It's bug to send destroy cell with zero ID anyway, it shouldn't trigger anymore in theory. Lets keep it and investigate if reported in practice.