Opened 6 years ago

Closed 6 years ago

#12848 closed defect (fixed)

Tor wrongly process circuit as connected to n_chan if create cell delivery failed

Reported by: cypherpunks Owned by:
Priority: Medium Milestone: Tor: 0.2.5.x-final
Component: Core Tor/Tor Version: Tor: 0.2.5.5-alpha
Severity: Keywords: tor-relay 024-backport 023-backport
Cc: Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

If circuit_deliver_create_cell() failed Tor still assumes circuit was attached to n_chan and tries to mess with it later.
While marking circuit by circuit_mark_for_close() it calls:

  if (circ->n_chan) {
    circuit_clear_cell_queue(circ, circ->n_chan);
    /* Only send destroy if the channel isn't closing anyway */
    if (!(circ->n_chan->state == CHANNEL_STATE_CLOSING ||
          circ->n_chan->state == CHANNEL_STATE_CLOSED ||
          circ->n_chan->state == CHANNEL_STATE_ERROR)) {
      channel_send_destroy(circ->n_circ_id, circ->n_chan, reason);
    }
    circuitmux_detach_circuit(circ->n_chan->cmux, circ);
    circuit_set_n_circid_chan(circ, 0, NULL);
  }

It is useless, at least, or probably corrupting internal structures.

Child Tickets

Change History (17)

comment:1 Changed 6 years ago by cypherpunks

Probably related to #12184 for some memory-corrupting like reported cases.

comment:2 Changed 6 years ago by cypherpunks

--- circuitbuild.c.orig
+++ circuitbuild.c
@@ -658,18 +658,19 @@
     static ratelim_t circid_warning_limit = RATELIM_INIT(9600);
     log_fn_ratelim(&circid_warning_limit, LOG_WARN, LD_CIRC,
                    "failed to get unique circID.");
-    return -1;
+    goto error;
   }
-  log_debug(LD_CIRC,"Chosen circID %u.", (unsigned)id);
-  circuit_set_n_circid_chan(circ, id, circ->n_chan);
 
   memset(&cell, 0, sizeof(cell_t));
   r = relayed ? create_cell_format_relayed(&cell, create_cell)
               : create_cell_format(&cell, create_cell);
   if (r < 0) {
     log_warn(LD_CIRC,"Couldn't format create cell");
-    return -1;
+    goto error;
   }
+  log_debug(LD_CIRC,"Chosen circID %u.", (unsigned)id);
+  circuit_set_n_circid_chan(circ, id, circ->n_chan);
+
   cell.circ_id = circ->n_circ_id;
 
   append_cell_to_circuit_queue(circ, circ->n_chan, &cell,
@@ -693,6 +694,8 @@
   }
 
   return 0;
+error:
+  circ->n_chan = NULL;
+  return -1;
 }
 
 /** We've decided to start our reachability testing. If all
Last edited 6 years ago by cypherpunks (previous) (diff)

comment:3 Changed 6 years ago by cypherpunks

Summary: Tor wrongly process circuit as connected to n_chan even if create cell delivery not even triedTor wrongly process circuit as connected to n_chan if create cell delivery failed

comment:4 Changed 6 years ago by cypherpunks

Yet better to not assign circ->n_chan before call of circuit_set_n_circid_chan(), it's incorrect to call with circ->n_chan assigned because it counted like old_chan. This will require more refactoring.

comment:5 Changed 6 years ago by cypherpunks

This bug can to explain huge number of num_n_circuits reported in #12184. channel_send_destroy inserts pair of chan,id to conn-circid map with zero id, then circuit_set_n_circid_chan() removes it and decrements number of (old_chan->num_n_circuits). Depends number of real circuits and failed extend requests it will underflow num_n_circuits.

comment:6 Changed 6 years ago by cypherpunks

"Discrepancy in counts for queued destroy cells" explained by this bug too. channel_send_destroy appends destroy cell to queue with zero id, but traces from conn-circid map removed as soon as cell appends. Total number of destroy cells in queue more than number of ids in use because cells with zero id in queue.

comment:7 Changed 6 years ago by nickm

Keywords: tor-relay 024-backport 023-backport added
Milestone: Tor: 0.2.5.x-final

Thanks! How did you find this?

It appears that this bug dates back to 0.2.4 at least, and probably 0.2.3 and earlier. It's probably causing the most trouble in 0.2.5 because of the destroy cell queue, though.

I looked for other places where we set circ->n_chan early, and found one in circuit_handle_first_hop() right before it calls circuit_send_next_onion_skin(). If onion_skin_create() fails there, then n_chan will still be set when circuit_send_next_onion_skin() returns. We should probably fix that too.

comment:8 Changed 6 years ago by nickm

Maybe we should also add a check in channel_send_destroy() to make sure we don't send a destroy cell with circuit ID 0, in case it is possible to do that for some other reason.

comment:9 Changed 6 years ago by nickm

Status: newneeds_review

I've put these all in branch "bug12848_024" in my public repository. I'm a bit concerned about the one in 981e037fd3b9e20b6e58e9c1470999a0f3a1ef0e . What do you think?

comment:10 Changed 6 years ago by cypherpunks

How did you find this?

By reading code, after I saw it in my last night dream. I assumed "No unused circIDs found on channel" can be result of real out of ID space without any software bug, but then triggers some another looking a like memory corruption bugs.

comment:11 Changed 6 years ago by cypherpunks

What do you think?

I'm worrying about 0044d74b3c51cf5824435e76eca2a675b51a14bc actually.
If circuit_send_next_onion_skin() failed after call of circuit_set_n_circid_chan() by circuit_deliver_create_cell() then pointer to circuit will stuck in chan-circid map.

comment:12 Changed 6 years ago by cypherpunks

If circuit_send_next_onion_skin() failed after call of circuit_set_n_circid_chan() by circuit_deliver_create_cell() then pointer to circuit will stuck in chan-circid map.

It's impossible it seems, then no worries.

comment:13 in reply to:  11 Changed 6 years ago by nickm

Right. After circuit_send_next_onion_skin calls circuit_deliver_create_cell, it will either return an error because circuit_deliver_create_cell failed, or it will return 0.

comment:14 Changed 6 years ago by cypherpunks

I'm a bit concerned about the one in 981e037fd3b9e20b6e58e9c1470999a0f3a1ef0e

It's bug to send destroy cell with zero ID anyway, it shouldn't trigger anymore in theory. Lets keep it and investigate if reported in practice.

comment:15 Changed 6 years ago by nickm

Milestone: Tor: 0.2.5.x-finalTor: 0.2.4.x-final

Merged into 0.2.5 and master. Possible backport if it doesn't cause any trouble.

comment:16 Changed 6 years ago by cypherpunks

Opps. Wrongish was here. circuit_deliver_create_cell is enough.

Last edited 6 years ago by cypherpunks (previous) (diff)

comment:17 Changed 6 years ago by nickm

Milestone: Tor: 0.2.4.x-finalTor: 0.2.5.x-final
Resolution: fixed
Status: needs_reviewclosed

These aren't getting backported. Recommended solution: upgrade to 0.2.5.x stable releases.

Note: See TracTickets for help on using tickets.