Opened 7 years ago

Closed 7 years ago

#7959 closed defect (fixed)

onionskin_answer(): Bug: couldn't format created cell

Reported by: sysrqb Owned by:
Priority: High Milestone: Tor: 0.2.4.x-final
Component: Core Tor/Tor Version: Tor: 0.2.4.8-alpha
Severity: Keywords: tor-relay
Cc: Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

Running 0.2.4.8-alpha with UseNTorHandshake 1

Jan 14 23:01:33.000 [debug] connection_or_process_cells_from_inbuf(): 27: starting, inbuf_datalen 0 (0 pending in tls object).
Jan 14 23:01:33.000 [debug] global_read_bucket now 1073741824.
Jan 14 23:01:33.000 [debug] global_write_bucket now 1073741824.
Jan 14 23:01:33.000 [debug] global_relayed_read_bucket now 819200.
Jan 14 23:01:33.000 [debug] or_conn->read_bucket now 1073741824.
Jan 14 23:01:33.000 [debug] or_conn->write_bucket now 1073741824.
Jan 14 23:01:33.000 [debug] conn_read_callback(): socket 27 wants to read.
Jan 14 23:01:33.000 [debug] connection_read_to_buf(): 27: starting, inbuf_datalen 0 (0 pending in tls object). at_most 16384.
Jan 14 23:01:33.000 [debug] connection_read_to_buf(): After TLS read of 512: 549 read, 0 written
Jan 14 23:01:33.000 [debug] connection_or_process_cells_from_inbuf(): 27: starting, inbuf_datalen 512 (0 pending in tls object).
Jan 14 23:01:33.000 [debug] channel_queue_cell(): Directly handling incoming cell_t 0x7fff3db0c140 for channel 0x7f1548d16660 (global ID 5)
Jan 14 23:01:33.000 [debug] command_process_create_cell(): Got a CREATE cell for circ_id 60215 on channel 5 (0x7f1548d16660)
Jan 14 23:01:33.000 [debug] circuit_get_by_circid_channel_impl(): circuit_get_by_circid_channel_impl() found nothing for circ_id 60215, channel ID 5 (0x7f1548d16660)
Jan 14 23:01:33.000 [debug] circuitmux_attach_circuit(): Attaching circuit 60215 on channel 5 to cmux 0x7f1548d167c0
Jan 14 23:01:33.000 [debug] command_process_create_cell(): success: handed off onionskin.
Jan 14 23:01:33.000 [debug] connection_or_process_cells_from_inbuf(): 27: starting, inbuf_datalen 0 (0 pending in tls object).
Jan 14 23:01:33.000 [debug] conn_write_callback(): socket 22 wants to write.
Jan 14 23:01:33.000 [debug] cpuworker_main(): onion_skin_server_handshake succeeded.
Jan 14 23:01:33.000 [debug] cpuworker_main(): finished writing response.
Jan 14 23:01:33.000 [debug] conn_read_callback(): socket 22 wants to read.
Jan 14 23:01:33.000 [debug] read_to_chunk(): Read 620 bytes. 620 on inbuf.
Jan 14 23:01:33.000 [debug] connection_cpu_process_inbuf(): Unpacking cpuworker reply, chan_id is 5, circ_id is 60215
Jan 14 23:01:33.000 [debug] circuit_get_by_circid_channel_impl(): circuit_get_by_circid_channel_impl() returning circuit 0x7f1548d194f0 for circ_id 60215, channel ID 5 (0x7f1548d16660)
Jan 14 23:01:33.000 [warn] onionskin_answer(): Bug: couldn't format created cell
Jan 14 23:01:33.000 [warn] onionskin_answer failed. Closing.
Jan 14 23:01:33.000 [debug] circuit_get_by_circid_channel_impl(): circuit_get_by_circid_channel_impl() returning circuit 0x7f1548d194f0 for circ_id 60215, channel ID 5 (0x7f1548d16660)
Jan 14 23:01:33.000 [debug] channel_send_destroy(): Sending destroy (circID 60215) on channel 0x7f1548d16660 (global ID 5)
Jan 14 23:01:33.000 [debug] channel_write_cell(): Writing cell_t 0x7fff3db0c010 to channel 0x7f1548d16660 with global ID 5
Jan 14 23:01:33.000 [debug] conn_write_callback(): socket 27 wants to write.
Jan 14 23:01:33.000 [debug] flush_chunk_tls(): flushed 512 bytes, 0 ready to flush, 0 remain.
Jan 14 23:01:33.000 [debug] connection_handle_write_impl(): After TLS write of 512: 0 read, 586 written

Child Tickets

Change History (12)

comment:1 Changed 7 years ago by nickm

This is a server, I assume?

comment:2 Changed 7 years ago by nickm

Keywords: tor-relay added
Milestone: Tor: 0.2.4.x-final

comment:3 Changed 7 years ago by nickm

Do you see many of these ?

I've added a log message in ebf30613ea41bbed3340851e839da9b7db4351c5 that should shed some light on the matter, if you can run that version or later.

comment:4 Changed 7 years ago by nickm

Priority: normalmajor

Oh hey, I bet that this is happening because of this check in check_created_cell:

     if (cell->handshake_len != TAP_ONIONSKIN_REPLY_LEN)
      return -1;
    break;

embedded ntor handshakes have a different size.

I bet that if you're encountering the bug with my logging code, you'll see

"Bug: couldn't format CREATED cell (type=1, len=64)"

And if that's so, I think I have a fix coming up really soon.

comment:5 Changed 7 years ago by nickm

Status: newneeds_review

Fix in branch bug7959; please review.

comment:6 Changed 7 years ago by nickm

(The problem here was that when an ntor handshake is embedded as a create cell, we want to send an ntor reply in a created cell, but we had unfortunately forgotten to tell the code that checks created cells we're about to send that NTOR_REPLY_LEN is a valid length for such cells. I could *swear* I had tested exactly this case.)

comment:7 Changed 7 years ago by sysrqb

Jan 15 09:11:18.000 [warn] onionskin_answer(): Bug: couldn't format created cell (type=2, len=64)
Jan 15 09:11:18.000 [warn] onionskin_answer failed. Closing.

Testing your branch now.

comment:8 Changed 7 years ago by gsathya

I've been running nickm/bug7959 for a couple of hours without any of those msgs. Is there a better test than to just let tor run for more time and wait for it to say something?

comment:9 in reply to:  8 ; Changed 7 years ago by nickm

Replying to gsathya:

I've been running nickm/bug7959 for a couple of hours without any of those msgs. Is there a better test than to just let tor run for more time and wait for it to say something?

Make sure it's running as a relay, and set UseNTorHandshake 1 ; that'll be good for something. In fact, I believe that when it tries to make a testing circuit to probe itself, it'll use the NTor handshake to do so -- probably through a node that doesn't support ntor. So if it decides that its ORPort is reachable, then this bug is fixed.

(I've merged the code, since it's *a* correct thing to do, but I'm still hoping for more confirmation that it *does* fix this issue.)

comment:10 in reply to:  9 Changed 7 years ago by gsathya

Replying to nickm:

Make sure it's running as a relay, and set UseNTorHandshake 1 ; that'll be good for something. In fact, I believe that when it tries to make a testing circuit to probe itself, it'll use the NTor handshake to do so -- probably through a node that doesn't support ntor. So if it decides that its ORPort is reachable, then this bug is fixed.

It is running as a relay with UseNTorHandshake 1

Jan 15 06:56:54.000 [notice] Self-testing indicates your ORPort is reachable from the outside. Excellent. Publishing server descriptor.

comment:11 Changed 7 years ago by nickm

Woo. If sysrqb confirms too, I think this one is good to close.

comment:12 in reply to:  9 Changed 7 years ago by sysrqb

Resolution: fixed
Status: needs_reviewclosed

Replying to nickm:

Make sure it's running as a relay, and set UseNTorHandshake 1 ; that'll be good for something. In fact, I believe that when it tries to make a testing circuit to probe itself, it'll use the NTor handshake to do so -- probably through a node that doesn't support ntor. So if it decides that its ORPort is reachable, then this bug is fixed.

Yeah, looking through the logs a little more thoroughly the socket that resulted in this was a cpu worker connected to localhost (I'm not too familiar with the cpu worker code right now). The relay eventually figured out it was reachable after successfully completing appropriate external connections.

(I've merged the code, since it's *a* correct thing to do, but I'm still hoping for more confirmation that it *does* fix this issue.)

Jan 15 09:44:32.000 [debug] conn_write_callback(): socket 16 wants to write.
Jan 15 09:44:32.000 [debug] cpuworker_main(): onion_skin_server_handshake succeeded.
Jan 15 09:44:32.000 [debug] cpuworker_main(): finished writing response.
Jan 15 09:44:32.000 [debug] conn_read_callback(): socket 16 wants to read.
Jan 15 09:44:32.000 [debug] read_to_chunk(): Read 620 bytes. 620 on inbuf.
Jan 15 09:44:32.000 [debug] connection_cpu_process_inbuf(): Unpacking cpuworker reply, chan_id is 2, circ_id is 56640
Jan 15 09:44:32.000 [debug] circuit_get_by_circid_channel_impl(): circuit_get_by_circid_channel_impl() returning circuit 0x7f7f6c03a8b0 for circ_id 56640, channel ID 2 (0x7f7f6c014b90)
Jan 15 09:44:32.000 [debug] onionskin_answer(): init digest forward 0x58de764c, backward 0x02b123d9.
Jan 15 09:44:32.000 [debug] append_cell_to_circuit_queue(): Made a circuit active.
Jan 15 09:44:32.000 [debug] append_cell_to_circuit_queue(): Primed a buffer.
Jan 15 09:44:32.000 [debug] channel_write_packed_cell(): Writing packed_cell_t 0x7f7f6c03b8a8 to channel 0x7f7f6c014b90 with global ID 2
Jan 15 09:44:32.000 [debug] channel_flush_from_first_active_circuit(): Made a circuit inactive.
Jan 15 09:44:32.000 [debug] onionskin_answer(): Finished sending 'created' cell.
Jan 15 09:44:32.000 [notice] Self-testing indicates your ORPort is reachable from the outside. Excellent. Publishing server descriptor.

and socket 16 is a connection to localhost. :)

Looks like this one is good. Closing

Note: See TracTickets for help on using tickets.