Padding cells sent with 0ms delay cause circuit failures
Below appears to be a bug that results in a closed circuit due to a relay sending a padding cell (RELAY_COMMAND_DROP) to the client.
At links below you find code for two circuit padding machines:
- circpad_machine_client_close_circuit_minimal
- circpad_machine_relay_close_circuit_minimal
Machine 1 runs on a client on CIRCUIT_PURPOSE_C_GENERAL circuits with 3 hops as soon as CIRCPAD_CIRC_OPENED: the only thing it does is set a timer 1000s in the future and waits for the timer to expire. The purpose of the machine is to negotiate padding with the relay to activate Machine 2 on the middle relay.
Machine 2 runs at the middle relay and repeatedly sends a padding cell to the client 1 usec after the event CIRCPAD_EVENT_NONPADDING_SENT. In other words, for every non-padding cell we directly add a padding cell. At the client, this causes relay_decrypt_cell(): Incoming cell at client not recognized. Closing.
for all circuits triggering Machine 2 at the relay. Closing a circuit by injecting padding cells seems unintended.
Here is part of a log from a client on info level showing how the machine registers, negotiates with the relay (starting Machine 2 at the relay), eventually fails to decrypt, and the circuit closes.
Sep 05 10:32:19.000 [info] circpad_setup_machine_on_circ(): Registering machine client_close_circuit_minimal to origin circ 3 (5)
Sep 05 10:32:19.000 [info] circpad_node_supports_padding(): Checking padding: supported
Sep 05 10:32:19.000 [info] circpad_negotiate_padding(): Negotiating padding on circuit 3 (5), command 2
Sep 05 10:32:19.000 [info] link_apconn_to_circ(): Looks like completed circuit to [scrubbed] does allow optimistic data for connection to [scrubbed]
Sep 05 10:32:19.000 [info] connection_ap_handshake_send_begin(): Sending relay cell 0 on circ 3296464733 to begin stream 20575.
Sep 05 10:32:19.000 [info] connection_ap_handshake_send_begin(): Address/port sent, ap socket 13, n_circ_id 3296464733
Sep 05 10:32:19.000 [info] rep_hist_note_used_port(): New port prediction added. Will continue predictive circ building for 2355 more seconds.
Sep 05 10:32:19.000 [info] connection_edge_process_inbuf(): data from edge while in 'waiting for circuit' state. Leaving it on buffer.
Sep 05 10:32:19.000 [info] exit circ (length 3): $[manually-scrubbed](open) $[manually-scrubbed](open) $[manually-scrubbed](open)
Sep 05 10:32:19.000 [info] pathbias_count_use_attempt(): Used circuit 3 is already in path state use attempted. Circuit is a General-purpose client currently open.
Sep 05 10:32:19.000 [info] link_apconn_to_circ(): Looks like completed circuit to [scrubbed] does allow optimistic data for connection to [scrubbed]
Sep 05 10:32:19.000 [info] connection_ap_handshake_send_begin(): Sending relay cell 0 on circ 3296464733 to begin stream 20576.
Sep 05 10:32:19.000 [info] connection_ap_handshake_send_begin(): Address/port sent, ap socket 14, n_circ_id 3296464733
Sep 05 10:32:19.000 [info] circpad_deliver_recognized_relay_cell_events(): Got padding cell on origin circuit 3.
Sep 05 10:32:20.000 [info] relay_decrypt_cell(): Incoming cell at client not recognized. Closing.
Sep 05 10:32:20.000 [info] circuit_receive_relay_cell(): relay crypt failed. Dropping connection.
Sep 05 10:32:20.000 [info] command_process_relay_cell(): circuit_receive_relay_cell (backward) failed. Closing.
Sep 05 10:32:20.000 [info] circpad_circuit_machineinfo_free_idx(): Freeing padding info idx 0 on circuit 3 (23)
Sep 05 10:32:20.000 [info] circpad_node_supports_padding(): Checking padding: supported
Sep 05 10:32:20.000 [info] circpad_negotiate_padding(): Negotiating padding on circuit 3 (23), command 1
Sep 05 10:32:20.000 [info] pathbias_send_usable_probe(): Sending pathbias testing cell to 0.233.9.53:25 on stream 20577 for circ 3.
Sep 05 10:32:20.000 [info] relay_decrypt_cell(): Incoming cell at client not recognized. Closing.
Sep 05 10:32:20.000 [info] circuit_receive_relay_cell(): relay crypt failed. Dropping connection.
Sep 05 10:32:20.000 [info] command_process_relay_cell(): circuit_receive_relay_cell (backward) failed. Closing.
Sep 05 10:32:20.000 [info] pathbias_send_usable_probe(): Got pathbias probe request for circuit 3 with outstanding probe
Sep 05 10:32:20.000 [info] pathbias_check_close(): Circuit 3 closed without successful use for reason 2. Circuit purpose 23 currently 1,open. Len 3.
Sep 05 10:32:20.000 [info] circuit_mark_for_close_(): Circuit 3296464733 (id: 3) marked for close at src/core/or/command.c:582 (orig reason: 2, new reason: 0)
Sep 05 10:32:20.000 [info] connection_edge_destroy(): CircID 0: At an edge. Marking connection for close.
Sep 05 10:32:20.000 [info] connection_edge_destroy(): CircID 0: At an edge. Marking connection for close.
Sep 05 10:32:20.000 [info] circuit_free_(): Circuit 0 (id: 3) has been freed.
If we delay sending the padding from the relay I cannot trigger the bug (see commented out fix in the Machine 2 function). With the code below, the code triggers on every circuit that has the machine negotiated.
Code: https://github.com/pylls/tor/tree/circuit-padding-relay-padding-bug (https://github.com/pylls/tor/blob/circuit-padding-relay-padding-bug/src/core/or/circuitpadding_machines.c#L460, as well as circuitpadding_machines.h and registration in circpad_machines_init() of circuitpadding.c).