Opened 10 months ago

Closed 6 months ago

#20307 closed defect (fixed)

[warn] Remote server sent bogus reason code 65021

Reported by: arma Owned by: dgoulet
Priority: Very High Milestone: Tor: 0.2.9.x-final
Component: Core Tor/Tor Version: Tor: 0.2.8.1-alpha
Severity: Normal Keywords: circuit, 029-backport, nickm-deferred-20161017, triage-out-030-201612,
Cc: isis Actual Points:
Parent ID: Points: 2
Reviewer: Sponsor:

Description

On my Tor client running 0.2.9.3-alpha-dev (git-bfaded9143d127cb) on a really crummy network, I got a string of 65 of these lines:

Oct 06 16:21:58.020 [warn] Remote server sent bogus reason code 65021
Oct 06 16:21:59.027 [warn] Remote server sent bogus reason code 65021
...

In fact, I got one of those warns per second for 65 seconds.

But when I look at the more detailed logs for the first one, I see

Oct 06 16:21:58.020 [info] circuit_build_failed(): Our circuit failed to get a response from the first hop (62.210.123.24:443). I'm going to try to rotate to a better connection.
Oct 06 16:21:58.020 [info] entry_guard_register_connect_status(): Unable to connect to entry guard 'Alastor' (2EB3C230180694A1E848001E20F36F76A2287039). Marking as unreachable.
Oct 06 16:21:58.020 [warn] Remote server sent bogus reason code 65021
Oct 06 16:21:58.020 [debug] circuit_get_by_circid_channel_impl(): circuit_get_by_circid_channel_impl() returning circuit 0x7fda41028de0 for circ_id 3884794536, channel ID 1 (0x7fda41c62410)
Oct 06 16:21:58.020 [debug] circuitmux_append_destroy_cell(): Cmux at 0x7fda41c625d0 queued a destroy for circ 3884794536, cmux counter is now 1, global counter is now 1
Oct 06 16:21:58.020 [debug] channel_send_destroy(): Sending destroy (circID 3884794536) on channel 0x7fda41c62410 (global ID 1)

So it would seem that no "end" cell was received at all. Maybe we are setting END_CIRC_REASON_FLAG_REMOTE somewhere where we shouldn't? Also, what is the mechanism by which the number makes it all the way up to 65021?

Child Tickets

Change History (18)

comment:1 Changed 10 months ago by nickm

  • Milestone changed from Tor: unspecified to Tor: 0.2.9.x-final

Maybe we're using negative numbers to indicate failure, and forgetting to turn them positive somewhere? Because if we interpreted -END_STREAM_REASON_NOSUCHSERVICE as a negative 16 bit integer, we'd get 0xfff4. And if we mask ou END_CIRC_REASON_FLAG_REMOTE (512), we get 0xfdf4 == 65012.

comment:2 Changed 10 months ago by nickm

err no, not END_STREAM_REASON_NOSUCHSERVICE. END_CIRC_REASON_NOSUCHSERVICE .

comment:3 Changed 10 months ago by nickm

Oh darn, you said 65021, not 65012. That makes the target -END_STREAM_REASON_CONNECTFAILED. NO, darn it. I can't do math today. It should be 3.

This warning should only be possible if you're running a controller, since it only appears in circuit_end_reason_to_control_string , and only control_event_circuit_status calls that...

Negative circuit close reason codes can come from at least these:

  • pathbias_count_build_attempt.
  • pathbias_check_probe_response, inconsistently!!
  • circuit_handle_first_hop
  • circuit_send_next_onion_skin
  • circuit_finish_handshake
  • connection_exit_begin_conn, inconsistently!!!
  • circuit_receive_relay_cell
  • connection_process_relay_cell_not_open, inconsistently!!
    • delegates to connection_edge_process_resolved_cell, which will return a stream reason sometimes??
  • connection_ap_process_end_not_open
  • connection_edge_process_relay_cell, inconsistently!
    • delegates to connection_process_relay_cell_not_open
    • delegates to circuit_extend, which can return -1.

And this is wrong:

src/or/control.c:      circuit_mark_for_close(TO_CIRCUIT(circ), -END_CIRC_REASON_CONNECTFAILED);

The above inconsistencies look quite suggestive, and they also suggest that we need a better way to return these codes, since we've made the same type of mistake here in more than one place.

Last edited 10 months ago by nickm (previous) (diff)

comment:4 Changed 10 months ago by nickm

(edited above comment because of my omissions and errors.)

comment:5 Changed 9 months ago by isis

  • Cc isis added
  • Keywords circuit 029-proposed added
  • Points set to 2

I also this bug today on a really crappy network, with the same reason code (65021).

Would it better to track down this bug and do a one-off patch for it, or attempt to make return codes universally more consistent?

comment:6 Changed 9 months ago by nickm

I think we should probably aim for a real solution : there are lots of inconsistencies here and I think we need to think of some better way to have these functions signal errors. Otherwise reason codes will continue to be generally messed up.


comment:7 Changed 9 months ago by nickm

  • Keywords nickm-deferred-20161017 added
  • Milestone changed from Tor: 0.2.9.x-final to Tor: 0.3.0.x-final

I am fairly sure that these are neither regressions nor major problems. So, deferring from 0.2.9. Please let me know if I'm wrong.

comment:8 Changed 7 months ago by dgoulet

  • Keywords triage-out-030-201612 added
  • Milestone changed from Tor: 0.3.0.x-final to Tor: 0.3.1.x-final

Triaged out on December 2016 from 030 to 031.

comment:9 Changed 7 months ago by dgoulet

  • Keywords 029-backport added; 029-proposed removed
  • Milestone changed from Tor: 0.3.1.x-final to Tor: 0.3.0.x-final
  • Priority changed from Medium to Very High
  • Version changed from Tor: 0.2.9.3-alpha to Tor: 0.2.8.1-alpha

Here is the problem which also fixes in part #21056 as introduction point are being flagged wrongfully because of this bug.

In circuit_mark_for_close_(), we take the int reason and then assigned it to:

  circ->marked_for_close_reason = reason;
  circ->marked_for_close_orig_reason = orig_reason;

Both of the above are uint16_t which is bad because the reason value is a int. Furthermore, we have internal reasons that are negative value such as #define END_CIRC_REASON_IP_NOW_REDUNDANT -4 used to indicate the circuit subsystem to NOT report the intro point as a failed connection attempt ultimately putting the intro point in the failure cache. But with the uint16_t conversion, the reason became 65532 instead of -4 which ultimately invalidates 2/3 of the intro points of a descriptor on the client side. BAD!

This issue has been introduced in 8b4e5b7ee released in tor-0.2.8.1-alpha.

I'm also flagging this for an 029 backport even though this is not a security issue, it has quite bad consequences for HS reachability.

comment:10 follow-up: Changed 7 months ago by nickm

  • Status changed from new to needs_review

I think you want this to be in needs_review with branch bug20307_030_01? Or is that not ready?

Should we make a similar change anywhere else? (like, on connections or channels or something?)

comment:11 in reply to: ↑ 10 Changed 7 months ago by dgoulet

Replying to nickm:

I think you want this to be in needs_review with branch bug20307_030_01? Or is that not ready?

It is, thanks! I somehow failed to do so.

Should we make a similar change anywhere else? (like, on connections or channels or something?)

I can only see end_reason (when searching for "reason") in an edge_connection_t that is a uint16_t but that seems we only assign END_STREAM_* reasons which none of them are negative so should be fine. We might want to be careful overtime with this because as far as I can tell, the end circuit reason is 1 byte in the payload of a DESTROY cell which is not even a uint16_t.

comment:12 Changed 7 months ago by nickm

  • Keywords review-group-14 added

comment:13 Changed 6 months ago by dgoulet

  • Owner set to dgoulet
  • Status changed from needs_review to accepted

Taking ownership and then will put back in needs_review

comment:14 Changed 6 months ago by dgoulet

  • Status changed from accepted to needs_review

Branch: bug20307_030_01

comment:15 Changed 6 months ago by nickm

  • Milestone changed from Tor: 0.3.0.x-final to Tor: 0.2.9.x-final

Okay, merging. I can't find anything that will actually break with this, and it _will_ fix the problem. So let's give it a try.

If we backport this to 0.2.9, 955d4b7abdb1f4a3295e9c3ef5fa7abf6f71f057 is the commit to cherry-pick.

If the world explodes, that's the commit to revert. :)

comment:16 Changed 6 months ago by nickm

  • Keywords added; review-group-14 removed

comment:17 Changed 6 months ago by arma

I think this got merged into 0.2.9 and came out in 0.2.9.9?

If so we should close as fixed?

comment:18 Changed 6 months ago by dgoulet

  • Resolution set to fixed
  • Status changed from needs_review to closed

Indeed. Merged all the way.

Note: See TracTickets for help on using tickets.