Opened 5 years ago

Closed 4 years ago

#12160 closed defect (fixed)

ORPort self-testing fails behind tcp proxy when using version 0.2.4.22

Reported by: kargig
Owned by: andrea
Priority: Medium
Milestone: Tor: 0.2.5.x-final
Component: Core Tor/Tor
Version: Tor: 0.2.4.22
Keywords: tor-relay 024-backport
Cc: asn

Description

I was using 0.2.3.25 behind a TCP proxy (haproxy) with the following config, and it correctly passed the ORPort reachability test. After upgrading to 0.2.4.22, the same config fails the ORPort reachability test.

ORPort 443 NoListen
ORPort 127.0.0.1:444 NoAdvertise

Tor 0.2.3.25

May 29 21:05:22.000 [notice] Self-testing indicates your ORPort is reachable from the outside. Excellent. Publishing server descriptor.
May 29 21:05:22.000 [info] mark_my_descriptor_dirty(): Decided to publish new relay descriptor: ORPort found reachable

Tor 0.2.4.22

May 29 20:56:42.000 [notice] Now checking whether ORPort 1.2.3.4:443 and DirPort 1.2.3.4:995 are reachable... (this may take up to 20 minutes -- look for log messages indicating success)
May 29 20:56:42.000 [info] consider_testing_reachability(): Testing reachability of my ORPort: 1.2.3.4:443.
May 29 20:57:41.000 [info] consider_testing_reachability(): Testing reachability of my ORPort: 1.2.3.4:443.
May 29 20:57:42.000 [info] circuit_testing_failed(): Our testing circuit (to see if your ORPort is reachable) has failed. I'll try again later.
May 29 20:58:42.000 [info] consider_testing_reachability(): Testing reachability of my ORPort: 1.2.3.4:443.
May 29 20:59:43.000 [info] consider_testing_reachability(): Testing reachability of my ORPort: 1.2.3.4:443.
May 29 21:00:44.000 [info] consider_testing_reachability(): Testing reachability of my ORPort: 1.2.3.4:443.
May 29 21:01:45.000 [info] consider_testing_reachability(): Testing reachability of my ORPort: 1.2.3.4:443.
May 29 21:01:46.000 [info] circuit_testing_failed(): Our testing circuit (to see if your ORPort is reachable) has failed. I'll try again later.

If I shut down haproxy and change the 0.2.4.22 config to:

ORPort 443

it passes the ORPort reachability test.

May 29 21:17:31.000 [notice] Self-testing indicates your ORPort is reachable from the outside. Excellent. Publishing server descriptor.
May 29 21:17:31.000 [info] mark_my_descriptor_dirty(): Decided to publish new relay descriptor: ORPort found reachable

I have info-level logs for all three of the above test situations if you need them, but reproducing the situation should be quite easy.

BTW, there was no other backend configured in haproxy; it just passes connections from PUBLIC_IP:443 to a single backend (Tor) at 127.0.0.1:444.


Change History (18)

comment:1 Changed 5 years ago by kargig

I tried to bisect the problem, but it proved rather hard because many versions would not compile. I started from 0.2.4.9 because I had already tested it and it wasn't working.

Here's the bisect log:

git bisect start
# bad: [0301a1df6c96888680ff8f310af818f239e93f13] Bump to 0.2.4.9-alpha-dev
git bisect bad 0301a1df6c96888680ff8f310af818f239e93f13
# good: [a2f57b97998b325c059d2fac06ca37d4b4dc52a3] bump to 0.2.4.3-alpha-dev
git bisect good a2f57b97998b325c059d2fac06ca37d4b4dc52a3
# bad: [08436b27ffba4094760fd1fe5321bbd255043b53] Merge remote-tracking branch 'origin/maint-0.2.3'
git bisect bad 08436b27ffba4094760fd1fe5321bbd255043b53
# bad: [3a33b1fe3bd7ba4cb1fff73f97ee722a2b127db5] Merge branch 'move_contrib_source' of git://git.torproject.org/nickm/tor
git bisect bad 3a33b1fe3bd7ba4cb1fff73f97ee722a2b127db5
# skip: [b208539b8047a12fb2f1f794c9932fddd577dfdb] Use circuitmux_t in channels and when relaying cells
git bisect skip b208539b8047a12fb2f1f794c9932fddd577dfdb
# good: [feabf4148fc00a8535714ff72d9caa8303a73eaf] Drop support for openssl 0.9.7
git bisect good feabf4148fc00a8535714ff72d9caa8303a73eaf
# skip: [e4a11b890e7c5fe45dc1f5f271fbd8130ccc9c55] Implement circuitmux_alloc()/circuitmux_free() and chanid/circid->muxinfo hash table
git bisect skip e4a11b890e7c5fe45dc1f5f271fbd8130ccc9c55
# good: [dc014c97472e3adf2306938841c13de0040a2ff0] Merge branch 'maint-0.2.3'
git bisect good dc014c97472e3adf2306938841c13de0040a2ff0
# skip: [3c41d7f414511aeb6e9e0fd6bfb9be1af539840a] Implement circuitmux_attach_circuit() in circuitmux.c
git bisect skip 3c41d7f414511aeb6e9e0fd6bfb9be1af539840a
# bad: [965c9de498ab7f6c7ce3dce133bb34456f3d668e] Abolish superfluous channel_find_by_remote_nickname()
git bisect bad 965c9de498ab7f6c7ce3dce133bb34456f3d668e
# good: [751b3aabb5ab88fca16834e559a8d9835831b05f] Merge remote-tracking branch 'public/openssl_1_is_best'
git bisect good 751b3aabb5ab88fca16834e559a8d9835831b05f
# skip: [28f108bcceab59fcf9f27e33065f64bfdb0f159a] Use dirreq_id from channel_t when appropriate
git bisect skip 28f108bcceab59fcf9f27e33065f64bfdb0f159a
# bad: [cb62a0b69a7d67b427224ca4c3075b49853a3a1f] Use channel_is_bad_for_new_circs(), connection_or_get_num_circs() in main.c
git bisect bad cb62a0b69a7d67b427224ca4c3075b49853a3a1f
# skip: [35924435d22c2469ecbe06156c8069a928859d63] Make reachabiity test in dirserv.c use channel_t
git bisect skip 35924435d22c2469ecbe06156c8069a928859d63
# skip: [8b14db9628f0e8982e894034e86c8efdd78cff32] Switch onion.c over to channel_t
git bisect skip 8b14db9628f0e8982e894034e86c8efdd78cff32
# skip: [e136f7ccb4e671e33b6c92a48df819082291f5c1] Convert relay.c/relay.h to channel_t
git bisect skip e136f7ccb4e671e33b6c92a48df819082291f5c1
# skip: [15303c32ec9d84aff8de5ed9df28e779c36c6e5c] Initial channeltls.c/channeltls.h for bug 6465
git bisect skip 15303c32ec9d84aff8de5ed9df28e779c36c6e5c
# skip: [4768c0efe3e9471cc367c3740d1a4ba0ab79626c] Support channel_t in connection_edge.c
git bisect skip 4768c0efe3e9471cc367c3740d1a4ba0ab79626c
# skip: [32337502f11e6c84e4db8591f5f81c4fc6d1da58] Use channel_t rather than or_connection_t for circuits
git bisect skip 32337502f11e6c84e4db8591f5f81c4fc6d1da58
# skip: [6cce6241dd4405f6cf21057f9913e07633cf18bb] Query circuit count from associated channel of or_conn in control.c
git bisect skip 6cce6241dd4405f6cf21057f9913e07633cf18bb
# skip: [f0f87cb68a22feaf552a18b521d3313d843f8793] Convert rendmid.c to channel_t
git bisect skip f0f87cb68a22feaf552a18b521d3313d843f8793
# skip: [519c971f6a3b89f1e81cda3c0290d4d943ec0d78] Use channel_t in cmd.c
git bisect skip 519c971f6a3b89f1e81cda3c0290d4d943ec0d78
# skip: [7f952da55334d3a3693d1c6e8531fd96730265db] Fix make check-spaces in circuitbuild.c and router.h
git bisect skip 7f952da55334d3a3693d1c6e8531fd96730265db
# skip: [77dac97354974e8a819d8e35ad4e7a76199999b4] Use channel_t in cpuworker.c
git bisect skip 77dac97354974e8a819d8e35ad4e7a76199999b4
# skip: [838743654c1bed2bfe22789ff53a1993c005f176] Add channel.c/channel.h for bug 6465
git bisect skip 838743654c1bed2bfe22789ff53a1993c005f176
# skip: [9ad7ba9f2267a9ee34fafda9356f1fa86633f00f] Use connection_or_get_num_circuits() in control.c
git bisect skip 9ad7ba9f2267a9ee34fafda9356f1fa86633f00f

There are only 'skip'ped commits left to test.
The first bad commit could be any of:
35924435d22c2469ecbe06156c8069a928859d63
e136f7ccb4e671e33b6c92a48df819082291f5c1
4768c0efe3e9471cc367c3740d1a4ba0ab79626c
6cce6241dd4405f6cf21057f9913e07633cf18bb
519c971f6a3b89f1e81cda3c0290d4d943ec0d78
77dac97354974e8a819d8e35ad4e7a76199999b4
32337502f11e6c84e4db8591f5f81c4fc6d1da58
8b14db9628f0e8982e894034e86c8efdd78cff32
15303c32ec9d84aff8de5ed9df28e779c36c6e5c
28f108bcceab59fcf9f27e33065f64bfdb0f159a
7f952da55334d3a3693d1c6e8531fd96730265db
f0f87cb68a22feaf552a18b521d3313d843f8793
838743654c1bed2bfe22789ff53a1993c005f176
9ad7ba9f2267a9ee34fafda9356f1fa86633f00f
cb62a0b69a7d67b427224ca4c3075b49853a3a1f
We cannot bisect more!

I've also tested the release-0.2.3 branch, and commit e318ab14b10f353da1ebcece0d6490191517e21a works fine.

comment:2 Changed 5 years ago by nickm

Milestone: Tor: 0.2.5.x-final

comment:3 Changed 5 years ago by aexl

I'm not behind a proxy, but I get the same results.
I don't know why this is prioritized as normal. This breaks Tor,
including an important feature for circumventing FascistFirewalls.

comment:4 Changed 5 years ago by andrea

Owner: set to andrea
Status: new → assigned

comment:5 Changed 5 years ago by andrea

Kargig: please provide debug-level logs for the 0.2.4.22 case. You may have found a bug somewhere in the listener side of channel_tls_t, but it's hard to isolate without more detail.

comment:6 Changed 5 years ago by kargig

You can get the log here: http://83.212.168.186/debug-0.2.4.22-haproxy.log

I can also give ssh access to the test environment (haproxy+tor) to any developer interested in taking a look.

comment:7 Changed 5 years ago by andrea

Hmm, this is interesting. I finally got a test setup going that I can try to reproduce this on. Your log file has calls to circuit_testing_failed(), and I can't reproduce that; I am, however, seeing calls to circuit_testing_opened() that don't then lead to the ORPort being marked reachable. There's a bug here, but I don't know for sure what it is yet.

comment:8 Changed 5 years ago by andrea

I think I understand a bit more: the connections all have 127.0.0.1 as their address when channel_tls_handle_incoming() gets called, so they get classified as local and the condition (!channel_is_local(circ->p_chan) && !channel_is_outgoing(circ->p_chan)) always fails. These incorrect channel remote addresses can also be observed in the output of channel_dumpstats() by sending SIGUSR1 to the Tor process.
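To make the failing condition concrete, here is a minimal, simplified C model of it. All names are hypothetical stand-ins, not Tor's code: `sketch_chan_t` plays the role of channel_t, `addr_is_local()` plays the role of the local-address classification (it only treats 127.0.0.0/8 as local, which is an assumption; Tor's real check may cover other ranges), and `counts_as_reachability_proof()` mirrors the shape of the `!channel_is_local(...) && !channel_is_outgoing(...)` test quoted above. It shows that an incoming channel whose remote address was recorded as the proxy's 127.0.0.1 can never count as evidence that the ORPort is reachable.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for Tor's channel_t. */
typedef struct {
  uint32_t remote_ipv4;  /* remote address, host byte order */
  bool is_local;         /* mark computed once, from remote_ipv4 */
  bool is_outgoing;      /* true if this relay initiated the connection */
} sketch_chan_t;

/* Assumption for illustration: only 127.0.0.0/8 counts as local. */
static bool addr_is_local(uint32_t ipv4)
{
  return (ipv4 >> 24) == 127u;
}

/* Hypothetical: build an incoming channel from the peer address that
 * accept() reported, computing the "local" mark at that moment. */
static sketch_chan_t make_incoming_chan(uint32_t peer_ipv4)
{
  sketch_chan_t chan;
  chan.remote_ipv4 = peer_ipv4;
  chan.is_local = addr_is_local(peer_ipv4);
  chan.is_outgoing = false;
  return chan;
}

/* Same shape as !channel_is_local(chan) && !channel_is_outgoing(chan). */
static bool counts_as_reachability_proof(const sketch_chan_t *chan)
{
  return !chan->is_local && !chan->is_outgoing;
}
```

For example, `make_incoming_chan(0x01020304u)` (a direct connection from 1.2.3.4) passes `counts_as_reachability_proof()`, while `make_incoming_chan(0x7F000001u)` (the same connection relayed by haproxy, so the listener sees 127.0.0.1) never does.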

On a comparison relay running 0.2.4 without any NoListen/NoAdvertise options, though, the channel remote addresses are all correct. I still have to track down why this worked in the 0.2.3 case. The old test was (!is_local_addr(&circ->p_conn->_base.addr) && !connection_or_nonopen_was_started_here(circ->p_conn)).

comment:9 Changed 5 years ago by andrea

Yeah, it just gets 127.0.0.1 for the remote address in connection_handle_listener_read(); the mystery is why this ever worked. I think I'll have to build 0.2.3 and set a breakpoint on router_orport_found_reachable().

comment:10 Changed 5 years ago by andrea

When I test the matching configuration with 0.2.3, the connection_t addr field is set to a proper remote address, and there's an or_connection_t real_addr field set to 127.0.0.1. Where is this happening?

comment:11 Changed 5 years ago by andrea

It looks like the real_addr field gets set, and the _base.addr field updated, by connection_or_init_conn_from_address(), using the address the remote node advertises in the consensus once it has been authenticated. The bug occurs because, in the new version, this happens after channel_tls_handle_incoming() has already initialized the channel marks. The best fix is probably to add something to channeltls.c that connection_or_init_conn_from_address() can call to update the channel marks.
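The ordering problem, and the direction of the proposed fix, can be sketched the same way. Again this is a hypothetical model, not Tor's code: `sketch_or_conn_t` loosely mirrors the addr/real_addr pair described in comments 10 and 11, `sketch_handle_incoming()` stands in for channel_tls_handle_incoming() computing the marks at accept time, `sketch_init_conn_from_address()` stands in for connection_or_init_conn_from_address(), and `sketch_update_chan_marks()` is a made-up helper of the kind suggested above for channeltls.c. The `apply_fix` flag exists only to compare behaviour with and without the recomputation, and treating only 127.0.0.0/8 as local is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins; not Tor's real structures. */
typedef struct {
  uint32_t remote_ipv4;  /* address the channel believes the peer has */
  bool is_local;         /* mark derived from remote_ipv4 */
} sketch_chan_t;

typedef struct {
  uint32_t addr;       /* plays the role of connection_t addr */
  uint32_t real_addr;  /* plays the role of or_connection_t real_addr */
  sketch_chan_t chan;
} sketch_or_conn_t;

/* Assumption for illustration: only 127.0.0.0/8 counts as local. */
static bool addr_is_local(uint32_t ipv4)
{
  return (ipv4 >> 24) == 127u;
}

/* Hypothetical helper: recompute the channel marks from the
 * connection's current address. */
static void sketch_update_chan_marks(sketch_or_conn_t *conn)
{
  conn->chan.remote_ipv4 = conn->addr;
  conn->chan.is_local = addr_is_local(conn->addr);
}

/* Accept time: behind haproxy the socket peer is 127.0.0.1, and the
 * marks are computed from that address. */
static void sketch_handle_incoming(sketch_or_conn_t *conn, uint32_t peer)
{
  conn->addr = peer;
  conn->real_addr = 0;
  sketch_update_chan_marks(conn);
}

/* After authentication: keep the socket address in real_addr and replace
 * addr with the address the peer advertises in the consensus. */
static void sketch_init_conn_from_address(sketch_or_conn_t *conn,
                                          uint32_t advertised,
                                          bool apply_fix)
{
  conn->real_addr = conn->addr;
  conn->addr = advertised;
  if (apply_fix)
    sketch_update_chan_marks(conn);  /* without this, is_local stays stale */
}
```

Without the recomputation, the model ends up with a correct addr but a channel still marked local, which is the stale-mark pattern comment 11 describes; with it, the mark follows the updated address while real_addr still records 127.0.0.1.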

comment:12 Changed 5 years ago by andrea

Status: assigned → needs_review

Proposed fixes in my bug12160_024, bug12160_025 and bug12160 branches against maint-0.2.4, maint-0.2.5 and master respectively. These are tested and verified to work.

comment:13 in reply to:  12 Changed 5 years ago by kargig

Replying to andrea:

Proposed fixes in my bug12160_024, bug12160_025 and bug12160 branches against maint-0.2.4, maint-0.2.5 and master respectively. These are tested and verified to work.

I've tried commit 9682bc2abb98dbe9f9ece6dce1866d188506f53d (bug12160_debug_024 branch) and it still does not work for me.

Debug logs: http://83.212.168.186/debug-0.2.4.23-haproxy.log


I just tried the bug12160_024 branch, and it seems to be able to finish ORPort self-testing.

Sep 06 22:28:53.000 [notice] Self-testing indicates your ORPort is reachable from the outside. Excellent. Publishing server descriptor.
Last edited 5 years ago by kargig

comment:14 Changed 5 years ago by nickm

Keywords: tor-relay 024-backport added
Milestone: Tor: 0.2.5.x-final → Tor: 0.2.4.x-final

This code seems okay to me. Merging to 0.2.5 and master, and marking it for a possible 0.2.4 merge.

comment:15 Changed 5 years ago by arma

On the theory that 0.2.5.x is going stable pretty soon, I don't think a backport to 0.2.4 is needed: this is not a critical security issue, nor one that applies broadly to users.

comment:16 Changed 5 years ago by kargig

I have been running the bug12160_024 branch for more than 10 days, and I was unable to get the Fast flag using the haproxy setup:

ORPort 443 NoListen
ORPort 127.0.0.1:444 NoAdvertise

As soon as I switched to a setup without haproxy in front:

ORPort 443

I got the Fast flag in less than a day.

I am pretty sure haproxy is not the bottleneck in this setup, because it's designed to handle thousands of TCP connections very quickly. If I add the relay to an EntryNodes line in my client, it seems to work properly in either setup.

Can you advise on how I should proceed with debugging this?

comment:17 Changed 5 years ago by weasel

The user is actually running more than one Tor instance on the same port, with the same keys, using this setup.

comment:18 Changed 4 years ago by nickm

Milestone: Tor: 0.2.4.x-final → Tor: 0.2.5.x-final
Resolution: fixed
Status: needs_review → closed

Resolved in 0.2.5; wontfix in 0.2.4.
