Opened 5 years ago

Closed 18 months ago

Last modified 8 months ago

#12020 closed defect (duplicate)

Bootstrap gets stuck at 20% when connecting through a bridge.

Reported by: yawning
Owned by:
Priority: Medium
Milestone: Tor: unspecified
Component: Core Tor/Tor
Version: Tor: 0.2.5.4-alpha
Severity: Normal
Keywords: tor-bridge, pt, sponsor8-maybe bootstrap 20%
Cc: saint, isis, brade, mcs
Actual Points:
Parent ID: #4847
Points:
Reviewer:
Sponsor:

Description

I believe this is different from all the other instances of this bug (#11965 and friends), because the client never recovers (I am using a pluggable transport that is experimental, but the symptoms don't point at my code at first glance).

Client debug log:

May 15 19:36:24.000 [debug] connection_dir_client_reached_eof(): Received response from directory server '127.0.0.1:52810': 404 "Not found" (purpose: 6)
May 15 19:36:24.000 [info] connection_dir_client_reached_eof(): Received server info (size 0) from server '127.0.0.1:52810'
May 15 19:36:24.000 [info] connection_dir_client_reached_eof(): Received http status code 404 ("Not found") from server '127.0.0.1:52810' while fetching "/tor/server/authority.z". I'll try again soon.
May 15 19:36:24.000 [debug] conn_close_if_marked(): Cleaning up connection (fd -1).
May 15 19:36:24.000 [debug] connection_remove(): removing socket -1 (type Directory), n_conns now 3

The bridge is fully bootstrapped at this point according to the logs. Bridge functionality should be fully working once the bridge bootstraps to 100%, right? This seems to happen most often after I restart both the client and the bridge to pick up a new build of the pt binary...

The only notable config option besides the PT is "PublishServerDescriptor 0" (A cursory search for authority.z brings up #9366).

Child Tickets

Attachments (2)

tor-client-debug-obfs4.log.gz (7.2 KB) - added by yawning 5 years ago.
Client debug log.
tor-bridge-debug-obfs4.log.gz (344.3 KB) - added by yawning 5 years ago.
Bridge debug log.

Change History (21)

Changed 5 years ago by yawning

Client debug log.

Changed 5 years ago by yawning

Bridge debug log.

comment:1 Changed 5 years ago by yawning

Keywords: tor-bridge added

comment:2 Changed 5 years ago by nickm

Milestone: Tor: 0.2.5.x-final

Like all the others, what will most help this get solved is repro instructions, if you can find a way to make this reproducible.

comment:3 Changed 5 years ago by arma

Your bridge may have bootstrapped to 100%, but that doesn't mean it could learn its address. The lines you quoted indicated that your bridge returned 404 when asked for its bridge descriptor. That is, it hasn't generated its descriptor yet. Perhaps if you set its address explicitly? What are the torrc lines for bridge and client?
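
For anyone triaging similar reports, the symptom described in this comment (a 404 on the bridge's own descriptor) can be spotted mechanically in a client log. The following is a minimal sketch, assuming the info-level log line format quoted in the ticket description; `FETCH_404` and `find_descriptor_404s` are hypothetical names for illustration and are not part of Tor.

```python
import re

# Matches the info-level line quoted in the description, logged when a
# directory fetch gets a 404. The exact wording is an assumption taken from
# this ticket's log excerpt and may differ between Tor versions.
FETCH_404 = re.compile(
    r'Received http status code 404 \("Not found"\) from server '
    r"'(?P<server>[^']+)' while fetching \"(?P<path>[^\"]+)\""
)


def find_descriptor_404s(log_lines):
    """Return (server, path) pairs for 404s on /tor/server/authority*."""
    hits = []
    for line in log_lines:
        match = FETCH_404.search(line)
        if match and match.group("path").startswith("/tor/server/authority"):
            hits.append((match.group("server"), match.group("path")))
    return hits


# Two lines copied from the client debug log in the description.
sample = [
    "May 15 19:36:24.000 [info] connection_dir_client_reached_eof(): "
    "Received http status code 404 (\"Not found\") from server "
    "'127.0.0.1:52810' while fetching \"/tor/server/authority.z\". "
    "I'll try again soon.",
    "May 15 19:36:24.000 [debug] conn_close_if_marked(): "
    "Cleaning up connection (fd -1).",
]

for server, path in find_descriptor_404s(sample):
    print(f"bridge at {server} returned 404 for {path}")
```

A hit here only shows that the bridge had no descriptor to serve at that moment; it does not by itself say why the descriptor was not generated.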

comment:4 in reply to:  3 Changed 5 years ago by yawning

Replying to arma:

> Your bridge may have bootstrapped to 100%, but that doesn't mean it could learn its address. The lines you quoted indicated that your bridge returned 404 when asked for its bridge descriptor. That is, it hasn't generated its descriptor yet. Perhaps if you set its address explicitly? What are the torrc lines for bridge and client?

This is indeed possible, I do not have Address set in the bridge side torrc (Would specifying the loopback address break anything here?).

Bridge:

SocksPort 0
ORPort 9001
ExtORPort 6669
DataDirectory /tmp/tor-bridge/
BridgeRelay 1
PublishServerDescriptor 0
ServerTransportListenAddr obfs4 127.0.0.1:52810
#ServerTransportPlugin <Unrelated obfs4 stuff>
#ServerTransportOptions <Unrelated obfs4 stuff>

Client:

SocksPort 9150
DataDirectory /tmp/tor-client-obfs4
UseBridges 1
Bridge obfs4 127.0.0.1:52810 <Unrelated obfs4 stuff>
ClientTransportPlugin <Unrelated obfs4 stuff>

I assume a configuration like this isn't something that's seen in the wild (probably only PT developers do this sort of thing?). FWIW I haven't seen the client actually retry to fetch the bridge descriptor when I've triggered this in the past, and I've waited > 60 sec because I thought it might be the other bugs.

comment:5 Changed 5 years ago by arma

Tor won't retry fetching the bridge descriptor just 60 seconds later -- it failed, so it's unlikely to succeed again so soon after:

  V(TestingBridgeDownloadSchedule, CSV_INTERVAL, "3600, 900, 900, 3600"),

If indeed it hasn't generated its descriptor yet, then just using it as a vanilla bridge on its orport should fail too. That should remove some components from your situation.

What does the bridge say about its reachability testing? Or does it not even get to that because it thinks it has no publicly routable address?

comment:6 in reply to:  5 Changed 5 years ago by yawning

Replying to arma:

> Tor won't retry fetching the bridge descriptor just 60 seconds later -- it failed, so it's unlikely to succeed again so soon after:
>
>   V(TestingBridgeDownloadSchedule, CSV_INTERVAL, "3600, 900, 900, 3600"),
>
> If indeed it hasn't generated its descriptor yet, then just using it as a vanilla bridge on its orport should fail too. That should remove some components from your situation.

I have observed this in the past when testing (ORport not working).

> What does the bridge say about its reachability testing? Or does it not even get to that because it thinks it has no publicly routable address?

Haven't tried letting it go that far (and it would fail anyway since I don't forward the port).

I've been doing my development with Address set and haven't run into this again, so I believe your diagnosis is correct.

In light of that, I'm not sure there's a bug here. It might be nice to have a warning when fetching the bridge descriptor fails, but bridges in the wild presumably have a real address and won't trigger this issue in the first place.

Sorry for taking up your time, and please feel free to close this if all of this behaviour is ok.

comment:7 Changed 5 years ago by nickm

Arma, do you think this is notabug, or something else? Please close/adjust as appropriate.

comment:8 Changed 5 years ago by nickm

Milestone: Tor: 0.2.5.x-final → Tor: 0.2.???

We should definitely give more helpful messages in this case, but it doesn't (IIUC) look like a must-fix-in-0.2.5 issue.

comment:9 Changed 5 years ago by saint

Cc: saint added

Typically, when things get stuck at ~20% in my testing, there is a port-blocking issue. As in, the local network is restricting ports to a whitelisted set.

comment:10 Changed 5 years ago by dcf

Just a note as I'm searching trac, you get the same symptom (bootstrapping stops at 20%) if you are a client with UseBridges set and connect to a bridge that has neither BridgeRelay nor DirPort set. info logging is:

[notice] Bootstrapped 20%: Asking for networkstatus consensus
[info] internal circ (length 1): $0000000000000000000000000000000000000000(open)
[info] connection_ap_handshake_send_begin(): Sending relay cell 1 to begin stream 21318.
[info] connection_ap_handshake_send_begin(): Address/port sent, ap socket -1, n_circ_id 3101216939
[info] connection_ap_process_end_not_open(): Edge got end (not a directory) before we're connected. Marking for close.
[info] internal circ (length 1): $0000000000000000000000000000000000000000(open)
[info] stream_end_reason_to_socks5_response(): Reason for ending (526) not recognized; sending generic socks error.
[info] connection_free_(): Freeing linked Socks connection [waiting for connect response] with 57 bytes on inbuf, 0 on outbuf.
[info] connection_dir_client_reached_eof(): 'fetch' response not all here, but we're at eof. Closing.
[info] connection_dir_request_failed(): Giving up on serverdesc/extrainfo fetch from directory server at '0.0.2.0'; retrying
[info] connection_free_(): Freeing linked Directory connection [client reading] with 0 bytes on inbuf, 0 on outbuf.
[info] compute_weighted_bandwidths(): Empty routerlist passed in to consensus weight node selection for rule weight as guard
[info] smartlist_choose_node_by_bandwidth(): Empty routerlist passed in to old node selection for rule weight as guard
[info] should_delay_dir_fetches(): Delaying dir fetches (no running bridges known)
[info] compute_weighted_bandwidths(): Empty routerlist passed in to consensus weight node selection for rule weight as guard
[info] smartlist_choose_node_by_bandwidth(): Empty routerlist passed in to old node selection for rule weight as guard
[info] should_delay_dir_fetches(): Delaying dir fetches (no running bridges known)

I guess that #12538 will make it work to use any old relay as a bridge.

comment:11 Changed 5 years ago by s7r

This is a strange problem that I faced today as well. Here is some information about my setup:
OpenVZ container with virtual network interface (venet)
OS: Debian 7 Wheezy
Network configuration:
lo (local loopback)
venet0 - inet addr 127.0.0.2 and inet6 addr <public v6 address>/128 scope global
venet0:0 - inet addr <public v4 address>

Trying to build a bridge with obfs3 and obfs4 pluggable transports on Tor 0.2.5.8-rc and obfs4proxy installed from Debian sid repo.

Wanted the bridge to listen on v4 and v6 too, so I have stated in torrc:
ORPort [::]:443
the same for obfs3 and obfs4 listen addr [::]:port
(this opened ORPort on both versions of IP. Checked via a remote server port checker and ports open, everything working fine). OR and obfuscated ports open on v4 and v6.

When I try to connect to the bridge via IPv4, it gets stuck at loading the network status and never progresses. I tried connecting to it as a normal bridge, an obfs3 bridge and an obfs4 bridge, with the same result: the network status never loads, as if the bridge is not working or cannot build Tor circuits.

Added "Address <public v4 address>" in torrc. Restarted Tor daemon. The same, nothing fixed.

Finally, I have removed ORPort [::]:443 from torrc and put instead 2 entries as follows:
ORPort <public v4 address>:443
ORPort <public v6 address>:443
Left the obfuscated listening ports untouched but also removed the "Address <public v4 address>" entry. Restarted Tor daemon.

And... it worked. connected to the Tor network just fine. Connected via regular bridge, obfs3 and obfs4 - all working fine. Ports open on v6 too.

I might add the fact that these are private bridges configured not to send anything to the bridge authorities (maybe this is somehow relevant) [PublishServerDescriptor 0]

Since all this happened within 30 minutes to 1 hour, I am not sure whether my torrc change fixed it or the bridge simply learned its own IP and started to work. I don't understand why a bridge would take this long to become functional (the bridge had finished bootstrapping when I tried to connect to it).

comment:12 Changed 4 years ago by isis

Cc: isis added
Keywords: pt added
Status: new → needs_information

Can anyone provide steps for reproducing this?

comment:13 Changed 3 years ago by teor

Milestone: Tor: 0.2.??? → Tor: 0.3.???

Milestone renamed

comment:14 Changed 3 years ago by nickm

Keywords: tor-03-unspecified-201612 added
Milestone: Tor: 0.3.??? → Tor: unspecified

Finally admitting that 0.3.??? was a euphemism for Tor: unspecified all along.

comment:15 Changed 2 years ago by nickm

Keywords: tor-03-unspecified-201612 removed

Remove an old triaging keyword.

comment:16 Changed 2 years ago by nickm

Keywords: sponsor8-maybe bootstrap added
Severity: Normal

comment:17 Changed 2 years ago by mcs

Cc: brade mcs added

comment:18 Changed 18 months ago by dcf

Keywords: 20% added

Adding a keyword because I lost some time today when I ran into this issue again.

Summary of workaround: add this to your server torrc (the exact address doesn't matter, but 127.0.0.1 didn't work):

Address 1.2.3.4

comment:19 Changed 18 months ago by teor

Parent ID: #4847
Resolution: duplicate
Status: needs_information → closed

This workaround gives false information to the bridge authority, so it must be used with PublishServerDescriptor 0.
Also, this workaround makes this issue a duplicate of #4847.
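
The configuration advice scattered through this ticket can be collected into a small pre-flight check for a test bridge's torrc. This is only a sketch under stated assumptions: `parse_torrc` and `check_private_bridge` are hypothetical helpers, the parser handles only simple "Option value" lines and `#` comments, and the three rules encode nothing beyond what comments 3, 18 and 19 report.

```python
def parse_torrc(text):
    """Collect options as {lowercased name: [values]}, ignoring # comments."""
    options = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        parts = line.split(None, 1)
        value = parts[1] if len(parts) > 1 else ""
        options.setdefault(parts[0].lower(), []).append(value)
    return options


def check_private_bridge(text):
    """Return warnings for the bridge misconfigurations seen in this ticket."""
    opts = parse_torrc(text)
    warnings = []
    if opts.get("bridgerelay") != ["1"]:
        return warnings  # not configured as a bridge; nothing to check
    addresses = opts.get("address", [])
    unpublished = opts.get("publishserverdescriptor") == ["0"]
    if not addresses:
        # comment:3 -- without a known address the bridge may never
        # generate a descriptor, so clients stall at 20%.
        warnings.append("no Address set; bridge may not build a descriptor")
    elif "127.0.0.1" in addresses:
        # comment:18 -- 127.0.0.1 was reported not to work.
        warnings.append("Address 127.0.0.1 was reported not to work")
    elif not unpublished:
        # comment:19 -- a placeholder Address must not be published.
        warnings.append("Address set without PublishServerDescriptor 0; "
                        "only safe if Address is the bridge's real address")
    return warnings


# Bridge torrc from comment:4, minus the elided obfs4 lines.
bridge_torrc = """SocksPort 0
ORPort 9001
ExtORPort 6669
DataDirectory /tmp/tor-bridge/
BridgeRelay 1
PublishServerDescriptor 0
ServerTransportListenAddr obfs4 127.0.0.1:52810
"""

print(check_private_bridge(bridge_torrc))
print(check_private_bridge(bridge_torrc + "Address 1.2.3.4\n"))
```

The first call flags the missing Address in the original configuration; the second, with the placeholder Address added alongside PublishServerDescriptor 0, returns no warnings.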
