Opened 23 months ago

Last modified 15 months ago

#23116 new defect

tor stops responding to Ctrl-C and circuits while in infinite descriptor download loop

Reported by: teor
Owned by:
Priority: High
Milestone: Tor: unspecified
Component: Core Tor/Tor
Version: Tor: 0.3.0.9
Severity: Normal
Keywords: needs-insight, needs-analysis, 033-triage-20180320, 033-removed-20180320
Cc:
Actual Points:
Parent ID: #16844
Points: 1
Reviewer:
Sponsor:

Description

I'm running tor master (ce07b4dd9) and 0.3.0.9 on macOS 10.12.5 x86_64 on a relay with a private IP address over a slow link.

Tor seems to get stuck in an infinite loop, logging messages like the ones below, and it stops responding both on the ORPort and to Ctrl-C (shutdown):

Aug 05 09:04:10.000 [info] handle_response_fetch_microdesc: Received answer to microdescriptor request (status 200, body size 40905) from server '149.172.149.170:9030'
Aug 05 09:04:10.000 [info] I learned some more directory information, but not enough to build a circuit: We're missing descriptors for some of our primary entry guards
Aug 05 09:04:10.000 [info] update_consensus_router_descriptor_downloads: 0 router descriptors downloadable. 0 delayed; 0 present (0 of those were in old_routers); 0 would_reject; 0 wouldnt_use; 6960 in progress.
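To spot this state in a long info-level log, the summary line can be parsed and checked for "everything in progress, nothing present". This is only a sketch: the regular expression is derived from the single sample line above and may not match other Tor versions, and `parse_summary` and `looks_stuck` are hypothetical helper names, not part of Tor.

```python
import re

# Pattern built from the one sample line in this ticket (an assumption:
# other Tor versions may word this message differently).
SUMMARY_RE = re.compile(
    r"update_consensus_router_descriptor_downloads: "
    r"(?P<downloadable>\d+) router descriptors downloadable\. "
    r"(?P<delayed>\d+) delayed; "
    r"(?P<present>\d+) present \((?P<old>\d+) of those were in old_routers\); "
    r"(?P<would_reject>\d+) would_reject; "
    r"(?P<wouldnt_use>\d+) wouldnt_use; "
    r"(?P<in_progress>\d+) in progress\."
)


def parse_summary(line):
    """Return the counters from a summary log line as a dict, or None."""
    match = SUMMARY_RE.search(line)
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}


def looks_stuck(counts):
    """Hypothetical heuristic: requests are in flight but nothing is present
    and nothing new is downloadable."""
    return (
        counts["in_progress"] > 0
        and counts["present"] == 0
        and counts["downloadable"] == 0
    )
```

Feeding every matching log line through `parse_summary` and watching whether `in_progress` ever drops would show whether the downloads are making progress at all.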

Change History (10)

comment:1 Changed 23 months ago by nickm

Does it work when you try to attach a gdb (or lldb, since it's a mac) to the process?

comment:2 Changed 23 months ago by nickm

(You'll need to set DisableDebuggerAttachment 0)

comment:3 Changed 22 months ago by teor

Parent ID: #16844

The relay spends most of its time blocking on directory download sockets, but for some reason never answers the ORPort request. In a 20 second sample:

126.00 ms   84.0%   0 s   connection_dir_client_reached_eof
 79.00 ms   52.6%   0 s     update_all_descriptor_downloads
 21.00 ms   14.0%   0 s     microdescs_add_to_cache
 20.00 ms   13.3%   0 s     count_loading_descriptors_progress

The client hangs when trying to do the SSL handshake with the ORPort:

  File "/Users/twilsonb/tor/endosome/link.py", line 236, in link_request_cell_list
    max_response_len=max_response_len)
  File "/Users/twilsonb/tor/endosome/link.py", line 85, in link_open
    context = ssl_open(ip, port)
  File "/Users/twilsonb/tor/endosome/connect.py", line 73, in ssl_open
    ssl_socket = ssl.wrap_socket(context['tcp_socket'])
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/ssl.py", line 943, in wrap_socket
    ciphers=ciphers)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/ssl.py", line 611, in __init__
    self.do_handshake()
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/ssl.py", line 840, in do_handshake
    self._sslobj.do_handshake()

  • Are ORPort requests delayed until bootstrap completes?
  • Is this a feature?
  • Did we do something to make this much worse in 0.3.0/0.3.1/master?
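The traceback above shows the client blocking indefinitely in `do_handshake()`. A probe with a timeout makes the hang measurable instead of open-ended. The following is a minimal Python 3 sketch, not the endosome code: `probe_tls` is a hypothetical helper, the 10-second default is an arbitrary assumption, and certificate verification is disabled because relay link certificates are not issued by a public CA.

```python
import socket
import ssl


def probe_tls(host, port, timeout=10.0):
    """Attempt one TLS handshake; return "ok", "timeout", or "error: ..."."""
    # Relay link certificates are not CA-signed, so verification is skipped.
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    try:
        with socket.create_connection((host, port), timeout=timeout) as tcp:
            # The timeout set on the TCP socket also bounds the handshake,
            # which wrap_socket() performs before returning.
            with context.wrap_socket(tcp):
                return "ok"
    except socket.timeout:
        return "timeout"
    except OSError as exc:  # includes ssl.SSLError and connection refusals
        return "error: %s" % exc
```

Running this against the ORPort at intervals during bootstrap would show whether the handshake is merely delayed or never answered.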

This seems similar to the directory request expiry issues in #16844: I eventually see directory request expiry log messages from tor.

I wonder if downloading more microdescs per batch will make this slightly less pathological. (I think there's a ticket for this, but I can't find it.)
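As a rough illustration of why batch size matters, the number of directory requests is the descriptor count divided by the batch size, rounded up. The batch sizes below are hypothetical values chosen for the arithmetic, not Tor's actual limits, and 6960 is simply the "in progress" count from the log in the description.

```python
def requests_needed(total_descriptors, per_request):
    """Number of requests needed to fetch all descriptors (ceiling division)."""
    if per_request <= 0:
        raise ValueError("per_request must be positive")
    return (total_descriptors + per_request - 1) // per_request


# Hypothetical batch sizes, for illustration only.
for batch in (100, 500):
    print(batch, requests_needed(6960, batch))
```

Fewer requests means fewer responses for the main loop to process, which is the effect the larger batches would be hoped to have.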

comment:4 Changed 22 months ago by nickm

Keywords: needs-insight needs-analysis added
Priority: Medium → High

comment:5 Changed 22 months ago by nickm

Hrmg. I'm trying to figure out what's going on looking at those functions, but without a lot of luck. Maybe a debug log could help?

Does this happen on other versions as well?

comment:6 Changed 21 months ago by teor

#21789 may mitigate this issue

comment:7 Changed 21 months ago by teor

Milestone: Tor: 0.3.2.x-final → Tor: 0.3.3.x-final

I suspect that #23470 might fix this issue, or at least make it less pathological, but I haven't tested it yet. And I'm not sure I'll have time to in 0.3.2.

comment:8 Changed 15 months ago by nickm

Keywords: 033-triage-20180320 added

Marking all tickets reached by current round of 033 triage.

comment:9 Changed 15 months ago by nickm

Keywords: 033-removed-20180320 added

Mark all not-already-included tickets as pending review for removal from 0.3.3 milestone.

comment:10 Changed 15 months ago by nickm

Milestone: Tor: 0.3.3.x-final → Tor: unspecified

These tickets were marked as removed, and nobody has said that they can fix them. Let's remember to look at 033-removed-20180320 as we re-evaluate our triage process, to see whether we're triaging out unnecessarily, and to evaluate whether we're deferring anything unnecessarily. But for now, we can't do these: we need to fix the 033-must stuff now.
