connection_ap_attach_pending: Bug: 0x613000b8d680 is no longer in circuit_wait. Its current state is waiting for rendezvous desc. Why is it on pending_entry_connections? (on Tor 0.2.9.1-alpha f6c7e131a1ceb178)
Yeah, why?
Thanks for letting us know about this bug. We're working on fixing these up.
/** A list of all the entry_connection_t * objects that are not marked
 * for close, and are in AP_CONN_STATE_CIRCUIT_WAIT.
 *
 * (Right now, we check in several places to make sure that this list is
 * correct. When it's incorrect, we'll fix it, and log a BUG message.) */
static smartlist_t *pending_entry_connections = NULL;
I think that the problem lies in the test in connection_exit_connect(). The parentheses are wrong. We should allow rendezvous connections to IPv6 addresses even when IPv6Exit is false, right?
I've tried rewriting this test to be correct and easier to read; please review and test my branch bug19926?
Trac: Status: new to accepted; Owner: N/A to nickm; Actualpoints: N/A to .1
Have we audited all the cases where a stream can go from AP_CONN_STATE_CIRCUIT_WAIT to AP_CONN_STATE_RENDDESC_WAIT, to make sure that we frob pending_entry_connections appropriately in all of them?
> Have we audited all the cases where a stream can go from AP_CONN_STATE_CIRCUIT_WAIT to AP_CONN_STATE_RENDDESC_WAIT, to make sure that we frob pending_entry_connections appropriately in all of them?
There are four places where we transition from some state into AP_CONN_STATE_RENDDESC_WAIT: in circuit_get_open_circ_or_launch() in circuituse.c, in connection_ap_handshake_rewrite_and_attach() in connection_edge.c, in rend_client_send_introduction() in rendclient.c, and in rend_client_report_intro_point_failure() in rendclient.c. But in all four of these cases, we call connection_ap_mark_as_non_pending_circuit() right before. So I think we should be ok there.
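The common shape of those four call sites is roughly this (a simplified sketch; connection_ap_mark_as_non_pending_circuit(), ENTRY_TO_CONN(), and the state constants are real names from that era's source, but the surrounding code differs at each site):

    /* At all four transition sites, the stream is dropped from
     * pending_entry_connections *before* its state changes: */
    connection_ap_mark_as_non_pending_circuit(entry_conn);
    ENTRY_TO_CONN(entry_conn)->state = AP_CONN_STATE_RENDDESC_WAIT;

As long as every transition follows this order, a stream in AP_CONN_STATE_RENDDESC_WAIT should never still be on the list.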
Next, have we audited all the cases where we add a stream to pending_entry_connections, to make sure we aren't in state AP_CONN_STATE_RENDDESC_WAIT when we do it?
> Have we audited all the cases where we add a stream to pending_entry_connections, to make sure we aren't in state AP_CONN_STATE_RENDDESC_WAIT when we do it?
There are two places where a stream can get added onto pending_entry_connections (sketched below):
In connection_ap_mark_as_pending_circuit(), which does a tor_assert to make sure that the state is AP_CONN_STATE_CIRCUIT_WAIT.
In connection_ap_attach_pending(), which only does it if the state is AP_CONN_STATE_CIRCUIT_WAIT.
So I don't think that can be it either.
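For reference, the two add sites look roughly like this (a simplified sketch; the real functions in connection_edge.c do more bookkeeping):

    /* (1) connection_ap_mark_as_pending_circuit() asserts the state
     *     before adding the stream to the list: */
    tor_assert(ENTRY_TO_CONN(entry_conn)->state == AP_CONN_STATE_CIRCUIT_WAIT);
    smartlist_add(pending_entry_connections, entry_conn);

    /* (2) connection_ap_attach_pending() adds a stream back only
     *     conditionally: */
    if (ENTRY_TO_CONN(entry_conn)->state == AP_CONN_STATE_CIRCUIT_WAIT)
      smartlist_add(pending_entry_connections, entry_conn);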
(I should clarify that the above analysis is looking at git branch f6c7e131a, like in the original report.)
I am now out of places to look.
Git commit 0a701e53 looks like it fixed a few bugs like this, but as far as I can tell, that went into 0.2.8.1-alpha, so way before 0.2.9.1-alpha.
I wonder if there are any cases where we call connection_ap_mark_as_non_pending_circuit() with the wrong pointer, e.g. that we mis-cast it because we're confused about our OO tricks? Then that function silently doesn't remove anything, leaving us in the situation where the stream is still on pending_entry_connections?
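To illustrate that failure mode (the mis-cast here is hypothetical; smartlist_remove() is the real helper): smartlist_remove() compares raw pointer values and is a silent no-op when the pointer isn't in the list, so a wrong cast would leave the real entry behind without any warning:

    /* If entry_conn were derived from the wrong cast (e.g. a pointer at
     * the wrong offset into the enclosing struct), this would compare
     * the wrong pointer value, remove nothing, and report nothing. */
    smartlist_remove(pending_entry_connections, entry_conn);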
Maybe Nick has other guesses, based on debugging past bugs on this pending_entry_connections thing?
For 0.3.0, I would suggest that we refactor everything that changes an AP_CONN's state, and everything that changes the pending_entry_connections list, to make sure they cannot be wrong.
For 0.2.9, that seems kinda big. Maybe we should just tune this warning down a little bit in 0.2.9?
bug19926_029 downgrades the warning to info, for 0.2.9.
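Roughly, the change there is just a severity demotion (a sketch, not the actual diff; conn_state_to_string() is the real helper for naming states):

    /* Same message as before, but at info severity instead of warn: */
    log_info(LD_BUG, "%p is no longer in circuit_wait. Its current state "
             "is %s. Why is it on pending_entry_connections?",
             entry_conn,
             conn_state_to_string(CONN_TYPE_AP,
                                  ENTRY_TO_CONN(entry_conn)->state));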
I'm also wondering whether we have any places where we confuse this connection for another type of connection, and set its state to some other "6" other than AP_CONN_STATE_RENDDESC_WAIT. But I can't find any place like that, after hunting seriously for a while.
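One hypothetical way to catch such a type confusion early would be a magic-number check before trusting the state field (the per-subtype magic values and tor_assert() are existing Tor idioms; adding this particular check here is only a suggestion):

    /* A connection that is not really an entry connection would fail
     * this assert before its state field could mislead us: */
    tor_assert(ENTRY_TO_CONN(entry_conn)->magic == ENTRY_CONNECTION_MAGIC);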
If we turn DEBUGGING_17659 on again, that might help?
Merged to 0.2.9; not taken in master. Thanks! I'm moving this ticket to maint-0.3.0 and needs_information, with the question being: Does this occur with 0.3.0 or later?
Trac: Milestone: Tor: 0.2.9.x-final to Tor: 0.3.0.x-final; Status: merge_ready to needs_information
Just got it again recently. That HS was DDoSed and not reachable for many minutes for me. When I reloaded the tabs, I immediately got this:
[warn] connection_ap_attach_pending(): Bug: 0x561db4544e70 is no longer in circuit_wait. Its current state is waiting for rendezvous desc. Why is it on pending_entry_connections? (on Tor 0.3.1.7 )
The same day, from the same HS, I got many messages of the form:
[warn] Query '[scrubbed]' didn't have valid rend desc in cache. Failing.
Could it be HS admins carrying out some type of attack against their users?
P.S. From what I can see, it happens when tor has been waiting a long time for an HS descriptor and cannot get it yet. If at that moment a kill -1 signal is issued to tor and a NEWNYM is sent, then, in some cases (not always!), this error appears in the log files (just one second after the NEWNYM and kill -1).
A connection in the RENDDESC_WAIT state can only get out of that state if the descriptor actually arrives at the client or if the SOCKS connection is closed.
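For context, the descriptor-arrival exit path looks roughly like this (simplified from rend_client_desc_trynow() in rendclient.c of that era):

    /* On descriptor arrival, each stream waiting in
     * AP_CONN_STATE_RENDDESC_WAIT flips back to circuit_wait and is
     * re-added to pending_entry_connections: */
    base_conn->state = AP_CONN_STATE_CIRCUIT_WAIT;
    connection_ap_mark_as_pending_circuit(TO_ENTRY_CONN(base_conn));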
We need to investigate what happens if, for instance, the directory connection times out: do we close pending RENDDESC_WAIT connections, or do they stay in limbo? What happens if a second SOCKS connection comes in for the same .onion while we still have one in that state?
I've gone over every transition and every call site where we can modify the pending connection list, and I can't find anything that would result in a connection in RENDDESC_WAIT being in that list...
As arma/nickm suggested, I think we are at the point of refactoring every RENDDESC_WAIT transition and how connections are added to that list, so we can use it in 0.3.3, or at least as an exercise to try to find the offending code path.
Oh, and print more information in case we happen to get that bug.
Deferring to 0.3.3 for now, as I have no clue how to fix this without a serious amount of work.
Trac: Milestone: Tor: 0.3.2.x-final to Tor: 0.3.3.x-final
Well, I don't know. I was the user who reported it on the tor blog recently and who wrote two comments after gk reopened the ticket.
> A connection in the RENDDESC_WAIT state can only get out of that state if the descriptor actually arrives at the client or if the SOCKS connection is closed.
I don't know, but... could it be triggered by my iptables rules? It is well known that iptables is quite buggy with the --owner option. It blocks many valid packets for so many reasons... that nobody wants to fix them all yet. That is the case in my setup too, so I manually block those extra packets that would not pass iptables anyway. Normally this blocking has no effect on any application, but I cannot be sure of that. If an HS is accessed many times, some attempts are interrupted and too many fail... does that mean that SOCKS connections can sometimes be suddenly interrupted and closed?
I would like to see more detailed tor log options that would give more information about what is happening, but which are not as detailed as info, because info produces too much irrelevant output, and that output cannot be shared because it contains sensitive information. Ideally I would be glad to have debug-level logs which could be safely shared with the tor community without disclosing my circuits, my guards, and the HSes I visit.
I'm a bit doubtful that iptables can break the SOCKS requests here unless it is applied to localhost and the SocksPort... which would be a bit surprising?
I use iptables everywhere, but in this particular case it was Tor Browser inside a VM (guest OS) connected to the system tor running on the host system (host OS). Thus tor's SocksPort is listening on a bridge interface, which is also visible inside the VM (a transparent tor proxy is not used). Connections on this bridge are filtered with iptables.
At this point, hunting this down is mostly a matter of refactoring.
Well, let us see whether it can be caught after the refactoring of the tor code...
The symptoms are exactly the same with the current 0.3.3.9:
Aug 10 11:34:59.000 [notice] Tried for 120 seconds to get a connection to [scrubbed]:80. Giving up.
Aug 10 11:35:05.000 [notice] Received reload signal (hup). Reloading config and resetting internal state.
Aug 10 11:35:05.000 [notice] Read configuration file "/usr/share/tor/tor-service-defaults-torrc".
Aug 10 11:35:05.000 [notice] Read configuration file "/etc/tor/torrc".
Aug 10 11:35:05.000 [notice] Tor 0.3.3.9 (git-ca1a436fa8e53a32) opening log file.
Aug 10 11:35:05.000 [warn] connection_ap_attach_pending(): Bug: 0x55fdfbbe4ed0 is no longer in circuit_wait. Its current state is waiting for rendezvous desc. Why is it on pending_entry_connections? (on Tor 0.3.3.9 )
Aug 10 11:35:05.000 [warn] connection_ap_attach_pending(): Bug: 0x55fdfb703df0 is no longer in circuit_wait. Its current state is waiting for rendezvous desc. Why is it on pending_entry_connections? (on Tor 0.3.3.9 )
Aug 10 11:37:06.000 [notice] Tried for 120 seconds to get a connection to [scrubbed]:80. Giving up.
We need more details to be logged in order to track this problem.
Sep 24 07:57:17.000 [notice] Received reload signal (hup). Reloading config and resetting internal state.
Sep 24 07:57:17.000 [notice] Read configuration file "/usr/share/tor/tor-service-defaults-torrc".
Sep 24 07:57:17.000 [notice] Read configuration file "/etc/tor/torrc".
Sep 24 07:57:17.000 [notice] Tor 0.3.4.8 (git-5da0e95e4871a0a1) opening log file.
Sep 24 07:57:17.000 [warn] connection_ap_attach_pending(): Bug: 0x55a743e17440 is no longer in circuit_wait. Its current state is waiting for rendezvous desc. Why is it on pending_entry_connections? (on Tor 0.3.4.8 )
Sep 24 07:57:17.000 [warn] connection_ap_attach_pending(): Bug: 0x55a74403e900 is no longer in circuit_wait. Its current state is waiting for rendezvous desc. Why is it on pending_entry_connections? (on Tor 0.3.4.8 )
17:28:31.000 [notice] New control connection opened from 127.0.0.1.
17:28:31.000 [notice] Received reload signal (hup). Reloading config and resetting internal state.
17:28:31.000 [notice] Read configuration file "/usr/share/tor/tor-service-defaults-torrc".
17:28:31.000 [notice] Read configuration file "/etc/tor/torrc".
17:28:31.000 [notice] Tor 0.3.5.7 opening log file.
17:28:31.000 [warn] connection_ap_attach_pending(): Bug: 0x55e977678200 is no longer in circuit_wait. Its current state is waiting for rendezvous desc. Why is it on pending_entry_connections? (on Tor 0.3.5.7 )