Opened 16 months ago

Last modified 4 weeks ago

#19926 accepted defect

BUG warning in connection_ap_attach_pending: waiting for rendezvous desc :*

Reported by: cypherpunks
Owned by: dgoulet
Priority: High
Milestone: Tor: 0.3.3.x-final
Component: Core Tor/Tor
Version: Tor: 0.2.9.1-alpha
Severity: Normal
Keywords: bug, regression, tor-hs
Cc:
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor:

Description

connection_ap_attach_pending: Bug: 0x613000b8d680 is no longer in circuit_wait. Its current state is waiting for rendezvous desc. Why is it on pending_entry_connections? (on Tor 0.2.9.1-alpha f6c7e131a1ceb178)
Yeah, why?

Child Tickets

Change History (27)

comment:1 Changed 15 months ago by teor

Keywords: bug regression? added
Milestone: Tor: 0.2.9.x-final

Thanks for letting us know about this bug. We're working on fixing these up.

/** A list of all the entry_connection_t * objects that are not marked
 * for close, and are in AP_CONN_STATE_CIRCUIT_WAIT.
 *
 * (Right now, we check in several places to make sure that this list is
 * correct.  When it's incorrect, we'll fix it, and log a BUG message.)
 */
static smartlist_t *pending_entry_connections = NULL;

comment:2 Changed 15 months ago by nickm

Summary: waiting for rendezvous desc :* → BUG warning in connection_ap_attach_pending: waiting for rendezvous desc :*

comment:3 Changed 14 months ago by nickm

Actual Points: .1
Owner: set to nickm
Status: new → accepted

I think that the problem lies in the test in connection_exit_connect(). The parentheses are wrong. We should allow rendezvous connections to IPv6 addresses even when IPv6Exit is false, right?

I've tried rewriting this test to be correct and easier to read; please review and test my branch bug19926?

Last edited 14 months ago by nickm

comment:4 Changed 14 months ago by nickm

Status: accepted → needs_review

comment:5 Changed 14 months ago by nickm

Owner: nickm deleted
Status: needs_review → assigned

Oh darn, that was the wrong ticket. Please ignore the previous messages.

comment:6 Changed 14 months ago by nickm

Status: assigned → new

comment:7 Changed 14 months ago by dgoulet

Actual Points: .1

@teor, did we fix your comment:1? If not, which ticket does this refer to?

comment:8 Changed 13 months ago by arma

Have we audited all the cases where a stream can go from AP_CONN_STATE_CIRCUIT_WAIT to AP_CONN_STATE_RENDDESC_WAIT, to make sure that we frob pending_entry_connections appropriately in all of them?

comment:9 in reply to:  7 Changed 13 months ago by teor

Replying to dgoulet:

@teor, did we fix your comment:1? If not, which ticket does this refer to?

I have no idea, I just read the comment in the code.

comment:10 in reply to:  8 Changed 13 months ago by arma

Replying to arma:

Have we audited all the cases where a stream can go from AP_CONN_STATE_CIRCUIT_WAIT to AP_CONN_STATE_RENDDESC_WAIT, to make sure that we frob pending_entry_connections appropriately in all of them?

There are four places where we transition from some state into AP_CONN_STATE_RENDDESC_WAIT: in circuit_get_open_circ_or_launch() in circuituse.c, in connection_ap_handshake_rewrite_and_attach() in connection_edge.c, in rend_client_send_introduction() in rendclient.c, and in rend_client_report_intro_point_failure() in rendclient.c. But in all four of these cases, we call connection_ap_mark_as_non_pending_circuit() right before. So I think we should be ok there.
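A minimal standalone sketch of the pattern at those four call sites, using hypothetical stand-in types rather than the real Tor structures; the only point is that the un-pend call always precedes the state change:

#include <stdio.h>

enum ap_state { AP_CIRCUIT_WAIT, AP_RENDDESC_WAIT };

struct entry_conn {
  enum ap_state state;
  int on_pending_list; /* models membership in pending_entry_connections */
};

/* Models connection_ap_mark_as_non_pending_circuit(). */
static void mark_as_non_pending(struct entry_conn *conn)
{
  conn->on_pending_list = 0;
}

/* The pattern at all four transition sites: un-pend first, then switch
 * the state, so a RENDDESC_WAIT stream is never left pending. */
static void enter_renddesc_wait(struct entry_conn *conn)
{
  mark_as_non_pending(conn);
  conn->state = AP_RENDDESC_WAIT;
}

int main(void)
{
  struct entry_conn conn = { AP_CIRCUIT_WAIT, 1 };
  enter_renddesc_wait(&conn);
  printf("state=%d pending=%d\n", (int)conn.state, conn.on_pending_list);
  return 0;
}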

Next, have we audited all the cases where we add a stream to pending_entry_connections, to make sure we aren't in state AP_CONN_STATE_RENDDESC_WAIT when we do it?

comment:11 in reply to:  10 Changed 13 months ago by arma

Replying to arma:

have we audited all the cases where we add a stream to pending_entry_connections, to make sure we aren't in state AP_CONN_STATE_RENDDESC_WAIT when we do it?

There are two places where a stream can get added onto pending_entry_connections:

1) In connection_ap_mark_as_pending_circuit(), which does a tor_assert to make sure that the state is AP_CONN_STATE_CIRCUIT_WAIT.

2) In connection_ap_attach_pending(), which only does it if the state is AP_CONN_STATE_CIRCUIT_WAIT.

So I don't think that can be it either.
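For reference, a hedged sketch of the shape of those two insertion sites, again with stand-in types rather than the real ones; site 1 asserts on the state, site 2 merely checks it:

#include <assert.h>
#include <stdio.h>

enum ap_state { AP_CIRCUIT_WAIT, AP_RENDDESC_WAIT };

struct entry_conn { enum ap_state state; int pending; };

/* Site 1: models connection_ap_mark_as_pending_circuit(); the real
 * function tor_assert()s that the state is CIRCUIT_WAIT. */
static void mark_as_pending(struct entry_conn *conn)
{
  assert(conn->state == AP_CIRCUIT_WAIT);
  conn->pending = 1;
}

/* Site 2: models connection_ap_attach_pending(), which only re-adds a
 * connection after checking its state. */
static void maybe_keep_pending(struct entry_conn *conn)
{
  if (conn->state == AP_CIRCUIT_WAIT)
    mark_as_pending(conn);
  else
    conn->pending = 0;
}

int main(void)
{
  struct entry_conn conn = { AP_CIRCUIT_WAIT, 0 };
  mark_as_pending(&conn);
  conn.state = AP_RENDDESC_WAIT;  /* some transition elsewhere */
  maybe_keep_pending(&conn);      /* correctly refuses to re-add */
  printf("pending=%d\n", conn.pending);
  return 0;
}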

(I should clarify that the above analysis is looking at git branch f6c7e131a, like in the original report.)

I am now out of places to look.

Git commit 0a701e53 looks like it fixed a few bugs like this, but as far as I can tell, that went into 0.2.8.1-alpha, so way before 0.2.9.1-alpha.

I wonder if there are any cases where we call connection_ap_mark_as_non_pending_circuit() with the wrong pointer, e.g. where we mis-cast it because we're confused about our OO tricks. Then that function would silently not remove anything, leaving us in the situation where the stream is still on pending_entry_connections.
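To illustrate that guessed-at failure mode, a standalone toy (not Tor code): removing a pointer that is not actually in the list is a silent no-op, which is also how Tor's smartlist_remove() behaves.

#include <stdio.h>

struct base    { int type; };
struct derived { struct base b; int extra; };

#define N_PENDING 4
static void *pending[N_PENDING];

/* Like smartlist_remove(): quietly does nothing when p is absent. */
static void pending_remove(void *p)
{
  for (int i = 0; i < N_PENDING; i++)
    if (pending[i] == p)
      pending[i] = NULL;
}

int main(void)
{
  struct derived conn = { { 1 }, 2 };
  pending[0] = &conn;          /* the list stores the outer object... */
  pending_remove(&conn.extra); /* ...but a mis-cast hands us an inner
                                * pointer, so nothing gets removed */
  printf("still pending: %s\n", pending[0] ? "yes" : "no");
  return 0;
}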

Maybe Nick has other guesses, based on debugging past bugs on this pending_entry_connections thing?

comment:12 Changed 12 months ago by nickm

For 0.3.0, I would suggest that we refactor everything that changes an AP_CONN's state, and everything that changes the pending_entry_connections list, to make sure they cannot be wrong.

For 0.2.9, that seems kinda big. Maybe we should just tune this warning down a little bit in 0.2.9?
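One possible shape for that refactor, sketched with hypothetical names only: a single setter owns both the state field and the list membership, so the two cannot drift apart.

#include <assert.h>
#include <stdio.h>

enum ap_state { AP_CIRCUIT_WAIT, AP_RENDDESC_WAIT, AP_OPEN };

struct entry_conn { enum ap_state state; int pending; };

/* The single chokepoint; callers never assign conn->state directly. */
static void ap_conn_set_state(struct entry_conn *conn, enum ap_state s)
{
  conn->pending = (s == AP_CIRCUIT_WAIT); /* membership follows state */
  conn->state = s;
  /* The invariant now holds by construction. */
  assert(conn->pending == (conn->state == AP_CIRCUIT_WAIT));
}

int main(void)
{
  struct entry_conn conn = { AP_OPEN, 0 };
  ap_conn_set_state(&conn, AP_CIRCUIT_WAIT);  /* added to pending */
  ap_conn_set_state(&conn, AP_RENDDESC_WAIT); /* removed automatically */
  printf("state=%d pending=%d\n", (int)conn.state, conn.pending);
  return 0;
}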

comment:13 Changed 12 months ago by dgoulet

At this point, with 0.2.9 in rc and almost stable, I would agree on toning the warning down, but this should still stay on our stack for 0.3.0.

comment:14 Changed 12 months ago by nickm

Status: new → needs_review

bug19926_029 downgrades the warning to info, for 0.2.9.
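A toy model of what the downgrade does, with a hypothetical tor_log() standing in for Tor's logging macros: the message text is unchanged, only its severity drops below the default client log threshold.

#include <stdio.h>

enum severity { SEV_INFO, SEV_WARN };

static enum severity threshold = SEV_WARN; /* a typical client config */

static void tor_log(enum severity s, const char *msg)
{
  if (s >= threshold)
    printf("[%s] %s\n", s == SEV_WARN ? "warn" : "info", msg);
}

int main(void)
{
  const char *msg = "connection_ap_attach_pending(): Bug: conn is no "
                    "longer in circuit_wait.";
  tor_log(SEV_WARN, msg); /* before bug19926_029: user-visible */
  tor_log(SEV_INFO, msg); /* after: only appears in info-level logs */
  return 0;
}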

I'm also wondering whether we have any places where we confuse this connection for another type of connection, and set its state to some other "6", i.e. a state from another connection type that happens to share the numeric value of AP_CONN_STATE_RENDDESC_WAIT. But I can't find any place like that, after hunting seriously for a while.

If we turn DEBUGGING_17659 on again, that might help?

comment:15 Changed 12 months ago by dgoulet

Status: needs_review → merge_ready

lgtm;

comment:16 Changed 12 months ago by nickm

Milestone: Tor: 0.2.9.x-final → Tor: 0.3.0.x-final
Status: merge_ready → needs_information

Merged to 0.2.9; not taken in master. Thanks! I'm moving this ticket to maint-0.3.0 and needs_information, with the question being: Does this occur with 0.3.0 or later?

comment:17 Changed 10 months ago by nickm

Milestone: Tor: 0.3.0.x-final → Tor: unspecified

5 weeks with no updates.

comment:18 Changed 5 months ago by nickm

Resolution: set to user disappeared
Status: needs_information → closed

comment:19 Changed 7 weeks ago by gk

Resolution: user disappeared deleted
Status: closed → reopened

A user on our blog is reporting this problem (again) with 0.3.1.7: https://blog.torproject.org/comment/271747#comment-271747.

comment:20 in reply to:  19 Changed 7 weeks ago by gk

Replying to gk:

A user on our blog is reporting this problem (again) with 0.3.1.7: https://blog.torproject.org/comment/271747#comment-271747.

Sorry, copy and paste fail: the correct link is https://blog.torproject.org/comment/271751#comment-271751 (thanks ry5 for pointing that out).

comment:21 Changed 7 weeks ago by dgoulet

Keywords: regression tor-hs added; regression? removed
Milestone: Tor: unspecified → Tor: 0.3.2.x-final
Priority: Medium → High

comment:22 Changed 6 weeks ago by cypherpunks

Just got it again recently. That HS was being DDoSed and was not accessible for many minutes for me. When I reloaded the tabs, I immediately got this:

[warn] connection_ap_attach_pending(): Bug: 0x561db4544e70 is no longer in circuit_wait. Its current state is waiting for rendezvous desc. Why is it on pending_entry_connections? (on Tor 0.3.1.7 )

The same day, from the same HS, I got many messages of the form:

[warn] Query '[scrubbed]' didn't have valid rend desc in cache. Failing.

Could it be HS admins performing some type of attack against their users?

comment:23 Changed 6 weeks ago by cypherpunks

P.S. As far as I can see, it happens when tor has been waiting a long time for an HS descriptor and cannot get it yet. If at that moment both a kill -1 signal is issued to tor and a NEWNYM is sent, then in some cases (not always!) this error appears in the log files (just one second after the NEWNYM and kill -1).

comment:24 Changed 5 weeks ago by dgoulet

A connection in the RENDDESC_WAIT state can only get out of that state if the descriptor actually arrives at the client or if the SOCKS connection is closed.

We need to investigate what happens if the directory connection times out, for instance: do we close pending RENDDESC_WAIT connections, or do they stay in limbo? What happens if a second SOCKS connection comes in for the same .onion while we still have one in that state?
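A sketch of one possible answer, under my assumption (not something the ticket establishes) that a failed descriptor fetch should sweep and close the waiters rather than leave them in limbo; all types and helpers are simplified stand-ins.

#include <stdio.h>
#include <string.h>

enum ap_state { CIRCUIT_WAIT, RENDDESC_WAIT, CLOSED };

struct entry_conn {
  enum ap_state state;
  char onion[17]; /* service identifier the stream is waiting on */
};

/* Called when every directory attempt for `onion` has failed. */
static void fail_renddesc_waiters(struct entry_conn *conns, int n,
                                  const char *onion)
{
  for (int i = 0; i < n; i++) {
    if (conns[i].state == RENDDESC_WAIT &&
        strcmp(conns[i].onion, onion) == 0) {
      conns[i].state = CLOSED; /* models connection_mark_unattached_ap() */
    }
  }
}

int main(void)
{
  struct entry_conn conns[2] = {
    { RENDDESC_WAIT, "aaaaaaaaaaaaaaaa" }, /* first SOCKS request */
    { RENDDESC_WAIT, "aaaaaaaaaaaaaaaa" }, /* second one, same .onion */
  };
  fail_renddesc_waiters(conns, 2, "aaaaaaaaaaaaaaaa");
  printf("conn0=%d conn1=%d\n", (int)conns[0].state, (int)conns[1].state);
  return 0;
}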

comment:25 Changed 5 weeks ago by dgoulet

Owner: set to dgoulet
Status: reopened → accepted

comment:26 Changed 4 weeks ago by dgoulet

Milestone: Tor: 0.3.2.x-final → Tor: 0.3.3.x-final

I've gone over every transition and every callsite where we can modify the pending connection list, and I can't find anything that would result in a connection in RENDDESC_WAIT being in that list...

As arma/nickm suggested, I think we are at the point of refactoring every RENDDESC_WAIT transition and how connections are added to that list, so we can use it in 033, or at least as an exercise to try to find the faulty code path.

Oh, and we should print more information in case we hit that bug again.
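For that last part, a standalone sketch of the kind of enriched message that would make the next report actionable; the struct and its field names are hypothetical.

#include <stdio.h>

struct entry_conn {
  int state;
  int marked_for_close;
  const char *purpose;
};

static const char *state_name(int s)
{
  return s == 6 ? "waiting for rendezvous desc" : "other";
}

/* Dump everything we know about the connection when the bug fires. */
static void log_pending_bug(const struct entry_conn *conn)
{
  fprintf(stderr,
          "BUG: conn %p is on pending_entry_connections but its state "
          "is \"%s\" (marked_for_close=%d, purpose=%s)\n",
          (const void *)conn, state_name(conn->state),
          conn->marked_for_close, conn->purpose);
}

int main(void)
{
  struct entry_conn conn = { 6, 0, "user" };
  log_pending_bug(&conn);
  return 0;
}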

Deferring to 033 for now, as I have no clue how to fix this without a serious amount of work.

comment:27 Changed 4 weeks ago by cypherpunks

Well, I don't know. I was the user who reported it on the tor blog recently and who wrote two comments after gk reopened the ticket.

A connection in the RENDDESC_WAIT state can only get out of that state if the descriptor actually arrives at the client or if the SOCKS connection is closed.

I don't know, but... could it be triggered by my iptables rules? It is well known that iptables is quite buggy with the --owner option. It blocks many valid packets for so many reasons... that nobody wants to fix them all yet. That is the case for me too, so I manually block those extra packets that would not pass iptables anyway. Normally this blocking has no effect on any application, but I cannot be sure of that. If an HS is accessed many times, some attempts are interrupted and too many fail... does it mean that SOCKS connections can sometimes be suddenly interrupted and closed?

I would like to see more detailed tor log options that give more information about what happens, but which are not as detailed as info, because info gives too much irrelevant information and cannot be shared, since it contains sensitive data. Ideally I would be glad to see debug-level logs which can be safely shared with the tor community without disclosing my tor chains, my guards, and the HSes I visit.
