Opened 13 years ago

Last modified 7 years ago

#367 closed defect (Fixed)

More STREAM events w/ 0 circ

Reported by: mikeperry Owned by: nickm
Priority: Low Milestone: 0.1.2.x-final
Component: Core Tor/Tor Version: 0.1.2.3-alpha
Severity: Keywords:
Cc: mikeperry, arma Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

While the fix in r9021 got a lot of STREAM 0 events, there still are a couple more. Detached streams are not reporting
the circuit they are detaching from, and neither are FAILED:RESOLVEFAILED or CLOSED:TIMEOUT or CLOSED:INTERNAL.

Roger confirmed that in both cases it seems like the code was at least trying to present this information to the
controller, so he said this is a bug rather than a feature request. Fine with me ;)

[Automatically added by flyspray2trac: Operating System: All]

Child Tickets

Change History (17)

comment:1 Changed 13 years ago by mikeperry

CLOSED REASON=DESTROY also.. Sorry about the linewrap, assumed flyspray would do it.

comment:2 Changed 13 years ago by arma

I listened to some events for a while, and they look fine to me.

Can you put a tor_assert() at the bottom of control_event_stream_status()
to catch the case when it happens for you, and show me a backtrace?

The fix is to hand control_event_stream_status() a circ if you have it,
rather than hoping it will find the right one. But first, convince me it's
broken. :)

comment:3 Changed 13 years ago by mikeperry

(BUG 1)

Here's one I didn't notice in my controller that just happened to fire before the ones I've mentioned
above. I ignore the circ number for SUCCEEDED events, but it probably should be non-zero, eh?

#0 0x0058f402 in kernel_vsyscall ()
#1 0x00149f90 in raise () from /lib/libc.so.6
#2 0x0014b678 in abort () from /lib/libc.so.6
#3 0x0806ed03 in control_event_stream_status (conn=0x9b72238,

tp=STREAM_EVENT_SUCCEEDED, reason_code=0) at or.h:1392

#4 0x08066fad in connection_ap_handshake_socks_reply (conn=0x9b72238,

reply=0xbfa6792c "", replylen=8, endreason=0) at connection_edge.c:1881

#5 0x08067198 in connection_ap_handshake_socks_resolved (conn=0x9b72238,

answer_type=4, answer_len=4, answer=0xbfa67af8 "XVjn

comment:4 Changed 13 years ago by mikeperry

(BUG 2)

Hrmm, So its possible this one is a race condition on attaching vs timeout or something? The reason code
is SOCKS_ALREADY_REPLIED...

#0 0x008e2402 in kernel_vsyscall ()
#1 0x00149f90 in raise () from /lib/libc.so.6
#2 0x0014b678 in abort () from /lib/libc.so.6
#3 0x0806edae in control_event_stream_status (conn=0x9501a70,

tp=STREAM_EVENT_CLOSED, reason_code=256) at or.h:1392

#4 0x08062581 in connection_about_to_close_connection (conn=0x9501a70)

at connection.c:478

#5 0x080831fe in connection_unlink (conn=0x9501a70, remove=1) at main.c:211
#6 0x080833e1 in close_closeable_connections () at main.c:536
#7 0x0090f3e4 in event_base_loop (base=0x913afa8, flags=0) at event.c:256
#8 0x0090f64f in event_loop (flags=0) at event.c:305
#9 0x0090f66e in event_dispatch () at event.c:268
#10 0x0808479b in tor_main (argc=3, argv=0xbfbcf894) at main.c:1193
#11 0x080a08b3 in main (argc=3, argv=0xbfbcf894) at tor_main.c:22

comment:5 Changed 13 years ago by mikeperry

(BUG 3)

I think this is the last one, got a STEAM CLOSED REASON=DESTROY (not remote) w/o circuit.

#0 0x00484402 in kernel_vsyscall ()
#1 0x00149f90 in raise () from /lib/libc.so.6
#2 0x0014b678 in abort () from /lib/libc.so.6
#3 0x0806edb4 in control_event_stream_status (conn=0x8e13fe8,

tp=STREAM_EVENT_CLOSED, reason_code=5) at or.h:1392

#4 0x08062581 in connection_about_to_close_connection (conn=0x8e13fe8)

at connection.c:478

#5 0x08083206 in connection_unlink (conn=0x8e13fe8, remove=1) at main.c:211
#6 0x080833e9 in close_closeable_connections () at main.c:536
#7 0x007e73e4 in event_base_loop (base=0x8a3afa8, flags=0) at event.c:256
#8 0x007e764f in event_loop (flags=0) at event.c:305
#9 0x007e766e in event_dispatch () at event.c:268
#10 0x080847a3 in tor_main (argc=3, argv=0xbf979e34) at main.c:1193
#11 0x080a08bb in main (argc=3, argv=0xbf979e34) at tor_main.c:22

I'm not sure exactly what was going on to make this happen b/c the assert prevents my log line about
what stream this was..

comment:6 Changed 13 years ago by mikeperry

Hrmm, that second to last stack trace (CLOSED REASON=INTERNAL) is still relatively frequent,
even with a large SocksTimeout value (300 seconds), it happened 17 times in about 7 hours, each time
a stream was actually successfully attached to a circuit first, often on the first try (so no timeout
was involved). I can't tell if each one was the same SOCKS_ALREADY_REPLIED or just misc INTERNAL, since
this is not given to the controller.

In that same time period I saw only one of the CLOSED REASON=DESTROY from above, which happened
when the circuit the stream was on was closed with OR_CONN_CLOSED with the stream still on it.

comment:7 Changed 13 years ago by nickm

The first case (which I've tagged BUG 1) is probably not a bug: it happens when the user resolves a hostname
that we already have a mapping for in the cache. In this case, the stream is never attached to a circuit at all.

Still trying to make case of BUG 2 and BUG 3.

comment:8 Changed 13 years ago by nickm

For the third case (BUG 3), I think it's coming from circuit_mark_for_close, in this block (currently at line 941
of circuitlist.c):

while (or_circ->resolving_streams) {
conn = or_circ->resolving_streams;
or_circ->resolving_streams = conn->next_stream;
if (!conn->_base.marked_for_close) {

/* The other side will see a DESTROY, and infer that the connections

  • are closing because the circuit is getting torn down. No need
  • to send an end cell. */

conn->_base.edge_has_sent_end = 1;
conn->end_reason = END_STREAM_REASON_DESTROY;
connection_mark_for_close(TO_CONN(conn));

}
conn->on_circuit = NULL;

}

It might also be in connection_edge_destroy, which also sets on_circuit to NULL.

comment:9 Changed 13 years ago by nickm

Okay. I have a fix for bug 3 in a local branch, and I'm going to call bug 1 notabug unless more info appears.
Only bug 2 remains.

comment:10 Changed 13 years ago by nickm

wrt bug 2, It's either a socks protocol error, or the result of a resolve attempt. I have a patch in
the same local branch that will make the controller get the right reason: it's okay if SOCKPROTOCOL
events aren't attached to circuits, and it's okay if resolves that are answered from the cache aren't
attached to circuits, but it's no good if we have a 0 circuit for a resolve that had to hit the network.

comment:11 Changed 13 years ago by nickm

BUG 3 fixed in svn (I believe) as of r9510.
BUG 2 cleaned up a bit (as described above) as of r9511.
BUG 1 is (IMO) notabug.

comment:12 Changed 13 years ago by mikeperry

Just got a handful of these on my latest run of soat.pl. Seems to be happening with tor-resolve fairly frequently
(pretty much every time it seems). After attaching a circ to a NEWRESOLVE stream, the REMAP, SUCCEEDED, and CLOSED
events all come back with zero for that stream's circuit. This is with r9577.

comment:13 Changed 13 years ago by nickm

Hmmm. That's to be expected for cached resolves, but it's not supposed to happen with ones that get attached to
circuits. Fortunately, the fix shouldn't be too hard.

comment:14 Changed 13 years ago by nickm

Say, hang on. The REMAP event is only generated for resolving streams in the case where we find an
answer in the cache. When I spot-checked this by running two resolves in a row, I got:

650 STREAM 1 NEWRESOLVE 0 moria.seul.org:0
650 STREAM 1 SENTRESOLVE 3 moria.seul.org:0
650 STREAM 1 SUCCEEDED 3 moria.seul.org:0
650 STREAM 1 CLOSED 3 moria.seul.org:0
650 STREAM 2 NEWRESOLVE 0 moria.seul.org:0
650 STREAM 2 REMAP 0 18.244.0.188:0
650 STREAM 2 SUCCEEDED 0 18.244.0.188:0
650 STREAM 2 CLOSED 0 18.244.0.188:0

So it looks like the circuit is set correctly for the case where it's attached to a circuit,
and also in the case where it isn't.

Can you give me some more info about what you're seeing that looks wrong?

comment:15 Changed 13 years ago by nickm

Mike said by email:

The RESOLVE thing was the only 0 circuit deal I was seeing though, so
you probably can just close it.

So, closing. Please reopen if the problem recurs.

comment:16 Changed 13 years ago by nickm

flyspray2trac: bug closed.

comment:17 Changed 7 years ago by nickm

Component: Tor ClientTor
Note: See TracTickets for help on using tickets.