Opened 6 years ago

Closed 6 years ago

Last modified 6 years ago

#7267 closed defect (fixed)

src/or/channel.c:1315 channel_closed: Assertion chan->state == CHANNEL_STATE_CLOSING || chan->state == CHANNEL_STATE_CLOSED || chan->state == CHANNEL_STATE_ERROR failed;

Reported by: asn Owned by:
Priority: High Milestone: Tor: 0.2.4.x-final
Component: Core Tor/Tor Version: Tor: 0.2.4.4-alpha
Severity: Keywords: tor-relay
Cc: andrea Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

aagbsn reported the following bug:

"torrc" 191L, 8688C written
tor24@DC03R05DS10:~$ tor -f torrc
Oct 27 15:32:59.306 [notice] Tor v0.2.4.5-alpha (git-17a6aa56cefe9d83) running on Linux with Libevent 2.0.16-stable and OpenSSL 0.9.8o.
Oct 27 15:32:59.307 [notice] Tor can't help you if you use it wrong! Learn how to be safe at https://www.torproject.org/download/downlo
ad#warning
Oct 27 15:32:59.307 [notice] This version is not a stable Tor release. Expect more bugs than usual.
Oct 27 15:32:59.307 [notice] Read configuration file "/home/tor24/torrc".
Oct 27 15:32:59.307 [notice] Your ContactInfo config option is not set. Please consider setting it, so we can contact you if your serve
r is misconfigured or something else goes wrong.
Oct 27 15:32:59.307 [notice] Opening OR listener on 0.0.0.0:2424
Oct 27 15:32:59.307 [notice] Opening Directory listener on 0.0.0.0:9024
Oct 27 15:32:59.000 [notice] Configured to measure directory request statistics, but no GeoIP database found. Please specify a GeoIP da
tabase using the GeoIPFile option.
Oct 27 15:32:59.000 [notice] Your OpenSSL version seems to be 0.9.8o. We recommend 1.0.0 or later.
Oct 27 15:32:59.000 [notice] Your Tor server's identity key fingerprint is 'lna24 E02894BC4F3B8E74CC0182866BC74F044D1C75A9'
Oct 27 15:33:00.000 [notice] We now have enough directory information to build circuits.
Oct 27 15:33:00.000 [notice] Bootstrapped 80%: Connecting to the Tor network.
Oct 27 15:33:01.000 [notice] Self-testing indicates your ORPort is reachable from the outside. Excellent. Publishing server descriptor.
Oct 27 15:33:02.000 [notice] Bootstrapped 85%: Finishing handshake with first hop.
Oct 27 15:33:02.000 [notice] We weren't able to find support for all of the TLS ciphersuites that we wanted to advertise. This won't hu
rt security, but it might make your Tor (if run as a client) more easy for censors to block.
Oct 27 15:33:02.000 [notice] To correct this, use a more recent OpenSSL, built without disabling any secure ciphers or features.
Oct 27 15:33:02.000 [notice] Bootstrapped 90%: Establishing a Tor circuit.
Oct 27 15:33:06.000 [notice] Tor has successfully opened a circuit. Looks like client functionality is working.
Oct 27 15:33:06.000 [notice] Bootstrapped 100%: Done.
Oct 27 15:34:05.000 [notice] Self-testing indicates your DirPort is reachable from the outside. Excellent.
Oct 27 15:36:07.000 [notice] Performing bandwidth self-test...done.
Oct 27 21:33:01.000 [notice] Heartbeat: Tor's uptime is 6:00 hours, with 138 circuits open. I've sent 4.34 GB and received 4.07 GB.
Oct 28 03:33:01.000 [notice] Heartbeat: Tor's uptime is 12:00 hours, with 260 circuits open. I've sent 9.07 GB and received 8.49 GB.
Oct 28 09:33:01.000 [notice] Heartbeat: Tor's uptime is 18:00 hours, with 406 circuits open. I've sent 17.43 GB and received 16.27 GB.
Oct 28 15:33:01.000 [notice] Heartbeat: Tor's uptime is 1 day 0:00 hours, with 949 circuits open. I've sent 32.61 GB and received 30.51 GB.
Oct 28 21:33:01.000 [notice] Heartbeat: Tor's uptime is 1 day 6:00 hours, with 1293 circuits open. I've sent 81.66 GB and received 77.32 GB.
Oct 29 03:33:01.000 [notice] Heartbeat: Tor's uptime is 1 day 12:00 hours, with 3153 circuits open. I've sent 137.15 GB and received 129.72 GB.
Oct 29 09:33:01.000 [notice] Heartbeat: Tor's uptime is 1 day 18:00 hours, with 4141 circuits open. I've sent 223.18 GB and received 209.59 GB.
Oct 29 11:01:44.000 [warn] Tried to establish rendezvous on non-OR or non-edge circuit.
Oct 29 15:33:01.000 [notice] Heartbeat: Tor's uptime is 2 days 0:00 hours, with 13439 circuits open. I've sent 365.67 GB and received 339.01 GB.
Oct 29 16:24:15.000 [warn] Your computer is too slow to handle this many circuit creation requests! Please consider using the MaxAdvertisedBandwidth config option or choosing a more restricted exit policy.
Oct 29 19:53:18.000 [warn] Tried to establish rendezvous on non-OR or non-edge circuit.
Oct 29 19:53:40.000 [warn] Tried to establish rendezvous on non-OR or non-edge circuit.
Oct 29 21:33:01.000 [notice] Heartbeat: Tor's uptime is 2 days 6:00 hours, with 6755 circuits open. I've sent 544.70 GB and received 502.54 GB.
Oct 30 03:33:01.000 [notice] Heartbeat: Tor's uptime is 2 days 12:00 hours, with 7723 circuits open. I've sent 653.13 GB and received 598.68 GB.
Oct 30 09:02:42.000 [notice] Received reload signal (hup). Reloading config and resetting internal state.
Oct 30 09:02:42.000 [notice] Read configuration file "/home/tor24/torrc".
Oct 30 09:02:42.000 [notice] Your ContactInfo config option is not set. Please consider setting it, so we can contact you if your server is misconfigured or something else goes wrong.
Oct 30 09:02:42.000 [notice] Configured hibernation. This interval begins at 2012-10-30 00:00:00 and ends at 2012-10-31 00:00:00. We have no prior estimate for bandwidth, so we will start out awake and hibernate when we exhaust our quota.
Oct 30 09:02:42.000 [notice] Configured to measure directory request statistics, but no GeoIP database found. Please specify a GeoIP database using the GeoIPFile option.
Oct 30 09:02:43.000 [notice] Not advertising DirPort (Reason: AccountingMax enabled)
Oct 30 09:33:01.000 [notice] Heartbeat: Tor's uptime is 2 days 18:00 hours, with 4635 circuits open. I've sent 733.72 GB and received 668.67 GB.
Oct 30 12:29:02.000 [warn] Your computer is too slow to handle this many circuit creation requests! Please consider using the MaxAdvertisedBandwidth config option or choosing a more restricted exit policy. [32 similar message(s) suppressed in last 60 seconds]
Oct 30 15:33:01.000 [notice] Heartbeat: Tor's uptime is 3 days 0:00 hours, with 6767 circuits open. I've sent 794.36 GB and received 726.46 GB.
Oct 30 19:28:50.000 [warn] Tried to establish rendezvous on non-OR or non-edge circuit.
Oct 30 20:35:13.000 [notice] Bandwidth soft limit reached; commencing hibernation. No new connections will be accepted
Oct 30 20:36:51.000 [notice] Going dormant. Blowing away remaining connections.
Oct 30 20:36:51.000 [err] channel_closed(): Bug: src/or/channel.c:1315: channel_closed: Assertion chan->state == CHANNEL_STATE_CLOSING || chan->state == CHANNEL_STATE_CLOSED || chan->state == CHANNEL_STATE_ERROR failed; aborting.
src/or/channel.c:1315 channel_closed: Assertion chan->state == CHANNEL_STATE_CLOSING || chan->state == CHANNEL_STATE_CLOSED || chan->state == CHANNEL_STATE_ERROR failed; aborting.
Aborted
$ tor --version
Tor version 0.2.4.5-alpha (git-17a6aa56cefe9d83).

Child Tickets

Change History (36)

comment:2 Changed 6 years ago by nickm

Cc: andrea added

stack traces would be really helpful here, especially if they include the contents of *chan at the time of the crash.

comment:3 Changed 6 years ago by eseifert

I've reproduced the issue again and tried to get backtrace output this time. I'm not very experienced in using gdb, so I hope my following session is useful:

(gdb) r
Starting program: /usr/bin/tor 
warning: Could not load shared library symbols for linux-vdso.so.1.
Do you need "set solib-search-path" or "set sysroot"?
[Thread debugging using libthread_db enabled]
Nov 02 11:18:32.322 [notice] Tor v0.2.4.5-alpha (git-17a6aa56cefe9d83) (with bufferevents) running on Linux with Libevent 2.0.20-stable and OpenSSL 1.0.0j.
Nov 02 11:18:32.322 [notice] Tor can't help you if you use it wrong! Learn how to be safe at https://www.torproject.org/download/download#warning
Nov 02 11:18:32.322 [notice] This version is not a stable Tor release. Expect more bugs than usual.
Nov 02 11:18:32.322 [notice] Read configuration file "/etc/tor/torrc".
Nov 02 11:18:32.331 [warn] Skipping obsolete configuration option 'Group'
Nov 02 11:18:32.331 [notice] Opening Socks listener on 127.0.0.1:9050
src/or/channel.c:1315 channel_closed: Assertion chan->state == CHANNEL_STATE_CLOSING || chan->state == CHANNEL_STATE_CLOSED || chan->state == CHANNEL_STATE_ERROR failed; aborting.

Program received signal SIGABRT, Aborted.
0x00007ffff68cdc15 in raise () from /lib64/libc.so.6
(gdb) bt
#0  0x00007ffff68cdc15 in raise () from /lib64/libc.so.6
#1  0x00007ffff68cefba in abort () from /lib64/libc.so.6
#2  0x0000000000464f34 in channel_closed (chan=0x1355950) at src/or/channel.c:1313
#3  0x00000000004a2390 in connection_or_about_to_close (or_conn=0x13b6450) at src/or/connection_or.c:601
#4  0x000000000040bb2d in connection_unlink (conn=0x13b6450) at src/or/main.c:385
#5  0x000000000040c472 in conn_close_if_marked (i=<optimized out>) at src/or/main.c:932
#6  close_closeable_connections () at src/or/main.c:702
#7  0x000000000040cd49 in run_scheduled_events (now=1351851760) at src/or/main.c:1535
#8  second_elapsed_callback (timer=<optimized out>, arg=<optimized out>) at src/or/main.c:1691
#9  0x00007ffff748e1ec in event_base_loop () from /usr/lib64/libevent-2.0.so.5
#10 0x000000000040e190 in do_main_loop () at src/or/main.c:1987
#11 0x000000000040f70e in tor_main (argc=<optimized out>, argv=<optimized out>) at src/or/main.c:2699
#12 0x00007ffff68ba4cc in __libc_start_main () from /lib64/libc.so.6
#13 0x000000000040a329 in _start ()
(gdb) print 0x1355950
$1 = 20273488

comment:4 Changed 6 years ago by nickm

Instead of the "print 0x1355950", try "print *chan". You might need to say "up 2" first so that "chan" is in scope.

comment:5 Changed 6 years ago by eseifert

luckily I still I hadn't closed the debugging session yet. Here you are:

(gdb) up 2
#2  0x0000000000464f34 in channel_closed (chan=0x1355950) at src/or/channel.c:1313
1313    src/or/channel.c: No such file or directory.
        in src/or/channel.c
(gdb) print *chan
$2 = {magic = 2316903463, state = CHANNEL_STATE_OPENING, global_identifier = 10,
  registered = 1 '\001', reason_for_closing = CHANNEL_NOT_CLOSING,
  timestamp_created = 1351851696, timestamp_active = 0, free = 0,
  close = 0x4689c0 <channel_tls_close_method>,
  describe_transport = 0x4688e0 <channel_tls_describe_transport_method>,
  dumpstats = 0, cell_handler = 0x481740 <command_process_cell>,
  var_cell_handler = 0x481640 <command_process_var_cell>,
  get_remote_addr = 0x468780 <channel_tls_get_remote_addr_method>, 
  get_remote_descr = 0x468580 <channel_tls_get_remote_descr_method>,
  has_queued_writes = 0x468460 <channel_tls_has_queued_writes_method>,
  is_canonical = 0x468340 <channel_tls_is_canonical_method>,
  matches_extend_info = 0x468220 <channel_tls_matches_extend_info_method>, 
  matches_target = 0x4680c0 <channel_tls_matches_target_method>,
  write_cell = 0x467f60 <channel_tls_write_cell_method>,
  write_packed_cell = 0x467e00 <channel_tls_write_packed_cell_method>,
  write_var_cell = 0x467ca0 <channel_tls_write_var_cell_method>, 
  identity_digest = "\271=\314\005=\177\004r\273\027\244QN\006\376v\331\373qK",
  nickname = 0x0, next_with_same_id = 0x0, prev_with_same_id = 0x0,
  incoming_queue = 0x0, outgoing_queue = 0x0, cmux = 0x10b9470,
  circ_id_type = CIRC_ID_TYPE_LOWER, next_circ_id = 26634, 
  num_n_circuits = 0, num_p_circuits = 0, is_bad_for_new_circs = 0,
  is_client = 0, is_incoming = 0, is_local = 0, timestamp_client = 0,
  timestamp_drained = 0, timestamp_recv = 0, timestamp_xmit = 0,
  timestamp_last_added_nonpadding = 1351851696, dirreq_id = 0,
  n_cells_recved = 0, n_cells_xmitted = 0}

comment:6 Changed 6 years ago by nickm

Interesting! Try one more "up" then have a look at the results of "p *or_conn" ?

comment:7 Changed 6 years ago by eseifert

Here's the output:

(gdb) up
#3  0x00000000004a2390 in connection_or_about_to_close (or_conn=0x13b6450) at src/or/connection_or.c:601
601     src/or/connection_or.c: No such file or directory.
        in src/or/connection_or.c
(gdb) p *or_conn
$3 = {base_ = {magic = 2100428547, state = 1 '\001', type = 4, purpose = 0,
    read_blocked_on_bw = 0, write_blocked_on_bw = 0, hold_open_until_flushed = 0,
    inbuf_reached_eof = 0, in_flushed_some = 0, linked = 0,
    reading_from_linked_conn = 0, writing_to_linked_conn = 0, 
    active_on_link = 0, linked_conn_is_closed = 0, proxy_state = 0, s = -1,
    conn_array_index = 9, read_event = 0x0, write_event = 0x0, inbuf = 0x0,
    outbuf = 0x0, outbuf_flushlen = 0, timestamp_lastread = 1351851696,
    timestamp_lastwritten = 1351851759, bufev = 0x0,
    timestamp_created = 1351851696, socket_family = 2, addr = {family = 2,
        addr = {dummy_ = 2780165965, in_addr = {s_addr = 2780165965},
            in6_addr = {__in6_u = {
                __u6_addr8 = "M\367\265\245", '\000' <repeats 11 times>,
                __u6_addr16 = {63309, 42421, 0, 0, 0, 0, 0, 0}, 
                __u6_addr32 = {2780165965, 0, 0, 0}}}}},
    port = 110, marked_for_close = 3080,
    marked_for_close_file = 0x532604 "src/or/connection.c",
    address = 0x10180e0 "77.247.181.165", linked_conn = 0x0,
    global_identifier = 237}, 
  identity_digest = "\271=\314\005=\177\004r\273\027\244QN\006\376v\331\373qK",
  nickname = 0xf4bdc0 "politkovskaja2", tls = 0x0, tls_error = 0,
  chan = 0x1355950, real_addr = {family = 2, addr = {dummy_ = 2780165965,
            in_addr = {s_addr = 2780165965}, in6_addr = {__in6_u = {
                __u6_addr8 = "M\367\265\245", '\000' <repeats 11 times>,
                __u6_addr16 = {63309, 42421, 0, 0, 0, 0, 0, 0},
                __u6_addr32 = {2780165965, 0, 0, 0}}}}},
  is_canonical = 1, is_connection_with_client = 0, is_outgoing = 1,
  proxy_type = 0, link_proto = 0 '\000', handshake_state = 0x0,
  timestamp_lastempty = 0, timestamp_last_added_nonpadding = 1351851696,
  bandwidthrate = 1073741824, bandwidthburst = 1073741824,
  bucket_cfg = 0x1423a80, next_with_same_id = 0x0}

comment:8 Changed 6 years ago by nickm

Looks like you're running with bufferevents enabled. (Otherwise I don't see how marked_for_close==3080, marked_for_close_file="src/or/connection.c" is possible.) Does this still happen if you run with a copy of Tor built without that option?

comment:9 Changed 6 years ago by nickm

(aagbsn reports for one that he built his Tor without the bufferents option, he thinks. I'd be interested to see how the stack trace and the data structures look when there's no bufferevent support in Tor)

comment:10 Changed 6 years ago by aagbsn

My system is debian squeeze.
I built Tor as follows:

./autogen.sh; ./configure --disable-asciidoc; make clean; make

My torrc is as follows:
SocksPort 0
Nickname lna24
AccountingMax 150 GB
AccountingStart day 00:00
DirPort 9024
MyFamily $8644744F908727C42720DB549F3D2BDB4D0B129A
ExitPolicy reject *:*
ORPort 2424
DisableDebuggerAttachment 0

I'm running tor under gdb for the past 6hrs or so; so far nothing unusual has happened.

comment:11 Changed 6 years ago by nickm

Andrea, it looks like this is the kind of error we would expect to see when a connection_or is marked for close directly, and not via connection_or_close_for_error() or connection_or_close_normally(). Does that sound plausible to you? If so I wonder whether we couldn't hack up a LD_BUG log message for that case... or detect it, issue an LD_BUG message with the offending line number, and allow Tor to continue running normally, seeing as that's the kind of error we're likely to encounter in the future.

comment:12 Changed 6 years ago by andrea

Yeah, this is what would happen if an or_connection_t got marked for close without going through the proper, ahem, channels for channels. Not awake enough to comment on whether the LD_BUG thing is feasible yet; gimme a few minutes.

comment:13 Changed 6 years ago by eseifert

I've compiled tor without bufferevents this time, and it's running fine for a little over an hour now. I'll test it some more and I'll report on the issue if it happens again.

comment:14 Changed 6 years ago by eseifert

Well, I didn't have to wait for too long. I guess it's a different issue though:

(gdb) r
Starting program: /usr/bin/tor 
warning: Could not load shared library symbols for linux-vdso.so.1.
Do you need "set solib-search-path" or "set sysroot"?
[Thread debugging using libthread_db enabled]
Nov 02 16:59:30.734 [notice] Tor v0.2.4.5-alpha (git-17a6aa56cefe9d83) running on Linux with Libevent 2.0.20-stable and OpenSSL 1.0.0j.
Nov 02 16:59:30.735 [notice] Tor can't help you if you use it wrong! Learn how to be safe at https://www.torproject.org/download/download#warning
Nov 02 16:59:30.735 [notice] This version is not a stable Tor release. Expect more bugs than usual.
Nov 02 16:59:30.735 [notice] Read configuration file "/etc/tor/torrc".
Nov 02 16:59:30.741 [warn] Skipping obsolete configuration option 'Group'
Nov 02 16:59:30.742 [notice] Opening Socks listener on 127.0.0.1:9050

Program received signal SIGPIPE, Broken pipe.
0x00007ffff6e5cd42 in send () from /lib64/libpthread.so.0
(gdb) bt
#0  0x00007ffff6e5cd42 in send () from /lib64/libpthread.so.0
#1  0x0000000000458a1c in flush_chunk (buf_flushlen=0x140bfc0, sz=3938, 
    chunk=<optimized out>, buf=0xf00150, s=20) at src/or/buffers.c:789
#2  flush_buf (s=20, buf=0xf00150, sz=<optimized out>, buf_flushlen=0x140bfc0)
    at src/or/buffers.c:875
#3  0x000000000040b022 in conn_close_if_marked (i=<optimized out>)
    at src/or/main.c:857
#4  close_closeable_connections () at src/or/main.c:702
#5  0x00007ffff76951ec in event_base_loop () from /usr/lib64/libevent-2.0.so.5
#6  0x000000000040cab0 in do_main_loop () at src/or/main.c:1987
#7  0x000000000040e06e in tor_main (argc=<optimized out>, argv=<optimized out>)
    at src/or/main.c:2699
#8  0x00007ffff6ac14cc in __libc_start_main () from /lib64/libc.so.6
#9  0x0000000000409229 in _start ()

comment:15 Changed 6 years ago by nickm

false alarm; sigpipe is nothing to worry about. Before you restart tor in gdb, you need to tell gdb to ignore it with "handle sigpipe nostop"

comment:16 Changed 6 years ago by aagbsn

Nov 02 06:37:39.000 [notice] We now have enough directory information to build circuits.
Nov 02 06:37:39.000 [notice] Bootstrapped 80%: Connecting to the Tor network.
Nov 02 06:37:39.000 [notice] Bootstrapped 85%: Finishing handshake with first hop.
Nov 02 06:37:40.000 [notice] Bootstrapped 90%: Establishing a Tor circuit.
Nov 02 06:37:46.000 [notice] Tor has successfully opened a circuit. Looks like client functionality is working.
Nov 02 06:37:46.000 [notice] Bootstrapped 100%: Done.
Nov 02 06:37:55.000 [notice] Performing bandwidth self-test...done.
Nov 02 06:38:04.000 [notice] Self-testing indicates your DirPort is reachable from the outside. Excellent.
Nov 02 06:38:04.000 [notice] Not advertising DirPort (Reason: AccountingMax enabled)
Nov 02 12:37:24.000 [notice] Heartbeat: Tor's uptime is 6:00 hours, with 7676 circuits open. I've sent 103.68 GB and received 102.47 GB.
Nov 02 14:06:28.000 [notice] Bandwidth soft limit reached; commencing hibernation. No new connections will be accepted
Nov 02 14:07:30.000 [notice] Going dormant. Blowing away remaining connections.
Nov 02 14:07:30.000 [err] channel_closed(): Bug: src/or/channel.c:1315: channel_closed: Assertion chan->state == CHANNEL_STATE_CLOSING || chan->state == CHANNEL_STATE_CLOSED || chan->state == CHANNEL_STATE_ERROR failed; aborting.
src/or/channel.c:1315 channel_closed: Assertion chan->state == CHANNEL_STATE_CLOSING || chan->state == CHANNEL_STATE_CLOSED || chan->state == CHANNEL_STATE_ERROR failed; aborting.

Program received signal SIGABRT, Aborted.
0x00007ffff68341b5 in ?? ()
(gdb) bt
#0  0x00007ffff68341b5 in ?? ()
#1  0x00007ffff6836fc0 in ?? ()
#2  0xd5e900db8fa92fdd in ?? ()
#3  0x00007ffff6b5e860 in ?? ()
#4  0x0000000050940be2 in ?? ()
#5  0x00007ffff82065cc in ?? ()
#6  0x00000000000005a2 in ?? ()
#7  0x00007fffffffe06e in ?? ()
#8  0x0000000000000001 in ?? ()
#9  0x00007ffff68e4b1b in ?? ()
#10 0x0000003000000030 in ?? ()
#11 0x00007fffffffdbe8 in ?? ()
#12 0x00007fffffffdb10 in ?? ()
#13 0x00007fffffffdbf0 in ?? ()
#14 0x00007fffffffdb20 in ?? ()
#15 0x85b5c692cdd36500 in ?? ()
#16 0x2d8d08760144972b in ?? ()
#17 0x00007ffff7fad806 in ?? ()
#18 0x0000000000000523 in ?? ()
#19 0x00007ffff7fad148 in ?? ()
#20 0x00007ffff7fad806 in ?? ()
#21 0x0000000000000523 in ?? ()
#22 0x0000000000000020 in ?? ()
---Type <return> to continue, or q <return> to quit---
#23 0x0000000000000000 in ?? ()

Crap. Should I have built a static binary? I didn't notice this message:

warning: The current binary is a PIE (Position Independent Executable), which
GDB does NOT currently support.  Most debugger features will fail if used
in this session.

Reading symbols from /home/tor24/bin/tor...done.

I thought perhaps I could dereference the pointer to chan, but this did not seem to work:

(gdb) info frame
Stack level 0, frame at 0x7fffffffdab0:
 rip = 0x7ffff68341b5; saved rip 0x7ffff6836fc0
 called by frame at 0x7fffffffdab8
 Arglist at 0x7fffffffdaa0, args: 
 Locals at 0x7fffffffdaa0, Previous frame's sp is 0x7fffffffdab0
 Saved registers:
  rip at 0x7fffffffdaa8
(gdb) x 0x7fffffffdaa0 
0x7fffffffdaa0: 0xffffe06e
(gdb) x 0xffffe06e
0xffffe06e:     Cannot access memory at address 0xffffe06e

comment:17 Changed 6 years ago by nickm

Just closed #7303 as a likely duplicate of this.

Andrea, I think for the solution, or at least part of it, we need to make it so that it's safe to call connection_mark_for_close on an OR connection without necessarily also invoking connection_or_close{_for_error,_normally}. So far there seem to be at least three cases where we're marking generic connections without going through the right "ahem, channels" -- there will probably be more.

comment:18 in reply to:  17 Changed 6 years ago by andrea

Status: newneeds_review

Replying to nickm:

Just closed #7303 as a likely duplicate of this.

Andrea, I think for the solution, or at least part of it, we need to make it so that it's safe to call connection_mark_for_close on an OR connection without necessarily also invoking connection_or_close{_for_error,_normally}. So far there seem to be at least three cases where we're marking generic connections without going through the right "ahem, channels" -- there will probably be more.

See my check_for_orconn_on_close branch, which does this check and in those cases calls connection_or_close_for_error() instead and logs a warning.

comment:19 Changed 6 years ago by nickm

Hm. Looks about right. It needs two things afaict: a changes file, and explicit documentation for connection_mark_for_close{,_internal} explaining which one you call when, and when you need to call connection_or_close_.* instead, and when you would want to do what the connection_handle_{read,write}_impl functions do.

comment:20 Changed 6 years ago by nickm

Oh; I think the "log_warn(LD_CHANNEL" could be log_warn(LD_CHANNEL|LD_BUG or maybe just log_warn(LD_BUG

comment:21 in reply to:  19 Changed 6 years ago by andrea

Status: needs_reviewnew

Replying to nickm:

Hm. Looks about right. It needs two things afaict: a changes file, and explicit documentation for connection_mark_for_close{,_internal} explaining which one you call when, and when you need to call connection_or_close_.* instead, and when you would want to do what the connection_handle_{read,write}_impl functions do.

All these changes now made, tested and merged.

comment:22 Changed 6 years ago by andrea

This was caused by a bad call to connection_mark_for_close() in hibernate.c. See comments on bug 7212.

comment:23 Changed 6 years ago by andrea

Status: newneeds_review

Proposed fix is in my bug7267 branch now.

comment:24 in reply to:  23 Changed 6 years ago by mr-4

Replying to andrea:

Proposed fix is in my bug7267 branch now.

As someone who was on the receiving end of a duplicate of this bug (see #7303), I can confirm that your patch works so far (I couldn't even start tor before as the above error kicked in even before tor was able to retrieve its consensus). I have applied your patch against the newly-released .6-alpha.

If there are any further issues, I'll report them here.

comment:25 Changed 6 years ago by mr-4

OK, don't know whether it is related to this bug, but when I shutdown tor I get this warning:

Tor[2060]: Something tried to close an or_connection_t without going through channels at src/or/connection.c:3120
Tor[2060]: Catching signal TERM, exiting cleanly.

comment:26 Changed 6 years ago by nickm

From IRC:

02:21 <@nickm> - I think the indentation is wonky.
02:21 <@nickm> - should this be a pattern? Is there a lot of code that does 
               this "if it's an OR_CONN then channel_mark_for_close else 
               connection_mark_for_close" thing?
02:23 <@nickm> - channel_mark_for_close and not one of the others?
02:23 <@nickm> [I think I need a comment in channel.c, any time you can write 
               one, explaining what to do with a channel that should go away. 
               There seem to be a few of them.]
02:23 <@nickm> ok, out of observations.
02:26 <@nickm> enough to go on?  Too vague?

comment:27 Changed 6 years ago by mr-4

I've got a lot more of these messages now (over 20 in the past hour or so) - this happened when I activated my semi-reliable bridges (in other words bridges that are not always up/functional/working). I had them deactivated before as without this fix applied tor was bailing out. Now, I just get the above message in my logs.

comment:28 in reply to:  27 Changed 6 years ago by andrea

Replying to mr-4:

I've got a lot more of these messages now (over 20 in the past hour or so) - this happened when I activated my semi-reliable bridges (in other words bridges that are not always up/functional/working). I had them deactivated before as without this fix applied tor was bailing out. Now, I just get the above message in my logs.

That's the warning that got put in to track down 7212; not necessarily the same bug.
Could you show a copy of your torrc and possibly some debug-level logs? Are they all for connection.c:3120?

comment:29 Changed 6 years ago by mr-4

My torrc is massive - is there any section in particular you'd like me to post? Same for debug logs.

My messages are exactly the same: "Something tried to close an or_connection_t without going through channels at src/or/connection.c:3120"

comment:30 in reply to:  29 ; Changed 6 years ago by andrea

Replying to mr-4:

My torrc is massive - is there any section in particular you'd like me to post? Same for debug logs.

Not in particular; I'm just not able to repro these very well despite hearing a lot about them, so I'd like to know what I'm doing differently. I've been spending this whole thing grepping through multigigabyte debug logs myself... don't worry about it for now, I think I figured this out.

My messages are exactly the same: "Something tried to close an or_connection_t without going through channels at src/or/connection.c:3120"

Yeah, that's possible. Did you build with bufferevents enabled, by any chance?

comment:31 in reply to:  30 ; Changed 6 years ago by aagbsn

Replying to andrea:

Replying to mr-4:

My torrc is massive - is there any section in particular you'd like me to post? Same for debug logs.

Not in particular; I'm just not able to repro these very well despite hearing a lot about them, so I'd like to know what I'm doing differently. I've been spending this whole thing grepping through multigigabyte debug logs myself... don't worry about it for now, I think I figured this out.

My messages are exactly the same: "Something tried to close an or_connection_t without going through channels at src/or/connection.c:3120"

Yeah, that's possible. Did you build with bufferevents enabled, by any chance?

I get loads of similar error messages with Tor version 0.2.4.6-alpha (git-b13c6becc892d971).

Nov 13 21:32:06.000 [warn] Something tried to close an or_connection_t without going through channels at src/or/hibernate.c:850

My torrc is the same as before (above).

I did not build with bufferevents enabled. If you want access to my relay for debugging just let me know.

comment:32 in reply to:  31 Changed 6 years ago by andrea

Replying to aagbsn:

Replying to andrea:

Replying to mr-4:

My torrc is massive - is there any section in particular you'd like me to post? Same for debug logs.

Not in particular; I'm just not able to repro these very well despite hearing a lot about them, so I'd like to know what I'm doing differently. I've been spending this whole thing grepping through multigigabyte debug logs myself... don't worry about it for now, I think I figured this out.

My messages are exactly the same: "Something tried to close an or_connection_t without going through channels at src/or/connection.c:3120"

Yeah, that's possible. Did you build with bufferevents enabled, by any chance?

I get loads of similar error messages with Tor version 0.2.4.6-alpha (git-b13c6becc892d971).

Nov 13 21:32:06.000 [warn] Something tried to close an or_connection_t without going through channels at src/or/hibernate.c:850

Try 2cb82c33bcb59ce65c9738ff8ff4977f39fa3d9f; that just got fixed.

My torrc is the same as before (above).

I did not build with bufferevents enabled. If you want access to my relay for debugging just
let me know.

Okay. I have a list of potential causes of this in bug 7472 now; if I need to use your relay I'll let you know.

comment:33 in reply to:  30 Changed 6 years ago by mr-4

Replying to andrea:

My messages are exactly the same: "Something tried to close an or_connection_t without going through channels at src/or/connection.c:3120"

Yeah, that's possible. Did you build with bufferevents enabled, by any chance?

Indeed I have.

If it is of any help, when idle (tor not used to relay traffic), I get this message appearing in my logs in a batch of about 10 every 60 minutes *exactly*, so it must be something tor is attempting to do, which relates to the "dead" bridges in my torrc file.

If you include a bunch of "dead" bridges in your torrc file, you may be able to reproduce the problem (if you need a list of those, I'd gladly provide it - just let me know).

comment:34 Changed 6 years ago by nickm

Resolution: fixed
Status: needs_reviewclosed

It appears we merged this. Cool.

comment:35 Changed 6 years ago by arma

Am I right in thinking this was a bugfix on Tor 0.2.4.4-alpha?

comment:36 in reply to:  35 Changed 6 years ago by nickm

Replying to arma:

Am I right in thinking this was a bugfix on Tor 0.2.4.4-alpha?

Yes, I believe so.

Note: See TracTickets for help on using tickets.