Opened 8 years ago

Closed 8 years ago

Last modified 7 years ago

#4697 closed defect (fixed)

Assertion connection_state_is_connecting(conn) failed; aborting

Reported by: keb Owned by:
Priority: Very High Milestone: Tor: 0.2.3.x-final
Component: Core Tor/Tor Version: Tor: 0.2.3.9-alpha
Severity: Keywords: bufferevents tor-relay
Cc: asn, nickm, Sebastian, troll_un, ln5 Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

after only 1 hours my relay crashed:

Dec 10 20:27:59.000 [notice] Tor 0.2.3.9-alpha (git-e44205ee6c3816b8) opening new log file.
Dec 10 20:27:59.000 [notice] Generating fresh dynamic DH modulus. This might take a while...
Dec 10 20:28:00.000 [notice] Parsing GEOIP file /usr/local/share/tor/geoip.
Dec 10 20:28:00.000 [notice] Configured to measure statistics. Look for the *-stats files that will first be written to the data directory in 24 hours from now.
Dec 10 20:28:00.000 [notice] No AES engine found; using AES_* functions.
Dec 10 20:28:00.000 [notice] OpenSSL OpenSSL 0.9.8g 19 Oct 2007 [90807f] looks like it's older than 0.9.8l, but some vendors have backported 0.9.8l's renegotiation code to earlier versions, and some have backported the code from 0.9.8m or 0.9.8n. I'll set both SSL3_FLAGS and SSL_OP just to be safe.
Dec 10 20:28:00.000 [notice] Your Tor server's identity key fingerprint is 'Pasquino D3ED911183DF31FDB744A22F94BE6A856AC77D12'
Dec 10 20:28:01.000 [notice] Reloaded microdescriptor cache. Found 8524 descriptors.
Dec 10 20:28:01.000 [notice] Removed 2833/8524 microdescriptors as old.
Dec 10 20:28:05.000 [notice] We now have enough directory information to build circuits.
Dec 10 20:28:05.000 [notice] Bootstrapped 80%: Connecting to the Tor network.
Dec 10 20:28:06.000 [notice] Self-testing indicates your ORPort is reachable from the outside. Excellent. Publishing server descriptor.
Dec 10 20:28:06.000 [notice] Heartbeat: Tor's uptime is 0:00 hours, with 5 circuits open. I've sent 13 kB and received 9 kB.
Dec 10 20:28:06.000 [notice] Bootstrapped 85%: Finishing handshake with first hop.
Dec 10 20:28:06.000 [notice] Bootstrapped 90%: Establishing a Tor circuit.
Dec 10 20:28:09.000 [notice] Tor has successfully opened a circuit. Looks like client functionality is working.
Dec 10 20:28:09.000 [notice] Bootstrapped 100%: Done.
Dec 10 20:29:10.000 [notice] Self-testing indicates your DirPort is reachable from the outside. Excellent.
Dec 10 20:29:40.000 [notice] Performing bandwidth self-test...done.
Dec 10 20:58:07.000 [notice] Removed 10/5693 microdescriptors as old.
Dec 10 21:25:17.000 [err] connection_handle_event_cb(): Bug: connection.c:2964: connection_handle_event_cb: Assertion connection_state_is_connecting(conn) failed; aborting.

i compiled with linker and gcc hardening so i will rebuild it and try again. is there any other info you will need?

Child Tickets

Attachments (4)

state (15.9 KB) - added by keb 8 years ago.
state.2 (12.3 KB) - added by keb 8 years ago.
state after dec 12 error
debug.log (32.0 KB) - added by keb 8 years ago.
3rd run. middle 187MB or so snipped.
debug.log.1.bz2 (61.0 KB) - added by keb 8 years ago.
log of last 2 minutes before abort from same relay, with rransom debugging patch (1st version)

Download all attachments as: .zip

Change History (21)

Changed 8 years ago by keb

Attachment: state added

comment:1 Changed 8 years ago by rransom

Cc: asn nickm Sebastian troll_un added
Priority: normalcritical

comment:2 Changed 8 years ago by rransom

Your relay won't hit this assertion failure (or this assertion) if you recompile without --enable-bufferevents, and it'll probably be faster and/or use less CPU too. (But we need to debug and fix this anyway.)

comment:3 Changed 8 years ago by rransom

Keywords: bufferevents added
Milestone: Tor: 0.2.3.x-final

comment:4 Changed 8 years ago by rransom

Which version of libevent are you using?

comment:5 in reply to:  4 Changed 8 years ago by keb

libevent 2.0.16-stable

comment:6 Changed 8 years ago by keb

Rebuilt with
./configure --disable-largefile --enable-bufferevents --enable-gcc-warnings --with-libevent-dir=/usr/local/lib
after 4 hours the only error in the log is
`
Dec 11 20:57:31.000 [warn] TLS error while handshaking (with bufferevent) with [scrubbed]: ssl handshake failure (in SSL routines:SSL3_READ_BYTES:SSL3_ST_SR_CERT_A)
`

comment:7 in reply to:  6 Changed 8 years ago by asn

Cc: ln5 added

Replying to keb:

Rebuilt with
./configure --disable-largefile --enable-bufferevents --enable-gcc-warnings --with-libevent-dir=/usr/local/lib
after 4 hours the only error in the log is
`
Dec 11 20:57:31.000 [warn] TLS error while handshaking (with bufferevent) with [scrubbed]: ssl handshake failure (in SSL routines:SSL3_READ_BYTES:SSL3_ST_SR_CERT_A)
`

This one seems to be #3769. I'm not sure if it's related to the original bug of this ticket.

I also CCed Linus, in case IPv6 code has something to do with the original bug.

comment:8 Changed 8 years ago by keb

Dec 11 23:17:41.000 [err] connection_handle_event_cb(): Bug: connection.c:2964: connection_handle_event_cb: Assertion connection_state_is_connecting(conn) failed; aborting.

it died again but no core dump :(

Changed 8 years ago by keb

Attachment: state.2 added

state after dec 12 error

comment:9 Changed 8 years ago by rransom

See bug4697-debug1 ( https://git.torproject.org/rransom/tor.git bug4697-debug1) for a patch that might help diagnose this bug.

Changed 8 years ago by keb

Attachment: debug.log added

3rd run. middle 187MB or so snipped.

comment:10 Changed 8 years ago by keb

thanks, will try the patch later tonight.

comment:11 Changed 8 years ago by rransom

Oops. I used conn_array_index in that patch; I should have used the global_identifier field instead. (It'll emit the information I think we need anyway, but we should fix that buglet before merging the patch.)

comment:12 Changed 8 years ago by rransom

Fixup pushed.

Changed 8 years ago by keb

Attachment: debug.log.1.bz2 added

log of last 2 minutes before abort from same relay, with rransom debugging patch (1st version)

comment:13 Changed 8 years ago by rransom

I just saw that keb had attached new logs. (Trac doesn't send e-mail notifications for new attachments.)

The important lines are right at the end:

Dec 13 03:07:09.000 [debug] connection_handle_event_cb(): event occurred on conn 75 (Exit connecting (marked for close))
Dec 13 03:07:09.000 [err] connection_handle_event_cb(): Bug: connection.c:2971: connection_handle_event_cb: Assertion connection_state_is_connecting(conn) failed; aborting.

So an exit connection was marked for close before it had finished connecting. Can we just ignore _CONNECTED events for connections that are already marked for close? Can we ignore all events for connections that are already marked for close?

comment:14 Changed 8 years ago by nickm

Status: newneeds_review

I think indeed that we can ignore all events for connections marked for close.

Please review possible fix in branch bug4697 in my public repository.

Do you think I should merge your debug-log thing too?

comment:15 Changed 8 years ago by nickm

Resolution: fixed
Status: needs_reviewclosed

Fix still looks ok. Merging.

comment:16 Changed 7 years ago by nickm

Keywords: tor-relay added

comment:17 Changed 7 years ago by nickm

Component: Tor RelayTor
Note: See TracTickets for help on using tickets.