Opened 9 years ago

Closed 8 years ago

Last modified 7 years ago

#2001 closed defect (worksforme)

Reachability test for ORPort doesn't complete with --enable-bufferevents set

Reported by: Sebastian
Owned by: nickm
Priority: High
Milestone: Tor: 0.2.3.x-final
Component: Core Tor/Tor
Version: Tor: 0.2.3.1-alpha
Severity:
Keywords: bufferevents tor-relay
Cc:
Actual Points:
Parent ID: #3561
Points:
Reviewer:
Sponsor:

Description

Looks like in master as of e268bc3e296b11b0 our reachability tests don't succeed anymore. Need to investigate more.

Attachments (1)

VidaliaLog-05.14.2011.txt (61.7 KB) - added by keb 8 years ago.
log with tor-0.2.3.1-alpha and libevent-2.0.11 on ubuntu 11.04

Change History (62)

comment:1 Changed 9 years ago by Sebastian

Related to --enable-bufferevents. Still need to investigate more. :)

comment:2 Changed 9 years ago by nickm

OK, will wait until we know a) which platforms this happens on (Snow Leopard only, or at least one more), and b) whether it only happens with --enable-bufferevents.

Once that's known, the thing to find out would be how far it gets in checking its own orport. At what point does it fail, and why? Is its ORPort secretly working but not checking out as okay, or is it broken and correctly detected as broken? etc etc

comment:3 Changed 9 years ago by Sebastian

This was first analyzed on Kubuntu by SwissTorExit, so not an OS X 10.6 exclusive problem.

It only happens when --enable-bufferevents is used during configure. Testing it in a private network doesn't reveal any problems (probably because AssumeReachable is set on both authorities and relays).

When trying to get a relay (with assumereachable set) into the public network, the authorities fail their side of the reachability test though, so the node never makes it.

comment:4 Changed 9 years ago by nickm

Aug. So it seems that with bufferevents enabled, reachability testing never succeeds? The OR can't find itself to be reachable, and the authorities can't find it to be reachable either?

I was under the impression that when I tested in a private network, circuits were indeed built. Can you reproduce this on a platform where this occurs?

If that's right, this might mean that bufferevents-using tors can handshake with each other, but not with non-bufferevents tors. If that's so, I wonder how far they get before the handshake fails. If not, I wonder which part of the reachability testing goes wrong.

comment:5 Changed 9 years ago by Sebastian

Hrm. It does seem to be able to make a circuit *sometimes*, because bootstrapping succeeds

comment:6 Changed 9 years ago by nickm

Owner: set to nickm
Status: new → accepted
Summary: "Reachability test for ORPort doesn't complete" → "Reachability test for ORPort doesn't complete with --enable-bufferevents set"

comment:7 in reply to:  5 Changed 9 years ago by arma

Replying to Sebastian:

Hrm. It does seem to be able to make a circuit *sometimes*, because bootstrapping succeeds

One is outgoing TLS connections, the other is incoming TLS connections. Might be relevant.

comment:8 Changed 9 years ago by nickm

If incoming are failing and outgoing are succeeding, I suspect that this is one of those openssls that needs its "yes let me renegotiate!" magic patched in, and that it needs said magic patched in repeatedly since it likes to forget it. Are we failing at the step where the client tries to renegotiate?

comment:9 Changed 9 years ago by nickm

Another thing that would help answer this: are these systems that patched or upgraded OpenSSL to take the SSL3_FLAGS approach to blocking renegotiation, or the SSL_OP approach? If they're both of the same kind, can we try with one that takes the other approach?

comment:10 Changed 9 years ago by Sebastian

Looks like it doesn't matter. One was compiled with openssl 1.0.0a, the other one with the apple-supplied 0.9.8l (the latter one using SSL3_FLAGS, the former SSL_OP)

comment:11 Changed 9 years ago by stars

hi,

My system's OpenSSL was recently updated; it is 0.9.8k with, if I am right, all of the 0.9.8l changes backported.

So that you can check, my Kubuntu OpenSSL package version is: 7ubuntu8.2 (amd64)

Best Regards

SwissTorExit

comment:12 Changed 9 years ago by nickm

03:46 < Sebastian> So here's my first test, running a private network with 
                   relays and authorities configured with 
                   --enable-bufferevents, and a client that runs without. Both 
                   using current master HEAD.
03:46 < Sebastian> client debug log available here: 
                   http://sebastianhahn.net/tor/clientdebug.log, relay debug 
                   here: http://sebastianhahn.net/tor/relaydebug.log
03:47 < Sebastian> I used StrictNodes and EntryNodes to make sure this was the 
                   node that would be used.
03:47 < Sebastian> 127.0.0.1:5001 is where the relay runs.
03:50 < Sebastian> It does indeed look like the client fails to renegotiate.
03:51 < Sebastian> nickm: trying your suggestion of old openssl now
03:52 < Sebastian> nickm: if you look at the relay log, there are weird debug 
                   log lines too, though. Looks like the bufferevents-enabled 
                   network itself has issues

comment:13 Changed 9 years ago by nickm

So on analysis, it looks to me and Sebastian as if the relay gets up to "waiting for renegotiation" then stalls, and the client gets up to "connection_tls_continue_handshake(): wanted read", and that's it. Also, it seems nothing in Sebastian's tests managed to connect to the relays.

comment:14 Changed 9 years ago by nickm

I've added a patch in my public repository under branch "loud_ssl_states". It won't fix anything, but it will log (at debug) every SSL state change that occurs in every SSL object, and maybe help us pinpoint this a little better.

comment:15 Changed 9 years ago by nickm

So for some reason, now I can reproduce the OpenSSL 1.0.0a case.

Here is what I can tell: renegotiation works and the client successfully calls connection_tls_finish_handshake. It completes, and the client sends a VERSIONS cell. Go client!

The server gets to connection_or_handle_events_cb, and makes it to the case commented as "improved handshake, but not a client". However, for some reason it does NOT call connection_or_tls_renegotiated_cb! I am guessing that the renegotiation has already finished by the time that we reach connection_or_handle_events_cb on the server side!

Compare the client's

Oct 11 12:25:32.045 [debug] connection_tls_finish_handshake(): client tls handshake (0x1e92d60) with [scrubbed] done. verifying.

with the server's

Oct 11 12:27:33.968 [debug] tor_tls_finish_handshake(): Completed V2 TLS handshake with client; waiting for renegotiation.

and take special note of the timestamps.
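
To make the timestamp comparison concrete, here is a minimal Python sketch that parses the two debug lines quoted above and computes the gap between them. The helper names `log_time` and `gap` are hypothetical, and the format string assumes the timestamp layout seen in these lines (no year, English month abbreviations).

```python
from datetime import datetime, timedelta

# The two debug lines quoted in this comment (wrapped lines joined).
CLIENT_LINE = ("Oct 11 12:25:32.045 [debug] connection_tls_finish_handshake(): "
               "client tls handshake (0x1e92d60) with [scrubbed] done. verifying.")
SERVER_LINE = ("Oct 11 12:27:33.968 [debug] tor_tls_finish_handshake(): "
               "Completed V2 TLS handshake with client; waiting for renegotiation.")


def log_time(line: str) -> datetime:
    """Parse the leading 'Mon DD HH:MM:SS.mmm' timestamp of a log line.

    The log carries no year, so strptime falls back to 1900; that is
    harmless when only the difference between two lines is needed.
    """
    stamp = " ".join(line.split()[:3])  # e.g. "Oct 11 12:25:32.045"
    return datetime.strptime(stamp, "%b %d %H:%M:%S.%f")


def gap(earlier: str, later: str) -> timedelta:
    """Time elapsed between two log lines."""
    return log_time(later) - log_time(earlier)
```

Calling `gap(CLIENT_LINE, SERVER_LINE)` gives a little over two minutes. Assuming both processes share a clock, the server only reports "waiting for renegotiation" long after the client considers the handshake done.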

comment:16 Changed 9 years ago by nickm

re the openssl 1.0.0a case:

ARG, it could be a simpler explanation.

There is no code currently to invoke the renegotiate callback in the bufferevent case. Right now, that only happens from tor_tls_read.

comment:17 Changed 9 years ago by nickm

Okay, there were at least 5 bugs here. :p I think I have them all fixed now, as of fbacbf9fd92a7 and a9172c87beaf9. At least, the relays on my test network now manage to build circuits with the offending openssl versions, which is pretty nice of them. Fixing that, though, turned up some other bugs too.

Note that you will need Libevent 2.0.8-rc, which will be released within a week--I hope! You can also use Libevent git master, 34d64f8a347147aa or later.

comment:18 Changed 9 years ago by nickm

Actually, it's possible that server-side tunneled connections might fail, since relays in my private network manage to bootstrap, but the client doesn't.

Also, the client log is full of

Oct 12 14:51:36.034 [debug] connection_or_process_cells_from_inbuf(): 6: starting, inbuf_datalen 512 (0 pending in tls object).
Oct 12 14:51:36.034 [debug] connection_or_process_cells_from_inbuf(): 6: starting, inbuf_datalen 0 (0 pending in tls object).
Oct 12 14:51:36.166 [debug] run_connection_housekeeping(): Sending keepalive to (127.0.0.1:3001)

at one per second.

comment:19 Changed 9 years ago by nickm

Working theory on what's going on with the tunneled connections: it looks like a lot of data is arriving at the client, but for some reason, the client is never getting the EOF.

Some extra debugs I found show that clients get here on consensus downloads...

Oct 12 15:27:50.011 [debug] connection_dir_process_inbuf(): Got data, not eof. Leaving on inbuf. We have 1923 bytes waiting on 0x811dc0

whereas relays only get here:

Oct 12 15:27:49.943 [debug] connection_dir_client_reached_eof(): Got eof on 0x22de440 with 1923 bytes in inbuf

Note that the lengths are the same.
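
For anyone wanting to check the same thing across a whole debug log, a small Python sketch along these lines could classify the two messages and pull out the byte counts. The regular expressions are written against the two lines quoted above only, and `classify` is a hypothetical helper name, not anything in Tor.

```python
import re

# Debug lines quoted in this comment.
CLIENT_LINE = ("Oct 12 15:27:50.011 [debug] connection_dir_process_inbuf(): "
               "Got data, not eof. Leaving on inbuf. "
               "We have 1923 bytes waiting on 0x811dc0")
RELAY_LINE = ("Oct 12 15:27:49.943 [debug] connection_dir_client_reached_eof(): "
              "Got eof on 0x22de440 with 1923 bytes in inbuf")

# Patterns derived from the two messages above; other Tor versions may
# word these messages differently.
PENDING_RE = re.compile(r"Got data, not eof\..*We have (\d+) bytes waiting")
EOF_RE = re.compile(r"Got eof on \S+ with (\d+) bytes in inbuf")


def classify(line):
    """Return ('pending', n) or ('eof', n) for a directory-connection
    debug line, or None if the line matches neither message."""
    m = PENDING_RE.search(line)
    if m:
        return ("pending", int(m.group(1)))
    m = EOF_RE.search(line)
    if m:
        return ("eof", int(m.group(1)))
    return None
```

A client log that contains "pending" entries with no later "eof" entry would match the behaviour described here: the data arrives in full, but the end of the stream is never seen.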

comment:20 Changed 9 years ago by nickm

Also worth noting: this only happens on a private network. I can run a bufferevent-enabled client on its own, and it happily downloads directory stuff. So update the working theory to say that the directories aren't sending RELAY_COMMAND_END cells for some reason. This could be an exit bug, but this could also be a linked-connection bug.

comment:21 Changed 9 years ago by nickm

Yes! I can confirm that the client never logs the message "%d: end cell (%s) for stream %d. Removing stream." in this case. So for whatever reason, either no END cell is generated, or it isn't delivered to the client, or the client doesn't notice it. I am guessing that it is the former, based on how this appears in my testing net but not when using a client with the public net.

--

Also, it seems that when we re-create a new bufferevent_openssl object callback, we don't properly set its inbuf/outbuf callbacks to inform us when stuff is successfully flushed, so we never update lastwritten on an OR connection.

Good thing _that_ one didn't make it into production! Fixed it in 5710d99

comment:22 Changed 9 years ago by nickm

Indeed, we aren't generating END cells on linked connections. The only way to get a BEV_EVENT_EOF out of a bufferevent_pair is to call bufferevent_flush on it, and the only way to get connection_edge_reached_eof() called with bufferevents is by generating a BEV_EVENT_EOF.

Arguably, libevent's bufferevent pairs should also generate a BEV_EVENT_EOF when BEV_OPT_CLOSE_ON_FREE is set on them, and you free the other one of them.
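
The behaviour described in this comment can be illustrated with a toy model. The Python sketch below is NOT libevent and does not use its API; `ToyPairEnd`, `flush_finished` and `make_pair` are invented names. It only models the two claims made above: a "finished" flush is what delivers EOF to the partner, and freeing one end delivers nothing.

```python
class ToyPairEnd:
    """One end of a toy in-memory pair. Not libevent; names are invented."""

    def __init__(self):
        self.partner = None
        self.inbuf = bytearray()   # data readable at this end
        self.events = []           # events delivered to this end

    def write(self, data: bytes) -> None:
        # Data moves straight into the partner's input buffer.
        self.partner.inbuf += data

    def flush_finished(self) -> None:
        # Stands in for a "finished" flush as described above: the only
        # operation in this model that tells the partner "no more data".
        self.partner.events.append("EOF")

    def free(self) -> None:
        # Stands in for freeing one end: the partner is detached but is
        # not told anything, so it keeps waiting for more data.
        self.partner.partner = None
        self.partner = None


def make_pair():
    """Create two linked ends, loosely analogous to a bufferevent pair."""
    a, b = ToyPairEnd(), ToyPairEnd()
    a.partner, b.partner = b, a
    return a, b
```

In this model, writing 1923 bytes and then calling `free()` leaves the other end with 1923 bytes in `inbuf` and an empty `events` list, which is the "got data, not eof" situation from the earlier logs. Calling `flush_finished()` before `free()` is what puts "EOF" into the partner's `events`.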

comment:23 Changed 9 years ago by Sebastian

I noticed my bufferevents-enabled Tor client going into an infinite loop when used with the public Tor network. I was able to reproduce it once, but now can't reproduce it anymore. Sad :/

comment:24 Changed 9 years ago by nickm

When? Before, or after the changes I made today? What were the symptoms?

comment:25 Changed 9 years ago by Sebastian

Before your changes, but I also stopped being able to reproduce it before I changed the binary at all. A while after the successfully bootstrapped message, Tor started using 100% cpu. No new notice log entries were created, and since I hadn't configured a torrc I couldn't just hup the process to change logging to debug :(

comment:26 Changed 9 years ago by nickm

If that happens again, maybe try attaching to it with gdb -p ?

comment:27 Changed 9 years ago by Sebastian

yeah, good suggestion. Thanks :)

comment:28 Changed 9 years ago by stars

OK, I tested again 8 hours ago on the Tor network and got different log messages after starting Tor a couple of times. The ORPort is now reachable, but afterwards, when I try to open a circuit, it keeps opening new circuits endlessly because they all fail.

Here are the different log messages, at notice level only:

oct. 13 01:06:48.346 [Notice] Tor v0.2.3.0-alpha-dev (git-5710d99f00b3ac7c). This is experimental software. Do not rely on it for strong anonymity. (Running on Linux x86_64)
oct. 13 01:06:48.346 [Notice] Initialized libevent version 2.0.7-rc-dev using method epoll. Good.
oct. 13 01:06:48.346 [Notice] Opening OR listener on 0.0.0.0:9090
oct. 13 01:06:48.346 [Notice] Opening Directory listener on 0.0.0.0:9091
oct. 13 01:06:48.346 [Notice] Opening Socks listener on 127.0.0.1:9050
oct. 13 01:06:48.347 [Notice] Opening DNS listener on 127.0.0.1:9053
oct. 13 01:06:48.347 [Notice] Opening Control listener on 127.0.0.1:9051
oct. 13 01:06:48.347 [Notice] Based on 775 circuit times, it looks like we don't need to wait so long for circuits to finish. We will now assume a circuit is too slow to use after waiting 19 seconds.
oct. 13 01:06:48.347 [Notice] Parsing GEOIP file.
oct. 13 01:06:50.809 [Notice] Configured to measure statistics. Look for the *-stats files that will first be written to the data directory in 24 hours from now.
oct. 13 01:06:50.809 [Notice] OpenSSL OpenSSL 0.9.8k 25 Mar 2009 [9080bf] looks like it's older than 0.9.8l, but some vendors have backported 0.9.8l's renegotiation code to earlier versions, and some have backported the code from 0.9.8m or 0.9.8n. I'll set both SSL3_FLAGS and SSL_OP just to be safe.
oct. 13 01:06:50.809 [Notice] Your Tor server's identity key fingerprint is 'SwissTorHelp A32E64A8136EBD5124048022786857CB931F584F'
oct. 13 01:06:50.809 [Notice] This version of Tor (0.2.3.0-alpha-dev) is newer than any recommended version, according to the directory authorities. Recommended versions are: 0.2.0.35,0.2.1.19,0.2.1.20,0.2.1.21,0.2.1.22,0.2.1.25,0.2.1.26,0.2.2.1-alpha,0.2.2.2-alpha,0.2.2.3-alpha,0.2.2.4-alpha,0.2.2.5-alpha,0.2.2.6-alpha,0.2.2.7-alpha,0.2.2.8-alpha,0.2.2.10-alpha,0.2.2.11-alpha,0.2.2.12-alpha,0.2.2.13-alpha,0.2.2.14-alpha,0.2.2.15-alpha,0.2.2.16-alpha,0.2.2.17-alpha
oct. 13 01:06:50.810 [Notice] Reloaded microdescriptor cache. Found 12437 descriptors.
oct. 13 01:06:50.810 [Notice] We now have enough directory information to build circuits.
oct. 13 01:06:50.810 [Notice] Bootstrapped 80%: Connecting to the Tor network.
oct. 13 01:06:50.810 [Notice] New control connection opened.
oct. 13 01:06:51.038 [Notice] Guessed our IP address as 80.218.145.226 (source: 208.83.223.34).
oct. 13 01:06:51.890 [Notice] Bootstrapped 85%: Finishing handshake with first hop.
oct. 13 01:06:54.032 [Notice] Bootstrapped 90%: Establishing a Tor circuit.
oct. 13 01:06:54.515 [Notice] Tor has successfully opened a circuit. Looks like client functionality is working.
oct. 13 01:06:54.516 [Notice] Bootstrapped 100%: Done.
oct. 13 01:06:54.516 [Notice] Now checking whether ORPort 80.218.145.226:80 and DirPort 80.218.145.226:443 are reachable... (this may take up to 20 minutes -- look for log messages indicating success)
oct. 13 01:06:54.790 [Warning] Received http status code 404 ("Not found") from server '91.208.34.27:80' while fetching consensus directory.
oct. 13 01:06:56.651 [Notice] Self-testing indicates your DirPort is reachable from the outside. Excellent.
oct. 13 01:06:56.892 [Warning] Received http status code 404 ("Not found") from server '91.208.34.18:80' while fetching consensus directory.
oct. 13 01:06:57.320 [Notice] Based on 777 circuit times, it looks like we don't need to wait so long for circuits to finish. We will now assume a circuit is too slow to use after waiting 18 seconds.
oct. 13 01:07:00.014 [Notice] Self-testing indicates your ORPort is reachable from the outside. Excellent. Publishing server descriptor.
oct. 13 01:07:51.833 [Warning] TLS error while handshaking (with bufferevent) with [scrubbed]: ssl handshake failure (in SSL routines:SSL3_READ_BYTES:SSL3_ST_SR_CERT_A)
oct. 13 01:08:04.322 [Warning] TLS error while handshaking (with bufferevent) with [scrubbed]: ssl handshake failure (in SSL routines:SSL3_READ_BYTES:SSL3_ST_SR_CERT_A)
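
A side note on the OpenSSL notice in the log above: the `[9080bf]` token appears to be the library's numeric version in hex. Assuming it uses the pre-3.0 `OPENSSL_VERSION_NUMBER` layout (0xMNNFFPPS), a minimal Python sketch can decode it. `decode_openssl_version` and `older_than_0_9_8l` are hypothetical helpers that mirror only what the notice says, not Tor's actual check.

```python
def decode_openssl_version(number: int) -> str:
    """Decode a pre-3.0 OPENSSL_VERSION_NUMBER (layout 0xMNNFFPPS).

    M = major, NN = minor, FF = fix, PP = patch (0 = none, 1 = 'a', ...),
    S = status (0xf = release). Two-letter patch levels (patch > 26)
    are not handled and come out without a letter.
    """
    major = (number >> 28) & 0xF
    minor = (number >> 20) & 0xFF
    fix = (number >> 12) & 0xFF
    patch = (number >> 4) & 0xFF
    letter = chr(ord("a") + patch - 1) if 1 <= patch <= 26 else ""
    return f"{major}.{minor}.{fix}{letter}"


# 0.9.8l in the same encoding with the status nibble cleared
# (patch 'l' is the 12th letter, 0x0c).
OPENSSL_0_9_8L = 0x009080C0


def older_than_0_9_8l(number: int) -> bool:
    """True if the version sorts before 0.9.8l, ignoring the status nibble."""
    return (number & ~0xF) < OPENSSL_0_9_8L
```

With this, `decode_openssl_version(0x9080bf)` yields the same "0.9.8k" that the notice prints, and `older_than_0_9_8l(0x9080bf)` is true, which fits the "looks like it's older than 0.9.8l" wording.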

commit 5710d99f00b3ac7cef0691fc9153993b0f4aa872 tor

Libevent commit 34d64f8a347147aa5a6ac222f302e12b142a8f2e

Other start case:

oct. 13 01:14:47.721 [Notice] Self-testing indicates your ORPort is reachable from the outside. Excellent. Publishing server descriptor.
oct. 13 01:14:48.122 [Notice] Bootstrapped 85%: Finishing handshake with first hop.
oct. 13 01:16:17.330 [Warning] Problem bootstrapping. Stuck at 85%: Finishing handshake with first hop. (Connection timed out; TIMEOUT; count 10; recommendation warn)

Another start:

oct. 13 01:19:10.751 [Notice] We now have enough directory information to build circuits.
oct. 13 01:19:10.751 [Notice] Bootstrapped 80%: Connecting to the Tor network.
oct. 13 01:19:10.751 [Notice] New control connection opened.
oct. 13 01:19:10.752 [Warning] TLS error while handshaking (with bufferevent) with [scrubbed]: http request (in SSL routines:SSL23_GET_CLIENT_HELLO:SSL23_ST_SR_CLNT_HELLO_A)
oct. 13 01:19:10.752 [Notice] Guessed our IP address as 80.218.145.226 (source: 193.23.244.244).
oct. 13 01:19:13.591 [Notice] Self-testing indicates your ORPort is reachable from the outside. Excellent. Publishing server descriptor.
oct. 13 01:19:18.803 [Notice] Bootstrapped 85%: Finishing handshake with first hop.

Next one:

oct. 13 01:20:15.792 [Notice] Bootstrapped 80%: Connecting to the Tor network.
oct. 13 01:20:15.792 [Notice] New control connection opened.
oct. 13 01:20:16.689 [Notice] Bootstrapped 85%: Finishing handshake with first hop.
oct. 13 01:20:16.785 [Notice] Guessed our IP address as 80.218.145.226 (source: 213.115.239.118).
oct. 13 01:20:18.645 [Notice] Bootstrapped 90%: Establishing a Tor circuit.
oct. 13 01:20:19.889 [Notice] Tor has successfully opened a circuit. Looks like client functionality is working.
oct. 13 01:20:19.889 [Notice] Bootstrapped 100%: Done.
oct. 13 01:20:19.889 [Notice] Now checking whether ORPort 80.218.145.226:80 and DirPort 80.218.145.226:443 are reachable... (this may take up to 20 minutes -- look for log messages indicating success)
oct. 13 01:20:20.078 [Notice] Self-testing indicates your ORPort is reachable from the outside. Excellent. Publishing server descriptor.
oct. 13 01:20:21.615 [Notice] Self-testing indicates your DirPort is reachable from the outside. Excellent.
oct. 13 01:20:22.038 [Notice] Based on 794 circuit times, it looks like we don't need to wait so long for circuits to finish. We will now assume a circuit is too slow to use after waiting 20 seconds.

This looks good, but it is still never possible to reach the web: when I try to open a site such as the Tor Trac or OFTC IRC, for example, all circuits fail, and so on indefinitely.

My torrc:

# This file was generated by Tor; if you edit it, comments will not be preserved
# The old torrc file was renamed to torrc.orig.1 or similar, and Tor will ignore it

CellStatistics 1
ContactInfo swisstorexit at Safe-mail dot net
ControlPort 9051
DataDirectory xxxxxxxxxxxxxxxxxxxxxxxxxxxx
DirListenAddress 0.0.0.0:9091
DirPort 443
DirReqStatistics 1
DNSPort 53
DNSListenAddress 127.0.0.1:9053
EntryStatistics 1
ExcludeNodes xxxxxxxxxxxxxxxxxxxxxxxxxx
ExcludeExitNodes xxxxxxxxxxxxxxxxxxx
ExitPolicy reject *:*
ExtraInfoStatistics 1
GeoIPFile xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
HashedControlPassword xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Log notice stdout
Nickname SwissTorHelp
NumCpus 4
NumEntryGuards 8
ORListenAddress 0.0.0.0:9090
ORPort 80
RefuseUnknownExits 1
RelayBandwidthBurst 1024000
RelayBandwidthRate 409600

comment:29 Changed 9 years ago by nickm

Ug. On the BEV_EVENT_EOF issue, it seems that conn_close_if_marked *is* trying to do the flush that should be necessary to make directory servers work. So, there's some bug there. Time for yet more investigation on that.

comment:30 Changed 9 years ago by nickm

And bizarrely, connection_handle_write_cb has a check-and-flush too.

comment:31 Changed 9 years ago by nickm

Fixed the linked-connection case with cbda016bc5f. This won't fix stars's bugs above.

comment:32 Changed 9 years ago by Sebastian

Now I get reproducible segfaults when using a bufferevent client with the normal network. I believe this is because some pieces of code still try to use connection_t.inbuf with bufferevents. Also the xxx in connection_add_impl() in main.c seem important.

Here's the backtrace:

#0  buf_datalen (buf=0x0) at buffers.c:518
#1  0x000000010007fa75 in circuit_resume_edge_reading_helper (first_conn=0x10101aff0, circ=0x10101e820, layer_hint=0x1003e3ba0) at relay.c:1513
#2  0x00000001000810db in connection_edge_process_relay_cell (cell=0x7fff5fbff1f0, circ=0x10101e820, conn=0x0, layer_hint=0x1003e3ba0) at relay.c:1232
#3  0x0000000100081bea in circuit_receive_relay_cell (cell=0x7fff5fbff1f0, circ=0x10101e820, cell_direction=<value temporarily unavailable, due to optimizations>) at relay.c:223
#4  0x0000000100019951 in command_process_relay_cell [inlined] () at /tor-git/tor/src/or/command.c:437
#5  0x0000000100019951 in command_process_cell (cell=0x7fff5fbff1f0, conn=0x1010027d0) at command.c:158
#6  0x0000000100038d11 in connection_or_process_inbuf (conn=0x1010027d0) at connection_or.c:1453
#7  0x000000010002b3ed in connection_handle_read_cb (bufev=<value temporarily unavailable, due to optimizations>, arg=0x1010027d0) at connection.c:2821
#8  0x00000001001b8d82 in bufferevent_run_deferred_callbacks_locked (_=0x7fff5fbff1f0, arg=<value temporarily unavailable, due to optimizations>) at bufferevent.c:145
#9  0x00000001001b13c7 in event_process_deferred_callbacks [inlined] () at /git/libevent/event.c:1326
#10 0x00000001001b13c7 in event_base_loop (base=0x100301000, flags=<value temporarily unavailable, due to optimizations>) at event.c:1365
#11 0x000000010006b811 in do_main_loop () at main.c:1725
#12 0x000000010006baee in tor_main (argc=1, argv=<value temporarily unavailable, due to optimizations>) at main.c:2402
#13 0x00000001000011a4 in start ()
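
When reading a backtrace like this one, it can help to list the arguments that gdb printed as 0x0. The Python sketch below does that for gdb's text format as it appears above; `null_args` is a hypothetical helper and the regular expressions are only written against this backtrace's layout.

```python
import re

# Frame header: "#N  [0xADDR in ]function_name ..."
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-fA-F]+ in )?(\w+)")
# An argument printed exactly as 0x0 (not 0x0123...).
NULL_ARG_RE = re.compile(r"(\w+)=0x0(?![0-9a-fA-F])")


def null_args(backtrace: str):
    """Yield (frame_number, function, argument_name) for every argument
    that gdb printed as 0x0 in the given backtrace text."""
    for line in backtrace.splitlines():
        frame = FRAME_RE.match(line.strip())
        if not frame:
            continue
        for arg in NULL_ARG_RE.findall(line):
            yield int(frame.group(1)), frame.group(2), arg
```

On the trace above this flags `buf` in frame 0 and `conn` in frame 2. A null argument is not necessarily a bug in itself; the one that matters here is frame 0, where buf_datalen is handed a null buffer.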

comment:33 Changed 9 years ago by nickm

Should be better now. Thanks!

comment:34 Changed 9 years ago by stars

Hi Nickm,

Thanks i will try tomorrow...

Best regards

comment:35 Changed 9 years ago by nickm

Err, hang on. The stack trace Sebastian posted should be better now. There is still probably at least one lingering bug or two that should keep you from running it. It needs more testing on local networks first. :(

comment:36 Changed 9 years ago by erinn

How is this going? Can I build phobos a new package for #2007 or should I wait a bit longer?

comment:37 Changed 9 years ago by nickm

There are a couple more bugs to nail down. We no longer fall over when there's a traffic spike, but we need to figure out why there was a traffic spike at all, when maybe there shouldn't have been.

When that's done, for IOCP to be testable, we need to try out the filtering implementation of SSL bufferevents.

And when *that's* done, we'll be good to go. :)

comment:38 Changed 9 years ago by nickm

Milestone: set to Tor: 0.2.3.1-alpha
Priority: normal → major

comment:39 Changed 9 years ago by arma

Any Tor-code-the-way-it-used-to-be things I can help answer here?

comment:40 Changed 9 years ago by nickm

The filtering stuff is in. The traffic-spike code is maybe stuff we should debug first. Or we could just go ahead and try it out and hope that any bandwidth-hogging weirdness will get detected soon. Libevent 2.0.9 will be required; that should be out by thanksgiving. Building from libevent git master would also work.

comment:41 Changed 9 years ago by arma

Is this bug still present?

There's been a newer libevent2 release since the last comment.

I noticed Erinn made test TBB bundles with libevent2. Did she set --enable-bufferevents on or off there?

comment:42 Changed 9 years ago by erinn

I didn't, but I can make new bundles today if people here will test.

comment:44 Changed 9 years ago by stars

I tested the latest git origin/master, and now it can build the first 4 circuits but can't build new connections to reach the web. It always fails to connect to any site, IRC, etc.

It also segfaults whenever I stop it, but gives no trace of the segfault.

Best regards

SwissTorExit

comment:45 in reply to:  43 Changed 9 years ago by rransom

Replying to erinn:

After some confusion was resolved in #tor-dev, I made some new packages with --enable-bufferevents for 0.2.3.0-alpha-dev. They are here:

http://archive.torproject.org/tor-package-archive/technology-preview/vidalia-bundle-0.2.3.0-alpha-dev-bufferevents-0.2.10.exe
http://archive.torproject.org/tor-package-archive/technology-preview/vidalia-bundle-0.2.3.0-alpha-dev-bufferevents-0.2.10.exe.asc

  • I started this Tor as a client, and it succeeded in building a circuit.
  • I then configured it as an unpublished bridge using Vidalia, and its reachability test succeeded. A while later, I tried to browse the web through it, and I think it didn't work. (I may be wrong about the 'it didn't work' part; I wasn't trying to figure out or remember what was going on.)
  • I stopped Tor using Vidalia, and Tor crashed.
  • I started Tor again, and it was unable to bootstrap.
  • I configured it as a client only using Vidalia, and then stopped Tor. Tor crashed again.
  • I started Tor again, and it succeeded in building a circuit.

comment:46 in reply to:  28 Changed 8 years ago by Wawe

Replying to stars:

I encountered a possibly related problem with --enable-bufferevents.
In my environment it seems that bootstrapping never succeeds when buffer events
are used.

The behaviour is similar to what stars described.

Here are the results of a series of tests in which I altered
--enable-bufferevents, the data directory, and TOR_LIBEVENT_TICKS_PER_SECOND
found in src/common/compat_libevent.h.

For the last few I documented the settings and whether bootstrapping succeeded
to post it here:

Here are the log files for the test cases listed below:
http://www-public.rz.uni-duesseldorf.de/~marad002/tor_log/notice.log
http://www-public.rz.uni-duesseldorf.de/~marad002/tor_log/debug.log

  1. no buffer events : bootstrapping succeeds
  2. enable buffer events, 1 tick/s : bootstrapping succeeds
  3. restart : bootstrapping succeeds
  4. clean out data directory, restart: stuck at "Bootstrapped 25%: Loading networkstatus consensus."
  5. restart : stuck at "Bootstrapped 25%: Loading networkstatus consensus."
  6. clean out data directory, no buffer events : bootstrapping succeeds
  7. restart : bootstrapping succeeds
  8. enable buffer events, 100 tick/s : stuck at "Bootstrapped 85%: Finishing handshake with first hop."
  9. clean out data directory, restart: stuck at "Bootstrapped 10%: Loading networkstatus consensus."
  10. no buffer events : bootstrapping succeeds
  11. enable buffer events, 100 tick/s : stuck at "Bootstrapped 85%: Finishing handshake with first hop."
  12. enable buffer events, 1 tick/s : stuck at "Bootstrapped 85%: Finishing handshake with first hop."
  13. clear data directory, no buffer events : bootstrapping succeeds
  14. enable buffer events, 1 tick/s : stuck at "Bootstrapped 90%: Establishing a Tor circuit."

Note that it gets stuck earlier if there is no data directory.
In all my testing there was only one situation in which bootstrapping succeeded
with buffer events enabled, which was in 2. above. All previous attempts failed
at some point.

My own observations so far are these:
When starting with an empty data directory, bootstrapping does not go beyond
25%.
When starting with a data directory created by a correctly working tor node
(i.e. one with buffer events disabled) bootstrapping got stuck between
80-90%. One time it even succeeded.
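
To compare runs like these without reading each log by hand, a short Python sketch could report the last bootstrap stage a log reached. `last_bootstrap_stage` is a hypothetical helper; the pattern assumes the "Bootstrapped N%: description." notice format seen in the logs quoted in this ticket.

```python
import re

# Matches e.g. "Bootstrapped 85%: Finishing handshake with first hop."
BOOTSTRAP_RE = re.compile(r"Bootstrapped (\d+)%: (.+?)\.?\s*$")


def last_bootstrap_stage(lines):
    """Return (percent, description) of the last 'Bootstrapped N%' notice
    seen in an iterable of log lines, or None if there is none."""
    last = None
    for line in lines:
        m = BOOTSTRAP_RE.search(line)
        if m:
            last = (int(m.group(1)), m.group(2))
    return last
```

Passing an open log file (for example a local copy of a notice log) works, because iterating a file yields lines. A result whose percentage is below 100 marks the stage where that run stalled.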

I also tried upgrading to a later version of OpenSSL and libevent, but it did
not have any effect. I still get the same behaviour with buffer events.

tor commit: d120ee1c63ec52e23f24f3f4f9f80a9896381406
OpenSSL version: 0.9.8g (and later 0.9.8n)
libevent version: 2.0.10 stable
Platform: Debian GNU/Linux, running in a VM
uname -a: Linux oniontest 2.6.26-2-xen-686 #1 SMP Thu Jan 27 05:44:37 UTC 2011 i686 GNU/Linux

comment:47 in reply to:  43 Changed 8 years ago by rransom

Replying to erinn:

After some confusion was resolved in #tor-dev, I made some new packages with --enable-bufferevents for 0.2.3.0-alpha-dev. They are here:

http://archive.torproject.org/tor-package-archive/technology-preview/vidalia-bundle-0.2.3.0-alpha-dev-bufferevents-0.2.10.exe
http://archive.torproject.org/tor-package-archive/technology-preview/vidalia-bundle-0.2.3.0-alpha-dev-bufferevents-0.2.10.exe.asc

The Tor in this bundle crashed about once a day when I ran it as a client on Windows 7 Professional, often when I wasn't actively using it.

comment:48 Changed 8 years ago by nickm

Do we have any way to get a stack trace out of that?

comment:49 in reply to:  48 Changed 8 years ago by rransom

Replying to nickm:

Do we have any way to get a stack trace out of that?

No. I didn't have the proper debugging tools installed, and I still don't (and I'm not sure that binary even had debugging symbols left in it).

Changed 8 years ago by keb

Attachment: VidaliaLog-05.14.2011.txt added

log with tor-0.2.3.1-alpha and libevent-2.0.11 on ubuntu 11.04

comment:50 Changed 8 years ago by keb

Version: Tor: 0.2.3.1-alpha

./configure --with-libevent-dir=/usr/local/lib --disable-largefile --enable-bufferevents --enable-gcc-hardening --enable-linker-hardening --disable-transparent --enable-instrument-downloads --enable-gcc-warnings

make and make check succeeded.
cannot use it as a client either.

comment:51 Changed 8 years ago by nickm

Parent ID: #3561

comment:52 Changed 8 years ago by nickm

Milestone: Tor: 0.2.3.1-alpha → Tor: 0.2.3.x-final

Move from 0.2.3.1-alpha milestone to 0.2.3.x-final milestone.

comment:53 Changed 8 years ago by erinn

I come bearing gifts. I have two bundles (Tor expert package and Vidalia bundle) built with the latest libevent (2.0.13-stable) and Tor alpha. Please give them some basic testing and let me know what should be announced for wider consideration (e.g., on the blog, tor-talk, etc.) I am not going to announce anything that is clearly crashy for relays, but I will announce things that are occasionally crashy in new ways.

https://archive.torproject.org/tor-package-archive/technology-preview/tor-0.2.3.2-alpha-bufferevents-win32.exe
https://archive.torproject.org/tor-package-archive/technology-preview/tor-0.2.3.2-alpha-bufferevents-win32.exe.asc

https://archive.torproject.org/tor-package-archive/technology-preview/vidalia-bundle-0.2.3.2-alpha-bufferevents-0.3.0-alpha.exe
https://archive.torproject.org/tor-package-archive/technology-preview/vidalia-bundle-0.2.3.2-alpha-bufferevents-0.3.0-alpha.exe.asc

There are also some packages there built without bufferevents, for comparison or other types of testing. You can see the whole list here:

https://archive.torproject.org/tor-package-archive/technology-preview/

comment:54 Changed 8 years ago by Sebastian

#3615 might be relevant. Does it show up?

comment:55 Changed 8 years ago by nickm

Okay, we NO LONGER WANT TESTING on the above bundles.

There are some bad bugs in them that we fixed; testing those is now entirely pointless. If you're testing them, please stop. More bundles forthcoming once I can debug more.

comment:56 Changed 8 years ago by nickm

If you're building from source on unix or windows[*], please try out the very latest tor master branch with the very latest libevent patches-2.0 branch. I think they're working fairly nicely.

(To test with IOCP on Windows, first test as normal to see whether non-IOCP works. Then, run with "DisableIOCP 0".)

I have tried using it as a client on a few platforms, and as a bridge on windows with IOCP. It seemed to work in all those cases.

[*] Sebastian has decent windows mingw build instructions here: http://sebastianhahn.net/tor/bufferevents

comment:57 Changed 8 years ago by Sebastian

Do we believe this is working now (and can the bug be closed?), yes?

comment:58 Changed 8 years ago by nickm

I believe that this specific bug is working and can be closed.

Sadly, some people have used this bug as a catch-all bug for "something screwy happened with bufferevents enabled"; I am not sure that every issue mentioned here is currently solved. We should do a once-over to make sure that every new issue brought up in the comments is either fixed or has a ticket, and then close this.

comment:59 Changed 8 years ago by nickm

Keywords: bufferevents added
Resolution: set to worksforme
Status: accepted → closed

There are remaining bufferevents bugs, but they are not this bug. Closing.

comment:60 Changed 7 years ago by nickm

Keywords: tor-relay added

comment:61 Changed 7 years ago by nickm

Component: Tor Relay → Tor