Opened 8 years ago

Closed 3 years ago

#4588 closed defect (wontfix)

Tor + bufferevents hang on with cpu at 100%

Reported by: stars Owned by: nickm
Priority: High Milestone: Tor: unspecified
Component: Core Tor/Tor Version: Tor: 0.2.3.8-alpha
Severity: Keywords: bufferevents hang tor-relay
Cc: stars@… Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

Hi,

After the last Tor update yesterday, Tor hang on with all cpu's at 100%. It look like that this problem appear after a couple hours and make the relay out of consensus too.

Tor version : "0.2.3.8-alpha-dev (git-58d1aa44023e8b45)"
Libevent git version: commit f3b89dec9eac2cf4000c8dc9467abdbf27121674
I running Kubuntu Lucid 10.02.4 LTS bufferevents enabled
vidalia 0.31-git
OpenSSL 1.0.0e 6 Sep 2011

This problem was never appear before

Best Regards

SwissTorExit

Child Tickets

Change History (22)

comment:1 Changed 8 years ago by nickm

What was the last version before this that worked for you?

To debug, it would be good to know where we're actually freezing. Try running with "DisableDebuggerAttachment 0", then if it locks up again, attach a debugger with "gdb -p {PID of tor}", and use "bt" to get a stack trace and figure out where Tor is stalled?

comment:2 Changed 8 years ago by stars

Cc: stars@… added

Hi nickm,

If i am right i was running Tor 0.2.3.7-alpha (git-07a5fd10544be1ce) on Linux x86_64 before the problem appear...

I am running now with "DisableDebuggerAttachment 0" and put the strace soon as possible

Thanks and best regards

comment:3 Changed 8 years ago by stars

here are my starce:

Loaded symbols for /lib64/ld-linux-x86-64.so.2
[Thread debugging using libthread_db enabled]
[New Thread 0x7f9930b78700 (LWP 32140)]
[New Thread 0x7f9931379700 (LWP 32139)]
[New Thread 0x7f9931b7a700 (LWP 32138)]
[New Thread 0x7f993237b700 (LWP 32137)]
bufferevent_suspend_read (bufev=0x7f992c12d950, what=4) at bufferevent.c:72
72 if (!bufev_private->read_suspended)
(gdb) bt
#0 bufferevent_suspend_read (bufev=0x7f992c12d950, what=4) at bufferevent.c:72
#1 0x00007f99347aae47 in _bufferevent_get_rlim_max (bev=<value optimized out>)

at bufferevent_ratelim.c:241

#2 _bufferevent_get_read_max (bev=<value optimized out>) at bufferevent_ratelim.c:263
#3 0x00007f99349d7834 in do_read (bev_ssl=0x7f992c12d950, n_to_read=0) at bufferevent_openssl.c:572
#4 0x00007f99349d80ea in consider_reading (bev_ssl=0x7f992c12d950) at bufferevent_openssl.c:775
#5 0x00007f99349d8178 in be_openssl_readeventcb (fd=<value optimized out>, what=0,

ptr=0x7f992c12d950) at bufferevent_openssl.c:890

#6 0x00007f993479f5e3 in event_process_active_single_queue (base=0x7f99360a8240,

activeq=0x7f99360a92a0, max_to_process=<value optimized out>, endtime=<value optimized out>)
at event.c:1407

#7 0x00007f99347a01b8 in event_process_active (base=0x7f99360a8240, flags=<value optimized out>)

at event.c:1506

#8 event_base_loop (base=0x7f99360a8240, flags=<value optimized out>) at event.c:1709
#9 0x00007f99352ac5f7 in do_main_loop ()
#10 0x00007f99352adc8f in tor_main ()
#11 0x00007f99352a786c in main ()

(gdb) next
74 bufev_private->read_suspended |= what;
(gdb) next
75 BEV_UNLOCK(bufev);
(gdb) next
_bufferevent_get_rlim_max (bev=<value optimized out>) at bufferevent_ratelim.c:251
251 UNLOCK_GROUP(g);
(gdb) next
_bufferevent_get_read_max (bev=0x0) at bufferevent_ratelim.c:264
264 }
(gdb) next
do_read (bev_ssl=0x7f992c12d950, n_to_read=3184) at bufferevent_openssl.c:576
576 n = evbuffer_reserve_space(input, n_to_read, space, 2);
(gdb) next
577 if (n < 0)
(gdb) next
576 n = evbuffer_reserve_space(input, n_to_read, space, 2);
580 for (i=0; i<n; ++i) {
do_read (bev_ssl=0x7f992c12d950, n_to_read=3184) at bufferevent_openssl.c:623
623 _bufferevent_run_readcb(bev);
627 }
consider_reading (bev_ssl=0x7f992c12d950) at bufferevent_openssl.c:786
786 n_to_read = SSL_pending(bev_ssl->ssl);
774 while (n_to_read) {
786 n_to_read = SSL_pending(bev_ssl->ssl);
775 if (do_read(bev_ssl, n_to_read) <= 0)

comment:4 Changed 8 years ago by stars

a new one after a new crash a bit different:

Loaded symbols for /lib64/ld-linux-x86-64.so.2
[Thread debugging using libthread_db enabled]
[New Thread 0x7fab07103700 (LWP 4997)]
[New Thread 0x7faafffff700 (LWP 4996)]
[New Thread 0x7fab07904700 (LWP 4995)]
[New Thread 0x7fab08105700 (LWP 4994)]
0x00007fab0a50b410 in _evbuffer_read_setup_vecs@plt () from /usr/local/lib/libevent-2.1.so.1
(gdb) bt
#0 0x00007fab0a50b410 in _evbuffer_read_setup_vecs@plt () from /usr/local/lib/libevent-2.1.so.1
#1 0x00007fab0a5172de in evbuffer_reserve_space (buf=0x7fab0eac8e30, size=0, vec=0x7fff46d63fa0,

n_vecs=2) at buffer.c:651

#2 0x00007fab0a74984e in do_read (bev_ssl=0x7fab0df23260, n_to_read=2960)

at bufferevent_openssl.c:576

#3 0x00007fab0a74a0ea in consider_reading (bev_ssl=0x7fab0df23260) at bufferevent_openssl.c:775
#4 0x00007fab0a74a178 in be_openssl_readeventcb (fd=<value optimized out>, what=0,

ptr=0x7fab0df23260) at bufferevent_openssl.c:890

#5 0x00007fab0a5115e3 in event_process_active_single_queue (base=0x7fab0cd30240,

activeq=0x7fab0cd312a0, max_to_process=<value optimized out>, endtime=<value optimized out>)
at event.c:1407

#6 0x00007fab0a5121b8 in event_process_active (base=0x7fab0cd30240, flags=<value optimized out>)

at event.c:1506

#7 event_base_loop (base=0x7fab0cd30240, flags=<value optimized out>) at event.c:1709
#8 0x00007fab0b01e5f7 in do_main_loop ()
#9 0x00007fab0b01fc8f in tor_main ()
#10 0x00007fab0b01986c in main ()

(gdb) next
Single stepping until exit from function _evbuffer_read_setup_vecs@plt,
which has no line number information.
_evbuffer_read_setup_vecs (buf=0x7fab0eac8e30, howmuch=0, vecs=0x7fff46d63fa0, n_vecs_avail=2,

chainp=0x7fff46d63f48, exact=0) at buffer.c:2037

2037 {
2042 ASSERT_EVBUFFER_LOCKED(buf);
2044 if (howmuch < 0)
2049 firstchainp = buf->last_with_datap;
2050 if (CHAIN_SPACE_LEN(*firstchainp) == 0) {
2051 firstchainp = &(*firstchainp)->next;
2055 for (i = 0; i < n_vecs_avail && so_far < (size_t)howmuch; ++i) {
2065 *chainp = firstchainp;
2067 }
evbuffer_reserve_space (buf=0x7fab0eac8e30, size=<value optimized out>, vec=0x7fff46d63fa0, n_vecs=2)

at buffer.c:656

656 EVBUFFER_UNLOCK(buf);
659 }
do_read (bev_ssl=0x7fab0df23260, n_to_read=2960) at bufferevent_openssl.c:577
577 if (n < 0)
576 n = evbuffer_reserve_space(input, n_to_read, space, 2);
580 for (i=0; i<n; ++i) {
do_read (bev_ssl=0x7fab0df23260, n_to_read=2960) at bufferevent_openssl.c:623
623 _bufferevent_run_readcb(bev);
627 }
consider_reading (bev_ssl=0x7fab0df23260) at bufferevent_openssl.c:786
786 n_to_read = SSL_pending(bev_ssl->ssl);
774 while (n_to_read) {

comment:5 Changed 8 years ago by nickm_mobile

Interesting! This is making me suspect a libevent bug more than a tor bug. Libevent had some bug fixes there in 2.0.16-stable that may have caused a regression. Can you try the next one as "bt full" rather than just "bt" to get more info about the status of local variables?

comment:6 Changed 8 years ago by stars

here are my new trace :

Loaded symbols for /lib64/ld-linux-x86-64.so.2
[Thread debugging using libthread_db enabled]
[New Thread 0x7fb2123df700 (LWP 12462)]
[New Thread 0x7fb212be0700 (LWP 12461)]
[New Thread 0x7fb2133e1700 (LWP 12460)]
[New Thread 0x7fb213be2700 (LWP 12459)]
ev_token_bucket_get_tick (tv=0x7fff59a8b120, cfg=0x7fb219a014c0) at bufferevent_ratelim.c:140
140 }
(gdb) bt full
#0 ev_token_bucket_get_tick (tv=0x7fff59a8b120, cfg=0x7fb219a014c0) at bufferevent_ratelim.c:140
No locals.
#1 0x00007fb215ff8d88 in bufferevent_update_buckets (bev=0x7fb219f73880) at bufferevent_ratelim.c:120

now = {tv_sec = 1322428891, tv_usec = 707240}
tick = <value optimized out>

#2 _bufferevent_get_rlim_max (bev=0x7fb219f73880) at bufferevent_ratelim.c:225

max_so_far = 16384

#3 _bufferevent_get_read_max (bev=0x7fb219f73880) at bufferevent_ratelim.c:263
No locals.
#4 0x00007fb216225834 in do_read (bev_ssl=0x7fff59a8b120, n_to_read=429921472)

at bufferevent_openssl.c:572

input = 0x7fb21aab7010
r = <value optimized out>
n = <value optimized out>
n_used = <value optimized out>
blocked = <value optimized out>
space = {{iov_base = 0x7fb21a3a3400, iov_len = 96}, {iov_base = 0x4000,

iov_len = 140402916538496}}

#5 0x00007fb2162260ea in consider_reading (bev_ssl=0x7fb219f73880) at bufferevent_openssl.c:775

r = <value optimized out>
n_to_read = <value optimized out>

#6 0x00007fb216226178 in be_openssl_readeventcb (fd=<value optimized out>, what=0,

ptr=0x7fb219f73880) at bufferevent_openssl.c:890

No locals.
#7 0x00007fb215fed5e3 in event_process_active_single_queue (base=0x7fb218d15240,

activeq=0x7fb218d162a0, max_to_process=<value optimized out>, endtime=<value optimized out>)
at event.c:1407

ev = 0x7fb219f73890
count = 1
func = "event_process_active_single_queue"

#8 0x00007fb215fee1b8 in event_process_active (base=0x7fb218d15240, flags=<value optimized out>)

at event.c:1506

i = 0
tv = {tv_sec = 951494, tv_usec = 817325}
maxcb = 2147483647

---Type <return> to continue, or q <return> to quit---

endtime = 0x0
limit_after_prio = 2147483647

#9 event_base_loop (base=0x7fb218d15240, flags=<value optimized out>) at event.c:1709

n = 339387029
evsel = 0x7fb21621fd80
tv = {tv_sec = 0, tv_usec = 460}
tv_p = <value optimized out>
res = <value optimized out>
retval = <value optimized out>
func = "event_base_loop"

#10 0x00007fb216afa5f7 in do_main_loop ()
No symbol table info available.
#11 0x00007fb216afbc8f in tor_main ()
No symbol table info available.
#12 0x00007fb216af586c in main ()
No symbol table info available.

(gdb) next
bufferevent_update_buckets (bev=0x7fb219f73880) at bufferevent_ratelim.c:121
121 if (tick != bev->rate_limiting->limit.last_updated)
_bufferevent_get_rlim_max (bev=0x7fb219f73880) at bufferevent_ratelim.c:226
226 max_so_far = LIM(bev->rate_limiting->limit);
228 if (bev->rate_limiting->group) {
232 LOCK_GROUP(g);
233 if (GROUP_SUSPENDED(g)) {
_bufferevent_get_read_max (bev=0x7fb219f73880) at bufferevent_ratelim.c:263
263 return _bufferevent_get_rlim_max(bev, 0);
264 }
do_read (bev_ssl=0x7fb219f73880, n_to_read=3968) at bufferevent_openssl.c:576
576 n = evbuffer_reserve_space(input, n_to_read, space, 2);
577 if (n < 0)

comment:7 Changed 8 years ago by nickm

Hm. I'll try to investigate, but it might take a little while till I am no longer swamped with deadlines. In the meantime, if you want to see if I'm right to blame this on Libevent, maybe try and see if Libevent 2.0.15-stable works better for you?

comment:8 Changed 8 years ago by stars

Hi nickm,

i have right now switched to "release-2.0.15-stable" branch and give my feedback soon as possible...

comment:9 Changed 8 years ago by Sebastian

Hrm, I'm seeing gabelmoo using 100% cpu (but it isn't hanging, it's making progress) after upgrading to current master, too. Libevent is at 1.4.13-stable however. I guess getting a profile is next

comment:10 in reply to:  9 Changed 8 years ago by nickm

Replying to Sebastian:

Hrm, I'm seeing gabelmoo using 100% cpu (but it isn't hanging, it's making progress) after upgrading to current master, too. Libevent is at 1.4.13-stable however. I guess getting a profile is next

If it is not running with bufferevents, it's probably a different issue. If that turns out to be the case, and/or it's stalling somewhere different from stars's code, open a new ticket so we can tell the issues apart?

comment:11 Changed 8 years ago by Sebastian

Oh, yes. No bufferevents here. Sorry for abusing this ticket then. Will make a new one once I have a profile.

comment:12 Changed 8 years ago by stars

Hi Nickm,

You has right about Libevent dev, i am running now "release-2.0.15-stable" branch and so far look good, the cpu's don't hang any more...

I wait your go to update again to last Libevent master

Thanks for your quick answer

Best Regards

stars

comment:13 Changed 8 years ago by nickm

Milestone: Tor: 0.2.3.x-final
Owner: set to nickm
Status: newaccepted

comment:14 Changed 8 years ago by nickm

Keywords: bufferevents hang added

comment:15 Changed 8 years ago by nickm

Does Libevent 2.0.17-stable help this at all? It tries to fix the SSL bufferevent issue in 2.0.16

comment:16 Changed 8 years ago by stars

Hi Nickm,

I have tested "Libevent 2.0.17-stable" but still the problem... It just be overloaded a few time later as 2.0.16.. but look to got less warning about SSL certificats.

BTW: i running "OpenSSL 1.0.0g 18 Jan 2012" and only "Libevent-2.0.15-stable" work actually

Best Regards

SwissTorHelp

comment:17 Changed 7 years ago by arma

Summary: Tor hang on with cpu at 100%Tor + bufferevents hang on with cpu at 100%

comment:18 Changed 7 years ago by nickm

Milestone: Tor: 0.2.3.x-finalTor: 0.2.4.x-final
Priority: criticalmajor

comment:19 Changed 7 years ago by nickm

Keywords: tor-relay added

comment:20 Changed 7 years ago by nickm

Component: Tor RelayTor

comment:21 Changed 7 years ago by nickm

Milestone: Tor: 0.2.4.x-finalTor: unspecified

comment:22 Changed 3 years ago by nickm

Resolution: wontfix
Status: acceptedclosed

The bufferevents code and corresponding build options have been removed in 0.2.9.2-alpha

Note: See TracTickets for help on using tickets.