Opened 12 years ago

Last modified 7 years ago

#622 closed defect (Fixed)

Tor 0.2.1.0-alpha-dev (r13924): 100% CPU usage: conn_read_callback(): socket 340 wants to read.

Reported by: Safari Owned by:
Priority: Low Milestone: 0.2.0.x-final
Component: Core Tor/Tor Version: 0.1.2.19
Severity: Keywords:
Cc: Safari, nickm Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

I run 0.2.1.0-alpha-dev (r13924) with Linux 2.6.24.3 x86_64, libevent-1.3d + epoll, openssl latest cvs.
I have not noticed this problem with v0.1.* tor.

I noticed 100% CPU usage (for one CPU out of two :) ) and enabled debug:
[debug] conn_read_callback(): socket 34 wants to read.
Around 21000 of these per second.

This lasted for several minutes.
Tor started hahaving OK all by itself before I had time to do much, except for oprofile,
but this would probably be obvious also from the debug line.

CPU: P4 / Xeon, speed 2797.2 MHz (estimated)
Counted GLOBAL_POWER_EVENTS events (time during which processor is not stopped) with a unit mask of 0x01 (mandatory) count 240000
Counted MACHINE_CLEAR events (cycles with entire machine pipeline cleared) with a unit mask of 0x01 (count a portion of cycles the machine is cleared for any cause) count 12000
Counted FSB_DATA_ACTIVITY events (DRDY or DBSY events on the front side bus) with a unit mask of 0x03 (multiple flags) count 60000
Counted RETIRED_BRANCH_TYPE events (retired branches, selected by type) with a unit mask of 0x1f (multiple flags) count 180000
Counted RETIRED_MISPRED_BRANCH_TYPE events (retired mispredicted branched, selected by type) with a unit mask of 0x1f (multiple flags) count 6000
samples % samples % samples % samples % samples % image name symbol name
18142 10.9848 194 11.7150 40 2.1882 3891 11.3477 4608 24.4158 tor-0.2.1.0-r13924 connection_handle_read
18020 10.9109 164 9.9034 26 1.4223 4022 11.7297 3166 16.7753 tor-0.2.1.0-r13924 conn_read_callback
17185 10.4053 180 10.8696 38 2.0788 6272 18.2916 123 0.6517 tor-0.2.1.0-r13924 assert_connection_ok
12498 7.5674 95 5.7367 9 0.4923 1734 5.0570 72 0.3815 tor-0.2.1.0-r13924 tor_tls_get_error
11508 6.9680 94 5.6763 48 2.6258 3249 9.4753 46 0.2437 tor-0.2.1.0-r13924 assert_buf_ok
10444 6.3237 106 6.4010 23 1.2582 2296 6.6960 551 2.9195 tor-0.2.1.0-r13924 _openssl_locking_cb
9895 5.9913 83 5.0121 11 0.6018 1535 4.4767 38 0.2013 tor-0.2.1.0-r13924 connection_bucket_round_robin
9292 5.6262 89 5.3744 3 0.1641 1694 4.9404 3047 16.1448 tor-0.2.1.0-r13924 tor_tls_renegotiate
8628 5.2242 99 5.9783 17 0.9300 879 2.5635 166 0.8796 [vdso] (tgid:19150 range:0x7fffe7ffe000-0x7fffe8000000) (no symbols)
6003 3.6347 60 3.6232 16 0.8753 184 0.5366 6 0.0318 tor-0.2.1.0-r13924 buf_datalen
4633 2.8052 64 3.8647 751 41.0832 330 0.9624 27 0.1431 tor-0.2.1.0-r13924 router_get_by_nickname
4375 2.6490 41 2.4758 3 0.1641 775 2.2602 8 0.0424 tor-0.2.1.0-r13924 tor_mutex_acquire
4128 2.4995 39 2.3551 3 0.1641 267 0.7787 8 0.0424 tor-0.2.1.0-r13924 tor_mutex_release
4103 2.4843 40 2.4155 0 0 771 2.2485 4500 23.8436 tor-0.2.1.0-r13924 connection_tls_continue_handshake
2698 1.6336 33 1.9928 8 0.4376 511 1.4903 11 0.0583 tor-0.2.1.0-r13924 connection_process_inbuf
2303 1.3944 38 2.2947 7 0.3829 1113 3.2459 15 0.0795 tor-0.2.1.0-r13924 tor_addr_is_internal
2068 1.2521 18 1.0870 4 0.2188 484 1.4115 12 0.0636 tor-0.2.1.0-r13924 connection_is_listener
1699 1.0287 16 0.9662 48 2.6258 102 0.2975 65 0.3444 tor-0.2.1.0-r13924 rijndaelEncrypt
1618 0.9797 14 0.8454 8 0.4376 598 1.7440 6 0.0318 tor-0.2.1.0-r13924 connection_counts_as_relayed_traffic
1615 0.9779 15 0.9058 5 0.2735 151 0.4404 82 0.4345 tor-0.2.1.0-r13924 logv

[Automatically added by flyspray2trac: Operating System: Fedora Core Linux]

Child Tickets

Change History (14)

comment:1 Changed 12 years ago by nickm

Interesting! A few questions:

  • Am I right in understanding that the oprofile was started _after_ CPU use went to 100%?
  • Are there any other log messages in between the "socket 34 wants to read" ones? Especially, are there any other that mention socket 34?
  • The next time this happens, can you try sending Tor a USR1 signal to make it dump socket status for all the sockets, including the one that's misbehaving?

My current guess is that this has something to do with the new TLS renegotiation code, so I'll start looking
there, but more data would definitely help.

Thanks!

comment:2 Changed 12 years ago by Safari

1) Correct
2) Only this debug line mentioned "socket 34" string.
Other stuff near it:

2008-03-12 04:55:05.422573077 [debug] conn_read_callback(): socket 34 wants to read.
2008-03-12 04:55:05.422613264 [debug] conn_read_callback(): socket 31 wants to read.
2008-03-12 04:55:05.422643402 [debug] connection_read_to_buf(): 31: starting, inbuf_datalen 0 (0 pending in tls object). at_most 16384.
2008-03-12 04:55:05.422687034 [debug] tor_tls_read(): read returned r=-1, err=-2
2008-03-12 04:55:05.422715449 [debug] connection_read_to_buf(): After TLS read of 0: 1420 read, 0 written
2008-03-12 04:55:05.422743785 [debug] connection_or_process_cells_from_inbuf(): 31: starting, inbuf_datalen 0 (0 pending in tls object).
2008-03-12 04:55:05.422778150 [debug] conn_read_callback(): socket 34 wants to read.

Didn't know about USR1 feature, I try to try that.
I also added "SafeLogging 0".

Not related to this problem probably, but when I do 'R' in openssl client ('R' does renegotiate),
openssl server reports:
2008-03-12 17:03:48.169027724 * glibc detected * /usr/local/bin/sslserver-20071120: double free or corruption (fasttop): 0x0000000006805740 *
This was reported maybe year or two ago to openssl-dev but seems their antispam deleted my email.
This is to let you know in case you have similar problems with tor renegotiation.

comment:3 Changed 12 years ago by Safari

OK where do I send the log file?
78615 lines, 470 kB bzip2'ed.

2008-03-12 20:49:26.677586048 [debug] conn_read_callback(): socket 125 wants to read.
2008-03-12 20:49:29.599897402 [info] Conn 117 (socket 125) type 4 (OR), state 5 (renegotiating (TLS)), created 27 secs ago
2008-03-12 20:49:32.915489887 [debug] conn_read_callback(): socket 125 wants to read.

comment:4 Changed 12 years ago by nickm

Aha! state 5 (renegotiating (TLS)).

You can send the log to nickm @ torproject . org.

comment:5 Changed 12 years ago by nickm

received!

comment:6 Changed 12 years ago by nickm

From IRC, so I don't lose it:

14:14 < nickm> I think it's a matter of tor_tls_renegotiate() returning

WANTREAD, and so Tor registers the connection for reading, but
when it *can* read, Tor doesn't actually call
tor_tls_renegotiate().

14:16 < nickm> the solution might be as simple as checking the connection type

and state in connection_handle_read(), and invoking
connection_tls_continue_handshake() as appropriate.

14:17 < nickm> Also, now that I look at it, it seems possible that the

wantread/wantwrite logic in connection_tls_continue_handshake()
is wrong.

comment:7 Changed 12 years ago by Safari

In libevent, epoll.c:epoll_dispatch(), if event has EPOLLHUP or EPOLLERR set,
epoll_dispatch lies and sets both EPOLLIN and EPOLLOUT.

So, in tor, you do not know error happened, because EPOLLHUP+EPOLLERR are not propagated to
caller of dispatch and when read() returns 0, fd is dead and you must call close() or remove event properly.
Otherwise you get the infinite loop (for five minutes) as described above.

If tor does this, problem is elsewhere.

comment:8 Changed 12 years ago by nickm

10:43 < nickm> So you're saying that tor_tls_renegotiate() is returning

_TOR_TLS_SYSCALL and _TOR_TLS_ZERORETURN, which the caller isn't
handling.

10:43 < lodger> yes

Lodger reports that fixing this problem makes the bug go away for him. If so, the bug should be fixed in
current svn trunk and will be fixed in 0.2.0.22-rc. Please let me know whether we got it or not?

comment:9 Changed 12 years ago by Safari

Fix looks just fine.
I am now testing...

comment:10 Changed 11 years ago by nickm

Hi! As far as I can tell, the fix works for me. Does it work for you?

comment:11 Changed 11 years ago by Safari

Yes, no problems here.

comment:12 Changed 11 years ago by nickm

Great; closing.

comment:13 Changed 11 years ago by nickm

flyspray2trac: bug closed.

comment:14 Changed 7 years ago by nickm

Component: Tor RelayTor
Note: See TracTickets for help on using tickets.