Opened 11 years ago

Last modified 7 years ago

#655 closed defect (Fixed)

using all free CPU time in router_get_by_nickname

Reported by: Safari Owned by:
Priority: Low Milestone:
Component: Core Tor/Tor Version: 0.1.2.19
Severity: Keywords:
Cc: Safari, nickm Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

This has happened with both Tor v0.2.1.0-alpha-dev r14297 and r14259.
Quite seldom, tor starts eating all CPU time, with nothing interesting logged with "Log debug stderr"
(debug output looks(?! ;) ) the same as when tor is not hogging all CPU).

I for now "worked around" this by using only hex digests in ExcludeNodes.

Thread 1 (Thread 0x2b5b3a251700 (LWP 17463)):
#0 0x00002b5b3992b265 in strcasecmp () from /lib64/libc.so.6
#1 0x00005555555dd68f in router_get_by_nickname () from /proc/17463/exe
#2 0x00005555555de5e2 in add_nickname_list_to_smartlist ()
#3 0x00005555555e00ce in router_choose_random_node () from /proc/17463/exe
#4 0x000055555556aff7 in circuit_establish_circuit () from /proc/17463/exe
#5 0x0000555555572033 in circuit_get_open_circ_or_launch ()
#6 0x0000555555572635 in connection_ap_handshake_attach_circuit ()
#7 0x000055555558ce99 in connection_ap_attach_pending () from /proc/17463/exe
#8 0x00005555555692e8 in circuit_send_next_onion_skin () from /proc/17463/exe
#9 0x00005555555c44b1 in connection_edge_process_relay_cell ()
#10 0x00005555555c5869 in circuit_receive_relay_cell () from /proc/17463/exe
#11 0x0000555555573e92 in command_process_cell () from /proc/17463/exe
#12 0x000055555558e16d in connection_or_process_cells_from_inbuf ()
#13 0x000055555558f4ec in connection_or_process_inbuf () from /proc/17463/exe
#14 0x000055555558567b in connection_handle_read () from /proc/17463/exe
#15 0x00005555555ba013 in conn_read_callback () from /proc/17463/exe
#16 0x00002b5b38841b8b in event_base_loop () from /usr/lib64/libevent-1.3e.so.1
#17 0x00005555555b9bbe in do_main_loop () from /proc/17463/exe
#18 0x00005555555b9e1d in tor_main () from /proc/17463/exe
#19 0x00002b5b398c536a in libc_start_main () from /lib64/libc.so.6
#20 0x0000555555561209 in _start () from /proc/17463/exe
#0 0x00002b5b3992b265 in strcasecmp () from /lib64/libc.so.6

Thread 1 (Thread 0x2b5b3a251700 (LWP 17463)):
#0 0x00002b5b3992b265 in strcasecmp () from /lib64/libc.so.6
#1 0x00005555555dd68f in router_get_by_nickname () from /proc/17463/exe
#2 0x00005555555de5e2 in add_nickname_list_to_smartlist ()
#3 0x00005555555e00ce in router_choose_random_node () from /proc/17463/exe
#4 0x000055555556aff7 in circuit_establish_circuit () from /proc/17463/exe
#5 0x0000555555572033 in circuit_get_open_circ_or_launch ()
#6 0x0000555555572635 in connection_ap_handshake_attach_circuit ()
#7 0x000055555558ce99 in connection_ap_attach_pending () from /proc/17463/exe
#8 0x000055555557332d in circuit_build_needed_circs () from /proc/17463/exe
#9 0x00005555555b9706 in second_elapsed_callback () from /proc/17463/exe
#10 0x00002b5b38841b8b in event_base_loop () from /usr/lib64/libevent-1.3e.so.1
#11 0x00005555555b9bbe in do_main_loop () from /proc/17463/exe
#12 0x00005555555b9e1d in tor_main () from /proc/17463/exe
#13 0x00002b5b398c536a in libc_start_main () from /lib64/libc.so.6
#14 0x0000555555561209 in _start () from /proc/17463/exe
#0 0x00002b5b3992b265 in strcasecmp () from /lib64/libc.so.6

#0 0x00002b5b3992b265 in strcasecmp () from /lib64/libc.so.6
#1 0x00005555555dd68f in router_get_by_nickname () from /proc/17463/exe
#2 0x00005555555de5e2 in add_nickname_list_to_smartlist ()
#3 0x000055555556b340 in circuit_establish_circuit () from /proc/17463/exe
#4 0x0000555555572033 in circuit_get_open_circ_or_launch ()
#5 0x0000555555572635 in connection_ap_handshake_attach_circuit ()
#6 0x000055555558ce99 in connection_ap_attach_pending () from /proc/17463/exe
#7 0x00005555555692e8 in circuit_send_next_onion_skin () from /proc/17463/exe
#8 0x00005555555c44b1 in connection_edge_process_relay_cell ()
#9 0x00005555555c5869 in circuit_receive_relay_cell () from /proc/17463/exe
#10 0x0000555555573e92 in command_process_cell () from /proc/17463/exe
#11 0x000055555558e16d in connection_or_process_cells_from_inbuf ()
#12 0x000055555558f4ec in connection_or_process_inbuf () from /proc/17463/exe
#13 0x000055555558567b in connection_handle_read () from /proc/17463/exe
#14 0x00005555555ba013 in conn_read_callback () from /proc/17463/exe
#15 0x00002b5b38841b8b in event_base_loop () from /usr/lib64/libevent-1.3e.so.1
#16 0x00005555555b9bbe in do_main_loop () from /proc/17463/exe
#17 0x00005555555b9e1d in tor_main () from /proc/17463/exe
#18 0x00002b5b398c536a in libc_start_main () from /lib64/libc.so.6
#19 0x0000555555561209 in _start () from /proc/17463/exe
#0 0x00002b5b3992b265 in strcasecmp () from /lib64/libc.so.6

#0 0x00002b5b3992b265 in strcasecmp () from /lib64/libc.so.6
#1 0x00005555555bb921 in router_get_consensus_status_by_nickname ()
#2 0x00005555555de685 in add_nickname_list_to_smartlist ()
#3 0x000055555556b340 in circuit_establish_circuit () from /proc/17463/exe
#4 0x0000555555572033 in circuit_get_open_circ_or_launch ()
#5 0x0000555555572635 in connection_ap_handshake_attach_circuit ()
#6 0x000055555558ce99 in connection_ap_attach_pending () from /proc/17463/exe
#7 0x00005555555692e8 in circuit_send_next_onion_skin () from /proc/17463/exe
#8 0x00005555555c44b1 in connection_edge_process_relay_cell ()
#9 0x00005555555c5869 in circuit_receive_relay_cell () from /proc/17463/exe
#10 0x0000555555573e92 in command_process_cell () from /proc/17463/exe
#11 0x000055555558e16d in connection_or_process_cells_from_inbuf ()
#12 0x000055555558f4ec in connection_or_process_inbuf () from /proc/17463/exe
#13 0x000055555558567b in connection_handle_read () from /proc/17463/exe
#14 0x00005555555ba013 in conn_read_callback () from /proc/17463/exe
#15 0x00002b5b38841b8b in event_base_loop () from /usr/lib64/libevent-1.3e.so.1
#16 0x00005555555b9bbe in do_main_loop () from /proc/17463/exe
#17 0x00005555555b9e1d in tor_main () from /proc/17463/exe
#18 0x00002b5b398c536a in libc_start_main () from /lib64/libc.so.6
#19 0x0000555555561209 in _start () from /proc/17463/exe
#0 0x00002b5b3992b265 in strcasecmp () from /lib64/libc.so.6

[Automatically added by flyspray2trac: Operating System: Fedora Core Linux]

Child Tickets

Change History (10)

comment:1 Changed 11 years ago by nickm

Are any of the node lists in your torrc really long?

comment:2 Changed 11 years ago by Safari

ExcludeNodes yes, others no.

comment:3 Changed 11 years ago by nickm

Looks like this is a duplicate of bug 509 then?

comment:4 Changed 11 years ago by Safari

In #509 all free CPU time was not spent in doing the strcasecmp, but only substantial amount of CPU time...
in #655 CPU usage stays at 100% for several minutes.

comment:5 Changed 11 years ago by Safari

That is, maybe it is so that if #509 is fixed, #655 still happens even with better algo, because of some state handling bug.

comment:6 Changed 11 years ago by Safari

Once when my ADSL connection broke (and I had long ExcludeNodes then), tor used 0.5s CPU time every one or two seconds:
[this has not been reported earlier]

CPU: P4 / Xeon, speed 2797.2 MHz (estimated)
Counted GLOBAL_POWER_EVENTS events (time during which processor is not stopped) with a unit mask of 0x01 (mandatory) count 240000
Counted MACHINE_CLEAR events (cycles with entire machine pipeline cleared) with a unit mask of 0x01 (count a portion of cycles the machine is cleared for any cause) count 12000
Counted FSB_DATA_ACTIVITY events (DRDY or DBSY events on the front side bus) with a unit mask of 0x03 (multiple flags) count 60000
Counted RETIRED_BRANCH_TYPE events (retired branches, selected by type) with a unit mask of 0x1f (multiple flags) count 180000
Counted RETIRED_MISPRED_BRANCH_TYPE events (retired mispredicted branched, selected by type) with a unit mask of 0x1f (multiple flags) count 6000
samples % samples % samples % samples % samples % symbol name
43718 67.4473 444 64.7230 5909 67.7016 4455 59.1476 340 17.8947 router_get_by_nickname
8077 12.4610 85 12.3907 630 7.2181 1603 21.2825 27 1.4211 smartlist_string_isin
4447 6.8607 46 6.7055 1572 18.0110 579 7.6872 240 12.6316 router_get_combined_status_by_nickname
1471 2.2694 24 3.4985 123 1.4093 245 3.2528 4 0.2105 .plt
1304 2.0118 17 2.4781 1 0.0115 234 3.1067 933 49.1053 base16_decode
1117 1.7233 13 1.8950 85 0.9739 31 0.4116 32 1.6842 digestmap_get
873 1.3468 9 1.3120 23 0.2635 30 0.3983 0 0 _compare_digest_to_routerstatus_entry
607 0.9365 4 0.5831 47 0.5385 13 0.1726 13 0.6842 routerlist_assert_ok
451 0.6958 7 1.0204 47 0.5385 20 0.2655 20 1.0526 digestmap_set
257 0.3965 2 0.2915 15 0.1719 6 0.0797 2 0.1053 routerlist_remove_old_routers
248 0.3826 4 0.5831 25 0.2864 11 0.1460 26 1.3684 strmap_get
231 0.3564 1 0.1458 8 0.0917 20 0.2655 1 0.0526 digestmap_iter_get

comment:7 Changed 11 years ago by nickm

0.2.1.6-alpha uses a new implementation of router sets; let me know if it's any better for you?

comment:8 Changed 11 years ago by nickm

I think we got this fixed in 0.2.1.6-alpha with the new routerset code. Please request a re-open if we didn't.

comment:9 Changed 11 years ago by nickm

flyspray2trac: bug closed.

comment:10 Changed 7 years ago by nickm

Component: Tor RelayTor
Note: See TracTickets for help on using tickets.