Opened 10 years ago

Closed 9 years ago

Last modified 7 years ago

#1245 closed defect (fixed)

circuit_build_times_add_time: Assertion time <= BUILD_TIME_MAX failed

Reported by: arma Owned by: mikeperry
Priority: High Milestone: 0.2.2.x-final
Component: Core Tor/Tor Version: 0.2.2.7-alpha
Severity: Keywords:
Cc: arma, Sebastian, nickm, mikeperry, Tas Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description (last modified by mikeperry)

Feb 09 09:31:15.047 [Notice] Your system clock just jumped 26715 seconds
forward; assuming established circuits no longer work.
Feb 09 09:31:13.982 [Error] circuit_build_times_add_time(): Bug:
circuitbuild.c:191: circuit_build_times_add_time: Assertion time <=
BUILD_TIME_MAX failed; aborting.

I hibernated my laptop last night, and when I turned it back on, Tor
I guess was in the middle of building a circuit. It decided to time it
out, since it had been building all night (while the machine was off).

#0 0xb7f09424 in kernel_vsyscall ()
#1 0xb7bb0640 in raise () from /lib/i686/cmov/libc.so.6
#2 0xb7bb2018 in abort () from /lib/i686/cmov/libc.so.6
#3 0x0809b6a9 in circuit_build_times_add_time (cbt=0x815b100, time=4294966199)

at circuitbuild.c:191

#4 0x0809f1a7 in circuit_send_next_onion_skin (circ=0x89f4b60)

at circuitbuild.c:1493

#5 0x08061b8c in connection_edge_process_relay_cell (cell=0xbfe02a98,

circ=0x89f4b60, conn=0x0, layer_hint=0x87ee400) at relay.c:1159

#6 0x08062879 in circuit_receive_relay_cell (cell=0xbfe02a98, circ=0x89f4b60,

cell_direction=CELL_DIRECTION_IN) at relay.c:200

#7 0x080aba46 in command_process_cell (cell=0xbfe02a98, conn=0x884fe58)

at command.c:417

#8 0x080c98f1 in connection_or_process_cells_from_inbuf (conn=0x884fe58)

at connection_or.c:1247

#9 0x080ba62c in connection_process_inbuf (conn=0x884fe58, package_partial=6)

at connection.c:3181

#10 0x080bf0b1 in connection_handle_read (conn=0x884fe58) at connection.c:2370
#11 0x08052980 in conn_read_callback (fd=15, event=2, _conn=0x884fe58)

at main.c:479

#12 0xb7e9d9e2 in event_base_loop () from /usr/lib/libevent-1.3e.so.1
#13 0x080524ba in do_main_loop () at main.c:1512
#14 0x08052795 in tor_main (argc=9, argv=0xbfe030d4) at main.c:2137

#3 0x0809b6a9 in circuit_build_times_add_time (cbt=0x815b100, time=4294966199)

at circuitbuild.c:191

191 tor_assert(time <= BUILD_TIME_MAX);
(gdb) print time
$1 = 4294966199
(gdb) up
#4 0x0809f1a7 in circuit_send_next_onion_skin (circ=0x89f4b60)

at circuitbuild.c:1493

1493 circuit_build_times_add_time(&circ_times, (build_time_t)timediff);
(gdb) print timediff
$2 = 0
(gdb) up
#5 0x08061b8c in connection_edge_process_relay_cell (cell=0xbfe02a98,

circ=0x89f4b60, conn=0x0, layer_hint=0x87ee400) at relay.c:1159

1159 if ((reason=circuit_send_next_onion_skin(TO_ORIGIN_CIRCUIT(circ)))<0) {
(gdb) print *circ
$3 = {magic = 892424771, n_conn_cells = {head = 0x0, tail = 0x0, n = 0,

insertion_times = 0x0}, n_conn = 0x884fe58, n_circ_id = 47247,

n_hop = 0x0, streams_blocked_on_n_conn = 0, streams_blocked_on_p_conn = 0,
state = 3 '\003', purpose = 5 '\005', package_window = 1000,
deliver_window = 1000, n_conn_onionskin = 0x0,
timestamp_created = 1265725875, timestamp_dirty = 0, highres_created = {

tv_sec = 1265725875, tv_usec = 58412}, marked_for_close = 0,

marked_for_close_file = 0x0, next_active_on_n_conn = 0x0,
prev_active_on_n_conn = 0x0, next = 0x8b4c290, dirreq_id = 0, n_cell_ewma = {

last_adjusted_tick = 126572587, cell_count = 0, is_for_p_conn = 0,
heap_index = -1}}

[Automatically added by flyspray2trac: Operating System: All]

Child Tickets

Change History (5)

comment:1 Changed 10 years ago by mikeperry

This appears odd. I'm really not sure how timediff=0 became time=4294966199, unless the value was optimized out
of the core. I'm still not sure how the value could get negative. I'm guessing the hibernate process is sloppy
about when it adjusts the clock.. I guess I should review all the circuit timeout code to handle negative circuit
build times?

comment:2 Changed 10 years ago by mikeperry

So there is a flip side to this bug. During a suspend or clock change, a circuit could also actually complete and
get stored with a really large value, which could skew our measurements. arma suggested that we should maybe change
how circuit_note_clock_jumped() *and* the successfully completed codepath for this bug to check to see if the circuit
is actually considerably older than the timeout value, indicating that excessive time had passed without Tor receiving
its normal events. We should then discard and/or close these circuits, as appropriate.

comment:3 Changed 10 years ago by arma

Worse than just skew our measurements. Tas just reported that his Tor client
jumps to 100% cpu and sticks there. Turns out he has a state file with
...
CircuitBuildTimeBin 1367592725 1
CircuitBuildTimeBin 2147483625 2

so in circuit_build_times_create_histogram(), it does:

build_time_t max_build_time = circuit_build_times_max(cbt);
*nbins = 1 + (max_build_time / CBT_BIN_WIDTH);
histogram = tor_malloc_zero(*nbins * sizeof(build_time_t));

That's about 171MB that it mallocs and zeroes each time.

Sounds like we want to clip the max build time to something much lower.
That or come up with a smarter algorithm for making bins.
Or both. :)

comment:4 Changed 9 years ago by mikeperry

Description: modified (diff)
Resolution: Nonefixed
Status: newclosed

Fix should appear in 0.2.2.14.

comment:5 Changed 7 years ago by nickm

Component: Tor ClientTor
Note: See TracTickets for help on using tickets.