Opened 10 years ago

Closed 9 years ago

Last modified 7 years ago

#977 closed defect (worksforme)

buf_shrink_freelists: Assertion !n_to_free failed

Reported by: arma Owned by:
Priority: Low Milestone:
Component: Core Tor/Tor Version: 0.2.1.14-rc
Severity: Keywords:
Cc: arma, nickm Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description (last modified by arma)

dr|z3d> Nasty Tor crash: "May 05 21:08:52.448 [Error] Bug: buffers.c:281:
buf_shrink_freelists: Assertion !n_to_free failed; aborting."

It was 0.2.1.14-rc on Windows.

"DNSPort and 1 hidden service. Run from Vidalia, so ControlPort was set too.

A few IRC connections, and probably idle other than that.

[Automatically added by flyspray2trac: Operating System: All]

Child Tickets

Change History (20)

comment:1 Changed 10 years ago by nickm

Hm. Near the start of that loop, we call assert_freelist_ok(). So we know coming into the part of
the code that sets n_to_free in the first place that cur_length is correct, and that it is no less
than lowest_length, I think. Interesting!

comment:2 Changed 10 years ago by nickm

(You didn't have an INFO-level log for that, did you?)

comment:3 Changed 10 years ago by nickm

Okay, the master and maint-0.2.1 branches now have debugging code to try to give a better message here, and not
crash if we can help it. Let's see if they do any good.

comment:4 Changed 10 years ago by arma

Well, moria1 consistently triggers an assert on start, running git head:

May 16 17:38:08.701 [notice] This version of Tor (0.2.2.0-alpha-dev) is newer than any recommended version, according to the directory authorities. Recommended versions are: 0.2.0.33,0.2.0.34,0.2.1.11-alpha,0.2.1.12-alpha,0.2.1.13-alpha,0.2.1.14-rc
May 16 17:38:08.844 [err] Bug: memarea.c:233: memarea_alloc: Assertion chunk->next_mem <= chunk->u.mem+chunk->mem_size failed; aborting.
memarea.c:233 memarea_alloc: Assertion chunk->next_mem <= chunk->u.mem+chunk->mem_size failed; aborting.

#0 0x00002b4ddc82207b in raise () from /lib/libc.so.6
#1 0x00002b4ddc82384e in abort () from /lib/libc.so.6
#2 0x00000000004dbbc7 in memarea_alloc (area=0x12f1400, sz=12)

at memarea.c:233

#3 0x00000000004dbcf9 in memarea_strndup (area=0x12f1400,

s=0x2b4ddd26ad00 <Address 0x2b4ddd26ad00 out of bounds>, n=11)
at memarea.c:273

#4 0x00000000004bbcf0 in get_token_arguments (area=0x12f1400, tok=0x1108ea0,

s=0x2b4ddd26ad00 <Address 0x2b4ddd26ad00 out of bounds>,
eol=0x2b4ddd26ad0b <Address 0x2b4ddd26ad0b out of bounds>)
at routerparse.c:2941

#5 0x00000000004bb342 in get_next_token (area=0x12f1400, s=0x7fffcebbfbb0,

eos=0x2b4ddd26ae2a <Address 0x2b4ddd26ae2a out of bounds>, table=0x642ac0)
at routerparse.c:3015

#6 0x00000000004bc3aa in tokenize_string (area=0x12f1400,

start=0x2b4ddd26ad00 <Address 0x2b4ddd26ad00 out of bounds>,
end=0x2b4ddd26ae2a <Address 0x2b4ddd26ae2a out of bounds>, out=0x1396e90,
table=0x642ac0, flags=1) at routerparse.c:3133

#7 0x00000000004b3d22 in router_parse_entry_from_string (

s=0x2b4ddd26a2a7 <Address 0x2b4ddd26a2a7 out of bounds>,
end=0x2b4ddd26ae2a <Address 0x2b4ddd26ae2a out of bounds>, cache_copy=0,
allow_annotations=1, prepend_annotations=0x0) at routerparse.c:1191

#8 0x00000000004b388e in router_parse_list_from_string (s=0x7fffcebbffe8,

eos=0x2b4dde18fbe9 <Address 0x2b4dde18fbe9 out of bounds>, dest=0x128b940,
saved_location=SAVED_IN_CACHE, want_extrainfo=0, allow_annotations=1,
prepend_annotations=0x0) at routerparse.c:1061

#9 0x00000000004acc82 in router_load_routers_from_string (

s=0x2b4ddd26a26d <Address 0x2b4ddd26a26d out of bounds>,
eos=0x2b4dde18fbe9 <Address 0x2b4dde18fbe9 out of bounds>,
saved_location=SAVED_IN_CACHE, requested_fingerprints=0x0,
descriptor_digests=0, prepend_annotations=0x0) at routerlist.c:3507

#10 0x00000000004a4739 in router_reload_router_list_impl (store=0x64feb0)

at routerlist.c:805

#11 0x00000000004a4956 in router_reload_router_list () at routerlist.c:855
#12 0x00000000004786f5 in do_main_loop () at main.c:1396
#13 0x0000000000479e1c in tor_main (argc=3, argv=0x7fffcebc0338) at main.c:2060
#14 0x00000000004c65b7 in main (argc=3, argv=0x7fffcebc0338) at tor_main.c:30

#2 0x00000000004dbbc7 in memarea_alloc (area=0x12f1400, sz=12)

at memarea.c:233

233 tor_assert(chunk->next_mem <= chunk->u.mem+chunk->mem_size);
(gdb) print chunk->next_mem
$1 = 0x1108ee8 "!"
(gdb) print chunk->u.mem+chunk->mem_size
$2 = 0x1108ee4 "\"f\200\220!"
(gdb) print *chunk
$4 = {next_chunk = 0x0, mem_size = 4076, next_mem = 0x1108ee8 "!", u = {

mem = "=", _void_for_alignment = 0x10000003d}}

comment:5 Changed 10 years ago by nickm

Oho. That's a 64-bit bug! But I think it's a different crash from the one this bug is about. It reminds me a
little of bug 930.

comment:6 Changed 10 years ago by nickm

Just pushed a fix for the most recent assertion failure in maint-0.2.1 and master.

comment:7 Changed 10 years ago by atagar

Hi, I've encountered a similar issue three times today ("Bug: buffers.c:288: buf_shrink_freelists: Assertion !n_to_free failed; aborting.") with Tor v0.2.0.34 (r18423) on Ubuntu 8.04 (Hardy). It occurred during test runs of a monitoring app that was listening to all events so have a bit of output - hope it helps! -Damian

17:23:10 [ERR] Bug: buffers.c:288: buf_shrink_freelists: Assertion !n_to_free failed; aborting.
17:23:10 [UNKNOWN] ERR BUG REASON=""Bug: buffers.c:288: buf_shrink_freelists: Assertion !n_to_free failed; aborting.\n""
17:23:10 [INFO] buf_shrink_freelists(): Cleaning freelist for 4096-byte chunks: keeping 26, dropping 3.
17:23:10 [DEBUG] run_connection_housekeeping(): Sending keepalive to (69.149.121.207:1554)
17:23:10 [DEBUG] run_connection_housekeeping(): Sending keepalive to (83.253.248.89:1453)
17:23:10 [DEBUG] run_connection_housekeeping(): Sending keepalive to (69.72.254.74:38692)
17:23:10 [INFO] router_pick_published_address(): Could not determine our address locally. Checking if directory headers provide any hints.
17:23:10 [INFO] resolve_my_address(): Address 'odin' resolves to private IP address '127.0.1.1'. Tor servers that use the default DirServers must have

public IP addresses.

17:23:10 [INFO] resolve_my_address(): Interface IP address '192.168.1.78' is a private address too. Ignoring.
17:23:10 [INFO] resolve_my_address(): Guessed local hostname 'odin' resolves to a private IP address (127.0.1.1). Trying something else.
17:23:10 [DEBUG] resolve_my_address(): Guessed local host name as 'odin'
17:23:10 [INFO] routerlist_remove_old_routers(): We have 1690 live routers and 13398 old router descriptors. At most 2630 must be retained because of

networkstatuses.

17:23:10 [INFO] router_pick_published_address(): Could not determine our address locally. Checking if directory headers provide any hints.
17:23:10 [INFO] resolve_my_address(): Address 'odin' resolves to private IP address '127.0.1.1'. Tor servers that use the default DirServers must have

public IP addresses.

17:23:10 [INFO] resolve_my_address(): Interface IP address '192.168.1.78' is a private address too. Ignoring.
17:23:10 [INFO] resolve_my_address(): Guessed local hostname 'odin' resolves to a private IP address (127.0.1.1). Trying something else.
17:23:10 [DEBUG] resolve_my_address(): Guessed local host name as 'odin'
17:23:10 [INFO] update_consensus_router_descriptor_downloads(): 0 router descriptors downloadable. 0 delayed; 1439 present (0 of those were in old_routers);

0 would_reject; 0 wouldnt_use; 0 in progress.

17:23:10 [INFO] router_pick_published_address(): Could not determine our address locally. Checking if directory headers provide any hints.
17:23:10 [INFO] resolve_my_address(): Address 'odin' resolves to private IP address '127.0.1.1'. Tor servers that use the default DirServers must have

public IP addresses.

17:23:10 [INFO] resolve_my_address(): Interface IP address '192.168.1.78' is a private address too. Ignoring.
17:23:10 [INFO] resolve_my_address(): Guessed local hostname 'odin' resolves to a private IP address (127.0.1.1). Trying something else.
17:23:10 [DEBUG] resolve_my_address(): Guessed local host name as 'odin'

comment:8 Changed 10 years ago by atagar

Just got another crash, the following is the gdb output:

(gdb) bt
#0 0xb7f4d410 in kernel_vsyscall ()
#1 0xb7c48085 in raise () from /lib/tls/i686/cmov/libc.so.6
#2 0xb7c49a01 in abort () from /lib/tls/i686/cmov/libc.so.6
#3 0x0807d49f in enable_control_logging () at control.c:3349
#4 0x080a8f45 in conn_write_callback (fd=72, events=4, _conn=0x8fb8068) at main.c:485
#5 0xb7f12a42 in event_base_loop () from /usr/lib/libevent-1.3e.so.1
#6 0xb7f12cd9 in event_loop () from /usr/lib/libevent-1.3e.so.1
#7 0x080a8b5f in do_main_loop () at main.c:1463
#8 0x080a8d65 in tor_main (argc=11, argv=0xbf9e86e4) at main.c:2006
#9 0x080df342 in main (argc=Cannot access memory at address 0x3319
) at tor_main.c:29

comment:9 Changed 10 years ago by atagar

Updated to Tor v0.2.1.15-rc and crashed again, this time with:

(gdb) bt
#0 0xb7f60410 in kernel_vsyscall ()
#1 0xb7c5b085 in raise () from /lib/tls/i686/cmov/libc.so.6
#2 0xb7c5ca01 in abort () from /lib/tls/i686/cmov/libc.so.6
#3 0x0807ebff in enable_control_logging () at control.c:3361
#4 0x080aca55 in conn_write_callback (fd=19, events=4, _conn=0x9c06d70) at main.c:484
#5 0xb7f25a42 in event_base_loop () from /usr/lib/libevent-1.3e.so.1
#6 0xb7f25cd9 in event_loop () from /usr/lib/libevent-1.3e.so.1
#7 0x080ac650 in do_main_loop () at main.c:1435
#8 0x080ac867 in tor_main (argc=11, argv=0xbfa75f74) at main.c:2060
#9 0x080eaef2 in main (argc=Cannot access memory at address 0x6bd5
) at tor_main.c:30

comment:10 Changed 10 years ago by nickm

Those last two crashes don't seem to have much to do with the !n_to_free bug. It looks instead like we're hitting
the assert in enable_control_logging. But I can't figure out how that's happening; we call enable_control_logging from
exactly one place in the code, and as near as I can tell we always disable the logging first..

Are you using some kind of custom controller? Do you know which events you're listening for, exactly?

comment:11 Changed 10 years ago by nickm

Oh hm. Possible race condition here. Two of them, maybe. Are you running as a server here?

comment:12 Changed 10 years ago by nickm

There's a possible fix for this last part in my public git repo (git://git.torproject.org/~nickm/git/tor.git) in
the bug977b branch.

It's still not the cause of the original thing, unless there was a cpuworker thread running too.

comment:13 Changed 10 years ago by atagar

Yup, the later two ended with debug messages rather than logging the above error so possibly a different issue.
I'm running Tor via Vidalia with an extra monitor (arm) logging all events. As mentioned on IRC this doesn't occur
if skimming down the types of events being logged. This should be reproduce able by running tor via vidalia, starting
arm with all events ("-e=A"), and leaving it for a half hour.

comment:14 Changed 10 years ago by nickm

"Are you running as a server here?"

(On IRC, Damian confirmed that the answer was "Yes".)

comment:15 Changed 10 years ago by atagar

Alas, I haven't tried Nick's fix yet but I've just encountered something more akin to the original bug again with version 0.2.1.15-rc:

4:13:25 [ERR] Bug: buffers.c:269: buf_shrink_freelists: Assertion (*chp)->next failed; aborting.
02:11:24 [WARN] buf_shrink_freelists(): Bug: There were 32 chunks at the start. I decided to keep 0. I wanted to free 6. I freed 5. I somehow think I

have 1 left to free.

02:11:24 [WARN] buf_shrink_freelists(): Bug: Freelist length for 4096-byte chunks may have been messed up somehow.

This happened after two and a half hour of operation. Unfortunately I wansn't anticipating it so no core file...

comment:16 Changed 9 years ago by nickm

So our options for fixing this are pretty much: valgrind, dmalloc, more asserts, or staring at the code.

atagar/arma: has this been coming up at all recently, or should we presume we fixed it some time in the last 12 months?

comment:17 Changed 9 years ago by nickm

Bug #1125 is related to this, but not necessarily identical.

comment:18 Changed 9 years ago by arma

Description: modified (diff)

I haven't seen it anytime recently. Happy to assume we fixed it.

comment:19 Changed 9 years ago by nickm

Resolution: Noneworksforme
Status: newclosed

Atagar said:

21:52 < atagar> Yikes that's an old one. And nope, haven't seen anything 
                related to that issue since then.

arma said:

23:18 < armadev> nickm: i haven't seen anything 977-like in a long time.

I think this may be fixed, or maybe it mutated into #1125. Closing as "worksforme"

comment:20 Changed 7 years ago by nickm

Component: Tor ClientTor
Note: See TracTickets for help on using tickets.