Opened 12 years ago

Last modified 7 years ago

#417 closed defect (Fixed)

[err] routerlist.c:4488: routerlist_assert_ok: Assertion r == r2 failed; aborting.

Reported by: weasel Owned by:
Priority: High Milestone:
Component: Core Tor/Tor Version: 0.1.2.12-rc
Severity: Keywords:
Cc: weasel, arma, nickm Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

#0 0xffffe410 in kernel_vsyscall ()
#1 0xb7d10885 in raise () from /lib/tls/i686/cmov/libc.so.6
#2 0xb7d12002 in abort () from /lib/tls/i686/cmov/libc.so.6
#3 0x080afe48 in routerlist_assert_ok (rl=0x8144788) at routerlist.c:4504
#4 0x080ab058 in routerlist_remove_old_routers () at routerlist.c:2227
#5 0x080af2f8 in update_router_have_minimum_dir_info () at routerlist.c:4211
#6 0x080af23d in router_have_minimum_dir_info () at routerlist.c:4182
#7 0x080942a5 in run_scheduled_events (now=1177084855) at main.c:932
#8 0x08094873 in second_elapsed_callback (fd=-1, event=1, args=0x0) at main.c:1064
#9 0xb7e1fc79 in event_base_priority_init () from /usr/lib/libevent-1.1a.so.1
#10 0xb7e1ff65 in event_base_loop () from /usr/lib/libevent-1.1a.so.1
#11 0xb7e1fdcb in event_loop () from /usr/lib/libevent-1.1a.so.1
#12 0xb7e1fcb0 in event_dispatch () from /usr/lib/libevent-1.1a.so.1
#13 0x08094d3b in do_main_loop () at main.c:1267
#14 0x08095dcd in tor_main (argc=0, argv=0x0) at main.c:2494
#15 0x080b943b in main (argc=0, argv=0x0) at tor_main.c:22
#16 0xb7cfd970 in
libc_start_main () from /lib/tls/i686/cmov/libc.so.6
#17 0x0804c7f1 in _start () at ../sysdeps/i386/elf/start.S:102

[Automatically added by flyspray2trac: Operating System: All]

Child Tickets

Change History (22)

comment:1 Changed 12 years ago by arma

This looks like the same as bug 404.

Looks like the "more asserts" that Nick committed for 404 didn't get
triggered.

comment:2 Changed 12 years ago by nickm

Okay, I added even more asserts this time. Now we'll know if routerlist_remove_old somehow corrupts the identity
map, and if so, where.

comment:3 Changed 12 years ago by arma

#0 0x0000002a95e43545 in raise () from /lib/libc.so.6
#1 0x0000002a95e44cce in abort () from /lib/libc.so.6
#2 0x000000000045b37d in routerlist_assert_ok (rl=0x611f20)

at routerlist.c:4690

#3 0x00000000004568e5 in routerlist_remove_old_routers () at routerlist.c:2284
#4 0x000000000045a74b in update_router_have_minimum_dir_info ()

at routerlist.c:4360

#5 0x000000000045a6a0 in router_have_minimum_dir_info () at routerlist.c:4331
#6 0x0000000000441a2d in run_scheduled_events (now=1177789965) at main.c:1003
#7 0x0000000000441f79 in second_elapsed_callback (fd=24103, event=24103,

args=0x6) at main.c:1136

#8 0x0000002a9588782d in event_base_priority_init ()

from /usr/lib/libevent-1.1a.so.1

#9 0x0000002a95887a72 in event_base_loop () from /usr/lib/libevent-1.1a.so.1
#10 0x0000002a958878e5 in event_loop () from /usr/lib/libevent-1.1a.so.1
#11 0x0000000000442421 in do_main_loop () at main.c:1348
#12 0x000000000044312a in tor_main (argc=24103, argv=0x5e27) at main.c:2574
#13 0x0000002a95e31441 in libc_start_main () from /lib/libc.so.6
#14 0x00000000004062ea in _start () at ../sysdeps/x86_64/elf/start.S:96

This was on moria2, running Tor version 0.2.0.0-alpha-dev (r10020).

comment:4 Changed 12 years ago by arma

Looks like we don't put our own routerinfo into the identity map:
sd2 is null and *(routerinfo_t *)rl->routers.list[0] is moria2.

In some places, like router_get_by_nickname, it looks like we don't
expect to find our own descriptor in the various routerlists. In other
places it looks like we do. Is the intended invariant written down
somewhere?

comment:5 Changed 12 years ago by nickm

We expect to find our own descriptor in the list if we can get it from a directory authority. We expect
never to find the actual copy of our descriptor that you get from router_get_my_routerinfo().

comment:6 Changed 12 years ago by arma

I believe this bug has mostly been sighted on authorities themselves. How
do they expect to behave?

comment:7 Changed 12 years ago by nickm

Looks like that traceback is occuring at the start of routerlist_remove_old_routers: the corruption is happening
somewhere outside of routerlist_remove_old_routers.

comment:8 Changed 12 years ago by nickm

I believe authorities treated the same as everybody else: they generate a descriptor, upload it to everyone
(including themselves) and thenceforth they should act the same as everybody else wrt that descriptor: they keep
a "my descriptor as seen by the authorities (me)" in the routerlist, and a "my descriptor as I most recently
generated it" in desc_routerinfo in router.c.

comment:9 Changed 12 years ago by nickm

I fixed _a_ dirserver-only bit-flipping bug in r10052-3, but I don't see how it could have been _this_ bug. Also,
I added more asserts. Still looking.

comment:10 Changed 12 years ago by arma

Two new cores, from moria1 and moria2, from r10077. Same assert lines:

May 06 12:36:59.593 [err] Bug: routerlist.c:4810: routerlist_assert_ok: Assertion r == r2 failed; aborting.

#0 0x0000002a95e43545 in raise () from /lib/libc.so.6
#1 0x0000002a95e44cce in abort () from /lib/libc.so.6
#2 0x000000000045c4ad in routerlist_assert_ok (rl=0x612880)

at routerlist.c:4826

#3 0x0000000000457935 in routerlist_remove_old_routers () at routerlist.c:2392
#4 0x000000000045b85b in update_router_have_minimum_dir_info ()

at routerlist.c:4496

#5 0x000000000045b7b0 in router_have_minimum_dir_info () at routerlist.c:4467
#6 0x0000000000441e6d in run_scheduled_events (now=1178274050) at main.c:1006
#7 0x00000000004423f9 in second_elapsed_callback (fd=16150, event=16150,

args=0x6) at main.c:1145

#8 0x0000002a9578482d in event_base_priority_init ()

from /usr/lib/libevent-1.1a.so.1

#9 0x0000002a95784a72 in event_base_loop () from /usr/lib/libevent-1.1a.so.1
#10 0x0000002a957848e5 in event_loop () from /usr/lib/libevent-1.1a.so.1
#11 0x00000000004428a1 in do_main_loop () at main.c:1357
#12 0x00000000004435ca in tor_main (argc=16150, argv=0x3f16) at main.c:2584
#13 0x0000002a95e31441 in libc_start_main () from /lib/libc.so.6
#14 0x00000000004062ea in _start () at ../sysdeps/x86_64/elf/start.S:96

#0 0x0000002a95e43545 in raise () from /lib/libc.so.6
#1 0x0000002a95e44cce in abort () from /lib/libc.so.6
#2 0x000000000045c4ad in routerlist_assert_ok (rl=0x613010)

at routerlist.c:4826

#3 0x0000000000457935 in routerlist_remove_old_routers () at routerlist.c:2392
#4 0x000000000045b85b in update_router_have_minimum_dir_info ()

at routerlist.c:4496

#5 0x000000000045b7b0 in router_have_minimum_dir_info () at routerlist.c:4467
#6 0x0000000000441e6d in run_scheduled_events (now=1178469419) at main.c:1006
#7 0x00000000004423f9 in second_elapsed_callback (fd=16744, event=16744,

args=0x6) at main.c:1145

#8 0x0000002a9578482d in event_base_priority_init ()

from /usr/lib/libevent-1.1a.so.1

#9 0x0000002a95784a72 in event_base_loop () from /usr/lib/libevent-1.1a.so.1
#10 0x0000002a957848e5 in event_loop () from /usr/lib/libevent-1.1a.so.1
#11 0x00000000004428a1 in do_main_loop () at main.c:1357
#12 0x00000000004435ca in tor_main (argc=16744, argv=0x4168) at main.c:2584
#13 0x0000002a95e31441 in libc_start_main () from /lib/libc.so.6
#14 0x00000000004062ea in _start () at ../sysdeps/x86_64/elf/start.S:96

comment:11 Changed 12 years ago by nickm

Possibly, just possibly, this is fixed in r10217.

comment:12 Changed 12 years ago by weasel

on r10403
Jun 02 11:40:28.493 [err] Bug: routerlist.c:5139: routerlist_assert_ok: Assertion r == r2 failed; aborting.

same bt it seems:
(gdb) bt
#0 0xffffe410 in kernel_vsyscall ()
#1 0xb7d6d885 in raise () from /lib/tls/i686/cmov/libc.so.6
#2 0xb7d6f002 in abort () from /lib/tls/i686/cmov/libc.so.6
#3 0x080b58e4 in routerlist_assert_ok (rl=0x814c180) at routerlist.c:5160
#4 0x080b0119 in routerlist_remove_old_routers () at routerlist.c:2629
#5 0x080b4bf8 in update_router_have_minimum_dir_info () at routerlist.c:4826
#6 0x080b4b3d in router_have_minimum_dir_info () at routerlist.c:4797
#7 0x0809599c in run_scheduled_events (now=1180777228) at main.c:1024
#8 0x08095fc3 in second_elapsed_callback (fd=-1, event=1, args=0x0) at main.c:1163
#9 0xb7fbfc79 in event_base_priority_init () from /usr/lib/libevent-1.1a.so.1
#10 0xb7fbff65 in event_base_loop () from /usr/lib/libevent-1.1a.so.1
#11 0xb7fbfdcb in event_loop () from /usr/lib/libevent-1.1a.so.1
#12 0x08096503 in do_main_loop () at main.c:1375
#13 0x0809760d in tor_main (argc=0, argv=0x0) at main.c:2610
#14 0x080c103b in main (argc=0, argv=0x0) at tor_main.c:28

sd2 is non-null this time.

(gdb) p *sd2
$12 = {signed_descriptor_body = 0x3d3d2072 <Address 0x3d3d2072 out of bounds>, signed_descriptor_len = 3306016, signed_descriptor_digest = "idx >= 0\000Contact %s,",

identity_digest = " Platform %s\000\000\000\000\000\000\000", published_on = 0, extra_info_digest = "\000\000\000\000Skipping cached-", ei_dl_status = {

next_attempt_at = 1952543859, n_download_failures = 117 'u'}, saved_location = 543517801, saved_offset = 7308908027320756599, do_not_cache = 0, is_extrainfo = 0}

comment:13 Changed 12 years ago by weasel

  • r10496
  • -DDEBUG_ROUTERLIST
  • -O0

Jun 10 09:19:30.780 [err] Bug: routerlist.c:5146: routerlist_assert_ok: Assertion r == r2 failed; aborting.

(gdb) bt
#0 0xffffe410 in kernel_vsyscall ()
#1 0xb7d67885 in raise () from /lib/tls/i686/cmov/libc.so.6
#2 0xb7d69002 in abort () from /lib/tls/i686/cmov/libc.so.6
#3 0x080b7d14 in routerlist_assert_ok (rl=0x81493c8) at routerlist.c:5167
#4 0x080b0fe2 in routerlist_replace (rl=0x81493c8, ri_old=0x831c510, ri_new=0x892f008, idx=143847484, make_old=1) at routerlist.c:2180
#5 0x080b1b55 in router_add_to_routerlist (router=0x837db40, msg=0xbf91d534, from_cache=0, from_fetch=0) at routerlist.c:2465
#6 0x080881b1 in dirserv_add_descriptor (ri=0x837db40, msg=0xbf91d534) at dirserv.c:613
#7 0x08087feb in dirserv_add_multiple_descriptors (

desc=0x13373c28 "router Unnamed 36.240.122.30 9001 0 0\nplatform Tor 0.1.2.13 on Windows XP Service Pack 2 [workstation] {personal} {terminal services, single user}\npublished 2007-06-10 07:20:46\nopt fingerprint 466D 34"..., msg=0xbf91d574) at dirserv.c:531

#8 0x080860bf in directory_handle_command_post (conn=0x8252fd8, headers=0x0,

body=0x13373c28 "router Unnamed 36.240.122.30 9001 0 0\nplatform Tor 0.1.2.13 on Windows XP Service Pack 2 [workstation] {personal} {terminal services, single user}\npublished 2007-06-10 07:20:46\nopt fingerprint 466D 34"..., body_len=1990) at directory.c:2000

#9 0x0808632c in directory_handle_command (conn=0x8252fd8) at directory.c:2080
#10 0x080847dd in connection_dir_process_inbuf (conn=0x8252fd8) at directory.c:1436
#11 0x0806bad5 in connection_handle_read (conn=0x8252fd8) at connection.c:1602
#12 0x08096ad9 in conn_read_callback (fd=453, event=2, _conn=0x8252fd8) at main.c:493
#13 0xb7fb9c79 in event_base_priority_init () from /usr/lib/libevent-1.1a.so.1
..

(gdb)
#3 0x080b7d14 in routerlist_assert_ok (rl=0x81493c8) at routerlist.c:5167
5167 iter = digestmap_iter_next(rl->identity_map, iter);

(gdb) p sd2
$2 = (signed_descriptor_t *) 0x80fe267
(gdb) p *sd2
$3 = {signed_descriptor_body = 0x74756f72 <Address 0x74756f72 out of bounds>, signed_descriptor_len = 1768714853, signed_descriptor_digest = "st_assert_ok\000routerl",

identity_digest = "ist_check_bug_417\000TO", published_on = 1599229791, extra_info_digest = "CONN\000TO_EDGE_CONN\000TO", ei_dl_status = {

next_attempt_at = 1313817439, n_download_failures = 84 'T'}, saved_location = 1313817439, saved_offset = 6868639652464820302, do_not_cache = 1, is_extrainfo = 1}

[note that sd2 gets assigned after the r==r2 assert, so it's from the previous loop run?]

(gdb) p r
No symbol "r" in current context.
(gdb) p r2
$4 = (routerinfo_t *) 0x141a
(gdb) p *r2
Cannot access memory at address 0x141a

comment:14 Changed 12 years ago by nickm

Great, save that core and that tor!

It looks like something is maybe messing up the identity map inside routerlist_replace. How bizarre. In particular,
it seems like r2 (the one inside identity_map) has been corrupted in a way that makes it point to something that never _was_ a valid pointer. (Or the r2 pointer isn't the one we want.)

Can you see a value for r_sl_idx insifde routerlist_assert_ok()?

comment:15 Changed 12 years ago by weasel

gdb doesn't want to know about r_sl_idx, unfortunately.

comment:16 Changed 12 years ago by weasel

always check your assumptions. like the assumption that you can trust gdb.
Different gdb version on the same core/binary gives me different data.

<weasel> sarge: (gdb) p r2 $10 = (routerinfo_t *) 0x141a
<weasel> sarge: (gdb) p iter $11 = (digestmap_iter_t *) 0x80fe267
<weasel> sarge: (gdb) p sd2 $12 = (signed_descriptor_t *) 0x80fe267
<weasel> etch: (gdb) p r2 $2 = (routerinfo_t *) 0x80fe267
<weasel> etch: (gdb) p iter $1 = (digestmap_iter_t *) 0x0
<weasel> etch: (gdb) p sd2 $3 = (signed_descriptor_t *) 0x0

comment:17 Changed 12 years ago by weasel

just for completness' sake, if we interpret the *267 pointer as a routerinfo_t instead of a signed_descriptor_t:
(gdb) p *r2
$4 = {cache_info = {

signed_descriptor_body = 0x74756f72 <Address 0x74756f72 out of bounds>,
signed_descriptor_len = 1768714853,
signed_descriptor_digest = "st_assert_ok\000routerl",
identity_digest = "ist_check_bug_417\000TO", published_on = 1599229791,
extra_info_digest = "CONN\000TO_EDGE_CONN\000TO", ei_dl_status = {

next_attempt_at = 1313817439, n_download_failures = 84 'T'},

saved_location = 1313817439, saved_offset = 6868639652464820302,
do_not_cache = 1, is_extrainfo = 1},

address = 0x544955 <Address 0x544955 out of bounds>,
nickname = 0x4f5f4f54 <Address 0x4f5f4f54 out of bounds>, addr = 1229408594,
or_port = 24398, dir_port = 18755, onion_pkey = 0x49554352,
identity_pkey = 0x6f720054,
platform = 0x72657475 <Address 0x72657475 out of bounds>,
bandwidthrate = 1818587743, bandwidthburst = 1600414063,
bandwidthcapacity = 2004116846, exit_policy = 0x736b726f,
uptime = 1970561396, declared_family = 0x65670073,
contact_info = 0x706f5f74 <Address 0x706f5f74 out of bounds>,
is_hibernating = 0, has_old_dnsworkers = 0, caches_extra_info = 1,
is_running = 0, is_valid = 1, is_named = 1, is_fast = 1, is_stable = 0,
is_possible_guard = 1, is_exit = 0, is_bad_exit = 0, purpose = 111 'o',
last_reachable = 757672051, testing_since = 1952531518,
num_unreachable_notifications = 1919501409, routerlist_index = 1869898597}

(and why on earth do we have non-aligned pointers)

comment:18 Changed 12 years ago by arma

I just got this in moria1 running 0.1.2.14. Alas, I have no core. Next time!

But here is our cryptic hint:
Jun 14 09:19:40.129 [err] routerlist_assert_ok(): fatal error: router at 0x765e7
90 did not match router at 0x6aade70. [1709]
Jun 14 09:19:40.130 [err] routerlist.c:4560: routerlist_assert_ok: Assertion r =

r2 failed; aborting.

routerlist.c:4560 routerlist_assert_ok: Assertion r == r2 failed; aborting.

comment:19 Changed 12 years ago by weasel

Got another one of these.

assert_ok() was called from replace(). r2 was zero, r a freed routerinfo, the one that
we were supposed to have replaced in the replace() call. The old ri was still in the list,
now freed, the new_ri was at list[listlen], i.e. just outside the list at the end.
ri_new->routerlist_index was consistent with that position. Added three new asserts as r10720.

comment:20 Changed 12 years ago by weasel

for those watching the show at home:

Fri 20:12:00 <weasel> nickm: what I'm currently most wondering about is this:
Fri 20:12:26 <weasel> nickm: in routerlsit.c, in router_add_to_routerlist() around line 2426
Fri 20:12:49 <weasel> in the if (old_router) branch. we assign pos, and then we assert that p <= pos && pos < smartlist_len(routerlist->routers));
Fri 20:13:51 <weasel> a few lines down we give that pos to routerlist_replace, which calls routerlist_find_elt, basically still with that pos. and suddenly that 1807 is

now == smartlist_len

Fri 20:15:09 <nickm> AHA!
Fri 20:15:25 <weasel> so maybe one of the few things we call fuzzes with our list
Fri 20:15:36 <nickm> router_have_minimum_dir_info() calls update_router_have_minimum_dir_info() which can call routerlist_remove_old_routers().

comment:21 Changed 12 years ago by weasel

flyspray2trac: bug closed.
let's hope that r1075[234] fixed this one.

comment:22 Changed 7 years ago by nickm

Component: Tor RelayTor
Note: See TracTickets for help on using tickets.