#20103 closed defect (fixed)

Crash on OpenBSD: tor invoked from Tor Browser 6.0.4

Reported by: attila
Owned by: nickm
Priority: High
Milestone: Tor: 0.2.8.x-final
Component: Core Tor/Tor
Version: Tor: 0.2.8.7
Severity: Normal
Keywords: bug regression 028-backport TorCoreTeam201609
Cc:
Actual Points: 1
Parent ID:
Points:
Reviewer:
Sponsor:

Description

While testing an update to the (proposed) TBB port for OpenBSD both I and my partner in torbsd.crime were able to get the instance of tor started by TBB to dump core, but not reliably.

We're using tor 0.2.8.7 under OpenBSD-current (Sept 5 snapshot). I've built myself a package for amd64 from the OpenBSD port with debugging symbols, so I can see what's going on. Under -current you do:

$ cd /usr/ports/net/tor
$ env DEBUG="-ggdb -O0" INSTALL_STRIP= make repackage

and install the resulting /usr/ports/packages/amd64/all/tor-0.2.8.7.tgz package.

Other than that I made no changes to tor itself. The core dump happened both with the standard package (no debug syms) and my package with debug syms.

We die in nodelist.c:836 at the call to the SL_ADD_NEW_IPV6_AP() macro because node->rs appears to be an invalid pointer (node->ri is fine):

(gdb) where
#0  0x000013438bc334a2 in tor_addr_family (a=0x1345c7c3ff58) at address.h:155
#1  0x000013438bc3501c in tor_addr_is_null (addr=0x1345c7c3ff58)
    at src/common/address.c:871
#2  0x000013438bc3526e in tor_addr_is_valid (addr=0x1345c7c3ff58, 
    for_listening=0) at src/common/address.c:932
#3  0x000013438bb1575f in node_get_all_orports (node=0x1345c21f6000)
    at src/or/nodelist.c:836
#4  0x000013438bc29a20 in node_is_a_configured_bridge (node=0x1345c21f6000)
    at src/or/entrynodes.c:1871
#5  0x000013438bc2b74a in any_bridge_supports_microdescriptors ()
    at src/or/entrynodes.c:2486
#6  0x000013438bb0d2ef in we_use_microdescriptors_for_circuits (
    options=0x134681d2f7a0) at src/or/microdesc.c:924
#7  0x000013438bb0d3e9 in usable_consensus_flavor () at src/or/microdesc.c:961
#8  0x000013438bb102e8 in networkstatus_consensus_is_bootstrapping (
    now=1473280922) at src/or/networkstatus.c:1249
#9  0x000013438bc019b2 in find_dl_schedule (dls=0x13438c0185d0, 
    options=0x134681d2f7a0) at src/or/directory.c:3732
#10 0x000013438bc020d0 in download_status_reset (dls=0x13438c0185d0)
    at src/or/directory.c:3950
#11 0x000013438bb114bc in networkstatus_set_current_consensus (
    consensus=0x13468873f000 "network-status-version 3 microdesc\nvote-status consensus\nconsensus-method 20\nvalid-after 2016-09-07 20:00:00\nfresh-until 2016-09-07 21:00:00\nvalid-until 2016-09-07 23:00:00\nvoting-delay 300 300\nclient"..., flavor=0x1345e6fb8470 "microdesc", flags=0) at src/or/networkstatus.c:1679
#12 0x000013438bbfba02 in connection_dir_client_reached_eof (
    conn=0x1346506c2500) at src/or/directory.c:2009
#13 0x000013438bbfda9a in connection_dir_reached_eof (conn=0x1346506c2500)
    at src/or/directory.c:2471
#14 0x000013438bbd32e9 in connection_reached_eof (conn=0x1346506c2500)
    at src/or/connection.c:4841
#15 0x000013438bbd058d in connection_handle_read_impl (conn=0x1346506c2500)
    at src/or/connection.c:3526
#16 0x000013438bbd05d9 in connection_handle_read (conn=0x1346506c2500)
    at src/or/connection.c:3541
#17 0x000013438bb031ec in conn_read_callback (fd=-1, event=2, 
    _conn=0x1346506c2500) at src/or/main.c:803
#18 0x0000134603284cbe in event_base_loop ()
   from /usr/local/lib/libevent_core.so.1.1
#19 0x000013438bb06397 in run_main_loop_once () at src/or/main.c:2543
#20 0x000013438bb064da in run_main_loop_until_done () at src/or/main.c:2589
#21 0x000013438bb062b7 in do_main_loop () at src/or/main.c:2515
#22 0x000013438bb0a0e5 in tor_main (argc=16, argv=0x7f7ffffc01b8)
    at src/or/main.c:3646
#23 0x000013438bb01f3f in main (argc=16, argv=0x7f7ffffc01b8)
    at src/or/tor_main.c:30
(gdb) up
#1  0x000013438bc3501c in tor_addr_is_null (addr=0x1345c7c3ff58)
    at src/common/address.c:871
871	  switch (tor_addr_family(addr)) {
(gdb) up
#2  0x000013438bc3526e in tor_addr_is_valid (addr=0x1345c7c3ff58, 
    for_listening=0) at src/common/address.c:932
932	  return !tor_addr_is_null(addr);
(gdb) up
#3  0x000013438bb1575f in node_get_all_orports (node=0x1345c21f6000)
    at src/or/nodelist.c:836
836	    SL_ADD_NEW_IPV6_AP(node->rs, ipv6_orport, sl, valid);
(gdb) print node->rs
$16 = (routerstatus_t *) 0x1345c7c3ff00
(gdb) print *node->rs
Cannot access memory at address 0x1345c7c3ff00
(gdb) print node->ri
$18 = (routerinfo_t *) 0x134596a7aa00
(gdb) print *node->ri
$19 = {cache_info = {signed_descriptor_body = 0x0, annotations_len = 73, 
    signed_descriptor_len = 2223, 
    signed_descriptor_digest = "§À[º`?ø/\023\005ò\223»Q\004\223j\204íÌ", 
    identity_digest = "\232h¸Z\0021\217N~\207ò\202\2009ûÕ×[\001B", 
    published_on = 1473266407, 
    extra_info_digest = "¡ce8ÃÆ]ü\204^mà *º\220\021\205¹ä", 
    extra_info_digest256 = "¥m\n\231\234\003\230ý\021|ã\035hÊ\025b2 0ÐÐk/\217à\233ò\235\005ÇÇî", signing_key_cert = 0x1346133eb100, ei_dl_status = {
      next_attempt_at = 1473280814, n_download_failures = 0 '\0', 
      n_download_attempts = 0 '\0', schedule = DL_SCHED_GENERIC, 
      want_authority = DL_WANT_ANY_DIRSERVER, 
      increment_on = DL_SCHED_INCREMENT_FAILURE}, 
    saved_location = SAVED_IN_CACHE, saved_offset = 0, routerlist_index = 0, 
    last_listed_as_valid_until = 0, do_not_cache = 0, is_extrainfo = 0, 
    extrainfo_is_bogus = 0, send_unencrypted = 0}, 
  nickname = 0x13459bfe5820 "NYCBUG0", addr = 1114571284, or_port = 9001, 
  dir_port = 9030, ipv6_addr = {family = 0 '\0', addr = {dummy_ = 0, 
      in_addr = {s_addr = 0}, in6_addr = {__u6_addr = {
          __u6_addr8 = '\0' <repeats 15 times>, __u6_addr16 = {0, 0, 0, 0, 0, 
            0, 0, 0}, __u6_addr32 = {0, 0, 0, 0}}}}}, ipv6_orport = 0, 
  onion_pkey = 0x13465a3d8d20, identity_pkey = 0x134674ecf280, 
  onion_curve25519_pkey = 0x134643b73920, cert_expiration_time = 1473872400, 
  platform = 0x134643b739a0 "Tor 0.2.9.2-alpha on FreeBSD", 
  bandwidthrate = 10240000, bandwidthburst = 15360000, 
  bandwidthcapacity = 7341056, exit_policy = 0x134674ecfd40, 
  ipv6_exit_policy = 0x0, uptime = 3, declared_family = 0x134674ecffb0, 
  contact_info = 0x134643b79780 "Admin <mirror-admin AT nycbug DOT org>", 
  is_hibernating = 0, caches_extra_info = 0, allow_single_hop_exits = 0, 
  wants_to_be_hs_dir = 1, policy_is_reject_star = 1, 
  needs_retest_if_added = 0, supports_tunnelled_dir_requests = 1, 
  omit_from_vote = 0, purpose = 2 '\002'}
(gdb) print node
$20 = (const node_t *) 0x1345c21f6000
(gdb) print *node
$21 = {ht_ent = {hte_next = 0x0, hte_hash = 1201906925}, nodelist_idx = 0,
  identity = "\232h¸Z\0021\217N~\207ò\202\2009ûÕ×[\001B", md = 0x13463eac4500,
  ri = 0x134596a7aa00, rs = 0x1345c7c3ff00, is_running = 1, is_valid = 1,
  is_fast = 1, is_stable = 1, is_possible_guard = 1, is_exit = 0,
  is_bad_exit = 0, is_hs_dir = 0, name_lookup_warned = 0, rejects_all = 0,
  using_as_guard = 0, ipv6_preferred = 0, country = 5, last_reachable = 0,
  last_reachable6 = 0}

I wish I had more details to offer, but so far that's all I have.

I've changed my malloc.conf(5) settings since the crash to see if any of the new debug features in
OpenBSD's malloc(3) implementation will catch anything (maybe use after free?):

attila@rotfl:~ 18:$ ls -l /etc/malloc.conf
lrwxr-xr-x  1 root  wheel  5 Sep  7 16:55 /etc/malloc.conf -> CFGJU

I've restarted and am hoping to cause this to occur again. Will update this ticket if I learn anything else. Bug me on IRC if you want (I'm attila on #tor-dev).

Change History (29)

comment:1 Changed 11 months ago by cypherpunks

Component: Core Tor → Core Tor/Tor

comment:2 Changed 11 months ago by attila

After a few more hours of testing and screwing around I've found this is not hard to reproduce at all:

  1. start TBB;
  2. load a page (I've been using https://blog.torproject.org but I don't think it matters much);
  3. wait :-)

Under OpenBSD-current/amd64 as of the 5 Sept snap you'll eventually get a crash like the one I dissected above; there's a more recent snap and I'm working on upgrading to it.

I now have gdb attached to the last instance of tor that TBB started and am waiting for it to die so I can learn more, but it crashed for me overnight and the tail end of the logs might be interesting to someone who knows more than me (I cranked up logging to debug before having TBB restart tor):

...
Sep 08 15:54:58.000 [debug] relay_lookup_conn(): found conn for stream 23866.
Sep 08 15:54:58.000 [debug] circuit_receive_relay_cell(): Sending to origin.
Sep 08 15:54:58.000 [debug] connection_edge_process_relay_cell(): Now seen 3005 relay cells here (command 2, stream 23866).
Sep 08 15:54:58.000 [debug] connection_edge_process_relay_cell(): circ deliver_window now 966.
Sep 08 15:54:58.000 [debug] connection_or_process_cells_from_inbuf(): 24: starting, inbuf_datalen 514 (0 pending in tls object).
Sep 08 15:54:58.000 [debug] channel_queue_cell(): Directly handling incoming cell_t 0x7f7fffff4880 for channel 0x477f126c000 (global ID 3)
Sep 08 15:54:58.000 [debug] circuit_get_by_circid_channel_impl(): circuit_get_by_circid_channel_impl() returning circuit 0x477f126c800 for circ_id 2778626874, channel ID 3 (0x477f126c000)
Sep 08 15:54:58.000 [debug] relay_lookup_conn(): found conn for stream 23866.
Sep 08 15:54:58.000 [debug] circuit_receive_relay_cell(): Sending to origin.
Sep 08 15:54:58.000 [debug] connection_edge_process_relay_cell(): Now seen 3006 relay cells here (command 3, stream 23866).
Sep 08 15:54:58.000 [info] connection_edge_process_relay_cell(): -1: end cell (closed normally) for stream 23866. Removing stream.
Sep 08 15:54:58.000 [debug] connection_or_process_cells_from_inbuf(): 24: starting, inbuf_datalen 0 (0 pending in tls object).
Sep 08 15:54:58.000 [debug] conn_close_if_marked(): Cleaning up connection (fd -
Sep 08 15:54:58.000 [debug] conn_close_if_marked(): Flushed last 2115 bytes from a linked conn; 0 left; flushlen 0; wants-to-flush==0
Sep 08 15:54:58.000 [debug] circuit_detach_stream(): Removing stream 23866 from circ 2778626874
Sep 08 15:54:58.000 [debug] connection_remove(): removing socket -1 (type Socks), n_conns now 8
Sep 08 15:54:58.000 [info] connection_free_(): Freeing linked Socks connection [open] with 0 bytes on inbuf, 0 on outbuf.
Sep 08 15:54:58.000 [debug] conn_read_callback(): socket -1 wants to read.
Sep 08 15:54:58.000 [debug] fetch_from_buf_http(): headerlen 198, bodylen 612109.
Sep 08 15:54:58.000 [debug] connection_dir_client_reached_eof(): Received response from directory server '66.111.2.20:9001': 200 "OK" (purpose: 14)
Sep 08 15:54:58.000 [debug] router_new_address_suggestion(): Got X-Your-Address-Is: my.home.ip.address
Sep 08 15:54:58.000 [debug] connection_dir_client_reached_eof(): Time on received directory is within tolerance; we are 0 seconds skewed.  (That's okay.)
Sep 08 15:54:58.000 [info] connection_dir_client_reached_eof(): Received consensus directory (size 1403277) from server '66.111.2.20:9001'
Sep 08 15:54:58.000 [info] A consensus needs 5 good signatures from recognized authorities for us to accept it. This one has 8 (dannenberg tor26 longclaw maatuska moria1 dizum gabelmoo Faravahar).

This last message is the same message that appeared in the log from the original crash that George called to my attention (which I forgot to mention in the initial ticket, sorry), which ended thus:

Sep 07 09:57:05.000 [debug] connection_dir_client_reached_eof(): Received response from directory server '66.111.2.20:9001': 200 "OK" (purpose: 14)
Sep 07 09:57:05.000 [debug] router_new_address_suggestion(): Got X-Your-Address-Is: a.b.c.d
Sep 07 09:57:05.000 [debug] connection_dir_client_reached_eof(): Time on received directory is within tolerance; we are -3 seconds skewed.  (That's okay.)
Sep 07 09:57:05.000 [info] connection_dir_client_reached_eof(): Received consensus directory (size 1401858) from server '66.111.2.20:9001'
Sep 07 09:57:05.000 [info] A consensus needs 5 good signatures from recognized authorities for us to accept it. This one has 8 (dannenberg tor26 longclaw maatuska moria1 dizum gabelmoo Faravahar).

One more note: since I'm in Mexico I have to use known bridges to get onto Tor. I would like to do something about this in the future, but for now it should be noted that my torrc for TBB looks like this:

# This file was generated by Tor; if you edit it, comments will not be preserved
# The old torrc file was renamed to torrc.orig.1 or similar, and Tor will ignore it

Bridge 66.111.2.16:9001
Bridge 66.111.2.20:9001
DataDirectory /home/attila/TorBrowser-Data/Browser/tor_data
HiddenServiceStatistics 0
UseBridges 1
Log debug file /home/attila/tmp/tor-debug.log

If anyone wants to play with this you can find packages for the latest OpenBSD-current/amd64 snapshot here temporarily: https://bits.haqistan.net/~tdp/amd64. Those are only the packages necessary to install this latest test build of TBB on OpenBSD/amd64. If you're on a fresh -current install you'll need the run dependencies as well. I put a list of them in https://bits.haqistan.net/~tdp/amd64/full-run-depends.txt to make it simple. If you were to download all the files in that directory onto your current/amd64 box/vm the following would install them (assuming they are in .):

$ doas pkg_add -l full-run-depends.txt -z
$ doas pkg_add *.tgz

Hopefully my gdb session will kick out a segfault at some point and maybe I can see more. The two logs from crashes I have are rather large but if someone wants them I can put them somewhere.

comment:3 Changed 11 months ago by attila

I should also point out that all of the packages for testing that I point at on bits.haqistan.net are of course unsigned. They are only for testing. They are not official anything and we don't sign the binary packages we produce for testing. The pkg_add command will complain about this and ask you to confirm that you are installing unsigned packages. This is normal.

comment:4 Changed 11 months ago by attila

Summary: Difficult-to-reproduce crash on OpenBSD: tor invoked from Tor Browser 6.0.4 → Crash on OpenBSD: tor invoked from Tor Browser 6.0.4

comment:5 Changed 11 months ago by rubiate

I've been able to reproduce this (on OpenBSD) a bit more easily than what attila described.

Run tor normally and with a standard config, but with UseBridges enabled and the 2 relays attila posted as bridges. Start and stop tor and it will eventually produce this same crash on startup while updating the consensus.

Running this usually gives the crash within about 30 minutes:

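#!/bin/sh
# Restart tor in a loop until it leaves a core file (tor.core) behind.
while [ ! -e tor.core ] ; do
  tor --ignore-missing-torrc -f "" --bridge 66.111.2.16:9001 --bridge 66.111.2.20:9001 --usebridges 1 &
  sleep 20        # give tor time to start fetching the consensus
  kill -TERM $!   # stop the backgrounded tor
  sleep 3         # let it exit (or dump core) before the next pass
done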

I couldn't reproduce this on Debian (with 0.2.8.7 compiled from source), or on OpenBSD when I changed the bridge lines to use different relays (although I might be getting close to superstitious-pigeon territory now).

comment:6 Changed 11 months ago by nickm

Keywords: bug regression 028-backport added
Milestone: Tor: 0.2.9.x-final

comment:7 Changed 11 months ago by nickm

Does this happen with older versions of Tor? Is it possible to figure out roughly when this started happening?

comment:8 Changed 11 months ago by nickm

Also, are you able to compile with --enable-expensive-hardening ? That configure flag turns on ubsan and asan if available, and can help diagnose memory corruption problems.

comment:9 Changed 11 months ago by rubiate

Reproduced on tor-0.2.8.2-alpha, could not reproduce on tor-0.2.8.1-alpha

There's no ASan/UBSan support on OpenBSD, unfortunately:

checking whether the compiler accepts -fsanitize=address... no
checking whether the compiler accepts -fsanitize=undefined... no

comment:10 Changed 11 months ago by rubiate

Did some more digging.

What's up with the consensus when using the .20 relay (NYCBUG0) as a bridge?

network-status-version 3 microdesc\nvote-status consensus\nconsensus-method 20\nvalid-after 2016-09-08 19:00:00\nfresh-until 2016-09-08 20:00:00\nvalid-until 2016-09-08 22:00:00

Tor says the clock is fine:

[debug] connection_dir_client_reached_eof(): Time on received directory is within tolerance; we are -2 seconds skewed. (That's okay.)
[info] connection_dir_client_reached_eof(): Received consensus directory (size 1404160) from server '66.111.2.20:9001'

Whatever the cause, I think this is what is exposing the bug.

Before the crash happens, networkstatus_vote_free(current_md_consensus) on src/or/networkstatus.c:1753 is reached. This calls routerstatus_free(rs) (src/or/networkstatus.c:319) on everything in the routerlist. I added some logging to see what it's doing:

[... bajillion lines trimmed...]
routerstatus_free: 0x167ecf8fa700
routerstatus_free: 0x167e5e425e00
routerstatus_free: 0x167ecf8fab00
routerstatus_free: 0x167e91b76a00
routerstatus_free: 0x167ecf8fa100
[...bajillion lines trimmed...]
Segmentation fault (core dumped)

$ gdb tor/src/or/tor tor.core
(gdb) up 2
(gdb) print node->rs
$1 = (routerstatus_t *) 0x167ecf8fab00

I'm hoping that NYCBUG relay stays broken for now so I can investigate further, and hopefully figure out why this seems to only happen on OpenBSD.

And well done to attila on having the specific config to trigger this :-)

comment:11 Changed 11 months ago by rubiate

Bah, I'm slow. Of course, it works the same everywhere, just the results are different. On OpenBSD the memory is read protected after it's freed, hence crashes.

I should have compiled it with ASAN on Debian (doh, that's probably what you meant); I would've worked this out faster.

==12100==ERROR: AddressSanitizer: heap-use-after-free on address 0x60e0004adf78 at pc 0x7f5185128426 bp 0x7ffed0454d70 sp 0x7ffed0454d68
READ of size 2 at 0x60e0004adf78 thread T0
    #0 0x7f5185128425 in tor_addr_family src/common/address.h:155
    #1 0x7f5185128425 in tor_addr_is_null src/common/address.c:871
    #2 0x7f5185128868 in tor_addr_is_valid src/common/address.c:932
    #3 0x7f5184e4f23b in node_get_all_orports src/or/nodelist.c:838
    #4 0x7f518510625a in node_is_a_configured_bridge src/or/entrynodes.c:1871
    #5 0x7f5185112d1a in any_bridge_supports_microdescriptors src/or/entrynodes.c:2487
    #6 0x7f5184e39499 in we_use_microdescriptors_for_circuits src/or/microdesc.c:924
    #7 0x7f5184e397c3 in usable_consensus_flavor src/or/microdesc.c:961
    #8 0x7f5184e3fe8f in networkstatus_consensus_is_bootstrapping src/or/networkstatus.c:1257
    #9 0x7f51850977da in find_dl_schedule src/or/directory.c:3731
    #10 0x7f51850a005e in download_status_reset src/or/directory.c:3950
    #11 0x7f5184e43cd0 in networkstatus_set_current_consensus src/or/networkstatus.c:1690
    #12 0x7f51850a2a4c in connection_dir_client_reached_eof src/or/directory.c:2009
    #13 0x7f51850a72a9 in connection_dir_reached_eof src/or/directory.c:2471
    #14 0x7f5185049d7e in connection_reached_eof src/or/connection.c:4841
    #15 0x7f5185049d7e in connection_handle_read_impl src/or/connection.c:3528
    #16 0x7f5184e24dd7 in conn_read_callback src/or/main.c:803
    #17 0x7f51830693db in event_base_loop (/usr/lib/x86_64-linux-gnu/libevent-2.0.so.5+0x103db)
    #18 0x7f5184e26606 in run_main_loop_once src/or/main.c:2543
    #19 0x7f5184e26606 in run_main_loop_until_done src/or/main.c:2589
    #20 0x7f5184e26606 in do_main_loop src/or/main.c:2515
    #21 0x7f5184e2be04 in tor_main src/or/main.c:3646
    #22 0x7f5184e198cb in main src/or/tor_main.c:30
    #23 0x7f518158ab44 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21b44)
    #24 0x7f5184e1c28a (tor/src/or/tor+0x56528a)

0x60e0004adf78 is located 88 bytes inside of 160-byte region [0x60e0004adf20,0x60e0004adfc0)
freed by thread T0 here:
    #0 0x7f5183811527 in __interceptor_free (/usr/lib/x86_64-linux-gnu/libasan.so.1+0x54527)
    #1 0x7f5184e3b9ea in networkstatus_vote_free src/or/networkstatus.c:320
    #2 0x7f5184e43915 in networkstatus_set_current_consensus src/or/networkstatus.c:1662
    #3 0x7f51850a2a4c in connection_dir_client_reached_eof src/or/directory.c:2009
    #4 0x7f51850a72a9 in connection_dir_reached_eof src/or/directory.c:2471
    #5 0x7f5185049d7e in connection_reached_eof src/or/connection.c:4841
    #6 0x7f5185049d7e in connection_handle_read_impl src/or/connection.c:3528
    #7 0x7f5184e24dd7 in conn_read_callback src/or/main.c:803
    #8 0x7f51830693db in event_base_loop (/usr/lib/x86_64-linux-gnu/libevent-2.0.so.5+0x103db)

comment:12 Changed 11 months ago by nickm

Okay, it looks like there's a logic error inside networkstatus_set_current_consensus().

comment:13 Changed 11 months ago by nickm

Oh MAN. When we free the consensus earlier (line 1662) in networkstatus_vote_free, we don't invalidate all the routerstatus_t objects that the node_t structures point to. But they are used deep inside download_status_reset(). Tricky!
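
A minimal sketch of the hazard described in this comment, not Tor's actual code or the eventual patch. The types below (sketch_rs_t, sketch_consensus_t, sketch_node_t) and every function name are hypothetical, heavily simplified stand-ins for routerstatus_t, networkstatus_t and node_t. It assumes the nodes hold plain pointers into entries owned by the consensus, so freeing the consensus leaves those pointers dangling unless they are cleared first:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for routerstatus_t, networkstatus_t and node_t. */
typedef struct { int ipv6_orport; } sketch_rs_t;
typedef struct { sketch_rs_t **entries; size_t n_entries; } sketch_consensus_t;
typedef struct { sketch_rs_t *rs; } sketch_node_t;

/* Free a consensus and every entry it owns. After this call, any pointer
 * that still refers to one of its entries is dangling. */
static void
sketch_consensus_free_raw(sketch_consensus_t *c)
{
  if (!c)
    return;
  for (size_t i = 0; i < c->n_entries; i++)
    free(c->entries[i]);
  free(c->entries);
  free(c);
}

/* Allocate a consensus that owns n zero-initialized entries. */
sketch_consensus_t *
sketch_consensus_new(size_t n)
{
  sketch_consensus_t *c = calloc(1, sizeof(*c));
  if (!c)
    return NULL;
  c->entries = calloc(n ? n : 1, sizeof(*c->entries));
  if (!c->entries) {
    free(c);
    return NULL;
  }
  for (size_t i = 0; i < n; i++) {
    c->entries[i] = calloc(1, sizeof(sketch_rs_t));
    if (!c->entries[i]) {
      sketch_consensus_free_raw(c); /* frees only what was allocated */
      return NULL;
    }
    c->n_entries++;
  }
  return c;
}

/* Clear every node->rs that points into c, then free c. The nested loop
 * is quadratic; it is kept simple because this is only an illustration. */
void
sketch_consensus_free_and_detach(sketch_consensus_t *c,
                                 sketch_node_t *nodes, size_t n_nodes)
{
  assert(nodes != NULL || n_nodes == 0);
  if (!c)
    return;
  for (size_t i = 0; i < n_nodes; i++) {
    for (size_t j = 0; j < c->n_entries; j++) {
      if (nodes[i].rs == c->entries[j]) {
        nodes[i].rs = NULL;
        break;
      }
    }
  }
  sketch_consensus_free_raw(c);
}

/* A reader in the spirit of node_get_all_orports(): it has to tolerate
 * rs == NULL instead of assuming the pointer is always live. */
int
sketch_node_ipv6_orport(const sketch_node_t *node)
{
  return (node && node->rs) ? node->rs->ipv6_orport : 0;
}
```

With this shape, code that runs between the free and the installation of the next consensus sees a NULL rs rather than freed memory, which turns a heap-use-after-free into an ordinary "no routerstatus" case.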

Could you tell me exactly which version of Tor you were testing on Debian above? I want to make sure that it is "networkstatus_free(current_md_consensus)" that's on src/or/networkstatus.c:1662, not some other networkstatus_free().


comment:14 Changed 11 months ago by nickm

Branch bug20103_028 might fix this. Before I put it in needs_review, though, it would be good to have it get testing. (I could also use an answer to my question above about the exact Tor version)

comment:15 Changed 11 months ago by rubiate

It was 0.2.8.7 until I added debugging statements so that's probably not helpful. The line is "networkstatus_vote_free(current_md_consensus)" which is really on src/or/networkstatus.c:1651

Here it is with proper line numbers. This is from the tor-0.2.8.7 tag, or "Tor v0.2.8.7 (git-263088633a63982a)".

freed by thread T0 here:
    #0 0x7f7f8ef24527 in __interceptor_free (/usr/lib/x86_64-linux-gnu/libasan.so.1+0x54527)
    #1 0x7f7f9054e9f9 in networkstatus_vote_free src/or/networkstatus.c:313
    #2 0x7f7f90556563 in networkstatus_set_current_consensus src/or/networkstatus.c:1651
    #3 0x7f7f907b568c in connection_dir_client_reached_eof src/or/directory.c:2009
    #4 0x7f7f907b9ee9 in connection_dir_reached_eof src/or/directory.c:2471
    #5 0x7f7f9075c9be in connection_reached_eof src/or/connection.c:4841
    #6 0x7f7f9075c9be in connection_handle_read_impl src/or/connection.c:3528
    #7 0x7f7f90537b67 in conn_read_callback src/or/main.c:803
    #8 0x7f7f8e77c3db in event_base_loop (/usr/lib/x86_64-linux-gnu/libevent-2.0.so.5+0x103db)

comment:16 Changed 11 months ago by rubiate

Still crashes with the bug20103_028 branch.

==17092==ERROR: AddressSanitizer: heap-use-after-free on address 0x60e0004d8bb8 at pc 0x7fd113288016 bp 0x7ffc5d960c30 sp 0x7ffc5d960c28
READ of size 2 at 0x60e0004d8bb8 thread T0
    #0 0x7fd113288015 in tor_addr_family src/common/address.h:155
    #1 0x7fd113288015 in tor_addr_is_null src/common/address.c:871
    #2 0x7fd113288458 in tor_addr_is_valid src/common/address.c:932
    #3 0x7fd112faee5b in node_get_all_orports src/or/nodelist.c:836
    #4 0x7fd113265e4a in node_is_a_configured_bridge src/or/entrynodes.c:1871
    #5 0x7fd11327290a in any_bridge_supports_microdescriptors src/or/entrynodes.c:2487
    #6 0x7fd112f99229 in we_use_microdescriptors_for_circuits src/or/microdesc.c:924
    #7 0x7fd112f99553 in usable_consensus_flavor src/or/microdesc.c:961
    #8 0x7fd112fa32ae in networkstatus_set_current_consensus src/or/networkstatus.c:1686
    #9 0x7fd11320263c in connection_dir_client_reached_eof src/or/directory.c:2009
    #10 0x7fd113206e99 in connection_dir_reached_eof src/or/directory.c:2471
    #11 0x7fd1131a996e in connection_reached_eof src/or/connection.c:4841
    #12 0x7fd1131a996e in connection_handle_read_impl src/or/connection.c:3528
    #13 0x7fd112f84b67 in conn_read_callback src/or/main.c:803
    #14 0x7fd1111c93db in event_base_loop (/usr/lib/x86_64-linux-gnu/libevent-2.0.so.5+0x103db)
    #15 0x7fd112f86396 in run_main_loop_once src/or/main.c:2543
    #16 0x7fd112f86396 in run_main_loop_until_done src/or/main.c:2589
    #17 0x7fd112f86396 in do_main_loop src/or/main.c:2515
    #18 0x7fd112f8bb94 in tor_main src/or/main.c:3646
    #19 0x7fd112f7965b in main src/or/tor_main.c:30
    #20 0x7fd10f6eab44 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21b44)
    #21 0x7fd112f7c01a (tor/src/or/tor+0x56501a)

0x60e0004d8bb8 is located 88 bytes inside of 160-byte region [0x60e0004d8b60,0x60e0004d8c00)
freed by thread T0 here:
    #0 0x7fd111971527 in __interceptor_free (/usr/lib/x86_64-linux-gnu/libasan.so.1+0x54527)
    #1 0x7fd112f9b9f9 in networkstatus_vote_free src/or/networkstatus.c:313
    #2 0x7fd112fa357a in networkstatus_set_current_consensus src/or/networkstatus.c:1660
    #3 0x7fd11320263c in connection_dir_client_reached_eof src/or/directory.c:2009
    #4 0x7fd113206e99 in connection_dir_reached_eof src/or/directory.c:2471
    #5 0x7fd1131a996e in connection_reached_eof src/or/connection.c:4841
    #6 0x7fd1131a996e in connection_handle_read_impl src/or/connection.c:3528
    #7 0x7fd112f84b67 in conn_read_callback src/or/main.c:803
    #8 0x7fd1111c93db in event_base_loop (/usr/lib/x86_64-linux-gnu/libevent-2.0.so.5+0x103db)

comment:17 Changed 11 months ago by nickm

Actually, that _is_ an improvement: the crash is in usable_consensus_flavor() now, not download_status_reset(). :)

But try my branch bug20103_028_v2. Is that any better?

comment:18 Changed 11 months ago by rubiate

The same, but different:

==9107==ERROR: AddressSanitizer: heap-use-after-free on address 0x60e0004e3b98 at pc 0x7fb130756e46 bp 0x7ffc37f6ce60 sp 0x7ffc37f6ce58
READ of size 2 at 0x60e0004e3b98 thread T0
    #0 0x7fb130756e45 in tor_addr_family src/common/address.h:155
    #1 0x7fb130756e45 in tor_addr_is_null src/common/address.c:871
    #2 0x7fb130757288 in tor_addr_is_valid src/common/address.c:932
    #3 0x7fb13047dc6b in node_get_all_orports src/or/nodelist.c:836
    #4 0x7fb130734c7a in node_is_a_configured_bridge src/or/entrynodes.c:1871
    #5 0x7fb13074173a in any_bridge_supports_microdescriptors src/or/entrynodes.c:2487
    #6 0x7fb130468229 in we_use_microdescriptors_for_circuits src/or/microdesc.c:924
    #7 0x7fb1304728ec in networkstatus_set_current_consensus src/or/networkstatus.c:1680
    #8 0x7fb1306d146c in connection_dir_client_reached_eof src/or/directory.c:2009
    #9 0x7fb1306d5cc9 in connection_dir_reached_eof src/or/directory.c:2471
    #10 0x7fb13067879e in connection_reached_eof src/or/connection.c:4841
    #11 0x7fb13067879e in connection_handle_read_impl src/or/connection.c:3528
    #12 0x7fb130453b67 in conn_read_callback src/or/main.c:803
    #13 0x7fb12e6983db in event_base_loop (/usr/lib/x86_64-linux-gnu/libevent-2.0.so.5+0x103db)
    #14 0x7fb130455396 in run_main_loop_once src/or/main.c:2543
    #15 0x7fb130455396 in run_main_loop_until_done src/or/main.c:2589
    #16 0x7fb130455396 in do_main_loop src/or/main.c:2515
    #17 0x7fb13045ab94 in tor_main src/or/main.c:3646
    #18 0x7fb13044865b in main src/or/tor_main.c:30
    #19 0x7fb12cbb9b44 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21b44)
    #20 0x7fb13044b01a (tor/src/or/tor+0x56501a)

comment:19 Changed 11 months ago by nickm

Closer and closer. Try bug20103_028_v2 again? I just pushed another commit.

comment:20 Changed 11 months ago by rubiate

I think that did it. Been running in a start-stop loop for over 45 minutes on Debian and OpenBSD with no crash. Without the latest patch it crashes within a few minutes, so looks promising.

comment:21 Changed 11 months ago by nickm

Status: new → needs_review

Okay. I've cleaned it up into a bug20103_028_v3 branch, with a real commit message and a big pile of analysis. Needs code review!

comment:22 Changed 11 months ago by arma

I looked over the patch very briefly and it looks plausible (and also complicated in its effects).

Assuming for the moment that it is the right patch though: are there things we should do to remove this trap for future developers? Maybe a huge comment would be an easy first step? And maybe "precompute the answer to what that macro was about, and locate where in the code the answer might change, and only change it then" as another step?
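
One possible reading of the "precompute the answer" suggestion, sketched under stated assumptions: the value that callers need (here, whether any configured bridge supports microdescriptors) is cached in a flag, and the flag is recomputed only at points where the underlying data is known to be consistent. The types and function names are hypothetical and are not part of Tor:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-bridge information consulted today. */
typedef struct { bool supports_microdescs; } sketch_bridge_t;

/* Cached answer plus a counter showing how often it was recomputed. */
typedef struct {
  bool any_bridge_supports_md;
  unsigned recompute_count;
} sketch_md_cache_t;

/* Recompute the cached answer. The intent is that this runs only where
 * the bridge data is known to be valid, for example after a new
 * consensus is fully installed, never in the middle of replacing one. */
void
sketch_md_cache_recompute(sketch_md_cache_t *cache,
                          const sketch_bridge_t *bridges, size_t n)
{
  assert(cache != NULL);
  assert(bridges != NULL || n == 0);
  bool any = false;
  for (size_t i = 0; i < n; i++) {
    if (bridges[i].supports_microdescs) {
      any = true;
      break;
    }
  }
  cache->any_bridge_supports_md = any;
  cache->recompute_count++;
}

/* Cheap query: reads the flag and never walks node or consensus data. */
bool
sketch_md_cache_get(const sketch_md_cache_t *cache)
{
  assert(cache != NULL);
  return cache->any_bridge_supports_md;
}
```

The trade-off is that every place where the answer can change has to call the recompute function, which is the "locate where in the code the answer might change" part of the suggestion.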

comment:23 Changed 11 months ago by nickm

> are there things we should do to remove this trap for future developers?

+1 on those, but let's call it another ticket.

comment:24 in reply to: 23 Changed 11 months ago by teor

The patch looks sensible to me. And it has received some testing on OpenBSD, so that's good.

Replying to nickm:

> > are there things we should do to remove this trap for future developers?
>
> +1 on those, but let's call it another ticket.

Don't we have a handle abstraction sitting around somewhere?
Isn't it exactly what we want here?
(Of course, that means re-writing every rs access, right?)
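
For readers unfamiliar with the idea, here is a generic weak-handle sketch. It is not the handle code this comment refers to, and all names are hypothetical. The assumption is that each object owns a small reference-counted control block; handles point at the control block rather than at the object, so a handle can detect that its object has been freed:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct sketch_rs sketch_rs_t;

/* Control block shared by an object and all handles to it. It outlives
 * the object for as long as any handle still exists. */
typedef struct {
  sketch_rs_t *object; /* set to NULL once the object is freed */
  unsigned refs;       /* one for the object plus one per live handle */
} sketch_head_t;

struct sketch_rs { int or_port; sketch_head_t *head; };

typedef struct { sketch_head_t *head; } sketch_rs_handle_t;

/* Drop one reference; free the control block when none remain. */
static void
sketch_head_unref(sketch_head_t *h)
{
  if (--h->refs == 0)
    free(h);
}

/* Allocate an object together with its control block. */
sketch_rs_t *
sketch_rs_new(int or_port)
{
  sketch_rs_t *rs = calloc(1, sizeof(*rs));
  sketch_head_t *h = calloc(1, sizeof(*h));
  if (!rs || !h) {
    free(rs);
    free(h);
    return NULL;
  }
  h->object = rs;
  h->refs = 1;
  rs->or_port = or_port;
  rs->head = h;
  return rs;
}

/* Create a handle that does not keep the object itself alive. */
sketch_rs_handle_t
sketch_rs_handle_new(sketch_rs_t *rs)
{
  assert(rs != NULL);
  rs->head->refs++;
  return (sketch_rs_handle_t){ rs->head };
}

/* Return the object, or NULL if it was freed or the handle released. */
sketch_rs_t *
sketch_rs_handle_get(const sketch_rs_handle_t *h)
{
  assert(h != NULL);
  return h->head ? h->head->object : NULL;
}

/* Release a handle; safe to call more than once on the same handle. */
void
sketch_rs_handle_release(sketch_rs_handle_t *h)
{
  assert(h != NULL);
  if (h->head) {
    sketch_head_unref(h->head);
    h->head = NULL;
  }
}

/* Free the object; outstanding handles will now resolve to NULL. */
void
sketch_rs_free(sketch_rs_t *rs)
{
  if (!rs)
    return;
  rs->head->object = NULL;
  sketch_head_unref(rs->head);
  free(rs);
}
```

The cost noted in the comment is visible here: every former `node->rs` dereference would have to go through a get call and handle the NULL result.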

comment:25 Changed 11 months ago by nickm

What if we updated all the nodes to point to the new consensus _before_ we freed the old one?
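
A rough sketch of that ordering, using hypothetical simplified types rather than Tor's real node_t and networkstatus_t. It assumes each node can be matched to an entry in the new consensus by an identifier: every node is repointed (or set to NULL if the new consensus does not list it) before the old consensus is released, so no node pointer ever refers to freed memory:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; an int id replaces the identity digest. */
typedef struct { int id; int or_port; } sketch_rs_t;
typedef struct { sketch_rs_t *entries; size_t n_entries; } sketch_consensus_t;
typedef struct { int id; sketch_rs_t *rs; } sketch_node_t;

/* Build a consensus listing the given ids. */
sketch_consensus_t *
sketch_consensus_new(const int *ids, size_t n)
{
  sketch_consensus_t *c = calloc(1, sizeof(*c));
  if (!c)
    return NULL;
  c->entries = calloc(n ? n : 1, sizeof(*c->entries));
  if (!c->entries) {
    free(c);
    return NULL;
  }
  for (size_t i = 0; i < n; i++)
    c->entries[i].id = ids[i];
  c->n_entries = n;
  return c;
}

/* Linear lookup by id; returns NULL if c is NULL or id is not listed. */
static sketch_rs_t *
sketch_consensus_find(sketch_consensus_t *c, int id)
{
  for (size_t i = 0; c && i < c->n_entries; i++) {
    if (c->entries[i].id == id)
      return &c->entries[i];
  }
  return NULL;
}

/* Replace *current with new_c (which may be NULL). */
void
sketch_replace_consensus(sketch_consensus_t **current,
                         sketch_consensus_t *new_c,
                         sketch_node_t *nodes, size_t n_nodes)
{
  assert(current != NULL);
  assert(nodes != NULL || n_nodes == 0);
  sketch_consensus_t *old = *current;

  /* Step 1: repoint every node while the old consensus is still alive. */
  for (size_t i = 0; i < n_nodes; i++)
    nodes[i].rs = sketch_consensus_find(new_c, nodes[i].id);

  /* Step 2: publish the new consensus. */
  *current = new_c;

  /* Step 3: only now release the old one; nothing points into it. */
  if (old && old != new_c) {
    free(old->entries);
    free(old);
  }
}
```

Any helper called during steps 1 and 2 sees either the old entry or the new one, never a freed one.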

comment:26 Changed 11 months ago by nickm

(Merged the patch above to 0.2.8 and master.)

comment:27 Changed 11 months ago by nickm

I've created #20191 to track the "make this code safer" task.

comment:28 Changed 11 months ago by nickm

Actual Points: 1
Keywords: TorCoreTeam201609 added
Milestone: Tor: 0.2.9.x-final → Tor: 0.2.8.x-final
Owner: set to nickm
Status: needs_review → accepted

comment:29 Changed 11 months ago by nickm

Resolution: fixed
Status: accepted → closed