Opened 10 years ago

Closed 9 years ago

Last modified 7 years ago

#1141 closed defect (fixed)

bug in networkstatus.c using 0.2.1.20

Reported by: mikeb Owned by:
Priority: Low Milestone: Tor: 0.2.2.x-final
Component: Core Tor/Tor Version: 0.2.1.20
Severity: Keywords:
Cc: mikeb, Sebastian, nickm, arma Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description (last modified by nickm)

Here is the config from the Tor client where the crash is happening:

TestingTorNetwork 1
DirServer bridgedirauth v3ident=3C1DA30FB6C8760E43C89ED36A700F429C5763E5 orport=443 172.16.234.136:995 B8F4F6B623E787C47A70111E7E5DE4A53A7A1840
DataDirectory /root/.tor/
UseBridges 1
Bridge 172.16.234.136:443
ClientOnly 1

Here is the output from Tor when the bug happens:

(gdb) run -f torrc log info
Starting program: /root/tor-0.2.1.20/src/or/tor -f torrc log info
Oct 31 19:18:12.716 [notice] Tor v0.2.1.20. This is experimental software. Do not rely on it for strong anonymity. (Running on OpenBSD amd64)
Oct 31 19:18:12.720 [warn] You have used DirServer or AlternateDirAuthority to specify alternate directory authorities in your configuration. This is potentially dangerous: it can make you look different from all other Tor users, and hurt your anonymity. Even if you've specified the same authorities as Tor uses by default, the defaults could change in the future. Be sure you know what you're doing.
Oct 31 19:18:12.720 [warn] TestingTorNetwork is set. This will make your node almost unusable in the public Tor network, and is therefore only advised if you are building a testing Tor network!
Oct 31 19:18:12.721 [notice] Initialized libevent version 1.3e using method kqueue. Good.
Oct 31 19:18:12.721 [notice] Opening Socks listener on 127.0.0.1:9050
Oct 31 19:18:12.723 [info] tor_lockfile_lock(): Locking "/root/.torlock"
Oct 31 19:18:12.723 [info] or_state_load(): Loaded state from "/root/.tor
state"
Oct 31 19:18:12.724 [info] read_file_to_str(): Could not open "/root/.torrouter-stability": No such file or directory
Oct 31 19:18:12.725 [info] geoip_load_file(): Failed to open GEOIP file /usr/local/share/tor/geoip.
Oct 31 19:18:12.725 [warn] You are running Tor as root. You don't need to, and you probably shouldn't.
Oct 31 19:18:12.744 [info] crypto_seed_rng(): Seeding RNG from "/dev/srandom"
Oct 31 19:18:12.798 [info] Bootstrapped 0%: Starting.
Oct 31 19:18:12.800 [info] trusted_dirs_load_certs_from_string(): Adding cached certificate for directory authority bridgedirauth with signing key F554586AA4C6DFB571A33EEBA89D9D48A94B618B
Oct 31 19:18:12.800 [info] read_file_to_str(): Could not open "/root/.tor
cached-consensus": No such file or directory
Oct 31 19:18:12.800 [info] read_file_to_str(): Could not open "/root/.torunverified-consensus": No such file or directory
Oct 31 19:18:12.800 [info] read_file_to_str(): Could not open "/usr/local/share/tor/fallback-consensus": No such file or directory
Oct 31 19:18:12.801 [info] should_delay_dir_fetches(): delaying dir fetches (no running bridges known)
Oct 31 19:18:12.801 [info] router_load_routers_from_string(): 1 elements to add
Oct 31 19:18:12.801 [info] add_an_entry_guard(): Chose 'bridgedirauth' as new entry guard.
Oct 31 19:18:12.802 [info] log_entry_guards(): bridgedirauth (down never-contacted)
Oct 31 19:18:12.802 [notice] new bridge descriptor 'bridgedirauth' (cached)
Oct 31 19:18:12.802 [info] update_consensus_networkstatus_downloads(): Launching networkstatus consensus download.
Oct 31 19:18:12.802 [info] connection_ap_make_link(): Making internal direct tunnel to [scrubbed]:443 ...
Oct 31 19:18:12.803 [info] onion_pick_cpath_exit(): Using requested exit node 'bridgedirauth'
Oct 31 19:18:12.803 [info] circuit_handle_first_hop(): Next router is [scrubbed]: Router not connected (nothing is). Connecting.
Oct 31 19:18:12.803 [notice] Bootstrapped 5%: Connecting to directory server.
Oct 31 19:18:12.803 [info] get_interface_address6(): connect() failed: Invalid argument
Oct 31 19:18:12.804 [info] get_interface_address6(): connect() failed: Invalid argument
Oct 31 19:18:12.804 [info] connection_ap_make_link(): ... application connection created and linked.
Oct 31 19:18:12.804 [info] directory_send_command(): Downloading consensus from 172.16.234.136:443 using /tor/status-vote/current/consensus.z
Oct 31 19:18:12.805 [info] router_load_routers_from_string(): 0 elements to add
Oct 31 19:18:12.805 [info] tor_mmap_file(): Could not open "/root/.tor
cached-extrainfo" for mmap(): No such file or directory
Oct 31 19:18:12.805 [notice] I learned some more directory information, but not enough to build a circuit: We have no network-status consensus.
Oct 31 19:18:12.806 [info] connection_ap_make_link(): Making internal direct tunnel to [scrubbed]:443 ...
Oct 31 19:18:12.806 [info] onion_pick_cpath_exit(): Using requested exit node '0000000000000000000000000000000000000000'
Oct 31 19:18:12.806 [info] circuit_handle_first_hop(): Next router is [scrubbed]: Not connected. Connecting.
Oct 31 19:18:12.806 [info] get_interface_address6(): connect() failed: Invalid argument
Oct 31 19:18:12.807 [info] get_interface_address6(): connect() failed: Invalid argument
Oct 31 19:18:12.807 [info] connection_ap_make_link(): ... application connection created and linked.
Oct 31 19:18:12.809 [info] or_state_save(): Saved state to "/root/.torstate"
Oct 31 19:18:12.809 [info] connection_edge_process_inbuf(): data from edge while in 'waiting for circuit' state. Leaving it on buffer.
Oct 31 19:18:12.810 [info] connection_edge_process_inbuf(): data from edge while in 'waiting for circuit' state. Leaving it on buffer.
Oct 31 19:18:12.810 [info] connection_edge_process_inbuf(): data from edge while in 'waiting for circuit' state. Leaving it on buffer.
Oct 31 19:18:12.810 [info] connection_edge_process_inbuf(): data from edge while in 'waiting for circuit' state. Leaving it on buffer.
Oct 31 19:18:12.810 [notice] Bootstrapped 10%: Finishing handshake with directory server.
Oct 31 19:18:12.866 [info] connection_or_check_valid_tls_handshake(): Connected to router $B8F4F6B623E787C47A70111E7E5DE4A53A7A1840 at 172.16.234.136:443 without knowing its key. Hoping for the best.
Oct 31 19:18:12.867 [info] command_process_versions_cell(): Negotiated version 2 with [scrubbed]:443; sending NETINFO.
Oct 31 19:18:12.868 [notice] Bootstrapped 15%: Establishing an encrypted directory connection.
Oct 31 19:18:12.868 [info] circuit_send_next_onion_skin(): First hop: finished sending CREATE_FAST cell to 'bridgedirauth'
Oct 31 19:18:12.868 [info] circuit_send_next_onion_skin(): First hop: finished sending CREATE_FAST cell to 'bridgedirauth'
Oct 31 19:18:12.868 [info] command_process_netinfo_cell(): Got good NETINFO cell from [scrubbed]:443; OR connection is now open, using protocol version 2
Oct 31 19:18:12.869 [info] command_process_versions_cell(): Negotiated version 2 with [scrubbed]:443; sending NETINFO.
Oct 31 19:18:12.869 [info] command_process_netinfo_cell(): Got good NETINFO cell from [scrubbed]:443; OR connection is now open, using protocol version 2
Oct 31 19:18:12.910 [info] circuit_finish_handshake(): Finished building fast circuit hop:
Oct 31 19:18:12.910 [info] exit circ (length 1, exit bridgedirauth): $B8F4F6B623E787C47A70111E7E5DE4A53A7A1840(open)
Oct 31 19:18:12.910 [info] circuit_send_next_onion_skin(): circuit built!
Oct 31 19:18:12.910 [notice] Bootstrapped 20%: Asking for networkstatus consensus.
Oct 31 19:18:12.910 [info] exit circ (length 1): $B8F4F6B623E787C47A70111E7E5DE4A53A7A1840(open)
Oct 31 19:18:12.911 [info] connection_ap_handshake_send_begin(): Address/port sent, ap socket -1, n_circ_id 23920
Oct 31 19:18:12.911 [info] circuit_finish_handshake(): Finished building fast circuit hop:
Oct 31 19:18:12.911 [info] exit circ (length 1, exit 0000000000000000000000000000000000000000): $0000000000000000000000000000000000000000(open)
Oct 31 19:18:12.912 [info] circuit_send_next_onion_skin(): circuit built!
Oct 31 19:18:12.912 [info] exit circ (length 1): $0000000000000000000000000000000000000000(open)
Oct 31 19:18:12.912 [info] connection_ap_handshake_send_begin(): Address/port sent, ap socket -1, n_circ_id 23921
Oct 31 19:18:12.913 [info] connection_edge_process_relay_cell_not_open(): 'connected' received after 0 seconds.
Oct 31 19:18:12.914 [info] exit circ (length 1): $B8F4F6B623E787C47A70111E7E5DE4A53A7A1840(open)
Oct 31 19:18:12.914 [notice] Bootstrapped 25%: Loading networkstatus consensus.
Oct 31 19:18:12.915 [info] connection_edge_process_relay_cell_not_open(): 'connected' received after 0 seconds.
Oct 31 19:18:12.915 [info] exit circ (length 1): $0000000000000000000000000000000000000000(open)
Oct 31 19:18:12.915 [notice] Bootstrapped 50%: Loading relay descriptors.
Oct 31 19:18:12.916 [info] connection_edge_process_relay_cell(): -1: end cell (closed normally) for stream 44628. Removing stream.
Oct 31 19:18:12.917 [info] _connection_free(): Freeing linked Socks connection [open] with 0 bytes on inbuf, 0 on outbuf.
Oct 31 19:18:12.918 [info] connection_dir_client_reached_eof(): Received consensus directory (size 1367) from server '172.16.234.136:443'
Oct 31 19:18:12.918 [info] 0 unknown, 0 missing key, 1 good, 0 bad, 0 no signature, 1 required
Oct 31 19:18:12.919 [err] Bug: networkstatus.c:1164: update_consensus_networkstatus_fetch_time: Assertion start+dl_interval < c->valid_until failed; aborting.
networkstatus.c:1164 update_consensus_networkstatus_fetch_time: Assertion start+dl_interval < c->valid_until failed; aborting.

Here is the backtrace from this crashed process:

(gdb) backtrace
#0 0x000000020c37137a in kill () from /usr/lib/libc.so.51.0
#1 0x000000020c3bd765 in abort () at /usr/src/lib/libc/stdlib/abort.c:68
#2 0x000000000045cb81 in update_consensus_networkstatus_fetch_time (now=1257016692) at networkstatus.c:1183
#3 0x000000000045d29d in networkstatus_set_current_consensus (

consensus=0x20f769800 "network-status-version 3\nvote-status consensus\nconsensus-method 5\nvalid-after 2009-10-31 19:15:00\nfresh-until 2009-10-31 19:20:00\nvalid-until 2009-10-31 19:30:00\nvoting-delay 20 20\nclient-versions \nse"..., flags=0) at networkstatus.c:1527

#4 0x000000000043feac in connection_dir_client_reached_eof (conn=0x2088f0d00) at directory.c:1638
#5 0x000000000044061e in connection_dir_reached_eof (conn=0x2088f0d00) at directory.c:2040
#6 0x000000000042626f in connection_handle_read (conn=0x2088f0d00) at connection.c:2037
#7 0x0000000000457b91 in conn_read_callback (fd=8833, event=6, _conn=0x37b7) at main.c:456
#8 0x000000020377f2e2 in event_process_active (base=0x201cfdc00) at /usr/src/lib/libevent/event.c:333
#9 0x000000020377f501 in event_base_loop (base=0x201cfdc00, flags=1) at /usr/src/lib/libevent/event.c:450
#10 0x000000020377f3b5 in event_loop (flags=8833) at /usr/src/lib/libevent/event.c:383
#11 0x0000000000459726 in do_main_loop () at main.c:1444
#12 0x000000000045a88b in tor_main (argc=5, argv=0x7f7fffff1850) at main.c:2070
#13 0x000000000040702c in _start ()
#14 0x0000000000000005 in ?? ()
#15 0x00007f7fffff1d00 in ?? ()
#16 0x00007f7fffff1d1e in ?? ()
#17 0x00007f7fffff1d21 in ?? ()
#18 0x00007f7fffff1d27 in ?? ()
#19 0x00007f7fffff1d2b in ?? ()
#20 0x0000000000000000 in ?? ()

The configs used for the servers in this network can be found here: http://pastebin.com/m65b5da00

I'd also be keen to hear if the configs I have for such a network actually make sense, and that I haven't forgotten something crucial when testing Tor.

Thanks

mikeb

[Automatically added by flyspray2trac: Operating System: All]

Child Tickets

Change History (10)

comment:1 Changed 10 years ago by mikeb

This crash doesn't always seem to happen.
Leaving all servers up for a while (yeah, scientific I know) and then starting the client again the crash doesn't happen. Perhaps there is a clue there for someone who knows something about Tor.

comment:2 Changed 9 years ago by nickm

Milestone: Tor: 0.2.2.x-final

If we can't figure out what's up here in time for 0.2.2.x, we should at least add better logging in the case where the assert is failing.

comment:3 Changed 9 years ago by mwenge

Assuming that the values reported in the trace are sane:
valid-after 2009-10-31 19:15:00
fresh-until 2009-10-31 19:20:00
valid-until 2009-10-31 19:30:00

then I think this is failing in the directory_fetches_dir_info_later(options) clause below:

start = c->fresh_until + (interval*3)/4;
/* But download the next one well before this one is expired. */
dl_interval = ((c->valid_until - start) * 7 )/ 8;

/* If we're a bridge user, make use of the numbers we just computed

  • to choose the rest of the interval *after* them. */

if (directory_fetches_dir_info_later(options)) {

/* Give all the *clients* enough time to download the consensus. */
start = start + dl_interval + CONSENSUS_MIN_SECONDS_BEFORE_CACHING;
/* But try to get it before ours actually expires. */
dl_interval = (c->valid_until - start) -

CONSENSUS_MIN_SECONDS_BEFORE_CACHING;

}

So:

valid-after 2009-10-31 19:15:00 - 0 sec
fresh-until 2009-10-31 19:20:00 - +300 sec
valid-until 2009-10-31 19:30:00 - +900 sec

Once we reach:

start = start + dl_interval + CONSENSUS_MIN_SECONDS_BEFORE_CACHING

start is +525 sec:

long interval = c->fresh_until - c->valid_after; ~ 300 sec
start = c->fresh_until + (interval*3)/4; ~ +525 sec

and dl_interval is 328 sec

dl_interval = ((c->valid_until - start) * 7 )/ 8; ~ (900 - 525 = 375) *7 / 8

so start is now 853 sec + CONSENSUS_MIN_SECONDS_BEFORE_CACHING sec = +973 sec

dl_interval = (c->valid_until - start) -

CONSENSUS_MIN_SECONDS_BEFORE_CACHING;

is 900 - 973 - 120 = -193 sec, so dl_interval gets set to 1:

if (dl_interval < 1)

dl_interval = 1;

so

tor_assert(start+dl_interval < c->valid_until);

fails because 974 is not less than 900.

I'm probably falling down somewhere here - but I can't see it so pressing submit!

comment:4 Changed 9 years ago by nickm

Description: modified (diff)

Hm. So it looks like the fix is to make it so that CONSENSUS_MIN_SECONDS_BEFORE_CACHEING gets scaled down on private networks with extremely low voting intervals.

mwenge, would you agree?

comment:5 in reply to:  4 Changed 9 years ago by mwenge

Replying to nickm:

Hm. So it looks like the fix is to make it so that CONSENSUS_MIN_SECONDS_BEFORE_CACHEING gets scaled down on private networks with extremely low voting intervals.

mwenge, would you agree?

I wasn't sure if those voting intervals were normal or not, but yes - 120 seconds is too high if the total valid time of the consensus is fifteen minutes.

120 seconds must be predicated on an assumed normal interval so I guess if that normal interval is specified somewhere CONSENSUS_MIN_SECONDS_BEFORE_CACHEING could ensure it is aligned with the actual interval appropriately.

comment:6 Changed 9 years ago by nickm

Status: newneeds_review

So I've got a possible fix for this (as a patch against maint-0.2.1) in my branch "bug1141" in my public repository.

comment:7 Changed 9 years ago by arma

Looks fine. Except a) the bugfix probably isn't on 2.0.8-alpha

b) this won't merge cleanly with master, since we added the FetchDirInfoExtraEarly config option:

      start = c->fresh_until + CONSENSUS_MIN_SECONDS_BEFORE_CACHING;
      /* Some clients may need the consensus sooner than others. */
      if (options->FetchDirInfoExtraEarly) {
        dl_interval = 60;
      } else {
        /* But only in the first half-interval after that. */
        dl_interval = interval/2;
      } 

So we should figure out what fraction we want to use in that case. Or I guess we could declare that that config option is not compatible with really short voting intervals.

comment:8 Changed 9 years ago by nickm

Tweaked and rebased as branch bug1141_v2. Looks better now?

comment:9 Changed 9 years ago by nickm

Resolution: Nonefixed
Status: needs_reviewclosed

Merged in maint-0.2.1 and master.

comment:10 Changed 7 years ago by nickm

Component: Tor ClientTor
Note: See TracTickets for help on using tickets.