Opened 6 weeks ago

Closed 4 weeks ago

Last modified 9 days ago

#21242 closed defect (fixed)

Start-up error [onion_extend_cpath: Assertion info || client failed; aborting]

Reported by: Felixix Owned by: nickm
Priority: Medium Milestone: Tor: 0.3.0.x-final
Component: Core Tor/Tor Version: Tor: 0.3.0.1-alpha
Severity: Normal Keywords:
Cc: Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

The following error message came up after reboot and start of multiple 0301 instances. Only one error, the others were fine. Since restart this instance works well.

Jan 15 08:58:45.000 [notice] Tor 0.3.0.1-alpha (git-ac04fcd2e758c225) opening log file.
Jan 15 08:58:45.960 [notice] Tor 0.3.0.1-alpha (git-ac04fcd2e758c225) running on FreeBSD with Libevent 2.0.22-stable, OpenSSL LibreSSL 2.4.4 and Zlib 1.2.8.
Jan 15 08:58:45.960 [notice] Tor can't help you if you use it wrong! Learn how to be safe at https://www.torproject.org/download/download#warning
Jan 15 08:58:45.960 [notice] This version is not a stable Tor release. Expect more bugs than usual.
Jan 15 08:58:45.971 [notice] Read configuration file "/usr/local/etc/tor/torrc".
Jan 15 08:58:45.977 [notice] Opening OR listener on lo.c.al.jail:1234
Jan 15 08:58:45.977 [notice] Opening Directory listener on lo.c.al.jail:5678
Jan 15 08:58:45.000 [notice] Parsing GEOIP IPv4 file /usr/local/share/tor/geoip.
Jan 15 08:58:46.000 [notice] Parsing GEOIP IPv6 file /usr/local/share/tor/geoip6.
Jan 15 08:58:46.000 [notice] Configured to measure statistics. Look for the *-stats files that will first be written to the data directory in 24 hours from now.
Jan 15 08:58:46.000 [notice] We were built to run on a 64-bit CPU, with OpenSSL 1.0.1 or later, but with a version of OpenSSL that apparently lacks accelerated support for the NIST P-224 and P-256 groups. Building openssl with such support (using the enable-ec_nistp_64_gcc_128 option when configuring it) would make ECDH much faster.
Jan 15 08:58:46.000 [notice] Your Tor server's identity key fingerprint is 'BeastieJoy62 A2E6BB5C391CD46B38C55B4329C35304540771F1'
Jan 15 08:58:46.000 [notice] Bootstrapped 0%: Starting
Jan 15 08:58:47.000 [notice] I learned some more directory information, but not enough to build a circuit: We have no usable consensus.
Jan 15 08:58:47.000 [notice] Bootstrapped 45%: Asking for relay descriptors
Jan 15 08:58:47.000 [notice] I learned some more directory information, but not enough to build a circuit: We need more microdescriptors: we have 0/7162, and can only build 0% of likely paths. (We have 0% of guards bw, 0% of midpoint bw, and 0% of exit bw = 0% of path bw.)
Jan 15 08:58:48.000 [notice] Bootstrapped 50%: Loading relay descriptors
Jan 15 08:58:48.000 [notice] Bootstrapped 55%: Loading relay descriptors
Jan 15 08:58:48.000 [notice] Bootstrapped 60%: Loading relay descriptors
Jan 15 08:58:49.000 [notice] Bootstrapped 65%: Loading relay descriptors
Jan 15 08:58:49.000 [notice] Bootstrapped 70%: Loading relay descriptors
Jan 15 08:58:49.000 [notice] Bootstrapped 75%: Loading relay descriptors
Jan 15 08:58:49.000 [notice] Bootstrapped 80%: Connecting to the Tor network
Jan 15 08:58:49.000 [notice] Starting with guard context "default"
Jan 15 08:58:49.000 [err] tor_assertion_failed_: Bug: src/or/circuitbuild.c:2395: onion_extend_cpath: Assertion info || client failed; aborting. (on Tor 0.3.0.1-alpha ac04fcd2e758c225)
Jan 15 08:58:49.000 [err] Bug: Assertion info || client failed in onion_extend_cpath at src/or/circuitbuild.c:2395. Stack trace: (on Tor 0.3.0.1-alpha ac04fcd2e758c225)
Jan 15 08:58:49.000 [err] Bug:     0x117ed78 <log_backtrace+0x48> at /usr/local/bin/tor (on Tor 0.3.0.1-alpha ac04fcd2e758c225)
Jan 15 08:58:49.000 [err] Bug:     0x1196ca7 <tor_assertion_failed_+0x97> at /usr/local/bin/tor (on Tor 0.3.0.1-alpha ac04fcd2e758c225)
Jan 15 08:58:49.000 [err] Bug:     0x10efdb6 <circuit_establish_circuit+0x11c6> at /usr/local/bin/tor (on Tor 0.3.0.1-alpha ac04fcd2e758c225)
Jan 15 08:58:49.000 [err] Bug:     0x1103a28 <circuit_launch_by_extend_info+0x1d8> at /usr/local/bin/tor (on Tor 0.3.0.1-alpha ac04fcd2e758c225)
Jan 15 08:58:49.000 [err] Bug:     0x1102805 <circuit_build_needed_circs+0x325> at /usr/local/bin/tor (on Tor 0.3.0.1-alpha ac04fcd2e758c225)
Jan 15 08:58:49.000 [err] Bug:     0x106a9ea <handle_signals+0x4fa> at /usr/local/bin/tor (on Tor 0.3.0.1-alpha ac04fcd2e758c225)
Jan 15 08:58:49.000 [err] Bug:     0x801ae63be <event_base_loop+0x81e> at /usr/local/lib/libevent-2.0.so.5 (on Tor 0.3.0.1-alpha ac04fcd2e758c225)
Jan 15 08:58:49.000 [err] Bug:     0x106a331 <do_main_loop+0x621> at /usr/local/bin/tor (on Tor 0.3.0.1-alpha ac04fcd2e758c225)
Jan 15 08:58:49.000 [err] Bug:     0x106c704 <tor_main+0xe4> at /usr/local/bin/tor (on Tor 0.3.0.1-alpha ac04fcd2e758c225)
Jan 15 08:58:49.000 [err] Bug:     0x1068089 <main+0x19> at /usr/local/bin/tor (on Tor 0.3.0.1-alpha ac04fcd2e758c225)
Jan 15 08:58:49.000 [err] Bug:     0x1067f81 <_start+0x1a1> at /usr/local/bin/tor (on Tor 0.3.0.1-alpha ac04fcd2e758c225) 

--
Cheers, Felix

Child Tickets

Change History (20)

comment:1 Changed 6 weeks ago by cypherpunks

  • Component changed from - Select a component to Core Tor/Tor

comment:2 Changed 6 weeks ago by nickm

  • Milestone set to Tor: 0.3.0.x-final

comment:3 Changed 6 weeks ago by nickm

Can you share the contents of your configuration for this tor instance?

comment:4 Changed 6 weeks ago by dgoulet

  • Status changed from new to needs_information

comment:5 Changed 6 weeks ago by Felixix

The torrc file:

SocksPort 0
RunAsDaemon 1
Address ex.ter.na.l
ORPort ex.ter.na.l:443 NoListen
ORPort lo.c.al.jail:1234 NoAdvertise
DirPort ex.ter.na.l:993 NoListen
DirPort lo.c.al.jail:5678 NoAdvertise
Nickname ...
RelayBandwidthRate  15000 KB
RelayBandwidthBurst 50000 KB
ContactInfo ...
MyFamily ...
ExitPolicy reject *:*
HeartbeatPeriod 3600 seconds
#----AvoidDiskWrites 1
NUMCPUs 2
#----HardwareAccel 1
MaxMemInQueues 2 GB
#----DisableNetwork 1
Log notice file /var/log/tor/tor
EntryStatistics 1
ConnDirectionStatistics 1
#----DirPortFrontPage /var/mypage.html
ExtraInfoStatistics 1
#----DownloadExtraInfo 1

More?

comment:6 Changed 5 weeks ago by dgoulet

  • Status changed from needs_information to new

comment:7 Changed 4 weeks ago by nickm

  • Owner set to asn
  • Status changed from new to assigned

comment:8 Changed 4 weeks ago by asn

Thanks for the bug report.

I did some digging here to see what's going on. Unfortunately, I think we are missing info to 100% determine what happened. Some debug/info logs would help, but anyway here is what I found. tl;dr I think we are missing the desc or the md of the node in question, and the guard code does not care.

The code in question is:

  } else if (cur_len == 0) { /* picking first node */
    const node_t *r = choose_good_entry_server(purpose, state,
                                               &circ->guard_state);
    if (r) {
      /* If we're a client, use the preferred address rather than the
         primary address, for potentially connecting to an IPv6 OR
         port. Servers always want the primary (IPv4) address. */
      int client = (server_mode(get_options()) == 0);
      /* ! client */
      info = extend_info_from_node(r, client);
      /* Clients can fail to find an allowed address */
      /* edw to kako */
      tor_assert(info || client);
    }

In this bug report, we are dealing with a relay, so client is false above. Now let's dig into extend_info_from_node() to see what made it return NULL:

Given the fact that there are no warn logs, there are only two return NULL codepaths that we could have taken. Either this:

  if (node->ri == NULL && (node->rs == NULL || node->md == NULL))
    return NULL;

or the final:

  else
    return NULL;

IIUC, in the latter case we know that valid_addr was set (otherwise we would get warn), so we can only reach the above two codepaths if we are missing some crucial information about the node: either node->ri or node->rs or node->md.

So my theory is that perhaps the new guard node code is not making sure that nodes have descriptors or microdescriptors? I see that in the legacy guard code we used to do this in entry_is_live:

  if (need_descriptor && !node_has_descriptor(node)) {
    *msg = "no descriptor";
    return NULL;
  }

but we are not doing this anywhere in our current guard code. Could it be that this assert occurs when we are missing the descriptor of a node that is in the consensus (and we are in server mode)?

comment:9 Changed 4 weeks ago by nickm

Hmmm. That does seem like a sensible analysis. I'll work through the

If we are using a guard in that branch, it came from choose_good_entry_server(), which got it from guards_choose_guard() (I assume. Unless this isn't using a guard, in which case it called router_choose_random_node().)

Now, guards_choose_guard() calls entry_guard_pick_for_circuit(), which *should* yield a yields a negative result (and leave its node_t ** argument un-set) whenever node_get_by_id() fails. So we know that node_get_by_id() returned a node_t for this node. So either we have a descriptor for it, or we have a routerstatus entry for it. Let's assume we have a routerstatus entry but no descriptor.

comment:10 follow-up: Changed 4 weeks ago by nickm

  • Owner changed from asn to nickm

Okay, I *think* the right fix is something like what's in my bug21242 branch. Things get a little tricky here, since it's okay to select a guard with no descriptor when we're using it as a directory guard.

comment:11 Changed 4 weeks ago by nickm

  • Status changed from assigned to needs_review

comment:12 Changed 4 weeks ago by nickm

  • Summary changed from Start-up error to Start-up error [onion_extend_cpath: Assertion info || client failed; aborting]

comment:13 Changed 4 weeks ago by asn

  • Status changed from needs_review to merge_ready

The patch here looks good to me.

Two fine points:

a) Do you think the "wait till all primaries guard have descriptors" might delay bootup? I don't think so. If we are afraid of that perhaps we can switch logic to "wait till the first top maybe-reachable primary guard has descriptor".

b) Should these two new behaviors be mentioned in guard-spec.txt? I see that the spec mentions nothing about router descriptors right now.

Other than the above two comments, I'm switching this to merge_ready for the sake of progress.

comment:14 Changed 4 weeks ago by nickm

Thanks for the fast review!

a) Do you think the "wait till all primaries guard have descriptors" might delay bootup? I don't think so. If we are afraid of that perhaps we can switch logic to "wait till the first top maybe-reachable primary guard has descriptor".

I don't think so either; there should be no delay unless you happen to get unlucky in your timing of which descriptors download when. (Or if a hostile directory guard or fallback decides not to send you some descriptors. But in that case you probably do want to wait for them.)

Should these two new behaviors be mentioned in guard-spec.txt?

Yes! I'll make a patch for that too.

comment:15 follow-up: Changed 4 weeks ago by nickm

Should these two new behaviors be mentioned in guard-spec.txt?

Yes! I'll make a patch for that too.

I now have a feature21242 torspec patch as well.

comment:16 in reply to: ↑ 15 Changed 4 weeks ago by asn

Replying to nickm:

Should these two new behaviors be mentioned in guard-spec.txt?

Yes! I'll make a patch for that too.

I now have a feature21242 torspec patch as well.

Nice! LGTM.

Let's get this merged.

comment:17 Changed 4 weeks ago by nickm

  • Resolution set to fixed
  • Status changed from merge_ready to closed

woo, merged!

comment:18 in reply to: ↑ 10 ; follow-up: Changed 4 weeks ago by teor

Replying to nickm:

Okay, I *think* the right fix is something like what's in my bug21242 branch. Things get a little tricky here, since it's okay to select a guard with no descriptor when we're using it as a directory guard.

Actually, the rule is even more complex: it's okay to select a guard with no descriptor when we're using it as a directory guard, unless we're IPv6-only and using microdescriptors, in which case, we need the entry guards' microdescriptors from fallback directory mirrors in order to know their IPv6 addresses.

comment:19 in reply to: ↑ 18 Changed 4 weeks ago by teor

Replying to teor:

Replying to nickm:

Okay, I *think* the right fix is something like what's in my bug21242 branch. Things get a little tricky here, since it's okay to select a guard with no descriptor when we're using it as a directory guard.

Actually, the rule is even more complex: it's okay to select a guard with no descriptor when we're using it as a directory guard, unless we're IPv6-only and using microdescriptors, in which case, we need the entry guards' microdescriptors from fallback directory mirrors in order to know their IPv6 addresses.

(But in that case, we use hard-coded fallbacks, not this guard code.)

93b39c5162 should have made the other tor_assert(info); in the function non-fatal as well. Opened #21372 for this.

comment:20 Changed 9 days ago by asn

Received report from s7r that his Tor client delayed booting for 1 minute because of "missing primary guard descriptor". We should keep an eye out for similar issues.

Note: See TracTickets for help on using tickets.