Opened 5 months ago

Closed 3 months ago

#30871 closed defect (fixed)

circuit_mark_for_close_(): Bug: Duplicate call to circuit_mark_for_close at ../src/feature/hs/hs_service.c:3166 (first at ../src/feature/hs/hs_service.c:2385)

Reported by: s7r Owned by: dgoulet
Priority: Medium Milestone: Tor: 0.4.0.x-final
Component: Core Tor/Tor Version: Tor: 0.4.1.2-alpha
Severity: Normal Keywords: consider-backport-after-0415, tor-hs double-mark 035-backport 040-backport 041-backport
Cc: asn Actual Points: 0.4
Parent ID: Points:
Reviewer: asn Sponsor: Sponsor27

Description

I experienced this today after turning on a machine that had been powered off while it was physically moved. The OS is Debian and Tor is set up with systemctl to start on boot, so it's a normal, boring out-of-the-box setup. The internet connection was fully working while the machine was booting up, and of course the cable and correct network settings were in place before it was powered on.

The Tor instance hosts 2 onion services, one v2 and one v3.

Jun 12 11:05:50.000 [notice] Tor 0.4.1.2-alpha-dev opening log file.
Jun 12 11:05:50.086 [notice] Tor 0.4.1.2-alpha-dev running on Linux with Libevent 2.0.21-stable, OpenSSL 1.1.0j, Zlib 1.2.8, Liblzma 5.2.2, and Libzstd 1.1.2.
Jun 12 11:05:50.086 [notice] Tor can't help you if you use it wrong! Learn how to be safe at https://www.torproject.org/download/download#warning
Jun 12 11:05:50.086 [notice] This version is not a stable Tor release. Expect more bugs than usual.
Jun 12 11:05:50.086 [notice] Read configuration file "/usr/share/tor/tor-service-defaults-torrc".
Jun 12 11:05:50.086 [notice] Read configuration file "/etc/tor/torrc".
Jun 12 11:05:50.089 [notice] Opening Socks listener on 127.0.0.1:9050
Jun 12 11:05:50.089 [notice] Opened Socks listener on 127.0.0.1:9050
Jun 12 11:05:50.000 [notice] Parsing GEOIP IPv4 file /usr/share/tor/geoip.
Jun 12 11:05:50.000 [notice] Parsing GEOIP IPv6 file /usr/share/tor/geoip6.
Jun 12 11:05:50.000 [notice] Bootstrapped 0% (starting): Starting
Jun 12 11:05:50.000 [notice] Starting with guard context "default"
Jun 12 11:05:50.000 [notice] Signaled readiness to systemd
Jun 12 11:05:51.000 [notice] Opening Control listener on /run/tor/control
Jun 12 11:05:51.000 [notice] Opened Control listener on /run/tor/control
Jun 12 11:05:51.000 [notice] Bootstrapped 5% (conn): Connecting to a relay
Jun 12 11:05:52.000 [warn] Problem bootstrapping. Stuck at 5% (conn): Connecting to a relay. (No route to host; NOROUTE; count 1; recommendation warn; host E9423220AAE845EE4B693A7C4235787C05D56280 at 185.117.82.23:9001)
Jun 12 11:05:52.000 [warn] Problem bootstrapping. Stuck at 5% (conn): Connecting to a relay. (No route to host; NOROUTE; count 2; recommendation warn; host 6B9EA8927AB6E94E216067E65372182343A5AFA2 at 62.210.83.207:8080)
Jun 12 11:05:52.000 [warn] 1 connections have failed:
Jun 12 11:05:52.000 [warn]  1 connections died in state connect()ing with SSL state (No SSL object)
Jun 12 11:05:55.000 [warn] Problem bootstrapping. Stuck at 5% (conn): Connecting to a relay. (No route to host; NOROUTE; count 3; recommendation warn; host 96CAA917F65BCD62CECD236F67652BFD7C69E52E at 104.37.193.102:443)
Jun 12 11:05:55.000 [warn] 2 connections have failed:
Jun 12 11:05:55.000 [warn]  2 connections died in state connect()ing with SSL state (No SSL object)
Jun 12 11:05:55.000 [warn] Problem bootstrapping. Stuck at 5% (conn): Connecting to a relay. (No route to host; NOROUTE; count 4; recommendation warn; host 73C62311F5650010A4D32E09F141A0B574EFCAF5 at 51.38.140.87:9001)
Jun 12 11:05:55.000 [warn] 3 connections have failed:
Jun 12 11:05:55.000 [warn]  3 connections died in state connect()ing with SSL state (No SSL object)
Jun 12 11:05:58.000 [warn] Problem bootstrapping. Stuck at 5% (conn): Connecting to a relay. (No route to host; NOROUTE; count 5; recommendation warn; host 2E23B75B9D9AB7B86D1D5AB1C9B6B30BA0E84E3E at 148.251.51.66:9001)
Jun 12 11:05:58.000 [warn] 4 connections have failed:
Jun 12 11:05:58.000 [warn]  4 connections died in state connect()ing with SSL state (No SSL object)
Jun 12 11:05:58.000 [warn] Problem bootstrapping. Stuck at 5% (conn): Connecting to a relay. (No route to host; NOROUTE; count 6; recommendation warn; host 7BE683E65D48141321C5ED92F075C55364AC7123 at 193.23.244.244:443)
Jun 12 11:05:58.000 [warn] 5 connections have failed:
Jun 12 11:05:58.000 [warn]  5 connections died in state connect()ing with SSL state (No SSL object)
Jun 12 11:06:01.000 [warn] Problem bootstrapping. Stuck at 5% (conn): Connecting to a relay. (No route to host; NOROUTE; count 7; recommendation warn; host 268216C455A8322E07733961A29110110958D1BF at 73.211.181.17:9001)
Jun 12 11:06:01.000 [warn] 6 connections have failed:
Jun 12 11:06:01.000 [warn]  6 connections died in state connect()ing with SSL state (No SSL object)
Jun 12 11:06:06.000 [warn] Problem bootstrapping. Stuck at 5% (conn): Connecting to a relay. (No route to host; NOROUTE; count 8; recommendation warn; host 7EA6EAD6FD83083C538F44038BBFA077587DD755 at 194.109.206.212:443)
Jun 12 11:06:06.000 [warn] 7 connections have failed:
Jun 12 11:06:06.000 [warn]  7 connections died in state connect()ing with SSL state (No SSL object)
Jun 12 11:06:09.000 [warn] Problem bootstrapping. Stuck at 5% (conn): Connecting to a relay. (No route to host; NOROUTE; count 9; recommendation warn; host FD2F9B881AC640100C428DF47DC9A863DC2F2536 at 37.187.17.67:9001)
Jun 12 11:06:09.000 [warn] 8 connections have failed:
Jun 12 11:06:09.000 [warn]  8 connections died in state connect()ing with SSL state (No SSL object)
Jun 12 11:06:13.000 [notice] Bootstrapped 10% (conn_done): Connected to a relay
Jun 12 11:06:13.000 [notice] Bootstrapped 14% (handshake): Handshaking with a relay
Jun 12 11:06:13.000 [notice] Bootstrapped 15% (handshake_done): Handshake with a relay done
Jun 12 11:06:13.000 [notice] Bootstrapped 20% (onehop_create): Establishing an encrypted directory connection
Jun 12 11:06:13.000 [notice] Bootstrapped 25% (requesting_status): Asking for networkstatus consensus
Jun 12 11:06:13.000 [notice] Bootstrapped 30% (loading_status): Loading networkstatus consensus
Jun 12 11:06:14.000 [notice] Bootstrapped 75% (enough_dirinfo): Loaded enough directory info to build circuits
Jun 12 11:06:15.000 [notice] Bootstrapped 90% (ap_handshake_done): Handshake finished with a relay to build circuits
Jun 12 11:06:15.000 [notice] Bootstrapped 95% (circuit_create): Establishing a Tor circuit
Jun 12 11:06:16.000 [notice] Bootstrapped 100% (done): Done
Jun 12 11:06:21.000 [warn] Hidden service <XXX> exceeded launch limit with 10 intro points in the last 31 seconds. Intro circuit launches are limited to 10 per 300 seconds.
Jun 12 11:06:21.000 [warn] Service configured in "/var/lib/tor/<V2 service was here>":
Jun 12 11:06:21.000 [warn]   Intro point 0 at [scrubbed]: circuit is open
Jun 12 11:06:21.000 [warn]   Intro point 1 at [scrubbed]: circuit is open
Jun 12 11:06:21.000 [warn]   Intro point 2 at [scrubbed]: circuit is waiting to see how other guards perform
Jun 12 11:06:21.000 [warn]   Intro point 3 at [scrubbed]: circuit is waiting to see how other guards perform
Jun 12 11:06:21.000 [warn]   Intro point 4 at [scrubbed]: circuit is waiting to see how other guards perform
Jun 12 11:11:19.000 [warn] Unknown introduction point auth key on circuit 3339485114 for service [scrubbed]
Jun 12 11:11:19.000 [warn] circuit_mark_for_close_(): Bug: Duplicate call to circuit_mark_for_close at ../src/feature/hs/hs_service.c:3166 (first at ../src/feature/hs/hs_service.c:2385) (on Tor 0.4.1.2-alpha-dev )
Jun 12 11:11:34.000 [warn] Unknown introduction point auth key on circuit 3672980732 for service [scrubbed]
Jun 12 11:11:34.000 [warn] circuit_mark_for_close_(): Bug: Duplicate call to circuit_mark_for_close at ../src/feature/hs/hs_service.c:3166 (first at ../src/feature/hs/hs_service.c:2385) (on Tor 0.4.1.2-alpha-dev )
Jun 12 12:44:47.000 [warn] Fetching v2 rendezvous descriptor failed. Retrying at another directory.
Jun 12 12:44:49.000 [warn] Fetching v2 rendezvous descriptor failed. Retrying at another directory.
Jun 12 12:44:49.000 [notice] Closing stream for '[scrubbed].onion': hidden service is unavailable (try again later).

The machine was off for more than 24 hours. It had previously been shut down normally, while Tor was running, using sudo poweroff. The consensus it had on disk when it booted was of course no longer up to date. Somehow it wanted to connect to the guards first but could not, even though nothing was blocking it; this is why it reported the bootstrapping problems with NOROUTE. Once it reached Bootstrapped 100% (done), these messages stopped. Either something is broken in how Tor starts up in some scenarios, or the network interface was not brought up by the OS until after the Tor service had started, and only then was it able to bootstrap? I tried to ping and traceroute all the guard IP addresses printed in the log, and they were all reachable from my machine.

The most concerning message is circuit_mark_for_close_(): Bug: Duplicate call to circuit_mark_for_close at ../src/feature/hs/hs_service.c:3166 (first at ../src/feature/hs/hs_service.c:2385) (on Tor 0.4.1.2-alpha-dev ), which needs a fix.

Also, "Unknown introduction point auth key on circuit 3339485114 for service [scrubbed]" is very interesting: I have absolutely no authentication set up for either the v3 or the v2 hidden service.

Child Tickets

Change History (13)

comment:1 Changed 5 months ago by nickm

Keywords: 041-should 041-regression? double-mark added
Milestone: Tor: 0.4.1.x-final

comment:2 Changed 5 months ago by nickm

Owner: set to dgoulet
Status: new → assigned

dgoulet said he'd look at this.

comment:3 Changed 5 months ago by dgoulet

So the HS subsystem (v2 or v3) will not launch introduction circuits until three things are true: we have completed at least one circuit, the network is not disabled, and we have a live consensus:

  if (!have_completed_a_circuit() || net_is_disabled() ||
      networkstatus_get_live_consensus(now) == NULL) {
    goto end;
  }

In _theory_, have_completed_a_circuit() is the check that tells us the Guard is usable, because we were able to complete a full circuit through it.

However, it seems that once those introduction circuits were launched, 3 of them ended up "waiting to see how other guards perform". I bet that is because the circuits went _back_ into the "waiting for guard" state after some circuits opened, which led to this situation. In theory it should be "fine", I believe.

Now onto the main problem, the duplicate close. The warnings above are for the v2 service ... but the duplicate close is for the v3 service, which has no warnings. The "exceeded launch limit" log is at info level for v3...

So roughly 5 minutes after bootstrap, the v3 introduction circuit finally opened, but the introduction point authentication key had disappeared from our state. This is all too familiar (#28970 and #27471) ... Somehow, on both the client and the service side, we lose track of (or clean up) the authentication key of an intro point, leading to these BUG()s.
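
To illustrate the failure mode (a rough sketch only; the function and helper names below are hypothetical, not the actual hs_service.c code), the service-side handling is essentially a lookup of the intro point by its auth key, followed by a mark-for-close when that lookup fails:

  /* Illustrative sketch, not the real hs_service.c code: the names
   * handle_intro_circ_event() and find_intro_point_by_auth_key() are
   * hypothetical. */
  static void
  handle_intro_circ_event(origin_circuit_t *circ,
                          const ed25519_public_key_t *auth_key)
  {
    hs_service_intro_point_t *ip = find_intro_point_by_auth_key(auth_key);
    if (ip == NULL) {
      log_warn(LD_REND, "Unknown introduction point auth key on circuit %u",
               TO_CIRCUIT(circ)->n_circ_id);
      /* If this circuit was already marked for close elsewhere (say, during
       * descriptor rotation cleanup), marking it again here is exactly what
       * produces the "Duplicate call to circuit_mark_for_close" BUG(). */
      circuit_mark_for_close(TO_CIRCUIT(circ), END_CIRC_REASON_INTERNAL);
      return;
    }
    /* ... normal handling of the known intro point ... */
  }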

This duplicate mark-for-close is the hint, I think. It appears we have a code path in tor that can mark an intro circuit for close and then still call the "has opened" callback on that circuit, leading to this double close.

The freeing of the circuit is deferred as a "post main loop" event. I'm wondering if it is possible for this sequence to happen within a _single_ main loop run:

1) An intro circuit is *marked* for close and thus cleaned up from the HS state. This could happen on descriptor rotation or a build timeout, for instance.
2) The "circuit has opened" callback is called for that marked-for-close circuit.
3) The freeing of that circuit is done post main loop.

If that is possible, then (2) should never be run on a circuit that is marked for close.
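
A minimal sketch of the kind of defensive check that would prevent (2) from acting on such a circuit (illustrative only; the function name is hypothetical and this is not the patch that was merged):

  /* Illustrative sketch, not the merged fix: skip "has opened" handling
   * for a circuit that was already marked for close earlier in the same
   * main loop run and will only be freed after the loop. */
  static void
  hs_intro_circ_has_opened(origin_circuit_t *circ)
  {
    if (TO_CIRCUIT(circ)->marked_for_close) {
      /* Marked for close (e.g. on descriptor rotation or build timeout)
       * but not yet freed; do not treat it as a freshly opened circuit. */
      return;
    }
    /* ... proceed with the normal "intro circuit has opened" handling ... */
  }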

comment:4 Changed 5 months ago by gaba

Sponsor: Sponsor27

comment:5 Changed 5 months ago by dgoulet

Keywords: 035-backport 040-backport 041-backport 041-must added; 041-should 041-regression? removed
Milestone: Tor: 0.4.1.x-final → Tor: 0.4.2.x-final
Status: assigned → needs_review

The issue is with entry_guards_upgrade_waiting_circuits(), which also selects circuits that are already marked for close; see the sketch after the pull request links below.

035+: https://github.com/torproject/tor/pull/1117
041: https://github.com/torproject/tor/pull/1118
042: https://github.com/torproject/tor/pull/1119
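
As a rough illustration of the idea behind the fix (a simplified sketch; the list names all_circuits and upgradable_circuits are hypothetical, and this is not the actual patch): when collecting circuits that are waiting on a guard decision, skip anything already marked for close.

  /* Simplified sketch of the idea, not the actual patch: never select a
   * circuit that is already marked for close when picking guard-waiting
   * circuits to upgrade. */
  SMARTLIST_FOREACH_BEGIN(all_circuits, circuit_t *, circ) {
    if (circ->marked_for_close)
      continue; /* Already closing; it must never be "opened" again. */
    if (circ->state != CIRCUIT_STATE_GUARD_WAIT)
      continue; /* Only circuits waiting on guard decisions qualify. */
    smartlist_add(upgradable_circuits, circ);
  } SMARTLIST_FOREACH_END(circ);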

Last edited 5 months ago by dgoulet

comment:6 Changed 5 months ago by dgoulet

Actual Points: 0.4
Cc: dgoulet removed

comment:7 Changed 5 months ago by dgoulet

Reviewer: asn

comment:8 Changed 5 months ago by asn

Status: needs_review → merge_ready

This looks good to me!

I don't see this Bug log message in #28970, so this exact bug is probably not the cause of that one, but perhaps the general class of bug is related?

comment:9 Changed 5 months ago by nickm

Milestone: Tor: 0.4.2.x-final → Tor: 0.4.0.x-final

Merged to 0.4.1 and 0.4.2. The 0.4.1 and 0.4.2 branches conflicted, so I did an "ours" merge from maint-0.4.1 into master. Marking for possible 0.4.0 backport.

comment:10 Changed 5 months ago by nickm

Keywords: 041-must removed

comment:11 Changed 4 months ago by teor

Keywords: consider-backport-after-0415 added

This PR was merged into 0.4.1.4, so we will test it until 0.4.1.5 is released, then merge.

comment:12 Changed 3 months ago by teor

Merged to 0.3.5 and later.
Merged #30894, #30871, and #31003 together.

comment:13 Changed 3 months ago by teor

Resolution: fixed
Status: merge_ready → closed