Opened 2 months ago

Closed 3 weeks ago

#27838 closed defect (fixed)

v3 onion service wrongly considers Invalid signature for service descriptor signing key: expired

Reported by: s7r Owned by: dgoulet
Priority: High Milestone: Tor: 0.3.5.x-final
Component: Core Tor/Tor Version: Tor: 0.3.5.1-alpha
Severity: Normal Keywords: tor-hs
Cc: Actual Points:
Parent ID: Points:
Reviewer: asn Sponsor:

Description

I host on a server 2 onion services (one v2 and one v3).

Exact Tor version is: Tor version 0.3.5.0-alpha-dev

Suddenly, the v3 onion service went down. This was showing in the log file:

Sep 23 14:04:00.000 [notice] Tor has successfully opened a circuit. Looks like client functionality is working.
Sep 23 14:04:02.000 [warn] Invalid signature for service descriptor signing key: expired
Sep 23 14:04:02.000 [warn] tor_bug_occurred_(): Bug: ../src/feature/hs/hs_descriptor.c:2661: hs_desc_encode_descriptor: Non-fatal assertion !(ret < 0) failed. (on Tor 0.3.5.0-alpha-dev )
Sep 23 14:04:02.000 [warn] Bug: Non-fatal assertion !(ret < 0) failed in hs_desc_encode_descriptor at ../src/feature/hs/hs_descriptor.c:2661. Stack trace: (on Tor 0.3.5.0-alpha-dev )
Sep 23 14:04:02.000 [warn] Bug:     /usr/bin/tor(log_backtrace_impl+0x47) [0x55e9e1761257] (on Tor 0.3.5.0-alpha-dev )
Sep 23 14:04:02.000 [warn] Bug:     /usr/bin/tor(tor_bug_occurred_+0xc0) [0x55e9e175ca60] (on Tor 0.3.5.0-alpha-dev )
Sep 23 14:04:02.000 [warn] Bug:     /usr/bin/tor(hs_desc_encode_descriptor+0x106) [0x55e9e1667996] (on Tor 0.3.5.0-alpha-dev )
Sep 23 14:04:02.000 [warn] Bug:     /usr/bin/tor(hs_service_run_scheduled_events+0x1ab9) [0x55e9e166ff49] (on Tor 0.3.5.0-alpha-dev )
Sep 23 14:04:02.000 [warn] Bug:     /usr/bin/tor(+0x562e1) [0x55e9e15cd2e1] (on Tor 0.3.5.0-alpha-dev )
Sep 23 14:04:02.000 [warn] Bug:     /usr/bin/tor(+0x5dc11) [0x55e9e15d4c11] (on Tor 0.3.5.0-alpha-dev )
Sep 23 14:04:02.000 [warn] Bug:     /usr/lib/x86_64-linux-gnu/libevent-2.0.so.5(event_base_loop+0x6a0) [0x7f246c99f5a0] (on Tor 0.3.5.0-alpha-dev )
Sep 23 14:04:02.000 [warn] Bug:     /usr/bin/tor(do_main_loop+0x265) [0x55e9e15d11b5] (on Tor 0.3.5.0-alpha-dev )
Sep 23 14:04:02.000 [warn] Bug:     /usr/bin/tor(tor_run_main+0x11f5) [0x55e9e15d3a35] (on Tor 0.3.5.0-alpha-dev )
Sep 23 14:04:02.000 [warn] Bug:     /usr/bin/tor(tor_main+0x3a) [0x55e9e15cba0a] (on Tor 0.3.5.0-alpha-dev )
Sep 23 14:04:02.000 [warn] Bug:     /usr/bin/tor(main+0x19) [0x55e9e15cb589] (on Tor 0.3.5.0-alpha-dev )
Sep 23 14:04:02.000 [warn] Bug:     /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf1) [0x7f246b1fb2e1] (on Tor 0.3.5.0-alpha-dev )
Sep 23 14:04:02.000 [warn] Bug:     /usr/bin/tor(_start+0x2a) [0x55e9e15cb5da] (on Tor 0.3.5.0-alpha-dev )
Sep 23 14:04:02.000 [warn] tor_bug_occurred_(): Bug: ../src/feature/hs/hs_service.c:2816: upload_descriptor_to_hsdir: Non-fatal assertion !(service_encode_descriptor(service, desc, &desc->signing_kp, &encoded_desc) < 0) failed. (on Tor $
Sep 23 14:04:02.000 [warn] Bug: Non-fatal assertion !(service_encode_descriptor(service, desc, &desc->signing_kp, &encoded_desc) < 0) failed in upload_descriptor_to_hsdir at ../src/feature/hs/hs_service.c:2816. Stack trace: (on Tor 0.3.$
Sep 23 14:04:02.000 [warn] Bug:     /usr/bin/tor(log_backtrace_impl+0x47) [0x55e9e1761257] (on Tor 0.3.5.0-alpha-dev )
Sep 23 14:04:02.000 [warn] Bug:     /usr/bin/tor(tor_bug_occurred_+0xc0) [0x55e9e175ca60] (on Tor 0.3.5.0-alpha-dev )
Sep 23 14:04:02.000 [warn] Bug:     /usr/bin/tor(hs_service_run_scheduled_events+0x1d5b) [0x55e9e16701eb] (on Tor 0.3.5.0-alpha-dev )
Sep 23 14:04:02.000 [warn] Bug:     /usr/bin/tor(+0x562e1) [0x55e9e15cd2e1] (on Tor 0.3.5.0-alpha-dev )
Sep 23 14:04:02.000 [warn] Bug:     /usr/bin/tor(+0x5dc11) [0x55e9e15d4c11] (on Tor 0.3.5.0-alpha-dev )
Sep 23 14:04:02.000 [warn] Bug:     /usr/lib/x86_64-linux-gnu/libevent-2.0.so.5(event_base_loop+0x6a0) [0x7f246c99f5a0] (on Tor 0.3.5.0-alpha-dev )
Sep 23 14:04:02.000 [warn] Bug:     /usr/bin/tor(do_main_loop+0x265) [0x55e9e15d11b5] (on Tor 0.3.5.0-alpha-dev )
Sep 23 14:04:02.000 [warn] Bug:     /usr/bin/tor(tor_run_main+0x11f5) [0x55e9e15d3a35] (on Tor 0.3.5.0-alpha-dev )
Sep 23 14:04:02.000 [warn] Bug:     /usr/bin/tor(tor_main+0x3a) [0x55e9e15cba0a] (on Tor 0.3.5.0-alpha-dev )
Sep 23 14:04:02.000 [warn] Bug:     /usr/bin/tor(main+0x19) [0x55e9e15cb589] (on Tor 0.3.5.0-alpha-dev )
Sep 23 14:04:02.000 [warn] Bug:     /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf1) [0x7f246b1fb2e1] (on Tor 0.3.5.0-alpha-dev )
Sep 23 14:04:02.000 [warn] Bug:     /usr/bin/tor(_start+0x2a) [0x55e9e15cb5da] (on Tor 0.3.5.0-alpha-dev )

It reported that stack trace over and over for few hours, during the time of some internet connectivity problems. The v3 hidden service was unavailable during all this time of course.

Some additional info:
The internet was not working properly when this happened. The IPv4 bgp session was down and only IPv6 was working on the server. I have checked the Guard this particular Tor instance was using and it had an IPv6 ORPort (maybe this lead to a false positive wrt to testing our own network connection).

The bigger problem is that when everything came back online, the bug stack trace was not printing in the log file any more, but the v3 onion service did not recover by itself and still wasn't accessible. Had to restart the Tor process entirely and it came back online.

Child Tickets

Change History (12)

comment:1 Changed 2 months ago by dgoulet

I think this is most probably a duplicate of #25616. I'll prioritize this one as it is more recent stacktrace. Me and asn discussed this in person and the possible solution seems to be:

https://trac.torproject.org/projects/tor/ticket/25616#comment:12

@s7r, any chance you had a time correction on the server?

comment:2 Changed 2 months ago by s7r

@dgoulet I am pretty confident there was no time correction on the server. This runs on dedicated hardware. Time is sync-ed with NTP permanently. I have compared the log timestamps with system date and with the real time the internet incident occured, and it all matches to the second.

While I do not find any ties to a possible time correction on the server I find all ties to the IPv4 connectivity failure. You want me to kill BGP for v4 at my office and try to reproduce? I could do that in weekends or some time 3- 4 AM in the morning...

comment:3 Changed 2 months ago by s7r

Come back after more digging.

I confirm the solution at https://trac.torproject.org/projects/tor/ticket/25616#comment:12

What I think happened in my particular case:

  1. Tor generated a new descriptor for the v3 onion service because it was time to do so [OK]
  1. Tor tried to upload the descriptor it just built. Because of internet connectivity failure, it couldn't. Even so it should have been possible because my guard had a working IPv6 ORPort. This is a different problem though, I'll ping teor about it later. [partially OK, internet connection state is outside Tor's control].
  1. After internet came back up, Tor tried to upload the descriptor earlier built. It was expired, so the warning + stack trace was logged. [PROBLEM TO FIX]
  1. After the time for this descriptor expired as well, new one was built, so warning + stack trace disappeared from log. [NOT OK, WHILE THE WARNING DISAPPEARED v3 ONION SERVICE WAS STILL UNACCESSIBLE]..

Timings:
full internet came back up at ~ 14:03:something;
first warning + stack trace logged at 14:04:00.000;
last warning + stack trace logged at 19:46:28.000; (maybe at this time it built a new descriptor).

v3 onion service resumed working in normal state only after Tor process was restarted.

comment:4 Changed 8 weeks ago by dgoulet

Owner: set to dgoulet
Status: newaccepted

comment:5 Changed 7 weeks ago by nickm

Priority: MediumHigh

comment:6 in reply to:  3 Changed 6 weeks ago by dgoulet

Replying to s7r:

Come back after more digging.

Epic digging. Very useful! Thanks.

  1. After internet came back up, Tor tried to upload the descriptor earlier built. It was expired, so the warning + stack trace was logged. [PROBLEM TO FIX]

Ooooooook that is a great find actually. The HS subsystem is *not* notified whatsoever on an upload failure. Considering connectivity issues on the service side, none of the descriptors were uploaded but we never adjusted to that.

Meaning that the failure will happen and then the next upload time is set to 1h to 2h later...

This we can fix, and should, with a callback to the HS subsystem when any kind of upload failure happens.

  1. After the time for this descriptor expired as well, new one was built, so warning + stack trace disappeared from log. [NOT OK, WHILE THE WARNING DISAPPEARED v3 ONION SERVICE WAS STILL UNACCESSIBLE]..

Timings:
full internet came back up at ~ 14:03:something;
first warning + stack trace logged at 14:04:00.000;

I believe this is possible because (and my memory is flaky here) the directory connection kept the descriptor in the outbuf and once the Internet came back, retried to send it. But, without the HS subsystem knowing.

Ok so considering the idea that we should rebuild our descriptor from scratch every time we are about to upload instead of keeping the certificate for a long time (among other things), we should very much fix the issue highlighted in (3) imo.

The HS subsystem not knowing the upload failed is pretty bad for reachability. Imagine OnionShare on a bad connection moving around on a laptop, we need the HS service to be much more resilient to disconnect.

comment:7 Changed 6 weeks ago by dgoulet

FYI, I found the ticket for (3): #23988

The solution isn't that straight forward. Calling back into the HS subsystem is easy, the issue is keeping track of which HSDir or not failed and why so we can take a decision on the retry or not.

This ticket should now solely be about rebuilding descriptors at every upload attempt.

comment:8 Changed 5 weeks ago by dgoulet

Ok after a discussion with asn, the right thing to do is to:

1) Identify the mutable values within a descriptors that would need to be refreshed before uploading (for instance, revision counter). We already have several of these so we would also need to add the certificate creation so we always have fresh cert. The time is rounded down to the hour so to avoid the leak of when _exactly_ the descriptor is uploaded.

2) Add such a function that refreshes all the mutable values before uploading.

comment:9 Changed 4 weeks ago by dgoulet

Reviewer: asn
Status: acceptedneeds_review

Branch: ticket27838_035_01
PR: https://github.com/torproject/tor/pull/434

comment:10 Changed 4 weeks ago by asn

See https://github.com/torproject/tor/pull/438 for some non-behavioral changes to the branch. Documentation and func renames.

comment:11 Changed 4 weeks ago by dgoulet

Status: needs_reviewmerge_ready

ACK on asn's branch.

comment:12 Changed 3 weeks ago by dgoulet

Resolution: fixed
Status: merge_readyclosed

Merged forward!

Note: See TracTickets for help on using tickets.