#25616 closed defect (duplicate)

Non-fatal assertion in hs_desc_encode_descriptor similar to #24972

Reported by: alnsn Owned by: dgoulet
Priority: High Milestone: Tor: unspecified
Component: Core Tor/Tor Version: Tor: 0.3.2.10
Severity: Normal Keywords: tor-hs, 033-triage-20180326, 033-included-20180326, 035-removed-20180711
Cc: Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description (last modified by dgoulet)

Tor 0.3.2.2.10 on NetBSD.

Mar 24 08:40:25.000 [warn] tor_bug_occurred_(): Bug: src/or/hs_descriptor.c:2360: hs_desc_encode_descriptor: Non-fatal assertion !(ret < 
0) failed. (on Tor 0.3.2.10 31cc63deb69db819)
Mar 24 08:40:25.000 [warn] Bug: Non-fatal assertion !(ret < 0) failed in hs_desc_encode_descriptor at src/or/hs_descriptor.c:2360. Stack 
trace: (on Tor 0.3.2.10 31cc63deb69db819)
Mar 24 08:40:25.000 [warn] Bug:     0xbc9551fe <log_backtrace+0x2e> at /usr/pkg/bin/tor (on Tor 0.3.2.10 31cc63deb69db819)
Mar 24 08:40:25.000 [warn] Bug:     0xbc969750 <tor_bug_occurred_+0xbd> at /usr/pkg/bin/tor (on Tor 0.3.2.10 31cc63deb69db819)
Mar 24 08:40:25.000 [warn] Bug:     0xbc93f8f0 <hs_desc_encode_descriptor+0x7d> at /usr/pkg/bin/tor (on Tor 0.3.2.10 31cc63deb69db819)
Mar 24 08:40:25.000 [warn] Bug:     0xbc9474ec <hs_service_run_scheduled_events+0x1924> at /usr/pkg/bin/tor (on Tor 0.3.2.10 31cc63deb69d
b819)
Mar 24 08:40:25.000 [warn] Bug:     0xbc846e5e <have_completed_a_circuit+0x39> at /usr/pkg/bin/tor (on Tor 0.3.2.10 31cc63deb69db819)
Mar 24 08:40:25.000 [warn] Bug:     0xbc861ac4 <pt_free_all+0xee> at /usr/pkg/bin/tor (on Tor 0.3.2.10 31cc63deb69db819)
Mar 24 08:40:25.000 [warn] Bug:     0x74c7f0e22834 <event_base_init_common_timeout+0xade> at /usr/pkg/lib/libevent-2.1.so.6 (on Tor 0.3.2
.10 31cc63deb69db819)
Mar 24 08:40:25.000 [warn] Bug:     0x74c7f0e22dc4 <event_base_init_common_timeout+0x106e> at /usr/pkg/lib/libevent-2.1.so.6 (on Tor 0.3.
2.10 31cc63deb69db819)
Mar 24 08:40:25.000 [warn] Bug:     0x74c7f0e23548 <event_base_loop+0x2da> at /usr/pkg/lib/libevent-2.1.so.6 (on Tor 0.3.2.10 31cc63deb69
db819)
Mar 24 08:40:25.000 [warn] Bug:     0xbc848dea <do_main_loop+0x239> at /usr/pkg/bin/tor (on Tor 0.3.2.10 31cc63deb69db819)
Mar 24 08:40:25.000 [warn] Bug:     0xbc84c4f3 <tor_main+0x1bf5> at /usr/pkg/bin/tor (on Tor 0.3.2.10 31cc63deb69db819)
Mar 24 08:40:25.000 [warn] Bug:     0xbc9b3cda <main+0x9> at /usr/pkg/bin/tor (on Tor 0.3.2.10 31cc63deb69db819)


Mar 24 08:40:25.000 [warn] tor_bug_occurred_(): Bug: src/or/hs_service.c:2245: upload_descriptor_to_hsdir: Non-fatal assertion !(hs_desc_
encode_descriptor(desc->desc, &desc->signing_kp, &encoded_desc) < 0) failed. (on Tor 0.3.2.10 31cc63deb69db819)
Mar 24 08:40:25.000 [warn] Bug: Non-fatal assertion !(hs_desc_encode_descriptor(desc->desc, &desc->signing_kp, &encoded_desc) < 0) failed
 in upload_descriptor_to_hsdir at src/or/hs_service.c:2245. Stack trace: (on Tor 0.3.2.10 31cc63deb69db819)
Mar 24 08:40:25.000 [warn] Bug:     0xbc9551fe <log_backtrace+0x2e> at /usr/pkg/bin/tor (on Tor 0.3.2.10 31cc63deb69db819)
Mar 24 08:40:25.000 [warn] Bug:     0xbc969750 <tor_bug_occurred_+0xbd> at /usr/pkg/bin/tor (on Tor 0.3.2.10 31cc63deb69db819)
Mar 24 08:40:25.000 [warn] Bug:     0xbc94773a <hs_service_run_scheduled_events+0x1b72> at /usr/pkg/bin/tor (on Tor 0.3.2.10 31cc63deb69d
b819)
Mar 24 08:40:25.000 [warn] Bug:     0xbc846e5e <have_completed_a_circuit+0x39> at /usr/pkg/bin/tor (on Tor 0.3.2.10 31cc63deb69db819)
Mar 24 08:40:25.000 [warn] Bug:     0xbc861ac4 <pt_free_all+0xee> at /usr/pkg/bin/tor (on Tor 0.3.2.10 31cc63deb69db819)
Mar 24 08:40:25.000 [warn] Bug:     0x74c7f0e22834 <event_base_init_common_timeout+0xade> at /usr/pkg/lib/libevent-2.1.so.6 (on Tor 0.3.2
.10 31cc63deb69db819)
Mar 24 08:40:25.000 [warn] Bug:     0x74c7f0e22dc4 <event_base_init_common_timeout+0x106e> at /usr/pkg/lib/libevent-2.1.so.6 (on Tor 0.3.
2.10 31cc63deb69db819)
Mar 24 08:40:25.000 [warn] Bug:     0x74c7f0e23548 <event_base_loop+0x2da> at /usr/pkg/lib/libevent-2.1.so.6 (on Tor 0.3.2.10 31cc63deb69db819)
Mar 24 08:40:25.000 [warn] Bug:     0xbc848dea <do_main_loop+0x239> at /usr/pkg/bin/tor (on Tor 0.3.2.10 31cc63deb69db819)
Mar 24 08:40:25.000 [warn] Bug:     0xbc84c4f3 <tor_main+0x1bf5> at /usr/pkg/bin/tor (on Tor 0.3.2.10 31cc63deb69db819)
Mar 24 08:40:25.000 [warn] Bug:     0xbc9b3cda <main+0x9> at /usr/pkg/bin/tor (on Tor 0.3.2.10 31cc63deb69db819)
Mar 24 08:40:25.000 [warn] Invalid signature for service descriptor signing key: expired


Mar 24 08:40:25.000 [warn] tor_bug_occurred_(): Bug: src/or/hs_descriptor.c:2360: hs_desc_encode_descriptor: Non-fatal assertion !(ret < 0) failed. (on Tor 0.3.2.10 31cc63deb69db819)
Mar 24 08:40:25.000 [warn] Bug: Non-fatal assertion !(ret < 0) failed in hs_desc_encode_descriptor at src/or/hs_descriptor.c:2360. Stack trace: (on Tor 0.3.2.10 31cc63deb69db819)
Mar 24 08:40:25.000 [warn] Bug:     0xbc9551fe <log_backtrace+0x2e> at /usr/pkg/bin/tor (on Tor 0.3.2.10 31cc63deb69db819)
Mar 24 08:40:25.000 [warn] Bug:     0xbc969750 <tor_bug_occurred_+0xbd> at /usr/pkg/bin/tor (on Tor 0.3.2.10 31cc63deb69db819)
Mar 24 08:40:25.000 [warn] Bug:     0xbc93f8f0 <hs_desc_encode_descriptor+0x7d> at /usr/pkg/bin/tor (on Tor 0.3.2.10 31cc63deb69db819)
Mar 24 08:40:25.000 [warn] Bug:     0xbc9474ec <hs_service_run_scheduled_events+0x1924> at /usr/pkg/bin/tor (on Tor 0.3.2.10 31cc63deb69db819)
Mar 24 08:40:25.000 [warn] Bug:     0xbc846e5e <have_completed_a_circuit+0x39> at /usr/pkg/bin/tor (on Tor 0.3.2.10 31cc63deb69db819)
Mar 24 08:40:25.000 [warn] Bug:     0xbc861ac4 <pt_free_all+0xee> at /usr/pkg/bin/tor (on Tor 0.3.2.10 31cc63deb69db819)
Mar 24 08:40:25.000 [warn] Bug:     0x74c7f0e22834 <event_base_init_common_timeout+0xade> at /usr/pkg/lib/libevent-2.1.so.6 (on Tor 0.3.2.10 31cc63deb69db819)
Mar 24 08:40:25.000 [warn] Bug:     0x74c7f0e22dc4 <event_base_init_common_timeout+0x106e> at /usr/pkg/lib/libevent-2.1.so.6 (on Tor 0.3.2.10 31cc63deb69db819)
Mar 24 08:40:25.000 [warn] Bug:     0x74c7f0e23548 <event_base_loop+0x2da> at /usr/pkg/lib/libevent-2.1.so.6 (on Tor 0.3.2.10 31cc63deb69db819)
Mar 24 08:40:25.000 [warn] Bug:     0xbc848dea <do_main_loop+0x239> at /usr/pkg/bin/tor (on Tor 0.3.2.10 31cc63deb69db819)
Mar 24 08:40:25.000 [warn] Bug:     0xbc84c4f3 <tor_main+0x1bf5> at /usr/pkg/bin/tor (on Tor 0.3.2.10 31cc63deb69db819)
Mar 24 08:40:25.000 [warn] Bug:     0xbc9b3cda <main+0x9> at /usr/pkg/bin/tor (on Tor 0.3.2.10 31cc63deb69db819)

Child Tickets

Change History (16)

comment:1 Changed 16 months ago by alnsn

torrc-defaults:

SafeSocks 1
LogMessageDomains 0
Log err syslog

OutboundBindAddress <<IPv4>>
OutboundBindAddress [<<PIv6>>]

torrc:

SOCKSPort 9050
DNSPort 9053
SafeLogging 0
Log notice file /var/log/tor/proxy.log
Log [app,circ]info file /var/log/tor/proxy.info

HiddenServiceDir /var/chroot/tor/v3sshd
HiddenServiceVersion 3
HiddenServicePort 22 127.0.0.1:22
HiddenServicePort 22 [::1]:22

comment:2 Changed 16 months ago by dgoulet

Keywords: regression tor-hs 033-backport 032-backport added
Milestone: Tor: 0.3.4.x-final
Priority: MediumHigh

Yes, very much like #24792 and the extra logging added paid off: Key part of the log from above:

 Mar 24 08:40:25.000 [warn] Invalid signature for service descriptor signing key: expired

I think we might have an issue with clock jump between descriptor creation and encoding...

comment:3 Changed 16 months ago by alnsn

Is clock jump my local problem or a problem in Tor’s code?

I’m asking because I have two very similar setups: both are tor 0.3.2.10 on NetBSD 8.0, running with ntpd but one runs in a virtual box and clock jumps can be insane. It’s so bad, ntpd doesn’t always cope with jumps and I had to add a cronjob to synchronise clocks. The puzzling part if that I don’t see these asserts on that box, I see them on a stable box.

Alex

comment:4 in reply to:  3 Changed 16 months ago by dgoulet

Replying to alnsn:

Is clock jump my local problem or a problem in Tor’s code?

Tor should be better at handling clock jump. It is really inevitable in reality so tor should be good at noticing it and recovering.

Of course, keeping your system that runs Tor with a correct clock all the time will prevent having other kind of issues but in the case of hidden service, tor should recover without crazy warnings.

comment:5 Changed 16 months ago by dgoulet

Keywords: 033-must added; 033-backport removed
Milestone: Tor: 0.3.4.x-finalTor: 0.3.3.x-final

Oh, I put that one in 034 milestone. This is a regression and must be fixed so moving it to 033 milestone and flagging it 033-must. Also notice, it is flagged for 032 backport.

comment:6 Changed 16 months ago by nickm

Keywords: regression, tor-hs, 032-backport, 033-mustregression, tor-hs, 032-backport, 033-must regression

comment:7 Changed 16 months ago by nickm

Keywords: 033-triaged-20180326 033-included-20180326 added

comment:8 Changed 16 months ago by nickm

Keywords: 033-triage-20180326 added; 033-triaged-20180326 removed

comment:9 Changed 16 months ago by dgoulet

Owner: set to dgoulet
Status: newaccepted

comment:10 Changed 16 months ago by dgoulet

Keywords: regression, tor-hs, 032-backport, 033-must regression 033-triage-20180326 033-included-20180326regression, tor-hs, 032-backport, 033-must 033-triage-20180326 033-included-20180326

One too many regression keyword.

comment:11 Changed 16 months ago by dgoulet

This is caused because we use a different time source when decoding the descriptor from the encoding process. In other words, we encode with time(NULL) (value coming from the main look scheduled event) and once we are done, we try to decode the resulting descriptor to validate where we use approx_time(). A clock jump in between can make the validation to fail.

Good news is that tor will recover that is fail to build the descriptor and a second later will try again but this time with possibly synchronized time source.

Which means that we either remove the BUG(), let the error fly and hope tor recovers. Or we pass down the same time source to the decoding process down the stack which will result in a patch where ~8 functions will be changed to pass now as a parameter.

The former is worrying because removing the BUG() might make us miss other encode/decode issues. The latter is a bit messy but should fix that issue once and for all because we would share the time source for the entire encoding descriptor process.

comment:12 Changed 16 months ago by asn

We discussed this with David. The most likely source of this bug is that we build the descriptor once (build_service_descriptor()) and then we keep it for many hours and we just encode the same desc over and over before publishing it (upload_descriptor_to_hsdir(). If the clock jumps after descriptor build, certs in our descriptor might expire before we encode it, and this BUG will get caused. Even tho the clock jumps, we don't expire the descriptor because should_rotate_descriptor() actually takes ns time as the authoritative source.

A solution here would be to re-build the whole descriptor everytime before we encode it, so that we ensure that all certs are fresh before they enter the encode/decode function.

comment:13 Changed 16 months ago by dgoulet

Keywords: 033-must removed
Milestone: Tor: 0.3.3.x-finalTor: 0.3.4.x-final

Moving this to 034. The fixes aren't simple and could cause possible regression because we are quite close from 033 stable.

The BUG() + log_warn() might annoy some operators in the meantime but at least it is harmless as tor does recover from the error gracefully by rebuilding the descriptor with synced time source.

comment:14 Changed 14 months ago by dgoulet

Keywords: regression 032-backport removed
Milestone: Tor: 0.3.4.x-finalTor: 0.3.5.x-final

Moving to 035. We can't fix that in 034, we are after the freeze and this could be a bit destabilizing.

Also, I'm removing the backport keyword for now until we have a better grasp on the solution. This isn't a regression also, it always been that way unfortunately. We need to fix this default behavior.

comment:15 Changed 13 months ago by nickm

Keywords: 035-removed-20180711 added
Milestone: Tor: 0.3.5.x-finalTor: unspecified

These tickets are being triaged out of 0.3.5. The ones marked "035-roadmap-proposed" may return.

comment:16 Changed 10 months ago by dgoulet

Description: modified (diff)
Resolution: duplicate
Status: acceptedclosed

I believe the new #27838 is the same issue. Closing this one in favor of the newer one and in 035.

Note: See TracTickets for help on using tickets.