Opened 13 months ago

Last modified 12 months ago

#27436 new defect

Rejected v3 descriptor (2018 September edition)

Reported by: asn Owned by:
Priority: High Milestone: Tor: unspecified
Component: Core Tor/Tor Version:
Severity: Normal Keywords: tor-hs reliability reachability
Cc: Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

After we fixed #26980 the number of rejected v3 descriptors have greatly decreased. In the past 3 weeks of monitoring my v3 service, I only saw a single rejected v3 descriptor:

It does not seem to be related to the revision counters anymore, but we should look more into it to be sure.

Here is the surrounding log area:

Aug 31 08:56:17.000 [info] update_consensus_networkstatus_fetch_time_impl(): Live microdesc consensus will be the most recent until 2018-08-31 09:00:00 and will expire a
t 2018-08-31 11:00:00; fetching the next one at 2018-08-31 10:19:43.
Aug 31 08:56:17.000 [info] handle_response_fetch_consensus(): Successfully loaded consensus.
Aug 31 09:05:27.000 [warn] Setting rev counter for TP #17773: SRV started at 1535587200, now 1535706327 (current)
Aug 31 09:05:27.000 [warn] Encrypted revision counter 119128 to 3892522041
Aug 31 09:34:27.000 [warn] Setting rev counter for TP #17774: SRV started at 1535673600, now 1535708067 (next)
Aug 31 09:34:27.000 [warn] Encrypted revision counter 34468 to 1122321374
Aug 31 10:12:34.000 [warn] Setting rev counter for TP #17773: SRV started at 1535587200, now 1535710354 (current)
Aug 31 10:12:34.000 [warn] Encrypted revision counter 123155 to 4024847148
Aug 31 10:12:58.000 [info] handle_response_upload_hsdesc(): Uploaded hidden service descriptor (status 400 ("Invalid HS descriptor. Rejected."))
Aug 31 10:12:58.000 [info] handle_response_upload_hsdesc(): Uploading hidden service descriptor: http status 400 ("Invalid HS descriptor. Rejected.") response from dirserver '107.181.187.199:53'. Malformed hidden service descriptor?
Aug 31 10:20:16.000 [info] handle_response_fetch_consensus(): Applied consensus diff (size 114955) from server '62.210.244.146:9001', resulting in a new consensus document (size 1982606).
Aug 31 10:20:16.000 [info] Choosing expected valid-after time as 2018-08-31 11:00:00: consensus_set=1, interval=3600
Aug 31 10:20:16.000 [info] update_consensus_networkstatus_fetch_time_impl(): Live microdesc consensus was the most recent until 2018-08-31 10:00:00 and will expire at 2018-08-31 12:00:00; fetching the next one at 2018-08-31 11:03:52.
Aug 31 10:20:16.000 [info] handle_response_fetch_consensus(): Successfully loaded consensus.
Aug 31 11:04:16.000 [info] handle_response_fetch_consensus(): Applied consensus diff (size 97523) from server '62.210.244.146:9001', resulting in a new consensus document (size 1985093).
Aug 31 11:04:16.000 [info] Choosing expected valid-after time as 2018-08-31 12:00:00: consensus_set=1, interval=3600
Aug 31 11:04:16.000 [info] update_consensus_networkstatus_fetch_time_impl(): Live microdesc consensus was the most recent until 2018-08-31 11:00:00 and will expire at 2018-08-31 13:00:00; fetching the next one at 2018-08-31 12:36:05.
Aug 31 11:04:16.000 [info] handle_response_fetch_consensus(): Successfully loaded consensus.
Aug 31 11:04:16.000 [warn] Setting rev counter for TP #17774: SRV started at 1535673600, now 1535713456 (next)
Aug 31 11:04:16.000 [warn] Encrypted revision counter 39857 to 1298368820
Aug 31 11:59:46.000 [warn] Setting rev counter for TP #17773: SRV started at 1535587200, now 1535716786 (current)
Aug 31 11:59:46.000 [warn] Encrypted revision counter 129587 to 4236295065

Child Tickets

Change History (2)

comment:1 Changed 13 months ago by asn

From the logs above this doesn't seem to be a revision counter issue, so the other reasons that descriptor storage can fail on the HSDir side are all related to the HSDir parsing the plaintext part of the descriptor. This includes stuff like:

  • Descriptor is too big
  • Descriptor lifetime is too big
  • Descriptor signing key cert is invalid/expired.
  • Descriptor signature is bad.

My intuition currently is towards the descriptor signing key cert being expired, because we've had similar expiration issues in the past.

comment:2 Changed 12 months ago by asn

OK, I got a few more of those isolated rejections on my HSv3 so I digged more. Here are some comments:

  • I asked a friendly relay operator whether he saw any warning logs in their relay. They saw nothing, even tho all (?) error paths of cache_dir_desc_new() (called from hs_cache_store_as_dir()) lead to a log_warn(). This might mean that the failure was again a revision counter rejection (since that's a log_info()), or that I missed an error path that doesn't trigger a log_warn, or that the operator missed the log path.
  • It does not seem to be a problem with how the rev counters get encoded/decoded since descriptors with bigger revision counters got accepted just fine, and also all the other HSDirs in that batch accepted the descriptor. It's always a single HSDir that's rejecting the descriptor.
  • Perhaps we should save descriptors that fail to get uploaded on disk, but that does not seem easy given the currrent code architecture.

In general, because only one descriptor gets rejected out of the whole batch, this is not a terrible issue since reachability can still be maintained. However, it's definitely a bad bug and it shows there are still things we don't understand. We should look more!

Last edited 12 months ago by asn (previous) (diff)
Note: See TracTickets for help on using tickets.