Opened 4 months ago

Closed 3 months ago

#26709 closed defect (wontfix)

Onion V3 addresses not always working

Reported by: time_attacker
Owned by:
Priority: Very High
Milestone: Tor: 0.3.5.x-final
Component: Core Tor/Tor
Version:
Severity: Major
Keywords: onion, tor-hs, 034-backport, 033-backport, 032-backport
Cc: dmr
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor:

Description

I have a dozen V3 onions launched via the ADD_ONION control port command. Sometimes they work, sometimes they do not. "GETINFO onions/current" lists them as active even when they are unreachable, and I have to restart tor to make them work again. I have some v2 onions too, and they always work without any trouble under the same conditions.

Change History (8)

comment:1 Changed 4 months ago by dgoulet

Priority: Very High → Medium
Status: new → needs_information

We unfortunately can't do much without more information.

Can you provide the logs at INFO level by adding this to your torrc (be careful, the onion addresses will show up in the logs):

Log info file /tmp/info.log

comment:2 Changed 4 months ago by time_attacker

I run tor on Windows. I added the line "Log info file C:\tor\torlog.log" to torrc, but nothing is written.

comment:3 Changed 4 months ago by asn

Might be related to #26549: v3 ephemeral onions don't work very well, this will be fixed with #25552 (patch available and will soon be merged to master).

If that's not the issue then logs would help. Thanks!

comment:4 Changed 4 months ago by time_attacker

Now I have logs (I just needed to set 'AvoidDiskWrites 0') and will provide them here if the patch does not help.

comment:5 Changed 4 months ago by dgoulet

So I've been having this issue, but rarely. This weekend, it happened to me in the morning: one of my services wasn't computing the same hashring as the client. No matter how many times I restarted the client, with the latest consensus, the two hashrings were always different. So the service was the issue.

Every single parameter on the service side was correct in order to compute the right hashring (SRV, time period num from the ns->valid_after, replica, ...).

My investigation led me to the hs_service_descriptor_t->time_period_num value. In theory, every descriptor is _only_ built for a specific time period; they don't overlap. When we build a descriptor, we record the time period num it is built for and then never change it (which in theory should be OK). But descriptor rotation happens at each new SRV, which happens 12h *before* a new time period.

Thus, I believe we have an issue where a descriptor can end up between two time periods, leading to something like Current desc.: TP - 1 and Next desc.: TP + 1. That would mean there is a window of up to 12h in which the current time period num has no descriptor at all, and thus the service is unreachable.
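To make the suspected gap concrete, here is a minimal Python sketch, not Tor's actual code. It assumes the default parameters from the v3 onion service specification (a 1440-minute time period, offset by 12 hours so that periods start at 12:00 UTC). The helper names time_period_num and has_descriptor_gap, and the sets of period numbers, are hypothetical and exist only for illustration.

```python
# Illustrative sketch only; constants are assumed spec defaults, not read from a consensus.
TIME_PERIOD_LENGTH_MIN = 1440   # one time period lasts 24 hours
ROTATION_OFFSET_MIN = 12 * 60   # time periods begin at 12:00 UTC


def time_period_num(unix_seconds):
    """Return the time period number that contains the given Unix time."""
    minutes_since_epoch = unix_seconds // 60
    return (minutes_since_epoch - ROTATION_OFFSET_MIN) // TIME_PERIOD_LENGTH_MIN


def has_descriptor_gap(now, descriptor_periods):
    """Hypothetical check: True if no descriptor was built for the current period."""
    return time_period_num(now) not in descriptor_periods


now = 1460546101                  # 2016-04-13 11:15:01 UTC
tp = time_period_num(now)

covered = {tp, tp + 1}            # some descriptor matches the current period
suspected = {tp - 1, tp + 1}      # the "TP - 1 / TP + 1" state described above

print(tp, has_descriptor_gap(now, covered), has_descriptor_gap(now, suspected))
```

In the second case neither stored period number matches the one a client would derive for the same time, which is the unreachable window described above.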

See build_descriptors_for_new_service(): there is something problematic there. We use now to check whether we are between TP and SRV and, if so, we then get the previous/current time period num, but this time using the valid_after. These two times can be offset from each other, which can lead to missing a time period num for the descriptor we are building.
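The following Python sketch shows how the two clocks can disagree around a time period boundary. It is a simplified stand-in under stated assumptions, not a reproduction of build_descriptors_for_new_service(): it assumes the spec defaults (1440-minute periods starting at 12:00 UTC, a new SRV at 00:00 UTC), and between_tp_and_srv is a hypothetical helper that only looks at the time of day.

```python
# Illustrative sketch only; constants are assumed defaults and the helpers are hypothetical.
MINUTES_PER_DAY = 1440
TIME_PERIOD_LENGTH_MIN = 1440   # one time period lasts 24 hours
ROTATION_OFFSET_MIN = 12 * 60   # time periods begin at 12:00 UTC


def time_period_num(unix_seconds):
    """Return the time period number that contains the given Unix time."""
    return (unix_seconds // 60 - ROTATION_OFFSET_MIN) // TIME_PERIOD_LENGTH_MIN


def between_tp_and_srv(unix_seconds):
    """True from 12:00 UTC (new time period) until 00:00 UTC (new SRV)."""
    minute_of_day = (unix_seconds // 60) % MINUTES_PER_DAY
    return minute_of_day >= ROTATION_OFFSET_MIN


now = 1460548830           # 2016-04-13 12:00:30 UTC, local clock
valid_after = 1460545200   # 2016-04-13 11:00:00 UTC, consensus not yet refreshed

print("phase from now:         ", between_tp_and_srv(now))
print("phase from valid_after: ", between_tp_and_srv(valid_after))
print("period from now:        ", time_period_num(now))
print("period from valid_after:", time_period_num(valid_after))
```

With these inputs the phase check based on now and the period number based on valid_after fall on opposite sides of the 12:00 UTC boundary, so code that mixes the two can pick a period number that is one behind the period now belongs to.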

I'm running an experiment right now that should confirm the theory. I'll have results in hopefully less than 48h.

comment:6 Changed 4 months ago by dgoulet

Keywords: 034-backport 033-backport 032-backport added
Milestone: Tor: 0.3.5.x-final
Priority: Medium → Very High
Severity: Critical → Major

I'm promoting this to the 035 milestone and flagging it for backport since, if we are correct, this is a major reachability issue.

comment:7 Changed 4 months ago by dmr

Cc: dmr added

comment:8 Changed 3 months ago by dgoulet

Resolution: wontfix
Status: needs_information → closed

We are currently investigating many HS reachability bugs. This ticket most likely falls under one or more of those issues. Since there are no action items here, I'll close this, but rest assured that tickets for the known issues have been opened and are being worked on.

For the record, I still have an experimental patch tracking the time period number, so if my service ever becomes unreachable again, I'll at least have those results. It hasn't happened since my post.
