Opened 11 days ago

Last modified 4 days ago

#26709 needs_information defect

Onion V3 addresses not always working

Reported by: time_attacker Owned by:
Priority: Very High Milestone: Tor: 0.3.5.x-final
Component: Core Tor/Tor Version:
Severity: Major Keywords: onion, tor-hs, 034-backport, 033-backport, 032-backport
Cc: Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

I have a dozen v3 onions launched via the ADD_ONION control port command. Sometimes they work, sometimes they do not. "GETINFO onions/current" shows them as working. I have to restart tor to make them work. I have some v2 onions too, and under the same conditions they always work with no trouble at all.

Child Tickets

Change History (6)

comment:1 Changed 11 days ago by dgoulet

Priority: Very High → Medium
Status: new → needs_information

We unfortunately can't do much without more information.

Can you provide the logs at INFO level by adding this to your torrc (be careful, the onion addresses will show up in the logs):

Log info file /tmp/info.log

comment:2 Changed 11 days ago by time_attacker

I have tor on Windows. I added the line "Log info file C:\tor\torlog.log" to torrc, but nothing is written.

comment:3 Changed 10 days ago by asn

Might be related to #26549: v3 ephemeral onions don't work very well, this will be fixed with #25552 (patch available and will soon be merged to master).

If that's not the issue then logs would help. Thanks!

comment:4 Changed 7 days ago by time_attacker

Now I have logs (I just needed to set 'AvoidDiskWrites 0') and will provide them here if the patch does not help.

comment:5 Changed 4 days ago by dgoulet

I've been having this issue too, but rarely. This weekend it happened to me in the morning: one of my services wasn't computing the same hashring as the client. No matter how many times I restarted the client, with the latest consensus, the hashrings were always different. So the service was the issue.

Every single parameter on the service side was correct for computing the right hashring (SRV, time period num from ns->valid_after, replica, ...).

My investigation led me to the hs_service_descriptor_t->time_period_num value. In theory, every descriptor is built for _only_ one specific time period; they don't overlap. When we build a descriptor, we record the time period num it is built for and then never change it (which in theory should be OK). But descriptor rotation happens at each new SRV, which occurs 12h *before* a new time period.

Thus, I believe we have an issue where a descriptor can fall between two time periods, leading to something like: current desc: TP - 1, and next desc: TP + 1. That means there is up to a 12h window in which the current time period num has simply no descriptor, and thus the service is unreachable.

See build_descriptors_for_new_service() ... there is something problematic there: we use 'now' to check whether we are between the TP and the SRV, and if so, we get the previous/current time period num, but this time using valid_after. These two reference times can be offset from each other, which can lead to missing a time period num for the descriptor we are building.

I'm running an experiment right now that should confirm the theory. I'll have results in, hopefully, less than 48h.

comment:6 Changed 4 days ago by dgoulet

Keywords: 034-backport 033-backport 032-backport added
Milestone: Tor: 0.3.5.x-final
Priority: Medium → Very High
Severity: Critical → Major

I'm promoting this to the 035 milestone and flagging it for backport, since if we are correct, this is a major reachability issue.
