Opened 20 months ago

Last modified 18 months ago

#23126 new enhancement

HSDirs should publish some count about new-style onion addresses

Reported by: arma
Owned by:
Priority: Medium
Milestone: Tor: unspecified
Component: Core Tor/Tor
Version:
Severity: Normal
Keywords: prop224, tor-hs, prop224-extra, research, privcount, 032-unreached
Cc:
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor:

Description

Right now we have an ongoing estimate of the total number of onion addresses published to the HSDirs:
https://metrics.torproject.org/hidserv-dir-onions-seen.html

How many of those are 224-style onion addresses, and how many of them are legacy-style onion addresses?

I see a rep_hist_stored_maybe_new_hs() for the v2-style descriptors, and I think I see a

  /* XXX: Update HS statistics. We should have specific stats for v3. */

for the v3-style descriptors.

So I think that means that the graph is only showing v2-style onions, and we have no infrastructure for noticing trends with v3 style onions.

I also suspect that noticing trends is harder with v3-style onions, since each descriptor the hsdir sees is standalone, and it's not possible (without knowing the onion address) to link two descriptors to the same address.

So our only chance at estimating the total number of v3 onion addresses is to know the publishing habits of v3-style onion services (how many descriptors per time period), then publish the total number of descriptors we see, and let folks do some math afterwards to estimate the number of running services? In any case, we can see whether the number goes up or down over time.
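To make the "do some math afterwards" step concrete, here is a rough sketch of that back-of-envelope estimate. All the concrete numbers (publish rate, replica count, descriptor total) are hypothetical placeholders, not measured values:

```python
# Estimate the number of running v3 onion services from an aggregate
# descriptor count, assuming we know the average publishing behaviour.
# The example inputs below are made up for illustration.

def estimate_v3_services(descriptors_seen, hours_observed,
                         publishes_per_service_per_hour, hsdirs_per_service):
    """Divide total descriptor uploads by the uploads a single
    service is expected to contribute over the same window."""
    expected_per_service = (publishes_per_service_per_hour
                            * hours_observed
                            * hsdirs_per_service)
    return descriptors_seen / expected_per_service

# e.g. 1.2M descriptor uploads seen network-wide over 24 hours, with
# each service republishing once per hour to 8 HSDir replicas:
print(round(estimate_v3_services(1_200_000, 24, 1.0, 8)))  # -> 6250
```

The hard part, as the description notes, is knowing `publishes_per_service_per_hour` in practice; that is exactly why the publishing-frequency bugs mentioned below matter for this statistic.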

Or maybe there is some even better design? :)

The reason I bring it up now is (a) if we want to get any code into relays, we need to do it well before we need the statistics, so it has time to roll out; and (b) I see discussions about bugs where v3-style onion services publish every 2 minutes, and while we're fixing those we should keep in mind how handy it would be to be able to predict, on average, how many descriptors a new onion service will publish per time period.

Child Tickets

Ticket   Type    Status  Owner         Summary
#23367   defect  new     metrics-team  Onion address counts ignore descriptor upload overlap

Change History (5)

comment:1 Changed 20 months ago by asn

Cc: prop224 tor-hs removed
Keywords: prop224 tor-hs prop224-extra added
Milestone: Tor: 0.3.2.x-final

comment:2 Changed 19 months ago by asn

Keywords: research added

We can probably use the fact that the HS blinded key is known to the HSDir, and that blinded keys rotate at a known time (they change when the prop224 time period changes), to produce some sort of "unique onion services per day" statistic from hsdirs.

I'll try to think about how this would work.
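The reason per-period counting is well defined is that prop224 time periods are global and computable by every relay. A small sketch of the time-period arithmetic, using the defaults from the proposal (1440-minute periods, rotating at 12:00 UTC, i.e. a 720-minute offset from midnight):

```python
from datetime import datetime, timezone

# prop224 time-period arithmetic with default parameters.
TIME_PERIOD_LENGTH = 1440  # minutes per time period
ROTATION_OFFSET = 12 * 60  # minutes; periods roll over at 12:00 UTC

def time_period_number(when):
    """Return the prop224 time-period number for a UTC datetime."""
    minutes_since_epoch = int(when.timestamp()) // 60
    return (minutes_since_epoch - ROTATION_OFFSET) // TIME_PERIOD_LENGTH

# Blinded keys seen just before and just after the daily 12:00 UTC
# rotation fall into different periods, so per-period counts of
# distinct blinded keys are unambiguous:
a = time_period_number(datetime(2017, 8, 7, 11, 59, tzinfo=timezone.utc))
b = time_period_number(datetime(2017, 8, 7, 12, 1, tzinfo=timezone.utc))
print(b - a)  # -> 1
```

Since the blinded key for a given service changes every period, a count of distinct blinded keys per period never double-counts a service across days, though it does count each service once per HSDir replica.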

comment:3 Changed 19 months ago by asn

So a very very basic statistic here that would give us an idea of the adoption of HSv3 services could be:

a) When a time period completes, every relay publishes the number of HSv3 blinded keys it saw during the previous time period in its extra-info desc. Relays also add some Laplace noise to obfuscate the original number. Time periods start at 12:00 UTC and last 24 hours, so relays can publish this statistic once per day.

b) After we have received all descriptors containing stats from a specific time period, we add all the unique blinded key counts together, and publish the aggregate count. We add everything together to remove the Laplace noise, and also to get a final graphable number. Unfortunately, that final number is not the number of unique HSv3 services, since HSes publish their descriptor on multiple HSDirs under the same blinded key. However, this number is definitely related to the number of unique HSes, and by noticing how it moves over time, we can certainly spot adoption rates of HSv3 services.

This is a very basic stat that could help us here. Furthermore, we can then deploy similar analysis to what we did for the unique v2 .onion counts, to weed out the duplicate HSes so that we get a more accurate number. And I guess we can use privcount etc. to get an even more accurate number.
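Steps (a) and (b) above can be sketched end-to-end. This is an illustrative toy, not Tor's actual noise implementation (see the bugs discussed below); the noise scale and per-relay counts are made up:

```python
import math
import random

# Toy model of the scheme above: each relay obfuscates its blinded-key
# count with zero-mean Laplace noise before publishing, and the
# aggregator sums the published values. The scale and counts are
# hypothetical.

def laplace_noise(scale, rng):
    # Inverse-CDF sampling of a zero-mean Laplace distribution.
    u = rng.random() - 0.5
    sign = 1 if u >= 0 else -1
    return -scale * sign * math.log(1 - 2 * abs(u))

def obfuscated_count(true_count, scale, rng):
    """What a single relay would publish in its extra-info descriptor."""
    return true_count + laplace_noise(scale, rng)

rng = random.Random(1)
true_counts = [40, 55, 38, 61]  # blinded keys seen, one entry per relay
published = [obfuscated_count(c, 10.0, rng) for c in true_counts]

# The zero-mean noise terms tend to cancel in the aggregate, so the
# summed total approximates the true total (here 194) while no single
# relay's exact count is exposed.
print(round(sum(published)))
```

Note that the noise only cancels in expectation; with few relays reporting, the aggregate can still be noticeably off, which is one argument for aggregating over many HSDirs.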

Version 1, edited 19 months ago by asn

comment:4 in reply to:  3 Changed 19 months ago by teor

Keywords: privcount added

Replying to asn:

So a very very basic statistic here that would give us an idea of the adoption of HSv3 services could be:

a) When a time period completes, every relay publishes the number of HSv3 blinded keys it saw during the previous time period in its extra-info desc. Relays also add some Laplace noise to obfuscate the original number.

There are several bugs in the HS v2 laplace noise implementation, see #23061 and children. In particular, we need to make sure we don't re-implement bug #23414 for HS v3.

Time periods start at 12:00 UTC and last 24 hours, so relays can publish this statistic once per day.

b) After we have received all descriptors containing stats from a specific time period, we add all the unique blinded key counts together, and publish the aggregate count. We add everything together to remove the Laplace noise, and also to get a final graphable number. Unfortunately, that final number is not the number of unique HSv3 services since HSes publish their descriptor on multiple HSDirs under the same blinded key. However this number is definitely related to the number of unique HSes, and by noticing how this number moves over time, we can certainly spot adoption rates of HSv3 services.

This is a very basic stat that could help us here. Furthermore, we can then deploy similar analysis to what we did for the unique v2 .onion counts, to weed out the duplicate HSes so that we get a more accurate number.

I think there are some bugs in the v2 analysis, see #23367.

For v3, here's the analysis and implementation I did for experimental privcount:
https://github.com/privcount/privcount/blob/master/privcount/tools/compute_fractional_position_weights#L26

(I left out the HS v3 hash ring, because it needed extra crypto, and imports of ed25519 ids in descriptors. I'll implement it in https://github.com/privcount/privcount/issues/422 )

And I guess we can use privcount etc. to get an even more accurate number.

No, you can't use PrivCount to get unique totals. (Aaron is working on a separate project that uniquely aggregates addresses, but the current design takes too much CPU and RAM to be run daily on relays.)

But you can use PrivCount to get a safe, noisy total from the individual relay counts. (Otherwise, to get a total you would have to publish the number of addresses seen at each HSDir, which is less safe.)

comment:5 Changed 18 months ago by nickm

Keywords: 032-unreached added
Milestone: Tor: 0.3.2.x-final → Tor: unspecified

Mark a large number of tickets that I do not think we will do for 0.3.2.
