Opened 2 years ago
Last modified 2 years ago
#23126 new enhancement
HSDirs should publish some count about new-style onion addresses
Reported by: | arma | Owned by: | |
---|---|---|---|
Priority: | Medium | Milestone: | Tor: unspecified |
Component: | Core Tor/Tor | Version: | |
Severity: | Normal | Keywords: | prop224, tor-hs, prop224-extra, research, privcount, 032-unreached |
Cc: | | Actual Points: | |
Parent ID: | | Points: | |
Reviewer: | | Sponsor: | |
Description
Right now we have an ongoing estimate of the total number of onion addresses published to the HSDirs:
https://metrics.torproject.org/hidserv-dir-onions-seen.html
How many of those are 224-style onion addresses, and how many of them are legacy-style onion addresses?
I see a rep_hist_stored_maybe_new_hs() for the v2-style descriptors, and I think I see a /* XXX: Update HS statistics. We should have specific stats for v3. */ for the v3-style descriptors.
So I think that means that the graph is only showing v2-style onions, and we have no infrastructure for noticing trends with v3-style onions.
I also suspect that noticing trends is harder with v3-style onions, since each descriptor the HSDir sees is standalone, and it's not possible (without knowing the onion address) to link two descriptors to the same address.
So our only chance at estimating the total number of v3 onion addresses is to know the publishing habits of v3-style onion services (how many descriptors per time period), and then publish the total number of descriptors we see. Folks can then do some math afterwards to estimate the number of running services. In any case, we can see whether the number goes up or down over time.
Or maybe there is some even better design? :)
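To make the "do some math afterwards" step concrete, here is a minimal sketch of the arithmetic. The function name and both publishing parameters are assumptions for illustration; the real upload rate is exactly the thing we would need to measure or pin down first.

```python
def estimate_services(total_descriptors, uploads_per_service_per_day,
                      hsdirs_per_upload):
    """Estimate running services from a network-wide daily descriptor count.

    total_descriptors: sum of descriptor uploads seen by all HSDirs in a day.
    uploads_per_service_per_day: assumed average number of upload rounds
        a single service performs per day (hypothetical input).
    hsdirs_per_upload: assumed number of HSDirs a service uploads to in
        each round (hypothetical input).
    """
    if uploads_per_service_per_day <= 0 or hsdirs_per_upload <= 0:
        raise ValueError("publishing parameters must be positive")
    per_service = uploads_per_service_per_day * hsdirs_per_upload
    return total_descriptors / per_service


# Example with made-up numbers: 24 upload rounds per day to 8 HSDirs each.
print(estimate_services(192000, 24, 8))
```

The estimate is only as good as the assumed publishing habits, so a bug like services republishing every 2 minutes would inflate it badly. The trend over time should still be usable as long as the habits stay stable.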
The reason I bring it up now is (a) if we want to get any code into relays, we need to do it sufficiently before we need it, so it can get rolled out, and (b) I see discussions about bugs with v3-style onion services publishing every 2 minutes, and while we're fixing those we should keep in mind how handy it would be to be able to predict how many descriptors a new onion service will publish per time period on average.
Child Tickets
Ticket | Type | Status | Owner | Summary |
---|---|---|---|---|
#23367 | defect | new | metrics-team | Onion address counts ignore descriptor upload overlap |
Change History (5)
comment:1 Changed 2 years ago by
Cc: | prop224 tor-hs removed |
---|---|
Keywords: | prop224 tor-hs prop224-extra added |
Milestone: | → Tor: 0.3.2.x-final |
comment:2 Changed 2 years ago by
Keywords: | research added |
---|
comment:3 follow-up: 4 Changed 2 years ago by
So a very very basic statistic here that would give us an idea of the adoption of HSv3 services could be:
a) When a time period completes, every relay publishes the number of HSv3 blinded keys it saw during the previous time period in its extra-info desc. Relays also add some Laplace noise to obfuscate the original number. Time periods start at 12:00 UTC and last 24 hours, so relays can publish this statistic once per day.
b) After we have received all descriptors containing stats from a specific time period, we add all the unique blinded key counts together, and publish the aggregate count. We add everything together to remove the Laplace noise, and also to get a final graphable number. Unfortunately, that final number is not the number of unique HSv3 services since HSes publish their descriptor on multiple HSDirs under the same blinded key. However this number is definitely related to the number of unique HSes, and by noticing how this number moves over time, we can certainly spot adoption rates of HSv3 services.
This is a very basic stat that could help us here. Furthermore, we can then deploy similar analysis to what we did for the unique v2 .onion counts, to weed out the duplicate HSes so that we get a more accurate number. And I guess we can use privcount etc. to get an even more accurate number.
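Steps a) and b) could look roughly like the sketch below. All names here are hypothetical, the noise scale is an arbitrary placeholder rather than a vetted privacy parameter, and this is not the existing v2 stats code.

```python
import random


def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) sample.

    The difference of two independent exponential variables with
    mean `scale` follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def relay_report(blinded_keys_seen, scale, rng):
    """Step a): count unique blinded keys for the period and add noise."""
    true_count = len(set(blinded_keys_seen))
    return true_count + laplace_noise(scale, rng)


def aggregate(reports):
    """Step b): sum the noisy per-relay counts into one graphable number."""
    return sum(reports)


# Example: 1000 relays that each saw 10 unique blinded keys (made-up data).
rng = random.Random(0)
keys = ["key%d" % i for i in range(10)]
reports = [relay_report(keys, 8.0, rng) for _ in range(1000)]
print(aggregate(reports))
```

Note that summing does not strictly remove the noise. Each sample has zero mean, so the noise partly cancels, and the standard deviation of the total grows as scale * sqrt(2n) for n relays while the true total grows roughly linearly in n.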
comment:4 Changed 2 years ago by
Keywords: | privcount added |
---|
Replying to asn:
So a very very basic statistic here that would give us an idea of the adoption of HSv3 services could be:
a) When a time period completes, every relay publishes the number of HSv3 blinded keys it saw during the previous time period in its extra-info desc. Relays also add some Laplace noise to obfuscate the original number.
There are several bugs in the HS v2 Laplace noise implementation, see #23061 and children. In particular, we need to make sure we don't re-implement bug #23414 for HS v3.
Time periods start at 12:00 UTC and last 24 hours, so relays can publish this statistic once per day.
b) After we have received all descriptors containing stats from a specific time period, we add all the unique blinded key counts together, and publish the aggregate count. We add everything together to remove the Laplace noise, and also to get a final graphable number. Unfortunately, that final number is not the number of unique HSv3 services since HSes publish their descriptor on multiple HSDirs under the same blinded key. However this number is definitely related to the number of unique HSes, and by noticing how this number moves over time, we can certainly spot adoption rates of HSv3 services.
This is a very basic stat that could help us here. Furthermore, we can then deploy similar analysis to what we did for the unique v2 .onion counts, to weed out the duplicate HSes so that we get a more accurate number.
I think there are some bugs in the v2 analysis, see #23367.
For v3, here's the analysis and implementation I did for experimental privcount:
https://github.com/privcount/privcount/blob/master/privcount/tools/compute_fractional_position_weights#L26
(I left out the HS v3 hash ring, because it needed extra crypto, and imports of ed25519 ids in descriptors. I'll implement it in https://github.com/privcount/privcount/issues/422 )
And I guess we can use privcount etc. to get an even more accurate number.
No, you can't use PrivCount to get unique totals. (Aaron is working on a separate project that uniquely aggregates addresses, but the current design takes too much CPU and RAM to be run daily on relays.)
But you can use PrivCount to get a safe, noisy total from the individual relay counts. (Otherwise, to get a total you have to publish the number of addresses seen at each HSDir, which is less safe.)
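To illustrate why a blinded total is safer than publishing per-HSDir counts, here is a generic additive-blinding sketch. It is not PrivCount's code or wire format; the modulus, the function names, and the assumption that each count is a non-negative integer (noise already added and rounded) are all hypothetical.

```python
import secrets

MODULUS = 2 ** 64  # hypothetical counter modulus


def split_into_shares(value, n_keepers):
    """Blind one relay's count with a random share per share keeper.

    Returns the blinded value (sent to the tally) and the list of
    shares (one sent to each keeper). Neither part alone reveals `value`.
    """
    shares = [secrets.randbelow(MODULUS) for _ in range(n_keepers)]
    blinded = (value - sum(shares)) % MODULUS
    return blinded, shares


def tally(blinded_values, keeper_sums):
    """Combine blinded values with each keeper's summed shares.

    The random shares cancel, leaving only the network-wide total.
    """
    return (sum(blinded_values) + sum(keeper_sums)) % MODULUS


# Example: three relays, three share keepers, made-up counts.
counts = [12, 7, 30]
keeper_sums = [0, 0, 0]
blinded_values = []
for count in counts:
    blinded, shares = split_into_shares(count, 3)
    blinded_values.append(blinded)
    for i, share in enumerate(shares):
        keeper_sums[i] = (keeper_sums[i] + share) % MODULUS
print(tally(blinded_values, keeper_sums))
```

Only the total comes out at the end, so no single relay's count is published, as long as at least one keeper stays honest.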
comment:5 Changed 2 years ago by
Keywords: | 032-unreached added |
---|---|
Milestone: | Tor: 0.3.2.x-final → Tor: unspecified |
Mark a large number of tickets that I do not think we will do for 0.3.2.
We can probably use the fact that the HS blinded key is known to the HSDir, and that blinded keys rotate at a known time (they change when the prop224 time period changes), to produce some sort of "unique onion services per day" statistic from HSDirs.
I'll try to think about how this would work.
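As a starting point, a per-HSDir counter could bucket blinded keys by time period, as in the rough sketch below. The class and function names are hypothetical. It uses the 24-hour period starting at 12:00 UTC mentioned earlier in this ticket, and it buckets by arrival time, which is a simplification if services upload descriptors for more than one period at once.

```python
TIME_PERIOD_MINUTES = 24 * 60      # periods last 24 hours
ROTATION_OFFSET_MINUTES = 12 * 60  # periods start at 12:00 UTC


def time_period_num(unix_seconds):
    """Map a Unix timestamp to the index of its time period."""
    minutes = unix_seconds // 60
    return (minutes - ROTATION_OFFSET_MINUTES) // TIME_PERIOD_MINUTES


class BlindedKeyCounter:
    """Track unique blinded keys seen by one HSDir in each time period."""

    def __init__(self):
        self._seen = {}  # time period number -> set of blinded keys

    def note_descriptor(self, blinded_key, unix_seconds):
        """Record the blinded key of a descriptor upload."""
        period = time_period_num(unix_seconds)
        self._seen.setdefault(period, set()).add(blinded_key)

    def unique_count(self, period):
        """Number of distinct blinded keys seen in the given period."""
        return len(self._seen.get(period, ()))


# Example: two uploads under one key and one under another, same period.
counter = BlindedKeyCounter()
noon = 12 * 3600
counter.note_descriptor(b"key-A", noon)
counter.note_descriptor(b"key-A", noon + 600)
counter.note_descriptor(b"key-B", noon + 900)
print(counter.unique_count(time_period_num(noon)))
```

Repeated uploads under the same blinded key collapse to one entry within a period, so the republishing rate stops mattering for the per-HSDir count. The same service is still counted once at every HSDir it uploads to.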