Opened 3 months ago

Last modified 3 weeks ago

#24468 assigned task

Measure HSDir usage to guide parameter choices

Reported by: teor
Owned by: teor
Priority: Medium
Milestone: Tor: 0.3.4.x-final
Component: Core Tor/Tor
Version:
Severity: Normal
Keywords: privcount-experimental, tor-hs
Cc:
Actual Points:
Parent ID:
Points: 3
Reviewer:
Sponsor:

Description

Split off #24425:

Replying to asn:

Replying to teor:

If you write down a list of exactly what you want to know, we can probably collect some stats on ~18 HSDirs using PrivCount.

...
here are some basic ideas:

  • How many v2/v3/both descs per HSDir?

How is this different to "rate of incoming"?

If you mean "cached right now", then I'd need a timeframe so I could design an event. I could do this in December or January.

  • How much total RAM do all v2/v3/both descs occupy on your hsdirs? (max,min,avg,mean over your 18 hsdirs)

I think we have some of the data, but I'd need a list of the objects that contribute to RAM usage. Do you just want descriptors, or is there a replay cache? I could do this in December or January.

  • Size variance of v2/v3 descs? (max,min,avg,mean)

Already implemented as a histogram, needs defined bin sizes.

  • What's the rate of incoming v2/v3/both descs?

Already implemented, needs a time period.

  • How many failed requests for HS descriptors over time? (percentage over total requests?)

I'm going to implement this in December.

These are just the obvious stats that I came up with. We can come up with more stuff as we see some results and understand the space better.

Let me know if you need help in turning the above sentences into methodologies.

We will also need an estimate of how much 1 client / service would contribute to each statistic in 10 minutes.

Is that to figure out the noise for differential privacy? Let's try to come up with the final stats list and then we can figure this out.

Yes, that's fine. They only need to be rough estimates.
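
As a rough illustration of why the per-client estimate matters: in a differentially private counter, the noise grows with the largest amount one client or service can add to the statistic in the protected period (here, 10 minutes). The sketch below uses the textbook Laplace mechanism. That choice is an assumption made for illustration only, not a description of how PrivCount allocates noise, and the sensitivity and epsilon values are hypothetical.

```python
import random


def laplace_scale(sensitivity, epsilon):
    """Scale b of the Laplace noise in the textbook Laplace mechanism.

    sensitivity: the most one client/service can add to the counter in the
                 protected period (hypothetical rough estimate).
    epsilon:     privacy parameter (hypothetical value).
    """
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    return sensitivity / epsilon


def noisy_count(true_count, sensitivity, epsilon, rng=random):
    """Return true_count plus Laplace(0, b) noise."""
    b = laplace_scale(sensitivity, epsilon)
    # The difference of two i.i.d. exponentials with mean b is Laplace(0, b).
    noise = rng.expovariate(1.0 / b) - rng.expovariate(1.0 / b)
    return true_count + noise


if __name__ == "__main__":
    # Hypothetical: one service adds at most 2 uploads per 10 minutes.
    print(noisy_count(1000, sensitivity=2, epsilon=0.5))
```

Doubling the per-client estimate doubles the noise scale, which is why a rough estimate for each statistic is enough to get started.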

Change History (4)

comment:1 in reply to: description; Changed 3 months ago by asn

Replying to teor:

Split off #24425:

Replying to asn:

Replying to teor:

If you write down a list of exactly what you want to know, we can probably collect some stats on ~18 HSDirs using PrivCount.

...
here are some basic ideas:

  • How many v2/v3/both descs per HSDir?

How is this different to "rate of incoming"?

If you mean "cached right now", then I'd need a timeframe so I could design an event. I could do this in December or January.

Yeah I meant "cached right now". How many descs does the HSDir have cached at 01:00? At 03:00? At 14:00?

Perhaps we could plot a graph with time (24 hours) on the x-axis and, on the y-axis, boxplots of the number of cached descriptors across the whole observation period. Something like this: https://i.imgur.com/oyryjs4.png
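
A minimal sketch of the per-hour summary behind such a plot, assuming we already have samples of (hour of day, cached descriptor count) collected over several days. The input format and the function name are hypothetical, and this says nothing about how PrivCount would collect or aggregate the counts.

```python
from collections import defaultdict
from statistics import quantiles


def hourly_boxplot_stats(samples):
    """Group (hour_of_day, cached_count) samples by hour and return the
    five numbers a boxplot needs for each hour that has data."""
    by_hour = defaultdict(list)
    for hour, count in samples:
        by_hour[hour % 24].append(count)

    stats = {}
    for hour, counts in sorted(by_hour.items()):
        counts.sort()
        if len(counts) >= 2:
            q1, median, q3 = quantiles(counts, n=4, method="inclusive")
        else:
            # A single sample: every quartile collapses to that value.
            q1 = median = q3 = counts[0]
        stats[hour] = {
            "min": counts[0],
            "q1": q1,
            "median": median,
            "q3": q3,
            "max": counts[-1],
        }
    return stats


if __name__ == "__main__":
    # Hypothetical samples: three days at 01:00, one day at 03:00.
    print(hourly_boxplot_stats([(1, 10), (1, 20), (1, 30), (3, 5)]))
```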

  • How much total RAM do all v2/v3/both descs occupy on your hsdirs? (max,min,avg,mean over your 18 hsdirs)

I think we have some of the data, but I'd need a list of the objects that contribute to RAM usage. Do you just want descriptors, or is there a replay cache? I could do this in December or January.

Hm. Don't care much about the replay cache right now, mainly about the descs themselves. We could use rendcache.c:rend_cache_total_allocation (which tracks both v2 and v3), and then rend_cache_entry_allocation() for v2, and cache_get_dir_entry_size() for v3.

  • Size variance of v2/v3 descs? (max,min,avg,mean)

Already implemented as a histogram, needs defined bin sizes.

Hm, not sure. The best bin size probably depends on the size variance.
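
To make the bin-size question concrete, here is a small sketch of how descriptor sizes would fall into a fixed set of bins. The bin edges are hypothetical placeholders, not a proposal, and the function is illustrative rather than PrivCount's histogram code.

```python
from bisect import bisect_right


def bin_counts(sizes, edges):
    """Count sizes per bin. Bin i covers [edges[i], edges[i+1]);
    the last bin is open-ended. edges must be sorted ascending."""
    counts = [0] * len(edges)
    for size in sizes:
        idx = bisect_right(edges, size) - 1
        if idx < 0:
            raise ValueError("size %r is below the first bin edge" % (size,))
        counts[idx] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical edges in bytes; real values would come from observed data.
    edges = [0, 1024, 4096]
    print(bin_counts([0, 500, 1024, 5000], edges))
```

Bins that are too wide hide the variance, while bins that are too narrow spread the same observations over more noisy counters, so a first coarse run could guide a finer second one.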

  • What's the rate of incoming v2/v3/both descs?

Already implemented, needs a time period.

We could do per hour to get some initial insight?

comment:2 in reply to: comment:1; Changed 3 months ago by dgoulet

Replying to asn:

  • What's the rate of incoming v2/v3/both descs?

Already implemented, needs a time period.

We could do per hour to get some initial insight?

I think per-hour is what we want. v3 services upload a descriptor at a random interval between 60 and 120 minutes, so the lower bound is "per-hour" (not considering IP rotation).
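
A rough sketch of what that interval implies for one measurement window, which may also help with the per-service contribution estimate. It assumes a single upload stream from one service to one HSDir, with consecutive uploads spaced between 60 and 120 minutes apart, and it ignores IP rotation and everything else that can trigger an upload. The function name and parameters are hypothetical.

```python
import math


def upload_count_bounds(window_minutes, min_gap=60, max_gap=120):
    """Bounds on how many uploads from one stream can land in a half-open
    window of the given length, if consecutive uploads are spaced between
    min_gap and max_gap minutes apart."""
    if window_minutes <= 0:
        raise ValueError("window_minutes must be positive")
    fewest = math.floor(window_minutes / max_gap)  # uploads as sparse as allowed
    most = math.ceil(window_minutes / min_gap)     # uploads as dense as allowed
    return fewest, most


if __name__ == "__main__":
    for window in (10, 60, 600):
        print(window, upload_count_bounds(window))
```

Under these assumptions a one-hour window sees at most one upload per stream, and possibly none.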

comment:3 Changed 2 months ago by nickm

Parent ID set to #24425

comment:4 Changed 3 weeks ago by teor

Milestone changed from Tor: 0.3.3.x-final to Tor: 0.3.4.x-final

Moving most of my assigned tickets to 0.3.4
