Opened 15 months ago

Last modified 11 months ago

#25964 new enhancement

Remove hs_index_t fetch, and use one of the stores instead

Reported by: teor Owned by:
Priority: Medium Milestone: Tor: unspecified
Component: Core Tor/Tor Version:
Severity: Normal Keywords: technical-debt, refactor
Cc: Actual Points:
Parent ID: #27243 Points:
Reviewer: Sponsor:

Description

Split off from https://trac.torproject.org/projects/tor/ticket/23094#comment:9

Replying to asn:

After our #23387 refactoring of hsdir index logic Nick suggested that we don't need to keep all 3 around in memory, since two of them are always identical:
https://oniongit.eu/asn/tor/merge_requests/6#note_1201

We need to:

  • analyse the state machine of fetch, store_first, and store_second to make sure that fetch is always equal to store_first or store_second
  • work out the condition that we can use to select betweeen store_first or store_second when we want fetch
  • write a function that produces fetch from a hsdir_index_t, and use it instead of fetch

Child Tickets

Change History (4)

comment:1 Changed 15 months ago by dgoulet

Datapoint:

The function that does all the hsdir index magic assignment is: node_set_hsdir_index().

analyse the state machine of fetch, store_first, and store_second to make sure that fetch is always equal to store_first or store_second

We have to be very careful here. This hsdir index logic has been a source of complexity that created many headache during the prop224 development. We have a very very very important unit tests that makes sure client and service will always be in sync with the hashring and thus the index they compute. See in test_hs_common.c: reachability_scenarios so at least if that passes, we should be maybe be OK.

write a function that produces fetch from a hsdir_index_t, and use it instead of fetch

To clarify this a bit. We do *not* compute HSDir index on the fly but rather when the node_t is added to the nodelist. And we should keep it that way because else, everytime you want to upload or download a descriptor, you would need to compute all the indexes for all the nodes to get the full hashring.

v2 used the relay identity but in v3, we do a computation that changes every time there is a new SRV and entering a new time period.

comment:2 Changed 11 months ago by dgoulet

Parent ID: #27243

comment:3 Changed 11 months ago by nickm

If we decide to pursue this, it might also be interesting to see if we can truncate the indices. The probability of all 32 bytes mattering is minuscule.

comment:4 Changed 11 months ago by teor

There have never been more than 8000 relays in the tor network:
https://metrics.torproject.org/networksize.html?start=2007-01-01&end=2018-08-28

And there have never been more than 4500 relays with the HSDir flag:
https://metrics.torproject.org/relayflags.html?start=2007-01-01&end=2018-08-28&flag=HSDir

We need need 12 + N bits to index <= 212 = 4096 HSDirs, where N is a small slop factor to avoid collisions.

If we use a 16-bit index, ~1/16 indexes will be occupied. I think this means that 1/32 = ~128 HSDirs will share an index with exactly 1 other HSDir. Similarly, ~64 HSDirs will share an index with exactly 2 other HSDirs, and so on.

But the spread store is currently 4, so an index shared by 2 HSDirs only matters when it's in the 4th spot (128 * 1/4 = 32 HSDirs), an index shared by 3 HSDirs only matters in the 3rd or 4th spots (64 * 2/4 = 32 HSDirs), 4 HSDirs only matters if it's after the 1st spot (32 * 3/4 = 24 HSDirs), and 5 or more HSDirs always matter (32 HSDirs). So overall, approximately 112 / 4096 = 2.7% HSDirs will have a meaningful collision.

I'm not sure how we want to deal with these collisions, given the small number of relays involved, we could just store to all the extra equal-indexed relays. (If services choose an equal-indexed relay at random, that would effectively increase the spread that clients need to check. If services use a relay attribute as a tie-breaker, then malicious relays would have an incentive to modify that attribute.)

If we don't want any collisions at all, we should use a 24 bit index. (More precisely, 1 meaningful collision every 2 days.)

It should be impossible for someone to mine relay keys to generate collisions, because the shared random value changes every 24 hours, but the HSDir flag is only given after 96 hours.

Note: See TracTickets for help on using tickets.