Ticket #13509: hs_stats.txt

File hs_stats.txt, 4.3 KB (added by asn, 5 years ago)
HSDir-related stats: (asn)

* HSDirs threat model notes

Hidden Service directories periodically receive HS descriptors from
HSes. They cache them, and then serve them to any clients that ask for
them.

HSDirs are placed in a hash ring, and each HS picks a slice of HSDirs
from that hash ring. Given the address of an HS, it's easy to learn
which HSDirs are responsible for it. This makes HSDir statistics
dangerous since they can potentially be matched to specific HSes.
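
To make the "easy to learn which HSDirs are responsible" point concrete,
here is a toy hash-ring lookup. It is only a sketch: the ID derivation
below is simplified and is NOT Tor's exact descriptor-ID computation, and
the split of the 6 HSDirs into 2 replicas times 3 consecutive ring
entries, the names responsible_hsdirs() and ring_position(), and the
example inputs are all assumptions made for illustration.

```python
import hashlib
from bisect import bisect_right

REPLICAS = 2  # assumption: descriptor IDs derived per HS and time period
SPREAD = 3    # assumption: consecutive HSDirs taken per descriptor ID


def ring_position(data):
    """Map arbitrary bytes to a point on the ring (toy: plain SHA-1)."""
    return int.from_bytes(hashlib.sha1(data).digest(), "big")


def responsible_hsdirs(service_id, time_period, hsdir_fingerprints):
    """Return the HSDirs that follow each replica's ID on the ring."""
    ring = sorted(hsdir_fingerprints, key=lambda fp: int.from_bytes(fp, "big"))
    positions = [int.from_bytes(fp, "big") for fp in ring]
    chosen = []
    for replica in range(REPLICAS):
        # Simplified ID: hash of service id, time period and replica index.
        desc_id = ring_position(
            service_id + time_period.to_bytes(4, "big") + bytes([replica])
        )
        start = bisect_right(positions, desc_id)  # first HSDir after the ID
        for i in range(SPREAD):
            chosen.append(ring[(start + i) % len(ring)])  # wrap around
    return chosen


# Hypothetical ring of 50 relays and a made-up service identifier.
hsdirs = [hashlib.sha1(b"relay-%d" % i).digest() for i in range(50)]
picked = responsible_hsdirs(b"example-service-id", 16384, hsdirs)
```

Anyone who knows the HS address and the list of HSDirs can run the same
computation, which is why per-HSDir stats can be tied back to an HS.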

Furthermore, each HS has 6 HSDirs, and each HSDir serves a different
set of HSes. This means that attackers have 6 different data points
per HS every hour that can be used to reduce measurement noise.

* How many HSes is the HSDir hosting descriptors for?

Publishing this stat would allow someone who is indexing HSes to be
able to say "I have seen 76% of all HSes". I would really like to
avoid having such an enumeration property.

That said, this is an interesting stat that would allow us to
understand how used HSes are, and also detect sudden changes in the
number of HSes (botnets, chat protocols, etc.). Also, learning the
number of HSes per HSDir will help us find bugs in the hash ring code
and also understand how loaded HSDirs are.

I could be persuaded that with some heavy stats obfuscation (heavier
than the bridge stats obfuscation), this stat might be plausible. By
stats obfuscation, I mean obfuscating the numbers so that the attacker
can only say "I'm somewhere between 60% and 75% of all HSes." This is
a bit related to differential privacy as I understand it, but much
more basic.
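
One way to picture this kind of obfuscation is to add random noise to
the count and then round it up to a coarse bin. The sketch below only
illustrates the idea: the bin size, the noise scale, and the function
names round_up_to_bin() and obfuscate() are placeholders, and nothing
here is tuned or an actual Tor mechanism.

```python
import math
import random


def round_up_to_bin(count, bin_size):
    """Round a non-negative count up to the next multiple of bin_size."""
    return int(math.ceil(count / bin_size)) * bin_size


def obfuscate(count, bin_size, noise_scale, rng):
    """Add Laplace-distributed noise, clamp at zero, then bin the result."""
    # The difference of two exponentials with mean noise_scale is
    # Laplace-distributed with scale noise_scale.
    noise = rng.expovariate(1 / noise_scale) - rng.expovariate(1 / noise_scale)
    noisy = max(0.0, count + noise)
    return round_up_to_bin(noisy, bin_size)


# Binning alone: 76 descriptors reported with a bin size of 8.
binned = round_up_to_bin(76, 8)

# Noise plus binning, with placeholder parameters.
reported = obfuscate(76, bin_size=8, noise_scale=10.0, rng=random.Random())
```

With binning alone an observer still learns the bin exactly; the noise
term is what widens the range they can claim.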

FWIW, when rend-spec-ng.txt gets implemented, it will be harder for
HSDirs to learn the number of served HSes since the descriptor will be
encrypted. However, HSDirs will still be able to approximate the
number of HSes by checking the number of descriptors received per
publishing period. If this ever becomes a problem we can imagine
publishing fake descriptors to confuse the HSDirs.
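
The approximation is just counting uploads inside one publishing period.
A minimal sketch, assuming the HSDir only sees upload timestamps and
that every HS uploads a known number of times per period; the function
name estimate_hs_count() and the sample timestamps are made up.

```python
def estimate_hs_count(upload_times, window_start, window_len, uploads_per_hs):
    """Estimate served HSes from the uploads seen in one time window."""
    window_end = window_start + window_len
    in_window = [t for t in upload_times if window_start <= t < window_end]
    return len(in_window) / uploads_per_hs


# Hypothetical upload timestamps, in seconds since some reference point.
uploads = [0, 10, 3599, 3600, 7000]
estimate = estimate_hs_count(uploads, window_start=0, window_len=3600,
                             uploads_per_hs=1)
```

Fake descriptors would inflate the upload count and therefore the
estimate.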

* Number of HS descriptor fetch requests that the HSDir received.

An adversary can use this stat to evaluate the popularity of an HS.

An adversary can also use this stat to detect big changes in the
numbers of visitors of popular HSes.

Of course, there will be noise in the stats since multiple HSes
correspond to each HSDir, but the adversary could reduce the noise
after observing the same HS rotating to different HSDirs, and also by
examining the stats of all 6 HSDirs that correspond to the HS.
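
A rough way to see why combining 6 HSDirs helps the adversary, under
simplifying assumptions that are mine and not part of the notes above:
the target HS contributes the same number of fetches s to each of its
HSDirs, and the fetches for the other HSes on HSDir i act as independent
noise n_i with equal variance sigma^2.

```latex
c_i = s + n_i, \qquad \operatorname{Var}(n_i) = \sigma^2, \qquad i = 1, \dots, 6

\bar{c} = \frac{1}{6} \sum_{i=1}^{6} c_i = s + \frac{1}{6} \sum_{i=1}^{6} n_i

\operatorname{Var}(\bar{c}) = \frac{\sigma^2}{6}
\quad\Longrightarrow\quad
\operatorname{sd}(\bar{c}) = \frac{\sigma}{\sqrt{6}} \approx 0.41\,\sigma
```

Under these assumptions the noise shrinks by a factor of about 2.4 per
observation period, and averaging over several periods shrinks it
further.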

This doesn't seem like a problem that is solvable with simple
obfuscation of stats, and I suggest we don't do this statistic at all.

* How many of the fetch requests referenced a non-existent descriptor?

This seems like a stat that could potentially find bugs in Tor, but
also something that we don't really understand and might reveal
information about specific HSes.

We need to enumerate the reasons why a client would ask for the wrong
descriptor. For example:
a) clock sync issues,
b) different network view between

* Average number of descriptor updates for the same HS?

Assuming that stats are published daily (which is not necessary), this
is going to be a number between 0 and 24, since RendPostPeriod is
currently one hour and HSes pick a new HSDir after 24 hours (see
rendcommon.c:get_time_period()).

Depending on how many HSes are behind each HSDir, this might or might
not reveal uptime information about specific HSes. Still it doesn't
seem like something we want to risk.

Also, if the result is greater than 24, it means that an HS with
modified RendPostPeriod was publishing to that HSDir (and that the HSDir
doesn't have many clients). Do we want to reveal that?
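
The 0 to 24 bound, and the "greater than 24" signal, follow from simple
arithmetic on the two periods mentioned above. A small sketch; the
constant and function names are made up for illustration and are not
Tor identifiers.

```python
REND_POST_PERIOD = 60 * 60      # one hour, per the text above
HSDIR_PERIOD = 24 * 60 * 60     # HSes pick a new HSDir after 24 hours


def max_updates_per_period(hsdir_period=HSDIR_PERIOD,
                           post_period=REND_POST_PERIOD):
    """Upper bound on descriptor updates one HS sends to one HSDir."""
    return hsdir_period // post_period


def suggests_modified_post_period(avg_updates):
    """True if the average exceeds what the default period allows."""
    return avg_updates > max_updates_per_period()


default_bound = max_updates_per_period()
# An HS publishing every 10 minutes instead of every hour.
fast_bound = max_updates_per_period(post_period=10 * 60)
```

An average above the default bound can only show up if the fast
publisher is not diluted by many HSes with the default period.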

OTOH, it seems to me that if the HSDir is serving many HSes, this stat
doesn't really provide any insight.

* Total size of HS descs / Average size of HS desc

These stats are again not very helpful if reported by HSDirs that
serve many HSes. Any bugs or irregularities of one HS will be smoothed
out by all the other HSes.

Basically, the only thing we would learn is approximately how much
disk space HS descriptors take, and maybe the average number of
contained introduction points (if we also know the number of HSes).

This stat seems like a more verbose version of (c) and not very useful.

* Total/average number of contained introduction points (will be killed by rend-spec-ng)

Depends on the number of HSes per HSDir.