wiki:org/meetings/2018NetworkTeamHackfestSeattle/privcount

blinding

description: shamir secret sharing of a random value that is the starting counter for a statistic.
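a minimal sketch (not a vetted implementation) of what sharing the random starting counter could look like: split it into shamir shares over a small prime field. the prime modulus, the 3-of-5 parameters, and the use of the rand crate are illustrative assumptions.

```rust
use rand::Rng;

const P: u128 = (1u128 << 61) - 1; // Mersenne prime used as the field modulus (illustrative)

/// Split `secret` into `n` shares, any `k` of which reconstruct it.
/// Shares are points (x, f(x)) on a random degree-(k-1) polynomial with f(0) = secret.
fn share(secret: u64, k: usize, n: usize) -> Vec<(u64, u64)> {
    let mut rng = rand::thread_rng();
    // coefficients: [secret, a1, ..., a_{k-1}]
    let mut coeffs: Vec<u128> = vec![(secret as u128) % P];
    for _ in 1..k {
        coeffs.push(rng.gen_range(0..P));
    }
    (1..=n as u128)
        .map(|x| {
            // Horner evaluation of the polynomial at x, mod P
            let y = coeffs.iter().rev().fold(0u128, |acc, &c| (acc * x + c) % P);
            (x as u64, y as u64)
        })
        .collect()
}

fn main() {
    let mut rng = rand::thread_rng();
    // the random blinding value that becomes the starting counter for a statistic
    let start: u64 = rng.gen_range(0..(1u64 << 61) - 1);
    let shares = share(start, 3, 5); // 3-of-5 sharing (illustrative parameters)
    println!("starting counter (blinding value): {start}");
    println!("shares: {shares:?}");
}
```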

noise

description: add noise drawn from a distribution smooth enough that you can't distinguish between neighbouring values.

DONE: rust rng wrapper

DONE: tests for gaussian distributions of integers
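for illustration, a rough sketch of adding gaussian noise to an integer counter, along the lines of what the rng wrapper and integer-gaussian tests cover. it uses the rand and rand_distr crates and samples via f64, which is exactly the IEEE754 concern discussed below; sigma and the rounding choice are illustrative.

```rust
use rand_distr::{Distribution, Normal};

fn noisy_count(true_count: u64, sigma: f64) -> i64 {
    let normal = Normal::new(0.0, sigma).expect("sigma must be finite and positive");
    let noise = normal.sample(&mut rand::thread_rng());
    // round so the reported value stays an integer counter
    true_count as i64 + noise.round() as i64
}

fn main() {
    println!("noisy count: {}", noisy_count(1000, 50.0));
}
```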

there are a bunch of problems that come along with using IEEE754 floating point for the noise (e.g. platform-dependent rounding and precision artefacts in the sampled values).

if we wrote a fixed-point arithmetic implementation, we'd need to write our own math functions, including square root and logarithm.

TODO: find out whether there are any existing fixed-point libraries which do what we need.

https://www.doc.ic.ac.uk/~wl/papers/07/csur07dt.pdf
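for a sense of what writing our own functions would involve, here is a rough Q32.32 fixed-point sketch with a natural-log routine built from shifts and a truncated series. the format, the constant for ln(2), and the number of series terms are illustrative assumptions, not a vetted design.

```rust
// Q32.32 fixed-point sketch: value = raw / 2^32.
type Fix = i64;
const FRAC_BITS: u32 = 32;
const ONE: Fix = 1 << FRAC_BITS;
const LN2: Fix = 2_977_044_472; // round(ln(2) * 2^32)

fn fmul(a: Fix, b: Fix) -> Fix {
    (((a as i128) * (b as i128)) >> FRAC_BITS) as Fix
}

fn fdiv(a: Fix, b: Fix) -> Fix {
    (((a as i128) << FRAC_BITS) / (b as i128)) as Fix
}

/// ln(x) for x > 0: write x = 2^k * m with m in [1, 2), so
/// ln(x) = k*ln(2) + ln(m), and expand ln(m) = 2*artanh((m-1)/(m+1))
/// as a truncated series (good to roughly 7 decimal digits here).
fn fln(x: Fix) -> Fix {
    assert!(x > 0);
    let (mut m, mut k) = (x, 0i64);
    while m >= 2 * ONE { m >>= 1; k += 1; }
    while m < ONE      { m <<= 1; k -= 1; }
    let y = fdiv(m - ONE, m + ONE); // |y| < 1/3 for m in [1, 2)
    let y2 = fmul(y, y);
    let (mut term, mut sum) = (y, y);
    for n in [3i64, 5, 7, 9, 11] {
        term = fmul(term, y2);
        sum += term / n; // dividing a fixed-point value by a small integer
    }
    k * LN2 + 2 * sum
}

fn main() {
    println!("ln(10) ≈ {}", fln(10 * ONE) as f64 / ONE as f64); // ≈ 2.302585
}
```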

api for statistics

TODO: need to design the api for allocating noise using the optimisation method that aaron created. for that we need an action bound and an estimated value per statistic. the estimated value is not a security parameter; the action bound is. we'll need to do measurements with an actual client implementation to discover an appropriate action bound for our desired anonymity set size / security bounds.
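a hedged sketch of what that api surface might look like: each statistic declares an action bound (the security parameter) and an estimated value (used only to guide the allocation). the struct and function names are assumptions, and the proportional allocation below is only a placeholder for aaron's optimisation method; the example numbers are made up.

```rust
struct Statistic {
    name: &'static str,
    action_bound: f64,    // max contribution of one user per measurement period (security parameter)
    estimated_value: f64, // rough expected magnitude, not a security parameter
}

/// Placeholder allocation: split `epsilon_total` evenly across statistics and
/// set each noise scale to action_bound / epsilon (Laplace-style sensitivity/epsilon).
fn allocate_noise(stats: &[Statistic], epsilon_total: f64) -> Vec<(&'static str, f64)> {
    let eps_each = epsilon_total / stats.len() as f64;
    stats
        .iter()
        .map(|s| (s.name, s.action_bound / eps_each))
        .collect()
}

fn main() {
    let stats = [
        // action bounds and estimates here are made-up placeholder numbers
        Statistic { name: "circuits-per-day", action_bound: 144.0, estimated_value: 50.0 },
        Statistic { name: "bytes-per-day", action_bound: 1.0e7, estimated_value: 1.0e6 },
    ];
    for ((name, scale), s) in allocate_noise(&stats, 0.3).into_iter().zip(stats.iter()) {
        println!("{}: estimated ~{}, noise scale {:.1}", name, s.estimated_value, scale);
    }
}
```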

TODO: if we ran privcount on all our current statistics, how many of them would we no longer be able to collect because it's not possible to add sufficient noise?

TODO: should we protect the average case statistically, or some factor of the average?

epsilon bounds how much the probability of any observed outcome can change depending on whether a given user was active on the network on a given day, i.e. how well an observer can distinguish the two cases.

let ϵ be a positive real number and A be a randomized algorithm that takes a dataset as input (representing the actions of the trusted party holding the data). let imA denote the image of A. the algorithm A is ϵ-differentially private if for all datasets D1 and D2 that differ on a single element (i.e., the data of one person), and all subsets S of imA,

Pr[A(D1) ∈ S] ≤ e^ϵ × Pr[A(D2) ∈ S],

where the probability is taken over the randomness used by the algorithm.

apple used epsilon=43 at one point in time. now they use epsilon=11. (lower values of epsilon are better.) we're aiming for epsilon=0.3.
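for a sense of scale, the bound e^ϵ from the definition above works out to roughly 1.35 for ϵ=0.3, about 6 × 10^4 for ϵ=11, and about 5 × 10^18 for ϵ=43.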

TODO: need a detailed spec of which stats we collect and their noise levels, plus versioning for stats for when we want to change and/or tweak noisiness. if a statistic's version is too old, or we believe its noise to be insufficient to maintain privacy, we should have a mechanism for telling those clients to simply not report that data.
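one possible shape for that mechanism, as a hedged sketch: statistics carry a spec version, and a minimum allowed version distributed out of band (e.g. via the consensus) tells reporters to drop stats whose noise we no longer trust. the field names and the consensus-parameter idea are assumptions, not an agreed design.

```rust
struct StatSpec {
    name: &'static str,
    version: u32,
    noise_sigma: f64,
}

/// `min_allowed_version` would come from the consensus (or another trusted channel)
/// and lets us retire stats whose noise we no longer consider sufficient.
fn should_report(spec: &StatSpec, min_allowed_version: u32) -> bool {
    spec.version >= min_allowed_version
}

fn main() {
    let spec = StatSpec { name: "connections-seen", version: 2, noise_sigma: 100.0 };
    let min_allowed = 3; // e.g. version 2's noise level was found to be too low
    if !should_report(&spec, min_allowed) {
        println!("not reporting {} (spec v{} too old, sigma {})",
                 spec.name, spec.version, spec.noise_sigma);
    }
}
```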

TODO: need threat modelling and decisions on how to handle bad relays that decide to stop adding noise to their collected statistics. the proposed attack is that a relay could skip adding its noise in order to learn more from the data collected by the other relays. we could decide not to care, because any relay that wanted to be malicious could more effectively expose its own users directly; or we could add additional noise based on consensus weight; another idea is to allocate noise based on the number of relays N in the network so that each relay contributes 1/Nth of the noise.
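a small sketch of the 1/N idea, reading it as splitting the total noise variance: since variances of independent gaussian noise add, each of N relays would add noise with standard deviation sigma_total / sqrt(N), or a consensus-weight-proportional share of the variance. the numbers below are illustrative.

```rust
/// Equal split of the target total variance across N relays:
/// each relay adds Gaussian noise with sigma_total / sqrt(N).
fn per_relay_sigma_equal(sigma_total: f64, n_relays: usize) -> f64 {
    sigma_total / (n_relays as f64).sqrt()
}

/// Split the total noise variance in proportion to consensus weight instead,
/// so bigger relays carry more of the noise.
fn per_relay_sigma_weighted(sigma_total: f64, weight: f64, total_weight: f64) -> f64 {
    (sigma_total * sigma_total * weight / total_weight).sqrt()
}

fn main() {
    let sigma_total = 1000.0;
    println!("equal split, 6500 relays: {:.2}", per_relay_sigma_equal(sigma_total, 6500));
    println!("weighted, 1% of weight:   {:.2}", per_relay_sigma_weighted(sigma_total, 0.01, 1.0));
}
```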

TODO: optimise this for "simplest possible decisions at first" so that we can deploy it.

Last modified on Jul 4, 2018, 12:54:09 AM