** Session: PrivCount Technical Issues **

10/03/2018

12:00 -- 13:00

Session lead: teor

Scope: Solve remaining issues to securely implement PrivCount.

Agenda:

- Action bounds - Consistent derivation
- Adding new statistics - how to estimate value and cost
- Binning outputs for privacy
- Debugging protocol implementation
- Floating point and differential privacy issues
- Malicious inputs - prevent them from modifying output too much
- Metrics statistical analysis
- Noise
- How much excess noise to handle malicious inputs
- How to divide noise among relays
- Issue with low per-relay noise and possible zero truncation
- Function to sample Gaussian noise

- Versioning
- Ensure noise is correctly despite multiple versions

Initial questions:

- Question: What is the vision for running PrivCount? Answer: Goal is every relay submits data via PrivCount.
- Question: Do we need a minimum number of relays running PrivCount before we started outputting statistics? Answer: Not necessarily for the noise, but perhaps just to provide some level of aggregation.

Discussion:

- Action bounds
- How do we do consistent derivation?
- We should run a client and see what activity gets generated. What user applications get tested isn't clear.
- We could perhaps use measurements about popular client applications to determine what user activity to simulate

- Adding new statistics
- How do we estimate value for purposes?
- We can ramp up.

- Binning outputs for privacy
- Perhaps bin size could be the action bound.
- Do the rounding after the aggregation to handle floating point issues. Reference: "On significance of the least significant bits for differential privacy", Mironov, https://dl.acm.org/citation.cfm?id=2382264.
- Some small multiple (say, 10) of the action bound seems like a reasonable bin size

- Debugging protocol implementation
- Run protocol on a test network
- Debugging mode when each relays individually adds sufficient noise to its statistic, and compare to side-by-side to current method
- Send a cell on a circuit on a circuit that says "measure me" with an ID that has the experiment number, or just pin a node that isn't in the consensus

- Issue with low per-relay noise and possible zero truncation
- Used fixed-point arithmetic, where the decimal point position is chosen based on:
- the expected value of the statistic: avoid overflowing the ≈ 2
^{60}available bits of the shamir secret (our field elements are 2^{62}- 2^{30}- 1, which can contain 61-bit 2's complement integers. We use the remaining 2^{61}- 2^{30}- 1 values to detect overflow with probability ≈ 0.5) - the noise allocated to the statistic: allow enough (how much? 8 extra bits?) precision in the smallest possible noise added by a relay (a relay with consensus weight 1 will add sqrt(1/24 hours * 1/50 million total consensus weight) * noise allocation ≈ 1 / 2
^{15}* noise allocation)

- the expected value of the statistic: avoid overflowing the ≈ 2

- Used fixed-point arithmetic, where the decimal point position is chosen based on:
- Function to sample Gaussian noise
- Need to re-implement sampling from a Gaussian distribution
- Use discrete Gaussian distribution, lattice-based cryptography implementations should have this, e.g. implementation of BV (Brakerski and Vaikuntanathan)
- one alternative: Sampling exactly from the normal distribution
- Project page, tarballs, also implemented in MPFR as mpfr_nrandom
- Algorithm D outputs Gaussian-distributed integers with standard deviation σ, using approximately (1/0.715)log
_{2}σ uniformly random bits per sample - Since our Gaussians will be scaled along with our counters, we will be sampling Gaussians with σ between 2
^{15}and 2^{60}, using approximately 21-84 bits per sample - This method sets the low bits of the Gaussian directly from the uniformly random input bits, much like Appendix C of the PrivCount shamir spec

- another alternative: Lattice Signatures and Bimodal Gaussians
- Original source code, production-grade C implementation
- Algorithm 12 in Section 6 outputs Gaussian-distributed integers with standard deviation σ for σ = kσ
_{2}(k integer, σ_{2}= sqrt(1/(2ln2)) ≈ 0.849), using approximately 4 + 2log_{2}σ uniformly random bits per sample - Since our Gaussians will be scaled along with our counters, we will be sampling Gaussians with σ between 2
^{15}and 2^{60}, using approximately 34-124 bits per sample - We will round up σ to the nearest kσ
_{2}. These large σ make the difference between kσ_{2}and σ negligible

- background: SAMPLING FROM DISCRETE GAUSSIANS FOR LATTICE-BASED CRYPTOGRAPHY ON A CONSTRAINED DEVICE

- one alternative: Sampling exactly from the normal distribution
- HELib (Homomorphic Encryption in C++)
~~should also have this~~uses the box-mueller transform in floating-point and rounds to the nearest integer

- Malicious inputs
- How to prevent small number of relays from modifying output too much
- Report individual statistics with large amount of noise to check for reasonable size, then aggregate inputs with smaller noise
- Divide relays into buckets and report aggregate values from each bucket, choose at least bucket using trust or bandwidth or flags, choose shared random value to generate buckets produced *after* the inputs have been received
- Specific proposal: choose a few (say, 20) of the largest relays for one bucket, choose a few relays randomly based on bandwidth (say, 20), and also release the network-wide aggregate. Use the bucket values to "check" the network aggregate.

- Metrics statistical analysis
- Noise
- How much excess noise to handle malicious inputs
- How to divide noise among relays
- How to share noise across statistics.
- Divide it based on consensus weight. Then make some assumption about maximum fraction of bandwidth that is malicious or fails, which answers how excess noise is added.
- Have a minimum requirement also for number (or weight) of relays that need submit inputs to proceed with the aggregation.

- Versioning
- Ensure noise is correctly despite multiple versions
- Noise parameters and scaling
- If relays know the noise allocated to each statistic (and the number of relays on each version?), they can calculate how much noise to add to the statistics they are collecting

Last modified 20 months ago
Last modified on Oct 8, 2018, 4:34:23 AM