wiki:org/meetings/2018MexicoCity/Notes/PrivCountTechnical

Session: PrivCount Technical Issues

10/03/2018

12:00 -- 13:00

Session lead: teor

Scope: Solve remaining issues to securely implement PrivCount.

Agenda:

  • Action bounds - Consistent derivation
  • Adding new statistics - how to estimate value and cost
  • Binning outputs for privacy
  • Debugging protocol implementation
  • Floating point and differential privacy issues
  • Malicious inputs - prevent them from modifying output too much
  • Metrics statistical analysis
  • Noise
    • How much excess noise to handle malicious inputs
    • How to divide noise among relays
    • Issue with low per-relay noise and possible zero truncation
    • Function to sample Gaussian noise
  • Versioning
    • Ensure noise is added correctly despite multiple versions

Initial questions:

  • Question: What is the vision for running PrivCount? Answer: Goal is every relay submits data via PrivCount.
  • Question: Do we need a minimum number of relays running PrivCount before we start outputting statistics? Answer: Not necessarily for the noise, but perhaps just to provide some level of aggregation.

Discussion:

  • Action bounds
    • How do we do consistent derivation?
    • We should run a client and see what activity gets generated; which user applications should be tested isn't yet clear.
    • We could perhaps use measurements about popular client applications to determine what user activity to simulate
  • Adding new statistics
    • How do we estimate the value (and cost) of a new statistic for our purposes?
    • We can ramp up.
  • Binning outputs for privacy
    • Perhaps bin size could be the action bound.
    • Do the rounding after the aggregation to handle floating point issues. Reference: "On significance of the least significant bits for differential privacy", Mironov, https://dl.acm.org/citation.cfm?id=2382264.
    • Some small multiple (say, 10) of the action bound seems like a reasonable bin size
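    A minimal sketch of the rounding step above (the function name and the default multiplier are illustrative, not decisions from the session):
    {{{#!python
    # Sketch only: round the aggregated, noisy statistic to a bin boundary
    # *after* aggregation, so the low-order (floating-point) bits of the
    # aggregate are never published (see the Mironov paper cited above).
    def bin_output(aggregated_value, action_bound, multiplier=10):
        """Round to the nearest multiple of the bin size (multiplier * action bound)."""
        bin_size = multiplier * action_bound
        return round(aggregated_value / bin_size) * bin_size

    # Example: with an action bound of 100, outputs land on multiples of 1000.
    print(bin_output(12345.67, action_bound=100))  # -> 12000
    }}}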
  • Debugging protocol implementation
    • Run protocol on a test network
    • Debugging mode where each relay individually adds sufficient noise to its own statistic; compare side-by-side with the current method
    • Send a cell on a circuit that says "measure me" with an ID that has the experiment number, or just pin a node that isn't in the consensus
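    A rough sketch of the side-by-side debug mode mentioned above, on simulated per-relay counts (the counts, sigma, and use of the stdlib generator are assumptions for illustration; this is not the protocol itself):
    {{{#!python
    import random

    def simulate(per_relay_counts, sigma):
        """Compare debug mode (full noise per relay) with the normal aggregated mode."""
        n = len(per_relay_counts)
        # Debug mode: every relay publishes its own fully-noised statistic.
        debug_outputs = [c + random.gauss(0, sigma) for c in per_relay_counts]
        # Normal mode: each relay adds sigma / sqrt(n) noise; only the sum is
        # seen, and the summed noise again has standard deviation sigma.
        normal_sum = sum(c + random.gauss(0, sigma / n ** 0.5) for c in per_relay_counts)
        return debug_outputs, normal_sum

    counts = [random.randint(0, 50) for _ in range(100)]
    debug, aggregate = simulate(counts, sigma=10.0)
    print(sum(counts), aggregate)  # the aggregate should stay close to the true sum
    }}}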
  • Issue with low per-relay noise and possible zero truncation
    • Use fixed-point arithmetic, where the decimal point position is chosen based on:
      • the expected value of the statistic: avoid overflowing the ≈ 2^60 available values of the Shamir secret (our field elements are modulo 2^62 - 2^30 - 1, which can contain 61-bit two's complement integers; we use the remaining 2^61 - 2^30 - 1 values to detect overflow with probability ≈ 0.5)
      • the noise allocated to the statistic: allow enough (how much? 8 extra bits?) precision in the smallest possible noise added by a relay (a relay with consensus weight 1 will add sqrt(1/24 hours * 1/50 million total consensus weight) * noise allocation ≈ 1/2^15 * noise allocation)
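    A sketch of the fixed-point encoding described above (the modulus and the 61-bit range are taken from the notes; the 8 fractional bits are the open question, not a decision):
    {{{#!python
    PRIME = 2**62 - 2**30 - 1   # field modulus from the notes above
    FRAC_BITS = 8               # fractional precision; the "8 extra bits?" question
    LIMIT = 2**60               # 61-bit two's complement: values in [-2^60, 2^60)

    def encode(value):
        """Scale to fixed point and map into the field, two's-complement style."""
        fixed = round(value * (1 << FRAC_BITS))
        if not -LIMIT <= fixed < LIMIT:
            raise OverflowError("statistic too large for the 61-bit range")
        return fixed % PRIME

    def decode(element):
        """Map an aggregated field element back to a signed fixed-point value."""
        fixed = element if element < LIMIT else element - PRIME
        if not -LIMIT <= fixed < LIMIT:
            raise OverflowError("aggregate overflowed the 61-bit range")
        return fixed / (1 << FRAC_BITS)

    print(decode(encode(-3.141)))  # -> -3.140625 (limited by 1/256 precision)
    }}}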
  • Function to sample Gaussian noise
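    One possible shape for such a function (an illustrative assumption, not an agreed design): draw from a cryptographic randomness source and snap the sample to a fixed granularity, one known mitigation for the floating-point leakage discussed in the Mironov paper cited above.
    {{{#!python
    import math
    import secrets

    def sample_gaussian(sigma, granularity=2**-8):
        """Return sigma-scaled Gaussian noise rounded to a multiple of granularity."""
        # Two uniform samples in (0, 1] from a cryptographic source.
        u1 = (secrets.randbits(53) + 1) / 2**53
        u2 = (secrets.randbits(53) + 1) / 2**53
        # Box-Muller transform, chosen here only because it is easy to audit.
        gauss = math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2)
        return round(sigma * gauss / granularity) * granularity

    print(sample_gaussian(sigma=100.0))
    }}}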
  • Malicious inputs
    • How to prevent small number of relays from modifying output too much
    • Report individual statistics with large amount of noise to check for reasonable size, then aggregate inputs with smaller noise
    • Divide relays into buckets and report aggregate values from each bucket; choose at least one bucket using trust or bandwidth or flags; generate the buckets from a shared random value produced *after* the inputs have been received
    • Specific proposal: choose a few (say, 20) of the largest relays for one bucket, choose a few relays randomly based on bandwidth (say, 20), and also release the network-wide aggregate. Use the bucket values to "check" the network aggregate.
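    A sketch of that bucket check (the bucket size of 20 is from the proposal above; the hashing, bandwidth weighting, and tolerance are illustrative assumptions):
    {{{#!python
    import hashlib
    import random

    def make_buckets(relays, shared_random, size=20):
        """relays: list of (fingerprint, bandwidth, noisy_value); shared_random: bytes revealed after inputs are received."""
        largest = sorted(relays, key=lambda r: r[1], reverse=True)[:size]
        # Seed a deterministic generator from the shared random value so every
        # party derives the same randomly sampled bucket.
        rng = random.Random(hashlib.sha256(shared_random).digest())
        sampled = rng.choices(relays, weights=[r[1] for r in relays], k=size)
        return largest, sampled

    def looks_consistent(bucket, relays, network_total, tolerance=0.5):
        """Scale a bucket's total up by its bandwidth share and compare it to the network-wide aggregate."""
        bucket_bw = sum(r[1] for r in bucket)
        total_bw = sum(r[1] for r in relays)
        estimate = sum(r[2] for r in bucket) * total_bw / bucket_bw
        return abs(estimate - network_total) <= tolerance * abs(network_total)
    }}}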
  • Metrics statistical analysis
    • PrivCount needs to tell Metrics *which* relays provided inputs.
    • PrivCount also needs to tell Metrics the standard deviation of the noise.
    • Sampling error needs to be taken into account. Ian wrote an email describing how to do this.
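    A minimal sketch of reporting the noise component of the error (this is not the method from Ian's email; sampling error is not included here and would widen the interval):
    {{{#!python
    def noise_interval(published_value, noise_sigma, z=1.96):
        """95% interval for the added Gaussian noise alone, given its standard deviation."""
        return (published_value - z * noise_sigma, published_value + z * noise_sigma)

    print(noise_interval(120000, noise_sigma=1500))  # -> (117060.0, 122940.0)
    }}}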
  • Noise
    • How much excess noise to handle malicious inputs
    • How to divide noise among relays
    • How to share noise across statistics.
    • Divide it based on consensus weight. Then make some assumption about maximum fraction of bandwidth that is malicious or fails, which answers how excess noise is added.
    • Also have a minimum requirement for the number (or weight) of relays that need to submit inputs before proceeding with the aggregation.
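    A sketch of the weight-based split with excess noise (the 25% assumed loss is an example figure, not an agreed parameter):
    {{{#!python
    def per_relay_sigma(target_sigma, relay_weight, total_weight, assumed_loss=0.25):
        """Standard deviation of the noise one relay adds for one statistic."""
        share = relay_weight / total_weight
        # Variances add across relays, so give each relay its consensus-weight
        # share of the total variance, then inflate it so the honest
        # (1 - assumed_loss) fraction still contributes target_sigma**2 in total.
        variance = (target_sigma ** 2) * share / (1.0 - assumed_loss)
        return variance ** 0.5

    # A relay with 0.1% of the consensus weight, for a statistic with sigma = 1000:
    print(per_relay_sigma(1000.0, relay_weight=50000, total_weight=50000000))
    }}}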
  • Versioning
    • Ensure noise is added correctly despite multiple versions
    • Noise parameters and scaling
    • If relays know the noise allocated to each statistic (and the number of relays on each version?), they can calculate how much noise to add to the statistics they are collecting
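    A sketch of that per-version allocation (function and argument names are made up): divide the noise for a statistic over the consensus weight of the relays whose version actually collects it, so the aggregate still carries the full allocation.
    {{{#!python
    def sigma_for_relay(target_sigma, relay_weight, supporting_relays):
        """supporting_relays: (weight, version) pairs for relays that collect this statistic."""
        supporting_weight = sum(w for w, _version in supporting_relays)
        share = relay_weight / supporting_weight
        # Variances add, so each supporting relay takes its share of the variance.
        return target_sigma * share ** 0.5
    }}}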