wiki:org/meetings/2018MexicoCity/Notes/PrivCount

9/30/2018

12:30 -- 13:30

Session lead: teor

* Session: PrivCount in Tor *

Progress since 3/2018

  • Proposal written providing fault tolerance using Shamir secret sharing (Tor Proposal 288)
  • Summary of system structure:
    • Entities: Relays, Data Collectors (DCs), Tally Reporters (TRs), Tally Server (TS)
    • Relays send data to DCs
    • DCs send results to TRs
    • TRs generate aggregate results and send them to TS (i.e. Tor Metrics)
    • The concern is that per-relay statistics published without noise can allow individual usage activity to be inferred. For example, the guard of an onion service can be discovered using statistics about bandwidth usage.
    • Threat model allows adversary to run relays and Tally Reporters. At least one Tally Reporter must be honest.
  • Question: Can you run statistics collection on Tor clients? Answer: We have no plans to, because of the difficulty of preserving client privacy.
  • Question: If the whole network doesn't participate in PrivCount, are you doing network-wide extrapolation? Answer: Yes, we will be doing network extrapolation.
  • Question: What statistics do you plan to collect? Answer: Tor already collects statistics that we would like to move over to PrivCount. In addition, we would like to add other statistics to Tor. We need many relays participating in PrivCount to get reliable, significant results.
  • Suggestion: collect the zero-statistic as a sanity check on correctness and noise.
  • Plans for the code
    • It will not be Tor-specific. The Collector can take data from any other process. The Tally Reporter can aggregate inputs from any Data Collectors. Other systems needing privacy-preserving measurement might find this useful (e.g. I2P, Firefox).
    • It will be written in Rust.
  • PrivCount Shamir secret-sharing code has been written and is being reviewed.
  • Question: Who chooses who is running PrivCount? Answer: All relay operators can and should run it. It will eventually be on by default. The Tally Reporters will be the DirAuth operators or people designated by the DirAuths.
  • Question: What specific statistics will be reported? Answer: Relay traffic statistics. Onion-service statistics. Error statistics (this would be particularly straightforward because relays already locally emit events at various warning levels).
  • Question: Do you plan to create an analogue of the Alexa top sites list for Tor? Answer: Probably not, because it seems hard to use PrivCount to produce such a list accurately and in a privacy-preserving manner. For example, just storing visited sites locally creates a list of potentially sensitive domains/URLs.
  • Performed a fairly comprehensive measurement study of Tor using PrivCount that shows the kinds of useful statistics that can be collected ("Understanding Tor Usage with Privacy-Preserving Measurement", ACM IMC 2018).
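As an illustration of how Shamir secret sharing can provide the fault tolerance mentioned above, here is a minimal sketch. It is not the Proposal 288 design or wire format: the field prime, the function names (`split`, `reconstruct`, `add_shares`), the 2-of-3 threshold, and the choice to share a counter value directly are all assumptions made for this example. It shows that shares from several relays can be added per Tally Reporter, and that the sum can still be recovered when one Tally Reporter is missing.

```python
import secrets

# Assumption: a 61-bit Mersenne prime field; the notes do not specify a field.
PRIME = 2**61 - 1


def split(secret, threshold, num_shares, prime=PRIME):
    """Split `secret` into `num_shares` shares; any `threshold` of them recover it."""
    if not 1 <= threshold <= num_shares:
        raise ValueError("need 1 <= threshold <= num_shares")
    # Random polynomial of degree threshold-1 whose constant term is the secret.
    coeffs = [secret % prime] + [secrets.randbelow(prime) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, num_shares + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation of the polynomial at x
            y = (y * x + c) % prime
        shares.append((x, y))
    return shares


def reconstruct(shares, prime=PRIME):
    """Lagrange-interpolate the shared polynomial at x = 0 from (x, y) shares."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % prime
                den = (den * (xi - xj)) % prime
        # pow(den, -1, prime) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, prime)) % prime
    return secret


def add_shares(a, b, prime=PRIME):
    """Add two share lists pointwise; the result is a sharing of the summed secrets."""
    result = []
    for (xa, ya), (xb, yb) in zip(a, b):
        if xa != xb:
            raise ValueError("shares must be for the same recipients")
        result.append((xa, (ya + yb) % prime))
    return result


# Two hypothetical relays each share a counter among three Tally Reporters.
relay_a = split(120, threshold=2, num_shares=3)
relay_b = split(45, threshold=2, num_shares=3)
totals = add_shares(relay_a, relay_b)

# Any two of the three Tally Reporters suffice, so one may be offline.
assert reconstruct(totals[:2]) == 165
```

Because the sharing is linear, each Tally Reporter only ever handles its own share of each counter and of the sum, and fewer than `threshold` shares reveal nothing about the underlying values.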

Next Steps

  • Remaining design issues
    • Floating point issues in the noise generation.
    • How to partition relays into different measurement groups
    • Want to bin the totals (after aggregation and noise) to limit information leakage over time. How do we choose the bins?
    • Decide minimum amount of noise to require from the relays
    • Need a process to determine how much user activity to protect. This is hard because a single user can in principle account for a large fraction of the total counts, and hiding that would make the result very inaccurate. Therefore, we will try to protect a reasonable amount of individual activity, but that requires defining what "reasonable" is.
    • How should we divide the privacy budget among multiple statistics?
    • How should we coordinate among multiple versions of PrivCount reporting statistics? They need to be somewhat aware of each other to manage the total privacy budget.
    • Work out how to debug the system. Using noisy, blinded, and encrypted values makes it hard to check that outputs are correct.
  • Proof of Concept (POC)
    • Will likely include shortcuts like hardcoding identities
    • Would like to get POC out in October
    • Minimum Viable Product by end of November
  • Plan for deployment
    • Implement various modules in Rust.
    • Start collecting dummy statistics (e.g. zero).
    • Allow relay operators to start opting into PrivCount collection (will require compiling Rust)
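To make some of the remaining design issues concrete, the sketch below shows one possible shape for dividing a privacy budget among statistics, deriving a noise scale from the amount of user activity to protect, and binning a noisy total. None of this is decided in the notes: the proportional split, the statistic names, the bin width, and the helper names (`split_budget`, `laplace_scale`, `bin_total`) are hypothetical. The Laplace mechanism is used only because its scale formula is short; the notes do not say which noise distribution PrivCount uses.

```python
import math


def split_budget(total_epsilon, weights):
    """Divide a total privacy budget among statistics in proportion to `weights`.

    Under sequential composition the per-statistic epsilons add up, so the
    shares returned here sum to `total_epsilon`.
    """
    total_weight = sum(weights.values())
    if total_epsilon <= 0 or total_weight <= 0:
        raise ValueError("budget and total weight must be positive")
    return {name: total_epsilon * w / total_weight for name, w in weights.items()}


def laplace_scale(sensitivity, epsilon):
    """Scale of Laplace noise for a counter under the Laplace mechanism.

    `sensitivity` is the largest change one protected user's activity can
    cause in the counter, i.e. the "how much activity to protect" question.
    """
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    return sensitivity / epsilon


def bin_total(noisy_total, bin_width):
    """Round a noisy total down to the lower edge of its bin."""
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    return math.floor(noisy_total / bin_width) * bin_width


# Hypothetical weights: relay traffic gets half of the budget.
budget = split_budget(1.0, {"relay_traffic": 2, "onion_service": 1, "errors": 1})
scale = laplace_scale(sensitivity=10, epsilon=budget["relay_traffic"])
published = bin_total(1234.7, bin_width=100)
```

A smaller share of the budget or a larger protected amount of activity both increase the noise scale, which is the accuracy trade-off described above. Binning is applied after noise, so a published value only reveals which bin the noisy total fell into.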
Last modified on Sep 30, 2018, 10:31:41 PM