wiki:org/meetings/2018Rome/Notes/PrivCountInTor

PrivCount in Tor: Intro for Metrics

Goal of this session: walk through the design for taking this experimental protocol, solving some of its open issues, and implementing it in Tor.

Overview

  • Tor Relays

Right now relays publish their own individual stats.

This means adversaries can see what each relay reported.

Even if you add noise, you still learn something about that relay's stat.

But metrics reports just the total over all the relays anyway, so simply doing a global aggregate accomplishes that goal.

Each relay adds a fraction of the noise that is necessary to make the final count safe. This is the only point where noise is added.
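A minimal sketch of how a relay could size its share of the noise. It assumes Gaussian noise and an equal split across relays, neither of which these notes specify; `per_relay_sigma` and `sample_relay_noise` are hypothetical names, and rounding to an integer is a simplification.

```python
import math
import random


def per_relay_sigma(total_sigma: float, num_relays: int) -> float:
    """Standard deviation one relay uses so the summed noise has total_sigma."""
    if num_relays <= 0:
        raise ValueError("need at least one relay")
    # Variances of independent Gaussians add:
    # num_relays * (total_sigma / sqrt(num_relays))**2 == total_sigma**2
    return total_sigma / math.sqrt(num_relays)


def sample_relay_noise(total_sigma: float, num_relays: int,
                       rng: random.Random) -> int:
    """Draw one relay's noise share and round it to an integer."""
    return round(rng.gauss(0.0, per_relay_sigma(total_sigma, num_relays)))
```

With 100 relays and a target standard deviation of 50 for the total, each relay would use a standard deviation of 5, which is why a single relay's noise protects very little on its own.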

  • Data Collectors

Each reporting relay has a data collector associated with it. They do the encryption of the stats.

  • Tally Reporters

Separate programs that run apart from the relays. The Tally Reporters collaborate among themselves to output the aggregate total, without any one of them learning the number reported by a single relay.

  • Stats Consumers

The Tally Reporters output a single answer, and then metrics et al import that document and use it for visualizing.

Discussion topics:

  • Malicious input from a relay can corrupt the output, but cannot break the privacy properties of the protocol. One way to address malicious input is to split relays into groups, so that an attacker needs a malicious relay in every group. We do plan to use this design with multiple randomized groups. That step needs shared randomness, so we will need to coordinate the timing of PrivCount phases with the SRV that the consensus provides. Needs more design work.
  • What analysis would be useful for the metrics team to do, now, to help privcount become what it should be? Examples:
    • how often flags change in a 24 hour period
    • whether it's possible to partition a set of relays to achieve certain properties for each partition
  • What kind of metrics data would this system be useful for? Not all metrics needs fit into what privcount can do ("add integers, subtract integers, and that's it").
  • Nowadays there are not-too-terrible libraries that do full MPC. It is not totally ridiculous to imagine using full MPC (to compute any function we want) at some point in the future.

(The protocol has versioning in it, so if we want to update the protocol to something smarter later, we can do so.)

  • So we plan to evaluate all of the statistics we do right now, and move some of those to Privcount? Yes.

Actually we should choose among three options: (a) keep as is, (b) move to Privcount, or (c) drop because it's too scary to keep.

An example of current stats that we will want to either move or drop: exit statistics, split up by port.

A stat Nick wants: learn how many protocolwarnings lines (of each type) happen at each relay, to get a global picture of trends in protocol violations.
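For the multiple randomized groups idea from the discussion topics above, here is a rough sketch of a deterministic partition that anyone holding the same shared random value can recompute. The hashing scheme, the function name `partition_relays`, and the round-robin assignment are all assumptions for illustration; the notes say this step still needs design work.

```python
import hashlib


def partition_relays(fingerprints, shared_random_value: bytes, num_groups: int):
    """Split relays into num_groups groups, ordered by a hash seeded with the SRV."""
    if num_groups <= 0:
        raise ValueError("need at least one group")

    def sort_key(fingerprint: str) -> bytes:
        # Hypothetical construction: H(SRV || fingerprint). Without the SRV,
        # a relay cannot predict in advance which group it will land in.
        return hashlib.sha256(
            shared_random_value + fingerprint.encode("utf-8")).digest()

    ordered = sorted(fingerprints, key=sort_key)
    groups = [[] for _ in range(num_groups)]
    for index, fingerprint in enumerate(ordered):
        # Round-robin over the shuffled order keeps group sizes within one.
        groups[index % num_groups].append(fingerprint)
    return groups
```

Because the ordering depends only on the SRV and the relay identities, every Tally Reporter would derive the same groups without extra coordination.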

Data Flow

Data Collector

  • Initialisation
    • Blinding
    • Noise
    • Events / Increments
  • Encryption
  • Submission

Tally Reporter

  • Partition
  • Aggregation
  • Reporting
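The data flow outlined above can be sketched with simple additive blinding. This is an illustration under assumptions, not the protocol as specified: the modulus, the class and function names, and the use of plain additive blinding are hypothetical, encryption of the blinding values to each Tally Reporter is omitted, and the partition step is not shown.

```python
import secrets

MODULUS = 2 ** 64  # hypothetical modulus for all counter arithmetic


class DataCollector:
    def __init__(self, num_reporters: int, noise: int):
        # Blinding: one random value per Tally Reporter. In a real deployment
        # each value would be encrypted to its reporter (omitted here).
        self.outbox = [secrets.randbelow(MODULUS) for _ in range(num_reporters)]
        # Initialisation: the counter starts at noise minus all blinding
        # values, so the in-memory value looks uniformly random.
        self.counter = (noise - sum(self.outbox)) % MODULUS

    def increment(self, amount: int = 1) -> None:
        # Events / Increments
        self.counter = (self.counter + amount) % MODULUS

    def submit(self):
        # Submission: the blinded counter plus the per-reporter values.
        return self.counter, self.outbox


def tally(submissions, num_reporters: int) -> int:
    """Aggregation: combine blinded counters with each reporter's sum."""
    reporter_sums = [
        sum(outbox[i] for _, outbox in submissions) % MODULUS
        for i in range(num_reporters)
    ]
    blinded_total = sum(counter for counter, _ in submissions) % MODULUS
    total = (blinded_total + sum(reporter_sums)) % MODULUS
    # Map back to a signed value, since noise can push a total below zero.
    return total - MODULUS if total >= MODULUS // 2 else total
```

In this sketch each reporter only ever sums the blinding values addressed to it, so no single reporter sees an individual relay's count; only the combined total is meaningful.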

Discussion: how concerned are we really about an attacker breaking in and reading the current in-memory values? Answer: Fairly. One feature is that if it isn't critical to protect the value in memory, we can persist those values to disk, which is good for robustness.

If somebody breaks the encryption keys, they can strip away the blinding value, but they can't strip away the noise. But note that the noise at each relay is only a tiny piece of the total noise, so each piece of noise on its own isn't very protective.

Discussion: is the differential privacy stuff we added to Tor useful, or will it be thrown away? Answer: we never checked if it actually worked, so moving that to Privcount will be smarter. Floating point is exciting.

Guideline: add noise, and *then* bin. Never bin and then add noise.

Guideline: no sending floating point over the wire.
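A small sketch of both guidelines together, shown in a single process for brevity. It assumes Gaussian noise and a fixed bin width, and the function names are hypothetical. The ordering matters because binning an already-noisy value is only post-processing, whereas binning first lets a single event move the count across a bin edge and shift the output by a whole bin width, which noise sized for single increments would not cover.

```python
import random


def noisy_integer_count(true_count: int, sigma: float,
                        rng: random.Random) -> int:
    """Add noise, converting it to an integer before anything is sent."""
    return true_count + round(rng.gauss(0.0, sigma))


def bin_value(value: int, bin_width: int) -> int:
    """Round to the nearest multiple of bin_width using integer arithmetic."""
    if bin_width <= 0:
        raise ValueError("bin width must be positive")
    return (value + bin_width // 2) // bin_width * bin_width


def report(true_count: int, sigma: float, bin_width: int,
           rng: random.Random) -> int:
    # Noise first, then bin. Never the other way around.
    return bin_value(noisy_integer_count(true_count, sigma, rng), bin_width)
```

Only integers ever leave `noisy_integer_count`, so no floating point value would need to be serialized.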

Teor's open research question: Should the tally reporters bin the aggregated value to protect something-something over time?

What does the metrics team need to *do*? How many of the components will the Privcount team do? Answer: the tally reporters output a number, and metrics pulls in that number and uses it.

Metrics team is going to need to get better at analyzing results from Privcount. It's subtle. So metrics team needs to either get good at understanding differential privacy (more generally, statistical analysis), or add somebody who is good at that.

Metrics team is also going to need to extend CollecTor to remember all of the outputs from the tally reporters. For example, if we have five groups and we want to take the median for our analysis, CollecTor would remember all five numbers, so that we can change the analysis approach later and still have all of the analysis inputs.
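A short sketch of why keeping every group's output is useful. The five totals below are made-up illustrative numbers, with one group skewed as if it contained a misbehaving relay; `analyse` is a hypothetical helper.

```python
from statistics import mean, median
from typing import Mapping

# Illustrative values only: one stored total per relay group.
GROUP_TOTALS = {
    "group-1": 1040,
    "group-2": 1010,
    "group-3": 990,
    "group-4": 1025,
    "group-5": 250000,  # outlier, e.g. a group with malicious input
}


def analyse(group_totals: Mapping[str, int], method: str = "median") -> float:
    """Reduce the stored per-group totals with the chosen analysis method."""
    values = list(group_totals.values())
    if method == "median":
        return median(values)
    if method == "mean":
        return mean(values)
    raise ValueError("unknown method: " + method)
```

With all five inputs stored, switching from the median (1025 here) to another estimator later needs no new data collection; the mean of the same inputs would be 50813, dominated by the outlier.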

Need to make a plan for relays running different versions, since they will be publishing different things.

Relays need to know how many votes there are going to be, so they know how much noise they should include in their share of the noise.

Outputs

Counter / Partition / Period

  • Total - Sum of Relay Increments and Relay Noises

Counter

  • Noise distribution
  • Config
    • Name
    • Definition
    • Expected Values

Partition / Period

  • Identities of Relays (might be per-counter?)

Period

  • Collection Period
    • Start Time
    • End Time
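One possible way to represent the outputs listed above as a data structure. The field names follow the outline, but every type is an assumption: the notes do not say how expected values or the noise distribution are encoded, and relay identities might end up per counter instead.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Tuple


@dataclass(frozen=True)
class CounterConfig:
    name: str
    definition: str
    expected_values: Tuple[int, int]  # assumption: a (low, high) range
    noise_distribution: str           # assumption: free-form description


@dataclass(frozen=True)
class CollectionPeriod:
    start_time: datetime
    end_time: datetime


@dataclass(frozen=True)
class PartitionTotal:
    counter: CounterConfig
    partition_id: int
    period: CollectionPeriod
    total: int  # sum of relay increments and relay noises
    relay_identities: Tuple[str, ...]  # might be per-counter instead


example = PartitionTotal(
    counter=CounterConfig(
        name="ExampleCounter",  # hypothetical counter name
        definition="illustrative counter, not a real Tor statistic",
        expected_values=(0, 100000),
        noise_distribution="gaussian, sigma=50",
    ),
    partition_id=1,
    period=CollectionPeriod(
        start_time=datetime(2018, 3, 1, tzinfo=timezone.utc),
        end_time=datetime(2018, 3, 2, tzinfo=timezone.utc),
    ),
    total=1025,
    relay_identities=("relayA", "relayB"),
)
```

Keeping one record per counter, partition, and period matches the "Counter / Partition / Period" grouping in the outline.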

Follow-Up Discussion

See this metrics-team thread: https://lists.torproject.org/pipermail/metrics-team/2018-March/000722.html

Last modified on Mar 18, 2018, 1:37:11 PM