Changes between Version 1 and Version 2 of org/meetings/2018Rome/Notes/PrivCountInTor

Mar 12, 2018, 5:19:23 AM

Roger's draft notes


  • org/meetings/2018Rome/Notes/PrivCountInTor

    v1 v2  
    11= PrivCount in Tor: Intro for Metrics =
     3Goal of this session: walk through design of how we want to take this
     4experimental protocol, solve some issues with it, and implement it in Tor.
    36== Overview ==
    58* Tor Relays
     10Right now relays publish their own individual stats.
     12This means adversaries can see what each relay reported.
     14Even if you add noise, you still learn something about that relay's stat.
     16But metrics reports just the total over all the relays anyway, so simply
     17doing a global aggregate accomplishes that goal.
     19Each relay adds a fraction of the noise that is necessary to make the
     20final count safe. This is the only point where noise is added.
    622* Data Collectors
     24Each reporting relay has a data collector associated with it. They do
     25the encryption of the stats.
    727* Tally Reporters
     29Separate program that runs elsewhere from the relays. The Tally Reporters
     30collaborate between themselves to output a total of the aggregate reports
     31without any one of them learning the reported number from a single relay.
    833* Stats Consumers
     35The Tally Reporters output a single answer, and then metrics et al import
     36that document and use it for visualizing.
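The way the Tally Reporters can produce a total without any one of them seeing a single relay's number can be sketched with additive shares. This is only an illustration of the idea, not the PrivCount wire protocol: the modulus and the function names split_into_shares and aggregate are made up for this example, and encryption of the shares is left out.

```python
import secrets

# Hypothetical modulus; all share arithmetic is done mod this value.
MODULUS = 2**64


def split_into_shares(value, n_reporters):
    """Split an integer into n additive shares that sum to value mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_reporters - 1)]
    # The last share is chosen so that all shares together add up to the value.
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def aggregate(relay_values, n_reporters):
    """Each reporter sums the shares it receives; the partial sums are combined."""
    partial_sums = [0] * n_reporters
    for value in relay_values:
        for i, share in enumerate(split_into_shares(value, n_reporters)):
            partial_sums[i] = (partial_sums[i] + share) % MODULUS
    # Only the combination of every reporter's partial sum reveals the total.
    return sum(partial_sums) % MODULUS


total = aggregate([12, 7, 30], n_reporters=3)
print(total)
```

Each individual share is a uniformly random number, so a single Tally Reporter learns nothing about a relay's count from the share it holds.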
     38Discussion topics:
     40* Malicious input from a relay can mess up the output answer,
     41but can't mess up the privacy properties of the protocol. One way to solve
     42the malicious input concern is to split relays up into groups, and then
     43an attacker needs a malicious relay in each group. We do plan to use this
     44multiple-randomized-groups design. We'll need shared randomness for this
     45step, so we will need to coordinate the timing of PrivCount phases to use
     46the SRV that the consensus provides. Needs more design work.
     48* What analysis would be useful for the metrics team to do, now, to
     49help privcount become what it should be?
     51  * how often flags change in a 24 hour period
     52  * whether it's possible to partition a set of relays to achieve certain
     53    properties for each partition
     55* What kind of metrics data would this system be useful for? Not all
     56metrics needs fit into what privcount can do ("add integers, subtract
     57integers, and that's it").
     59* Nowadays there are not-too-terrible libraries that do full MPC. It is
     60not totally ridiculous to imagine using full MPC (to compute any function
     61we want) at some point in the future.
     63(The protocol has versioning in it, so if we want to update the protocol to
     64something smarter later, we can do so.)
     66* So we plan to evaluate all of the statistics we do right now, and move
     67some of those to Privcount? Yes.
     68Actually we should choose among three options: (a) keep as is,
     69(b) move to Privcount, or (c) drop because it's too scary to keep.
     71An example of current stats that we will want to either move or
     72drop: exit statistics, split up by port.
     74A stat Nick wants: learn how many protocolwarnings lines (of each type)
     75happen at each relay, to get a global picture of trends in protocol
    1078== Data Flow ==
    1381* Initialisation
    1482  * Blinding
    15   * Encryption
    1683  * Noise
    17 * Events / Increments
     84  * Events / Increments
     85* Encryption
    1886* Submission
    2290* Aggregation
    2391* Reporting
     93Discussion: how concerned are we really about the worry where an attacker
     94breaks in and gets the in-memory values right now?
     95Answer: Fairly. One feature is that if it isn't critical to protect the
     96value in memory, we can persist those values to disk, which is good
     97for robustness.
     99If somebody breaks the encryption keys, they can strip away the blinding
     100value, but they can't strip away the noise. But note that the noise at
     101each relay is only a tiny piece of the total noise, so each piece of
     102noise on its own isn't that protective.
     104Discussion: is the differential privacy stuff we added to Tor useful, or
     105will it be thrown away? Answer: we never checked if it actually worked,
     106so moving that to Privcount will be smarter. Floating point is exciting.
     108Guideline: add noise, and *then* bin. Never bin and then add noise.
     110Guideline: no sending floating point over the wire.
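A minimal sketch of the two guidelines above, assuming Gaussian noise and a hypothetical helper called noisy_binned_count: the noise is added to the raw count first, the result is rounded to a bin afterwards, and the value that leaves the function is a plain integer. Binning after the noise is just post-processing of an already-noisy value, so it does not weaken the privacy guarantee. The sampler here uses ordinary floats internally, which a real implementation would need to look at carefully.

```python
import random


def noisy_binned_count(true_count, sigma, bin_size, rng=random):
    """Add noise first, then bin; return an integer suitable for sending."""
    # Step 1: add noise to the raw, unbinned count.
    noisy = true_count + rng.gauss(0.0, sigma)
    # Step 2: only now round to the nearest multiple of bin_size.
    # The result is an int, so no floating point goes over the wire.
    return int(round(noisy / bin_size)) * bin_size


# Example with made-up parameters.
print(noisy_binned_count(1234, sigma=50.0, bin_size=100))
```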
     112Teor's open research question: Should the tally reporters bin the
     113aggregated value to protect something-something over time?
     115What does the metrics team need to *do*? How many of the components
     116will the Privcount team do? Answer: the tally reporters output a number,
     117and metrics pulls in that number and uses it.
     119Metrics team is going to need to get better at analyzing results from
     120Privcount. It's subtle. So metrics team needs to either get good at
     121understanding differential privacy (more generally, statistical analysis),
     122or add somebody who is good at that.
     124Metrics team is also going to need to extend CollecTor to remember all
     125of the outputs from the tally reporters. So for example, if we have
     126five groups and we want to take the median for our analysis, CollecTor
     127would remember all five numbers, and that way we can change the analysis
     128approach later and CollecTor still has all of the analysis inputs.
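To illustrate why keeping every group's output matters, here is a small sketch with made-up numbers. The list group_totals and the function median_of_groups are hypothetical; the point is only that the median ignores one wildly wrong group, and that a different analysis can be run later because the five inputs are still stored.

```python
import statistics


def median_of_groups(group_totals):
    """Return the median of the per-group totals."""
    return statistics.median(group_totals)


# Made-up outputs, one per group; the fourth group contains a misbehaving relay.
group_totals = [1040, 980, 1010, 250000, 995]

print(median_of_groups(group_totals))
# A later analysis can use the same stored inputs in a different way.
print(sorted(group_totals)[1:-1])
```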
     130Need to make a plan for: different relays are running different versions
     131so they'll be publishing different things.
     133Relays need to know how many votes there are going to be, so they know
     134how much noise they should include in their share of the noise.
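A sketch of how a relay could size its share of the noise, assuming independent Gaussian noise and assuming "votes" here means the number of contributions that will be summed. Variances of independent noise add, so if the total should have standard deviation total_sigma and n contributions are expected, each one uses total_sigma / sqrt(n). The function names are made up for this example.

```python
import math
import random


def per_relay_sigma(total_sigma, n_contributions):
    """Standard deviation each contribution needs so the sum reaches total_sigma."""
    return total_sigma / math.sqrt(n_contributions)


def relay_noise(total_sigma, n_contributions, rng=random):
    """Sample one relay's share of the noise."""
    return rng.gauss(0.0, per_relay_sigma(total_sigma, n_contributions))


# With 25 contributions and a target of 100, each relay uses sigma 20,
# because 25 * 20**2 == 100**2.
print(per_relay_sigma(100.0, 25))
```

If fewer contributions arrive than a relay expected, the total noise ends up below the target, which is why the relays need to know this number in advance.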
    25136== Outputs ==