PrivCount Research Retrospective

Transcript of an IRC chat on #tor-dev during the 3/4 July 2018 patch party. Posted with permission, edited to remove other users.

<+komlo> i don't have patches to discuss, but i would like to hear opinions about good research ideas that didn't make it into tor, and why
<+komlo> ah ok, just that the idea lacked complete analysis and further analysis revealed flaws
<+komlo> ok, understood.
<+komlo> are there common trends in the analysis that isn't done?
<+komlo> ah right. making assumptions explicit is important
<+komlo> and sufficient information to reproduce the results
<+catalyst> it seems that often reviewers of papers lack the context to know if the assumptions were flawed
<teor4> komlo: I can tell you the privcount story if you want
<+komlo> teor4: that would be great!
<+komlo> just thinking these over and what concrete "do these things" recommendations would be helpful
<teor4> Ok, so researchers looked at Tor's statistics, and said: "wow, that's a lot of detailed information that adversaries are getting for free"
<+teor> And they thought that tor could do better, by adopting a secure stats aggregation scheme, with noise
<+teor> (secure stats aggregation reveals the network total, rather than each relay total)
<+komlo> teor: yep, following
<+teor> (noise hides a client's usage on the network)
<+teor> So they wrote a system called PrivEx, that had a single statistics counter, and their paper was done
<+komlo> teor: what do you mean by single statistics counter? 
<+teor> each Tor relay's public statistics are a simple total of what that relay has seen, sometimes categorised into countries. The non-public logs show about the same level of detail.
<+teor> komlo: just looking up the paper
<+komlo> teor: no worries, i can look at the paper later. 
<+komlo> teor: i'm curious what the delta was between that paper and privcount :) 
<+teor> komlo: ha! we're not even at PrivCount yet
<+teor> So the single counter was the number of hits on a large list of censored exit domains
<+teor> And then that paper was done, and it was called PrivEx
<+teor> (And those researchers did great work, because they actually deployed on the live tor network, rather than providing a theoretical scheme)
<+komlo> ah yes that seems useful 
<+teor> And then another group of researchers came along, and they said: "wait, Tor has many statistics, and we want to protect users even when an adversary knows them all"
<+teor> So they generalised PrivEx to multiple counters, with a combined noise budget that protected all of a user's actions on the network, rather than a particular action from a particular user
<+komlo> teor: that seems like a good next step to build on 
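[Editor's note: a combined noise budget follows from the sequential-composition property of differential privacy: the privacy cost of each counter adds up, so one overall budget must be split across every statistic. This toy sketch shows the idea only; the counter names and weights are invented, and PrivCount's actual allocation is more involved.]

```python
def split_budget(total_epsilon, counter_weights):
    """Divide one overall privacy budget across counters, proportionally to weight.
    Under sequential composition, the per-counter epsilons sum to the total spent."""
    total_w = sum(counter_weights.values())
    return {name: total_epsilon * w / total_w
            for name, w in counter_weights.items()}

# Hypothetical counters: the budget is shared, not granted per-statistic.
per_counter = split_budget(1.0, {"exit_bytes": 2.0,
                                 "entry_clients": 1.0,
                                 "circuits": 1.0})
assert abs(sum(per_counter.values()) - 1.0) < 1e-9
```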
<+teor> And they made some improvements to the code, and deployed on the live network, and called it PrivCount, and their paper was done
<+komlo> i'm waiting for the catch :) 
<+teor> (correction: deployed a short-term experimental collection on the live network)
<+teor> And then several groups of researchers thought that PrivCount was great, and so they modified this experimental PrivCount to collect more statistics, and wrote papers
<+teor> Some of them used machine learning
<+komlo> "We deployed PrivCount on the live Tor network with 1 tally server, 6 share keepers, and 7 data collectors monitoring one Tor relay each."
<+komlo> ^ from the paper 
<+teor> Others used traffic flow analysis (and their paper is in draft)
<+teor> And others focused on collecting a large number of statistics (me et al), and their paper is in draft
<+teor> We also expanded the deployment to 16-17 relays, and 3 operators
<+teor> And did some early work on unique counts (PrivEx/PrivCount only support additive counts)
<+teor> And then the grant was done, and the papers were done-ish
<+teor> And at the same time, those researchers wanted to get their experimental PrivCount implementation deployed on the Tor network
<+komlo> and PrivCount was ready to be dropped directly into tor? :) 
<+teor> Oh, I wish!
<+teor> Just skimming prop#280 to refresh my memory
<+teor> But the design wasn't robust enough to scale to 1000s of relays, and their Tor patch required the control port to be open, which is a security issue, and it was slow
<+teor> (And none of us asked these questions in 2016, so we were a bit surprised in 2017 to get that feedback)
<+teor> And so nickm and I redesigned PrivCount for Tor, and wrote prop#280
<+teor> But prop#280 didn't solve all our open issues, because it didn't specify noise, and if any Tally Reporter (aggregation node) went down, there were no stats that day
<+teor> So we wrote prop#288
<+teor> where we specify secret sharing for redundancy, and how to generate noise, because it's easy to generate insecure noise
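[Editor's note: one way noise generation goes wrong is using a fast, predictable PRNG (like Python's Mersenne Twister) whose output an adversary could reconstruct. A minimal fix is to drive the sampler from a CSPRNG, as sketched below. This is only an illustration of that single pitfall, not the sampling procedure prop#288 specifies, and floating-point sampling has further known subtleties this sketch does not address.]

```python
import math
import secrets

def secure_uniform():
    """Uniform in (0, 1) drawn from a CSPRNG instead of the default PRNG."""
    # 53 random bits fill a double's mantissa; the +1 offsets avoid exactly 0.
    return (secrets.randbits(53) + 1) / (2**53 + 1)

def secure_gaussian(sigma):
    """Gaussian noise via the Box-Muller transform, seeded from secure uniforms."""
    u1, u2 = secure_uniform(), secure_uniform()
    return sigma * math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2)

# Example: one noise draw for a counter with standard deviation 100.
noise = secure_gaussian(sigma=100.0)
```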
<+teor> And here's what we thought we had left to do:
<+komlo> right, i wondered how often scalability is a major issue when evaluating research papers for potential implementation 
<+teor> So prop#288 defines the low-level API
<+teor> But we still need to specify how to allocate noise between counters, between relay partitions (to avoid outliers), between relays, and between consensuses
<+teor> And we need to work out how to determine the amount of client usage we want to protect for each statistic, in a repeatable way
<+teor> And we need to work out how to add and remove statistics, once some are already deployed
<+teor> And how we communicate the configuration parameters to all the relays
<+teor> So there's still a lot of work left
<+teor> If everyone is ok, I'll take a copy of the last few minutes, tidy it up, and put it on trac wiki
<+komlo> teor: ok by me, thanks for asking 
<+teor> So there's still a lot of work left to get PrivCount in Tor, and it's not necessarily work that interests researchers
<+catalyst> teor: ok with me, though i didn't say much once you started :)
<+komlo> teor: sure- would you say this is something that should have been discussed in the research paper or addressed in the implementation? 
<+teor> PrivCount specifies how to allocate noise between counters, and between relays using an operator model. But that might not be the best compromise model for Tor.
<+teor> (Maybe consensus weight or number of relays is better.)
<+teor> PrivCount also tries to justify the amount of client usage that's being protected, but it's not repeatable. We should probably measure actual client usage. (Which is circular!)
<+teor> The PrivCount configuration model isn't robust enough for a large number of relays
<+komlo> ^ that is a good takeaway 
<+teor> And researchers tend to shut down and redeploy new versions, rather than investing in mixed version deployment compatibility
<+komlo> ^ another good takeaway
<+komlo> agreed, mixed versioning is always tricky :) 
<+komlo> ok this is all really helpful, quite a lot to think over 
<+teor> With PrivCount it's also tricky because the total noise across all statistics on all reporting relays determines user privacy
<+teor> And there's probably a few other things I've forgotten
<+komlo> right, trying to think about what a takeaway would be for how the deployment across the network matters for a single relay 
<+teor> Oh, with N relay partitions, we can resist N malicious or broken relays, but we probably need better resistance than that
<+komlo> i guess there wouldn't be a good incremental deployment option there? 
<+teor> I think the best option is to say that the noise budget is X, and each relay adds X*1/R to the noise, where R is the number of relays that support PrivCount in Tor
<+teor> (Technically it's X*sqrt(1/R), because noise standard deviation isn't additive.)
<+teor> With some minimum number of relays you need to get a report.
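[Editor's note: the allocation teor describes can be written down directly. Because independent noise variances (sigma squared) add, not standard deviations, each of R relays contributes sigma X/sqrt(R) so the aggregate noise comes out to X. The `min_relays` threshold is a hypothetical illustration of the minimum-report idea.]

```python
import math

def per_relay_sigma(total_sigma, n_relays, min_relays=5):
    """Noise standard deviation each relay should add so that the sum of
    R independent contributions has standard deviation `total_sigma`.
    Variances add: R * (total_sigma/sqrt(R))**2 == total_sigma**2."""
    if n_relays < min_relays:
        raise ValueError("not enough reporting relays for a safe report")
    return total_sigma / math.sqrt(n_relays)

# With 16 relays and a budget of 1000, each relay adds sigma 250,
# and the aggregate works out to the full budget.
sigma_r = per_relay_sigma(total_sigma=1000.0, n_relays=16)
assert math.isclose(math.sqrt(16 * sigma_r**2), 1000.0)
```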
<+komlo> incremental deployment adds risk (which makes deploying it hard)? (to generalize)
<+teor> Yes, it means you have to think about minimum thresholds, and scaling
<+komlo> yeah that is tricky 
<+komlo> ok, this has been extremely helpful :) i'm going to go write notes and think about how to synthesize this, i'll send out notes after
<+teor> Oh, also, experimental PrivCount fails if any node fails. But PrivCount in Tor needs to work even if a significant fraction of the relays fail.
<+komlo> ^ omg
<+komlo> yes definitely! failure cases are very important!
<+teor> Researchers can usually repeat an experiment that fails. We can't really do that on the live tor network.
<+komlo> not exactly :)
<+teor> People expect Tor stats most days
<+komlo> signing off for the evening now- thanks for the great ideas/feedback, feel free to add more here, i'll read scrollback in the morning 
<+teor> No worries, I'll put notes on the wiki
<+teor> Also, part of scaling will be running a simulation of splitting the noise from 6000 relays, and working out if the integer truncation makes our noise too low
<+teor> Or we could just use ceil() to be safe.
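[Editor's note: the simulation teor mentions is easy to sketch. With the noise budget split over 6000 relays, each relay's share of the noise is well under 1, so truncating to an integer discards most of it, while rounding the magnitude up errs on the side of too much noise. This is a toy Monte Carlo check, not the planned simulation; all parameters are illustrative.]

```python
import math
import random

def simulate(total_sigma, n_relays, rounding, trials=300):
    """Estimate the aggregate noise standard deviation when each of
    `n_relays` relays draws Gaussian noise and rounds it to an integer."""
    sigma_r = total_sigma / math.sqrt(n_relays)
    totals = []
    for _ in range(trials):
        totals.append(sum(rounding(random.gauss(0.0, sigma_r))
                          for _ in range(n_relays)))
    mean = sum(totals) / trials
    return math.sqrt(sum((t - mean) ** 2 for t in totals) / trials)

# int() truncates toward zero; copysign(ceil(abs(x)), x) rounds magnitude up.
trunc_std = simulate(50.0, 6000, rounding=int)
ceil_std = simulate(50.0, 6000,
                    rounding=lambda x: math.copysign(math.ceil(abs(x)), x))
# Truncation undershoots the target sigma (too little noise, unsafe);
# ceiling overshoots it (more noise than budgeted, but safe).
```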
<+teor> Anyway, lots of work to do!
Last modified on Jul 4, 2018, 1:16:58 AM