Website Fingerprinting
Motivation
Website fingerprinting is a passive deanonymization technique that can be performed by a malicious guard or by any adversary positioned between the client and their guard. The adversary uses machine learning to train a classifier that, given an observed sequence of Tor cells coming from a client, can often identify the page being loaded with high accuracy. The problem is exacerbated for clients fetching onion service websites because of a related attack known as [fingerprinting], which can reveal that an onion service is being visited at all, narrowing the set of sites the client may be loading to one many orders of magnitude smaller than the total space of websites.
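To make the attack model concrete, the following is a minimal, hypothetical sketch of such a pipeline: it reduces a cell-direction trace to a few crude features and trains an off-the-shelf classifier. The traces, labels, and feature choices are illustrative assumptions, not those of any particular published attack.

```python
# Minimal sketch of a website fingerprinting classifier (not any specific
# published attack). A trace is modeled as a list of cell directions:
# +1 = outgoing (client -> guard), -1 = incoming (guard -> client).
# The traces, labels, and feature choices here are illustrative assumptions.

from sklearn.ensemble import RandomForestClassifier


def extract_features(trace):
    """Turn a cell-direction sequence into a small fixed-length feature vector."""
    outgoing = sum(1 for d in trace if d > 0)
    incoming = sum(1 for d in trace if d < 0)
    total = len(trace)
    # Positions of the first few outgoing cells serve as a crude ordering feature.
    out_positions = [i for i, d in enumerate(trace) if d > 0][:20]
    out_positions += [0] * (20 - len(out_positions))  # pad to fixed length
    return [total, outgoing, incoming, outgoing / max(total, 1)] + out_positions


# Hypothetical training data: (trace, site_label) pairs collected by the
# attacker loading known pages over Tor and recording cell directions.
training = [
    ([+1, -1, -1, -1, +1, -1] * 50, "site-a"),
    ([+1, -1, +1, -1, -1, -1] * 80, "site-b"),
]

X = [extract_features(trace) for trace, _ in training]
y = [label for _, label in training]

clf = RandomForestClassifier(n_estimators=100).fit(X, y)

# At attack time the adversary observes an unlabeled trace from the client
# and asks the classifier which monitored page it most resembles.
observed = [+1, -1, -1, -1, +1, -1] * 50
print(clf.predict([extract_features(observed)]))
```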
Discussion Forums
Points of In-Person Session Discussion
- The effects of HTTP/2 on website fingerprinting have not yet been studied. We believe it may make the problem somewhat harder for attackers by hiding the sizes of individual subresources, which are pushed as a bundle in response to the initial HTTP GET request instead of being sent as the client makes individual requests for them over separate HTTP connections. A toy illustration of this intuition appears after this list.
- http://cacr.uwaterloo.ca/techreports/2016/cacr2016-05.pdf: a decoy routing system that was discussed in the context of a possible server-side defense.
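The HTTP/2 point above can be illustrated with a toy model of what a passive observer sees: with sequential per-resource fetches each response arrives as its own downstream burst, leaking the size of each subresource, whereas with server push the responses arrive as one merged burst. The page composition and sizes below are made-up illustrative numbers.

```python
# Toy model of the HTTP/2 intuition above: what does a passive observer see
# in terms of incoming "bursts" (contiguous runs of downstream traffic)?
# Resource sizes (in cells) are made-up illustrative numbers.

PAGE = {"index.html": 12, "style.css": 7, "app.js": 25, "logo.png": 40}


def sequential_fetch(page):
    """HTTP/1.x-style: each subresource is requested and answered separately,
    so each response is observable as its own burst."""
    return [size for size in page.values()]


def pushed_bundle(page):
    """HTTP/2 push-style: the server pushes all subresources after the initial
    GET, so the observer sees one merged burst rather than individual sizes."""
    return [sum(page.values())]


print("sequential bursts:", sequential_fetch(PAGE))  # leaks per-resource sizes
print("pushed bundle:    ", pushed_bundle(PAGE))     # only the total is visible
```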
Ongoing and Completed Projects
- [Padding Negotiation (Prop 254)]: "This proposal aims to describe mechanisms for requesting various types of padding from relays. These padding primitives are general enough to use to defend against both website traffic fingerprinting as well as hidden service circuit setup fingerprinting."
- https://arxiv.org/pdf/1512.00524v3.pdf: a protocol-level defense implemented as a pluggable transport. It does not incur a latency penalty, and its bandwidth-to-feature-coverage ratio is relatively good.
- http://3tmaadslguc72xc2.onion: An onion service implementing an experimental server-side website fingerprinting defense. You can find the source code and the paper on the defense on the site itself.
- [HOT research project]: "investigating pluggable transports for Tor to resist website fingerprinting attacks and censorship."
- https://github.com/mjuarezm/wfpadtools: "a framework implemented as an Obfsproxy Pluggable Transport to develop link-padding-based website fingerprinting strategies in Tor... implements a framing layer for the Tor protocol that allows to add cover traffic and provides a set of primitives that can be used to implement more specific anti-website fingerprinting strategies." A hedged sketch of this kind of link-padding primitive appears after this list.
- [Effect of DNS on Tor’s Anonymity]: "We study (i) how exposed the DNS protocol is compared to web traffic, (ii) how Tor exit relays are configured to use DNS, (iii) how existing website fingerprinting attacks can be enhanced with DNS, and (iv) how effective these enhanced website fingerprinting attacks are at Internet-scale."
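As a rough illustration of the kind of link-padding primitive several of the projects above build on (a hedged sketch, not the actual logic of wfpadtools, Prop 254, or the pluggable transport paper), the snippet below inserts dummy cells into a cell trace whenever the gap between consecutive real cells exceeds a threshold, trading bandwidth overhead for a more uniform timing pattern without delaying real cells.

```python
# Hedged sketch of a link-padding primitive in the spirit of the projects
# above (not their actual logic). A trace is a list of (timestamp_seconds,
# direction) pairs; dummy cells (direction 0) are inserted whenever the gap
# between consecutive real cells exceeds a threshold, so the observable
# inter-arrival pattern carries less information about the page.

def pad_trace(trace, max_gap=0.05):
    """Insert dummy cells so that no inter-cell gap exceeds max_gap seconds."""
    if not trace:
        return []
    padded = [trace[0]]
    for t, direction in trace[1:]:
        prev_t = padded[-1][0]
        # Fill the silent interval with evenly spaced dummy cells.
        while t - prev_t > max_gap:
            prev_t += max_gap
            padded.append((prev_t, 0))
        padded.append((t, direction))
    return padded


# Illustrative trace: a burst of real cells, a long silence, then more cells.
real = [(0.00, +1), (0.01, -1), (0.02, -1), (0.50, -1), (0.51, -1)]
padded = pad_trace(real)
overhead = (len(padded) - len(real)) / len(real)
print(f"{len(real)} real cells -> {len(padded)} cells on the wire "
      f"({overhead:.0%} bandwidth overhead, no added latency for real cells)")
```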
Research Tools
- [SecureDrop]: A complete website fingerprinting data collection and analysis pipeline. While currently focused on SecureDrop, it is the developers' hope that the project can later be generalized to accommodate a wide range of website fingerprinting research projects.
- [Browser Crawler]: A library for crawling websites with Tor Browser.
- [Browser Selenium]: Tor Browser automation with Selenium.
- https://github.com/pylls/go-knn: a Go implementation of the specialized [classifier] and a feature extractor that uses the popular feature set also defined in Wang's thesis. A minimal k-NN sketch follows below.
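For readers unfamiliar with the shape of such an attack, here is a minimal, self-contained k-nearest-neighbour sketch: plain nearest-neighbour voting over a toy feature vector. It is not Wang's weight-learning algorithm or his feature set; the traces and features below are illustrative assumptions.

```python
# Minimal k-nearest-neighbour sketch to show the shape of such an attack.
# This is NOT Wang's weight-learning k-NN or his feature set; the features
# and data below are illustrative assumptions.

import math
from collections import Counter


def features(trace):
    """Tiny toy feature vector from a cell-direction trace (+1 out, -1 in)."""
    out = sum(1 for d in trace if d > 0)
    inc = len(trace) - out
    return (len(trace), out, inc)


def knn_predict(train, query, k=3):
    """Majority vote among the k training traces closest in feature space."""
    q = features(query)
    dists = sorted((math.dist(features(t), q), label) for t, label in train)
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


# Hypothetical labeled traces gathered by crawling monitored pages over Tor.
train = [
    ([+1] + [-1] * 120, "site-a"),
    ([+1] + [-1] * 118, "site-a"),
    ([+1, -1] * 40, "site-b"),
    ([+1, -1] * 42, "site-b"),
]

print(knn_predict(train, [+1] + [-1] * 119))  # expected: "site-a"
```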