It was briefly discussed on #tor-dev that some sort of "censorship timeline" for Tor would be helpful. In particular, it should provide:
Detailed technical analyses of the censorship mechanisms in place (DPI fingerprints and manufacturers, traceroutes, ...)
Code and data to reproduce all experiments
Tor patches and standalone tools to evade the censorship devices
Ultimately, this timeline should serve as a comprehensive archive for anyone interested in how Tor is being blocked. It should make it easy to answer questions such as "What happened to Tor in country X back in Y?".
There are also some open questions:
How should the data be structured? In the form of a timeline? By country? Something else?
What data should be published and when? Full disclosure too early in the process helps the censors.
How should it be presented? In a wiki page or a standalone web site?
Packet captures can be sensitive and we probably don't want to publish them online for everyone to see. Maybe we should put them in a private git.tpo repo for now?
> Packet captures can be sensitive and we probably don't want to publish them online for everyone to see. Maybe we should put them in a private git.tpo repo for now?
It depends on what the packet captures contain. If they are captures of what a censorship event looks like, they should be fine as long as you strip the source IP.
> Packet captures can be sensitive and we probably don't want to publish them online for everyone to see. Maybe we should put them in a private git.tpo repo for now?

> It depends what the packet captures contain. If they are the packet captures of what a censorship event looks like as long as you strip the src IP they should be fine.
I'd say that the source IP address is pretty useful to have. I don't know if there is a way to sanitize client and bridge pcap files without removing data that is useful to the person analyzing the files.
If it becomes impractical to keep it in a wiki, we can then move it to a standalone website.
I think having both timeline and per-country indexes would be of great use; I don't see why one should exclude the other. The entries will end up being event-specific anyway, so there is no reason to pick one over the other.
> Packet captures can be sensitive and we probably don't want to publish them online for everyone to see. Maybe we should put them in a private git.tpo repo for now?

> It depends what the packet captures contain. If they are the packet captures of what a censorship event looks like as long as you strip the src IP they should be fine.

> I'd say that the source IP address is pretty useful to have. I don't know if there is a way to sanitize client and bridge pcap files without removing data that is useful to the person analyzing the files.
Just put the ASN in place of the source IP. I don't think that makes the data any less useful.
> Just put the ASN in place of the source ip. I don't think that makes the data at all less useful.
Be very careful when thinking you've anonymized data. For example, if you take out the IP address, but you leave in a checksum of the previous thing that included the IP address, it is not hard to recompute the IP address.
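To make the checksum pitfall concrete, here is a small self-contained Python sketch (the header values are made up for illustration, not taken from any real capture). If the original IPv4 header checksum is left in place, the one's-complement sum that the redacted source-address words must satisfy can be computed directly from the remaining fields:

```python
def ones_complement_sum(words):
    """Fold a list of 16-bit words with one's-complement addition."""
    s = sum(words)
    while s >> 16:
        s = (s & 0xFFFF) + (s >> 16)
    return s

def ipv4_checksum(words):
    """Header checksum, assuming the checksum word itself is zero."""
    return ~ones_complement_sum(words) & 0xFFFF

# A 20-byte IPv4 header as ten 16-bit words: checksum is word 5,
# the source address is words 6-7, the destination words 8-9.
header = [0x4500, 0x003C, 0x1C46, 0x4000, 0x4006,
          0x0000,          # checksum placeholder
          0x0A00, 0x0001,  # source 10.0.0.1
          0x0808, 0x0808]  # destination 8.8.8.8
header[5] = ipv4_checksum(header)

# "Redact" the source address but keep the original checksum.
known = header[:6] + header[8:]

# For a valid checksum, all ten words sum (one's complement) to 0xFFFF,
# so the two redacted words must account for exactly the difference:
target = 0xFFFF - ones_complement_sum(known)
assert ones_complement_sum([header[6], header[7]]) == target
```

Only about 2^16 source-word pairs produce that sum, and narrowing the candidates to a known country or AS prefix usually leaves a handful. So redaction must also zero or recompute the IP checksum, and the TCP checksum too, since it covers the addresses via the pseudo-header.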
Some good suggestions WRT sanitizing the pcap logs appeared on IRC:
< rransom> Runa, hellais: Keep in mind that country + IP header checksum is probably sufficient to recover redacted packet IP addresses.
< radii> hellais: then, it's important that in anonymized.pcap, all the frames for 192.168.1.100 map to a random key, say 3.4.5.6; while the frames for 192.168.1.101 map to a different random key, 8.7.6.5
< radii> if you just rand() for every packet, you lose way too much information and can't reconstruct TCP streams anymore (among many other problems)
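radii's per-host mapping can be sketched in a few lines of Python. The keyed-hash construction, the 10.0.0.0/8 output range, and the function name here are my own illustration, not an existing tool:

```python
import hmac
import hashlib
import secrets

# Per-dataset secret; discard it after sanitizing so the
# real-address-to-pseudonym mapping cannot be reversed later.
KEY = secrets.token_bytes(32)

def pseudonymize(ip):
    """Map an IPv4 address string to a stable pseudonym in 10.0.0.0/8."""
    digest = hmac.new(KEY, ip.encode(), hashlib.sha256).digest()
    return "10.{}.{}.{}".format(digest[0], digest[1], digest[2])

# The same host always maps to the same pseudonym, so TCP streams can
# still be reconstructed; distinct hosts stay distinguishable (barring
# the small collision chance of a 24-bit output).
assert pseudonymize("192.168.1.100") == pseudonymize("192.168.1.100")
assert pseudonymize("192.168.1.100") != pseudonymize("192.168.1.101")
```

After rewriting the addresses, the IP and TCP checksums have to be recomputed; otherwise the original addresses can leak back out, as rransom points out above.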
> Just put the ASN in place of the source ip. I don't think that makes the data at all less useful.

> Be very careful when thinking you've anonymized data. For example, if you take out the IP address, but you leave in a checksum of the previous thing that included the IP address, it is not hard to recompute the IP address.
Depending on how sensitive the data is, even port numbers can be a problem since we have to assume that data might be captured and stored by the censor for later analysis. Anonymizing traffic traces is a hard problem and in most cases it might be better to just provide the tools to quickly reproduce traffic traces.
We should also probably consider moving to a database design in the future, so that people can search by-country, or by-year, or by-DPI-box-manufacturer. But I guess that with the current amount of data, the wiki is a fine start.
BTW, I think failsafe pcap sanitization is pretty much a lost cause, unless someone audits all packets by hand to make sure that no application-layer leaks exist (assuming we have plugged all the network/transport-layer leaks). I agree with 'phw' that providing the tools to quickly reproduce traffic traces is a good idea.
I gave it a little bit more structure and data. However, just one wiki page might not be the best way to organize all the data since it becomes confusing rather quickly.
One possibility would be to use this timeline software for visualization and link to single trac pages which then cover all the censorship incidents in detail.
> I gave it a little bit more structure and data. However, just one wiki page might not be the best way to organize all the data since it becomes confusing rather quickly.
You can use as many wiki pages as you want. I restructured the data to be on a country-by-country basis. If we end up having too much information per country, we can create sub-pages for those countries.
> One possibility would be to use this timeline software for visualization and link to single trac pages which then cover all the censorship incidents in detail.
I think we can achieve something similar with just a master trac page that has this information. If we want to do it the right way we may want to find a good trac plugin that does it, but I would try not to depend too much on external infrastructure.
I think that we should not bother to anonymize the data - only post data where it's safe to share the entire payload of a pcap. That way, we don't have to deal with secret repositories or any weird bullshit.
Throwing in this blurb relevant for SponsorZ-stuff: As of right now, we have logs and network captures from six or seven different blocking events. What I would like to do is to analyze the data we have and see if there are any similarities between them, heuristics on spoofed packets, number of TCP resets, and so on. This could help answer questions such as "Is Ethiopia using the same type of device as the Philippines?", "Does Kazakhstan have a filter similar to the one used in the UAE?", and will hopefully make future packet analysis projects a bit easier.
We should not store pcaps for any longer than necessary to determine how Tor is being blocked in a given country. Our systems will be cracked at some point, and we will lose control of the pcap files.
Another thing I would like to see in the censorshipwiki is the changes that Tor has made to its source code to dodge censorship, along with the Tor versions in which those changes were introduced.
This is interesting both from a history perspective and for understanding how a specific Tor version can be blocked.
> Another thing I would like to see in the censorshipwiki are the changes that Tor has done to its source code to dodge censorship. Also, the tor versions where the changes were introduced.

> This is interesting both from a history perspective and for understanding how a specific Tor version can be blocked.
That's a good idea. I added the page "Changes in Tor" to the Censorship Wiki and started by covering the cipher list change introduced in version 0.2.3.17-beta.