Methodology
We intend to apply the scientific method to network surveillance and filter detection. To ensure reproducibility, all experiments will be properly documented and all collected data will be made publicly available in a timely manner. In line with standard full-disclosure practice, independent parties should be able to reproduce the same observations.
We base our tests on the concepts of experiment and control groups. The experiment is the portion of the test that runs on the network under test, and its output is stored as the experiment result. The control is the result that would be expected on an unbiased, uncensored and otherwise untampered network; this control data is compared with the experiment result. A mismatch between experiment and control indicates some unusual network activity. The control data may be static or dynamic: for example, some DNS records are predictable, while many webpages vary by geographic location.
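A minimal sketch of the static-control case follows, assuming the experiment and control results are the sets of IPv4 addresses returned for the same hostname by a resolver on the test network and by one on an untampered network. The names `Comparison` and `compare_dns_answers` are hypothetical and do not come from OONI's code. Exact set equality only makes sense when the control is static; dynamic content such as geographically varying webpages needs a looser comparison.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Comparison:
    """Outcome of comparing an experiment result with its control."""
    matches: bool
    experiment_only: frozenset  # answers seen only on the test network
    control_only: frozenset     # answers seen only on the control network


def compare_dns_answers(experiment: set, control: set) -> Comparison:
    """Compare DNS A records from the test network (experiment) with those
    from an assumed-untampered vantage point (control).

    Hypothetical helper: exact equality is only a sensible check when the
    control data is static.
    """
    exp, ctl = frozenset(experiment), frozenset(control)
    return Comparison(
        matches=exp == ctl,
        experiment_only=exp - ctl,
        control_only=ctl - exp,
    )


if __name__ == "__main__":
    # Documentation-range addresses stand in for real DNS answers.
    result = compare_dns_answers({"192.0.2.1"}, {"198.51.100.7"})
    print(result.matches)  # False: a mismatch worth investigating
```

Reporting which answers appear on only one side, rather than a bare boolean, keeps the raw evidence available for the later investigation of each mismatch.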
A mismatch between experiment and control data is not always a clear signal of network manipulation, but for many protocols it is a clear indication that some kind of tampering has taken place. We will always favor false positives over false negatives: it is better to record more events that indicate the possible presence of censorship than fewer, because false positives can be investigated further until the researcher understands whether censorship is, in fact, occurring. This investigation may take the form of large-scale analysis across all sample data, or of analysis of only a subset of the data. Collection and analysis should be treated as separate phases, even though some types of analysis are performed during data collection.
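The false-positive bias and the separation between collection and analysis described above can be sketched as follows. This is a hypothetical illustration, not OONI's actual analysis: `Measurement` is an assumed record produced during collection, and the body-length ratio is a swapped-in heuristic for dynamic content, where exact equality would fail. A higher `min_ratio` flags more measurements, trading extra false positives for fewer false negatives.

```python
from dataclasses import dataclass


@dataclass
class Measurement:
    """Raw data stored during the collection phase (hypothetical schema)."""
    target: str
    experiment_body: bytes
    control_body: bytes


def length_ratio(a: bytes, b: bytes) -> float:
    """Ratio of the shorter body length to the longer one, in [0, 1]."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty bodies are treated as identical
    return min(len(a), len(b)) / longest


def flag_for_review(measurements, min_ratio=0.95):
    """Analysis phase: return targets whose experiment and control bodies
    differ in length by more than the tolerance allows.

    A strict (high) min_ratio deliberately flags borderline cases so they
    can be investigated, rather than silently discarded."""
    return [
        m.target
        for m in measurements
        if length_ratio(m.experiment_body, m.control_body) < min_ratio
    ]
```

Because `flag_for_review` only reads stored `Measurement` records, the same collected data can be re-analyzed later with a different threshold or over a different subset.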
There are cases in which the experiment–control methodology cannot be applied; in these cases the researcher is still advised to favor a higher false positive rate.
Every test should include both a high-level description of how it works and an in-depth technical description. In describing the methodology, we will focus on result significance and accuracy, so that non-technical audiences can grasp what a result actually means and how accurate they should consider it.
The methods will also be classified by a quantifiable level of risk. In other words, any person running a test should be able to understand the visibility and type of traffic that will be generated, what information will be collected, stored or sent, and how that information will be stored or sent. Given their own political, economic and legal circumstances, a person should then be able to make a reasonable personal risk assessment. We plan to do this by presenting a full text description of the test methods and all corresponding data, which allows us to be completely transparent with people who wish to support the project by running tests on their networks. By making clear what risks they may incur by running a test, we enable users to make informed, consensual choices. We also intend to educate users about possible testing scenarios and to ensure that they understand the differences between conducting tests under more or less personally identifying circumstances, such as running a test on an open wireless network versus on a cell phone with a SIM card registered to their passport. We believe that previous efforts have not produced such educational material and have left users in the dark.
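One way to make this risk information concrete is a per-test disclosure record that answers each of the questions listed above before the user runs anything. The `RiskDisclosure` class, its fields and the example values are a hypothetical sketch, not an existing OONI data structure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RiskDisclosure:
    """Hypothetical disclosure shown to a user before a test runs."""
    test_name: str
    traffic_type: str                # protocols and destinations contacted
    traffic_visibility: str          # how observable the traffic is on the wire
    collected: tuple = ()            # data recorded while the test runs
    stored: tuple = ()               # data kept after the test finishes
    sent: tuple = ()                 # data uploaded, and to whom

    def summary(self) -> str:
        """Plain-text summary suitable for showing to a non-technical user."""
        def describe(items):
            return ", ".join(items) or "nothing"

        return "\n".join([
            f"Test: {self.test_name}",
            f"Traffic: {self.traffic_type} ({self.traffic_visibility})",
            f"Collected: {describe(self.collected)}",
            f"Stored: {describe(self.stored)}",
            f"Sent: {describe(self.sent)}",
        ])


if __name__ == "__main__":
    disclosure = RiskDisclosure(
        test_name="example-dns-test",
        traffic_type="DNS queries for a list of hostnames",
        traffic_visibility="plaintext, visible to the local network and ISP",
        collected=("DNS answers",),
        stored=("DNS answers", "timestamps"),
    )
    print(disclosure.summary())
```

Keeping every field explicit, including empty ones rendered as "nothing", means the summary never silently omits a category the user needs for their own risk assessment.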
Test Categorization
The tests that are run by OONI can be divided into two macro categories: Traffic Manipulation and Content Blocking.
For Traffic Manipulation tests there is no need to supply a list of assets or targets to be tested for blocking. In the case of Content Blocking tests such inputs are required.
When running a Content Blocking test the inputs go through a preprocessing phase. The aim of this phase is to collect a set of hostnames, URLs and/or keywords that are likely to be censored on the target network.
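The normalization part of the preprocessing phase might look like the sketch below, which sorts raw input lines into hostnames, URLs and keywords and removes duplicates. The function name `preprocess_inputs` and its classification rules (anything containing `://` is a URL, a dotted token without spaces is a hostname, everything else is a keyword) are assumptions for illustration; choosing which targets are likely to be censored on a given network is not modeled here.

```python
from urllib.parse import urlsplit


def preprocess_inputs(lines):
    """Split raw input lines into sorted, de-duplicated hostnames, URLs
    and keywords. The classification rules are hypothetical."""
    hostnames, urls, keywords = set(), set(), set()
    for raw in lines:
        item = raw.strip()
        if not item or item.startswith("#"):
            continue  # skip blank lines and comments
        if "://" in item:
            host = urlsplit(item).hostname  # lowercased by urllib
            if host:
                urls.add(item)
                hostnames.add(host)  # a URL also yields a hostname to test
            continue
        if "." in item and " " not in item:
            hostnames.add(item.lower().rstrip("."))
        else:
            keywords.add(item.lower())
    return sorted(hostnames), sorted(urls), sorted(keywords)
```

Deriving hostnames from URLs as well lets hostname-level tests, such as DNS checks, reuse the same input list as URL-level tests.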
Test Writing HOWTO
When implementing an OONI test, the first step is writing the test specification, which should follow the provided test template.
The test should then be categorized as either a Traffic Manipulation or a Content Blocking test and added to the main Test page.
When writing a new test, ideally a ticket should be created for it, with hellais, ioerror and isis added to CC.
Censorship Taxonomy
We will research various surveillance and censorship methodologies in order to understand their false positive/false negative rates. We will produce high level descriptions of how the tests will work as well as in-depth technical descriptions. In describing the methodology we will focus on explaining the meaning and predicted accuracy of returned results.
The methods will also be classified by the estimated level of risk for a user running these tests on their network. OONI can neither take responsibility for testers nor provide them with any warranty, but will always attempt to inform them as fully as possible about risk factors. Each user or tester must, of course, make their own decisions based on their own local context.
I believe the taxonomy proposed by Leberknight et al. [1] is a very good starting point. The paper proposes these as factors to be taken into account when analyzing a censorship system:
- Cost: both resource and opportunity cost, which directly impacts the availability of censors.
- Scope: the range of communication modes censored.
- Scale: the number of people and devices that can be simultaneously censored.
- Speed: the reaction time of censors.
- Granularity: the resolution at different levels, e.g., server, port, webpage, end user device, etc.
- False negative: the accuracy of censors.
- False positive: too high a false positive rate depletes the censor's resources.
- Circumventability: how easily the censors can be disabled.
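To apply the taxonomy consistently across field observations, each censorship system could be recorded as a structured profile with one entry per factor. The `CensorshipSystemProfile` class and its three-level `Level` scale are hypothetical simplifications chosen for illustration; `unassessed()` lists the factors that have not yet been evaluated for a given system.

```python
from dataclasses import dataclass, fields
from enum import Enum
from typing import Optional


class Level(Enum):
    """Hypothetical coarse rating scale for each taxonomy factor."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass
class CensorshipSystemProfile:
    """One record per observed censorship system, with a slot for each
    factor from the Leberknight et al. taxonomy (None = not yet assessed)."""
    name: str
    cost: Optional[Level] = None
    scope: Optional[Level] = None
    scale: Optional[Level] = None
    speed: Optional[Level] = None
    granularity: Optional[Level] = None
    false_negative: Optional[Level] = None
    false_positive: Optional[Level] = None
    circumventability: Optional[Level] = None

    def unassessed(self) -> list:
        """Names of the taxonomy factors that still lack a rating."""
        return [
            f.name
            for f in fields(self)
            if f.name != "name" and getattr(self, f.name) is None
        ]
```

Tracking unassessed factors explicitly makes it easy to see which parts of the taxonomy have proven hard to measure in practice.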
We plan to expand and revise this taxonomy after actively testing its practicality in the field.
With respect to censorship detection, we have a separate taxonomy that covers the efficacy of a given detection technique, how "invasive" it is, and how risky it may be for the user running it.
[1] Leberknight et al. "A Taxonomy of Internet Censorship and Anti-Censorship". Princeton University, Dept. of Electrical Engineering, 2011. http://www.princeton.edu/~chiangm/anticensorship.pdf