Changes between Initial Version and Version 1 of doc/OONI/TestWritingMethodology

Jul 9, 2012, 7:49:16 PM

Change the title of wiki page as discussed


= Test Writing Methodology =
We intend to apply the scientific method to the realm of network surveillance and filter detection. To ensure
reproducibility, all experiments conducted shall be properly documented, and all data collected shall be made
available to the public in a timely manner. Independent parties should be able to reproduce the same
observations, in line with standard full disclosure practice.

We base our tests on the concepts of experiment and control groups. The experiment is the portion of the test
that is run on the network under test, and the resulting data is stored as the experiment result. The control
is what the result is expected to be on an unbiased, uncensored, and otherwise untampered network, and this
control data is compared with the experiment result. If the experiment and control results do not match, this
is an indication of some unusual network activity. The control data may be dynamic or static: for example, some
DNS records are predictable, while the content of many webpages varies with the geographic location of the client.
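
As an illustration of the experiment/control comparison, the sketch below resolves a hostname on the network under test (the experiment) and compares the answer with a static control set of addresses. It is a minimal sketch, not OONI's implementation: the function names `resolve_ipv4` and `compare` are hypothetical, and it assumes the control addresses are already known (for example from a trusted, uncensored vantage point). The comparison only reports a mismatch; deciding whether a mismatch means censorship is left to analysis.

```python
import socket


def resolve_ipv4(hostname):
    """Experiment (hypothetical helper): resolve `hostname` with the resolver
    configured on the network under test and return its IPv4 addresses."""
    infos = socket.getaddrinfo(hostname, None, socket.AF_INET, socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return {sockaddr[0] for _, _, _, _, sockaddr in infos}


def compare(experiment, control):
    """Compare an experiment result with a control result.

    Any difference is reported as a mismatch; interpreting it is a
    separate, later step."""
    experiment, control = set(experiment), set(control)
    return {
        "experiment": sorted(experiment),
        "control": sorted(control),
        "mismatch": experiment != control,
    }
```

A run would look like `compare(resolve_ipv4("www.example.com"), control_addresses)`. A static control like this only makes sense for predictable records; for geographically varying content the control would have to be fetched live from an uncensored vantage point.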

A mismatch between experiment and control data is not always a clear signal of network manipulation, but for
many protocols it is a clear indication that some kind of tampering has taken place. We will always favor false
positives over false negatives. This means that it is better to have more events that indicate the possible
presence of censorship rather than fewer, because false positives can then be investigated further, allowing the
researcher to determine whether censorship is, in fact, occurring. This investigation may take the form of large
scale data analysis across all sample data or, perhaps, only across a subset of the data. Collection and analysis
should be considered separate phases, even though we do conduct some types of analysis during data collection.
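
To make the bias toward false positives and the separation of phases concrete, here is a hedged sketch of an analysis step that runs only after collection has finished. It assumes, hypothetically, that each stored report is a dict with a `target` and a boolean `mismatch`; the `analyze` function and its `threshold` parameter are illustrative, not part of OONI.

```python
from collections import defaultdict


def analyze(reports, threshold=0.0):
    """Analysis phase (hypothetical), run over already collected reports.

    Returns the per-target mismatch ratio and the sorted list of targets
    whose ratio exceeds `threshold`. With the default of 0.0, a single
    mismatch is enough to flag a target for further investigation,
    accepting more false positives in exchange for fewer false negatives."""
    totals = defaultdict(int)
    mismatches = defaultdict(int)
    for report in reports:
        totals[report["target"]] += 1
        if report["mismatch"]:
            mismatches[report["target"]] += 1
    ratios = {target: mismatches[target] / totals[target] for target in totals}
    flagged = sorted(t for t, ratio in ratios.items() if ratio > threshold)
    return ratios, flagged
```

Raising `threshold` would reduce the number of flagged targets at the cost of missing intermittent or partial blocking. Keeping the default low matches the preference stated above.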

There are instances in which the experiment–control methodology cannot be applied; in these cases the
researcher is still advised to favor a higher false positive rate.

Every test should include a high level description of how it works as well as an in-depth technical
description. In describing the methodology, we will focus on the significance and accuracy of the results, so
that non-technical audiences can grasp what a result actually means and how accurate they should consider it.

The methods will also be classified by a quantifiable level of risk. In other words, any person running the test
should be able to understand the visibility and type of traffic that will be generated, what information will
be collected, stored, or sent, and how that information will be stored or sent. Given the context of their own
political, economic, and legal circumstances, a person should therefore be able to make a reasonable personal
risk assessment. We plan to do this by presenting a full text description of the test methods and all
corresponding data, allowing us to be completely transparent with people who wish to support the project by
running tests on their networks. By making clear what risks they may incur by running a test, we allow users to
make informed, consensual choices. We also intend to educate users about possible testing scenarios and to
ensure that they understand the potential differences between conducting tests under non-identifying and under
personally identifying circumstances, such as the difference between running a test on an open wireless network
and running it on their cell phone with a SIM card registered to their passport. We believe that previous efforts
have not produced such educational material and that users have been left in the dark.
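
One way to attach this information to each test is a small structured record that can be rendered as the full text description shown to the user before the test runs. The sketch below is only an assumption about how that could look: `RiskLevel`, `RiskProfile`, their fields, and the three-level scale are hypothetical and are not defined by OONI.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import List


class RiskLevel(IntEnum):
    """Hypothetical ordinal risk scale; the levels are illustrative only."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass
class RiskProfile:
    """Hypothetical risk disclosure attached to a test."""
    test_name: str
    level: RiskLevel
    traffic: str  # visibility and type of traffic the test generates
    collected: List[str] = field(default_factory=list)
    stored: List[str] = field(default_factory=list)
    sent: List[str] = field(default_factory=list)
    sent_to: str = ""  # destination of sent data, e.g. a report collector

    def describe(self) -> str:
        """Render the plain-text description shown to the user."""
        def items(values: List[str]) -> str:
            return ", ".join(values) if values else "nothing"

        sent_line = "Sent: " + items(self.sent)
        if self.sent and self.sent_to:
            sent_line += f" (to {self.sent_to})"
        return "\n".join([
            f"Test: {self.test_name}",
            f"Risk level: {self.level.name}",
            f"Traffic generated: {self.traffic}",
            "Collected: " + items(self.collected),
            "Stored: " + items(self.stored),
            sent_line,
        ])
```

For example, a hypothetical DNS test could be described with `RiskProfile("dns_consistency", RiskLevel.MEDIUM, "plaintext DNS queries visible to the network operator", collected=["resolved IP addresses"], sent=["test report"], sent_to="report collector").describe()`. An `IntEnum` keeps the levels comparable, so tests can be sorted or filtered by risk.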

== Test Categorization ==

The tests that are run by OONI can be divided into two macro categories: Traffic Manipulation and Content
Blocking.

For Traffic Manipulation tests there is no need to supply a list of assets or targets to be tested for blocking.
In the case of Content Blocking tests, such inputs are required.

When running a Content Blocking test, the inputs go through a preprocessing phase. The aim of this phase is to
collect a set of hostnames, URLs and/or keywords that are likely to be censored on the target network.
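
The normalization part of such a preprocessing phase could look like the sketch below. It sorts raw input lines into hostnames, URLs, and keywords and removes duplicates. The `preprocess` function and its classification rules (a scheme means URL, a dotted token without spaces means hostname, anything else is a keyword) are assumptions made for illustration. Choosing which inputs are likely to be censored is not shown.

```python
from urllib.parse import urlsplit


def preprocess(raw_inputs):
    """Hypothetical preprocessing: split raw input lines into sorted,
    deduplicated lists of (hostnames, urls, keywords)."""
    hostnames, urls, keywords = set(), set(), set()
    for line in raw_inputs:
        item = line.strip()
        if not item or item.startswith("#"):
            continue  # skip blank lines and comments
        if "://" in item:
            parts = urlsplit(item)
            if parts.hostname:  # urlsplit lowercases .hostname
                urls.add(item)
                hostnames.add(parts.hostname)
        elif "." in item and " " not in item:
            # Bare hostname: normalize case and drop a trailing root dot.
            hostnames.add(item.lower().rstrip("."))
        else:
            keywords.add(item.lower())
    return sorted(hostnames), sorted(urls), sorted(keywords)
```

Adding the hostname of every URL to the hostname set lets DNS-level and HTTP-level tests share the same target list.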

== Test Writing HOWTO ==

When implementing an OONI test, the first step is writing the test specification. The specification should
follow [wiki:doc/OONI/Tests/TestTemplate the provided test template].

The test should then be categorized as either a Traffic Manipulation or a Content Blocking test and added to
[wiki:doc/OONI/Tests the main Test page].

When writing a new test, ideally a ticket should be created for it, with hellais, ioerror and isis added to the CC list.

== Censorship Taxonomy ==

We will research various surveillance and censorship methodologies in order to understand their false positive and false negative rates. We will produce high level descriptions of how the tests will work as well as in-depth technical descriptions. In describing the methodology, we will focus on explaining the meaning and predicted accuracy of the returned results.
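
For reference, the two rates can be computed from counts of outcomes, assuming each tested resource has a ground-truth label (whether the censor intends to block it) and an observed outcome (whether it was blocked). The sketch below uses the standard definitions FPR = FP / (FP + TN) and FNR = FN / (FN + TP). The function name `error_rates` is hypothetical.

```python
def error_rates(tp, fp, tn, fn):
    """Return (false positive rate, false negative rate) for a censor.

    tp: targeted and blocked       fp: not targeted but blocked
    tn: not targeted, not blocked  fn: targeted but not blocked
    A rate is reported as 0.0 when its denominator is zero."""
    fpr = fp / (fp + tn) if fp + tn else 0.0  # share of collateral blocking
    fnr = fn / (fn + tp) if fn + tp else 0.0  # share of targets that got through
    return fpr, fnr
```

For instance, 8 targeted resources blocked, 2 targeted resources reachable, 1 untargeted resource blocked, and 9 untargeted resources reachable give a false positive rate of 0.1 and a false negative rate of 0.2.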

The methods will also be classified by the estimated level of risk entailed for a user running these tests on their given network. OONI can neither take responsibility for nor provide any warranty to testers, but will always attempt to inform them as fully as possible about risk factors. Each user or tester must, of course, make their own decisions based on their own local context.

I believe the taxonomy proposed by Leberknight et al. ![1] is a very good starting point. The paper proposes the following factors to be taken into account when analyzing a censorship system:

 * Cost: both resource and opportunity cost, which directly impacts the availability of censors.
 * Scope: the range of communication modes censored.
 * Scale: the number of people and devices that can be simultaneously censored.
 * Speed: the reaction time of censors.
 * Granularity: the resolution at different levels, e.g., server, port, webpage, end user device, etc.
 * False negative: the accuracy of censors.
 * False positive: too high a false positive rate depletes the censor's resources.
 * Circumventability: how easily the censors can be disabled.

We plan to expand upon and revise this taxonomy after actively determining its practicality in the field.

With respect to censorship detection, we have a separate taxonomy covering the efficacy of a given detection technique, how "invasive" it is, and how risky it may be for the user running it.

[1] Leberknight et al., "A Taxonomy of Internet Censorship and Anti-Censorship", Princeton University, Dept. of Electrical Engineering, 2011.