Changes between Initial Version and Version 1 of doc/OONI/DataCollection


Timestamp:
May 17, 2012, 3:25:09 PM
Author:
hellais
Comment:

--

  • doc/OONI/DataCollection

    v1 v1  
     1All the data collected by OONI-probe must be open and accessible. We will make every effort not to redact any data contained in the logs, unless its content could lead to the identification of users or to other privacy leaks.
     2
     3The data format for the reports is YAMLOONI. Every report must contain, as a bare minimum, the following information (items marked optional may be omitted):
     4
     5 * Timestamp of start and end of test
     6 * ASN from which the test originated
     7 * The address from which the test was run (optional)
     8 * A two-way traceroute from and to another OONI-probe node (optional)
     9
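The minimum fields listed above could be checked with a sketch like the following. The key names (`start_time`, `end_time`, `asn`, `addr`, `traceroute`) are taken from the example report on this page; the helper `missing_fields` and the assumption that a report has already been loaded into one flat mapping are hypothetical.

```python
# Field names follow the example report; the flat-dict layout is an assumption.
REQUIRED_FIELDS = ("start_time", "end_time", "asn")
OPTIONAL_FIELDS = ("addr", "traceroute")


def missing_fields(report):
    """Return the required field names that are absent from a report mapping."""
    return [name for name in REQUIRED_FIELDS if name not in report]


# A report without an end_time is incomplete; addr and traceroute may be omitted.
partial = {"start_time": "2012-04-18T18:00:00.00Z", "asn": "ASN-59395"}
complete = dict(partial, end_time="2012-04-18T18:00:03.12Z")
```

Calling `missing_fields(partial)` reports the absent `end_time`, while `missing_fields(complete)` returns an empty list.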
     10The two-way traceroute ([wiki:doc/OONI/Tests/TwoWayTraceroute]) should be done with source and destination ports set to 0, 21, 80, 123 and 443, using UDP, TCP and ICMP. If A is the host from which the test is being performed, A will pick a random OONI-probe node X and perform a traceroute to X. X will then be signaled to traceroute to A; it will run the same traceroute and send the result to A. This is useful for understanding the topology of the network from which the test is being run. This traceroute can potentially leak the fact that the user is running OONI-probe software, so it should be run only after all of the tests that needed to run have completed.
     11
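As a rough illustration of the procedure above, the set of traceroutes and the choice of peer could be enumerated as follows. This is only a sketch: the names `traceroute_plan` and `pick_peer` are hypothetical, and it assumes that each run uses the same port as source and destination (as in the example report, where `src` and `dst` are both 80). How ports apply to ICMP is not specified here, so the sketch simply records the port for bookkeeping.

```python
import random
from itertools import product

PORTS = (0, 21, 80, 123, 443)
PROTOCOLS = ("udp", "tcp", "icmp")


def traceroute_plan(ports=PORTS, protocols=PROTOCOLS):
    """List one traceroute per (port, protocol) pair, with src == dst (assumption)."""
    return [
        {"src": port, "dst": port, "proto": proto}
        for port, proto in product(ports, protocols)
    ]


def pick_peer(nodes, rng=random):
    """Pick the random OONI-probe node X that host A will traceroute to."""
    return rng.choice(list(nodes))


plan = traceroute_plan()
```

Under these assumptions the plan contains 15 entries (5 ports times 3 protocols), and the peer X would run the same plan back towards A.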
     12The timestamp format we use is the one specified in RFC 3339. All times are expressed in UTC.
     13
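A minimal sketch of producing and reading such timestamps with the Python standard library is shown below. The helper names are hypothetical, and the two fractional digits simply mirror the example report on this page; RFC 3339 itself does not fix the number of fractional digits.

```python
from datetime import datetime, timezone

# Parsing pattern for UTC timestamps written with a trailing "Z".
RFC3339_UTC = "%Y-%m-%dT%H:%M:%S.%fZ"


def format_timestamp(dt):
    """Render an aware datetime as an RFC 3339 UTC string (fraction truncated to 2 digits)."""
    utc = dt.astimezone(timezone.utc)
    # %f yields six digits; dropping the last four keeps hundredths of a second.
    return utc.strftime("%Y-%m-%dT%H:%M:%S.%f")[:-4] + "Z"


def parse_timestamp(text):
    """Parse a 'Z'-suffixed timestamp into an aware UTC datetime."""
    return datetime.strptime(text, RFC3339_UTC).replace(tzinfo=timezone.utc)


start = parse_timestamp("2012-04-18T18:00:00.00Z")
end = parse_timestamp("2012-04-18T18:00:02.12Z")
elapsed = (end - start).total_seconds()
```

Here `elapsed` is the duration of a single test entry in seconds, computed from its `start_time` and `end_time`.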
     14This is an example of a YAMLOONI OONI-probe report:
     15
     16{{{
     17# OONI Probe Report for Test httphost
     18# 18th of April 2012 18:00:00
     19---
     20test_name: httphost
     21asn: ASN-59395
     22addr: 198.51.100.1
     23start_time: 2012-04-18T18:00:00.00Z
     24---
     25start_time: 2012-04-18T18:00:00.00Z
     26end_time: 2012-04-18T18:00:02.12Z
     27result: {'thetestresult': 'data'}
     28---
     29start_time: 2012-04-18T18:00:00.00Z
     30end_time: 2012-04-18T18:00:03.12Z
     31result: {'thetestresult': 'data'}
     32# Test ended in 200s
     33---
     34end_time: 2012-04-18T18:00:00.00Z
     35traceroute:
     36- dst: 80
     37  src: 80
     38  tcp:
     39  - [10.0.2.1, 794.792, 6.323, 1.18]
     40  - [198.51.100.2, 25.092, 11.716, 44.371]
     41  - [203.0.113.5, 59.241, 12.302, 14.776]
     42  - [203.0.113.8, 59.241, 12.302, 14.776]
     43  timestamp: '2012-04-18T19:11:14Z'
     44  udp:
     45  - [10.0.2.1, 794.792, 6.323, 1.18]
     46  - [198.51.100.2, 25.092, 11.716, 44.371]
     47  - [203.0.113.5, 59.241, 12.302, 14.776]
     48  - [203.0.113.8, 59.241, 12.302, 14.776]
     49
     50}}}
     51
     52We chose YAML as the data format because we believe it is the best
     53compromise between human readability and machine parsability. YAML supports
     54binary data, which also allows us to store packet dumps inside reports.
     55
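One way binary data can be carried in YAML is the `!!binary` tag, whose content is base64-encoded. The sketch below formats raw bytes as such a field using only the standard library. The key name `packet_dump` and the helper `yaml_binary_field` are hypothetical and not part of the report format described here.

```python
import base64


def yaml_binary_field(key, payload, width=64):
    """Format raw bytes as a top-level YAML '!!binary' literal block scalar."""
    encoded = base64.b64encode(payload).decode("ascii")
    # Wrap the base64 text so long dumps stay readable in the report.
    lines = [encoded[i:i + width] for i in range(0, len(encoded), width)]
    body = "\n".join("  " + line for line in lines)
    return f"{key}: !!binary |\n{body}\n"


# Three placeholder bytes standing in for a captured packet.
field = yaml_binary_field("packet_dump", b"\x00\x01\x02")
```

A YAML loader that supports the `!!binary` tag should decode the field back into the original bytes.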