wiki:doc/OONI/DataCollection

All data collected by OONI-probe must be open and accessible. We will make an effort not to redact any data contained in the logs, unless its content could lead to the identification of users or to other privacy leaks.

The data format for the reports is YAMLOONI. Every report must contain, at a bare minimum, the following information (items marked optional may be omitted):

  • Timestamp of start and end of test
  • ASN from which the test originated
  • The address from which the test was run (optional)
  • A two-way traceroute from and to another OONI-probe node (optional)
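
The minimum-content rule above could be checked mechanically. The following is only a sketch: the field names (start_time, end_time, asn, addr, traceroute) are taken from the example report on this page, while the function name and the idea of looking across all YAML documents of a report are assumptions made for illustration.

```python
# Field names follow the example report on this page; the checker
# itself is a hypothetical helper, not part of OONI-probe.
REQUIRED_FIELDS = ("start_time", "end_time", "asn")
OPTIONAL_FIELDS = ("addr", "traceroute")


def missing_required_fields(documents):
    """Return the required field names that appear in none of the
    already-parsed YAML documents (dicts) making up one report."""
    present = set()
    for doc in documents:
        present.update(doc.keys())
    return [name for name in REQUIRED_FIELDS if name not in present]


if __name__ == "__main__":
    report = [
        {"test_name": "httphost", "asn": "ASN-59395",
         "start_time": "2012-04-18T18:00:00.00Z"},
        {"end_time": "2012-04-18T18:00:00.00Z"},
    ]
    print(missing_required_fields(report))
```

The optional fields are listed only for documentation; their absence is not reported.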

The two-way traceroute (doc/OONI/Tests/TwoWayTraceroute) should be done with source and destination ports set to 0, 21, 80, 123, and 443, using UDP, TCP, and ICMP. If A is the host performing the test, A picks a random OONI-probe node X and performs a traceroute to X. X is then signaled to traceroute back to A; it runs the same traceroute and sends the result to A. This helps in understanding the topology of the network from which the test is run. Because this traceroute can leak the fact that the user is running OONI-probe software, it should be run only after all the other tests have completed.
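
As a rough illustration of the parameters involved, the sketch below enumerates one traceroute entry per port and picks a random peer. It assumes that source and destination use the same port in each run (as the example report on this page suggests with src: 80 and dst: 80); the function names and the dict layout are hypothetical, and no packets are sent.

```python
import random

# Ports and protocols as listed in the text above.
PORTS = (0, 21, 80, 123, 443)
PROTOCOLS = ("udp", "tcp", "icmp")


def traceroute_plan(ports=PORTS, protocols=PROTOCOLS):
    """Return one planned traceroute entry per port.

    Assumption: source and destination port are equal in each entry.
    """
    return [
        {"src": port, "dst": port, "protocols": list(protocols)}
        for port in ports
    ]


def pick_peer(nodes, rng=random):
    """Pick the random OONI-probe node X that host A will trace to."""
    nodes = list(nodes)
    if not nodes:
        raise ValueError("no OONI-probe nodes available")
    return rng.choice(nodes)


if __name__ == "__main__":
    for entry in traceroute_plan():
        print(entry)
```

Host A would run this plan against the chosen peer only after its other tests have finished, and the peer would run the same plan back towards A.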

Timestamps use the format specified in RFC 3339. All times are expressed in UTC.
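
A minimal sketch of producing and reading such timestamps with the Python standard library follows. The two-digit fractional seconds mirror the example report on this page; RFC 3339 itself permits other precisions, and the function names are made up for illustration.

```python
from datetime import datetime, timezone


def rfc3339_utc(dt):
    """Format a timezone-aware datetime as an RFC 3339 UTC timestamp
    with two fractional digits, e.g. 2012-04-18T18:00:02.12Z."""
    if dt.tzinfo is None:
        raise ValueError("naive datetime: a timezone is required")
    dt = dt.astimezone(timezone.utc)
    centiseconds = dt.microsecond // 10000  # truncate, do not round
    return dt.strftime("%Y-%m-%dT%H:%M:%S") + ".%02dZ" % centiseconds


def parse_rfc3339_utc(text):
    """Parse a 'Z'-suffixed timestamp with fractional seconds."""
    parsed = datetime.strptime(text, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)


if __name__ == "__main__":
    print(rfc3339_utc(datetime.now(timezone.utc)))
```

The parser only accepts the form used in the report entries (a fractional part and a literal Z); offsets such as +02:00 would need extra handling.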

This is an example of a YAMLOONI OONI-probe report:

# OONI Probe Report for Test httphost
# 18th of April 2012 18:00:00
---
test_name: httphost
asn: ASN-59395
addr: 198.51.100.1
start_time: 2012-04-18T18:00:00.00Z
---
start_time: 2012-04-18T18:00:00.00Z
end_time: 2012-04-18T18:00:02.12Z
result: {'thetestresult': 'data'}
---
start_time: 2012-04-18T18:00:00.00Z
end_time: 2012-04-18T18:00:03.12Z
result: {'thetestresult': 'data'}
# Test ended in 200s
---
end_time: 2012-04-18T18:00:00.00Z
traceroute:
- dst: 80
  src: 80
  tcp:
  - [10.0.2.1, 794.792, 6.323, 1.18]
  - [198.51.100.2, 25.092, 11.716, 44.371]
  - [203.0.113.5, 59.241, 12.302, 14.776]
  - [203.0.113.8, 59.241, 12.302, 14.776]
  timestamp: '2012-04-18T19:11:14'
  udp:
  - [10.0.2.1, 794.792, 6.323, 1.18]
  - [198.51.100.2, 25.092, 11.716, 44.371]
  - [203.0.113.5, 59.241, 12.302, 14.776]
  - [203.0.113.8, 59.241, 12.302, 14.776]

We chose YAML as the data format because we believe it is the best compromise between human readability and machine parsability. YAML also supports binary data, which allows us to store packet dumps inside reports.
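
YAML represents binary data as base64 text under the !!binary tag. The sketch below shows how a packet dump might be written as such a field using only the standard library; the key name pcap and the helper name are hypothetical, and this page does not specify how YAMLOONI actually names or lays out packet dumps.

```python
import base64


def yaml_binary_field(key, payload, width=64):
    """Render bytes as a top-level YAML '!!binary' block scalar.

    The payload is base64-encoded and split into lines of at most
    `width` characters, each indented by two spaces.
    """
    encoded = base64.b64encode(payload).decode("ascii")
    lines = [encoded[i:i + width] for i in range(0, len(encoded), width)]
    body = "".join("  " + line + "\n" for line in lines)
    return "%s: !!binary |\n%s" % (key, body)


if __name__ == "__main__":
    # Three arbitrary bytes standing in for a captured packet.
    print(yaml_binary_field("pcap", b"\x00\x01\x02"), end="")
```

A YAML parser that understands the !!binary tag would decode the field back into the original bytes when loading the report.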

Last modified on May 17, 2012, 3:25:09 PM