wiki:org/meetings/2018MexicoCity/Notes/OONIDataFormatPain

Arturo introduces the OONI specs: https://github.com/ooni/spec.

Master ticket: https://github.com/ooni/spec/issues/74

The data format migrated from YAML to JSON during its lifetime. Data uploaded even by an old client version is converted into the same format. Some data is not represented in the best format, and after 6 years there is a need to come up with new and better specs.
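A minimal sketch of the "convert old uploads into the same format" idea, assuming an entry is either a JSON object or a legacy YAML document. The function name `to_canonical_json` is hypothetical, and the YAML parser is passed in (for example PyYAML's `yaml.safe_load`) so the sketch itself only needs the standard library. This is not OONI's actual pipeline code.

```python
import json
from typing import Any, Callable


def to_canonical_json(raw: str, yaml_load: Callable[[str], Any]) -> str:
    """Parse an uploaded entry in either legacy YAML or JSON and re-emit it as JSON.

    Hypothetical helper: the YAML parser is injected so this sketch depends
    only on the standard library.
    """
    text = raw.lstrip()
    if text.startswith("{"):
        # Newer clients: the entry is already a JSON object.
        entry = json.loads(text)
    else:
        # Assumed legacy path: anything else is treated as a YAML document.
        entry = yaml_load(text)
    if not isinstance(entry, dict):
        raise ValueError("a measurement entry must be a mapping")
    # sort_keys gives stable output, so converted old uploads and new uploads
    # are serialized the same way.
    return json.dumps(entry, sort_keys=True)
```

In practice this would be called as `to_canonical_json(upload_text, yaml.safe_load)`. Sorting keys is one simple way to make both paths produce identical output for identical content.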

Format pain can be strings --

The aim is to collect feedback from users and understand whether a new major data format is needed. It is not a quick task, because compatibility with the old data formats has to be maintained.

How much space does OONI data take? 15 TB uncompressed.

Are you going to revisit the decision to store data in a text-based format? There are 3 formats:

  • Data sent by OONI-compatible clients
  • OONI Pipeline processing
  • Data exposed to the public for download

Reviewing the main issues https://github.com/ooni/spec/issues/74

Other issues:

  • Referencing another measurement inside the data format
  • Collecting better location data so as not to rely only on GeoIP, possibly using GPS on mobile

Location data should be processed on the mobile device, so that it sends a zone rather than exact coordinates.
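One way to "send a zone, not exact coordinates" is to snap the GPS fix to a coarse grid on the device and upload only the cell index. The sketch below assumes a fixed-size latitude/longitude grid. The function names and the default cell size are hypothetical choices for illustration, not anything the OONI specs define.

```python
import math
from typing import Tuple


def to_zone(lat: float, lon: float, cell_deg: float = 0.1) -> Tuple[int, int]:
    """Map exact coordinates to the integer index of a coarse grid cell.

    With cell_deg=0.1 a cell is roughly 11 km tall (one degree of latitude is
    about 111 km); cells get narrower in longitude away from the equator.
    """
    if cell_deg <= 0:
        raise ValueError("cell_deg must be positive")
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def zone_center(zone: Tuple[int, int], cell_deg: float = 0.1) -> Tuple[float, float]:
    """Return the center of a cell, e.g. for showing an approximate location."""
    i, j = zone
    return ((i + 0.5) * cell_deg, (j + 0.5) * cell_deg)
```

Only the output of `to_zone` would leave the device. Nearby points fall into the same cell, so the exact position is not recoverable from the upload. A fixed grid is the simplest option; it does not adapt to population density, so a cell in a sparsely populated area can still be fairly identifying.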

  • Allowing people to sign measurements, though signed binary data then can't be modified

The setting should be very clear to users, because a user who signs can be identified and followed across networks and countries. One question is whether it would be possible to mark a user's measurements without making the marking public, exposing it only through the user's account. This is not easy and should be discussed in more depth, especially the various use cases. The inverse problem is understanding whether two similar measurements from the same network come from the same "user".
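The note that signed data "can't be modified" follows from the fact that a signature covers the exact bytes. Any re-encoding by the pipeline breaks verification unless both sides agree on a canonical serialization. The sketch below uses HMAC-SHA256 from the standard library only as a stand-in. A real scheme would need a public-key signature (e.g. Ed25519) so that third parties can verify without holding a secret. The canonical form used here (sorted keys, compact separators, UTF-8) is an assumption, and the function names are hypothetical.

```python
import hashlib
import hmac
import json
from typing import Any, Dict


def canonical_bytes(measurement: Dict[str, Any]) -> bytes:
    """Serialize deterministically so key order or whitespace can't change the bytes."""
    return json.dumps(
        measurement, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def sign(measurement: Dict[str, Any], key: bytes) -> str:
    """Stand-in 'signature': an HMAC-SHA256 tag over the canonical bytes."""
    return hmac.new(key, canonical_bytes(measurement), hashlib.sha256).hexdigest()


def verify(measurement: Dict[str, Any], key: bytes, tag: str) -> bool:
    """Recompute the tag and compare in constant time."""
    return hmac.compare_digest(sign(measurement, key), tag)
```

Reordering keys still verifies because of the canonical form, but changing any value (even normalizing a string in the pipeline) makes `verify` return False. This is the tension the notes point out between signing and later processing.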

The way to gather and join together all of a user's measurements is not user-based or device-based, but by gathering rich information about network properties. The revamped mobile app already stores some network-related data. We don't want to publish user data that can help profile users (office network, home network, ...). Currently the app has a per-device identity, not a per-user one. There will be a desktop app, and some users will want all their data aggregated.

A pool of users where many people can participate (after prior identification) and all send data to the same pool (e.g. Venezuela Inteligente).

Last point about the data format: standardize best practices for new tests, e.g. a use case in every test, and a way to communicate whether there was a failure (the state of the test) and what type of failure it was (error reporting). At the moment some fields are either null, boolean, or string. This is a big pain, and it is what triggered the need for OONI spec best practices.
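As an illustration of the "null, boolean or string" problem, the sketch below normalizes such a field into one consistent shape: whether the test succeeded, plus an optional machine-readable failure string. How each legacy type maps is an assumption (null or false means no failure, true means a failure with no detail, a string is the failure description). `TestStatus`, `normalize_failure` and the placeholder `"unknown_failure"` are hypothetical names, not part of the OONI specs.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class TestStatus:
    ok: bool                # True when no failure was recorded
    failure: Optional[str]  # machine-readable failure string, None on success


def normalize_failure(value: Union[None, bool, str]) -> TestStatus:
    """Collapse a polymorphic legacy field into a single TestStatus (assumed mapping)."""
    if value is None:
        return TestStatus(ok=True, failure=None)
    if isinstance(value, bool):
        # Assumption: a bare boolean says whether a failure happened, with no detail.
        return TestStatus(ok=False, failure="unknown_failure") if value else TestStatus(ok=True, failure=None)
    if isinstance(value, str):
        # Assumption: a string is the failure description itself.
        return TestStatus(ok=False, failure=value)
    raise TypeError(f"unexpected failure value type: {type(value).__name__}")
```

Having a single shape per field means consumers no longer need to check three types before they can tell whether a test failed.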

Two issues from sbs: 1) we don't have a mechanism to submit logs or other data that doesn't go in the report; 2) the collector rejects reports that don't comply with the specs. Do we want to keep them instead? Suggestion: drop filenames as context.
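One possible answer to "do we want to keep them?" is to quarantine non-compliant reports with the reasons they failed instead of dropping them. In this sketch the list of required fields is an assumed subset chosen for illustration, not the authoritative spec. `validate` and `route` are hypothetical helpers, and plain lists stand in for whatever storage a collector would use.

```python
from typing import Any, Dict, List

# Assumed subset of required top-level fields, for illustration only.
REQUIRED_FIELDS = ("test_name", "probe_cc", "probe_asn", "measurement_start_time", "test_keys")


def validate(measurement: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means compliant."""
    errors = [f"missing field: {f}" for f in REQUIRED_FIELDS if f not in measurement]
    if "test_keys" in measurement and not isinstance(measurement["test_keys"], dict):
        errors.append("test_keys must be an object")
    return errors


def route(measurement: Dict[str, Any], accepted: List[Dict[str, Any]],
          quarantined: List[Dict[str, Any]]) -> bool:
    """Accept compliant reports; keep the rest together with their errors."""
    errors = validate(measurement)
    if errors:
        quarantined.append({"measurement": measurement, "errors": errors})
        return False
    accepted.append(measurement)
    return True
```

Keeping the error list next to the rejected report makes it possible to fix clients or extend the spec later, at the cost of storing data that the public pipeline would otherwise never see.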

Last modified on Sep 30, 2018, 8:45:09 PM