Changes between Initial Version and Version 1 of org/meetings/2018MexicoCity/Notes/OONIDataFormatPain

Sep 30, 2018, 8:45:09 PM

     Arturo introduces the ooni-specs:
     Master ticket:
     The format migrated from YAML to JSON during its lifetime; data uploaded by an old client version is converted to the same format.
     Some data is not represented in the best format; after 6 years there is a need to come up with new and better specs.
     Format pain can be strings --
     Collecting feedback from users to understand whether a new major data format is needed, as it is not a quick task because we have to maintain compatibility with old data formats.
     How much space does OONI data take?
     15 TB of uncompressed data
     Are you going to revisit the decision to store the data in a text-based format?
     There are 3 formats:
     - Data sent by OONI-compatible clients
     - OONI Pipeline processing
     - Data exposed to the public for download
     Reviewing the main issues
     Other issues:
     - Referencing another measurement inside the data format
     - Collecting better location data rather than relying only on GeoIP, maybe using GPS on mobile
     Data should be processed on the mobile device so that it sends a zone rather than exact coordinates
     - Allowing people to sign measurements, but then the binary data can't be modified
     The setting should be really clear to the user, because they can be tagged and followed across networks and countries.
     Question: would it be possible to mark a user's measurements but make them available (as marked) only through the user's account, not to the public?
     This is not easy and should be discussed in more depth, especially the various use cases.
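The location point under "Other issues" (send a zone, not exact coordinates) could be sketched roughly as below. This is only an illustration, not something agreed in the session: the function name `coarsen_location`, the grid approach, and the 0.5 degree cell size are all assumptions.

```python
import math


def coarsen_location(lat: float, lon: float, cell_deg: float = 0.5) -> tuple[float, float]:
    """Snap a coordinate to the south-west corner of its grid cell.

    Hypothetical helper: every point inside the same cell_deg x cell_deg
    cell maps to the same value, so the exact position never leaves the device.
    """
    if cell_deg <= 0:
        raise ValueError("cell_deg must be positive")
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError("coordinate out of range")

    def snap(value: float) -> float:
        # floor() rounds towards negative infinity, so negative
        # longitudes and latitudes snap to the correct cell as well.
        return math.floor(value / cell_deg) * cell_deg

    return (snap(lat), snap(lon))


# Two nearby points in Mexico City end up in the same zone.
zone_a = coarsen_location(19.4326, -99.1332)
zone_b = coarsen_location(19.40, -99.20)
print(zone_a, zone_b)
```

A larger `cell_deg` gives more privacy and less precision; how coarse the zone should be is one of the open questions from the discussion.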
     Inverse problem: understanding whether two similar measurements from the same network come from the same "user".
     The way to gather and join all of a user's measurements is not user-based or device-based, but relies on gathering rich information about the network properties.
     In the revamped mobile app we already store some network-related data.
     We don't want to publish user data that can help profile users (office network, home network...)
     Currently the app has a per-device identity, not a per-user one. We'll have a desktop app, and some users will want to have all their data aggregated.
     A pool of users in which many people can participate (after prior identification) and all send data to the same pool (e.g. Venezuela Inteligente)
     Last point about the data format: standardize best practices for new tests
     e.g. in every test, communicate whether there was a failure (state of the test) and what type of failure (error reporting)
     At the moment we have certain fields that are either null, boolean or string; this is a big pain that triggered the need for OONI specs best practices
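To illustrate why a field that may be null, boolean or string is painful for consumers, here is a minimal sketch of the normalization every reader of the data would otherwise have to write. The type `MeasurementStatus`, the function `normalize_failure`, the placeholder `"unknown_failure"`, and the reading of each raw value (null or false means no failure, true means an unspecified failure, a string names the failure type) are assumptions made for this example, not part of the OONI specs.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class MeasurementStatus:
    """Hypothetical uniform shape: one boolean state plus an optional failure type."""
    ok: bool
    failure: Optional[str]


def normalize_failure(raw: Union[None, bool, str]) -> MeasurementStatus:
    """Map a null/boolean/string field onto one consistent structure.

    The interpretation of each input type is an assumption for illustration.
    """
    # Check for bool explicitly, before any truthiness test.
    if raw is None or raw is False:
        return MeasurementStatus(ok=True, failure=None)
    if raw is True:
        # A bare "true" says something failed but not what.
        return MeasurementStatus(ok=False, failure="unknown_failure")
    if isinstance(raw, str):
        text = raw.strip()
        return MeasurementStatus(ok=False, failure=text or "unknown_failure")
    raise TypeError(f"unsupported failure value: {raw!r}")


# The failure string below is illustrative only.
for value in (None, False, True, "dns_lookup_error"):
    print(repr(value), "->", normalize_failure(value))
```

A best-practice spec that fixes the state and the failure type as two separate, consistently typed fields would make this kind of shim unnecessary.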
     Two issues from sbs:
     1) We don't have a mechanism to submit logs or other material that doesn't go in the report
     2) The collector rejects reports that don't comply with the specs; do we want to keep them?
     Suggestion: Drop filenames as a context.