Opened 4 years ago

Closed 4 years ago

#13563 closed task (implemented)

Implement scripts for sanitising the reports

Reported by: hellais
Owned by: otr
Priority: Medium
Milestone:
Component: Archived/Ooni
Version:
Severity:
Keywords: ooni_data_analytics_team
Cc: asn, sysrqb, kudrom, aagbsn, infinity0, joelanders, otr, shidash, david415, dawuud
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor:

Description

The script should take as input a directory containing reports that are in the RAW state.

It should strip all information that should not be published from the reports, archive a compressed copy of the RAW reports (ideally also encrypting them with a specified public key), and copy the clean ones to the /data/sanitized directory.
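A minimal sketch of that flow, under stated assumptions: the function name process_raw_reports, the placeholder sanitiser strip_private_fields, and the "private:" line convention are all hypothetical and are not taken from the merged patch. Encryption of the archived copy is deliberately left out; only the compress-and-copy steps are shown.

```python
import gzip
import shutil
from pathlib import Path


def strip_private_fields(text):
    """Hypothetical placeholder sanitiser: drop lines that start with 'private:'."""
    return "".join(
        line for line in text.splitlines(keepends=True)
        if not line.startswith("private:")
    )


def process_raw_reports(raw_dir, archive_dir, sanitized_dir,
                        sanitise=strip_private_fields):
    """Archive a gzip copy of every RAW report and write a sanitised copy.

    Returns the names of the processed reports, in sorted order.
    """
    raw_dir = Path(raw_dir)
    archive_dir = Path(archive_dir)
    sanitized_dir = Path(sanitized_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)
    sanitized_dir.mkdir(parents=True, exist_ok=True)

    processed = []
    for report in sorted(raw_dir.iterdir()):
        if not report.is_file():
            continue
        # Keep an untouched, compressed copy of the RAW report.
        archived = archive_dir / (report.name + ".gz")
        with report.open("rb") as src, gzip.open(archived, "wb") as dst:
            shutil.copyfileobj(src, dst)
        # Publish only the sanitised version.
        clean = sanitise(report.read_text(encoding="utf-8"))
        (sanitized_dir / report.name).write_text(clean, encoding="utf-8")
        processed.append(report.name)
    return processed
```

A call such as process_raw_reports("/data/raw", "/data/archive", "/data/sanitized") would match the layout described above; only /data/sanitized comes from this ticket, the other two paths are assumptions.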

In particular, the first iteration should replace the bridge IP with the bridge's fingerprint if the bridge is part of bridge_db. See https://github.com/hellais/datanalytics/blob/master/process.py#L69 for an idea of what needs to be done.
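The replacement step could look roughly like the sketch below. It assumes a report is a dict, that the bridge address lives under a key named bridge_address, and that bridge_db is a dict mapping "ip:port" strings to fingerprints. None of these shapes is confirmed by this ticket; the linked process.py is the actual reference.

```python
def sanitise_bridge_address(report, bridge_db):
    """Return a copy of the report with a known bridge address replaced.

    report    -- dict; the 'bridge_address' key is an assumed field name
    bridge_db -- dict mapping 'ip:port' to fingerprint (assumed layout)
    """
    clean = dict(report)  # shallow copy, the caller's report is not modified
    address = clean.get("bridge_address")
    if address in bridge_db:
        # Publish the fingerprint instead of the bridge IP.
        clean["bridge_address"] = bridge_db[address]
    return clean
```

Addresses that are not in bridge_db are left unchanged here, which follows the "if it's part of bridge_db" condition literally; whether unknown addresses should instead be dropped is a policy decision this sketch does not make.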

Change History (5)

comment:1 Changed 4 years ago by hellais

Cc: otr added

comment:2 Changed 4 years ago by hellais

Cc: shidash added

comment:3 Changed 4 years ago by hellais

Cc: david415 dawuud added

comment:4 Changed 4 years ago by otr

Owner: changed from hellais to otr
Status: changed from new to assigned

comment:5 Changed 4 years ago by hellais

Resolution: implemented
Status: changed from assigned to closed

otr did a great job implementing this: https://github.com/TheTorProject/ooni-pipeline/blob/master/tasks/sanitise.py.

I reviewed the PR and merged it. This can be closed.
