Opened 5 years ago

Closed 16 months ago

#13566 closed task (wontfix)

Work on script that copies the reports from the collectors to the /raw directory

Reported by: hellais
Owned by: hellais
Priority: Medium
Milestone:
Component: Archived/Ooni
Version:
Severity: Normal
Keywords: ooni_data_analytics_team, archived-closed-2018-07-04
Cc: asn, sysrqb, kudrom, aagbsn, infinity0, joelanders, otr, shidash, david415, dawuud
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor:

Description

When reports are submitted to a collector, they end up in one of several locations:

1) They are in the local directory /data/bouncer-XXXX/archive/
2) They are in the local directory /data/collector/archive/
3) They are on a remote host in some directory XXX

The task here is to write a script that moves the reports from those locations into the /data/raw directory, so that they are ready to go through the data processing pipeline.
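For the two local locations (1 and 2), the move can also be done without rsync. The following is a minimal pure-Python sketch, not the script attached to this ticket: `move_reports` is a hypothetical helper, and it assumes that every file found under an archive directory is a report that should land in the raw directory at the same relative path. It refuses to overwrite a file that already exists in the raw directory. The remote case (3) is not covered, since that directory is not specified.

```python
import glob
import os
import shutil


def move_reports(source_patterns, raw_dir):
    """Move every file under the directories matched by source_patterns
    into raw_dir, keeping each file's path relative to its archive directory.

    Hypothetical helper; returns the list of destination paths written.
    """
    moved = []
    for pattern in source_patterns:
        # glob expands wildcards such as bouncer-* on the local machine
        for src_root in sorted(glob.glob(pattern)):
            for dirpath, _dirnames, filenames in os.walk(src_root):
                rel_dir = os.path.relpath(dirpath, src_root)
                dst_dir = os.path.normpath(os.path.join(raw_dir, rel_dir))
                os.makedirs(dst_dir, exist_ok=True)
                for name in filenames:
                    dst = os.path.join(dst_dir, name)
                    if os.path.exists(dst):
                        # Never silently overwrite a report already in raw/
                        raise FileExistsError(dst)
                    shutil.move(os.path.join(dirpath, name), dst)
                    moved.append(dst)
    return moved
```

Usage, with the paths from the description: `move_reports(["/data/bouncer-*/archive/", "/data/collector/archive/"], "/data/raw")`. Like rsync with `--remove-source-files`, this leaves the emptied source directories behind.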

Child Tickets

Change History (5)

comment:1 Changed 5 years ago by hellais

Cc: david415 added

comment:2 Changed 5 years ago by hellais

Cc: dawuud added

comment:3 Changed 5 years ago by cypherpunks

reports2raw.py


#!/usr/bin/env python3
# -*- coding: utf-8 -*-

"""Script to move the reports generated by ooni to a "raw" directory, either
within a collector or from a remote collector to the local host.
"""
__author__ = "fucking somebody"
__copyright__ = "GPL v3"

import os.path
import subprocess
import sys

from settings import *

data_bouncer_path = os.path.join(src_path, data_bouncer_rel_path)
data_col_path = os.path.join(src_path, data_col_rel_path)
raw_path = os.path.join(dst_path, raw_rel_path)

# Pass each source as a separate argument. rsync accepts several sources
# from the same remote host, and the wildcard in bouncer-* is expanded on
# the remote side. (Joining them with spaces into one argument breaks on
# modern rsync, which escapes spaces in remote arguments.)
remote = remote_user + "@" + remote_host + ":"
path_args = [remote + data_bouncer_path, remote + data_col_path, raw_path]

print("Running rsync with " + str(path_args))
r = subprocess.call(["rsync"] + rsync_args + path_args)
if r == 0:
    print("Successfully moved reports")
else:
    print("Error trying to move reports")
    sys.exit(r)

# TODO:
# remove the empty directories
# find src_path -depth -type d -empty -delete
# or remove the reports root directory
# rm -rf data
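The first TODO item (removing the directories that `--remove-source-files` leaves behind) could be done in Python for the local case. The sketch below is an assumption about how that might look; `remove_empty_dirs` is a hypothetical helper. It walks bottom-up, as `find -depth` does, so a directory that becomes empty once its empty children are deleted is removed too. Unlike the `find` command, it keeps the root directory itself.

```python
import os


def remove_empty_dirs(root):
    """Delete every empty directory below root, deepest first, and return
    the removed paths. root itself is never deleted.

    Hypothetical helper, roughly `find root -depth -type d -empty -delete`.
    """
    removed = []
    # topdown=False yields children before their parents (like find -depth)
    for dirpath, _dirnames, _filenames in os.walk(root, topdown=False):
        if dirpath == root:
            continue
        # Re-check with listdir: children may have been removed just now
        if not os.listdir(dirpath):
            os.rmdir(dirpath)
            removed.append(dirpath)
    return removed
```

Keeping the root avoids the risk in the second TODO option (`rm -rf data`), which would also delete any report that failed to transfer.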

settings.py


# settings for reports2raw.py

# ooni report directories are currently:
# data/bouncer-*/archive/
# data/collector/archive/

data_bouncer_rel_path = 'data/bouncer-*/archive/'
data_col_rel_path = 'data/collector/archive/'
raw_rel_path = 'raw'

src_path = '/'
dst_path = '/'

# just for testing: use the current directory instead of /
import os.path
src_path = os.path.realpath('.')
dst_path = os.path.realpath('.')

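# -r: recurse, -v: verbose; --remove-source-files deletes each source file
# after it has been transferred (directories are left behind)
rsync_args = ["-rv", "--remove-source-files"]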
remote_user = 'user'
# if the report data is on this machine, use localhost here; note that
# rsync still connects over ssh to user@localhost in that case
remote_host = 'localhost'

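With `remote_host = 'localhost'`, the script still builds `user@localhost:` source arguments, so rsync goes through ssh even though the data is local. One way to make the local case a plain local copy is sketched below; `build_rsync_command` is a hypothetical helper and `collector.example.org` a made-up host. Because `subprocess.call` does not run a shell, the local branch expands the `bouncer-*` wildcard itself with `glob`; the remote branch leaves the wildcard for the remote side.

```python
import glob


def build_rsync_command(rsync_args, sources, dest, remote_user, remote_host):
    """Return the argv list for rsync (hypothetical helper).

    For remote_host == 'localhost' the sources are local paths: wildcards
    are expanded here and no user@host: prefix is added, so no ssh is used.
    Otherwise each source gets the user@host: prefix.
    """
    if remote_host == "localhost":
        src_args = []
        for pattern in sources:
            # glob keeps a trailing slash, preserving rsync's
            # "copy the directory contents" semantics
            src_args.extend(sorted(glob.glob(pattern)))
    else:
        prefix = remote_user + "@" + remote_host + ":"
        src_args = [prefix + s for s in sources]
    if not src_args:
        # With no sources, rsync would just list dest instead of copying
        raise ValueError("no report directories matched")
    return ["rsync"] + list(rsync_args) + src_args + [dest]
```

For example, `build_rsync_command(["-rv"], ["/data/bouncer-*/archive/"], "/data/raw", "user", "collector.example.org")` returns `["rsync", "-rv", "user@collector.example.org:/data/bouncer-*/archive/", "/data/raw"]`.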
comment:4 Changed 23 months ago by teor

Severity: Normal

Set all open tickets without a severity to "Normal"

comment:5 Changed 16 months ago by teor

Keywords: archived-closed-2018-07-04 added
Resolution: wontfix
Status: new → closed

Close all tickets in archived components
