Opened 6 years ago

Closed 2 years ago

#13566 closed task (wontfix)

Work on script that copies the reports from the collectors to the /raw directory

Reported by: hellais
Owned by: hellais
Priority: Medium
Milestone:
Component: Archived/Ooni
Version:
Severity: Normal
Keywords: ooni_data_analytics_team, archived-closed-2018-07-04
Cc: asn, sysrqb, kudrom, aagbsn, infinity0, joelanders, otr, shidash, david415, dawuud
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor:


When reports are submitted to a collector, they end up in one of several locations:

1) They are in the local directory /data/bouncer-XXXX/archive/
2) They are in the local directory /data/collector/archive/
3) They are on a remote host in some directory XXX

The task here is to write a script that moves the reports from these locations into the /data/raw directory so that they are ready to go through the data processing pipeline.
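For the two local locations, the move could be sketched roughly as follows. This is only an illustration, not an agreed design: the function name move_reports, the decision to flatten every report into a single directory, and the rule of skipping files whose name already exists in the destination are all assumptions. The remote location 3) is not covered.

```python
import glob
import os
import shutil


def move_reports(src_patterns, raw_dir):
    """Move every file under the matching source directories into raw_dir.

    src_patterns may contain shell-style wildcards (for example bouncer-*).
    Returns the list of destination paths that were written.
    """
    os.makedirs(raw_dir, exist_ok=True)
    moved = []
    for pattern in src_patterns:
        for archive_dir in sorted(glob.glob(pattern)):
            for root, _dirs, files in os.walk(archive_dir):
                for name in sorted(files):
                    src = os.path.join(root, name)
                    dst = os.path.join(raw_dir, name)
                    if os.path.exists(dst):
                        # Assumption: never overwrite a report already in raw_dir.
                        continue
                    shutil.move(src, dst)
                    moved.append(dst)
    return moved
```

With the paths from this ticket, a call would look like move_reports(["/data/bouncer-*/archive", "/data/collector/archive"], "/data/raw"). Files left behind because of a name collision stay in the archive directory, so nothing is lost silently.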

Change History (5)

comment:1 Changed 6 years ago by hellais

Cc: david415 added

comment:2 Changed 6 years ago by hellais

Cc: dawuud added

comment:3 Changed 6 years ago by cypherpunks

# -*- coding: utf-8 -*-

"""Script to move the reports generated by ooni to a "raw" directory, either in
a remote collector or from a remote collector to local.
"""
__author__ = "fucking somebody"
__copyright__ = "GPL v3"

import os.path
import subprocess
from settings import *

data_bouncer_path = os.path.join(src_path, data_bouncer_rel_path)
data_col_path = os.path.join(src_path, data_col_rel_path)
raw_path = os.path.join(dst_path, raw_rel_path)

# both source paths are handed to rsync as one space-separated string
src_paths = ' '.join([data_bouncer_path, data_col_path])

path_args = [remote_user + "@" + remote_host + ":" + src_paths, raw_path]
print("Running rsync with " + str(path_args))
# rsync exits with status 0 on success
r = subprocess.call(["rsync"] + rsync_args + path_args)
if r == 0:
    print("Successfully moved reports")
else:
    print("Error trying to move reports")

# remove the empty directories
# find src_path -depth -type d -empty -delete
# or remove the reports root directory
# rm -rf data

# settings for

# ooni report directories are currently:
# data/bouncer-*/archive/
# data/collector/archive/

data_bouncer_rel_path = 'data/bouncer-*/archive/'
data_col_rel_path = 'data/collector/archive/'
raw_rel_path = 'raw'

src_path = '/'
dst_path = '/'

#just for testing
import os.path
src_path = os.path.realpath('.')
dst_path = os.path.realpath('.')

rsync_args = ["-rv", "--remove-source-files"]
remote_user = 'user'
# if the reports are on this machine, just use localhost here
# and rsync will run against the local host
remote_host = 'localhost'
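
One possible variation on how the rsync arguments are assembled, offered as a sketch rather than a fix to the script above: pass each source as its own argument, and drop the user@host: prefix when the host is localhost so that the copy does not go through a remote shell. The function name build_rsync_command and the choice to treat 'localhost' as "local" are assumptions. Wildcards in remote paths are left for the remote side to expand; local wildcards are expanded here because no shell is involved.

```python
import glob


def build_rsync_command(sources, dest, rsync_args, remote_user=None, remote_host=None):
    """Return the argv list for one rsync run; nothing is executed here."""
    if remote_host in (None, "localhost"):
        # Local copy: expand wildcards ourselves. A pattern that matches
        # nothing is passed through unchanged so rsync reports the error.
        srcs = []
        for pattern in sources:
            srcs.extend(sorted(glob.glob(pattern)) or [pattern])
    else:
        # Remote copy: one "user@host:path" argument per source path.
        user_part = remote_user + "@" if remote_user else ""
        prefix = user_part + remote_host + ":"
        srcs = [prefix + path for path in sources]
    return ["rsync"] + list(rsync_args) + srcs + [dest]
```

The returned list can be handed directly to subprocess.call, for example subprocess.call(build_rsync_command([data_bouncer_path, data_col_path], raw_path, rsync_args, remote_user, remote_host)).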

comment:4 Changed 3 years ago by teor

Severity: Normal

Set all open tickets without a severity to "Normal"

comment:5 Changed 2 years ago by teor

Keywords: archived-closed-2018-07-04 added
Resolution: wontfix
Status: new → closed

Close all tickets in archived components
