Opened 3 years ago

Last modified 4 days ago

#19332 merge_ready enhancement

Add a BridgeDB module

Reported by: mrphs
Owned by: karsten
Priority: Medium
Milestone:
Component: Metrics/CollecTor
Version:
Severity: Normal
Keywords: metrics-roadmap-2019-q2, anti-censorship-roadmap-september, s30-o21a1
Cc: asn, metrics-team, gaba, cohosh, ahf, dgoulet, phw
Actual Points:
Parent ID: #31274
Points: 8
Reviewer: irl
Sponsor: Sponsor30-must

Description (last modified by irl)

While talking about making a PT infographic and interesting stats to point out, asn pointed out that there are no publicly available BridgeDB stats.

This can be useful on many different fronts, such as seeing the balance between usage and available resources.

This ticket is about adding a CollecTor module that will archive the stats exposed in #9316. The actual stats and format should be discussed in #9316 and may benefit from the discussion in #29315.

Change History (37)

comment:1 Changed 9 months ago by gaba

Owner: isis deleted
Points: 1
Sponsor: Sponsor19
Status: new → assigned

comment:2 Changed 9 months ago by karsten

Cc: metrics-team added

This looks like a duplicate of #9316. Should we close it?

Or is this ticket supposed to cover the publish part of the statistics exported in #9316? If so, we should move it to Metrics/CollecTor and set it to needs_information until there's something to publish.

comment:3 Changed 9 months ago by gaba

Component: Obfuscation/BridgeDB → Metrics/CollecTor
Parent ID: #9316
Status: assigned → needs_information

Yes. I'm doing that and setting #9316 as a parent.

comment:4 Changed 9 months ago by gaba

Cc: gaba cohosh added

comment:5 Changed 9 months ago by notirl

Owner: set to metrics-team
Status: needs_information → assigned

comment:6 Changed 9 months ago by notirl

Status: assigned → needs_information

comment:7 Changed 8 months ago by irl

Cc: ahf dgoulet phw added
Description: modified (diff)
Keywords: metrics-roadmap-2019-q2 added; metrics removed
Parent ID: #9316
Points: 1 → 8
Status: needs_information → new
Summary: Publish BridgeDB stats → Add a BridgeDB module
Type: defect → enhancement

comment:8 Changed 5 months ago by gaba

Keywords: ex-sponsor-19 added

Adding the keyword to mark everything that didn't fit into the time for sponsor 19.

comment:9 Changed 5 months ago by phw

Sponsor: Sponsor19 → Sponsor30-must

Moving from Sponsor 19 to Sponsor 30.

comment:10 Changed 2 months ago by karsten

#9316 has been resolved recently which unblocks this ticket, AIUI.

comment:11 in reply to:  10 ; Changed 2 months ago by phw

Replying to karsten:

#9316 has been resolved recently which unblocks this ticket, AIUI.


Yes. How would you like me to expose the metrics file on BridgeDB's host? Should it be available over HTTPS? Or do you want me to rsync it to another host?

comment:12 in reply to:  11 ; Changed 8 weeks ago by karsten

Replying to phw:

> Replying to karsten:
>
> > #9316 has been resolved recently which unblocks this ticket, AIUI.
>
> Yes. How would you like me to expose the metrics file on BridgeDB's host? Should it be available over HTTPS? Or do you want me to rsync it to another host?

Is there anything sensitive in the file that would have to be sanitized on the CollecTor host? If so, we should rsync it over ssh to colchicifolium. But if not, the preferred way would be to expose it on the BridgeDB host, so that others can fetch it, too.

Here's another question, similar to the one about Snowflake stats: Would it be possible to expose more than just the latest BridgeDB statistics? Something like 7 or 14 days, or if it's not much data, everything until it gets too big?

comment:13 in reply to:  12 Changed 8 weeks ago by phw

Replying to karsten:

> Replying to phw:
>
> > Replying to karsten:
> >
> > > #9316 has been resolved recently which unblocks this ticket, AIUI.
> >
> > Yes. How would you like me to expose the metrics file on BridgeDB's host? Should it be available over HTTPS? Or do you want me to rsync it to another host?
>
> Is there anything sensitive in the file that would have to be sanitized on the CollecTor host? If so, we should rsync it over ssh to colchicifolium. But if not, the preferred way would be to expose it on the BridgeDB host, so that others can fetch it, too.

There's nothing sensitive. We're doing the sanitisation ourselves and have published the data before. I'll look into exposing the files over our apache.

> Here's another question, similar to the one about Snowflake stats: Would it be possible to expose more than just the latest BridgeDB statistics? Something like 7 or 14 days, or if it's not much data, everything until it gets too big?


Yes, that's definitely feasible. One week's worth of data shouldn't be more than ~100 KB. I'll look into logrotate, so we can expose multiple weeks' worth of data at any given time.

comment:14 Changed 7 weeks ago by karsten

Any updates here? If possible, I'd like to work on this next week. Thanks!

comment:15 in reply to:  14 Changed 7 weeks ago by phw

Replying to karsten:

Any updates here? If possible, I'd like to work on this next week. Thanks!


Not yet. I'll try to get it done by early next week.

comment:16 Changed 7 weeks ago by gaba

Keywords: anti-censorship-roadmap-september added; ex-sponsor-19 removed

comment:17 Changed 7 weeks ago by gaba

Owner: changed from metrics-team to phw
Status: new → assigned

comment:18 in reply to:  12 Changed 7 weeks ago by phw

Status: assigned → needs_information

Replying to karsten:

Replying to phw:

Replying to karsten:

#9316 has been resolved recently which unblocks this ticket, AIUI.


Yes. How would you like me to expose the metrics file on BridgeDB's host? Should it be available over HTTPS? Or do you want me to rsync it to another host?

Is there anything sensitive in the file that would have to be sanitized on the CollecTor host? If so, we should rsync it over ssh to colchicifolium. But if not, the preferred way would be to expose it on the BridgeDB host, so that others can fetch it, too.


Change of plan: Can we instead rsync BridgeDB's metrics to colchicifolium? Weasel isn't a fan of the idea of exposing BridgeDB's metrics on polyanthum. If CollecTor is archiving the metrics anyway, we might as well just sync them to colchicifolium.

If you are ok with this, I just need a directory to sync the metrics to.

comment:19 Changed 7 weeks ago by karsten

Sure, that works, too. How about /srv/collector.torproject.org/collector/in/bridgedb-stats/? I'll have to edit a script on the receiving side, but feel free to set up the rsync on the sending side whenever you're ready.

comment:20 in reply to:  19 Changed 7 weeks ago by phw

Reviewer: cohosh
Status: needs_information → needs_review

Replying to karsten:

Sure, that works, too. How about /srv/collector.torproject.org/collector/in/bridgedb-stats/? I'll have to edit a script on the receiving side, but feel free to set up the rsync on the sending side whenever you're ready.


Thanks. I updated the script on polyanthum's side. It will sync all available bridgedb-metrics.log files, including the rotated ones. The format of rotated files is the same as for assignments.log: bridgedb-metrics.log-YYYYMMDD.gz, e.g., bridgedb-metrics.log-20190905.gz. The file bridgedb-metrics.log is written once per day and also rotated once a day. For now, I configured logrotate to retain 30 rotated files, mostly as a precaution, so we don't lose data in case we run into trouble.

All the changes I made are in the task/19332 branch of my bridgedb-admin repository. Here's a patch. Cecylia, can you please review these changes when you get a chance?
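To illustrate the naming scheme for rotated files described above, here is a minimal Python sketch that recognises names of the form bridgedb-metrics.log-YYYYMMDD.gz and extracts the rotation date. The function names `rotation_date` and `newest_rotated` are made up for this example and are not part of BridgeDB or CollecTor; the default of 30 files only mirrors the logrotate retention mentioned above.

```python
import re
from datetime import datetime

# Rotated files are named bridgedb-metrics.log-YYYYMMDD.gz (see above).
ROTATED_NAME = re.compile(r"^bridgedb-metrics\.log-(\d{8})\.gz$")


def rotation_date(filename):
    """Return the date encoded in a rotated file name, or None if the
    name does not follow the scheme or the date is invalid."""
    match = ROTATED_NAME.match(filename)
    if match is None:
        return None
    try:
        return datetime.strptime(match.group(1), "%Y%m%d").date()
    except ValueError:
        return None


def newest_rotated(filenames, keep=30):
    """Return up to `keep` rotated file names, newest first.

    Hypothetical helper: `keep` mirrors the retention of 30 rotated
    files mentioned in the comment, nothing more.
    """
    dated = [(rotation_date(name), name) for name in filenames]
    dated = [(day, name) for day, name in dated if day is not None]
    dated.sort(reverse=True)
    return [name for _, name in dated[:keep]]
```

The unrotated bridgedb-metrics.log deliberately does not match the pattern, so a consumer would have to handle the current day's file separately.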

comment:21 Changed 6 weeks ago by cohosh

Status: needs_review → merge_ready

These changes look good to me.

Since all logs (including previously rotated ones) are synced each time with rsync, is there a way to detect if old logs have been corrupted and are overwriting the previously synced logs? Not sure how we want to handle a case where logs that have previously been synced have changed for some reason, or what the easiest way to deal with this is.

comment:22 in reply to:  21 ; Changed 5 weeks ago by karsten

Replying to cohosh:

> These changes look good to me.

Okay, please let me know when I need to do something on colchicifolium's side.

> Since all logs (including previously rotated ones) are synced each time with rsync, is there a way to detect if old logs have been corrupted and are overwriting the previously synced logs? Not sure how we want to handle a case where logs that have previously been synced have changed for some reason, or what the easiest way to deal with this is.

Thanks for thinking about such problems beforehand. In this case I think it's fine to just rsync what's on the BridgeDB host to colchicifolium. We can still decide on colchicifolium to not overwrite previously imported statistics, which I think is what we do with all other files. I'd say let's give it a try, and we can change this later if this turns out to be an issue.
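As a rough illustration of the "do not overwrite previously imported statistics" idea, the following Python sketch copies only files that are not yet in an archive directory and reports files whose content differs from the archived copy. It is a sketch under stated assumptions, not how CollecTor actually handles this: the function names and the two-directory layout are hypothetical, and the daily-rewritten bridgedb-metrics.log would be expected to show up as changed.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def import_new_files(incoming_dir, archive_dir):
    """Copy files that are not yet archived; never overwrite.

    Returns (imported, changed): names newly copied, and names whose
    incoming content differs from the already archived copy.
    Both directory arguments are hypothetical for this example.
    """
    incoming_dir, archive_dir = Path(incoming_dir), Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)
    imported, changed = [], []
    for source in sorted(incoming_dir.iterdir()):
        if not source.is_file():
            continue
        target = archive_dir / source.name
        if not target.exists():
            shutil.copy2(source, target)
            imported.append(source.name)
        elif sha256_of(source) != sha256_of(target):
            # Keep the archived copy and only report the mismatch.
            changed.append(source.name)
    return imported, changed
```

With this approach a rotated log that is corrupted later on the sending side would be flagged in `changed` rather than silently replacing the earlier import.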

comment:23 in reply to:  22 ; Changed 5 weeks ago by phw

Replying to karsten:

Okay, please let me know when I need to do something on colchicifolium's side.


The rsync is done from our side. We are already trying to sync data to /srv/collector.torproject.org/collector/in/bridgedb-stats/.

comment:24 in reply to:  21 Changed 5 weeks ago by phw

Owner: changed from phw to karsten
Status: merge_ready → assigned

Replying to cohosh:

> These changes look good to me.

Merged to master.

> Since all logs (including previously rotated ones) are synced each time with rsync, is there a way to detect if old logs have been corrupted and are overwriting the previously synced logs? Not sure how we want to handle a case where logs that have previously been synced have changed for some reason, or what the easiest way to deal with this is.


I left the patch as it is according to Karsten's suggestion. I also reassigned the ticket to Karsten because BridgeDB's side is looking good now.

comment:25 in reply to:  23 Changed 5 weeks ago by karsten

Replying to phw:

Replying to karsten:

Okay, please let me know when I need to do something on colchicifolium's side.


The rsync is done from our side. We are already trying to sync data to /srv/collector.torproject.org/collector/in/bridgedb-stats/.

Hmm, I changed something on colchicifolium's side to accept the rsync, but it doesn't seem to work. Is there anything in the log on your side?

comment:26 Changed 5 weeks ago by phw

Here's our latest rsync attempt:

1568738247:metrics:<36>Sep 17 16:37:27 collector-ssh-wrap[29644]: The SSH_ORIGINAL_COMMAND ('rsync --server -logDtpre.iLsfxC . /srv/collector.torproject.org/collector/in/bridgedb-stats/') is not on the whitelist
1568738247:metrics:rsync: connection unexpectedly closed (0 bytes received so far) [sender]
1568738247:metrics:rsync error: error in rsync protocol data stream (code 12) at io.c(235) [sender=3.1.2]

It looks like the same issue as the one we recently solved in #31515.
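For readers unfamiliar with this kind of failure: the log suggests that the receiving side compares the command in SSH_ORIGINAL_COMMAND (an environment variable that OpenSSH sets for forced commands) against a list of permitted commands. The Python sketch below shows the general idea with an exact-match check. It is not the actual collector-ssh-wrap script, whose implementation is not shown in this ticket; the names `ALLOWED_COMMANDS`, `is_allowed`, and `check_environment` are hypothetical.

```python
import os

# Hypothetical allow-list with one entry, taken from the log above.
ALLOWED_COMMANDS = {
    "rsync --server -logDtpre.iLsfxC . "
    "/srv/collector.torproject.org/collector/in/bridgedb-stats/",
}


def is_allowed(command, allowed=ALLOWED_COMMANDS):
    """Exact string match: any difference in the path rejects the command."""
    return command in allowed


def check_environment(environ=os.environ):
    """Return (exit_code, message) for the command requested over SSH."""
    command = environ.get("SSH_ORIGINAL_COMMAND", "")
    if is_allowed(command):
        return 0, command
    return 1, "The SSH_ORIGINAL_COMMAND (%r) is not on the whitelist" % command
```

With an exact-match check like this, a single misspelled character in a listed directory name is enough for the sender's rsync to be rejected.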

comment:27 Changed 5 weeks ago by karsten

Gah, I misspelled the directory name in one place. Fixed, files are coming in as they should. Thanks!

I probably asked this before and forgot the answer: where would I find the spec for parsing the format? Thanks in advance!

comment:28 in reply to:  27 ; Changed 5 weeks ago by phw

Replying to karsten:

I probably asked this before and forgot the answer: where would I find the spec for parsing the format? Thanks in advance!


The spec is still on my todo list. Here's a quick-and-dirty summary, so you don't need to block on me:

  • A metrics file starts with bridgedb-stats-end YYYY-MM-DD HH:MM:SS (SECS s)
  • The second line determines the version of the format. It currently is bridgedb-stats-version 1.0
  • From here on, we have multiple bridgedb-metric-count lines. They are structured as follows:
    • bridgedb-metric-count DIST.PROTO.CC.[success|fail].none NUM
    • DIST is BridgeDB's distribution mechanism, which currently is http, email, or moat.
    • PROTO is the obfuscation protocol, which currently is obfs2, obfs3, obfs4, scramblesuit, or fte.
    • CC is our two-letter country code.
    • The second-to-last field is either success or fail, depending on whether the BridgeDB request succeeded.
    • The last field is currently none but will eventually be an anomaly score, perhaps normalised to [0, 1]. I suggest ignoring it for now.
    • NUM is the approximate number of requests, rounded to the next multiple of 10.

I'll follow up with the spec once it's done. I hope this is good enough to make progress in the meanwhile.
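The quick-and-dirty summary above could be parsed along the following lines. This is only a Python sketch based on that summary, not the metrics-lib implementation and not the final spec: the names `MetricCount` and `parse_metrics` are invented here. The third key field is kept as an opaque string and the key is split from both ends, on the assumption that this field might itself contain dots.

```python
import re
from datetime import datetime
from typing import NamedTuple


class MetricCount(NamedTuple):
    distributor: str  # DIST: http, email, or moat
    protocol: str     # PROTO: e.g. obfs4
    origin: str       # CC field, kept as an opaque string
    outcome: str      # success or fail
    anomaly: str      # currently "none"; ignored for now
    count: int        # NUM: approximate number of requests


END_LINE = re.compile(
    r"^bridgedb-stats-end (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \((\d+) s\)$")


def parse_metrics(text):
    """Return (stats_end, interval_seconds, version, counts)."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) < 2:
        raise ValueError("expected stats-end and version lines")
    end = END_LINE.match(lines[0])
    if end is None:
        raise ValueError("malformed bridgedb-stats-end line")
    stats_end = datetime.strptime(end.group(1), "%Y-%m-%d %H:%M:%S")
    interval = int(end.group(2))
    keyword, _, version = lines[1].partition(" ")
    if keyword != "bridgedb-stats-version":
        raise ValueError("malformed bridgedb-stats-version line")
    counts = []
    for line in lines[2:]:
        parts = line.split(" ")
        if len(parts) != 3 or parts[0] != "bridgedb-metric-count":
            raise ValueError("malformed metric line: %r" % line)
        # Split two fields off each end so the middle field may contain dots.
        dist, proto, rest = parts[1].split(".", 2)
        origin, outcome, anomaly = rest.rsplit(".", 2)
        if outcome not in ("success", "fail"):
            raise ValueError("unexpected outcome in line: %r" % line)
        counts.append(
            MetricCount(dist, proto, origin, outcome, anomaly, int(parts[2])))
    return stats_end, interval, version, counts
```

A caller would pass the decompressed contents of one metrics file as `text`; rejecting unknown versions or keywords more strictly is left open here because the spec was still pending at this point.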

comment:29 in reply to:  28 ; Changed 5 weeks ago by karsten

Reviewer: cohosh → irl
Status: assigned → needs_review

Replying to phw:

I'll follow up with the spec once it's done. I hope this is good enough to make progress in the meanwhile.

Yes, that's a fine start. Thanks!

I wrote some code. irl, please review metrics-lib commit 820c246 and CollecTor commit 3866164.

comment:30 in reply to:  28 Changed 5 weeks ago by phw

Replying to phw:

I'll follow up with the spec once it's done. I hope this is good enough to make progress in the meanwhile.


For what it's worth, we now have a draft of the spec over at #31780.

Also, the spec isn't carved in stone, so please let me know if you would like to add or improve anything and I'll get right to it.

comment:31 in reply to:  29 Changed 5 weeks ago by phw

Replying to karsten:

Replying to phw:

I'll follow up with the spec once it's done. I hope this is good enough to make progress in the meanwhile.

Yes, that's a fine start. Thanks!


I forgot to mention that CC is actually CC/EMAIL. Our draft spec explains the details. Sorry about that.

comment:32 Changed 4 weeks ago by gaba

Parent ID: #31274

comment:33 Changed 4 weeks ago by gaba

Keywords: s30-a1 added

comment:34 Changed 4 weeks ago by gaba

Keywords: s30-o21a1 added; s30-a1 removed

comment:35 Changed 3 weeks ago by karsten

Status: needs_review → needs_revision

Moving to needs_revision because of pending spec changes in #31780.

comment:36 Changed 3 weeks ago by karsten

Status: needs_revision → needs_review

irl, please review metrics-lib commit 2e6d689 and CollecTor commit 2e6d689 with updates to the BridgeDB spec changes.

comment:37 Changed 4 days ago by karsten

Status: needs_review → merge_ready

Setting to merge_ready as per irl's statement during yesterday's meeting: "15:41:20 <irl> i think it's ok to merge it and i can review the test cases retroactively"
