We're done with #9316 (moved), which means that we have code in place that allows BridgeDB to export metrics. So far, all metrics are user-centric, meaning that they focus on how BridgeDB users interact with the system. BridgeDB-centric metrics would help us debug and understand BridgeDB. The following come to mind:
Number of bridges per distribution ring.
Number of bridges per transport, similar to assignments.log (originally proposed in #14453 (moved)).
Number of requests for which we had no bridges.
The number of users a single bridge has been handed out to over time, i.e., how long it takes a bridge to reach 10 users, 100 users, 1,000 users, and so on.
Once we have the ability for BridgeDB to test if bridges are down (see #31874 (moved)), it would be nice to know how reliable our bridges are (how much uptime they have, or how many are currently working).
A way to measure if a bridge is reachable from certain locations (see #32740 (moved)).
We could also incorporate bridge assignments in our metrics, so we don't have to report them separately in the assignments.log file (see #29480 (moved)). Let's not forget to update BridgeDB's metrics specification.
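To illustrate the first two items, a simple tally would suffice. This is only a sketch: the `bridges` structure below is invented for illustration and does not reflect BridgeDB's actual internal data model.

```python
# Hypothetical sketch of counting bridges per ring and per transport.
from collections import Counter

# Invented example data; not BridgeDB's real representation.
bridges = [
    {"ring": "https", "transports": ["obfs4", "scramblesuit"]},
    {"ring": "email", "transports": ["obfs4"]},
    {"ring": "https", "transports": []},  # a vanilla bridge
]

per_ring = Counter(b["ring"] for b in bridges)
per_transport = Counter(t for b in bridges for t in b["transports"])

print(per_ring)        # Counter({'https': 2, 'email': 1})
print(per_transport)   # Counter({'obfs4': 2, 'scramblesuit': 1})
```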
I took a brief look at the new metrics captured by your patch. Here are the internal metrics it currently captures:
Number of IPv4/IPv6 requests.
You're already counting lots of requests and reporting binned numbers, so this should be fine.
Min, max, median, and stdev of the number of users that bridges were handed out to.
I don't see any privacy issues with computing and reporting these four statistics.
I'm less sure about how useful they will be. The median will likely be the most interesting statistic here; the min and max will only tell you about the smallest and largest outliers, not much about what the distribution looks like. I'm also not sure how useful the standard deviation will be.
Would it be an option to add quantiles? Your comment suggests that you'd have to require Python 3.8 in order to use the quantiles() function of the built-in statistics module. But did you consider using SciPy/NumPy to compute these? If neither of those is an option, I'd recommend against computing quantiles yourself, because there are just too many ways to get it wrong.
If you have quantiles, you might want to include the first and third quartiles as well as the smallest and largest non-outliers within 1.5 interquartile ranges of the quartiles. Those are the five values you'd also find in a boxplot (see the sketch after this list). We're computing these five values in our OnionPerf latency statistics. Here's the SQL code that we use. (I don't think we have Python code around for computing the high and low values.)
If you want to start with somewhat simpler statistics, be sure to include the first and third quartiles together with the median. You could always add the high and low values later if you need them.
The number of empty responses per distributor.
The number of bridges per (sub)hashring.
Like the first number, I don't see an issue with reporting these binned numbers.
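For reference, here is a minimal NumPy sketch of the five boxplot values discussed above. The handout counts at the bottom are made up for illustration, and the function name is just a placeholder.

```python
import numpy as np

def five_number_summary(values):
    """Return (low whisker, Q1, median, Q3, high whisker)."""
    values = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    iqr = q3 - q1
    # Whiskers: the most extreme observations still within 1.5 IQRs
    # of the quartiles; anything beyond that counts as an outlier.
    low = values[values >= q1 - 1.5 * iqr].min()
    high = values[values <= q3 + 1.5 * iqr].max()
    return low, q1, median, q3, high

handouts = [3, 7, 8, 9, 12, 14, 15, 18, 21, 95]  # hypothetical counts
print(five_number_summary(handouts))
# (3.0, 8.25, 13.0, 17.25, 21.0) -- note that 95 is excluded as an outlier
```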
In the meanwhile, I'll spend some more time thinking about the other metrics suggestions in this ticket.
Nice! In general for these metrics situations, my advice is to try to frame it in terms of questions you want to learn the answers to, rather than data sources you could track.
So for example, less "how many bridges have property X" and more "how many of the people trying to get a bridge with property X are getting a working one?"
Or said another way, if we just track data sources, then it's likely to remain unclear how to actually assess whether we're succeeding.
Backing up even farther: it would be great to pick some (quantifiable) goals and then set about figuring out what we need to measure to know how we're doing at achieving them.
Thanks for the feedback! I removed the standard deviation and added the four metrics you suggested: the first and third quartiles, and the upper and lower whiskers. Here's the patch. I used NumPy to determine the quartiles. I originally hesitated to add yet another dependency – especially a bulky one like NumPy – but we can remove it again once Python 3.8 (which has built-in support for quantiles) is available in Debian stable.
On an unrelated note: Karsten, do we need to coordinate on when we deploy this patch? Note that it bumps the bridgedb-metrics-version key to 2 and adds several new fields for our internal metrics. Does this break anything on the metrics side of things?
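For context, a version-2 metrics document might look roughly like the following. The bridgedb-metrics-end, bridgedb-metrics-version, and bridgedb-metric-count line types follow BridgeDB's existing metrics format, but the internal.* key below is a hypothetical placeholder, not an actual field name from the patch:

```
bridgedb-metrics-end 2020-01-15 00:00:00 (86400 s)
bridgedb-metrics-version 2
bridgedb-metric-count https.obfs4.us.success.none 10
bridgedb-metric-count internal.handouts.median 20
```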