Opened 2 years ago

Last modified 5 days ago

#21315 new enhancement

publish some realtime stats from the broker?

Reported by: arma Owned by:
Priority: Medium Milestone:
Component: Obfuscation/Snowflake Version:
Severity: Normal Keywords:
Cc: metrics-team, cohosh, phw Actual Points:
Parent ID: #29461 Points:
Reviewer: Sponsor: Sponsor19

Description

How many snowflakes are registered right now and happy to serve censored users?

Right now there's a big difference between 0 and 1, and it's not easy to figure out which it is.

Knowing this number would help me as a snowflake volunteer decide whether I am needed, and whether to do advocacy at this moment to get other people to be snowflakes.

Knowing this number would help the censored users too, because it would give them a sense of the health of the snowflake population, and also it can help them debug their "it's not working, I wonder if I can narrow down some possible problems" situations.

Child Tickets

Change History (10)

comment:1 Changed 2 years ago by arma

I would also be interested in stats about users and usage (including e.g. number of users being handled divided by number of snowflakes handling them), but I recognize anything involving the users is a more complicated topic, and we shouldn't do things that could put users at risk without sorting through what we ought to protect and how we can make sure it's being protected.

So, step one, tell me more about the snowflakes please. :)

One other concrete thing that I want: how many times are you giving snowflakes out? How many times did you stop giving a snowflake out because you've given it out so many times already? These questions tie into the address distribution algorithm question: it's not clear how to pick the right parameters in a vacuum, but we're not *in* a vacuum, so maybe we can gain some intuition by seeing how things play out in practice.

comment:3 Changed 13 months ago by dcf

There's an undocumented /debug URL path that shows the currently connected snowflakes.

https://snowflake-reg.appspot.com/debug (App Engine broker that we plan to move away from)
https://snowflake-broker.bamsoftware.com/debug (standalone broker from #22874)

I'm not sure it's a good idea to publish this information in this form, but for what it's worth, that's how it works now.

There should be at least 3 snowflakes on each broker at all times, because we're specifically running some fallback proxy-go instances. Obviously these are no good from a circumvention point of view, because they're on a static IP address--they're mainly there so that curious people who try the snowflake option in the alpha browser aren't immediately discouraged.

comment:4 Changed 2 months ago by irl

Cc: metrics-team added
Parent ID: #29461

Metrics Team expects to produce the corresponding CollecTor module for this within the next 6-month roadmap.

comment:5 Changed 4 weeks ago by cohosh

Cc: cohosh added

Here's a summary of the current state of things:

Eventual Goals

It sounds like we have a few things we want to achieve/learn from collected metrics:

  • Detect censorship events
  • Allow current or potential proxies to see if they are needed
  • Allow clients to see whether their connection issues are due to censorship or proxy availability
  • Help us figure out whether we should be doing something different in distributing proxies to clients

What We Have

We currently collect and "publish" information on:

  • how many snowflakes are currently available, along with their SIDs (available at the broker's /debug handler). This is good for more detailed monitoring of censorship events: we already collect bridge usage metrics, but broker usage metrics will help narrow down where the censorship is happening.
  • country stats of domain-fronted client connections (logged, most recent snapshot at broker /debug)
  • the roundtrip time it takes for a client to connect to get a snowflake proxy answer (available at broker /debug)
  • the usual snowflake bridge statistics (at metrics.torproject.org)

What We Want

Some of the metrics mentioned above will be easier to implement than others. The best place to collect statistics is at the broker, but some of the data mentioned would require proxies to report metrics to the broker for collection. We have to be a bit careful with this since anyone can run a proxy. It will also impact the decisions we make for #29207.

I would also be interested in stats about users and usage (including e.g. number of users being handled divided by number of snowflakes handling them)

This is a bit tricky. The broker knows which proxies it hands out to users, but it doesn't know the state of the clients' connections to those proxies (e.g., when they have been closed). It's also worth noting that different "types" of proxies (standalone vs. browser-based) can handle a different number of users at once. Perhaps a more useful metric would be for snowflake proxies to advertise to the broker how many available "slots/tokens" they have when they poll for clients. This could be added to the broker-proxy WebSocket protocol. It would also avoid collecting more data on clients, which is generally safer.

how many times are you giving snowflakes out? How many times did you stop giving a snowflake out because you've given it out so many times already? These questions tie into the address distribution algorithm question

The above comment addresses this as well. The broker doesn't really decide whether or not it has given a snowflake out too many times. I think what's more important for deciding whether we are giving out proxies in a good way is to try to measure how "reliable" individual proxies have been in the past. This is related to setting up persistent identifiers (#29260).

It might also be interesting to have some kind of proxy diversity metric (e.g., whether 90% of all connections are handled by the same proxy). We can get some idea with persistent identifiers (#29260), but of course using a persistent identifier will always be optional. We can also do collection of geoip country stats of proxies.
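A simple version of such a diversity metric is the share of all connections handled by the single busiest proxy (the "90% handled by the same proxy" case above). This is only a sketch; the per-proxy connection counts, keyed by a hypothetical persistent identifier, are made up for illustration:

```go
package main

import "fmt"

// topProxyShare returns the fraction of all connections handled by the
// single busiest proxy, given per-proxy connection counts. A value near
// 1.0 means one proxy dominates; a value near 1/n means load is spread
// evenly across n proxies.
func topProxyShare(connections map[string]int) float64 {
	total, max := 0, 0
	for _, c := range connections {
		total += c
		if c > max {
			max = c
		}
	}
	if total == 0 {
		return 0
	}
	return float64(max) / float64(total)
}

func main() {
	counts := map[string]int{"proxyA": 90, "proxyB": 6, "proxyC": 4}
	fmt.Printf("%.2f\n", topProxyShare(counts)) // 0.90: proxyA handles 90% of connections
}
```

Since persistent identifiers are optional, proxies without one would have to be bucketed some other way (or excluded), which would bias any such metric.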

Next steps

  • Narrow down what we want
  • Address prerequisite tickets (#29207, #29260)
  • Log all of the statistics in a reasonable format
  • Coordinate with the metrics team to get these metrics collected and visualized somewhere

comment:6 Changed 3 weeks ago by karsten

I noticed that metrics-team is cc'ed, but from reading the comments it seems like you're still at design discussions like the slots/tokens idea. When is a good time for the metrics team to get involved? Should we wait until you have a better idea what you want? Or should we help with the bikeshedding right now? :)

(FWIW, I didn't understand arma's "big difference between 0 and 1" comment in the summary, and I'm not 100% certain whether SID stands for Snowflake IDentifier or Somethingelse I Don'tknow.)

comment:7 Changed 13 days ago by gaba

Sponsor: Sponsor19

comment:8 in reply to:  6 Changed 12 days ago by cohosh

Replying to karsten:

I noticed that metrics-team is cc'ed, but from reading the comments it seems like you're still at design discussions like the slots/tokens idea. When is a good time for the metrics team to get involved? Should we wait until you have a better idea what you want? Or should we help with the bikeshedding right now? :)

I think we've still got a bit of work to do before we will know enough about where we want to go to include the metrics team. We have a few other tickets we need to cover first before we will even have the data we need at the broker for some of these metrics.

comment:9 in reply to:  5 Changed 12 days ago by irl

Replying to cohosh:

It sounds like we have a few things we want to achieve/learn from collected metrics:

  • Detect censorship events
  • Allow current or potential proxies to see if they are needed
  • Allow clients to see whether their connection issues are due to censorship or proxy availability
  • Help us figure out whether we should be doing something different in distributing proxies to clients

These all seem like good goals.

We currently collect and "publish" information on:

  • how many snowflakes are currently available, along with their SIDs (available at the broker's /debug handler). This is good for more detailed monitoring of censorship events: we already collect bridge usage metrics, but broker usage metrics will help narrow down where the censorship is happening.
  • country stats of domain-fronted client connections (logged, most recent snapshot at broker /debug)
  • the roundtrip time it takes for a client to connect to get a snowflake proxy answer (available at broker /debug)

Should we be already archiving this data?

Some of the metrics mentioned above will be easier to implement than others. The best place to collect statistics is at the broker, but some of the data mentioned would require proxies to report metrics to the broker for collection. We have to be a bit careful with this since anyone can run a proxy. It will also impact the decisions we make for #29207.

We collect a lot of statistics at relays and bridges, which anyone can run. We are working on methods of improving robustness against these statistics being manipulated, but so far have not detected anyone reporting values that are not normal. It is good to have criteria for determining, based on stats others report, what you would be expecting so that anomalies can be detected. For example, we would expect relay bandwidth usage among relays to be proportional to consensus weight.
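The "proportional to consensus weight" criterion above can be sketched as a simple deviation check. The threshold and the example shares here are invented purely for illustration, not taken from any real detector:

```go
package main

import (
	"fmt"
	"math"
)

// anomalous flags a relay whose share of total observed bandwidth deviates
// from its share of total consensus weight by more than tol. Under the
// expectation that usage is roughly proportional to consensus weight, a
// large gap between the two shares is a candidate anomaly.
func anomalous(bwShare, weightShare, tol float64) bool {
	return math.Abs(bwShare-weightShare) > tol
}

func main() {
	fmt.Println(anomalous(0.05, 0.01, 0.02))  // true: reports 5x its expected share
	fmt.Println(anomalous(0.011, 0.01, 0.02)) // false: within tolerance
}
```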

I would also be interested in stats about users and usage (including e.g. number of users being handled divided by number of snowflakes handling them)

This is a bit tricky. The broker knows which proxies it hands out to users, but it doesn't know the state of the clients' connections to those proxies (e.g., when they have been closed). It's also worth noting that different "types" of proxies (standalone vs. browser-based) can handle a different number of users at once. Perhaps a more useful metric would be for snowflake proxies to advertise to the broker how many available "slots/tokens" they have when they poll for clients. This could be added to the broker-proxy WebSocket protocol. It would also avoid collecting more data on clients, which is generally safer.

This sounds like a reasonable approach. You might want to take a look at:

This will give you an idea of how we do this for other parts of Tor.

how many times are you giving snowflakes out? How many times did you stop giving a snowflake out because you've given it out so many times already? These questions tie into the address distribution algorithm question

Can this also be an indirect measurement of number of users?

The above comment addresses this as well. The broker doesn't really decide whether or not it has given a snowflake out too many times. I think what's more important for deciding whether we are giving out proxies in a good way is to try to measure how "reliable" individual proxies have been in the past. This is related to setting up persistent identifiers (#29260).

For relays, directory authorities track the mean time between failures, and we track this in Tor Metrics too.

It might also be interesting to have some kind of proxy diversity metric (e.g., whether 90% of all connections are handled by the same proxy). We can get some idea with persistent identifiers (#29260), but of course using a persistent identifier will always be optional. We can also do collection of geoip country stats of proxies.

We don't really have this metric for relays yet, so if you have ideas that would be applicable to relays too then that would be great. We know about country/AS distribution, but we haven't quantified the diversity using any particular formula.

  • Log all of the statistics in a reasonable format

This would ideally be a format that Tor Metrics is already handling. If it could be based on the Tor directory protocol meta-format (§1.2 dir-spec) then that would be great. We don't want to bring in dependencies for parsing yaml/toml/etc. if we can help it.
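For a sense of what that meta-format looks like, a broker stats document could use simple keyword-value lines, one item per line, like the extra-info descriptors Tor Metrics already parses. The keywords and values below are invented for illustration, not part of any existing spec:

```
snowflake-stats-end 2019-05-01 00:00:00 (86400 s)
snowflake-ips ??=24,de=12,us=8
snowflake-idle-count 16
client-denied-count 0
```

Each line starts with a keyword, so an existing dir-spec parser can skip unknown lines without any yaml/toml dependency.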

  • coordinate with the metrics team to get these metrics collected and visualized somewhere

Please also coordinate on what you want to collect, so we can consider if that information already comes from somewhere, if we already had a plan for it, and if it is safe or not.

comment:10 Changed 5 days ago by phw

Cc: phw added