Opened 5 months ago

Last modified 4 weeks ago

#32938 needs_revision enhancement

Have a way to test throughput of snowflake proxy

Reported by: cohosh
Owned by: cohosh
Priority: Medium
Milestone:
Component: Circumvention/Snowflake
Version:
Severity: Normal
Keywords: snowflake-webextension, ux-team, anti-censorship-roadmap-2020
Cc: arlolra, cohosh, phw, dcf
Actual Points: 3
Parent ID: #31109
Points: 5
Reviewer:
Sponsor:

Description

A common question from snowflake proxy volunteers is whether their proxy is working (see comment 11 on #31109). It would be great to have some kind of bandwidth test that lets proxy owners see whether their proxy is reachable from a remote probe point. This might also help us find and diagnose problems with existing proxies.

Some notes:

  • we can't ask the broker to assign us a specific proxy at the moment, so this test would likely be separate from the broker (unless we add an entirely new feature, which I'm hesitant to do)
  • we'll have to protect this service from abuse somehow, probably by rate-limiting (see the sketch after this list, and some discussion on #31874). It would be best to engineer a way so that only a proxy owner can run the test on their proxy.
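
For illustration only, a per-IP limiter at the probe endpoint might look like the following Go sketch. The handler names and the one-test-per-hour budget are hypothetical; golang.org/x/time/rate is the real rate-limiting package:

{{{
#!go
package probe

import (
	"net"
	"net/http"
	"sync"
	"time"

	"golang.org/x/time/rate"
)

// limiters maps a requester's IP to a token bucket allowing one test per
// hour; the budget is a made-up number for illustration.
var (
	mu       sync.Mutex
	limiters = map[string]*rate.Limiter{}
)

func limiterFor(ip string) *rate.Limiter {
	mu.Lock()
	defer mu.Unlock()
	l, ok := limiters[ip]
	if !ok {
		l = rate.NewLimiter(rate.Every(time.Hour), 1)
		limiters[ip] = l
	}
	return l
}

// rateLimited wraps a handler and rejects callers that exceed their budget.
func rateLimited(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		ip, _, err := net.SplitHostPort(r.RemoteAddr)
		if err != nil || !limiterFor(ip).Allow() {
			http.Error(w, "try again later", http.StatusTooManyRequests)
			return
		}
		next.ServeHTTP(w, r)
	})
}
}}}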

Child Tickets

Change History (13)

comment:1 Changed 5 months ago by cohosh

The first question to answer is where to do the bandwidth test from. Here are some options:

  • Have the broker do it. This would require adding WebRTC libraries to the broker, which would significantly increase the size of the broker binary.
  • Have a different endpoint do it and either hardcode this endpoint at the proxies or have the broker facilitate communicating with the probe point. This way, the endpoint could be behind a NAT.
  • Measure throughput of real traffic that passes through the proxy. This has a potential impact on client privacy and we should be careful of how we do this.

comment:2 in reply to:  1 Changed 5 months ago by cohosh

Replying to cohosh:

  • Measure throughput of real traffic that passes through the proxy. This has a potential impact on client privacy and we should be careful of how we do this.

Another thing to consider here is that measuring throughput from real traffic won't give a good idea of the maximum throughput, just the amount of data any particular client is pushing through it.

comment:3 Changed 5 months ago by cohosh

I had a thought about how this might also be used for better overall network health (particularly as a potential solution for #25681). I could envision a browser proxy performing the following set of steps at startup:

  • Do the various checks we already have for WebRTC permissions and a probe of the Snowflake bridge
  • Request a throughput check from a probe point we run (perhaps we can use the same machine that hosts bridgestrap from #31874, which is conveniently already written in Go)
  • The probe point will craft an offer and exchange SDP information with the proxy and then proceed with a download/upload test. The probe point will gather throughput statistics from this test, sign them, and hand them to the proxy.
  • When the proxy polls the broker, it sends the signed throughput statistics in the poll (a sketch of such a signed result follows this list). The broker can then either prioritize proxies based on their throughput or use an implementation of #25598 to inform the proxy how often to poll based on its throughput and the current need
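
A minimal sketch of what a signed result could look like, assuming the probe point holds an Ed25519 key whose public half the broker knows. All type and field names here are hypothetical, not an actual design:

{{{
#!go
package probe

import (
	"crypto/ed25519"
	"encoding/json"
	"time"
)

// ThroughputResult is a hypothetical record the probe point hands back to
// the proxy, which forwards it to the broker in its next poll.
type ThroughputResult struct {
	ProxyID   string        `json:"proxy_id"`
	UpKBps    float64       `json:"up_kbps"`
	DownKBps  float64       `json:"down_kbps"`
	Latency   time.Duration `json:"latency"`
	Timestamp time.Time     `json:"timestamp"` // lets the broker reject stale results
}

// Sign serializes and signs the result so the broker, which knows the probe
// point's public key, can verify the statistics are genuine.
func Sign(priv ed25519.PrivateKey, r ThroughputResult) (blob, sig []byte, err error) {
	blob, err = json.Marshal(r)
	if err != nil {
		return nil, nil, err
	}
	return blob, ed25519.Sign(priv, blob), nil
}
}}}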

comment:4 Changed 5 months ago by cohosh

Points: 3

Setting the points on this to 3 days

comment:5 Changed 5 months ago by cohosh

Actual Points: 2
Points: 3 → 5

This is a pretty big task, updating points.

comment:6 Changed 4 months ago by cohosh

Actual Points: 2 → 3

This is a start at implementing this feature. I chose to extend the bridgestrap API to take requests for Snowflake tests: https://dip.torproject.org/cohosh/bridgestrap/tree/ticket/32938

And made some corresponding changes to proxy-go that will perform the throughput test on startup: https://github.com/cohosh/snowflake/tree/ticket/32938

It still needs a lot of work. What I have left to do is:

  • calculate the round trip latency (between sending a message and receiving the echo)
  • implement this feature for webextension proxies (including a UI)
  • clean up the code and commits a bit
  • perform the test every so often (perhaps every 24 hours? see the sketch after this list)
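
A minimal sketch of the periodic test loop, with runThroughputTest as a hypothetical stand-in for the real SDP exchange and echo test:

{{{
#!go
package main

import (
	"log"
	"time"
)

// runThroughputTest stands in for the real SDP exchange and upload/download
// test against the probe point.
func runThroughputTest(probeURL string) error {
	// ... exchange SDP with the probe point, run the test, log results ...
	return nil
}

// testPeriodically runs the test once at startup and then once per interval.
func testPeriodically(probeURL string, interval time.Duration) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		if err := runThroughputTest(probeURL); err != nil {
			log.Printf("throughput test failed: %v", err)
		}
		<-ticker.C
	}
}

func main() {
	go testPeriodically("http://localhost:8888", 24*time.Hour)
	select {} // block forever; the real proxy has its own main loop
}
}}}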

This is out of scope for this ticket, but I like the idea in comment:3 of using it to tell the proxy how often to poll.

comment:7 Changed 4 months ago by gaba

Keywords: anti-censorship-roadmap-2020Q1 added; anti-censorship-roadmap-october removed

comment:8 Changed 3 months ago by cohosh

Current status: https://github.com/cohosh/snowflake/compare/ticket/32938

Okay I made some more progress on this:

  • proxy-go now performs the test every 24 hours
  • did a refactor of the proxy-go code

What I want to do before requesting a review is:

  • do another refactor pass
  • clean up the commit history
  • implement an average latency calculation

After that I can start working on the webextension changes and UI.

comment:9 Changed 3 months ago by cohosh

Status: assigned → needs_review

Okay this is ready for a partial review. This only implements the throughput test for proxy-go and right now doesn't do anything with the results except log them.

Here's the changes to proxy-go: https://github.com/cohosh/snowflake/pull/19

And the changes to bridgestrap, which I'm using for the probe point: https://dip.torproject.org/cohosh/bridgestrap/merge_requests/1

To test locally:

  • Start the probe point: ./bridgestrap -addr :8888
  • Start a proxy: ./proxy-go -probe http://localhost:8888

Notes

  • The reported latencies are quite high; I'm not sure whether this is a result of the WebRTC library or whether the latency calculation is slowed down by locks and the select call
  • We have the same concerns as #31874 with rate limiting and preventing abuse

comment:10 Changed 3 months ago by cohosh

I've started tracking my changes to the browser-based proxies in this branch: https://github.com/cohosh/snowflake/tree/ticket/32938-proxy

The changes are a bit hacky at the moment; I'll probably do a refactor to make them cleaner.

comment:11 Changed 3 months ago by cohosh

I've been reflecting on this ticket a bit as I continue implementing it. Right now this seems to be taking the form of a usability improvement that may eventually lead to performance improvements. I'm wondering how useful it will be in the long run. Some insights:

  • This feature does nothing to prevent an adversary from swamping the broker with malicious proxies. Having the broker tell a proxy how often to poll is an entirely trust-based mechanic to improve network health. Similarly, this test makes it easy for a proxy to distinguish between the throughput test and actual clients. They can perform well for the test and then poorly for client traffic if they want to cause trouble. Maybe this is okay, and this is just a tool for users to see if their proxy works (similar to how bridgestrap was designed to be used for #31874).
  • If we want to make this throughput test more rigorous against adversarial proxies, we're going to have to do things like introduce persistent identifiers for snowflake proxies (#29260), and make some modifications to the broker to track proxy performance and periodically scan proxies for malicious behaviour. Doing so makes snowflake proxies more and more similar to Tor relays and I'm not sure this fits the model we have in mind for proxies to be simple, lightweight, and ephemeral.
  • Ultimately, this may be more trouble than it's worth if it means we have to maintain a separate deployed probe point. Rolling this functionality into the broker would be better for this reason since we need a broker deployment already for the whole system to work.

comment:12 Changed 3 months ago by dcf

Status: needs_review → needs_revision

The refactoring in proxy-go, allowing proxy-go to treat the broker and bridgestrap mostly equivalently, looks reasonable. I don't see this as something meant to be secure against adversarial proxies, only psychological reassurance for honest proxy operators. I really don't think this function should be rolled into the broker; actually I think the broker should be more compartmentalized overall. Even if it's running on the same host, I feel it should be a separate process.

I'm having trouble understanding the control flow between Snowflake and ThroughputLogger in bridgestrap. Snowflake.runTest calls Snowflake.MarkWritten to store a timestamp in ThroughputLogger, then does Snowflake.Write which results in calls to Snowflake.dc.OnMessage, which calls ThroughputLogger.AddBytes, which writes into a channel read by Snowflake.Log, which then looks at the previously stored timestamp. I wonder if there's more of a straight-line way to write it.
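
One possible straight-line shape, assuming OnMessage only needs to count bytes. Names here are hypothetical, not the actual bridgestrap code:

{{{
#!go
package probe

import (
	"sync/atomic"
	"time"
)

// byteCounter is a minimal alternative to the channel-based ThroughputLogger:
// OnMessage just bumps an atomic counter, and all timing stays in runTest.
type byteCounter struct{ n int64 }

func (c *byteCounter) Add(n int)    { atomic.AddInt64(&c.n, int64(n)) }
func (c *byteCounter) Total() int64 { return atomic.LoadInt64(&c.n) }

// throughput computes bytes/second since start in one place, with no
// timestamps passed through channels or shared with the OnMessage callback.
func throughput(c *byteCounter, start time.Time) float64 {
	return float64(c.Total()) / time.Since(start).Seconds()
}
}}}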

The high latency you mentioned in comment:9 seems to be a bug. Even in my localhost test, I get a latency of around 5 seconds. In the time.Since(start) computation, time.Now() is increasing faster than start is. In my local test, the difference increased smoothly and monotonically from 0.04 to 9.66 seconds over 940 iterations in runTest. Maybe there is some kind of buffering happening where packets are "sent" much faster than they are received; perhaps I can "send" 10,000 iterations almost instantly while the data is actually buffered by the OS rather than transmitted immediately.

https://gitlab.torproject.org/cohosh/bridgestrap/blob/c76fb1c24eacdeefddab699aa7ac2bf111c5e63f/snowflake.go#L153-160
AverageLatency will panic with a division by zero if for whatever reason count is 0 (if there were no messages received). Why round the average latency to 1 second?
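
A guarded version might look like this sketch (the real function's signature may differ):

{{{
#!go
package probe

import "time"

// AverageLatency returns the mean latency, or false if no messages came
// back, instead of panicking on a division by zero.
func AverageLatency(total time.Duration, count int) (time.Duration, bool) {
	if count == 0 {
		return 0, false
	}
	return total / time.Duration(count), true
}
}}}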

https://gitlab.torproject.org/cohosh/bridgestrap/blob/c76fb1c24eacdeefddab699aa7ac2bf111c5e63f/snowflake.go#L59
The OnMessage callback assumes it has at least 4 bytes to work with and will panic if it does not. The design relies on the blocks sent by runTest retaining message boundaries when they come back into OnMessage, which isn't guaranteed. It's worth thinking about what a malicious proxy could do by falsifying the count value at the beginning of each buffer. It may be better to do something simpler like: send 10 KB, receive 10 KB, and only do another iteration once you've received the same number of bytes that you sent.
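
The simpler lockstep scheme could look something like this sketch, with io.ReadWriter standing in for the data channel and all names hypothetical:

{{{
#!go
package probe

import (
	"bytes"
	"crypto/rand"
	"errors"
	"io"
)

// echoRound sends one chunk and reads until the same number of bytes has
// come back, making no assumption about message boundaries and trusting no
// count field the peer could falsify.
func echoRound(rw io.ReadWriter, chunkSize int) error {
	out := make([]byte, chunkSize)
	if _, err := rand.Read(out); err != nil {
		return err
	}
	if _, err := rw.Write(out); err != nil {
		return err
	}
	in := make([]byte, chunkSize)
	if _, err := io.ReadFull(rw, in); err != nil {
		return err
	}
	// Random contents also catch a proxy that echoes garbage back.
	if !bytes.Equal(in, out) {
		return errors.New("echoed data does not match what was sent")
	}
	return nil
}
}}}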

I think there's a memory leak in idToSnowflake if the test never completes. A proxy could hit the /api/snowflake-poll route to add entries to the map and never hit /api/snowflake-test to remove them.
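
One common fix is to timestamp entries and reap stale ones; a sketch with hypothetical names:

{{{
#!go
package probe

import (
	"sync"
	"time"
)

type entry struct {
	snowflake interface{} // stand-in for the real Snowflake type
	created   time.Time
}

type snowflakeMap struct {
	mu sync.Mutex
	m  map[string]entry
}

// reap drops entries older than ttl, so polls that never complete the test
// cannot grow the map without bound. Run it from a goroutine on a timer.
func (s *snowflakeMap) reap(ttl time.Duration) {
	s.mu.Lock()
	defer s.mu.Unlock()
	for id, e := range s.m {
		if time.Since(e.created) > ttl {
			delete(s.m, id)
		}
	}
}
}}}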

https://gitlab.torproject.org/cohosh/bridgestrap/blob/c76fb1c24eacdeefddab699aa7ac2bf111c5e63f/snowflake.go#L192
Rand.Read is documented never to return an error, so I would prefer a panic rather than an error return here.
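
That is, something like this sketch:

{{{
#!go
package probe

import "math/rand"

// newTestID draws a random identifier. math/rand's Read is documented never
// to return an error, so an error here is unrecoverable: panic.
func newTestID() []byte {
	id := make([]byte, 16)
	if _, err := rand.Read(id); err != nil {
		panic(err)
	}
	return id
}
}}}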

APISnowflakeRequest and APISnowflakeTest need a byte limit to prevent someone sending an infinite JSON object and using up all memory. A read deadline would make sense, too.
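
With Go's standard library this could be a sketch along these lines (the limit and timeout values are illustrative):

{{{
#!go
package probe

import (
	"encoding/json"
	"net/http"
	"time"
)

// maxRequestBytes caps how much of a request body we will buffer.
const maxRequestBytes = 64 * 1024

// decodeRequest parses JSON while refusing bodies over the limit;
// MaxBytesReader errors out (and closes the connection) past it.
func decodeRequest(w http.ResponseWriter, r *http.Request, v interface{}) error {
	r.Body = http.MaxBytesReader(w, r.Body, maxRequestBytes)
	return json.NewDecoder(r.Body).Decode(v)
}

// A read deadline can be applied server-wide:
var server = &http.Server{
	Addr:        ":8888",
	ReadTimeout: 10 * time.Second,
}
}}}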

An alternative design would be to reverse the direction of traffic flow. Let the proxy send data and bridgestrap reflect it. The proxy can compute its own throughput and latency locally. The bridgestrap part could then be made stateless except for the offer–answer matching.
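
Assuming a pion/webrtc-style data channel API (the import path and version are an assumption), the reflector side could be nearly stateless; a sketch:

{{{
#!go
package probe

import (
	"log"

	"github.com/pion/webrtc/v3"
)

// echoAll echoes every message straight back. The proxy computes throughput
// and latency on its own side; the probe point keeps no per-test state.
func echoAll(dc *webrtc.DataChannel) {
	dc.OnMessage(func(msg webrtc.DataChannelMessage) {
		if err := dc.Send(msg.Data); err != nil {
			log.Printf("echo failed: %v", err)
		}
	})
}
}}}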

Like you, I'm not sure of the long-term utility of the throughput test feature. Maybe we'll soon see enough organic client use to cause proxies to actually be used, but that's hard to predict. "Give me a fake client on demand" could be a useful diagnostic feature to have. Conceivably we could do the same thing probabilistically in the normal course of operation of proxies: sometimes you get a real client, sometimes you get a "canary" client whose only purpose is to allow the proxy to assess its own health.

comment:13 Changed 4 weeks ago by gaba

Keywords: anti-censorship-roadmap-2020 added; anti-censorship-roadmap-2020Q1 removed

No more Q1 for 2020.
