Opened 12 days ago

Last modified 11 days ago

#28651 new enhancement

Prepare all pieces of the snowflake pipeline for a second snowflake bridge

Reported by: arma
Owned by:
Priority: Medium
Milestone:
Component: Obfuscation/Snowflake
Version:
Severity: Normal
Keywords:
Cc: dcf, arlolra
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor:

Description

Right now there is one snowflake bridge, and its fingerprint is hard-coded in tor browser.

Eventually we will have enough load, and/or want more resiliency, that we want to set up a second snowflake bridge.

To be able to do that, I think we need changes at the client, changes at the snowflake, and changes at the broker.

(A) At the snowflake side, the snowflake needs to tell the broker which bridge(s) it is willing to send traffic to. Additionally, we either want to declare that each snowflake sends to only one bridge, or we need to add a way for the client to tell the snowflake which bridge it wants to reach.

(B) At the broker side, we need it to be able to learn from snowflakes which bridge(s) they use, and we need it to be able to learn from clients which bridge they want to use, and we need it to match clients with snowflakes that will reach that bridge.

(C) At the client side, we need it to tell the broker which bridge it wants to use, and (depending on our design choice in A above) we might also need the client to be able to tell the snowflake which bridge it wants to use.

(There is an alternative approach, where we assume that every snowflake is always running the newest javascript, so it is willing to reach every bridge on our master list. Then the broker doesn't need to do anything new, and we just need to add a way for the client to tell the snowflake which bridge it wants. I don't have a good handle on how realistic this assumption is.)
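
To make (B) and (C) a bit more concrete, here is a rough Go sketch of the kind of bookkeeping the broker would need; all the names are invented for illustration, and this is not the existing broker code.

package main

import (
	"fmt"
	"sync"
)

// proxyPools groups registered snowflakes by the bridge fingerprint they
// said they are willing to serve (the new information from (A)).
type proxyPools struct {
	mu    sync.Mutex
	pools map[string][]string // bridge fingerprint -> IDs of idle snowflakes
}

func newProxyPools() *proxyPools {
	return &proxyPools{pools: make(map[string][]string)}
}

// addSnowflake records that a snowflake is willing to reach the given bridge.
func (p *proxyPools) addSnowflake(fingerprint, snowflakeID string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.pools[fingerprint] = append(p.pools[fingerprint], snowflakeID)
}

// match hands a client that asked for a bridge (the new information from (C))
// one of the snowflakes that can reach it, if any are idle.
func (p *proxyPools) match(fingerprint string) (string, bool) {
	p.mu.Lock()
	defer p.mu.Unlock()
	pool := p.pools[fingerprint]
	if len(pool) == 0 {
		return "", false
	}
	id := pool[0]
	p.pools[fingerprint] = pool[1:]
	return id, true
}

func main() {
	pools := newProxyPools()
	pools.addSnowflake("1234...1234", "snowflake-a") // placeholder fingerprint and ID
	if id, ok := pools.match("1234...1234"); ok {
		fmt.Println("matched client to", id)
	}
}

A real broker would also need timeouts and per-proxy state, but the per-bridge pool is the only genuinely new piece.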

Child Tickets

Change History (4)

comment:1 Changed 12 days ago by dcf

You could simplify further by removing (A). Don't have every proxy keep a whitelist of bridges; rather, let it be willing to connect to any address the broker gives it. It would work like this: the client sends a bridge fingerprint or other identifier to the broker; the broker looks the fingerprint up in its own whitelist that maps fingerprints to IP:port; and the broker gives the IP:port to the proxy.

What you would lose with this design is a measure of proxies' self-defense against a malicious broker. The broker could get a proxy to initiate a WebSocket connection to any destination.
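
A minimal sketch of that broker-side whitelist, with made-up fingerprints and addresses rather than anything from the real broker:

package main

import "fmt"

// bridgeWhitelist is the broker operator's own mapping from bridge fingerprint
// to bridge address; the fingerprints and addresses here are placeholders.
var bridgeWhitelist = map[string]string{
	"1234...1234": "wss://bridge1.example:443/",
	"ABCD...ABCD": "wss://bridge2.example:443/",
}

// lookupBridge resolves a client-supplied fingerprint. Anything not on the
// whitelist is refused, so a proxy never receives an arbitrary destination
// from an honest broker.
func lookupBridge(fingerprint string) (string, bool) {
	addr, ok := bridgeWhitelist[fingerprint]
	return addr, ok
}

func main() {
	if addr, ok := lookupBridge("1234...1234"); ok {
		fmt.Println("hand this address to the proxy:", addr)
	}
}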

comment:2 Changed 12 days ago by dcf

Another design alternative, requiring changes in core tor: let a bridge line describe not just a single bridge fingerprint, but a set of them. The client is satisfied if any fingerprint in the set matches. The broker (or the proxy) knows the current set of bridges, and randomly selects one without any control by the client.

Adding a new bridge to the set would require pushing out new bridge lines to users (i.e., making a new Tor Browser release). But if new bridges are only needed to increase capacity, Tor Browser releases should happen at a frequent enough pace.
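
Just to illustrate the brainstorm (no such multi-fingerprint syntax exists in tor today, and the fingerprints and broker URL are placeholders), such a bridge line might look something like:

Bridge snowflake 0.0.3.0:1 1234...1234,ABCD...ABCD,5555...5555 broker=https://broker/ front=broker.front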

comment:3 in reply to: 2 Changed 12 days ago by teor

I don't think we need to make any major design changes to snowflake or Tor.
Instead, we can achieve what we want by configuring Tor and snowflake (and perhaps adding a small amount of code).

Replying to dcf:

Another design alternative, requiring changes in core tor: let a bridge line describe not just a single bridge fingerprint, but a set of them. The client is satisfied if any fingerprint in the set matches. The broker (or the proxy) knows the current set of bridges, and randomly selects one without any control by the client.

Tor isn't really built to have more than one fingerprint per bridge. Instead, if Tor is configured with multiple bridge lines, it tries to connect to all of the bridges, then selects between available bridges at random.

Here's the current design:

  • each client bridge line has a broker, bridge, and (maybe?) STUN server
  • each broker knows its corresponding bridge
  • each proxy is allocated to a broker/bridge

This design can be gracefully upgraded to:

  • a multi-bridge client, by distributing different bridge lines with different brokers, bridges, and (at least 2) different STUN servers
  • a multi-bridge broker, by using a different port on the broker for each bridge
  • a multi-broker/bridge proxy, by having the proxy connect to multiple brokers, then assign client offers from each broker to the corresponding bridge
    • alternately, each proxy can choose a single bridge/broker at random

Adding a new bridge to the set would require pushing out new bridge lines to users (i.e., making a new Tor Browser release). But if new bridges are only needed to increase capacity, Tor Browser releases should happen at a frequent enough pace.

New bridges are also needed if one of the bridges goes down.

comment:4 in reply to:  3 Changed 11 days ago by dcf

Replying to teor:

Tor isn't really built to have more than one fingerprint per bridge.

Yes, I realize that. That is why I said "requiring changes in core tor." I'm only brainstorming.

Instead, if Tor is configured with multiple bridge lines, it tries to connect to all of the bridges, then selects between available bridges at random.

Here's the current design:

  • each client bridge line has a broker, bridge, and (maybe?) STUN server
  • each broker knows its corresponding bridge
  • each proxy is allocated to a broker/bridge

If I understand you, this would use multiple bridge lines in torrc, one for every valid bridge/broker combination. So for example, if there were one broker and three bridges with fingerprints 1234..., ABCD..., and 5555...:

Bridge snowflake 0.0.3.0:1 1234...1234 broker=https://broker/ front=broker.front
Bridge snowflake 0.0.3.0:2 ABCD...ABCD broker=https://broker/ front=broker.front
Bridge snowflake 0.0.3.0:3 5555...5555 broker=https://broker/ front=broker.front

What is potentially unexpected about this approach is that, in my experience, tor does not select just one of its many bridge lines at random; rather it selects several and tries all of them simultaneously. So here, the snowflake-client would simultaneously send out three registration messages (over domain fronting or something else). I guess it isn't too big a problem, but it makes me worry a bit more about fingerprinting the registration process: especially if there are two brokers with two different domain fronts, connecting to both at the same time could be a tell that is not present in normal traffic.

Here is a torrc file that demonstrates that tor selects more than one of its bridge lines:

Log info stderr
Log info file tor.log
SafeLogging 0
DataDirectory datadir
ClientTransportPlugin obfs4 exec /usr/bin/obfs4proxy
UseBridges 1
Bridge obfs4 85.31.186.26:443 91A6354697E6B02A386312F68D82CF86824D3606 cert=PBwr+S8JTVZo6MPdHnkTwXJPILWADLqfMGoVvhZClMq/Urndyd42BwX9YFJHZnBB3H0XCw iat-mode=0
Bridge obfs4 216.252.162.21:46089 0DB8799466902192B6C7576D58D4F7F714EC87C1 cert=XPUwcQPxEXExHfJYX58gZXN7mYpos7VNAHbkgERNFg+FCVNzuYo1Wp+uMscl3aR9hO2DRQ iat-mode=0

In the log you will see connections made to both bridges. This is why I was trying to think of a design that only requires one bridge line.

Nov 29 10:38:41.000 [info] connection_ap_make_link(): Making internal direct tunnel to 85.31.186.26:443 ...
Nov 29 10:38:41.000 [info] connection_ap_make_link(): Making internal direct tunnel to 216.252.162.21:46089 ...
Nov 29 10:38:41.000 [info] connection_read_proxy_handshake(): Proxy Client: connection to 216.252.162.21:46089 successful
Nov 29 10:38:42.000 [info] connection_read_proxy_handshake(): Proxy Client: connection to 85.31.186.26:443 successful
Nov 29 10:38:42.000 [info] add_an_entry_guard(): Chose $0DB8799466902192B6C7576D58D4F7F714EC87C1~noisebridge01 at 216.252.162.21 as new entry guard.
Nov 29 10:38:43.000 [info] add_an_entry_guard(): Chose $91A6354697E6B02A386312F68D82CF86824D3606~zipfelmuetze at 85.31.186.26 as new entry guard.

This design can be gracefully upgraded to:

  • a multi-bridge client, by distributing different bridge lines with different brokers, bridges, and (at least 2) different STUN servers
  • a multi-bridge broker, by using a different port on the broker for each bridge

I don't understand you here, "a different port on the broker." We envision the client connecting to the broker over some covert channel, like domain fronting or DNS, that doesn't allow control of the destination port. Why encode the selected bridge in transport-layer metadata anyway? The client registration message is basically a blob—already around 1000 bytes because of ICE metadata—that can encode k=v pairs, so you can augment it to contain the name or fingerprint of the desired bridge.
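
Purely for illustration (the field name and the JSON shape are made up, not the actual registration format), adding one more key to that blob could look like:

package main

import (
	"encoding/json"
	"fmt"
)

// clientRegistration is a hypothetical shape for the client-to-broker blob:
// the existing SDP offer plus one extra key naming the desired bridge.
type clientRegistration struct {
	Offer  string `json:"offer"`            // WebRTC SDP offer (where the ICE metadata lives)
	Bridge string `json:"bridge,omitempty"` // fingerprint of the bridge the client wants
}

func main() {
	reg := clientRegistration{
		Offer:  "v=0\r\no=- 0 0 IN IP4 0.0.0.0\r\n...", // truncated placeholder SDP
		Bridge: "1234...1234",                           // placeholder fingerprint
	}
	blob, err := json.Marshal(reg)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(blob))
}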

  • a multi-broker/bridge proxy, by having the proxy connect to multiple brokers, then assign client offers from each broker to the corresponding bridge
    • alternately, each proxy can choose a single bridge/broker at random

I get that you're going for redundancy and resilience with multiple brokers. It is a good idea to have multiple brokers running, but there may not be a need to actually encode knowledge of this fact at the client. The difficulty with bridge lines is that each one contains a fingerprint—so a client really does have to store locally a list of every bridge it may want to connect to. But with the broker we have additional layers of indirection. Current clients are using the broker https://snowflake-broker.bamsoftware.com/, but the string "snowflake-broker.bamsoftware.com" doesn't appear anywhere on the client—the actual server the broker is on could change its name or IP address without clients needing to know about it. For example, if one broker goes down, we can change the CDN configuration and point the domain front at a backup one. Or with DNS registration, we can change the IP address of the authoritative DNS server, or potentially even round-robin across multiple brokers, all on the backend. If the system ever gets really big, then the broker doesn't even have to be one thing: it can be a distributed system with e.g. a shared database and its own internal redundancy. I feel that these are implementation decisions that can achieve resilience without needing to be exposed to clients.
