Opened 4 months ago

Last modified 4 weeks ago

#30704 new task

Plan for snowflake update versioning and backwards compatibility

Reported by: cohosh Owned by:
Priority: Medium Milestone:
Component: Circumvention/Snowflake Version:
Severity: Normal Keywords:
Cc: cohosh, dcf, arlolra, phw Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description (last modified by cohosh)

We have some upcoming changes that will alter the way snowflake components talk to each other. We should decide (possibly on a case-by-case basis) how to handle these updates.

  • Do we make sure changes are backwards compatible with clients/proxies that haven't updated yet?
  • Should we think about introducing some concept of versioning?
  • If we support older versions, how long until we no longer support them?

Some examples of tickets that we'll need to think about this for:

Child Tickets

Change History (2)

comment:1 Changed 3 months ago by dcf

Description: modified (diff)

Ideally, we can keep the proxies as protocol-ignorant as possible, so that they don't impede changes at the endpoints (end-to-end principle). #29206, for example, likely won't require any changes in the proxy, which can continue blindly forwarding anything it receives. Similarly, #25985 ideally only requires changes in the client and broker, not the proxy and server. The exception to this principle is if we need to change something about the WebRTC tunnel rather than the data tunnelled within it, for example if we want to try an unreliable channel.

If we do need to upgrade proxies, my mental model is that proxies are easy to upgrade (at least the web-based ones) because they reboot themselves once a day. Flash proxy had the same feature, and in this graph you can see how quickly proxies upgraded when we had them start sending a cookie. (-- is cookie-naive; unset and 1 are cookie-aware.)

Let's take #29206 as an example. It changes the format of the stream between client and server to add framing. Here are a few potential ways to handle it:

  1. Flag day. Ignore backward compatibility. Push out an upgraded client in Tor Browser and try to coordinate an upgrade of the server at roughly the same time. This of course breaks all clients that do not upgrade, but we can perhaps get away with that at this stage.
  2. Backward-compatible protocol versioning. I think this is what cohosh is suggesting in comment:13:ticket:29206. We know that the old protocol is a raw TLS stream, so we can add a header of 00000000 or something to the new protocol, anything that makes the new protocol easy to distinguish from the old one. The server peeks at the first few bytes to know what protocol is in use, and then switches to raw-stream mode or framing mode as appropriate. 00000000 could change to different numbers to represent future upgrades. The downside here is code complexity: maintaining two or more code paths.
  3. Parallel deployment. We make a branch for the old protocol and merge the new protocol into master. We deploy two instances of the server, one speaking the old protocol and one speaking the new. These could be on separate IP addresses/domain names, or could even be on the same host, say wss://snowflake.bamsoftware.com/ and wss://snowflake.bamsoftware.com/v2, with a reverse proxy diverting requests as appropriate. When a client registers with the broker, it includes a signal that indicates which protocol version the client supports. The proxy will need to know which server to connect to, so either we have the broker tell the proxy which server to connect to instead of having that information hardcoded in the proxy (something like #25598), or else we maintain two pools of proxies, one that uses the old server and one that uses the new. Eventually we deactivate the old server. This way would put the code complexity in the broker rather than the server, so it depends on the nature of the code change whether the trade is worth it.

As for sending additional information (such as a version flag) from the client to the broker, I would ideally like to see that bundled into the registration blob. comment:16:ticket:29206 suggests using parallel metadata such as a URL path or HTTP header, but that only works with HTTP-based rendezvous, not for others that are proposed in #25594. Currently the rendezvous blob is just the raw text of the RTCSessionDescription JSON:

{
  "type": "offer",
  "sdp": "v=0\r\no=...\r\n"
}

It would be better if there were another layer so that we could put other metadata into the blob. Like:

{
  "version": 1,
  "foo": "bar",
  "sessiondescription": {
    "type": "offer",
    "sdp": "v=0\r\no=...\r\n"
  }
}

The benefit is that we can bundle all of the latter into e.g. a DNS request, and in that way we make the rendezvous method independent of the contents of the rendezvous message. We could adapt to the nested format in a backward-compatible way by having the broker check whether there is a "sessiondescription" key at the top level, and if not, synthesize a new message that has the entire former message nested under that key. The additional code complexity is not bad: just check for the old format and convert it to the new format if needed before doing any other processing.

Something similar applies to the broker's response messages toward the client and proxies. Currently the messages depend on HTTP metadata, namely the status code (comment:2:ticket:29293). They look like this:

  • HTTP/2.0 200 OK
    Content-Length: 742
    
    {
      "type":"answer",
      "sdp":"v=0\r\no=...\r\n"
    }
    
  • HTTP/2.0 504 Gateway Timeout
    Content-Length: 0
    
    

It would be better if all the necessary information were in the HTTP body, because that's something that can be easily bundled up into other channels like DNS or AMP cache. Something like this:

  • HTTP/2.0 200 OK
    Content-Length: 780
    
    {
      "status": 200,
      "answer": {
        "type":"answer",
        "sdp":"v=0\r\no=...\r\n"
      }
    }
    
  • HTTP/2.0 504 Gateway Timeout
    Content-Length: 21
    
    {
      "status": 504
    }
    

Then we upgrade clients and proxies to only look at the HTTP body and ignore the status code. We keep sending the old status codes for the benefit of older clients. Now that I think of it, this is backward-compatible for error responses, because old clients/proxies will only look at the status code and ignore the body, but not for status 200, because the format of the body message will change. Maybe we could keep the toplevel "type"/"sdp" as they are, to signify an implicit status 200. Also, "status": 504 is just an example; we may prefer to represent that as a meaningful token like "status": "no-proxies".

comment:2 Changed 4 weeks ago by cohosh

Description: modified (diff)