Opened 12 months ago

Closed 2 months ago

Last modified 2 months ago

#29207 closed enhancement (fixed)

New design for broker -- proxy protocol for snowflakes

Reported by: cohosh Owned by: cohosh
Priority: High Milestone:
Component: Circumvention/Snowflake Version:
Severity: Normal Keywords: snowflake, design, ex-sponsor-19, anti-censorship-roadmap
Cc: dcf, arlolra, cohosh Actual Points: 2
Parent ID: Points: 5
Reviewer: Sponsor: Sponsor28-must

Description (last modified by cohosh)

This is related to the Snowflake protocol design tickets #29206 and #29293.

We want to write these protocols in a way that is not Snowflake-specific but allows any type of bridge to connect to or poll our broker/BridgeDB bridge distribution service.

The idea is that in the beginning we will start with very reliable "bridge" (which could be snowflakes) that perhaps rotate IP addresses every month or so. After that we can collect measurements and move towards more ephemeral "bridges".

Some things to keep in mind are the types of information that the snowflakes give to the broker (such as proxy version/type) to allow for metrics. This information might change so we'll want a flexible and extensible protocol.

Child Tickets

TicketTypeStatusOwnerSummary
#29734enhancementclosedcohoshBroker should receive country stats information from Proxy and Client

Change History (35)

comment:1 Changed 12 months ago by cohosh

Cc: cohosh added

comment:2 Changed 12 months ago by cohosh

Sponsor: Sponsor19

comment:3 Changed 12 months ago by cohosh

Description: modified (diff)
Keywords: snowflake design added

comment:4 Changed 12 months ago by gaba

Points: 5

comment:5 Changed 12 months ago by cohosh

As referenced in #29426, the broker currently gives proxies a 504 message if no client is available which is a questionable design:

2019/02/07 12:11:51 broker returns: 504
INFO: peerconnection.go:468: fired OnIceCandidateError: 143
2019/02/07 12:12:01 broker returns: 504

At the very least it makes logs confusing.

comment:6 Changed 11 months ago by gaba

Keywords: network-team-roadmap-2019-Q1Q2 added

comment:7 Changed 11 months ago by cohosh

We have some preliminary notes about what the protocols between each part of the Snowflake system should accomplish: https://github.com/ahf/snowflake-notes/blob/master/Protocol.markdown

comment:8 in reply to:  7 Changed 11 months ago by dcf

Replying to cohosh:

We have some preliminary notes about what the protocols between each part of the Snowflake system should accomplish: https://github.com/ahf/snowflake-notes/blob/master/Protocol.markdown

Regarding the question

Does the "token bucket" algorithm in Snowflake right now do what we think?

I suspect the answer is no. I'm pretty sure BucketRateLimit didn't work before ab34f8e889 (part of #28732), and I won't swear that the commit fixed it :) I was tempted, then, to rip out the rate-limit code for the sake of starting from a simpler basis, but decided not to.

I'm pretty sure BucketRateLimit and DummyRateLimit are based off objects of the same name in the flash proxy code. As I recall, they were working in flash proxy.

comment:9 Changed 9 months ago by gaba

Keywords: network-team-roadmap-2019-Q1Q2 removed

comment:10 Changed 8 months ago by gaba

Keywords: ex-sponsor-19 added

Adding the keyword to mark everything that didn't fit into the time for sponsor 19.

comment:11 Changed 8 months ago by phw

Sponsor: Sponsor19Sponsor28-must

Moving from Sponsor 19 to Sponsor 28.

comment:12 Changed 7 months ago by gaba

Keywords: anti-censorship-roadmap added
Owner: set to ahf
Status: newassigned

comment:13 in reply to:  5 ; Changed 5 months ago by serna

Replying to cohosh:

As referenced in #29426, the broker currently gives proxies a 504 message if no client is available which is a questionable design

I suggest changing the status to 204 No Content, it would represent that there's no error but it also no one to connect to.

comment:14 Changed 5 months ago by cohosh

Thanks serna! I like the idea of providing a 2XX response. This will also help with the error/warning messages produced in the browser-based proxy logs.

comment:15 in reply to:  13 ; Changed 5 months ago by dcf

Replying to serna:

Replying to cohosh:

As referenced in #29426, the broker currently gives proxies a 504 message if no client is available which is a questionable design

I suggest changing the status to 204 No Content, it would represent that there's no error but it also no one to connect to.

I disagree here. It's better not to hide any necessary information in the HTTP layer, because not every way of interacting with the broker will have those HTTP features. See for example comment:11:ticket:25985. Even assuming HTTP, codes like 204 are probably less likely to pass untouched by proxies. I propose to just always use status code 200, unless there is a genuine internal server error, and encode all necessary information in the HTTP body. We're not designing a REST interface here.

comment:16 in reply to:  15 Changed 5 months ago by cohosh

Replying to dcf:

Replying to serna:

Replying to cohosh:

As referenced in #29426, the broker currently gives proxies a 504 message if no client is available which is a questionable design

I suggest changing the status to 204 No Content, it would represent that there's no error but it also no one to connect to.

I disagree here. It's better not to hide any necessary information in the HTTP layer, because not every way of interacting with the broker will have those HTTP features. See for example comment:11:ticket:25985. Even assuming HTTP, codes like 204 are probably less likely to pass untouched by proxies. I propose to just always use status code 200, unless there is a genuine internal server error, and encode all necessary information in the HTTP body. We're not designing a REST interface here.

Okay, this makes sense to me. So the idea would be to change all codes to 200 and then do a redesign as suggested in #30704.

comment:17 Changed 5 months ago by cohosh

Owner: changed from ahf to cohosh

comment:18 Changed 5 months ago by cohosh

Priority: Very HighHigh
Type: taskenhancement

comment:19 Changed 5 months ago by cohosh

Summary of how things work now

Note: see Broker.markdown for documentation of the Snowflake broker. This is a more specific proxy-focused break down of the messages sent.

Proxy Poll

The proxy sends

POST [broker URL] HTTP [version]
X-Session-ID: [session id]

[session id]

and the broker confirms that the session id given in the header matches that given in the body.

The broker then responds with one of three messages:

  • If the session ID in the header did not match the session ID in the body, it sends:
     HTTP 400 Bad Request
    
  • If there is a client matched to the proxy, it sends:
     HTTP 200 OK
    
    {
        type: offer
        sdp: [WebRTC SDP]
    
    }
    
    

where the HTTP response body is a serialized WebRTC Session description offer

  • If there are no clients matched the proxy, it sends:
     HTTP 504 Gateway Timeout
    

Proxy Answers

The proxy sends

POST [broker URL] HTTP[version]
X-Session-ID: [session id]

{
    type: answer
    sdp: [WebRTC SDP]
}

where the HTTP response body is a serialized WebRTC Session description answer.

The broker then uses the provided session ID to match this answer with the correct snowflake and provides one of three responses:

  • If the proxy took too long to respond, it sends:
     HTTP 410 Gone
    
  • If the body of the POST request was empty or surpassed the read limit, it sends:
     HTTP 400 Bad Request
    
  • If the answer was sent to the client, it sends:
     HTTP 200 OK
    

comment:20 Changed 5 months ago by cohosh

Overall, we want to move to putting all relevant information in the HTTP response body and provide 200 OK responses from the broker for all valid interactions. I think it's okay to leave the 400 Bad Request responses alone for requests we catch that definitely deviate from the snowflake protocol.

We want to do this in a backwards compatable way to accomodate proxies that have not yet updated. Note that we have control over the broker deployment at this time. I propose the following changes:

Proxy Poll

Updated proxies send

POST [broker URL] HTTP [version]

{
    version: "1.0",
    sid: [session id]
}

The broker checks to see if the response body contains a JSON format with a version field that specifies version 1.0. If not, revert to old method but don't compare the HTTP header with the body (I'm not sure what the usefulness of that is).

The broker then responds with one of two messages:

  • If there are no clients matched the proxy, it sends:
     HTTP 200 OK
    
     {
         status: "no proxies"
     }
    

For old snowflakes (determined as described above), it sends the old HTTP 504 Gateway Timeout response.

We might be able to avoid some branching logic and send the new 200 response to old proxies. I think this is OK to do for backwards compatability because the call to webrtc.Deserialize will fail and return a null offer for proxy-go instances and browser-based proxies will also fail to received the offer and continue polling. However, it feels a bit risky.

  • If there is a client matched to the proxy, it sends to updated proxies:
     HTTP 200 OK
     {
         status: "client match",
         {
             type: offer,
             sdp: [WebRTC SDP]
    
         }
     }
    

and the old message to non updated proxies. I'm not sure if the new message will work, I have to read more about how the JSON functions in Go and node respond to this structure. To receive the offer, the proxy needs to be able to find the "type" and "sdp" fields in the JSON object.

Proxy Answers

Updated proxies send:

POST [broker URL] HTTP[version]

{
    version: "1.0",
    sid: [session id],
    {
        type: answer,
        sdp: [WebRTC SDP]
    }
}

The broker then checks the version as above to determine whether it's an old or new proxy and uses the provided session ID to match this answer with the correct snowflake and provides one of three responses:

  • If the proxy took too long to respond, it sends:
     HTTP 200 OK
    
     {
         status: "client gone"
     }
    
    to updated proxies and HTTP 410 Gone to old ones. We could send the new message to old proxies, however old proxies assume the answer was sent to a client and will wait for a data channel to open if they receive a 200 OK message. We do have a timeout built in now since #25688 and #31100 so the token will eventually free a slot for more clients.
  • If the body of the POST request was empty or surpassed the read limit, it still sends HTTP 400 Bad Request to all proxies.
  • If the answer was sent to the client, it sends to all proxies:
     HTTP 200 OK
    
     {
         status: "success"
     }
    
Last edited 5 months ago by cohosh (previous) (diff)

comment:21 in reply to:  15 Changed 5 months ago by dcf

Replying to dcf:

Replying to serna:

I suggest changing the status to 204 No Content, it would represent that there's no error but it also no one to connect to.

I disagree here. It's better not to hide any necessary information in the HTTP layer, because not every way of interacting with the broker will have those HTTP features. See for example comment:11:ticket:25985. Even assuming HTTP, codes like 204 are probably less likely to pass untouched by proxies. I propose to just always use status code 200, unless there is a genuine internal server error, and encode all necessary information in the HTTP body. We're not designing a REST interface here.

Let me slightly amend my statement here. I was mistakenly thinking that this was about the client–broker protocol, not the proxy–broker protocol. While I do think that it's important to avoid HTTP entanglement in the client–broker protocol (#29293), in the proxy–broker protocol there's probably no harm in assuming HTTP in the proxy–broker protocol. That said, there's also no harm in moving that information into the body, as in comment:20.

comment:22 Changed 4 months ago by cohosh

Status: assignedneeds_review

Here is a pull request for this ticket: https://github.com/cohosh/snowflake/pull/8.

Even though we decided it wasn't strictly necessary, this puts all information in the body of the HTTP request/response. I think it's cleaner this way and we'll be using the same method for redoing the broker-client protocol eventually anyway.

I opted for the "hard break" method of updating, since the worst case scenario is that a proxy that hasn't updated won't be able to usefully poll. As mentioned in #30704, web based proxies reboot themselves once a day. The standalone proxies will throw bad request error messages which should hopefully prompt those running them to update.

comment:23 Changed 4 months ago by cohosh

Actual Points: 2

Updating actual points with time spent on this ticket so far.

comment:24 Changed 3 months ago by dcf

Status: needs_reviewneeds_information

This looks good. Just a few points.

  • If you prefer to use lowercase JSON keys, you can do it with struct tags like
    type ProxyPollRequest struct {
    	Sid     string `json:"sid"`
    	Version string `json:"version"`
    }
    
  • StatusInternalServerError seems more appropriate than StatusBadRequest for an error where the server fails to marshal (as opposed to unmarshal) some information.
  • In proxyAnswers, the check for r.Body == nil is unnecessary, because net/http guarantees it will always be non-nil.
  • The test "with error if the proxy is not recognized" still refers to X-Session-ID, which it should probably be rewritten to use a JSON structure with "Sid": "invalid".
  • A raw string literal will save some escaping in strings like "{\"Sid\":\"ymbcCMto7KHNGYlp\",\"Version\":\"1.0\"}".
  • The comment at the top of proxy.go looks like it's erroneously copied from safelog/log.go.

comment:25 Changed 3 months ago by cohosh

Status: needs_informationneeds_review

Thanks! I addressed the feedback above.

I also added the necessary changes for the browser-based proxies and updated those tests accordingly: https://github.com/cohosh/snowflake/pull/8

Lowercase vs. uppercase doesn't really bother me. I just left it with uppercase JSON keys for now.

comment:26 in reply to:  25 Changed 3 months ago by phw

Status: needs_reviewmerge_ready

Replying to cohosh:

Thanks! I addressed the feedback above.

I also added the necessary changes for the browser-based proxies and updated those tests accordingly: https://github.com/cohosh/snowflake/pull/8

Lowercase vs. uppercase doesn't really bother me. I just left it with uppercase JSON keys for now.


Looks good to me. I only have a nitpick related to 5e1fab6: Do we want constants like STATUS_CLIENT_MATCH instead of string literals like "client match"? I could see constants being easier to maintain but I don't know if this is idiomatic to JavaScript. I'll leave this up to you to decide.

comment:27 Changed 3 months ago by cohosh

Status: merge_readyneeds_review

I agree, I pushed a change to the same branch. If you think it looks good, I'll merge this today. Thanks!

comment:28 in reply to:  27 Changed 3 months ago by phw

Status: needs_reviewmerge_ready

Replying to cohosh:

I agree, I pushed a change to the same branch. If you think it looks good, I'll merge this today. Thanks!


Looks good to me.

comment:29 Changed 3 months ago by cohosh

Now to talk about backwards compatibility and upgrading. The way this is written right now, it will kick out proxies that haven't updated. I think this is a feature, we have some performance problems right now that might be due to proxies that haven't updated. It also gives a means to exclude proxies in the future for not updating (by the Version field supplied in the polls). I tested this out using snowbox and the broker handles new as well as outdated requests just fine, and it returns a 400 Bad Request for outdated proxies. What I could see desirable here is a way for the proxy to disable itself if it receives a 4XX status code.

We also currently have two brokers, one on the new migrated host and one on the old host due to #29258. I guess the best method is to deploy it at both hosts simultaneously.

Anyone opposed to me starting this deployment? I plan to update the brokers first and then quickly do a proxy update.

comment:31 Changed 2 months ago by cohosh

Okay upgrade in progress. I updated the snowflake broker and it has been running the new version since the timestamp:
2019/11/13 16:47:02 Starting HTTP-01 listener

Note that there's a lot of Invalid data. log messages due to old polling proxies.

I also updated the four proxy-go instances running on the snowflake bridge, so the broker is showing 4 standalone proxies currently running.

I was able to bootstrap a client connection fully through the newly deployed bridge and proxy-go instances. So it seems to be working. I'll start the webextension update now.

comment:32 Changed 2 months ago by cohosh

Okay the Firefox and Chrome snowflake web extensions have been updated. It will take a while for us to see the updated proxies trickling in and it will be interesting to see how many we loose from Cupcake.

comment:33 Changed 2 months ago by cohosh

Alright I pushed a patch to remove the invalid data log messages from the broker because they were filling up the logs far too quickly. I think the logs will still compress down to quite small so we probably don't need to fix previous logs.

comment:34 Changed 2 months ago by dcf

Resolution: fixed
Status: merge_readyclosed

I deployed webext-0.1.0 to snowflake.torproject.org at 2019-11-14 20:22:29.

Last edited 2 months ago by dcf (previous) (diff)

comment:35 Changed 2 months ago by dcf

Updated the new broker from #29258 to 7557e96a8d41c778a0b039b03969c66f91bd108a at 2019-11-14 20:47:00.

Note: See TracTickets for help on using tickets.