The idea of being able to connect flashproxy clients to relays that support specific transports has occurred in many places, like comment:5:ticket:7944, comment:2:ticket:5578, and lately comment:17:ticket:7167.
This would allow clients to speak obfs3-over-websocket instead of the current websocket protocol.
To accommodate this, clients should be able to signal to the facilitator that they need bridges that support specific transports. The facilitator should be able to search its database for such bridges and point its users to them.
Items that need to be done to achieve the goal of this ticket:
Implement advanced client registrations, where clients can also specify a transport when they ask for a flashproxy. David in comment:2:ticket:5578 said that registrations currently look like this:
client=1.2.3.4:9000
and they could be changed to look similar to this:
client=1.2.3.4:9000&client-webrtc=1.2.3.4:10000
Facilitators should be able to have multiple bridges that they can suggest to flashproxies. The facilitator currently reads a single bridge using the --address CLI switch (https://gitweb.torproject.org/flashproxy.git/blob/e85b4a8ee5d603c34fc63ef3c9878ae06378da94:/facilitator/init.d/facilitator#l19). We will need to pass multiple bridges in the future, so we might want to use a config file, where we put our bridges and annotate them with the transports they support. Possible example of such a config file:
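{{{
websocket 1.2.3.4:5555
webrtc 1.2.3.4:6555
obfs2-websocket 1.2.3.5:5555
}}}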
The facilitator should parse the new client-registration, match it up with a registered bridge, and return the bridge as part of its response. We might also want to make a response for "transport name not known".
All of the tasks above should be spec'ed.
Doesn't look too hard!
I attach a patch for doc/design.txt of flashproxy to document the new advanced client registrations.
David, do you also want documentation for the format of the file that holds bridges for the facilitator? My plan is to make it a simple YAML file with transport -> address mappings. Where should I document this? Is doc/design.txt the right place?
I attach a patch for doc/design.txt of flashproxy to document the new advanced client registrations.
Thank you, my man. design.txt is a good place to document it. Beware that design.txt is somewhat out of date. Here are my thoughts.
There are three things to consider: the set of transport chains the client supports, the set of transport chains the proxy supports, and the set of transport chains the relay supports.
By "transport chain" I mean something like obfs3|websocket. What is nice about this is that the proxy only needs to support the outermost layer. A proxy that supports websocket can connect a client and server that speak obfs3|websocket (or rot13|obfs3|websocket etc.), without needing to know that there is obfs3 underneath. Everything up to the last component in the chain must be identical between client and relay; e.g., an obfs3|websocket client would not be able to talk to a rot13|websocket relay, even if there is a websocket proxy between them.
To give three examples of realistic outermost layers, we have websocket, webrtc, and plain tcp. We can imagine, for instance, a JavaScript proxy capable of speaking both websocket and webrtc. Such a proxy could connect an obfs3|webrtc client with an obfs3|websocket relay. A standalone proxy might be capable of making plain tcp connections; it could connect an obfs3|tcp client (obfs3 with a tcp listener shim) with any old obfs3 relay.
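To make the rule concrete, here is a minimal sketch (illustrative Python, not flashproxy code), assuming chains are represented as lists of transport names:
{{{
# Client and relay chains must be identical up to (but not including) the
# outermost component; the proxy only needs to speak the outermost component
# on each side.
def compatible(client_chain, relay_chain, proxy_transports):
    return (client_chain[:-1] == relay_chain[:-1]
            and client_chain[-1] in proxy_transports
            and relay_chain[-1] in proxy_transports)

# A proxy speaking websocket and webrtc can connect an obfs3|webrtc client
# to an obfs3|websocket relay:
assert compatible(["obfs3", "webrtc"], ["obfs3", "websocket"],
                  {"websocket", "webrtc"})
# But an obfs3|websocket client cannot talk to a rot13|websocket relay:
assert not compatible(["obfs3", "websocket"], ["rot13", "websocket"],
                      {"websocket"})
}}}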
So we need a way for:
clients to say, "these are the transport chains I support, and the address I'm listening on for each one."
proxies to say, "these are the outer transports I support."
the facilitator to know a static list of relays, with their transport chains and the listening address for each.
First I propose to deprecate the existing client= notation and make it synonymous with client-websocket=.
Let's consider the use case of a client that supports websocket, obfs3|websocket, and obfs3|tcp. It may send a registration message like,
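For illustration only (made-up addresses, with the pipe character URL-encoded as %7C), such a registration might look like:
{{{
client-websocket=1.2.3.4:9000&client-obfs3%7Cwebsocket=1.2.3.4:9001&client-obfs3%7Ctcp=1.2.3.4:9002
}}}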
Proxies send some information in their GET request. (Currently just a protocol revision number and a list of clients, see here.) So the proxy might send its list of supported transports:
GET /r=1&transport=websocket&transport=webrtc HTTP/1.1
Then the facilitator needs to find a client and a relay where 1) the components in the transport chain are the same up to the last component, and 2) the last component is websocket or webrtc. Let's say it finds an obfs3|websocket relay at 10.10.10.10:9902. Then it can send a reply to the proxy:
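Purely for illustration (reusing the relay address above, with the chain embedded in the parameter name as the next paragraph discusses), such a reply might look like:
{{{
client=1.2.3.4:9000&relay-obfs3%7Cwebsocket=10.10.10.10:9902
}}}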
Embedding the transport chain in the "name" part of application/x-www-form-urlencoded name-value pairs is kind of an abuse, but it's not too bad. Perhaps there is a better way to do it. I originally used application/x-www-form-urlencoded because it was already implemented in JavaScript. If I were starting over, I would probably use JSON as there are standard functions to parse JSON in browsers and lots of other languages support it.
I think, because of the "outermost transport" thing, that we will have to specify transport chains as parseable strings (for instance using the pipe notation I used above), and not opaque identifiers like "obfs3_flash" or "obfs-in-websocket".
I agree a config file to specify a list of known relays and their transports is a good idea.
I attached another design.txt patch. This one also specifies the new flashproxy poll format.
I understand what you say about the outermost transport thing. I usually think of this as the transport layer (in contrast with obfs3 etc. which I think as obfuscation layer or presentation layer or something like this).
BTW, do you think I should also specify the way that the facilitator should handle flashproxy polls that include transports? As I see it, when the facilitator gets a flashproxy poll that includes transport X, it should:
a) See the client registrations that use transports, and see if any of them have X as their outermost transport. To do this, we will need to modify get_reg_for_proxy() and RegSet (and maybe more stuff).
b) Then it needs to find a registered bridge that supports the transport chain that the client registration asked for. We will need a config file containing bridges and some utility functions to do this.
c) Finally it needs to send the new-style response to flashproxy that contains the client, the relay and the transports they support. To do this, we will need to modify fac.py:get_reg() or something like that.
I'm not sure I understand the way RegSets work. What's this tier business? Should I make RegSets transport-aware or would you prefer to do this in another way?
Also, is the IPC mechanism of flashproxy documented somewhere (the FROM, PUT etc. commands that are passed around?)
Finally, I'm fine with using the pipe symbol as the transport separator (transport names are C identifiers btw).
(I might also need some tips on testing/debugging the facilitator.)
I attached another design.txt patch. This one also specifies the new flashproxy poll format.
I agree with this patch.
BTW, do you think I should also specify the way that the facilitator should handle flashproxy polls that include transports? As I see it, when the facilitator gets a flashproxy poll that includes transport X, it should:
a) See the client registrations that use transports, and see if any of them have X as their outermost transport. To do this, we will need to modify get_reg_for_proxy() and RegSet (and maybe more stuff).
b) Then it needs to find a registered bridge that supports the transport chain that the client registration asked for. We will need a config file containing bridges and some utility functions to do this.
c) Finally it needs to send the new-style response to flashproxy that contains the client, the relay and the transports they support. To do this, we will need to modify fac.py:get_reg() or something like that.
I'm not sure I understand the way RegSets work. What's this tier business? Should I make RegSets transport-aware or would you prefer to do this in another way?
I think your understanding is correct. We don't have to specify the internal steps taken by the facilitator, only that it returns to the proxy a client and relay address compatible with one of the proxy's offered transports.
We say that the client with the fewest proxies is the one that should be served next. A reg is put in the tier equal to the number of proxies it currently has. You can get the next client in O(1) by popping the lowest non-empty tier. Likewise moving between tiers is O(1). It could also be implemented as e.g. a priority queue.
RegSet is just a bag of registrations, that knows how to extract the registration with the highest priority.
Currently we have get_reg_for_proxy--the only thing we use the proxy address for is to decide IPv4 versus IPv6. This is handled by having two instances of RegSet: REGS_IPV4 and REGS_IPV6. So maybe we can have one instance of RegSet for each outermost transport, and get_reg_for_proxy will also get the list of transports the proxy supports.
We can check, at client registration time, whether the client has any known matching relays (matching in the sense that they have a compatible transport chain). Otherwise we just drop the useless registration. That way we can assume that all client regs have a relay match, and are only waiting for a compatible proxy to appear. Suppose a websocket proxy appears, we consult the websocket RegSet and find the highest-priority websocket client, then do a fast lookup to find the relay it should be matched with.
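A rough sketch of that idea (hypothetical code, not the actual fac.py; only the names RegSet and get_reg_for_proxy come from the existing code):
{{{
class RegSet(object):
    """A bag of registrations, tiered by how many proxies each one has."""
    def __init__(self):
        self.tiers = []  # tiers[n] holds regs currently served by n proxies

    def add(self, reg, nproxies=0):
        while len(self.tiers) <= nproxies:
            self.tiers.append([])
        self.tiers[nproxies].append(reg)

    def fetch(self):
        # Pop from the lowest non-empty tier: the client with the fewest
        # proxies is the one served next.
        for tier in self.tiers:
            if tier:
                return tier.pop(0)
        return None

# One RegSet per outermost transport, instead of just REGS_IPV4/REGS_IPV6.
REGS = {"websocket": RegSet(), "webrtc": RegSet()}

def get_reg_for_proxy(proxy_transports):
    """Return a registration whose outermost transport the proxy speaks."""
    for transport in proxy_transports:
        regset = REGS.get(transport)
        reg = regset.fetch() if regset is not None else None
        if reg is not None:
            return reg
    return None
}}}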
Also, is the IPC mechanism of flashproxy documented somewhere (the FROM, PUT etc. commands that are passed around?)
Sorry, it is hardly documented. The doc comment on parse_transaction shows the syntax. My goal was to make the protocol hard to implement incorrectly. Quoting of strings is obligatory. A PUT transaction currently looks like:
PUT CLIENT="1.1.1.1:1111" FROM="2.2.2.2:2222"
PUT happens when a client makes an HTTP POST registration request. The FROM part is not used currently; I intended it to be used for rate limiting, or to allow trusted registrants not to be bound by rate limiting. A GET transaction is
GET FROM="3.3.3.3:3333"
Here, FROM is the proxy's address, and it is what gets passed to get_reg_for_proxy.
Responses have the same format. They can look like
OK CLIENT="1.1.1.1:1111" RELAY="4.4.4.4:4444" CHECK-BACK-IN="600"
or
NONE CHECK-BACK-IN="600"
(I might also need some tips on testing/debugging the facilitator.)
doc/facilitator-howto.txt tells how it is set up. You can skip some steps for your own testing, for example you don't need an SSL cert, and you don't have to use Apache, you can use any simple web server capable of serving CGI. You don't need to run facilitator-email-poller nor facilitator-reg-daemon; just use the HTTP rendezvous and flashproxy-reg-http for testing.
Some programs allow you to override the default public facilitator. For example use flashproxy-reg-http -f http://localhost:8000.
Try the -d option to facilitator to log to stdout.
BTW, design.txt -- which I changed -- only specifies the HTTP registration format; it doesn't specify the format of the email registration etc. Should I assume that the body of the POST is the same as the body of the email (this seems to be the case in the code)?
Also, what about the OSS registration? Should we also change this to be transport-aware? Is its format specified somewhere?
For "release early; release often" purposes, I pushed a branch with changes on the client-side applications to support pluggable transports and the new client registrations.
It can be found in https://git.torproject.org/user/asn/flashproxy.git under branch bug9349_client_side. I've only briefly tested it.
For "release early; release often" purposes, I pushed a branch with changes on the client-side applications to support pluggable transports and the new client registrations.
It can be found in https://git.torproject.org/user/asn/flashproxy.git under branch bug9349_client_side. I've only briefly tested it.
Thanks George, you're awesome.
The --help text for --transport should say what the default value is.
Set
{{{
transport = DEFAULT_TRANSPORT
}}}
in the global options class, not in the code just before option parsing.
{{{
+ transport_part = [""] # default to empty string
}}}
I think that should rather be an empty list; otherwise I'm pretty sure the helpers get an extra empty argument.
I saw you moved some duplicated code into a module flashproxy_reg_utils. Please see #6810 (closed) for more about reducing code duplication. I'm afraid doing it this way will break make install. Alexandre tried breaking out a module in #6810 (closed), and it didn't quite work. So if you can separate the deduplication part of the patch, I think it's better for the purpose of this ticket.
BTW, design.txt -- which I changed -- only specifies the HTTP registration format; it doesn't specify the format of the email registration etc. Should I assume that the body of the POST is the same as the body of the email (this seems to be the case in the code)?
Also, what about the OSS registration? Should we also change this to be transport-aware? Is its format specified somewhere?
Basically everything uses the format understood by facilitator-reg-daemon. This program listens on a socket and reads a base64-encoded ciphertext (See Handler.handle in facilitator-reg-daemon). Decrypted, the plaintext format appears to be newline-separated name-value pairs (check find_client_addr). I'm not sure why it's using this homebrew format and not www-url-encoded, which would be easier to handle with respect to escaping.
facilitator-reg-daemon exists as a separate process for privilege separation reasons. It's the only program that has to be able to read the facilitator's private key. When the email or appspot helpers get their base64 blob, they just pass it straight to facilitator-reg-daemon. Check url_reg in facilitator.cgi for how appspot is handled and handle_message in facilitator-email-poller for how email is handled.
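As a sketch of handling the plaintext format just described (illustrative only; the real parsing lives in find_client_addr):
{{{
# The decrypted registration plaintext is newline-separated name=value
# pairs, e.g. "client=1.2.3.4:9000".
def parse_plaintext_reg(plaintext):
    pairs = {}
    for line in plaintext.splitlines():
        line = line.strip()
        if not line:
            continue
        name, sep, value = line.partition("=")
        if not sep:
            raise ValueError("malformed registration line: %r" % line)
        pairs[name] = value
    return pairs
}}}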
At https://github.com/arlolra/flashproxy/compare/raw, I modified the standalone flashproxy to terminate the websocket connection and make tcp connections to generic relays not supporting the websocket pt.
This could be useful for testing the proxy transport chain and the modified facilitator from #7945 (closed).
At https://github.com/arlolra/flashproxy/compare/raw, I modified the standalone flashproxy to terminate the websocket connection and make tcp connections to generic relays not supporting the websocket pt.
This could be useful for testing the proxy transport chain and the modified facilitator from #7945 (closed).
Thanks for this. I think what we will want to do is build an abstraction layer for sockets, and then adapt both WebSocket and plain TCP to it.
We're not likely to adopt the model where you try to connect to some relay and then fall back to another transport if it fails. Instead, in #9349 (closed) we let the proxies tell the facilitator what transports they support, and the facilitator gives them an appropriate relay.
I think the idea of plain TCP between proxy and client is even more interesting than between proxy and relay.
Also pushed trivial flashproxy modifications in branch bug9349_proxy_side.
+ params.push(["transports", "websocket"]);
I think I prefer transport here, not transports. If there are multiple transports I want the query string to look like
transport=websocket&transport=webrtc
not
transports=websocket,webrtc
so that we don't have to invent our own format for list serialization, and concomitant worries about escaping within URL escaping. To use such a multiple-valued query string is easy, for example you can use Python's FieldStorage.getlist.
Likewise in facilitator transactions, I would like to see multiple TRANSPORT= instead of one TRANSPORTS=. You will have to add a new function in fac.py that is like param_first but it returns the whole list of values.
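For example (a sketch only; param_getlist is the hypothetical new helper, not an existing fac.py function):
{{{
import cgi

def proxy_transports_from_request():
    # For a poll like GET /?r=1&transport=websocket&transport=webrtc
    # this returns ["websocket", "webrtc"].
    fs = cgi.FieldStorage()
    return fs.getlist("transport")

def param_getlist(params, key):
    """Like param_first, but return every value for key.
    params: list of (key, value) pairs from a parsed transaction."""
    return [v for k, v in params if k == key]
}}}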
In the facilitator, let's break backward compatibility and redefine the -r option to be the name of the relay file to load.
Let's use a tuple to represent a transport chain internally--parse it with str.split("|") as soon as it's read, and format it with "|".join only just before output. Then get_outermost_transport(chain) is just chain[-1].
options.relays should be indexed not by complete transport chains, but by transport chains excluding their last element. It should be possible for an obfs3|websocket client to talk to an obfs3|tcp relay, if there is a proxy that speaks both websocket and tcp. To be specific you should key by the tuple ("obfs3",) and not ("obfs3", "websocket").
I don't want to make a distinction between "new-style" and "old-style" registrations. There is just one backward-compatible style. In your loop over fs.keys(), notice a key that is exactly client, and treat it the same as client-websocket.
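A sketch of how those pieces might fit together (hypothetical code, not the actual facilitator; the relay entries are only examples):
{{{
def parse_chain(spec):
    """Parse "obfs3|websocket" into the tuple ("obfs3", "websocket")."""
    return tuple(spec.split("|"))

def format_chain(chain):
    return "|".join(chain)

def get_outermost_transport(chain):
    return chain[-1]

# options.relays keyed by the chain *excluding* its last element, so an
# obfs3|websocket client can be matched with an obfs3|tcp relay as long as
# some proxy speaks both websocket and tcp.
relays = {
    ("obfs3",): [(("obfs3", "websocket"), "10.10.10.10:9902")],
    (): [(("websocket",), "173.255.221.44:9500")],
}

def client_regs_from_query(items):
    """items: (key, value) pairs from a registration query string.
    A bare "client" key is treated exactly like "client-websocket"."""
    regs = []
    for key, addr in items:
        if key == "client":
            key = "client-websocket"
        if key.startswith("client-"):
            regs.append((parse_chain(key[len("client-"):]), addr))
    return regs
}}}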
I'm also hoping you will address the client comments from comment:10.
My plan for merging is to first do the proxy, because that's trivial and doesn't require other changes, then merge the facilitator. We can then run client registrations manually to test obfs-flash.
Also pushed trivial flashproxy modifications in branch bug9349_proxy_side.
{{{
params.push(["transports", "websocket"]);
}}}
I think I prefer transport here, not transports. If there are multiple transports I want the query string to look like
{{{
transport=websocket&transport=webrtc
}}}
not
{{{
transports=websocket,webrtc
}}}
so that we don't have to invent our own format for list serialization, and concomitant worries about escaping within URL escaping. To use such a multiple-valued query string is easy, for example you can use Python's FieldStorage.getlist.
Likewise in facilitator transactions, I would like to see multiple TRANSPORT= instead of one TRANSPORTS=. You will have to add a new function in fac.py that is like param_first but it returns the whole list of values.
In the facilitator, let's break backward compatibility and redefine the -r option to be the name of the relay file to load.
Done. I did not know how to update the init script though.
Let's use a tuple to represent a transport chain internally--parse it with str.split("|") as soon as it's read, and format it with "|".join only just before output. Then get_outermost_transport(chain) is just chain[-1].
Done.
options.relays should be indexed not by complete transport chains, but by transport chains excluding their last element. It should be possible for an obfs3|websocket client to talk to an obfs3|tcp relay, if there is a proxy that speaks both websocket and tcp. To be specific you should key by the tuple ("obfs3",) and not ("obfs3", "websocket").
Not yet done.
I don't want to make a distinction between "new-style" and "old-style" registrations. There is just one backward-compatible style. In your loop over fs.keys(), notice a key that is exactly client, and treat it the same as client-websocket.
Done.
I'm also hoping you will address the client comments from comment:10.
Not yet done.
My plan for merging is to first do the proxy, because that's trivial and doesn't require other changes, then merge the facilitator. We can then run client registrations manually to test obfs-flash.
I pushed the facilitator changes in bug9349_server_side_draft.
I pushed the flashproxy changes (s/transports/transport) to bug9349_proxy_second_take.
options.relays should be indexed not by complete transport chains, but by transport chains excluding their last element. It should be possible for an obfs3|websocket client to talk to an obfs3|tcp relay, if there is a proxy that speaks both websocket and tcp. To be specific you should key by the tuple ("obfs3",) and not ("obfs3", "websocket").
Hm, would this functionality work currently?
To have an obfs3|websocket client talk to an obfs3|tcp relay, doesn't the flashproxy have to first strip off the websocket frame? Does this happen currently? It was my impression that there is a bridge-side websocket transport that strips off the websocket frames.
For what it's worth, latest branches at this point are:
Facilitator: bug9349_server_side_draft
flashproxy-client: bug9349_client_side
flashproxy: bug9349_proxy_second_take
(..) We will need to pass multiple bridges [to facilitators] in the future, so we might want to use a config file, where we put our bridges and annotate them with the transports they support. Possible example of such a config file:
This might not scale well, since each bridge will need to distribute this information to every facilitator. Is there any way we can have the facilitator read this information from the bridge instead? From pt-spec.txt:
{{{
363 Bridges use transport lines in their extra-info documents to
364 advertise their pluggable transports:
365
366 transport SP <methodname> SP address:port [SP arglist] NL
}}}
Is the facilitator able to read this information?
I think the idea of a transport chain needs work. I don't believe what has been said so far is precisely coherent:
the browser proxy communicates to client and server via TCP channels, regardless of the contents. Having "tcp" as a valid transport name like "obfs|tcp" doesn't make sense. "" (empty string) would be the appropriate name for the transport chain that carries raw user data.
I also don't understand this whole "outermost transport" thing, since the browser proxy just passes bytes and doesn't need to speak the outermost transport in order to do this.
However, in order to match client vs server, it needs to match the entire chain: a client that speaks "obfs|websocket" isn't going to be able to talk to a server that speaks "xxx|websocket".
Here is my understanding:
Certain types of PTs are what I'll call a "byte-transform" PT - i.e. at its heart, it transforms input bytes to some output bytes, and the underlying transport mechanism (TCP in this case) is undisturbed. obfsproxy is a "byte-transform" PT, but flashproxy isn't, since it does extra stuff to the underlying channel, so that the output of flashproxy cannot by fed into a "byte-transform" PT.
A transport chain only makes sense if each component in the chain is a byte-transform PT. A browser proxy can pass the data stream transparently, or it can strip off layers in order to adapt between client/servers:
a client of a|b|c can talk to a server a|b|c and the proxy doesn't need to do anything, just pass bytes
a client of a|b|c can talk to a server a|d|e but the proxy needs to apply the transformation e(d(b-inv(c-inv(_)))) to the data stream from the client, and vice-versa from the server. (This is essentially what [comment:12 arlolra] did in his "raw tcp" commit.)
(A more general framework generalises the idea of a "byte-transform" PT - each PT has an input interface, and an output interface, then you can chain PTs by matching input interfaces to output interfaces. But we can stick with byte-to-byte PTs for now.)
OK, I think I get the "outermost transport" thing now - a proxy running in a browser has to make use of something like websocket in order to talk to the client/server in the first place; OTOH a standalone proxy running on e.g. node.js can open raw sockets, like in arlolra's commit. And if I understood correctly, a raw TCP-TCP proxy can proxy anything including an obfs|websocket stream, assuming that it's valid to cut out the middle man in our websocket transport.
So what really matters, is not the "outermost layer", but a "suffix constraint" for each proxy, which must be matched against the full transport chain. In the case of a raw TCP-TCP proxy, this suffix constraint is empty, and therefore matches all transport chains.
So what really matters, is not the "outermost layer", but a "suffix constraint" for each proxy, which must be matched against the full transport chain. In the case of a raw TCP-TCP proxy, this suffix constraint is empty, and therefore matches all transport chains.
Continuing down this path then, instead of matching the "outermost layer", a totally generalised protocol would have each proxy declare its client-constraints [C1,C2,...] and server-constraints [S1,S2,...] to the facilitator, where each C/S is a string "t|t|..." of transport-chain suffixes, possibly the empty chain "" for a raw data stream. For the currently-implemented proxy, the client/server constraints would be ["websocket"]/["websocket"], and for arlolra's raw-TCP-capable proxy, they would be ["websocket"]/["websocket",""].
In order to match a client supporting transports [CT1,CT2,...] to a server supporting transports [ST1,ST2,...], the facilitator needs to find a proxy with client suffix-constraints [C1,C2,...] and server suffix-constraints [S1,S2,...] such that CTi == PREFIX + Ca == PREFIX + Sb == STj for some i,j,a,b,PREFIX, where:
i,j,a,b are indexes into the relevant lists for preciseness purposes
PREFIX is the opaque data that the proxy doesn't need to understand
Ca/Sb are the transformations that the proxy understands and can strip off / attach on. For the current default browser proxy, this would just be websocket/websocket.
CTi,STj is the underlying data that needs to be matched between the client / server.
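A rough sketch of that matching condition (illustrative Python; the names and data are made up, and transport chains are tuples with () as the empty suffix):
{{{
def strip_suffix(chain, suffix):
    """Return the PREFIX such that chain == PREFIX + suffix, or None."""
    n = len(suffix)
    if n == 0:
        return chain
    if chain[-n:] == suffix:
        return chain[:-n]
    return None

def find_match(client_transports, server_transports,
               proxy_client_constraints, proxy_server_constraints):
    """Find (CT, C, S, ST) with CT == PREFIX + C and ST == PREFIX + S."""
    for ct in client_transports:
        for c in proxy_client_constraints:
            prefix = strip_suffix(ct, c)
            if prefix is None:
                continue
            for s in proxy_server_constraints:
                st = prefix + s
                if st in server_transports:
                    return ct, c, s, st
    return None

# arlolra's raw-TCP-capable proxy: constraints ["websocket"]/["websocket", ""].
print(find_match(
    [("obfs3", "websocket")],              # client: obfs3|websocket
    [("obfs3",), ("obfs3", "websocket")],  # servers: obfs3 and obfs3|websocket
    [("websocket",)],                      # proxy client-constraints
    [("websocket",), ()]))                 # proxy server-constraints
}}}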
edit: the current proxy design treats client/server the same transport-wise, so we can combine the client/server suffix-constraints into one constraint that's used for both, then Ca==Sb and CTi==STj. This is currently implemented as the "transport" param in the client-facilitator request protocol, but I suggest renaming to "transport_suffix" to be much clearer.
Hopefully my brain dump has been clear enough to include in the documentation, so that users can understand what model we're using. It was useful for me, anyway. :p
options.relays should be indexed not by complete transport chains, but by transport chains excluding their last element. It should be possible for an obfs3|websocket client to talk to an obfs3|tcp relay, if there is a proxy that speaks both websocket and tcp. To be specific you should key by the tuple ("obfs3",) and not ("obfs3", "websocket").
As I understand it, this would require additional changes on the client side to parse the response from the facilitator and initiate connections via the correct engines (when we finally do support more than 2 types of relay e.g. out of websocket/webrtc/plain-tcp). At the moment the proxy only sends "transport=websocket", it doesn't parse the response and assumes websocket-websocket proxying.
options.relays should be indexed not by complete transport chains, but by transport chains excluding their last element. It should be possible for an obfs3|websocket client to talk to an obfs3|tcp relay, if there is a proxy that speaks both websocket and tcp. To be specific you should key by the tuple ("obfs3",) and not ("obfs3", "websocket").
Not yet done.
I've gone ahead and completely re-done this part of the code, including ripping out the RegSet class. Instead we now have a separate Endpoints class to keep track of both client and server endpoints (separately of course).
The added benefit is that we support MOAR THINGS. The obfs3|websocket -> obfs3|webrtc proxying now works in principle, as well as obfs3|websocket -> obfs3 (aka obfs3|raw-tcp) proxying for proxies that can open raw sockets. Additionally, the facilitator now has the ability to ask such raw proxies to work a websocket -> websocket tunnel, which wasn't possible with the previous "outermost-transport"-based design, which would have confined such raw proxies to work raw -> raw tunnels only.
I know this is quite disruptive (282 insertions, 205 deletions) so I've also heavily documented the new class as well as written quite a lot of tests for it. I've also fixed a bunch of things, so that "make test" actually passes on the facilitator. You can see it here:
And if I understood correctly, a raw TCP-TCP proxy can proxy anything including an obfs|websocket stream, assuming that it's valid to cut out the middle man in our websocket transport.
This is not true, for perhaps a subtle reason. You may be thinking of the flashproxy client as being a WebSocket client talking to a WebSocket server--but that is not the case. Both the flashproxy client and the Tor relay are WebSocket servers, and the proxy is a client in both directions. Yes, if a proxy is just tunnelling bytes, a client acting as a WebSocket client could tunnel through the proxy and talk to a WebSocket server, but that's not how the model works. These outermost transports are always client–server in the proxy–client and proxy–relay directions.
The transport on the outside means "the proxy has to be able to establish this kind of connection." A proxy that can open a TCP connection doesn't necessarily have code to establish a WebSocket connection. Not to mention that WebRTC is UDP-based; a TCP proxy can't tunnel just anything. I think it makes sense to leave "tcp" explicit for this reason; it means the proxy has to be capable of making a normal TCP connection. After all, something like obfs3|sctp might make a lot of sense.
What you said would be true if a flash proxy worked like an ordinary proxy, receiving a connection from a client and forwarding it to the server. But a flash proxy uses a connect-back model.
So what really matters, is not the "outermost layer", but a "suffix constraint" for each proxy, which must be matched against the full transport chain.
What you say here is true, but for the sake of simplicity I want to deliberately ignore full generality and insist that the proxy speak only the outermost layer.
options.relays should be indexed not by complete transport chains, but by transport chains excluding their last element. It should be possible for an obfs3|websocket client to talk to an obfs3|tcp relay, if there is a proxy that speaks both websocket and tcp. To be specific you should key by the tuple ("obfs3",) and not ("obfs3", "websocket").
Hm, would this functionality work currently?
To have an obfs3|websocket client talk to an obfs3|tcp relay, doesn't the flashproxy have to first strip off the websocket frame? Does this happen currently? It was my impression that there is a bridge-side websocket transport that strips off the websocket frames.
Yes, the way it works currently is that the proxy strips off the WebSocket container from the client, and adds its own WebSocket container again to the relay. The WebSocket API doesn't even give us visibility into the raw WebSocket stream; all we see are the bytes that are inside it.
Remember, a flash proxy isn't tunnelling a client–server WebSocket connection between client and relay. There isn't even any WebSocket client code in flashproxy-client. Rather, the flash proxy transports client–server Tor TLS between client and relay, and it does this by means of separate, independent WebSocket connections to client and relay.
(..) We will need to pass multiple bridges [to facilitators] in the future, so we might want to use a config file, where we put our bridges and annotate them with the transports they support. Possible example of such a config file:
{{{
websocket 1.2.3.4:5555
webrtc 1.2.3.4:6555
obfs2-websocket 1.2.3.5:5555
}}}
This might not scale well, since each bridge will need to distribute this information to every facilitator. Is there any way we can have the facilitator read this information from the bridge instead? From pt-spec.txt:
{{{
363 Bridges use transport lines in their extra-info documents to
364 advertise their pluggable transports:
365
366 transport SP <methodname> SP address:port [SP arglist] NL
}}}
Is the facilitator able to read this information?
In the future, perhaps, the facilitator will be able to automatically pull relay descriptors from the consensus, so you don't have to set them up manually. But setting them up manually is what we should do in the short and maybe even long term. The facilitator doesn't need to know about every relay. The only reason to configure lots of relays is to deal with load--currently, one websocket relay is enough to handle the load. It's not like the case with bridges where you need a lot of them to make them hard to enumerate, apart from their capacity.
For what it's worth, latest branches at this point are:
Facilitator: bug9349_server_side_draft
flashproxy-client: bug9349_client_side
flashproxy: bug9349_proxy_second_take
I merged bug9349_proxy_second_take. As [comment:25 Ximin observes], we'll also need something that walks the query string keys of the facilitator response, to look for client-<transport> keys, somewhere around here.
I'm going to set up a public test facilitator where we can test the server branch.
I'm not going to worry too much about the client side for now; the only thing we need is a modified registration, and that's easy to temporarily hack in when we have a running facilitator. I think it will be easy to merge at that point.
And if I understood correctly, a raw TCP-TCP proxy can proxy anything including an obfs|websocket stream, assuming that it's valid to cut out the middle man in our websocket transport.
This is not true, for perhaps a subtle reason. You may be thinking of the flashproxy client as being a WebSocket client talking to a WebSocket server--but that is not the case. Both the flashproxy client and the Tor relay are WebSocket servers, and the proxy is a client in both directions. Yes, if a proxy is just tunnelling bytes, a client acting as a WebSocket client could tunnel through the proxy and talk to a WebSocket server, but that's not how the model works. These outermost transports are always client–server in the proxy–client and proxy–relay directions.
The transport on the outside means "the proxy has to be able to establish this kind of connection." A proxy that can open a TCP connection doesn't necessarily have code to establish a WebSocket connection. Not to mention that WebRTC is UDP-based; a TCP proxy can't tunnel just anything. I think it makes sense to leave "tcp" explicit for this reason; it means the proxy has to be capable of making a normal TCP connection. After all, something like obfs3|sctp might make a lot of sense.
What you said would be true if a flash proxy worked like an ordinary proxy, receiving a connection from a client and forwarding it to the server. But a flash proxy uses a connect-back model.
Ok, understood.
So what really matters, is not the "outermost layer", but a "suffix constraint" for each proxy, which must be matched against the full transport chain.
What you say here is true, but for the sake of simplicity I want to deliberately ignore full generality and insist that the proxy speak only the outermost layer.
In this case, I think we ought to reconsider the chaining syntax, and probably stop using term "chain" at all. They strongly suggest an abstract model which is not consistent with the model you're proposing - which would be just a prefix/suffix pair model.
The prefix/suffix ought to be opaque strings such that the separation is unambiguous (as opposed to a chain "a|b|c" where it's ambiguous where to split it into two). Then the matching algorithm remains as I described in the above post, but it's much easier to implement since each client/server transport has exactly one possible prefix-suffix separation.
edit: to clarify, I understand that defining the suffix (as you do) to be "the last component" is unambiguous, but the nature of the chain syntax suggests extensions to this scheme (as what I did), but these extensions are not compatible with your model. So this is more of a "understandability for other people" rather than a semantic change.
(on a side note, it would be possible to support the arbitrary-suffix syntax, if we recognised that certain protocols have a client/server directionality, but of course that is too complex for the time being. my point though, is that anything that suggests arbitrary composition / chaining of protocols will need to take this stuff into account, so it's best not to suggest chaining capabilities if we are far from being able to support them.)
Ximin and I had a discussion and came up with an idea for what I think is a better syntax. Instead of embedding information into parameter names, let's just separate multiple registrations by newlines:
{{{
client=1.2.3.4:1000&transport=websocket
client=1.2.3.4:2000&transport=obfs3%7cwebsocket
client=1.2.3.4:3000&transport=obfs3%7ctcp
}}}

The [facilitator code currently in master](https://gitweb.torproject.org/flashproxy.git/blob/cc0a4d12a18a458acbaaabe4db5f4f9eb6544d0b:/facilitator/facilitator-reg-daemon#l79) looks for a line starting with "client=", so we were already using a line-oriented format, even if not documented. We weren't documented to use URL query string syntax either, but I think it is a reasonable (and backward-compatible) choice.

This is the last "feature" type change I think we should make. (I think it is worth making.) I attach a patch to do it.
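For illustration, a receiver might parse such a body like this (a sketch, not the attached patch; the implied default of websocket matches the backward-compatible behaviour discussed elsewhere in this ticket):
{{{
import urlparse

def parse_registration_body(body):
    # Each non-empty line is an independent URL query string describing
    # one (client, transport) registration.
    regs = []
    for line in body.splitlines():
        line = line.strip()
        if not line:
            continue
        qs = urlparse.parse_qs(line, strict_parsing=True)
        regs.append((qs["client"][0], qs.get("transport", ["websocket"])[0]))
    return regs

body = """client=1.2.3.4:1000&transport=websocket
client=1.2.3.4:2000&transport=obfs3%7cwebsocket
client=1.2.3.4:3000&transport=obfs3%7ctcp"""
print(parse_registration_body(body))
# [('1.2.3.4:1000', 'websocket'), ('1.2.3.4:2000', 'obfs3|websocket'),
#  ('1.2.3.4:3000', 'obfs3|tcp')]
}}}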
This is my Endpoints implementation/refactoring plus a load of tests. Behavioural changes on top of the refactoring:
don't match ipv6 proxies to ipv4 servers <- this is a break from the old code that seemed to make sense, since we can't assume that IPv4 servers support IPv6. But this also means that a proxy that supports both IPv4/IPv6 can't connect an IPv4 client / IPv6 server - do we want to support this use-case?
I also got rid of the "chain" terminology and tweaked the fac.py transaction interface to suit this.
I'll review and integrate the url-param stuff next. But just from scanning the previous post - is the newline really necessary? I thought URL params allowed for multiple key/values, so we could just do zip(params.getlist("client"), params.getlist("transport")) ?
As I understand, there are 3 input sources to the facilitator:
GET / for proxy polls, going through facilitator.cgi
POST / for cleartext client registrations, going through facilitator.cgi
various rendezvous channels for encrypted client registrations, going through facilitator-reg-daemon
The patch for the newline-based client registration syntax only changes facilitator-reg-daemon - so am I correct in thinking that facilitator.cgi also needs to be updated?
In that case, we should probably put the registration-parsing code in fac.py. I'll be proceeding with this.
Merging #6810 (closed) first would also make this easier to implement on the client side - at the moment, asn's bug9349_client branch has some duplicated code. But we can do that later, since what we're doing with the facilitator/proxy is backwards-compatible with old clients.
And to refresh what was said before, we'll also need to change the proxy to check that the facilitator's response sets transport=websocket for both the client and relay. To match the urlparam syntax, I'd suggest changing it to client-addr=$addr&client-transport=$transport, rather than the current client-<transport>=<addr>.
Here is my code implementing the url-param syntax stuff. It builds on top of endpoints, since it takes advantage of some of the encapsulated data structures introduced in that branch.
In the interests of a more consistent language for representing an (address,transport) pair, I tweaked dcf's suggested syntax above slightly:
client registration requests now look like "client-addr=_&client-transport=_"
facilitator responses now look like "client-addr=_&client-transport=_&relay-addr=_&relay-transport=_"
the transactional representation in fac.py now looks like "CLIENT addr=_&transport=_" and "RELAY addr=_&transport=_", reusing the qs parse/format code
(I changed the facilitator response, since the reason we did the syntax in the first place was to get rid of dynamic keys in the param list. I added "-addr" so that the transactional representation is sane and constant.)
Old client registrations of the form "client=_" still work, with implied transport=websocket.
At the moment, simply specifying "client-addr=_" will raise an error, but I can have client-transport default to "websocket" if that is preferred.
The addition of "-addr" does cause one slight untidiness - previously, the facilitator gave an empty "client=" value as a response to mean "no registrations available". This sort of fit into the old syntax, but is not really consistent with the new syntax. The old client= behaviour remains; we could change it to "look more" like the new syntax; but actually IMO we should pick an entirely different way to communicate this, since it is an exceptional status for the proxy.
Hmm, actually thinking about it, we can get rid of "-addr" with only minor tweaks to what I coded. The reason I added it, is because it doesn't generalise if we want to represent a contextless (addr,transport) pair - removing the context (the client- prefix) would give us something like "=&transport=". But currently we do always have context, and the advantage with removing it is that we remain backward-compatible with old proxies, without any extra code.
Let me know which option you prefer.
Another tweak we could do is remove the transport prefixes from the facilitator response. It would add a little more complexity, but would follow the "no more information than required" principle more closely.
edit: actually it would save us complexity on the proxy side, so I think I am going to go ahead and just do it.
As I understand, there are 3 input sources to the facilitator:
GET / for proxy polls, going through facilitator.cgi
POST / for cleartext client registrations, going through facilitator.cgi
various rendezvous channels for encrypted client registrations, going through facilitator-reg-daemon
The patch for the newline-based client registration syntax only changes facilitator-reg-daemon - so am I correct in thinking that facilitator.cgi also needs to be updated?
Yes, you're right. All registrations go through facilitator-reg-daemon except for one special case: POST requests directly to facilitator.cgi.
Here is my code implementing the url-param syntax stuff. It builds on top of endpoints, since it takes advantage of some of the encapsulated data structures introduced in that branch.
In the interests of a more consistent language for representing an (address,transport) pair, I tweaked dcf's suggested syntax above slightly:
client registration requests now look like "client-addr=_&client-transport=_"
facilitator responses now look like "client-addr=_&client-transport=_&relay-addr=_&relay-transport=_"
(I changed the facilitator response, since the reason we did the syntax in the first place was to get rid of dynamic keys in the param list. I added "-addr" so that the transactional representation is sane and constant.)
Old client registrations of the form "client=_" still work, with implied transport=websocket.
We should try to be backward compatible in this case. We won't be able to update all proxies instantly. As you say in comment:36, let's have client and relay in place of client-addr and relay-addr. client-transport and relay-transport are good names.
the transactional representation in fac.py now looks like "CLIENT addr=_&transport=_" and "RELAY addr=_&transport=_", reusing the qs parse/format code
I really don't like this :( Here's what I see the facilitator returning to a GET:
OK CLIENT="client-transport=websocket&client=1.1.1.1%3A9000" RELAY="relay-transport=websocket&relay=0.0.1.0%3A1" CHECK-BACK-IN="600"
I want the internal facilitator protocol to be very simple and not have embedded syntaxes.
Is there something wrong with the straightforward syntax?
OK CLIENT="1.1.1.1:9000" CLIENT-TRANSPORT="websocket" RELAY="0.0.1.0:1" RELAY-TRANSPORT="websocket" CHECK-BACK-IN="600"
The addition of "-addr" does cause one slight untidiness - previously, the facilitator gave an empty "client=" value as a response to mean "no registrations available". This sort of fit into the old syntax, but is not really consistent with the new syntax. The old client= behaviour remains; we could change it to "look more" like the new syntax; but actually IMO we should pick an entirely different way to communicate this, since it is an exceptional status for the proxy.
client= is at least an unambiguous and backward-compatible way to indicate that there is no client registration. In the very early days we might have used a 404 to do it, but stopped because that caused a problem with Flash's HTTP retriever or XMLHttpRequest or both. I don't care too much as long as it works. If you can think of a better way, that's fine.
Getting no client happens much more frequently than getting a client.
We should try to be backward compatible in this case. We won't be able to update all proxies instantly. As you say in comment:36, let's have client and relay in place of client-addr and relay-addr. client-transport and relay-transport are good names.
OK, I've pushed this to the branch above.
the transactional representation in fac.py now looks like "CLIENT addr=_&transport=_" and "RELAY addr=_&transport=_", reusing the qs parse/format code
I really don't like this :( Here's what I see the facilitator returning to a GET:
{{{
OK CLIENT="client-transport=websocket&client=1.1.1.1%3A9000" RELAY="relay-transport=websocket&relay=0.0.1.0%3A1" CHECK-BACK-IN="600"
}}}
I want the internal facilitator protocol to be very simple and not have embedded syntaxes.
Is there something wrong with the straightforward syntax?
{{{
OK CLIENT="1.1.1.1:9000" CLIENT-TRANSPORT="websocket" RELAY="0.0.1.0:1" RELAY-TRANSPORT="websocket" CHECK-BACK-IN="600"
}}}
We could do this and I actually even already wrote code that does exactly that. But it made the code shorter to reuse the qs-parsing stuff. Is it important for the transactional protocol to look nice? I think of it more as "black-box representation" rather than "embedded syntax".
The addition of "-addr" does cause one slight untidiness - previously, the facilitator gave an empty "client=" value as a response to mean "no registrations available". This sort of fit into the old syntax, but is not really consistent with the new syntax. The old client= behaviour remains; we could change it to "look more" like the new syntax; but actually IMO we should pick an entirely different way to communicate this, since it is an exceptional status for the proxy.
client= is at least an unambiguous and backward-compatible way to indicate that there is no client registration. In the very early days we might have used a 404 to do it, but stopped because that caused a problem with Flash's HTTP retriever or XMLHttpRequest or both. I don't care too much as long as it works. If you can think of a better way, that's fine.
Getting no client happens much more frequently than getting a client.
OK, I've stuck with this for now because of the "-addr" removal. I might think about it a bit more if I get around to it.
I have a working facilitator/proxy up and running at siteb.
You can test it out by running the obfs-flash client from #7167 (moved). Only, instead of visiting the proxy link mentioned in [comment:17:ticket:7167] with a hard-coded client/relay, do this:
{{{
-----BEGIN PUBLIC KEY-----
MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEA0tb2qhhQ8xJ0fOqw9XQBR83zRQBiK76q0Q4zrsk5XE6Vm/+FYr3Cww5WTwSgv/HvY2TWJdU2I4H8eGeCXmIo42NXxwqHJTPTuEnXNgRP/Yob8r8zV5shQGe74nQs8m6p70FK0ic/i5ChesabtgLlMldsD1VtEJjswQEdobbcnXEdPkxns82fakRw31mdSzQKjxReBBm1epC7fNMUhJ27rDAWmWmSiVQoPlzIqlJwbiNzWNeqKepFryZvaVNpU4kEns9JoK0mujhKQOeNUAnwKuy8g7O0s0HZjdB/q7xO8gBzpkha/vSY+BZ8yqa0kqvvcnOZmCY8jivxTv4bZNZIhwIDAQAB
-----END PUBLIC KEY-----
}}}
c) Run this to send an encrypted registration. (Eventually we'll merge the bug9349_client branch and have obfs-flash set this automatically, and this and the previous steps won't be necessary.)
I tweaked the initial poll time to 20 seconds and made it visit /fac instead to get around a DNS block I experienced earlier.
Wait a few seconds, then (if your ISP/firewall isn't blocking anything) obfs-flash-client should connect, and the browser proxy will say something like Facilitator: got client:{ host: "(your IP)", port: 9000 } relay:{ host: "173.255.221.44", port: 9500 }.
Next I will incorporate the Endpoints simplifications we talked about on IRC.
For "release early; release often" purposes, I pushed a branch with changes on the client-side applications to support pluggable transports and the new client registrations.
It can be found in https://git.torproject.org/user/asn/flashproxy.git under branch bug9349_client_side. I've only briefly tested it.
Thanks George, you're awesome.
The --help text for --transport should say what the default value is.
Set
{{{
transport = DEFAULT_TRANSPORT
}}}
in the global options class, not in the code just before option parsing.
{{{
+ transport_part = [""] # default to empty string
}}}
I think that should rather be an empty list; otherwise I'm pretty sure the helpers get an extra empty argument.

I saw you moved some duplicated code into a module flashproxy_reg_utils. Please see #6810 for more about reducing code duplication. I'm afraid doing it this way will break `make install`. Alexandre tried breaking out a module in #6810, and it didn't quite work. So if you can separate the deduplication part of the patch, I think it's better for the purpose of this ticket.
I squashed and rebased and made these suggested changes in https://git.torproject.org/user/dcf/flashproxy.git branch bug9349_client_side. Currently these four commits:
{{{
a86d340 Add --transport to flashproxy-client.
9a02028 Add --transport option to reg programs.
3ec8472 Send client-transport=websocket in registrations.
c3029c9 Tolerate other URL parameters in client regisration lines.
}}}
This code is using the newest registration syntax.

My hope is to merge these client changes right away, because then the changes to the client are done. c3029c9 is a small change to facilitator-reg-daemon so it doesn't freak out at the `client-transport` field. It's sufficient to make the reg-url step in comment:40 work with no further code changes, it remains backward compatible, and it's forward compatible with what we plan to implement with respect to multiple registrations.

That is, I expect c3029c9 to be replaced by other code soon, but I'd like to merge and deploy it now, so that we can merge the client changes and have one less thing to worry about.
Seems good for now, in that it probably won't conflict with my other branches. What is currently deployed in siteb is my merge-all branch on github that also pulls in common-sub (#6810 (closed)) and fac-build (#9668 (closed)), and I can probably merge this client code too if I first revert c3029c9.
I am also using git rerere, and can send over the resolution state files if anyone wants to try it out themselves. However it's not perfect and you still need to resolve the conflict in facilitator-howto.txt by hand, which just involves copying the "security setup" section from doc/facilitator-howto.txt (unfortunately detected as "deleted" from fac-build) into the relevant place in facilitator/doc/facilitator-howto.txt.
Seems good for now, in that it probably won't conflict with my other branches. What is currently deployed in siteb is my merge-all branch on github that also pulls in common-sub (#6810 (closed)) and fac-build (#9668 (closed)), and I can probably merge this client code too if I first revert c3029c9.
Excellent, thanks. Client-side --transport code is now merged into master. I think that leaves us with only facilitator changes to finish and merge.
I've done the Endpoint simplifications and updated the url-param tweaking code. It's up and running at siteb and I've tested it with both flashproxy-client and obfs-flash-client using automatic registration via reg-http.
My merge-next branch merges all of that into the current master - if you decide to merge separately, you can diff the result against that branch to see if there's anything either of us did wrong.
I looked at Ximin's bug9349_server_endpoints branch. The changes were too many for thorough reviewing, but all in all it looks good.
It would feel better with a bit more docs (match() needs a function doc, _findPrefixesForSuffixes too. It's not at all obvious what they do.) Also, an ASCII-art-like doc on how Endpoints and all the other classes fit together would be nice.
Let's not have these changes block the deliverable though, if it's too much effort we can create tickets and fix them afterwards.