The idea of being able to connect flashproxy clients to relays that support specific transports has occurred in many places, like comment:5:ticket:7944, comment:2:ticket:5578, and lately comment:17:ticket:7167.
This would allow clients to speak obfs3-over-websocket instead of the current websocket protocol.
To accommodate this, clients should be able to signal to the facilitator that they need bridges that support specific transports. The facilitator should be able to search its database for such bridges and point its users to them.
Items that need to be done to achieve the goal of this ticket:
Implement advanced client registrations, where clients can also specify a transport when they ask for a flashproxy. David in comment:2:ticket:5578 said that registrations currently look like this:
client=1.2.3.4:9000
and they could be changed to look similar to this:
client=1.2.3.4:9000&client-webrtc=1.2.3.4:10000
Facilitators should be able to have multiple bridges that they can suggest to flashproxies. The facilitator currently reads a single bridge using the --address CLI switch (https://gitweb.torproject.org/flashproxy.git/blob/e85b4a8ee5d603c34fc63ef3c9878ae06378da94:/facilitator/init.d/facilitator#l19). We will need to pass multiple bridges in the future, so we might want to use a config file, where we put our bridges and annotate them with the transports they support. Possible example of such a config file:
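{{{
websocket 1.2.3.4:5555
webrtc 1.2.3.4:6555
obfs2-websocket 1.2.3.5:5555
}}}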
The facilitator should parse the new client-registration, match it up with a registered bridge, and return the bridge as part of its response. We might also want to make a response for "transport name not known".
All of the tasks above should be spec'ed.
Doesn't look too hard!
I attach a patch for doc/design.txt of flashproxy to document the new advanced client registrations.
David, do you also want documentation for the format of the file that holds bridges for the facilitator? My plan is to make it a simple YAML file with transport -> address mappings. Where should I document this? Is doc/design.txt the right place?
I attach a patch for doc/design.txt of flashproxy to document the new advanced client registrations.
Thank you, my man. design.txt is a good place to document it. Beware that design.txt is somewhat out of date. Here are my thoughts.
There are three things to consider: the set of transport chains the client supports, the set of transport chains the proxy supports, and the set of transport chains the relay supports.
By "transport chain" I mean something like obfs3|websocket. What is nice about this is that the proxy only needs to support the outermost layer. A proxy that supports websocket can connect a client and server that speak obfs3|websocket (or rot13|obfs3|websocket etc.), without needing to know that there is obfs3 underneath. Everything up to the last component in the chain must be identical between client and relay; e.g., an obfs3|websocket client would not be able to talk to a rot13|websocket relay, even if there is a websocket proxy between them.
To give three examples of realistic outermost layers, we have websocket, webrtc, and plain tcp. We can imagine, for instance, a JavaScript proxy capable of speaking both websocket and webrtc. Such a proxy could connect an obfs3|webrtc client with an obfs3|websocket relay. A standalone proxy might be capable of making plain tcp connections; it could connect an obfs3|tcp client (obfs3 with a tcp listener shim) with any old obfs3 relay.
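To make the rule concrete, here is a minimal sketch (illustrative Python, not flashproxy code), assuming chains are represented as lists of transport names:
{{{
# Client and relay chains must be identical up to (but not including) the
# outermost component; the proxy only needs to speak the outermost component
# on each side.
def compatible(client_chain, relay_chain, proxy_transports):
    return (client_chain[:-1] == relay_chain[:-1]
            and client_chain[-1] in proxy_transports
            and relay_chain[-1] in proxy_transports)

# A proxy speaking websocket and webrtc can connect an obfs3|webrtc client
# to an obfs3|websocket relay:
assert compatible(["obfs3", "webrtc"], ["obfs3", "websocket"],
                  {"websocket", "webrtc"})
# But an obfs3|websocket client cannot talk to a rot13|websocket relay:
assert not compatible(["obfs3", "websocket"], ["rot13", "websocket"],
                      {"websocket"})
}}}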
So we need a way for:
clients to say, "these are the transport chains I support, and the address I'm listening on for each one."
proxies to say, "these are the outer transports I support."
the facilitator to know a static list of relays, with their transport chains and the listening address for each.
First I propose to deprecate the existing client= notation and make it synonymous with client-websocket=.
Let's consider the use case of a client that supports websocket, obfs3|websocket, and obfs3|tcp. It may send a registration message like,
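For illustration only (made-up addresses, with the pipe character URL-encoded as %7C), such a registration might look like:
{{{
client-websocket=1.2.3.4:9000&client-obfs3%7Cwebsocket=1.2.3.4:9001&client-obfs3%7Ctcp=1.2.3.4:9002
}}}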
Proxies send some information in their GET request. (Currently just a protocol revision number and a list of clients, see here.) So the proxy might send its list of supported transports:
GET /r=1&transport=websocket&transport=webrtc HTTP/1.1
Then the facilitator needs to find a client and a relay where 1) the components in the transport chain are the same up to the last component, and 2) the last component is websocket or webrtc. Let's say it finds an obfs3|websocket relay at 10.10.10.10:9902. Then it can send a reply to the proxy:
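Purely for illustration (reusing the relay address above, with the chain embedded in the parameter name as the next paragraph discusses), such a reply might look like:
{{{
client=1.2.3.4:9000&relay-obfs3%7Cwebsocket=10.10.10.10:9902
}}}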
Embedding the transport chain in the "name" part of application/x-www-form-urlencoded name-value pairs is kind of an abuse, but it's not too bad. Perhaps there is a better way to do it. I originally used application/x-www-form-urlencoded because it was already implemented in JavaScript. If I were starting over, I would probably use JSON as there are standard functions to parse JSON in browsers and lots of other languages support it.
I think, because of the "outermost transport" thing, that we will have to specify transport chains as parseable strings (for instance using the pipe notation I used above), and not opaque identifiers like "obfs3_flash" or "obfs-in-websocket".
I agree a config file to specify a list of known relays and their transports is a good idea.
I attached another design.txt patch. This one also specifies the new flashproxy poll format.
I understand what you say about the outermost transport thing. I usually think of this as the transport layer (in contrast with obfs3 etc. which I think as obfuscation layer or presentation layer or something like this).
BTW, do you think I should also specify the way that the facilitator should handle flashproxy polls that include transports? As I see it, when the facilitator gets a flashproxy poll that includes transport X, it should:
a) See the client registrations that use transports, and see if any of them have X as their outermost transport. To do this, we will need to modify get_reg_for_proxy() and RegSet (and maybe more stuff).
b) Then it needs to find a registered bridge that supports the transport chain that the client registration asked for. We will need a config file containing bridges and some utility functions to do this.
c) Finally it needs to send the new-style response to flashproxy that contains the client, the relay and the transports they support. To do this, we will need to modify fac.py:get_reg() or something like that.
I'm not sure I understand the way RegSets work. What's this tier business? Should I make RegSets transport-aware or would you prefer to do this in another way?
Also, is the IPC mechanism of flashproxy documented somewhere (the FROM, PUT etc. commands that are passed around?)
Finally, I'm fine with using the pipe symbol as the transport separator (transport names are C identifiers btw).
(I might also need some tips on testing/debugging the facilitator.)
I attached another design.txt patch. This one also specifies the new flashproxy poll format.
I agree with this patch.
BTW, do you think I should also specify the way that the facilitator should handle flashproxy polls that include transports? As I see it, when the facilitator gets a flashproxy poll that includes transport X, it should:
a) See the client registrations that use transports, and see if any of them have X as their outermost transport. To do this, we will need to modify get_reg_for_proxy() and RegSet (and maybe more stuff).
b) Then it needs to find a registered bridge that supports the transport chain that the client registration asked for. We will need a config file containing bridges and some utility functions to do this.
c) Finally it needs to send the new-style response to flashproxy that contains the client, the relay and the transports they support. To do this, we will need to modify fac.py:get_reg() or something like that.
I'm not sure I understand the way RegSets work. What's this tier business? Should I make RegSets transport-aware or would you prefer to do this in another way?
I think your understanding is correct. We don't have to specify the internal steps taken by the facilitator, only that it returns to the proxy a client and relay address compatible with one of the proxy's offered transports.
We say that the client with the fewest proxies is the one that should be served next. A reg is put in the tier equal to the number of proxies it currently has. You can get the next client in O(1) by popping the lowest non-empty tier. Likewise moving between tiers is O(1). It could also be implemented as e.g. a priority queue.
RegSet is just a bag of registrations, that knows how to extract the registration with the highest priority.
Currently we have get_reg_for_proxy--the only thing we use the proxy address for is to decide IPv4 versus IPv6. This is handled by having two instances of RegSet: REGS_IPV4 and REGS_IPV6. So maybe we can have one instance of RegSet for each outermost transport, and get_reg_for_proxy will also get the list of transports the proxy supports.
We can check, at client registration time, whether the client has any known matching relays (matching in the sense that they have a compatible transport chain). Otherwise we just drop the useless registration. That way we can assume that all client regs have a relay match, and are only waiting for a compatible proxy to appear. Suppose a websocket proxy appears, we consult the websocket RegSet and find the highest-priority websocket client, then do a fast lookup to find the relay it should be matched with.
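A rough sketch of that idea (hypothetical code, not the actual fac.py; only the names RegSet and get_reg_for_proxy come from the existing code):
{{{
class RegSet(object):
    """A bag of registrations, tiered by how many proxies each one has."""
    def __init__(self):
        self.tiers = []  # tiers[n] holds regs currently served by n proxies

    def add(self, reg, nproxies=0):
        while len(self.tiers) <= nproxies:
            self.tiers.append([])
        self.tiers[nproxies].append(reg)

    def fetch(self):
        # Pop from the lowest non-empty tier: the client with the fewest
        # proxies is the one served next.
        for tier in self.tiers:
            if tier:
                return tier.pop(0)
        return None

# One RegSet per outermost transport, instead of just REGS_IPV4/REGS_IPV6.
REGS = {"websocket": RegSet(), "webrtc": RegSet()}

def get_reg_for_proxy(proxy_transports):
    """Return a registration whose outermost transport the proxy speaks."""
    for transport in proxy_transports:
        regset = REGS.get(transport)
        reg = regset.fetch() if regset is not None else None
        if reg is not None:
            return reg
    return None
}}}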
Also, is the IPC mechanism of flashproxy documented somewhere (the FROM, PUT etc. commands that are passed around?)
Sorry, it is hardly documented. The doc comment on parse_transaction shows the syntax. My goal was to make the protocol hard to implement incorrectly. Quoting of strings is obligatory. A PUT transaction currently looks like:
PUT CLIENT="1.1.1.1:1111" FROM="2.2.2.2:2222"
PUT happens when a client makes an HTTP POST registration request. The FROM part is not used currently; I intended it to be used for rate limiting, or to allow trusted registrants not to be bound by rate limiting. A GET transaction is
GET FROM="3.3.3.3:3333"
Here, FROM is the proxy's address, and it is what gets passed to get_reg_for_proxy.
Responses have the same format. They can look like
OK CLIENT="1.1.1.1:1111" RELAY="4.4.4.4:4444" CHECK-BACK-IN="600"
or
NONE CHECK-BACK-IN="600"
(I might also need some tips on testing/debugging the facilitator.)
doc/facilitator-howto.txt tells how it is set up. You can skip some steps for your own testing, for example you don't need an SSL cert, and you don't have to use Apache, you can use any simple web server capable of serving CGI. You don't need to run facilitator-email-poller nor facilitator-reg-daemon; just use the HTTP rendezvous and flashproxy-reg-http for testing.
Some programs allow you to override the default public facilitator. For example use flashproxy-reg-http -f http://localhost:8000.
Try the -d option to facilitator to log to stdout.
BTW, design.txt -- which I changed -- only specifies the HTTP registration format; it doesn't specify the format of the email registration etc. Should I assume that the body of the POST is the same as the body of the email (this seems to be the case in the code)?
Also, what about the OSS registration? Should we also change this to be transport-aware? Is its format specified somewhere?
For "release early; release often" purposes, I pushed a branch with changes on the client-side applications to support pluggable transports and the new client registrations.
It can be found in https://git.torproject.org/user/asn/flashproxy.git under branch bug9349_client_side. I've only briefly tested it.
For "release early; release often" purposes, I pushed a branch with changes on the client-side applications to support pluggable transports and the new client registrations.
It can be found in https://git.torproject.org/user/asn/flashproxy.git under branch bug9349_client_side. I've only briefly tested it.
Thanks George, you're awesome.
The --help text for --transport should say what the default value is.
Set
{{{
transport = DEFAULT_TRANSPORT
}}}
in the global options class, not in the code just before option parsing.
{{{
+ transport_part = [""] # default to empty string
}}}
I think that should rather be an empty list; otherwise I'm pretty sure the helpers get an extra empty argument.
I saw you moved some duplicated code into a module flashproxy_reg_utils. Please see #6810 (closed) for more about reducing code duplication. I'm afraid doing it this way will break make install. Alexandre tried breaking out a module in #6810 (closed), and it didn't quite work. So if you can separate the deduplication part of the patch, I think it's better for the purpose of this ticket.
BTW, design.txt -- which I changed -- only specifies the HTTP registration format; it doesn't specify the format of the email registration etc. Should I assume that the body of the POST is the same as the body of the email (this seems to be the case in the code)?
Also, what about the OSS registration? Should we also change this to be transport-aware? Is its format specified somewhere?
Basically everything uses the format understood by facilitator-reg-daemon. This program listens on a socket and reads a base64-encoded ciphertext (See Handler.handle in facilitator-reg-daemon). Decrypted, the plaintext format appears to be newline-separated name-value pairs (check find_client_addr). I'm not sure why it's using this homebrew format and not www-url-encoded, which would be easier to handle with respect to escaping.
facilitator-reg-daemon exists as a separate process for privilege separation reasons. It's the only program that has to be able to read the facilitator's private key. When the email or appspot helpers get their base64 blob, they just pass it straight to facilitator-reg-daemon. Check url_reg in facilitator.cgi for how appspot is handled and handle_message in facilitator-email-poller for how email is handled.
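As a sketch of handling the plaintext format just described (illustrative only; the real parsing lives in find_client_addr):
{{{
# The decrypted registration plaintext is newline-separated name=value
# pairs, e.g. "client=1.2.3.4:9000".
def parse_plaintext_reg(plaintext):
    pairs = {}
    for line in plaintext.splitlines():
        line = line.strip()
        if not line:
            continue
        name, sep, value = line.partition("=")
        if not sep:
            raise ValueError("malformed registration line: %r" % line)
        pairs[name] = value
    return pairs
}}}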
At https://github.com/arlolra/flashproxy/compare/raw, I modified the standalone flashproxy to terminate the websocket connection and make tcp connections to generic relays not supporting the websocket pt.
This could be useful for testing the proxy transport chain and the modified facilitator from #7945 (closed).
At https://github.com/arlolra/flashproxy/compare/raw, I modified the standalone flashproxy to terminate the websocket connection and make tcp connections to generic relays not supporting the websocket pt.
This could be useful for testing the proxy transport chain and the modified facilitator from #7945 (closed).
Thanks for this. I think what we will want to do is build an abstraction layer for sockets, and then adapt both WebSocket and plain TCP to it.
We're not likely to adopt the model where you try to connect to some relay and then fall back to another transport if it fails. Instead, in #9349 (closed) we let the proxies tell the facilitator what transports they support, and the facilitator gives them an appropriate relay.
I think the idea of plain TCP between proxy and client is even more interesting than between proxy and relay.
Also pushed trivial flashproxy modifications in branch bug9349_proxy_side.
+ params.push(["transports", "websocket"]);
I think I prefer transport here, not transports. If there are multiple transports I want the query string to look like
transport=websocket&transport=webrtc
not
transports=websocket,webrtc
so that we don't have to invent our own format for list serialization, and concomitant worries about escaping within URL escaping. To use such a multiple-valued query string is easy, for example you can use Python's FieldStorage.getlist.
Likewise in facilitator transactions, I would like to see multiple TRANSPORT= instead of one TRANSPORTS=. You will have to add a new function in fac.py that is like param_first but it returns the whole list of values.
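For example (a sketch only; param_getlist is the hypothetical new helper, not an existing fac.py function):
{{{
import cgi

def proxy_transports_from_request():
    # For a poll like GET /?r=1&transport=websocket&transport=webrtc
    # this returns ["websocket", "webrtc"].
    fs = cgi.FieldStorage()
    return fs.getlist("transport")

def param_getlist(params, key):
    """Like param_first, but return every value for key.
    params: list of (key, value) pairs from a parsed transaction."""
    return [v for k, v in params if k == key]
}}}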
In the facilitator, let's break backward compatibility and redefine the -r option to be the name of the relay file to load.
Let's use a tuple to represent a transport chain internally--parse it with str.split("|") as soon as it's read, and format it with "|".join only just before output. Then get_outermost_transport(chain) is just chain[-1].
options.relays should be indexed not by complete transport chains, but by transport chains excluding their last element. It should be possible for an obfs3|websocket client to talk to an obfs3|tcp relay, if there is a proxy that speaks both websocket and tcp. To be specific you should key by the tuple ("obfs3",) and not ("obfs3", "websocket").
I don't want to make a distinction between "new-style" and "old-style" registrations. There is just one backward-compatible style. In your loop over fs.keys(), notice a key that is exactly client, and treat it the same as client-websocket.
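A sketch of how those pieces might fit together (hypothetical code, not the actual facilitator; the relay entries are only examples):
{{{
def parse_chain(spec):
    """Parse "obfs3|websocket" into the tuple ("obfs3", "websocket")."""
    return tuple(spec.split("|"))

def format_chain(chain):
    return "|".join(chain)

def get_outermost_transport(chain):
    return chain[-1]

# options.relays keyed by the chain *excluding* its last element, so an
# obfs3|websocket client can be matched with an obfs3|tcp relay as long as
# some proxy speaks both websocket and tcp.
relays = {
    ("obfs3",): [(("obfs3", "websocket"), "10.10.10.10:9902")],
    (): [(("websocket",), "173.255.221.44:9500")],
}

def client_regs_from_query(items):
    """items: (key, value) pairs from a registration query string.
    A bare "client" key is treated exactly like "client-websocket"."""
    regs = []
    for key, addr in items:
        if key == "client":
            key = "client-websocket"
        if key.startswith("client-"):
            regs.append((parse_chain(key[len("client-"):]), addr))
    return regs
}}}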
I'm also hoping you will address the client comments from comment:10.
My plan for merging is to first do the proxy, because that's trivial and doesn't require other changes, then merge the facilitator. We can then run client registrations manually to test obfs-flash.
Also pushed trivial flashproxy modifications in branch bug9349_proxy_side.
{{{
params.push(["transports", "websocket"]);
}}}
I think I prefer transport here, not transports. If there are multiple transports I want the query string to look like
{{{
transport=websocket&transport=webrtc
}}}
not
{{{
transports=websocket,webrtc
}}}
so that we don't have to invent our own format for list serialization, and concomitant worries about escaping within URL escaping. To use such a multiple-valued query string is easy, for example you can use Python's FieldStorage.getlist.
Likewise in facilitator transactions, I would like to see multiple TRANSPORT= instead of one TRANSPORTS=. You will have to add a new function in fac.py that is like param_first but it returns the whole list of values.
In the facilitator, let's break backward compatibility and redefine the -r option to be the name of the relay file to load.
Done. I did not know how to update the init script though.
Let's use a tuple to represent a transport chain internally--parse it with str.split("|") as soon as it's read, and format it with "|".join only just before output. Then get_outermost_transport(chain) is just chain[-1].
Done.
options.relays should be indexed not by complete transport chains, but by transport chains excluding their last element. It should be possible for an obfs3|websocket client to talk to an obfs3|tcp relay, if there is a proxy that speaks both websocket and tcp. To be specific you should key by the tuple ("obfs3",) and not ("obfs3", "websocket").
Not yet done.
I don't want to make a distinction between "new-style" and "old-style" registrations. There is just one backward-compatible style. In your loop over fs.keys(), notice a key that is exactly client, and treat it the same as client-websocket.
Done.
I'm also hoping you will address the client comments from comment:10.
Not yet done.
My plan for merging is to first do the proxy, because that's trivial and doesn't require other changes, then merge the facilitator. We can then run client registrations manually to test obfs-flash.
I pushed the facilitator changes in bug9349_server_side_draft.
I pushed the flashproxy changes (s/transports/transport) to bug9349_proxy_second_take.
options.relays should be indexed not by complete transport chains, but by transport chains excluding their last element. It should be possible for an obfs3|websocket client to talk to an obfs3|tcp relay, if there is a proxy that speaks both websocket and tcp. To be specific you should key by the tuple ("obfs3",) and not ("obfs3", "websocket").
Hm, would this functionality work currently?
To have an obfs3|websocket client talk to an obfs3|tcp relay, doesn't the flashproxy have to first strip off the websocket frame? Does this happen currently? It was my impression that there is a bridge-side websocket transport that strips off the websocket frames.
For what it's worth, latest branches at this point are:
Facilitator: bug9349_server_side_draft
flashproxy-client: bug9349_client_side
flashproxy: bug9349_proxy_second_take
(..) We will need to pass multiple bridges [to facilitators] in the future, so we might want to use a config file, where we put our bridges and annotate them with the transports they support. Possible example of such a config file:
This might not scale well, since each bridge will need to distribute this information to every facilitator. Is there any way we can have the facilitator read this information from the bridge instead? From pt-spec.txt:
{{{
363 Bridges use transport lines in their extra-info documents to
364 advertise their pluggable transports:
365
366 transport SP <methodname> SP address:port [SP arglist] NL
}}}
Is the facilitator able to read this information?
I think the idea of a transport chain needs work. I don't believe what has been said so far is precisely coherent:
the browser proxy communicates to client and server via TCP channels, regardless of the contents. Having "tcp" as a valid transport name like "obfs|tcp" doesn't make sense. "" (empty string) would be the appropriate name for the transport chain that carries raw user data.
I also don't understand this whole "outermost transport" thing, since the browser proxy just passes bytes and doesn't need to speak the outermost transport in order to do this.
However, in order to match client vs server, it needs to match the entire chain: a client that speaks "obfs|websocket" isn't going to be able to talk to a server that speaks "xxx|websocket".
Here is my understanding:
Certain types of PTs are what I'll call a "byte-transform" PT - i.e. at its heart, it transforms input bytes to some output bytes, and the underlying transport mechanism (TCP in this case) is undisturbed. obfsproxy is a "byte-transform" PT, but flashproxy isn't, since it does extra stuff to the underlying channel, so that the output of flashproxy cannot by fed into a "byte-transform" PT.
A transport chain only makes sense if each component in the chain is a byte-transform PT. A browser proxy can pass the data stream transparently, or it can strip off layers in order to adapt between client/servers:
a client of a|b|c can talk to a server a|b|c and the proxy doesn't need to do anything, just pass bytes
a client of a|b|c can talk to a server a|d|e but the proxy needs to apply the transformation e(d(b-inv(c-inv(_)))) to the data stream from the client, and vice-versa from the server. (This is essentially what [comment:12 arlolra] did in his "raw tcp" commit.)
(A more general framework generalises the idea of a "byte-transform" PT - each PT has an input interface, and an output interface, then you can chain PTs by matching input interfaces to output interfaces. But we can stick with byte-to-byte PTs for now.)
OK, I think I get the "outermost transport" thing now - a proxy running in a browser has to make use of something like websocket in order to talk to the client/server in the first place; OTOH a standalone proxy running on e.g. node.js can open raw sockets, like in arlolra's commit. And if I understood correctly, a raw TCP-TCP proxy can proxy anything including an obfs|websocket stream, assuming that it's valid to cut out the middle man in our websocket transport.
So what really matters, is not the "outermost layer", but a "suffix constraint" for each proxy, which must be matched against the full transport chain. In the case of a raw TCP-TCP proxy, this suffix constraint is empty, and therefore matches all transport chains.
So what really matters, is not the "outermost layer", but a "suffix constraint" for each proxy, which must be matched against the full transport chain. In the case of a raw TCP-TCP proxy, this suffix constraint is empty, and therefore matches all transport chains.
Continuing down this path then, instead of matching the "outermost layer", a totally generalised protocol would have each proxy declare its client-constraints [C1,C2,...] and server-constraints [S1,S2,...] to the facilitator, where each C/S is a string "t|t|..." of transport-chain suffixes, possibly the empty chain "" for a raw data stream. For the currently-implemented proxy, the client/server constraints would be ["websocket"]/["websocket"], and for arlolra's raw-TCP-capable proxy, they would be ["websocket"]/["websocket",""].
In order to match a client supporting transports [CT1,CT2,...] to a server supporting transports [ST1,ST2,...], the facilitator needs to find a proxy with client suffix-constraints [C1,C2,...] and server suffix-constraints [S1,S2,...] such that CTi == PREFIX + Ca == PREFIX + Sb == STj for some i,j,a,b,PREFIX, where:
i,j,a,b are indexes into the relevant lists for preciseness purposes
PREFIX is the opaque data that the proxy doesn't need to understand
Ca/Sb are the transformations that the proxy understands and can strip off / attach on. For the current default browser proxy, this would just be websocket/websocket.
CTi,STj is the underlying data that needs to be matched between the client / server.
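A rough sketch of that matching condition (illustrative Python; the names and data are made up, and transport chains are tuples with () as the empty suffix):
{{{
def strip_suffix(chain, suffix):
    """Return the PREFIX such that chain == PREFIX + suffix, or None."""
    n = len(suffix)
    if n == 0:
        return chain
    if chain[-n:] == suffix:
        return chain[:-n]
    return None

def find_match(client_transports, server_transports,
               proxy_client_constraints, proxy_server_constraints):
    """Find (CT, C, S, ST) with CT == PREFIX + C and ST == PREFIX + S."""
    for ct in client_transports:
        for c in proxy_client_constraints:
            prefix = strip_suffix(ct, c)
            if prefix is None:
                continue
            for s in proxy_server_constraints:
                st = prefix + s
                if st in server_transports:
                    return ct, c, s, st
    return None

# arlolra's raw-TCP-capable proxy: constraints ["websocket"]/["websocket", ""].
print(find_match(
    [("obfs3", "websocket")],              # client: obfs3|websocket
    [("obfs3",), ("obfs3", "websocket")],  # servers: obfs3 and obfs3|websocket
    [("websocket",)],                      # proxy client-constraints
    [("websocket",), ()]))                 # proxy server-constraints
}}}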
edit: the current proxy design treats client/server the same transport-wise, so we can combine the client/server suffix-constraints into one constraint that's used for both, then Ca==Sb and CTi==STj. This is currently implemented as the "transport" param in the client-facilitator request protocol, but I suggest renaming to "transport_suffix" to be much clearer.
Hopefully my brain dump has been clear enough to include in the documentation, so that users can understand what model we're using. It was useful for me, anyway. :p
options.relays should be indexed not by complete transport chains, but by transport chains excluding their last element. It should be possible for an obfs3|websocket client to talk to an obfs3|tcp relay, if there is a proxy that speaks both websocket and tcp. To be specific you should key by the tuple ("obfs3",) and not ("obfs3", "websocket").
As I understand it, this would require additional changes on the client side to parse the response from the facilitator and initiate connections via the correct engines (when we finally do support more than 2 types of relay e.g. out of websocket/webrtc/plain-tcp). At the moment the proxy only sends "transport=websocket", it doesn't parse the response and assumes websocket-websocket proxying.
options.relays should be indexed not by complete transport chains, but by transport chains excluding their last element. It should be possible for an obfs3|websocket client to talk to an obfs3|tcp relay, if there is a proxy that speaks both websocket and tcp. To be specific you should key by the tuple ("obfs3",) and not ("obfs3", "websocket").
Not yet done.
I've gone ahead and completely re-done this part of the code, including ripping out the RegSet class. Instead we now have a separate Endpoints class to keep track of both client and server endpoints (separately of course).
The added benefit is that we support MOAR THINGS. The obfs3|websocket -> obfs3|webrtc proxying now works in principle, as well as obfs3|websocket -> obfs3 (aka obfs3|raw-tcp) proxying for proxies that can open raw sockets. Additionally, the facilitator now has the ability to ask such raw proxies to work a websocket -> websocket tunnel, which wasn't possible with the previous "outermost-transport"-based design, which would have confined such raw proxies to work raw -> raw tunnels only.
I know this is quite disruptive (282 insertions, 205 deletions) so I've also heavily documented the new class as well as written quite a lot of tests for it. I've also fixed a bunch of things, so that "make test" actually passes on the facilitator. You can see it here:
And if I understood correctly, a raw TCP-TCP proxy can proxy anything including an obfs|websocket stream, assuming that it's valid to cut out the middle man in our websocket transport.
This is not true, for perhaps a subtle reason. You may be thinking of the flashproxy client as being a WebSocket client talking to a WebSocket server--but that is not the case. Both the flashproxy client and the Tor relay are WebSocket servers, and the proxy is a client in both directions. Yes, if a proxy is just tunnelling bytes, a client acting as a WebSocket client could tunnel through the proxy and talk to a WebSocket server, but that's not how the model works. These outermost transports are always client–server in the proxy–client and proxy–relay directions.
The transport on the outside means "the proxy has to be able to establish this kind of connection." A proxy that can open a TCP connection doesn't necessarily have code to establish a WebSocket connection. Not to mention that WebRTC is UDP-based; a TCP proxy can't tunnel just anything. I think it makes sense to leave "tcp" explicit for this reason; it means the proxy has to be capable of making a normal TCP connection. After all, something like obfs3|sctp might make a lot of sense.
What you said would be true if a flash proxy worked like an ordinary proxy, receiving a connection from a client and forwarding it to the server. But a flash proxy uses a connect-back model.
So what really matters, is not the "outermost layer", but a "suffix constraint" for each proxy, which must be matched against the full transport chain.
What you say here is true, but for the sake of simplicity I want to deliberately ignore full generality and insist that the proxy speak only the outermost layer.
options.relays should be indexed not by complete transport chains, but by transport chains excluding their last element. It should be possible for an obfs3|websocket client to talk to an obfs3|tcp relay, if there is a proxy that speaks both websocket and tcp. To be specific you should key by the tuple ("obfs3",) and not ("obfs3", "websocket").
Hm, would this functionality work currently?
To have an obfs3|websocket client talk to an obfs3|tcp relay, doesn't the flashproxy have to first strip off the websocket frame? Does this happen currently? It was my impression that there is a bridge-side websocket transport that strips off the websocket frames.
Yes, the way it works currently is that the proxy strips off the WebSocket container from the client, and adds its own WebSocket container again to the relay. The WebSocket API doesn't even give us visibility into the raw WebSocket stream; all we see are the bytes that are inside it.
Remember, a flash proxy isn't tunnelling a client–server WebSocket connection between client and relay. There isn't even any WebSocket client code in flashproxy-client. Rather, the flash proxy transports client–server Tor TLS between client and relay, and it does this by means of separate, independent WebSocket connections to client and relay.
(..) We will need to pass multiple bridges [to facilitators] in the future, so we might want to use a config file, where we put our bridges and annotate them with the transports they support. Possible example of such a config file:
{{{
websocket 1.2.3.4:5555
webrtc 1.2.3.4:6555
obfs2-websocket 1.2.3.5:5555
}}}
This might not scale well, since each bridge will need to distribute this information to every facilitator. Is there any way we can have the facilitator read this information from the bridge instead? From pt-spec.txt:
{{{
363 Bridges use transport lines in their extra-info documents to
364 advertise their pluggable transports:
365
366 transport SP <methodname> SP address:port [SP arglist] NL
}}}
Is the facilitator able to read this information?
In the future, perhaps, the facilitator will be able to automatically pull relay descriptors from the consensus, so you don't have to set them up manually. But setting them up manually is what we should do in the short and maybe even long term. The facilitator doesn't need to know about every relay. The only reason to configure lots of relays is to deal with load--currently, one websocket relay is enough to handle the load. It's not like the case with bridges where you need a lot of them to make them hard to enumerate, apart from their capacity.
For what it's worth, latest branches at this point are:
Facilitator: bug9349_server_side_draft
flashproxy-client: bug9349_client_side
flashproxy: bug9349_proxy_second_take
I merged bug9349_proxy_second_take. As [comment:25 Ximin observes], we'll also need something that walks the query string keys of the facilitator response, to look for client-<transport> keys, somewhere around here.
I'm going to set up a public test facilitator where we can test the server branch.
I'm not going to worry too much about the client side for now; the only thing we need is a modified registration, and that's easy to temporarily hack in when we have a running facilitator. I think it will be easy to merge at that point.
And if I understood correctly, a raw TCP-TCP proxy can proxy anything including an obfs|websocket stream, assuming that it's valid to cut out the middle man in our websocket transport.
This is not true, for perhaps a subtle reason. You may be thinking of the flashproxy client as being a WebSocket client talking to a WebSocket server--but that is not the case. Both the flashproxy client and the Tor relay are WebSocket servers, and the proxy is a client in both directions. Yes, if a proxy is just tunnelling bytes, a client acting as a WebSocket client could tunnel through the proxy and talk to a WebSocket server, but that's not how the model works. These outermost transports are always client–server in the proxy–client and proxy–relay directions.
The transport on the outside means "the proxy has to be able to establish this kind of connection." A proxy that can open a TCP connection doesn't necessarily have code to establish a WebSocket connection. Not to mention that WebRTC is UDP-based; a TCP proxy can't tunnel just anything. I think it makes sense to leave "tcp" explicit for this reason; it means the proxy has to be capable of making a normal TCP connection. After all, something like obfs3|sctp might make a lot of sense.
What you said would be true if a flash proxy worked like an ordinary proxy, receiving a connection from a client and forwarding it to the server. But a flash proxy uses a connect-back model.
Ok, understood.
So what really matters, is not the "outermost layer", but a "suffix constraint" for each proxy, which must be matched against the full transport chain.
What you say here is true, but for the sake of simplicity I want to deliberately ignore full generality and insist that the proxy speak only the outermost layer.
In this case, I think we ought to reconsider the chaining syntax, and probably stop using term "chain" at all. They strongly suggest an abstract model which is not consistent with the model you're proposing - which would be just a prefix/suffix pair model.
The prefix/suffix ought to be opaque strings such that the separation is unambiguous (as opposed to a chain "a|b|c" where it's ambiguous where to split it into two). Then the matching algorithm remains as I described in the above post, but it's much easier to implement since each client/server transport has exactly one possible prefix-suffix separation.
edit: to clarify, I understand that defining the suffix (as you do) to be "the last component" is unambiguous, but the nature of the chain syntax suggests extensions to this scheme (as what I did), but these extensions are not compatible with your model. So this is more of a "understandability for other people" rather than a semantic change.
(on a side note, it would be possible to support the arbitrary-suffix syntax, if we recognised that certain protocols have a client/server directionality, but of course that is too complex for the time being. my point though, is that anything that suggests arbitrary composition / chaining of protocols will need to take this stuff into account, so it's best not to suggest chaining capabilities if we are far from being able to support them.)
Ximin and I had a discussion and came up with an idea for what I think is a better syntax. Instead of embedding information into parameter names, let's just separate multiple registrations by newlines:
{{{
client=1.2.3.4:1000&transport=websocket
client=1.2.3.4:2000&transport=obfs3%7cwebsocket
client=1.2.3.4:3000&transport=obfs3%7ctcp
}}}

The [facilitator code currently in master](https://gitweb.torproject.org/flashproxy.git/blob/cc0a4d12a18a458acbaaabe4db5f4f9eb6544d0b:/facilitator/facilitator-reg-daemon#l79) looks for a line starting with "client=", so we were already using a line-oriented format, even if not documented. We weren't documented to use URL query string syntax either, but I think it is a reasonable (and backward-compatible) choice.

This is the last "feature" type change I think we should make. (I think it is worth making.) I attach a patch to do it.
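For illustration, a receiver might parse such a body like this (a sketch, not the attached patch; the implied default of websocket matches the backward-compatible behaviour discussed elsewhere in this ticket):
{{{
import urlparse

def parse_registration_body(body):
    # Each non-empty line is an independent URL query string describing
    # one (client, transport) registration.
    regs = []
    for line in body.splitlines():
        line = line.strip()
        if not line:
            continue
        qs = urlparse.parse_qs(line, strict_parsing=True)
        regs.append((qs["client"][0], qs.get("transport", ["websocket"])[0]))
    return regs

body = """client=1.2.3.4:1000&transport=websocket
client=1.2.3.4:2000&transport=obfs3%7cwebsocket
client=1.2.3.4:3000&transport=obfs3%7ctcp"""
print(parse_registration_body(body))
# [('1.2.3.4:1000', 'websocket'), ('1.2.3.4:2000', 'obfs3|websocket'),
#  ('1.2.3.4:3000', 'obfs3|tcp')]
}}}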
This is my Endpoints implementation/refactoring plus a load of tests. Behavioural changes on top of the refactoring:
don't match ipv6 proxies to ipv4 servers <- this is a break from the old code that seemed to make sense, since we can't assume that IPv4 servers support IPv6. But this also means that a proxy that supports both IPv4/IPv6 can't connect an IPv4 client / IPv6 server - do we want to support this use-case?
I also got rid of the "chain" terminology and tweaked the fac.py transaction interface to suit this.
I'll review and integrate the url-param stuff next. But just from scanning the previous post - is the newline really necessary? I thought URL params allowed for multiple key/values, so we could just do zip(params.getlist("client"), params.getlist("transport")) ?
As I understand, there are 3 input sources to the facilitator:
GET / for proxy polls, going through facilitator.cgi
POST / for cleartext client registrations, going through facilitator.cgi
various rendezvous channels for encrypted client registrations, going through facilitator-reg-daemon
The patch for the newline-based client registration syntax only changes facilitator-reg-daemon - so am I correct in thinking that facilitator.cgi also needs to be updated?
In that case, we should probably put the registration-parsing code in fac.py. I'll be proceeding with this.
Merging #6810 (closed) first would also make this easier to implement on the client side - at the moment, asn's bug9349_client branch has some duplicated code. But we can do that later, since what we're doing with the facilitator/proxy is backwards-compatible with old clients.
And to refresh what was said before, we'll also need to change the proxy to check that the facilitator's response sets transport=websocket for both the client and relay. To match the urlparam syntax, I'd suggest changing it to client-addr=$addr&client-transport=$transport, rather than the current client-<transport>=<addr>.
Here is my code implementing the url-param syntax stuff. It builds on top of endpoints, since it takes advantage of some of the encapsulated data structures introduced in that branch.
In the interests of a more consistent language for representing an (address,transport) pair, I tweaked dcf's suggested syntax above slightly:
client registration requests now look like "client-addr=_&client-transport=_"
facilitator responses now look like "client-addr=_&client-transport=_&relay-addr=_&relay-transport=_"
the transactional representation in fac.py now looks like "CLIENT addr=_&transport=_" and "RELAY addr=_&transport=_", reusing the qs parse/format code
(I changed the facilitator response, since the reason we did the syntax in the first place was to get rid of dynamic keys in the param list. I added "-addr" so that the transactional representation is sane and constant.)
Old client registrations of the form "client=_" still work, with implied transport=websocket.
At the moment, simply specifying "client-addr=_" will raise an error, but I can have client-transport default to "websocket" if that is preferred.
The addition of "-addr" does cause one slight untidiness - previously, the facilitator gave an empty "client=" value as a response to mean "no registrations available". This sort of fit into the old syntax, but is not really consistent with the new syntax. The old client= behaviour remains; we could change it to "look more" like the new syntax; but actually IMO we should pick an entirely different way to communicate this, since it is an exceptional status for the proxy.
Hmm, actually thinking about it, we can get rid of "-addr" with only minor tweaks to what I coded. The reason I added it, is because it doesn't generalise if we want to represent a contextless (addr,transport) pair - removing the context (the client- prefix) would give us something like "=&transport=". But currently we do always have context, and the advantage with removing it is that we remain backward-compatible with old proxies, without any extra code.
Let me know which option you prefer.
Another tweak we could do is remove the transport prefixes from the facilitator response. It would add a little more complexity, but would follow the "no more information than required" principle more closely.
edit: actually it would save us complexity on the proxy side, so I think I am going to go ahead and just do it.
As I understand, there are 3 input sources to the facilitator:
GET / for proxy polls, going through facilitator.cgi
POST / for cleartext client registrations, going through facilitator.cgi
various rendezvous channels for encrypted client registrations, going through facilitator-reg-daemon
The patch for the newline-based client registration syntax only changes facilitator-reg-daemon - so am I correct in thinking that facilitator.cgi also needs to be updated?
Yes, you're right. All registrations go through facilitator-reg-daemon except for one special case: POST requests directly to facilitator.cgi.
Here is my code implementing the url-param syntax stuff. It builds on top of endpoints, since it takes advantage of some of the encapsulated data structures introduced in that branch.
In the interests of a more consistent language for representing an (address,transport) pair, I tweaked dcf's suggested syntax above slightly:
client registration requests now look like "client-addr=_&client-transport=_"
facilitator responses now look like "client-addr=_&client-transport=_&relay-addr=_&relay-transport=_"
(I changed the facilitator response, since the reason we did the syntax in the first place was to get rid of dynamic keys in the param list. I added "-addr" so that the transactional representation is sane and constant.)
Old client registrations of the form "client=_" still work, with implied transport=websocket.
We should try to be backward compatible in this case. We won't be able to update all proxies instantly. As you say in comment:36, let's have client and relay in place of client-addr and relay-addr. client-transport and relay-transport are good names.
the transactional representation in fac.py now looks like "CLIENT addr=_&transport=_" and "RELAY addr=_&transport=_", reusing the qs parse/format code
I really don't like this :( Here's what I see the facilitator returning to a GET:
OK CLIENT="client-transport=websocket&client=1.1.1.1%3A9000" RELAY="relay-transport=websocket&relay=0.0.1.0%3A1" CHECK-BACK-IN="600"
I want the internal facilitator protocol to be very simple and not have embedded syntaxes.
Is there something wrong with the straightforward syntax?
OK CLIENT="1.1.1.1:9000" CLIENT-TRANSPORT="websocket" RELAY="0.0.1.0:1" RELAY-TRANSPORT="websocket" CHECK-BACK-IN="600"
The addition of "-addr" does cause one slight untidiness - previously, the facilitator gave an empty "client=" value as a response to mean "no registrations available". This sort of fit into the old syntax, but is not really consistent with the new syntax. The old client= behaviour remains; we could change it to "look more" like the new syntax; but actually IMO we should pick an entirely different way to communicate this, since it is an exceptional status for the proxy.
client= is at least an unambiguous and backward-compatible way to indicate that there is no client registration. In the very early days we might have used a 404 to do it, but stopped because that caused a problem with Flash's HTTP retriever or XMLHttpRequest or both. I don't care too much as long as it works. If you can think of a better way, that's fine.
Getting no client happens much more frequently than getting a client.
We should try to be backward compatible in this case. We won't be able to update all proxies instantly. As you say in comment:36, let's have client and relay in place of client-addr and relay-addr. client-transport and relay-transport are good names.
OK, I've pushed this to the branch above.
the transactional representation in fac.py now looks like "CLIENT addr=_&transport=_" and "RELAY addr=_&transport=_", reusing the qs parse/format code
I really don't like this :( Here's what I see the facilitator returning to a GET:
{{{
OK CLIENT="client-transport=websocket&client=1.1.1.1%3A9000" RELAY="relay-transport=websocket&relay=0.0.1.0%3A1" CHECK-BACK-IN="600"
}}}
I want the internal facilitator protocol to be very simple and not have embedded syntaxes.
Is there something wrong with the straightforward syntax?
{{{
OK CLIENT="1.1.1.1:9000" CLIENT-TRANSPORT="websocket" RELAY="0.0.1.0:1" RELAY-TRANSPORT="websocket" CHECK-BACK-IN="600"
}}}
We could do this and I actually even already wrote code that does exactly that. But it made the code shorter to reuse the qs-parsing stuff. Is it important for the transactional protocol to look nice? I think of it more as "black-box representation" rather than "embedded syntax".
The addition of "-addr" does cause one slight untidiness - previously, the facilitator gave an empty "client=" value as a response to mean "no registrations available". This sort of fit into the old syntax, but is not really consistent with the new syntax. The old client= behaviour remains; we could change it to "look more" like the new syntax; but actually IMO we should pick an entirely different way to communicate this, since it is an exceptional status for the proxy.
client= is at least an unambiguous and backward-compatible way to indicate that there is no client registration. In the very early days we might have used a 404 to do it, but stopped because that caused a problem with Flash's HTTP retriever or XMLHttpRequest or both. I don't care too much as long as it works. If you can think of a better way, that's fine.
Getting no client happens much more frequently than getting a client.
OK, I've stuck with this for now because of the "-addr" removal. I might think about it a bit more if I get around to it.
I have a working facilitator/proxy up and running at siteb.
You can test it out by running the obfs-flash client from #7167 (moved). Only, instead of visiting the proxy link mentioned in [comment:17:ticket:7167] with a hard-coded client/relay, do this:
{{{
-----BEGIN PUBLIC KEY-----
MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEA0tb2qhhQ8xJ0fOqw9XQBR83zRQBiK76q0Q4zrsk5XE6Vm/+FYr3Cww5WTwSgv/HvY2TWJdU2I4H8eGeCXmIo42NXxwqHJTPTuEnXNgRP/Yob8r8zV5shQGe74nQs8m6p70FK0ic/i5ChesabtgLlMldsD1VtEJjswQEdobbcnXEdPkxns82fakRw31mdSzQKjxReBBm1epC7fNMUhJ27rDAWmWmSiVQoPlzIqlJwbiNzWNeqKepFryZvaVNpU4kEns9JoK0mujhKQOeNUAnwKuy8g7O0s0HZjdB/q7xO8gBzpkha/vSY+BZ8yqa0kqvvcnOZmCY8jivxTv4bZNZIhwIDAQAB
-----END PUBLIC KEY-----
}}}
c) Run this to send an encrypted registration. (Eventually we'll merge the bug9349_client branch and have obfs-flash set this automatically, and this and the previous steps won't be necessary.)
I tweaked the initial poll time to 20 seconds and made it visit /fac instead to get around a DNS block I experienced earlier.
Wait a few seconds, then (if your ISP/firewall isn't blocking anything) obfs-flash-client should connect, and the browser proxy will say something like Facilitator: got client:{ host: "(your IP)", port: 9000 } relay:{ host: "173.255.221.44", port: 9500 }.
Next I will incorporate the Endpoints simplifications we talked about on IRC.
For "release early; release often" purposes, I pushed a branch with changes on the client-side applications to support pluggable transports and the new client registrations.
It can be found in https://git.torproject.org/user/asn/flashproxy.git under branch bug9349_client_side. I've only briefly tested it.
Thanks George, you're awesome.
The --help text for --transport should say what the default value is.
Set
{{{
transport = DEFAULT_TRANSPORT
}}}
in the global options class, not in the code just before option parsing.
{{{
+ transport_part = [""] # default to empty string
}}}
I think that should rather be an empty list; otherwise I'm pretty sure the helpers get an extra empty argument.

I saw you moved some duplicated code into a module flashproxy_reg_utils. Please see #6810 for more about reducing code duplication. I'm afraid doing it this way will break `make install`. Alexandre tried breaking out a module in #6810, and it didn't quite work. So if you can separate the deduplication part of the patch, I think it's better for the purpose of this ticket.
I squashed and rebased and made these suggested changes in https://git.torproject.org/user/dcf/flashproxy.git branch bug9349_client_side. Currently these four commits:
{{{
a86d340 Add --transport to flashproxy-client.
9a02028 Add --transport option to reg programs.
3ec8472 Send client-transport=websocket in registrations.
c3029c9 Tolerate other URL parameters in client regisration lines.
}}}
This code is using the newest registration syntax.

My hope is to merge these client changes right away, because then the changes to the client are done. c3029c9 is a small change to facilitator-reg-daemon so it doesn't freak out at the `client-transport` field. It's sufficient to make the reg-url step in comment:40 work with no further code changes, it remains backward compatible, and it's forward compatible with what we plan to implement with respect to multiple registrations.

That is, I expect c3029c9 to be replaced by other code soon, but I'd like to merge and deploy it now, so that we can merge the client changes and have one less thing to worry about.
Seems good for now, in that it probably won't conflict with my other branches. What is currently deployed in siteb is my merge-all branch on github that also pulls in common-sub (#6810 (closed)) and fac-build (#9668 (closed)), and I can probably merge this client code too if I first revert c3029c9.
I am also using git rerere, and can send over the resolution state files if anyone wants to try it out themselves. However it's not perfect and you still need to resolve the conflict in facilitator-howto.txt by hand, which just involves copying the "security setup" section from doc/facilitator-howto.txt (unfortunately detected as "deleted" from fac-build) into the relevant place in facilitator/doc/facilitator-howto.txt.
Seems good for now, in that it probably won't conflict with my other branches. What is currently deployed in siteb is my merge-all branch on github that also pulls in common-sub (#6810 (closed)) and fac-build (#9668 (closed)), and I can probably merge this client code too if I first revert c3029c9.
Excellent, thanks. Client-side --transport code is now merged into master. I think that leaves us with only facilitator changes to finish and merge.
I've done the Endpoint simplifications and updated the url-param tweaking code. It's up and running at siteb and I've tested it with both flashproxy-client and obfs-flash-client using automatic registration via reg-http.
My merge-next branch merges all of that into the current master - if you decide to merge separately, you can diff the result against that branch to see if there's anything either of us did wrong.
I looked at Ximin's bug9349_server_endpoints branch. The changes were too many for thorough reviewing, but all in all it looks good.
It would feel better with a bit more docs (match() needs a function doc, _findPrefixesForSuffixes too. It's not at all obvious what they do.) Also, an ASCII-art-like doc on how Endpoints and all the other classes fit together would be nice.
Let's not have these changes block the deliverable though, if it's too much effort we can create tickets and fix them afterwards.