There should be a program that is a managed-transport wrapper: it understands pt environment variables, has a SOCKS listener, and forks a command of your choice to transform input to output and vice versa. For example,
If one command is given, the same command is used for encoding and decoding. If two commands are given, one is used for encoding and one is used for decoding. -m gives the method name.
asn and I made progress on this during the weekend as an internal transport for obfsproxy. We have a toy case working for transports that take 1 byte of input to 1 byte of output, but arbitrary transforms on streams will need a better asynchronous architecture that uses Twisted's ProcessProtocol.
and play byte-tennis between the two nc processes.
Still need to handle the case where either child dies unexpectedly, but the basic gist of it works. Theoretically it should work with multiple circuits (spawns new process-pair for each circuit), but I haven't tested that yet.
(I didn't bother doing the "one command = same dec/enc" yet; it's probably not going to be that common anyways.)
To test with your own transformer process-pair, fiddle about with TEST_SPECS. The stdbuf_workaround parameter is thoroughly documented; hopefully it explains clearly when/why you may need to use it.
rot13 is in the bsdgames package on Debian, but I just noticed you have an implementation that uses tr above which I'll copy over at some point.
To solve your TODO(infinity0): close the circuit with an error when your process terminates, you might want to see how obfs2 raises a PluggableTransportError exception when its protocol fails.
We should also look at how to make this configurable so that it can accept arbitrary commands by the user (although some good recipes a la TESP_SPECS are also good). The problem with this is that the current way of passing parameters to pluggable transports is using k=v values in the ClientTransportPlugin and ServerTransportOptions torrc options. Passing shell commands using k=v values is not nice.
Note that this includes the fix for #10342 (closed) cherry-picked into this branch. This may cause a merge conflict. If that is committed first, I can rebase it.
All that is left to do is to (perhaps) think of a better way of defining user transports. Currently I am reading them from ezpt.spec in the working directory (or whatever is defined by EZPT_SPEC), mostly because obfsproxy can't take command-line parameters in managed mode. This could be acceptable, being "good enough", however.
(This implementation also let me detect two errors in the OP - N-ZA-M not N-Za-M, and we need to backlash-escape the quotes around "$_$_")
It was just more convenient to do so - we only had to create one single class__file__ (well and some minor refactoring tweaks) rather than re-implement the whole SOCKS-ExtORPort handling and reactor/circuit setup stuff.
Do you see a major advantage having a separate ezpt proxy? We could certainly do it, but it would take a bit more effort.
It was just more convenient to do so - we only had to create one single class__file__ (well and some minor refactoring tweaks) rather than re-implement the whole SOCKS-ExtORPort handling and reactor/circuit setup stuff.
Do you see a major advantage having a separate ezpt proxy? We could certainly do it, but it would take a bit more effort.
Making it an obfsproxy module seems to be raising some design questions with awkward answers. An ezpt.spec file, for example, is quite different from the design in the ticket description (as I envisioned it, anyway), and so far the patch has a surprising lot of changes to obfsproxy itself. As it is, it's more "here's an easy way to create an obfsproxy module" than "here's an easy way to create a new pluggable transport." I'm not saying that's a bad thing, and it may actually be more useful in the long run.
The need for SOCKS and ExtORPort support is a pretty important consideration. We should try to make those things part of pyptlib. I remember George and I talked about it--it wasn't so easy to do because you can't assume that programs will be using Twisted.
Or maybe we should just recommend building on obfsproxy in general for Python pluggable transports? Use obfsproxy as the basic library, rather than pyptlib? It's what everyone will do anyway, because it's the easiest. It worked for ScrambleSuit; maybe it's a good idea in general?
It was just more convenient to do so - we only had to create one single class__file__ (well and some minor refactoring tweaks) rather than re-implement the whole SOCKS-ExtORPort handling and reactor/circuit setup stuff.
Do you see a major advantage having a separate ezpt proxy? We could certainly do it, but it would take a bit more effort.
Making it an obfsproxy module seems to be raising some design questions with awkward answers. An ezpt.spec file, for example, is quite different from the design in the ticket description (as I envisioned it, anyway), and so far the patch has a surprising lot of changes to obfsproxy itself. As it is, it's more "here's an easy way to create an obfsproxy module" than "here's an easy way to create a new pluggable transport." I'm not saying that's a bad thing, and it may actually be more useful in the long run.
The need for SOCKS and ExtORPort support is a pretty important consideration. We should try to make those things part of pyptlib. I remember George and I talked about it--it wasn't so easy to do because you can't assume that programs will be using Twisted.
Or maybe we should just recommend building on obfsproxy in general for Python pluggable transports? Use obfsproxy as the basic library, rather than pyptlib? It's what everyone will do anyway, because it's the easiest. It worked for ScrambleSuit; maybe it's a good idea in general?
You raise some good points David.
It's true that ezpt requires some awkward changes in the obfsproxy codebase that might not be useful for other pluggable transports. Still, I think that the required changes can be implemented in a way that are harmonious with the rest of the obfsproxy codebase. If we can't implement ezpt in a nice way, I don't want to merge ezpt in obfsproxy either (btw keep in mind that a good part of the changes in Ximin's branch were from #10342 (closed)).
I like the point you raise about recommending obfsproxy as the basic library instead of pyptlib. It's certainly worth considering, and in some sense it's what we are already doing (but we are not suggesting it loud enough). That is, if someone develops a plugin for obfsproxy that screws the whole codebase, we won't merge it in the mainline obfsproxy but we can keep it as a separate transport. This happened in the old obfsproxy with stegotorus. Stegotorus required too many extra functions from obfsproxy, so they ended up forking it.
However, if we end up suggesting obfsproxy as a library that people can use and modify to build their pluggable transport, we can certainly modify obfsproxy to better facilitate this need. That is, make obfsproxy a program that people use to develop a single pluggable transport, instead of the pluggable transport suite that it currently is. Of course, I'm not sure if this task is worth prioritizing right now.
(FWIW, Scramblesuit's code is going to be incorporated into the obfsproxy codebase in the future. It's changes are simple enough and they fit with the general obfsproxy architecture.)
Just re-grouped the commits, same link as above. I basically did a manual rebase on top of the new master, post-#10342, and squashed some back-and-forth redundant diffs.
Everything from "add a transport_name field to BaseTransport" onwards (currently the final 3 commits) relates to how we integrate EzptTransport into the rest of the system and is the most controversial part. That commit itself tweaks some obfsproxy interfaces, which affects other plugin authors too. (However, I believe it's a beneficial change which is minor, and makes future plugins more flexible.)
I'll think a bit more about how to do this better/differently, if we can.
The code looks OK and I don't think it screws up with obfsproxy internals that much.
I think I'd be up for merging this now, and potentially remove it in the future if we make a dedicated ezpt program. David what do you think?
Some notes:
Would be nice if we had unittests for the file parsing methods. They are not trivial. See obfsproxy/test/transports/ for transport-specific unit tests.
Function docs would be helpful. e.g. in checkExit, get_all_transports, parse.
The circuitDestroyed functionality is quite complicated and undocumented. Does this recursion need to happen? Can it be made more cleanly?
We need to document the EZPT_SPEC env variable. Maybe even rename it to something more explicit: OBFSPROXY_EZPT_FILE or something.
I'm not sure if I like ezpt transports clobbering the transport list of obfpsroxy. However, it's true that ezpt transports are pluggable transports too, so they should be there. We should at least disable ezpt_id and maybe think of some smarter recipes for the default ezpt transports.
Also see branch ezpt_review in https://git.torproject.org/user/asn/obfsproxy.git for some additional tweaks and changes.
Also also, can you rebase this branch so that the ezpt.py changes are all in a single squashed commit, and all the other obfsproxy changes are in logical individual commits? Feel free to also squash my initial commits even if it loses authorship information.
An addition I could do is have ezpt take arguments in external mode for you to specify a custom command line. That way, the usage would be more like dcf envisaged in the OP - so you can experiment with things in torrc. But the downside is you will need multiple obfsproxy instances to run multiple ezpt transports like that.
(By contrast, the managed/EZPT_SPEC way will let you run multiple transports using one instance of obfsproxy, but you need to edit that extra file.)
The code looks OK and I don't think it screws up with obfsproxy internals that much.
I think I'd be up for merging this now, and potentially remove it in the future if we make a dedicated ezpt program. David what do you think?
You should merge it if you think it makes sense. You should ask: How do you see it being used as implemented?
I worry a bit that the objective drifted to "implement a stdin–stdout filter at all costs" from "implement an easy way to prototype transports." I feel like this is a transport that would have been good to first prototype as a shell script. Maybe ezpt would be a good application of goptlib? It has the SOCKS and extorport stuff you need. It would probably be only 100 lines on top of each of the example plugins.
I don't think we should encourage the use of commands like tr, nor even mention the possibility in documentation. It's probably not safe to expose random commands to the network.
I don't think ezpt should make itself responsible for the buffering of its subprocesses; i.e., I think the stdbuf stuff should be removed. I understand the reason for it. But this is a well-known problem ("buffering is really going to ruin your day") with a lot of workarounds--we shouldn't codify just one of them without at least an examination of the alternatives. People can always stdbuf their own commands if that works for them. This also ties in with not plugging unsuspecting programs to ezpt.
I also tend to think that there's no reason to try and get multiple transports running in the same process. Process isolation exists for a reason, and it works really well. Personally I would want each separate ezpt to be in a separate process, for better robustness. Running in one process is fine too, but I don't think we should compromise anything else to make it possible.
I worry a bit that the objective drifted to "implement a stdin–stdout filter at all costs" from "implement an easy way to prototype transports." I feel like this is a transport that would have been good to first prototype as a shell script. Maybe ezpt would be a good application of goptlib? It has the SOCKS and extorport stuff you need. It would probably be only 100 lines on top of each of the example plugins.
It may be the case that some future transports will be implemented as stdin-stdout filters - I don't see anything inherently wrong with that. PT authors might even prefer it, if it suits their model, and if the PT interface changes, they don't need to do anything, just wait for ezpt to update.
I don't think we should encourage the use of commands like tr, nor even mention the possibility in documentation. It's probably not safe to expose random commands to the network.
I don't think ezpt should make itself responsible for the buffering of its subprocesses; i.e., I think the stdbuf stuff should be removed. I understand the reason for it. But this is a well-known problem ("buffering is really going to ruin your day") with a lot of workarounds--we shouldn't codify just one of them without at least an examination of the alternatives. People can always stdbuf their own commands if that works for them. This also ties in with not plugging unsuspecting programs to ezpt.
I'm not sure how you can give secure-yet-simple examples for ezpt. We can definitely mention that the examples should not be used in production, though.
stdbuf isn't enabled by default, the user has to ask for it - and all it does is prepend stdbuf -o0 to the command. It's a documented option, but we could remove this option as long as we still document the issue and workarounds very visibly and obviously. (And make sure future maintainers don't remove this as "indirect trivia"). It might be well-known if you have specific background knowledge, but we can't expect PT authors in general to know about it. (It blocked both George and I on the day we started coding, and it took me several more hours on a different day, questions in 2 IRC channels and about 6 people theorising, only some of which was correct, to figure it out properly.)
I also tend to think that there's no reason to try and get multiple transports running in the same process. Process isolation exists for a reason, and it works really well. Personally I would want each separate ezpt to be in a separate process, for better robustness. Running in one process is fine too, but I don't think we should compromise anything else to make it possible.
This is more of an issue with the PT spec - managed mode implies multiple transports running in the same process. But architecturally, I can see that it would be nicer to have ezpt be a separate process. Also, the advertised purpose of obfsproxy is a bit different from what we want to do with ezpt.
So yes, I do sympathise with the intention to have ezpt be a separate PT of its own. I've been meaning to get more into Go, so perhaps I'll try to re-do this with goptlib at some point.
But I also don't want to see this work go to waste, so I will get it into a merge-ready state at least, and we could either merge it but not announce it, or keep it in a separate branch for future PT authors to play with.
But I also don't want to see this work go to waste, so I will get it into a merge-ready state at least, and we could either merge it but not announce it, or keep it in a separate branch for future PT authors to play with.
That sounds fine to me.
Don't let me be too contrarian. If you have a strong vision for how to do something, go ahead and do it :) Anyway, I never wrote any code for obfsproxy so I trust George's opinion more than mine here.
Now, ezpt is hidden from the user until they set both EZPT_ENCODE and EZPT_DECODE (which in torrc, can be done using env(1)), at which point they can access it using "ezpt". This restricts the user to using one transport at a time, and one ezpt transport per torrc file, but this is OK since we are not planning to mass-deploy it in production.
I don't think this is going to go anywhere; experience has shown that the bottleneck in obfs-like protocols is more the proxy address distribution, not so much the specific channel obfuscation. Implementing an obfs4proxy module seems to be simple enough.