Create an XMPP pluggable transport

added component::circumvention/pluggable transport owner::feynman priority::medium severity::normal status::accepted type::task labels

feynman from IRC is looking into this atm. He wrote http://sourceforge.net/projects/aeftimiamisc/files/hexchat/ in the past, which is an XMPP tunnel.

Unfortunately, it seems that in its current form it's quite slow and unusable for web browsing. It also seems that some XMPP servers are throttling streams of messages of large size (4k byte chunks).

More research is needed.

News from irc is that the throttling / slowness was due to a bug in his threading code, and actually it's remarkably fast now.

feynman posted his updated code to github: https://github.com/aeftimia/hexchat

It seems that the topology of an XMPP transport would be:


                        the censor
     +-------------+       \\\       +-------------+         +----------+
     |  hexchat    |       \\\       |             |         | hexchat  |
     |  client     |<------\\\------>| XMPP server |<------->| XMPP bot |
     |(XMPP client)|       \\\       |             |         |          |
     +-------------+       \\\       +-------------+         +----------+
           ^               \\\                                    ^
           |               \\\                                    |
           |               \\\                                    |
           |               \\\                                    |
           v               \\\                                    v
     +------------+        \\\                             +------------+
     |            |        \\\                             |            |
     | Tor client |        \\\                             | Tor bridge |
     |            |        \\\                             |            |
     +------------+        \\\                             +------------+
                           \\\

Also, the simplest and easiest deployment of hexchat would probably resemble the current deployment of flashproxy. That is, the client-side would expose a SOCKS server, but in reality it would ignore the SOCKS handshake. It would connect to an XMPP server and speak with a specific XMPP bot (that would run the server-side of hexchat). The XMPP bot would extract the Tor data from the XMPP traffic and pass it to a specific hardcoded bridge.

The above system is easier to deploy on the client-side, since the client doesn't need to specify an XMPP server, the XMPP bot username, or the bridge address. This is similar to how flashproxy works currently. In the future, we can think of how the client can specify specific parameters for his hexchat session (like a specific XMPP bot username, or a specific bridge).

Also, it's worth noting that in the hexchat system, the IP of the client is exposed to the XMPP server. The server-side hexchat XMPP bot should not be able to get the IP of the client, since it's always speaking to the client through the server.

(BTW, obviously the name hexchat might change if feynman wants to change it.)

This pluggable transport will (a) give hostile firewall vendors an incentive to block all XMPP-like traffic, and (b) give XMPP server operators an incentive to deploy censorship software to detect and block hexchat.

Trac:
Username: feynman
Cc: N/A to alexeftimiades@gmail.com

Trac:
Username: feynman
Owner: asn to feynman
Status: new to accepted

Replying to asn:

I should make a couple of notes here. First of all, the client really controls every aspect of initializing the connection. The xmpp bot on the server side just logs into an xmpp server and listens for traffic. It does not even bind to a port.

On the client side, the xmpp bot binds to one or more ports and listens for traffic. It associates each of these ports with:

* An xmpp username to forward traffic to
* An ip:port that the xmpp bot on the server side should try to connect to

When it gets a connection, it sends a message "connect me!" to the xmpp bot on the server through the chatline. It puts the ip:port the server should try to connect to in the xml subject tag and the ip:port of its newly spawned connect socket in the xml nick tag (used for nicknames). This way, the xmpp bot on the server side has a way to send data to that connected socket when replying. The client xmpp bot also creates an entry in its routing table that associates the following tuple: (ip:port of connected socket, server's xmpp username, server ip:port to connect to) with the newly connected socket.

When the xmpp bot on the server side gets a connection request, it creates a new socket and tries to connect it to the ip:port specified in the subject tag. If successful, it adds an entry to its routing table that associates the following tuple: (ip:port that the server just connected to, client's xmpp username, client's connected socket's ip:port) with the newly connected socket.

Now, when either socket receives data, it is prompted to send the data over the chat server, using the socket's key in the routing table to construct the appropriate nick and subject xml tags. When a message is received over an xmpp server, the routing table key is constructed from the username of the computer that sent it, along with the nick and subject xml tags. The data is then forwarded to the appropriate socket.

An analogous process takes place for disconnections, starting with a closing socket sending a "disconnect me!" message to the xmpp bot on the other side of the chat server.
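The routing table described above can be sketched roughly as follows. This is only an illustration of the keying scheme, not the actual hexchat code: the names `client_key`, `server_key` and `key_from_message` are made up for this example, and a plain dict stands in for whatever structure hexchat really uses.

```python
# Hypothetical sketch of the routing-table keys described in this comment.
# nick    = ip:port of the client's connected socket
# subject = ip:port the server-side bot should connect to

def client_key(local_addr, server_jid, target_addr):
    """Key used on the client side of the chatline."""
    return (local_addr, server_jid, target_addr)


def server_key(target_addr, client_jid, client_addr):
    """Key used on the server side of the chatline."""
    return (target_addr, client_jid, client_addr)


def key_from_message(is_client, sender_jid, nick, subject):
    """Rebuild the routing-table key from an incoming chat message."""
    if is_client:
        return client_key(nick, sender_jid, subject)
    return server_key(subject, sender_jid, nick)


# The routing table itself: key tuple -> connected socket.
routing_table = {}
```

When a message arrives, the receiving bot would look up `routing_table.get(key_from_message(...))` and drop the data if no socket is registered under that key.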

Trac:
Username: feynman

Replying to rransom:

This pluggable transport will (a) give hostile firewall vendors an incentive to block all XMPP-like traffic, and (b) give XMPP server operators an incentive to deploy censorship software to detect and block hexchat.

Yes, these are valid concerns. Another issue with this PT is that the XMPP server (e.g. google) learns the IP address of our users.

Unfortunately, I don't know how to evaluate the severity of these concerns, and whether it's a good idea to deploy such a transport to users.

Replying to asn:

I don't see a problem here. Predicting the future is hard -- maybe some XMPP servers will choose to censor, and maybe some won't. Probably some firewall vendors will offer a 'filter xmpp button' (probably some of them do already). But whether firewall operators choose to press the button remains a complex tradeoff.

Wrt the 'xmpp servers censoring their content' question: that means the hexchat design should consider whether it can achieve its goals with fewer/no regexpable "headers" in its chat setup.

Replying to feynman:

Hey there! I have a quick code review and comments. If you are bored fixing my comments, just say so, and I will do it when I get some free time.

All your git commit messages say 'master'. Instead commit messages are supposed to be a summary of what the commit does. Check out https://gitweb.torproject.org/tor.git for example.
Your comments are sometimes too verbose (https://github.com/aeftimia/hexchat/blob/master/fast/hexchat.py#L53 https://github.com/aeftimia/hexchat/blob/master/fast/hexchat.py#L110 https://github.com/aeftimia/hexchat/blob/master/fast/hexchat.py#L25), which makes the code hard to read, and at other times insufficient (https://github.com/aeftimia/hexchat/blob/master/fast/hexchat.py#L21 what do the keys/values of this dict look like?) (https://github.com/aeftimia/hexchat/blob/master/fast/hexchat.py#L123 bottleneck?) If you want an example of a robustly commented Python codebase, check out https://gitweb.torproject.org/stem.git or Twisted or something.
Also, check out http://www.python.org/dev/peps/pep-0008/ for Guido's ideas on Python comments.
Which XMPP plugins do we really need? For example, do we need Multi-User Chat?
Instead of using print(), use the Python logging module for your logs.
Stuff like "send '_' for empty data" must be mentioned in the spec.
Maybe add a more paranoid "(dis)connect me!" string so that it's even more unlikely to be encountered in a normal XMPP conversation? Add some numbers, and symbols, and stuff.
You are using rfind() but not checking the retval. You are also using functions like find() without checking for exceptions. Don't assume that the data you receive are correctly formatted. Your program might crash with an exception.
Try not to do catch-all excepts: http://ischenko.blogspot.gr/2005/01/exception-based-code-antipatterns.html
I still think sockbot is a weird name -- and you also needed two lines of comments to explain it. Why don't you name the class Hexchat or something?
What's the deal with the lambda in the "disconnected" event handler function pointer? Or the lambda: False? Am I missing something?
Maybe split the codebase to more files? One for the client and another for the server? This way you won't need to have variables like client_socks and server_socks that are only used in one mode.
Using sock as an abbreviation for socket continues to be confusing in names like server_socks. Maybe expand sock to socket?
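To illustrate a few of the review points above (checking the return value of rfind(), catching only the expected exception, and using the logging module instead of print()), here is a minimal sketch. The function name `parse_addr` and the logger name are made up for this example; this is not the code in hexchat.py.

```python
import logging

log = logging.getLogger("hexchat")  # logger name is an assumption


def parse_addr(text):
    """Split an 'ip:port' string into (host, port); return None if malformed."""
    index = text.rfind(":")
    if index <= 0:
        # rfind() returns -1 when ':' is missing; 0 means an empty host.
        log.warning("malformed address %r", text)
        return None
    host, port_text = text[:index], text[index + 1:]
    try:
        port = int(port_text)
    except ValueError:
        # Catch only the error we expect instead of a bare except.
        log.warning("non-numeric port in %r", text)
        return None
    if not 0 < port < 65536:
        log.warning("port out of range in %r", text)
        return None
    return host, port
```

A caller would treat a `None` result as "drop this message" rather than letting an exception take the bot down.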

All in all, code looks good, the new comments are helpful, and I think I kind of understand how it works.

Here is a list of more things that must be done before the transport is deployable:

Write a SOCKS-server for the client-side. We should look at how flashproxy does it.
We need to write a spec on how the transport works. We also need to write a threat model. See https://gitweb.torproject.org/pluggable-transports/obfsproxy.git/blob/HEAD:/doc/obfs3/obfs3-protocol-spec.txt and https://gitweb.torproject.org/pluggable-transports/obfsproxy.git/blob/HEAD:/doc/obfs2/obfs2-threat-model.txt .
We need pyptlib support. I see you started implementing it, but don't worry about it. I can do it tomorrow or the day after.
SSL support.
We need to move the stuff from the config file to hardcoded parameters and command line switches. That's how we currently deploy pluggable transports. Check out how flashproxy is currently deployed (open up a pluggable transport bundle, and check the torrc).

Replying to asn: I updated the files with most of the changes you suggested. Here are the things I did not change:

More paranoid "(dis)connect" message: I left this the way it was so it would be easy to spot in a debug file. I do not think people should run hexchat using their usual XMPP chat server accounts anyway, so making messages distinguishable from normal chats should be unnecessary.
Split the code base to more files: Clients and servers are objects of the same class because the program does not really distinguish them the way TCP does. In fact, any client can also act as a server.
Write a SOCKS-server for the client side: I will do this if I must, but it seems like a hack around an unnecessary limitation tor places on pluggable transports. I would personally prefer that tor be configured to have hexchat listen on a local address and tor configured to use that local address as a bridge. Then hexchat would forward the connection to the actual bridge. This would leave hexchat in the most versatile form. Then tor, or any other program could still use it as something other than a SOCKS proxy.

pyptlib support: You said you could/would take care of this. If you do not have time tomorrow or want me to take care of this, let me know. Otherwise, I will leave it to you to finish this off with pyptlib.

SSL support: If you were referring to SSL support with the chat server, it already supports that (sleekxmpp does this transparently).

Trac:
Username: feynman

Replying to feynman:

Sounds good. Thanks for the fixes. I will also do some code cleaning of my own when I get the time.

BTW, with regards to the SOCKS-server thing, have you tried using hexchat with tor? If you can manage to make hexchat work with Bridge lines and ClientTransportPlugin lines, then I guess we don't need to do the SOCKS-server thing. You might be able to do it with something like this:

Bridge 127.0.0.1:5555 # actual hexchat address
Bridge hexchat 0.0.0.1:1233 # dummy bridge line just to spawn up 'hexchat' transport
ClientTransportPlugin hexchat exec /usr/bin/hexchat --blabla --managed # this line will force tor to spawn hexchat

Although this is a hack, so I can't promise it's going to work. Check out how ClientTransportPlugin and the managed proxy interface works: https://gitweb.torproject.org/torspec.git/blob/HEAD:/proposals/180-pluggable-transport.txt

I fixed a bug in how the program handles "connection refused" errors on the server side. Also, it appears that Linux handles threads differently than OSX and I needed to throw in an infinite loop to keep the program running. As of now, I have NOT gotten it working with tor, but I hope to do so over the next 24 hours.

Trac:
Username: feynman

Trac:
Cc: alexeftimiades@gmail.com to alexeftimiades@gmail.com, dcf@torproject.org

I want to thank everyone on the IRC that helped me test this program.

At this point I was able to connect and use a bridge through hexchat after making some minor modifications to the code. It now acts completely (or so I hope) transparently as a means of forwarding data from one computer over a chatline to another computer.

This allows you to tell tor to use your local computer as a bridge and have hexchat waiting to forward data byte for byte to another computer (which would be running its own instance of hexchat).

There is a lot of room for flexibility here. For example, the computer with an uncensored internet connection could be behind a NAT and does not even have to be running tor. As long as the computer can (a) connect to and use an XMPP chat server and (b) connect to the requested bridge (or run a bridge itself), it is a viable relay for hexchat.

A further consideration is the distribution of JIDs (xmpp usernames of the form username@chatserver) of people running hexchat. Remember, you do not have to know the IP address of the bridge you are connecting to if the bridge itself is running hexchat (in which case you would tell your client hexchat to connect to a 127.0.0.1 address on the remote computer).

Finally, I want to note that at this point, running hexchat would probably be a security risk. Someone could connect to a computer running hexchat, then connect from there to any IP, local or remote, and send arbitrary data from that computer. The good news is that this is quite easy to fix. I can throw in another command line argument that gives the computer a list of ip:ports it is authorized to connect to.
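The allow-list idea could look something like the sketch below. The `--allow` flag and the function names are hypothetical (hexchat has no such option at the time of this comment); the point is only that a connection request is refused unless its target ip:port was listed on the command line.

```python
import argparse


def parse_args(argv):
    """Parse a repeatable, hypothetical --allow IP:PORT option."""
    parser = argparse.ArgumentParser(description="hexchat allow-list sketch")
    parser.add_argument(
        "--allow",
        action="append",
        default=[],
        metavar="IP:PORT",
        help="ip:port this bot may connect to (may be given several times)",
    )
    return parser.parse_args(argv)


def is_allowed(target, allowed):
    """Return True only if the requested ip:port was explicitly listed."""
    return target in set(allowed)
```

For example, a bridge operator might start the bot with `--allow 127.0.0.1:9001` so that only the local ORPort can be reached through the chatline.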

All in all, the program is near complete. It just needs some means to limit the ip:ports it can connect to, and a pyptlib interface.

Trac:
Username: feynman

Hm, https://github.com/aeftimia/hexchat/commit/bff1134bc9d17e8e0532bcc99d3a77b975ba1946 is a bit weird. It seems like your non-blocking connect() never succeeded (which makes sense, since you never connect to a remote host instantly) and you turned it into a blocking connect().

Problem with a blocking connect() is that hexchat will block till it connects. Imagine this on the server-side, where the hexchat bot gets 100 connect me! messages a second, and it blocks for every connect.

You will probably need to introduce some kind of asynchronous networking there. You want to do a non-blocking connect() and run add_client_socket() only when it's completed. Are you familiar with any asynchronous Python networking libraries (like asyncore or twisted or something)?
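As a rough idea of what a non-blocking connect() could look like using only the standard library, here is a sketch built on the selectors module. The helper names `start_connect` and `poll_connects` are made up for this example; in hexchat the callback would be the place to call something like add_client_socket(). asyncore or Twisted would work too, this is just one option.

```python
import errno
import selectors
import socket


def start_connect(selector, address, on_done):
    """Begin a non-blocking connect; on_done(sock, err) is called later."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setblocking(False)
    err = sock.connect_ex(address)
    if err not in (0, errno.EINPROGRESS, errno.EWOULDBLOCK):
        sock.close()
        on_done(None, err)  # failed immediately, e.g. bad address
        return
    # The socket becomes writable once the connect has finished or failed.
    selector.register(sock, selectors.EVENT_WRITE, on_done)


def poll_connects(selector, timeout=1.0):
    """Run callbacks for every pending connect that has completed."""
    for key, _events in selector.select(timeout):
        sock, on_done = key.fileobj, key.data
        selector.unregister(sock)
        err = sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
        if err:
            sock.close()
            on_done(None, err)  # e.g. connection refused
        else:
            on_done(sock, 0)    # connected; safe to add to the routing table
```

With this shape the bot can keep handling XMPP traffic while many connects are in flight, and calls `poll_connects()` from its main loop.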

Since you prefer to not do it the SOCKS way, and instead use the address of hexchat as Bridge, we might not even need the managed-proxy interface and pyptlib.

Specifically, if hexchat is an application with the following CLI: hexchat-client <listenaddr> <xmpp_server> <jid/password> and hexchat-server <pushaddr> <xmpp_server> <jid/password> we might be able to deploy this without the managed-proxy interface.

On the client-side, we do the dummy Bridge/ClientTransportPlugin trick. On the server-side, we just fire up hexchat-server and point it to the ORPort of our bridge without even informing Tor about it.

In the future, if we want the managed-proxy interface, we can add pyptlib support.

(Also, can you clean up your repo so that the correct hexchat.py is obvious to the casual observer? Maybe you can put the secure version in a misc/ repository (or even better in a different git branch). Also, I guess we can remove the pluggable-transports directory till we implement correct pyptlib support.)

Also, check my branch docs_and_refactoring_2 for some more code cleanups.

Some more code comments:

Why do you "resend the message" on error in get_message()? Is that what you are supposed to do in XMPP?
b64decode can throw an exception (triggered remotely by sending a wrongly formatted base64 chunk). We should catch that exception, and also check for other uncaught exceptions.
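For the b64decode point, a minimal sketch of defensive decoding might look like this. The function name `decode_payload` is made up for this example; `validate=True` makes the decoder reject characters outside the base64 alphabet instead of silently skipping them.

```python
import base64
import binascii


def decode_payload(text):
    """Decode a base64 chunk received over XMPP; return None if malformed."""
    try:
        return base64.b64decode(text, validate=True)
    except (binascii.Error, ValueError):
        # A remote peer sent something that is not valid base64.
        return None
```

The caller can then drop the message (or close the affected socket) on `None` instead of crashing.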

As of about 12 hours ago, I made an unfortunate discovery. Gtalk was not transmitting my messages most of the time--especially while watching youtube videos. Instead, it was bouncing the message with an error code. Hexchat thought the error message was the response and worked with it as though it came from the other party. I am very sorry for the confusion and I am quite disappointed at this point.

It seems that Gtalk will only deliver so many messages in a given period of time. I tried other chat servers, and they are much slower. I can NOT watch youtube videos with hexchat.

Though I seem to be able to access facebook.

I will post any updates as they come.

Trac:
Username: feynman

Replying to feynman:

It seems that Gtalk will only deliver so many messages in a given period of time. I tried other chat servers, and they are much slower. I can NOT watch youtube videos with hexchat.

Is there a reason you're passing all traffic through <message> stanzas? Many servers will throttle those to avoid spam, <iq> stanzas are a lot more likely to work well. You could look at XEP-0047: In-Band Bytestreams for how this can be done.

In fact, you might be able to use that specification for the actual content data (leaving the signaling to <message>s or other <iq>s), that would hide the data making it look like ordinary file transfers.

Trac:
Username: xnyhps

Replying to xnyhps:

This sounds like a good idea. I looked into <iq>s, but it seems they do not come with enough text fields (although I could be mistaken). I need four text fields to send a message:

One for the client ip:port
One for the server ip:port
One for the actual data
The JID of the computer that sent the message

Unless all four fields can be stuck somewhere in an iq message, this route will not work. Maybe with some hacks I could be wrong, but at first glance, this looks like a dead end.

Trac:
Username: feynman

Replying to feynman:

<iq>s can carry arbitrary XML, which servers will route to the client you're addressing. It doesn't need to follow an already defined protocol or extension.

You just have to keep the following in mind:

They must contain a single child element (which might contain further children), which should be in some custom XML namespace.
Everything must be valid UTF8.
There's a size limit in stanzas.

So you could define your own protocol where you send an <iq> like:

<iq type="set" to="pluggabletransport@jabber.org/Hex" id="1234">
    <initiate xmlns="https://www.torproject.org/transport/xmpp">
        <host>www.google.com</host>
        <port>443</port>
    </initiate>
</iq>

and the transport replies:

<iq type="result" from="pluggabletransport@jabber.org/Hex" id="1234">
   <success sid="abcd567" xmlns="https://www.torproject.org/transport/xmpp" /> 
</iq>

which the client uses to open an IBB connection:

<iq id="1235" to="pluggabletransport@jabber.org/Hex" type="set">
    <open xmlns="http://jabber.org/protocol/ibb" block-size="4096" sid="abcd567" stanza="iq" />
</iq>

I haven't read the code for all the details of the information you need to exchange, but in principle you can stick whatever you want in those <iq>s. :)
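To show that the fields really can live inside a single custom child element, here is a small standard-library sketch that builds and parses the `<initiate>` payload from the example above. It uses xml.etree.ElementTree rather than an XMPP library, the function names are made up for this example, and the outer `<iq>` is left without the usual jabber:client namespace for brevity.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the example stanza above.
NS = "https://www.torproject.org/transport/xmpp"


def build_initiate_iq(to_jid, stanza_id, host, port):
    """Build an <iq type="set"> carrying a single <initiate> child."""
    iq = ET.Element("iq", {"type": "set", "to": to_jid, "id": stanza_id})
    initiate = ET.SubElement(iq, "{%s}initiate" % NS)
    ET.SubElement(initiate, "{%s}host" % NS).text = host
    ET.SubElement(initiate, "{%s}port" % NS).text = str(port)
    return iq


def parse_initiate_iq(iq):
    """Return (host, port) from an <iq>, or None if the payload is malformed."""
    initiate = iq.find("{%s}initiate" % NS)
    if initiate is None:
        return None
    host = initiate.findtext("{%s}host" % NS)
    port = initiate.findtext("{%s}port" % NS)
    if not host or port is None or not port.isdigit():
        return None
    return host, int(port)
```

Additional fields (for instance the client-side ip:port) could be added as further child elements of `<initiate>` in the same way; the sender's JID already comes for free in the stanza's `from` attribute.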

Trac:
Username: xnyhps

I think I got an iq method worked out. I just need to figure out how to register the protocol so gtalk will not return a "feature-not-implemented" error.

The code will need cleaning up, but all in all, this will be an improvement on the old method, if not for speed then for more robust code.

Trac:
Username: feynman

I assume you mean the other contact is returning "feature-not-implemented"?

If you use a custom iq-class in Sleek:

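# Imports needed by this snippet and the two that follow.
from sleekxmpp import Iq
from sleekxmpp.xmlstream import ElementBase, register_stanza_plugin
from sleekxmpp.xmlstream.handler import Callback
from sleekxmpp.xmlstream.matcher import StanzaPath

class Initiate(ElementBase):
    name = 'initiate'
    namespace = 'https://www.torproject.org/transport/xmpp'
    plugin_attrib = 'tor_initiate'
    interfaces = set(('host', 'port'))
    sub_interfaces = interfaces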

And call:

register_stanza_plugin(Iq, Initiate)

Then you can use:

self.register_handler(Callback('Tor XMPP Transport Handler', StanzaPath('iq@type=set/tor_initiate'), self.handle_transport))

To register the self.handle_transport callback to be called every time a message matching the class comes in.

If you use the iq-stanza format I proposed, then you can access the fields with stanza['tor_initiate']['host'] and stanza['tor_initiate']['port'].

Trac:
Username: xnyhps

Replying to xnyhps:

Unfortunately, I am beginning to think that the chat server is sending the error message. I consistently get the same error messages whether the other hexchat bot is logged in as the recipient or not. It would appear as though the server does not like custom IQs.

If you have the time, could you confirm that you are able to send custom Iqs with sleekxmpp? If you are not willing or able, that is fine, but an example of working code would really be a help here.

Trac:
Username: feynman

I experimented a bit with your code last night to see if my idea could work and committed that here: https://github.com/xnyhps/hexchat/commit/07cb3a192c7d24fa19b1eec33741c39d948562bd. Setting up the connection works with it, but handling closed sockets/streams properly is still unfinished.

I changed a couple of things, I wasn't sure why the "local address" is communicated to the host and I left it out. It's up to you if you want to use my changes, or just look at it for inspiration. :)

Trac:
Username: xnyhps

Oh, almost forgot. About the "feature-not-implemented", are you addressing the <iq>s to the full JID of the contact? So pluggabletransport@jabber.org/Hexchat, not just pluggabletransport@jabber.org. <iq>s don't get forwarded the same way as messages are.

Trac:
Username: xnyhps

I got the protocol working with IQs and uploaded the code here: https://gitweb.torproject.org/user/asn/hexchat.git

Some comments:

The protocol is quite different now and I need to update the protocol-spec in the "doc" directory to reflect this.
I used custom IQ stanzas rather than a stream (which is after all just a bunch of custom IQ stanzas).
The code is poorly commented at this point. I need to fix that, but for now, I thought it was important to keep everyone updated on progress.
I still cannot watch youtube videos, and the software has a tendency to randomly start refusing connections. However, when it is working, it is reasonably fast.
Sometimes messages are still dropped. I tried buffers, delivery confirmation messages and locks to try to fix this. None of those techniques worked. Please let me know if you can think of any new ways of ensuring messages are delivered quickly and in the right order.
I encourage others to test the code themselves and let me know whether you can think of any ways of improving it.

Trac:
Username: feynman

I can watch about 10 seconds of a youtube video before something gets messed up. I want to note that the video does not seem to stop loading due to lack of speed. Rather, hexchat bots are sending disconnect requests and then receiving data. The data is of course dropped since the socket has already disconnected. Unlike with connecting, I cannot wait for a confirmation of disconnect, because by the time a computer has sent a disconnect request through a chat server, the socket it would write to has already closed. This is an inherent flaw in trying to forward data from a connection oriented socket. The good news is proxies manage to do it all the time--so it is possible (perhaps when the connection through the proxy--in this case a chat server--is fast enough).

Anyway, I am not sure how, or even whether it is possible to fix this problem with the disconnecting process, but I am doing everything I can to get youtube to work here.

Trac:
Username: feynman

I am definitely getting closer. I found that gtalk drops IQs when you send too many to a given person (or possibly group of people) too quickly. I added code that saves data received from a socket into a buffer and sends the data out in large chunks every second. This gave me much better results, but google still seems to start dropping IQs somewhere around 1.5 minutes into a video at 240p. The situation only gets worse with higher quality videos. This might be because the bandwidth of gtalk for xmpp messages is inherently slower than the rate at which youtube sends data when streaming videos higher than 240p.
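The buffering approach described above could be sketched like this. It is only an illustration, not the code in the repository: the class name `ChunkBuffer` and its methods are made up for this example, and the clock is injectable purely so the behaviour is easy to test.

```python
import time


class ChunkBuffer:
    """Accumulate socket data and release it as one chunk per interval."""

    def __init__(self, interval=1.0, clock=time.monotonic):
        self._interval = interval
        self._clock = clock
        self._pending = bytearray()
        self._last_flush = clock()

    def feed(self, data):
        """Store data read from the socket until the next flush."""
        self._pending.extend(data)

    def take_if_due(self):
        """Return the buffered bytes if the interval has elapsed, else None."""
        now = self._clock()
        if not self._pending or now - self._last_flush < self._interval:
            return None
        chunk = bytes(self._pending)
        self._pending.clear()
        self._last_flush = now
        return chunk
```

The sending loop would call `take_if_due()` periodically and wrap any returned chunk in a single IQ, so that many small socket reads turn into roughly one stanza per second per connection.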

I tried using zlib to compress the data before base 64 encoding it and sending it over the chatline to see if my messages were too long, but this did not seem to help.
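For reference, the compress-then-encode step might look like the sketch below (the function names are made up for this example). One possible reason it did not help is that the payload here is Tor traffic, which is already encrypted and therefore close to incompressible; zlib only pays off on redundant data.

```python
import base64
import zlib


def encode_chunk(data):
    """Compress with zlib at the highest level, then base64-encode for XMPP."""
    return base64.b64encode(zlib.compress(data, 9)).decode("ascii")


def decode_chunk(text):
    """Reverse encode_chunk(): base64-decode, then decompress."""
    return zlib.decompress(base64.b64decode(text))
```

Feeding `encode_chunk()` a block from `os.urandom()` instead of repetitive text is a quick way to see how little compression gains on encrypted-looking input.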

More testing is necessary.

Trac:
Username: feynman

I introduced caching and garbage collection into the protocol. Now hexchat will throttle, cache, and empty caches when too much data is stored. This is still not enough to consistently watch youtube videos, but it makes the whole system more consistent in its performance and it does a much better job of delivering data--at least when Google is not dropping too many IQ packets.

I am trying to think of other ways of dealing with lots of dropped packets. I have delivery confirmation in place, and I might implement a timer that disconnects a socket when it goes a certain amount of time without receiving confirmation that its packets are being delivered.

Trac:
Username: feynman

I added code that measures the amount of time lapsed between sending data and receiving an acknowledgement.
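A minimal sketch of that kind of measurement, assuming each packet carries an id that the acknowledgement echoes back. `AckTimer` is a hypothetical name and not taken from the hexchat source.

```python
import time


class AckTimer:
    """Track the time between sending a packet and seeing its acknowledgement."""

    def __init__(self, clock=time.monotonic):
        # A monotonic clock is not affected by system clock adjustments.
        self._clock = clock
        self._sent_at = {}

    def sent(self, packet_id):
        self._sent_at[packet_id] = self._clock()

    def acked(self, packet_id):
        # Returns the elapsed seconds, or None for an unknown or repeated ack.
        started = self._sent_at.pop(packet_id, None)
        if started is None:
            return None
        return self._clock() - started
```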

Unfortunately, it seems that even with all the error handling functionality I put in, I still cannot stream youtube videos--even at low quality--at least not without a lot of buffering time. This is from a computer with about 350kb/s (max) internet access. When I run the tests from a computer with much faster internet access (1Mb/s max), I can stream low quality youtube videos. Unfortunately, I doubt that most potential users of this software will have 1Mb/s internet access.

Furthermore, gmail chat seems to be the only server (of the three I tested) that provides a fast enough service to stream youtube videos.

Even with gmail chat, I occasionally receive a "too many bytes sent per hour" error which kicks me off my account for a while (I am guessing an hour, but I have not measured). I am already compressing my data with zlib at its highest compression level before base 64 encoding the data and sending it over the chat server.

I am beginning to doubt that this will be a scalable and practical means of connecting to tor bridges. I will keep the ticket open for a while and certainly update it if I make any breakthroughs--though for now, I am out of ideas.

As a final note, I would like to mention that my protocol currently has enough error handling that it might be a suitable starting point for tunneling TCP over UDP. If that would be useful to Tor, please let me know and/or make a new ticket.

Trac:
Username: feynman

I added a new feature in which you can use multiple accounts to send and receive data. In doing so, I discovered that when using gmail, you can only send and receive data to other users on your contact list. I tried to work around this by setting up a multiuser chat (MUC), though it did not seem to work.

Also, webpages seem to load less reliably when using more than one account. I have no idea why.

Trac:
Username: feynman

Replying to feynman:

Also, webpages seem to load less reliably when using more than one account. I have no idea why.

Are messages/packets being reordered?

Replying to rransom:

Even if they were, I have a system in place that should take that into account. I have each message marked with a sequential identifier, and each computer acknowledges messages based on the identifier they receive. They also keep track of the identifier of the last message they receive so they know what parts of the message (if any) contain redundant information (the message may be partially or entirely composed of caches, but there is a stanza that indicates how many bytes each cache takes up).

In theory, this should all compensate for data coming out of order.

Trac:
Username: feynman

Hey feynman,

thanks for all the new features, and sorry for being less active on this lately.

BTW, due to the encryption of TLS, I'm not sure how helpful the caching is, since all TLS records should look unique on the wire. For the same reason, zlib might not find much stuff to compress in your TLS traffic.

Also, could you document your TCP-like functionality in the spec? That is, how you calculate sequence identifiers and do ACKs, etc.

Also, where did the file transfer idea go? Does inbound file transfer (the one where files go through the server) work in Google's XMPP servers?

Also, check out this weird proposal that just hit the XMPP standards mailing list: http://mail.jabber.org/pipermail/standards/2013-June/027690.html

It's probably not relevant to the transport, but might give you some nice ideas.

Replying to asn:

Hey feynman,

thanks for all the new features, and sorry for being less active on this lately.

BTW, due to the encryption of TLS, I'm not sure how helpful the caching is, since all TLS records should look unique on the wire. For the same reason, zlib might not find much stuff to compress in your TLS traffic.

TLS encryption should be completely independent of caching. It is not caching the TLS packet, but the data it sends before it gets encrypted with TLS. The same goes for the zlib compression stuff.

Also, could you document your TCP-like functionality in the spec? That is, how you calculate sequence identifiers and do ACKs, etc.

I will document all this functionality ASAP (probably over the next couple of days). For now, let me give you a run down of what happens:

1. There is data to be read from the socket.
   a. Data is read from a socket and added to a buffer, which is periodically checked.
   b. When data is found in the buffer or the cache, the buffered data is added to the cached data, the length of the buffered data (if greater than zero) is appended to a separate list of cache lengths, and the current time is appended to a list of timestamps.
   c. All cached data is compressed, base 64 encoded, and put in a "data" stanza.
   d. The lengths of all the caches are comma separated in a "chunks" stanza.
   e. Local and remote ips and ports are set in their respective stanzas.
   f. A comma separated list of all the accounts that the computer controls and that are connected to the chat server is set in an "aliases" stanza.
   g. The socket's id variable is incremented by one (mod sys.maxsize).
   h. The iq message's id is set to the socket's id variable.
   i. The above stanzas are appended to the iq message in a 'packet' stanza.
   j. The recipient of the message is selected from a list of potential addresses given during the connection phase (not mentioned here).
   k. The sender of the message is selected from a list of accounts connected to the chat server.
   l. The message is sent over the chat server.

2. A message containing data is received.
   a. The computer computes "id_diff" = "id in the message" - "last id received with the same local and remote ip and ports and set of aliases".
   b. If id_diff<=0 and id_diff>=-"peer's sys.maxsize"/2 (the latter quantity is established during the connection phase), then the message is declared redundant and a confirmation is sent regarding the id containing the most recent data (i.e. not the id of the message that was just received).
   c. If the message is not completely redundant, mod id_diff with "peer's sys.maxsize" to get the number of new chunks of data.
   d. Compute the number of bytes of data to ignore from the number of new chunks of data computed in (c) and the list of chunk sizes in the "chunks" stanza.
   e. Unzip the data, discarding the number of bytes computed in (d).
   f. Set the socket's "last id received" to the id of the current message and send a confirmation.
   g. Send the data to the socket.

3. A confirmation of data is received.
   a. Compute the difference between the id of the acknowledged message and the appropriate socket's current id variable, storing the result as id_diff.
   b. Mod the result of (a) with sys.maxsize.
   c. Subtract the result of (b) from the number of caches stored.
   d. If the result of (c) is positive, move on to (e).
   e. Set the new throttle rate (the period over which the socket waits before checking its buffer) to a complicated function, "F", of the difference between the current time stamp and the time stamp recorded "result of (c) - 1" records ago. The complicated function "F" rescales the throttle rate so that it never goes above a maximum throttle rate/number of accounts connected to the chat server (so each account never sends messages slower than a certain rate) and never goes below a minimum throttle rate/number of accounts connected to the chat server (so each account never sends messages faster than a certain rate).
   f. The rate at which the socket reads data is adjusted based on the new throttle rate so that garbage collection need not happen for a certain minimum amount of time. This minimum amount of time is computed from the new throttle rate, together with a global constant "MAXIMUM_DATA" which contains the number of bytes that can be safely sent over the chat server, and another global constant "NUM_CACHES" which contains the minimum number of times the system should cache data before the cache size reaches MAXIMUM_DATA (and garbage collection takes place).
   g. The appropriate number of caches are cleared along with their recorded data lengths and time stamps (see 1a).
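
The handling of a received data message (id_diff, the redundancy check, and discarding already-delivered bytes) could be sketched roughly as below. This is one reading of the outline and not the hexchat code: the function name `receive_data` is hypothetical, and it assumes that the newest chunks sit at the end of the "chunks" list, so the bytes to ignore are the leading, already-delivered chunks.

```python
import base64
import zlib


def receive_data(last_id, msg_id, chunks, payload, peer_maxsize):
    """Return (new_last_id, new_bytes) for one incoming data message.

    chunks  -- list of cache lengths from the "chunks" stanza
    payload -- base 64 text of the zlib-compressed caches ("data" stanza)
    """
    id_diff = msg_id - last_id

    # Redundant message: nothing new, so the caller should re-acknowledge
    # last_id rather than the id that was just received.
    if id_diff <= 0 and 2 * id_diff >= -peer_maxsize:
        return last_id, b""

    # Python's % is non-negative for a positive modulus, so this also
    # covers ids that wrapped around peer_maxsize.
    new_chunks = id_diff % peer_maxsize

    # Assumption: the last `new_chunks` entries are new, the rest are stale.
    stale = chunks[:-new_chunks] if new_chunks < len(chunks) else []
    skip = sum(stale)

    data = zlib.decompress(base64.b64decode(payload))
    return msg_id, data[skip:]
```

The caller would then send a confirmation for the returned id and write the returned bytes to the local socket.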

I know that I could use a global constant to mod data rather than sys.maxsize (which varies from one architecture to another), but getting the system to run quickly and efficiently is more important at the moment. In the meantime, consider this an outline of the full protocol spec to come.

Trac:
Username: feynman

Replying to feynman:

Replying to asn:

Hey feynman,

thanks for all the new features, and sorry for being less active on this lately.

BTW, due to the encryption of TLS, I'm not sure how helpful the caching is, since all TLS records should look unique on the wire. For the same reason, zlib might not find much stuff to compress in your TLS traffic.

TLS encryption should be completely independent of caching. It is not caching the TLS packet, but the data it sends before it gets encrypted with TLS. The same goes for the zlib compression stuff.

Tor connections are encrypted (and authenticated) using TLS before they reach your XMPP transport.

Replying to rransom:

That would imply the zlib compression would be quite useless when relaying Tor traffic, but the caching scheme should work all the same. The whole XMPP transport does no analysis on what it is reading. It simply passes on the data byte for byte. The caching scheme combined with id numbers for packets should help ensure chunks of data get to the proper destination consistently and in the right order.

Whatever Tor does when it reads and writes to a TCP socket should work independently from the mechanism that actually delivers the data to its destination. My understanding is that the packet would ordinarily be encoded with an IP header and sent directly through a gateway to the internet.

When using hexchat, the data is sent to a local TCP socket running hexchat (call it hexchat1). Hexchat1 then reads the data (thereby stripping it of its TCP header) and passes it over a chat server to another hexchat program (call it hexchat2) that sends the data to the appropriate ip:port (giving it a new TCP header in the process).

The client thinks it is sending the data to hexchat1, and the server thinks it is receiving data from hexchat2, but the data itself is never changed. It might be broken into smaller chunks or combined into bigger chunks, and it might be delivered at unpredictable rates, but it is never altered.

That at least is how this should work in principle.

Trac:
Username: feynman

I updated the protocol spec here: https://raw.github.com/aeftimia/hexchat/master/doc/protocol-spec.txt

There is still work to be done.

JIDs are often given random strings for their so called "resources" (or if a resource is requested, a random string is often appended to it). To send an IQ, one must know the recipient's resource. This is great for security, but bad for this particular application. To get around this, I use a message (which can be sent without a resource) to send a connection request to a JID with an unknown resource. When the recipient responds, thus disclosing their resource, their full JID (including the resource) is added to a table that keeps track of JIDs and resources.
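The table of JIDs and resources mentioned above might look something like this minimal sketch. `ResourceTable` and its methods are hypothetical names, and the JIDs in the usage note are made up.

```python
class ResourceTable:
    """Map a bare JID (user@domain) to the last full JID seen for it."""

    def __init__(self):
        self._full = {}

    def learn(self, full_jid):
        # A full JID has the form user@domain/resource.
        bare, sep, resource = full_jid.partition("/")
        if sep and resource:
            self._full[bare] = full_jid

    def address_for(self, bare_jid):
        # None means the resource is still unknown, so an IQ cannot be
        # addressed yet and a message stanza would have to be used instead.
        return self._full.get(bare_jid)
```

For example, after `table.learn("bot@example.org/a1b2")`, a lookup of `"bot@example.org"` returns the full JID that IQs can be sent to.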

The problem is that if one of the computers disconnects and reconnects, it acquires a new resource and there is no way (currently) for the other computer to update its table.

Another problem is that messages that have no resource specified can only be sent to people on your contact list. Thus, I may have to carry on with the multi-user chat scheme and devise a secure way of acquiring the target's resource by first sending a message to everyone in the chat room. The obvious way of handling this would be to use asymmetric encryption to send initial connection messages in an encrypted form to everyone in the chat room, then have the recipient decrypt it and respond via IQ.

However, before I continue with this, I would like some feedback concerning the practicality of the protocol thus far. Here are some questions I want to consider:

Is the protocol lacking anything that has not been mentioned? Is it too complicated? Is the program still too slow to be useful?

Trac:
Username: feynman

Replying to feynman:

JIDs are often given random strings for their so called "resources" (or if a resource is requested, a random string is often appended to it).

(I just want to point out that this is pretty uncommon for XMPP servers except GTalk. Most normal XMPP servers just give you the resource you request.)

To send an IQ, one must know the recipient's resource. This is great for security, but bad for this particular application. To get around this, I use a message (which can be sent without a resource) to send a connection request to a JID with an unknown resource. When the recipient responds, thus disclosing their resource, their full JID (including the resource) is added to a table that keeps track of JIDs and resources.

The problem is that if one of the computers disconnects and reconnects, it acquires a new resource and there is no way (currently) for the other computer to update its table.

Another problem is that messages that have no resource specified can only be sent to people on your contact list.

This also sounds like a limitation set by GTalk.

Why do you want to avoid needing to have someone on your contact list to use this? If you want to properly exchange messages/iqs with someone, it helps to be able to know on which resources they are online. This should also make it much easier to automatically handle the case where the other side disconnected and reconnected on a different resource.

If you're worried about privacy... I don't really see why you would authorize someone to use your connection as a proxy to the internet when you don't want them to know when you're online. It sounds fair to inform them when you're available to proxy a connection for you.

Trac:
Username: xnyhps

Replying to xnyhps:

My main concern is not really for the sake of the user so much as for the person running the proxy service. I figured that people who run proxy services are not going to want to constantly log in to their chat accounts and accept strangers' requests to be added to their contact list. I do not think that would be a very scalable approach.

I imagined that this would work in a more automated fashion like other Tor plugins. Take, for example, obfsproxy. You do not need to give someone permission to connect to your IP address for obfsproxy to work. The user simply plugs the ip:port into Tor, and Tor connects. I think having to ask people to add you to their contact lists would discourage users from trying the software, and discourage people that manage proxies from running the service. It is just too much maintenance.

In case there was any doubt, I want to assert that I think that using your usual chat accounts to run proxy services is a bad idea. Your chat accounts are not only a piece of identifying information, they are an easy form of contact information--especially if you are using an email (like in the case of GTalk). That just sounds like a bad idea from the start.

Trac:
Username: feynman

Replying to asn:

Also, where did the file transfer idea go? Does inbound file transfer (the one where files go through the server) work in Google's XMPP servers?

I looked into the file transfer protocol and it seemed easier to make my own protocol than try to sneak all the parameters (port numbers, ip addresses, etc) into fields of an existing one. As far as I can tell, the chat server does not treat the file transfer protocol any differently than any other xml protocol, and it is really up to the users of the file transfer protocol to manage the actual exchange of data.

I looked into what the file transfer protocol can do with regards to making sure data gets to the client and in the right order. It would seem I already have the same system of safeguards integrated into the hexchat protocol. I also have new things like dynamic throttling, dynamic rates at which sockets are read, and caching.

Trac:
Username: feynman

I recently discovered that the caching and delivery confirmation were doing more harm than good. I think they were simply using too much bandwidth. It seems that by spawning a new thread for closing a socket and acquiring a lock that blocks the reading of other sockets, I could greatly improve the speed. It is still far from ideal, but I can usually get through a couple of minutes of low quality youtube videos at this point (even with a very slow internet connection).

The code and protocol specs are updated. The old code is stored in the misc directory of the git repository.

Unfortunately, using more than one JID is still very unreliable. I am beginning to think that rransom was on the right track in thinking that the messages were getting reordered--especially since I am no longer verifying anything with IDs. Youtube pages load when using more than one JID, but the video itself never plays (despite the loading bar swiftly moving across the screen).

I hope to find other ways to make the program faster.

Trac:
Username: feynman

Replying to feynman:

I recently discovered that the caching and delivery confirmation were doing more harm than good. I think they were simply using too much bandwidth. It seems that by spawning a new thread for closing a socket and acquiring a lock that blocks the reading of other sockets, I could greatly improve the speed. It is still far from ideal, but I can usually get through a couple of minutes of low quality youtube videos at this point (even with a very slow internet connection).

Ah. I see.

Have you also looked at whether compression actually helps the transport? It might just be wasting CPU cycles because of the TLS layer being encrypted.
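One quick way to check this is to compare the zlib compression ratio of repetitive plaintext with that of random bytes. The sketch below uses `os.urandom` as a stand-in for TLS records, on the assumption that encrypted data is statistically close to random; it is not a measurement of real Tor traffic.

```python
import os
import zlib


def compression_ratio(data: bytes, level: int = 9) -> float:
    # Values near (or above) 1.0 mean compression is not saving anything.
    return len(zlib.compress(data, level)) / len(data)


# Repetitive plaintext, the kind of input zlib handles well.
plaintext_like = b"GET /index.html HTTP/1.1\r\n" * 150

# Random bytes standing in for encrypted TLS records (an assumption).
ciphertext_like = os.urandom(len(plaintext_like))

print("plaintext-like ratio: ", compression_ratio(plaintext_like))
print("ciphertext-like ratio:", compression_ratio(ciphertext_like))
```

If the ratio for the traffic hexchat actually relays stays close to 1.0, the compression step is only costing CPU time.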

The code and protocol specs are updated. The old code is stored in the misc directory of the git repository.

Unfortunately, using more than one JID is still very unreliable. I am beginning to think that rransom was on the right track in thinking that the messages were getting reordered--especially since I am no longer verifying anything with IDs. Youtube pages load when using more than one JID, but the video itself never plays (despite the loading bar swiftly moving across the screen).

Hm, I see. This is not fun. A deployed hexchat would probably need to use different JIDs on the client and the server.

BTW, have you tried using hexchat with Tor? Does it work? Is that how you do testing?

Finally, the main problem with this transport seems to be Google rate-limiting their servers. I'm not sure what to do about this, and whether we can work around their throttling. After all, if they don't want hexchat to work on their servers, they can rate-limit them even more. Hm.
