Opened 6 years ago

Last modified 21 months ago

#9022 accepted task

Create an XMPP pluggable transport

Reported by: asn
Owned by: feynman
Priority: Medium
Component: Circumvention/Pluggable transport
Severity: Normal
Cc: alexeftimiades@…, dcf@…

Description

We should look into XMPP pluggable transports. There are many public XMPP services that see widespread use even from censored countries.

Change History (94)

comment:1 Changed 6 years ago by asn

feynman from IRC is looking into this atm. He wrote http://sourceforge.net/projects/aeftimiamisc/files/hexchat/ in the past, which is an XMPP tunnel.

Unfortunately, it seems that in its current form it's quite slow and unusable for web browsing. It also seems that some XMPP servers are throttling streams of messages of large size (4k byte chunks).

More research is needed.

comment:2 Changed 6 years ago by arma

News from IRC is that the throttling / slowness was due to a bug in his threading code, and it's actually remarkably fast now.

comment:3 Changed 6 years ago by asn

feynman posted his updated code to github: https://github.com/aeftimia/hexchat

It seems that the topology of an XMPP transport would be:

                        teh censor
     +-------------+       \\\       +-------------+         +----------+
     |  hexchat    |       \\\       |             |         | hexchat  |
     |  client     |<------\\\------>| XMPP server |<------->| XMPP bot |
     |(XMPP client)|       \\\       |             |         |          |
     +-------------+       \\\       +-------------+         +----------+
           ^               \\\                                    ^
           |               \\\                                    |
           |               \\\                                    |
           |               \\\                                    |
           v               \\\                                    v
     +------------+        \\\                             +------------+
     |            |        \\\                             |            |
     | Tor client |        \\\                             | Tor bridge |
     |            |        \\\                             |            |
     +------------+        \\\                             +------------+
                           \\\

Also, the simplest and easiest deployment of hexchat would probably resemble the current deployment of flashproxy. That is, the client side would expose a SOCKS server, but in reality it would ignore the SOCKS handshake. It would connect to an XMPP server and speak with a specific XMPP bot (that would run the server side of hexchat). The XMPP bot would extract the Tor data from the XMPP traffic and pass it to a specific, hardcoded bridge.

The above system is easier to deploy on the client-side, since the client doesn't need to specify an XMPP server, the XMPP bot username, or the bridge address. This is similar to how flashproxy works currently. In the future, we can think of how the client can specify specific parameters for his hexchat session (like a specific XMPP bot username, or a specific bridge).
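For illustration only, here is a minimal Python asyncio sketch of a client-side listener that speaks just enough SOCKS5 (no authentication) to satisfy Tor, then throws away whatever destination Tor asked for. The function names are hypothetical, and relaying the stream over XMPP to the fixed hexchat bot is left as a comment.

```python
import asyncio
import struct

async def accept_socks5(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    """Complete a SOCKS5 no-auth handshake but ignore the requested destination."""
    version, nmethods = await reader.readexactly(2)
    if version != 5:
        raise ValueError("not a SOCKS5 client")
    await reader.readexactly(nmethods)        # offered auth methods, ignored
    writer.write(b"\x05\x00")                 # select "no authentication required"
    await writer.drain()

    _ver, _cmd, _rsv, atyp = await reader.readexactly(4)
    if atyp == 0x01:                          # IPv4 address
        await reader.readexactly(4)
    elif atyp == 0x03:                        # length-prefixed domain name
        (length,) = await reader.readexactly(1)
        await reader.readexactly(length)
    elif atyp == 0x04:                        # IPv6 address
        await reader.readexactly(16)
    else:
        raise ValueError("unknown SOCKS5 address type")
    await reader.readexactly(2)               # destination port, also ignored

    # Reply "succeeded" with a dummy bound address of 0.0.0.0:0.
    writer.write(b"\x05\x00\x00\x01" + bytes(4) + struct.pack("!H", 0))
    await writer.drain()

async def handle_tor(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    await accept_socks5(reader, writer)
    # From here on, bytes from Tor would be relayed over XMPP to the hardcoded
    # hexchat bot, which hands them to its hardcoded bridge (not shown).
    writer.close()
```

Running it is a matter of `asyncio.start_server(handle_tor, "127.0.0.1", port)`; Tor's own SOCKS client sends the greeting and CONNECT request, and the reply tells it the "connection" succeeded regardless of the target.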

Also, it's worth noting that in the hexchat system, the IP of the client is exposed to the XMPP server. The server-side hexchat XMPP bot should not be able to get the IP of the client, since it's always speaking to the client through the server.

(BTW, obviously the name hexchat might change if feynman wants to change it.)

comment:4 Changed 6 years ago by rransom

This pluggable transport will (a) give hostile firewall vendors an incentive to block all XMPP-like traffic, and (b) give XMPP server operators an incentive to deploy censorship software to detect and block hexchat.

comment:5 Changed 6 years ago by feynman

Cc: alexeftimiades@… added

comment:6 Changed 6 years ago by feynman

Owner: changed from asn to feynman
Status: changed from new to accepted

comment:7 in reply to:  3 ; Changed 6 years ago by feynman

Replying to asn:

I should make a couple of notes here. First of all, the client really controls every aspect of initializing the connection. The xmpp bot on the server side just logs into an xmpp server and listens for traffic. It does not even bind to a port.

On the client side, the xmpp bot binds to one or more ports and listens for traffic. It associates each of these ports with:
  • An xmpp username to forward traffic to
  • An ip:port that the xmpp bot on the server side should try to connect to

When it gets a connection, it sends a "connect me!" message through the chatline to the xmpp bot on the server. It puts the ip:port the server should try to connect to in the xml subject tag and the ip:port of its newly connected socket in the xml nick tag (used for nicknames). This way, the xmpp bot on the server side has a way to send data to that connected socket when replying. The client xmpp bot also creates an entry in its routing table that associates the following tuple:
(ip:port of connected socket, server's xmpp username, server ip:port to connect to)
with the newly connected socket.

When the xmpp bot on the server side gets a connection request, it creates a new socket and tries to connect it to the ip:port specified in the subject tag. If successful, it adds an entry to its routing table that associates the following tuple:
(ip:port that the server just connected to, client's xmpp username, client's connected socket's ip:port)
with the newly connected socket.

Now, when either socket receives data, it is prompted to send the data over the chat server, using the socket's key in the routing table to construct the appropriate nick and subject xml tags. When a message is received through the XMPP server, the routing table key is constructed from the username of the computer that sent it, along with the nick and subject xml tags. The data is then forwarded to the appropriate socket.

An analogous process takes place for disconnections, starting with a closing socket sending a "disconnect me!" message to the xmpp bot on the other side of the chat server.
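A rough Python sketch of the routing-table scheme described above, with hypothetical names. The rule that makes both sides agree: the sender puts the first element of its key in the nick tag and the last element in the subject tag, and the receiver rebuilds its own key as (subject, sender JID, nick). With the client key (connected socket, server JID, target) and the server key (target, client JID, client socket) from the comment, each side's lookup then lands on the other side's entry.

```python
from typing import Dict, NamedTuple, Optional, Tuple

Addr = Tuple[str, int]

def fmt(addr: Addr) -> str:
    return f"{addr[0]}:{addr[1]}"

def parse(text: str) -> Addr:
    host, _, port = text.rpartition(":")
    return host, int(port)          # raises ValueError on malformed input

class RouteKey(NamedTuple):
    local: Addr      # address this side's socket is tied to
    peer_jid: str    # JID of the hexchat bot on the other side
    remote: Addr     # address the other side's socket is tied to

class RoutingTable:
    """Hypothetical sketch of hexchat's per-connection routing table."""

    def __init__(self) -> None:
        self._routes: Dict[RouteKey, object] = {}

    def add(self, key: RouteKey, sock: object) -> None:
        self._routes[key] = sock

    def remove(self, key: RouteKey) -> None:
        self._routes.pop(key, None)  # e.g. after a "disconnect me!" message

    @staticmethod
    def tags_for(key: RouteKey) -> Tuple[str, str]:
        # Outgoing message: nick carries our local address, subject the remote one.
        return fmt(key.local), fmt(key.remote)

    def lookup(self, sender_jid: str, nick: str, subject: str) -> Optional[object]:
        # Incoming message: the sender's "remote" is our "local" and vice versa.
        return self._routes.get(RouteKey(parse(subject), sender_jid, parse(nick)))
```

In use, the client registers its key, sends `tags_for(key)` with "connect me!", and the server builds its own key from those tags after connecting; both `lookup()` calls then resolve without any extra state being exchanged.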

comment:8 in reply to:  4 ; Changed 6 years ago by asn

Replying to rransom:

Yes, these are valid concerns.
Another issue with this PT is that the XMPP server (e.g. google) learns the IP address of our users.

Unfortunately, I don't know how to evaluate the severity of these concerns, and whether it's a good idea to deploy such a transport to users.

comment:9 in reply to:  8 Changed 6 years ago by arma

Replying to asn:

I don't see a problem here. Predicting the future is hard -- maybe some XMPP servers will choose to censor, and maybe some won't. Probably some firewall vendors will offer a 'filter xmpp button' (probably some of them do already). But whether firewall operators choose to press the button remains a complex tradeoff.

Wrt the 'xmpp servers censoring their content' question: that means the hexchat design should consider whether it can achieve its goals with fewer/no regexpable "headers" in its chat setup.

comment:10 in reply to:  7 Changed 6 years ago by asn

Replying to feynman:

Hey there! I have a quick code review and some comments. If you get bored fixing the issues I raise, just say so, and I will do it when I get some free time.

  • Which XMPP plugins do we really need? For example, do we need Multi-User Chat?
  • Instead of using print(), use the Python logging module for your logs.
  • Stuff like "send '_' for empty data" must be mentioned in the spec.
  • Maybe add a more paranoid "(dis)connect me!" string so that it's even more unlikely to be encountered in a normal XMPP conversation? Add some numbers, and symbols, and stuff.
  • You are using rfind() but not checking the retval. You are also using functions like find() without checking for exceptions. Don't assume that the data you receive are correctly formatted. Your program might crash with an exception.
  • I still think sockbot is a weird name -- and you also needed two lines of comments to explain it. Why don't you name the class Hexchat or something?
  • What's the deal with the lambda in the "disconnected" event handler function pointer? Or the lambda: False? Am I missing something?
  • Maybe split the codebase to more files? One for the client and another for the server? This way you won't need to have variables like client_socks and server_socks that are only used in one mode.
  • Using sock as an abbreviation for socket continues to be confusing in names like server_socks. Maybe expand sock to socket?

All in all, code looks good, the new comments are helpful, and I think I kind of understand how it works.

comment:11 Changed 6 years ago by asn

Here is a list of more things that must be done before the transport is deployable:

  • Write a SOCKS-server for the client-side. We should look at how flashproxy does it.
  • We need pyptlib support. I see you started implementing it, but don't worry about it. I can do it tomorrow or the day after.
  • SSL support.
  • We need to move the stuff from the config file to hardcoded parameters and command line switches. That's how we currently deploy pluggable transports. Check out how flashproxy is currently deployed (open up a pluggable transport bundle, and check the torrc).

comment:12 in reply to:  11 ; Changed 6 years ago by feynman

Replying to asn:
I updated the files with most of the changes you suggested. Here are the things I did not change:

  • More paranoid "(dis)connect" message: I left this the way it was so it would be easy to spot in a debug file. I do not think people should run hexchat using their usual XMPP chat server accounts anyway, so making messages distinguishable from normal chats should be unnecessary.
  • Split the code base to more files: Clients and servers are objects of the same class because the program does not really distinguish them the way TCP does. In fact, any client can also act as a server.
  • Write a SOCKS-server for the client side: I will do this if I must, but it seems like a hack around an unnecessary limitation Tor places on pluggable transports. I would personally prefer to have hexchat listen on a local address and configure Tor to use that local address as a bridge. Then hexchat would forward the connection to the actual bridge. This would leave hexchat in its most versatile form: Tor, or any other program, could still use it as something other than a SOCKS proxy.
  • pyptlib support: You said you could/would take care of this. If you do not have time tomorrow or want me to take care of this, let me know. Otherwise, I will leave it to you to finish this off with pyptlib.
  • SSL support: If you were referring to SSL support with the chat server, it already supports that (sleekxmpp does this transparently).

comment:13 in reply to:  12 Changed 6 years ago by asn

Replying to feynman:

Sounds good. Thanks for the fixes. I will also do some code cleaning of my own when I get the time.

BTW, with regards to the SOCKS-server thing, have you tried using hexchat with tor? If you can manage to make hexchat work with Bridge lines and ClientTransportPlugin lines, then I guess we don't need to do the SOCKS-server thing. You might be able to do it with something like this:

Bridge 127.0.0.1:5555 # actual hexchat address
Bridge hexchat 0.0.0.1:1233 # dummy bridge line just to spawn up 'hexchat' transport
ClientTransportPlugin hexchat exec /usr/bin/hexchat --blabla --managed # this line will force tor to spawn hexchat 

Although this is a hack, so I can't promise it's going to work. Check out how ClientTransportPlugin and the managed proxy interface works: https://gitweb.torproject.org/torspec.git/blob/HEAD:/proposals/180-pluggable-transport.txt

comment:14 Changed 6 years ago by feynman

I fixed a bug in how the program handles "connection refused" errors on the server side. Also, it appears that Linux handles threads differently than OSX and I needed to throw in an infinite loop to keep the program running. As of now, I have NOT gotten it working with tor, but I hope to do so over the next 24 hours.

comment:15 Changed 6 years ago by dcf

Cc: dcf@… added

comment:16 Changed 6 years ago by feynman

I want to thank everyone on the IRC that helped me test this program.

At this point I was able to connect and use a bridge through hexchat after making some minor modifications to the code. It now acts completely (or so I hope) transparently as a means of forwarding data from one computer over a chatline to another computer.

This allows you to tell tor to use your local computer as a bridge and have hexchat waiting to forward data byte for byte to another computer (which would be running its own instance of hexchat).

There is a lot of room for flexibility here. For example, the computer with an uncensored internet connection could be behind a NAT and does not even have to be running Tor. As long as the computer can:
a) connect to and use an XMPP chat server, and
b) connect to the requested bridge (or run a bridge itself),
it is a viable relay for hexchat.

A further consideration is the distribution of JIDs (xmpp usernames of the form username@chatserver) of people running hexchat. Remember, you do not have to know the IP address of the bridge you are connecting to if the bridge itself is running hexchat (in which case you would tell your client hexchat to connect to a 127.0.0.1 address on the remote computer).

Finally, I want to note that at this point, running hexchat would probably be a security risk. Someone could connect to a computer running hexchat, then connect from there to any IP, local or remote, and send arbitrary data from that computer. The good news is that this is quite easy to fix. I can throw in another command line argument that gives the computer a list of ip:ports it is authorized to connect to.

All in all, the program is near complete. It just needs some means to limit the ip:ports it can connect to, and a pyptlib interface.
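A minimal sketch of the allowlist check proposed above, assuming the list holds literal IP addresses plus ports; how the list reaches the program (which command-line flag, file format) is left open, and the function names are hypothetical. Hostnames in a "connect me!" request are refused outright, since a name could resolve to an address that is not on the list.

```python
import ipaddress
from typing import FrozenSet, Iterable, Tuple, Union

IP = Union[ipaddress.IPv4Address, ipaddress.IPv6Address]

def load_allowlist(entries: Iterable[Tuple[str, int]]) -> FrozenSet[Tuple[IP, int]]:
    """Normalise (ip, port) pairs; raises ValueError on anything that is not a literal IP."""
    return frozenset((ipaddress.ip_address(ip), int(port)) for ip, port in entries)

def may_connect(allowed: FrozenSet[Tuple[IP, int]], host: str, port: int) -> bool:
    """Check a requested target before the bot ever calls connect()."""
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return False  # not a literal IP (e.g. a hostname): refuse rather than resolve it
    return (ip, port) in allowed
```

Normalising through `ipaddress` means equivalent spellings such as `::1` and `0:0:0:0:0:0:0:1` compare equal, so the check cannot be bypassed by rewriting the address.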

comment:17 in reply to:  16 Changed 6 years ago by asn

Hm, https://github.com/aeftimia/hexchat/commit/bff1134bc9d17e8e0532bcc99d3a77b975ba1946 is a bit weird. It seems like your non-blocking connect() never succeeded (which makes sense, since you never connect to a remote host instantly) and you turned it into a blocking connect().

The problem with a blocking connect() is that hexchat will block until it connects. Imagine this on the server side, where the hexchat bot gets 100 "connect me!" messages a second and blocks on every connect.

You will probably need to introduce some kind of asynchronous networking there. You want to do a non-blocking connect() and run add_client_socket() only when it's completed. Are you familiar with any asynchronous Python networking libraries (like asyncore or twisted or something)?
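One way this could look, sketched with Python's asyncio instead of asyncore or Twisted (asyncore is deprecated and was removed in Python 3.12). Each "connect me!" request starts its own task, and the callback that plays the role of add_client_socket() only runs once the connect has actually completed. The Connector class, its callbacks and the 10-second timeout are hypothetical.

```python
import asyncio
from typing import Any, Callable, Set, Tuple

Addr = Tuple[str, int]

class Connector:
    """Hypothetical sketch: serve many "connect me!" requests without blocking."""

    def __init__(self, on_connected: Callable[..., None],
                 on_failed: Callable[[Any, Exception], None], timeout: float = 10.0):
        self.on_connected = on_connected  # plays the role of add_client_socket()
        self.on_failed = on_failed        # e.g. tell the client its connect failed
        self.timeout = timeout
        self.pending: Set[asyncio.Task] = set()

    def request(self, route_key: Any, target: Addr) -> None:
        """Called from the XMPP handler; returns immediately (needs a running loop)."""
        task = asyncio.create_task(self._connect(route_key, target))
        self.pending.add(task)                       # keep a reference until done
        task.add_done_callback(self.pending.discard)

    async def _connect(self, route_key: Any, target: Addr) -> None:
        try:
            reader, writer = await asyncio.wait_for(
                asyncio.open_connection(*target), self.timeout)
        except (OSError, asyncio.TimeoutError) as exc:
            self.on_failed(route_key, exc)           # e.g. connection refused
            return
        self.on_connected(route_key, reader, writer)
```

The design choice is that a slow or refused connect only delays its own task; the XMPP handler keeps accepting new requests in the meantime.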

comment:18 Changed 6 years ago by asn

Since you prefer to not do it the SOCKS way, and instead use the address of hexchat as Bridge, we might not even need the managed-proxy interface and pyptlib.

Specifically, if hexchat is an application with the following CLI:
hexchat-client <listenaddr> <xmpp_server> <jid/password>
and
hexchat-server <pushaddr> <xmpp_server> <jid/password>
we might be able to deploy this without the managed-proxy interface.

On the client-side, we do the dummy Bridge/ClientTransportPlugin trick. On the server-side, we just fire up hexchat-server and point it to the ORPort of our bridge without even informing Tor about it.

In the future, if we want the managed-proxy interface, we can add pyptlib support.
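A sketch of that CLI with argparse, assuming addresses are given as IP:PORT and that <jid/password> means two separate positional arguments (a full JID can itself contain a '/'). Everything beyond the argument names in the comment above is an assumption.

```python
import argparse
from typing import List, Optional, Tuple

def host_port(text: str) -> Tuple[str, int]:
    """argparse type: 'IP:PORT' -> (host, port)."""
    host, sep, port = text.rpartition(":")
    if not (sep and host and port.isascii() and port.isdigit()):
        raise argparse.ArgumentTypeError(f"expected IP:PORT, got {text!r}")
    return host, int(port)

def parse_args(mode: str, argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Hypothetical CLI for hexchat-client / hexchat-server as proposed above."""
    parser = argparse.ArgumentParser(prog=f"hexchat-{mode}")
    if mode == "client":
        # Local address that Tor's Bridge line points at.
        parser.add_argument("listenaddr", type=host_port)
    else:
        # Where decapsulated traffic is pushed, e.g. the bridge's ORPort.
        parser.add_argument("pushaddr", type=host_port)
    parser.add_argument("xmpp_server", type=host_port)
    parser.add_argument("jid")
    parser.add_argument("password")
    return parser.parse_args(argv)
```

One caveat: a password passed on the command line is visible to other local users (e.g. via ps), so reading it from a file or an environment variable may be preferable.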

comment:19 Changed 6 years ago by asn

(Also, can you clean up your repo so that the correct hexchat.py is obvious to the casual observer? Maybe you can put the secure version in a misc/ repository (or even better in a different git branch). Also, I guess we can remove the pluggable-transports directory till we implement correct pyptlib support.)

comment:20 Changed 6 years ago by asn

Also, check my branch docs_and_refactoring_2 for some more code cleanups.

Some more code comments:

  • Why do you "resend the message" on error in get_message()? Is that what you are supposed to do in XMPP?
  • b64decode can throw an exception (triggered remotely by sending a wrongly formatted base64 chunk). We should catch that exception, and also check for other uncaught exceptions.
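A small sketch of defensive parsing for data arriving from the other bot, which also covers the earlier rfind() point. Note that str.find() and str.rfind() return -1 rather than raising when the separator is missing, so the return value itself must be checked. base64.b64decode(..., validate=True) rejects characters outside the base64 alphabet; the binascii.Error it raises is a subclass of ValueError. The '_' special case mirrors the "send '_' for empty data" convention mentioned earlier; the function names are hypothetical.

```python
import base64
from typing import Optional, Tuple

Addr = Tuple[str, int]

def parse_addr(text: str) -> Optional[Addr]:
    """Parse an untrusted 'ip:port' tag; return None instead of raising."""
    sep = text.rfind(":")          # -1 if there is no ':' at all
    if sep <= 0:                   # no separator, or an empty host
        return None
    port = text[sep + 1:]
    if not (port.isascii() and port.isdigit()) or not 0 < int(port) < 65536:
        return None
    return text[:sep], int(port)

def decode_payload(body: str) -> Optional[bytes]:
    """Decode a base64 chunk from a remote peer; None if it is malformed."""
    if body == "_":                # "'_' means empty data" convention
        return b""
    try:
        return base64.b64decode(body, validate=True)
    except ValueError:             # binascii.Error, or non-ASCII characters in a str
        return None
```

A caller can then drop (and perhaps log) any message for which either helper returns None, instead of letting a remote peer crash the bot with one bad stanza.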

comment:21 Changed 6 years ago by feynman

As of about 12 hours ago, I made an unfortunate discovery. Gtalk was not transmitting my messages most of the time--especially while watching youtube videos. Instead, it was bouncing the message with an error code. Hexchat thought the error message was the response and worked with it as though it came from the other party. I am very sorry for the confusion, and I am quite disappointed at this point.

It seems that Gtalk will only deliver so many messages in a given period of time. I tried other chat servers, and they are much slower. I can NOT watch youtube videos with hexchat.

Though I seem to be able to access facebook.

I will post any updates as they come.

comment:22 in reply to:  21 ; Changed 6 years ago by xnyhps

Replying to feynman:

Is there a reason you're passing all traffic through <message> stanzas? Many servers will throttle those to avoid spam; <iq> stanzas are a lot more likely to work well. You could look at XEP-0047: In-Band Bytestreams for how this can be done.

In fact, you might be able to use that specification for the actual content data (leaving the signaling to <message>s or other <iq>s), that would hide the data making it look like ordinary file transfers.

comment:23 in reply to:  22 ; Changed 6 years ago by feynman

Replying to xnyhps:

This sounds like a good idea. I looked into <iq>s, but it seems they do not come with enough text fields (although I could be mistaken). I need four text fields to send a message:

  • One for the client ip:port
  • One for the server ip:port
  • One for the actual data
  • The JID of the computer that sent the message

Unless all four fields can be stuck somewhere in an iq message, this route will not work. I could be wrong, and maybe some hacks would get around it, but at first glance this looks like a dead end.

comment:24 in reply to:  23 Changed 6 years ago by xnyhps

Replying to feynman:

<iq>s can carry arbitrary XML, which servers will route to the client you're addressing. It doesn't need to follow an already defined protocol or extension.

You just have to keep the following in mind:

  1. They must contain a single child element (which might contain further children), which should be in some custom XML namespace.
  2. Everything must be valid UTF-8.
  3. There's a size limit in stanzas.

So you could define your own protocol where you send an <iq> like:

<iq type="set" to="pluggabletransport@jabber.org/Hex" id="1234">
    <initiate xmlns="https://www.torproject.org/transport/xmpp">
        <host>www.google.com</host>
        <port>443</port>
    </initiate>
</iq>

and the transport replies:

<iq type="result" from="pluggabletransport@jabber.org/Hex" id="1234">
   <success sid="abcd567" xmlns="https://www.torproject.org/transport/xmpp" /> 
</iq>

which the client uses to open an IBB connection:

<iq id="1235" to="pluggabletransport@jabber.org/Hex" type="set">
    <open xmlns="http://jabber.org/protocol/ibb" block-size="4096" sid="abcd567" stanza="iq" />
</iq>

I haven't read the code for all the details of the information you need to exchange, but in principle you can stick whatever you want in those <iq>s. :)
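For illustration, the stanzas above can be generated with Python's standard xml.etree.ElementTree, which also makes it easy to enforce the three rules: one namespaced child, UTF-8 text, and a size cap. The 64 KiB MAX_STANZA_BYTES value is an assumed placeholder, since actual limits depend on the server, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

TOR_NS = "https://www.torproject.org/transport/xmpp"  # custom namespace from the example
IBB_NS = "http://jabber.org/protocol/ibb"            # XEP-0047 In-Band Bytestreams
MAX_STANZA_BYTES = 65536  # assumption: the real limit is server-specific

def make_iq(iq_type: str, to: str, iq_id: str, child: ET.Element) -> str:
    """Wrap exactly one namespaced child element in an <iq> and check its size."""
    stanza = ET.Element("iq", {"type": iq_type, "to": to, "id": iq_id})
    stanza.append(child)
    # ElementTree escapes & and < but does not reject control characters,
    # which XML forbids; base64-encoding payload data sidesteps that.
    text = ET.tostring(stanza, encoding="unicode")
    if len(text.encode("utf-8")) > MAX_STANZA_BYTES:
        raise ValueError("stanza too large; split the data into smaller chunks")
    return text

def initiate(to: str, iq_id: str, host: str, port: int) -> str:
    child = ET.Element("initiate", {"xmlns": TOR_NS})
    ET.SubElement(child, "host").text = host
    ET.SubElement(child, "port").text = str(port)
    return make_iq("set", to, iq_id, child)

def ibb_open(to: str, iq_id: str, sid: str, block_size: int = 4096) -> str:
    child = ET.Element("open", {"xmlns": IBB_NS, "block-size": str(block_size),
                                "sid": sid, "stanza": "iq"})
    return make_iq("set", to, iq_id, child)
```

For example, `initiate("pluggabletransport@jabber.org/Hex", "1234", "www.google.com", 443)` produces the first <iq> shown above, serialized on one line.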

comment:25 Changed 6 years ago by feynman

I think I got an iq method worked out. I just need to figure out how to register the protocol so gtalk will not return a "feature-not-implemented" error.

The code will need cleaning up, but all in all, this will be an improvement on the old method, if not for speed then for more robust code.

comment:26 Changed 6 years ago by xnyhps

I assume you mean the other contact is returning "feature-not-implemented"?

If you use a custom iq-class in Sleek:

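from sleekxmpp import Iq
from sleekxmpp.xmlstream import ElementBase, register_stanza_plugin
from sleekxmpp.xmlstream.handler import Callback
from sleekxmpp.xmlstream.matcher import StanzaPath

class Initiate(ElementBase):
    name = 'initiate'
    namespace = 'https://www.torproject.org/transport/xmpp'
    plugin_attrib = 'tor_initiate'
    interfaces = set(('host', 'port'))
    sub_interfaces = interfaces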

And call:

register_stanza_plugin(Iq, Initiate)

Then you can use:

self.register_handler(Callback('Tor XMPP Transport Handler', StanzaPath('iq@type=set/tor_initiate'), self.handle_transport))

To register the self.handle_transport callback to be called every time a message matching the class comes in.

If you use the iq-stanza format I proposed, then you can access the fields with stanza['tor_initiate']['host'] and stanza['tor_initiate']['port'].

comment:27 in reply to:  26 Changed 6 years ago by feynman

Replying to xnyhps:

Unfortunately, I am beginning to think that the chat server is sending the error message. I consistently get the same error messages whether the other hexchat bot is logged in as the recipient or not. It would appear as though the server does not like custom IQs.

If you have the time, could you confirm that you are able to send custom IQs with sleekxmpp? If you are not willing or able, that is fine, but an example of working code would really help here.

comment:28 Changed 6 years ago by xnyhps

I experimented a bit with your code last night to see if my idea could work and committed that here: https://github.com/xnyhps/hexchat/commit/07cb3a192c7d24fa19b1eec33741c39d948562bd. Setting up the connection works with it, but handling closed sockets/streams properly is still unfinished.

I changed a couple of things: I wasn't sure why the "local address" is communicated to the host, so I left it out. It's up to you if you want to use my changes, or just look at them for inspiration. :)

comment:29 Changed 6 years ago by xnyhps

Oh, almost forgot. About the "feature-not-implemented", are you addressing the <iq>s to the full JID of the contact? So pluggabletransport@…/Hexchat, not just pluggabletransport@…. <iq>s don't get forwarded the same way as messages are.

comment:30 Changed 6 years ago by feynman

I got the protocol working with IQs and uploaded the code here:

https://gitweb.torproject.org/user/asn/hexchat.git

Some comments:

  • The protocol is quite different now and I need to update the protocol-spec in the "doc" directory to reflect this.
  • I used custom IQ stanzas rather than a stream (which is after all just a bunch of custom IQ stanzas).
  • The code is poorly commented at this point. I need to fix that, but for now, I thought it was important to keep everyone updated on progress.
  • I still cannot watch youtube videos, and the software has a tendency to randomly start refusing connections. However, when it is working, it is reasonably fast.
  • Sometimes messages are still dropped. I tried buffers, delivery confirmation messages and locks to try to fix this. None of those techniques worked. Please let me know if you can think of any new ways of ensuring messages are delivered quickly and in the right order.
  • I encourage others to test the code themselves and let me know whether you can think of any ways of improving it.

comment:31 Changed 6 years ago by feynman

I can watch about 10 seconds of a youtube video before something gets messed up. I want to note that the video does not seem to stop loading due to lack of speed. Rather, hexchat bots are sending disconnect requests and then receiving data. The data is of course dropped, since the socket has already disconnected. Unlike with connecting, I cannot wait for a confirmation of the disconnect, because by the time a computer has sent a disconnect request through a chat server, the socket it would write to has already closed. This is an inherent flaw in trying to forward data from a connection-oriented socket. The good news is that proxies manage to do it all the time--so it is possible (perhaps when the connection through the proxy--in this case a chat server--is fast enough).

Anyway, I am not sure how, or even whether it is possible to fix this problem with the disconnecting process, but I am doing everything I can to get youtube to work here.
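One standard way around this race, sketched here as an assumption rather than something hexchat is described as doing, is to give every data and disconnect message a per-connection sequence number and to act on a disconnect only after all earlier data has been written. A chunk that arrives after the disconnect request is then delivered instead of dropped. The class and message kinds are hypothetical.

```python
from typing import Callable, Dict, Tuple

class InOrderReceiver:
    """Apply one tunnelled connection's messages strictly in sequence order."""

    def __init__(self, write: Callable[[bytes], None], close: Callable[[], None]):
        self._write = write          # writes to the local socket
        self._close = close          # closes the local socket
        self._next = 0               # next sequence number we may apply
        self._pending: Dict[int, Tuple[str, bytes]] = {}
        self.closed = False

    def receive(self, seq: int, kind: str, payload: bytes = b"") -> None:
        if self.closed or seq < self._next:
            return                   # duplicate, or arrived after the close
        self._pending[seq] = (kind, payload)
        while self._next in self._pending:
            item_kind, item_payload = self._pending.pop(self._next)
            self._next += 1
            if item_kind == "data":
                self._write(item_payload)
            else:                    # "disconnect": every earlier chunk is written
                self.closed = True
                self._pending.clear()
                self._close()
                return
```

The cost is one small integer per message and a buffer for out-of-order chunks; the benefit is that the disconnect can no longer overtake the data it follows.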

comment:32 Changed 6 years ago by feynman

I am definitely getting closer. I found that gtalk drops IQs when you send too many to a given person (or possibly group of people) too quickly. I added code that saves data received from a socket into a buffer and sends the data out in large chunks every second. This gave me much better results, but google still seems to start dropping IQs somewhere around 1.5 minutes into a video at 240p. The situation only gets worse with higher quality videos. This might be because the bandwidth of gtalk for xmpp messages is inherently slower than the rate at which youtube sends data when streaming videos higher than 240p.
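A minimal sketch of the buffer-and-flush approach described above: reads from the local socket are appended to a buffer, and a one-second timer sends whatever has accumulated as a few large chunks instead of many small stanzas. The 4096-byte cap, the send callback and the names are assumptions.

```python
import asyncio
from typing import Callable

class BatchingBuffer:
    """Coalesce small socket reads into a few large XMPP stanzas."""

    def __init__(self, send: Callable[[bytes], None], max_chunk: int = 4096):
        self._send = send            # e.g. wraps bytes into one <iq> and sends it
        self._max_chunk = max_chunk  # assumed cap so each stanza stays small enough
        self._buf = bytearray()

    def append(self, data: bytes) -> None:
        self._buf += data            # called for every recv() on the local socket

    def flush(self) -> int:
        """Send everything buffered so far; return the number of stanzas sent."""
        sent = 0
        while self._buf:
            chunk = bytes(self._buf[: self._max_chunk])
            del self._buf[: self._max_chunk]
            self._send(chunk)
            sent += 1
        return sent

async def flush_periodically(buf: BatchingBuffer, interval: float = 1.0) -> None:
    """Run alongside the socket readers, e.g. via asyncio.create_task()."""
    while True:
        await asyncio.sleep(interval)
        buf.flush()
```

The trade-off is latency for stanza count: each byte waits up to one interval, but the server sees far fewer stanzas per second.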

I tried using zlib to compress the data before base 64 encoding it and sending it over the chatline to see if my messages were too long, but this did not seem to help.
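A likely reason compression did not help: the traffic between a Tor client and a bridge is already TLS-encrypted, and encrypted bytes look random to zlib, so they do not shrink, while base64 always adds about a third (4 output characters for every 3 input bytes). The sketch below, with os.urandom() standing in for encrypted Tor traffic, makes that easy to check.

```python
import base64
import os
import zlib

def wire_size(payload: bytes, compress: bool) -> int:
    """Length of the base64 text that would be put into a stanza for this payload."""
    data = zlib.compress(payload, 9) if compress else payload
    return len(base64.b64encode(data))

if __name__ == "__main__":
    noise = os.urandom(3000)            # stand-in for TLS-encrypted Tor traffic
    text = b"GET / HTTP/1.1\r\n" * 200  # highly redundant plaintext, for contrast
    print("encrypted-like:", wire_size(noise, False), "->", wire_size(noise, True))
    print("plaintext:     ", wire_size(text, False), "->", wire_size(text, True))
```

On random input, the compressed size comes out slightly larger than the uncompressed one, whereas the repetitive plaintext shrinks dramatically.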

More testing is necessary.

comment:33 Changed 6 years ago by feynman

I introduced caching and garbage collection into the protocol. Now hexchat will throttle, cache, and empty caches when too much data is stored. This is still not enough to consistently watch youtube videos, but it makes the whole system more consistent in its performance and it does a much better job of delivering data--at least when Google is not dropping too many IQ packets.

I am trying to think of other ways of dealing with lots of dropped packets. I have delivery confirmation in place, and I might implement a timer that disconnects a socket when it goes a certain amount of time without receiving confirmation that its packets are being delivered.

comment:34 Changed 6 years ago by feynman

I added code that measures the time elapsed between sending data and receiving an acknowledgement.
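One simple way to measure that send-to-acknowledgement delay is to remember a timestamp per outstanding message id and pop it when the ack arrives. The sketch below assumes ids are unique among outstanding messages; the class and method names are hypothetical, and the injectable clock is only there to make it testable.

```python
import time


class AckTimer:
    """Record when each message id was sent and report the elapsed time
    when its acknowledgement arrives (hypothetical helper)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._sent_at = {}

    def sent(self, msg_id):
        self._sent_at[msg_id] = self._clock()

    def acked(self, msg_id):
        # Elapsed seconds, or None for an unknown or duplicate ack.
        start = self._sent_at.pop(msg_id, None)
        if start is None:
            return None
        return self._clock() - start
```

`time.monotonic` is used rather than `time.time` so that wall-clock adjustments cannot produce negative or inflated delays.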

Unfortunately, it seems that even with all the error handling I put in, I still cannot stream YouTube videos, even at low quality, without a lot of buffering. This is from a computer with about 350kb/s (max) internet access. When I run the tests from a computer with much faster internet access (1Mb/s max), I can stream low-quality YouTube videos. Unfortunately, I doubt most potential users of this software will have 1Mb/s internet access.

Furthermore, gmail chat seems to be the only server (of the three I tested) that provides a fast enough service to stream youtube videos.

Even with gmail chat, I occasionally receive a "too many bytes sent per hour" error, which kicks me off my account for a while (I am guessing an hour, but I have not measured). I am already compressing my data with zlib at its highest compression level before base64 encoding it and sending it over the chat server.
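For reference, the compress-then-encode step looks like this in Python's standard library. The function names are hypothetical. Level 9 is zlib's highest compression level, and base64 is needed because stanza contents must be XML text. Note that base64 expands its input by about 4/3, and that data which is already encrypted (as Tor's TLS traffic is) will generally not compress, so for such traffic this step mostly adds overhead.

```python
import base64
import zlib


def encode_payload(raw: bytes) -> str:
    # Compress at the highest level (9), then base64 so the result is
    # plain ASCII that can be embedded as character data in a stanza.
    return base64.b64encode(zlib.compress(raw, 9)).decode("ascii")


def decode_payload(text: str) -> bytes:
    # Reverse the two steps in the opposite order.
    return zlib.decompress(base64.b64decode(text))
```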

I am beginning to doubt that this will be a scalable and practical means of connecting to tor bridges. I will keep the ticket open for a while and certainly update it if I make any breakthroughs--though for now, I am out of ideas.

As a final note, I would like to mention that my protocol currently has enough error handling that it might be a suitable starting point for tunneling TCP over UDP. If that would be useful to Tor, please let me know and/or make a new ticket.

comment:35 Changed 6 years ago by feynman

I added a new feature in which you can use multiple accounts to send and receive data. In doing so, I discovered that when using gmail, you can only send and receive data to other users on your contact list. I tried to work around this by setting up a multiuser chat (MUC), though it did not seem to work.

Also, webpages seem to load less reliably when using more than one account. I have no idea why.

comment:36 in reply to:  35 ; Changed 6 years ago by rransom

Replying to feynman:

Also, webpages seem to load less reliably when using more than one account. I have no idea why.

Are messages/packets being reordered?

comment:37 in reply to:  36 Changed 6 years ago by feynman

Replying to rransom:

Replying to feynman:

Also, webpages seem to load less reliably when using more than one account. I have no idea why.

Are messages/packets being reordered?

Even if they were, I have a system in place that should take that into account. I have each message marked with a sequential identifier, and each computer acknowledges messages based on the identifier they receive. They also keep track of the identifier of the last message they receive so they know what parts of the message (if any) contain redundant information (the message may be partially or entirely composed of caches, but there is a stanza that indicates how many bytes each cache takes up).

In theory, this *should* all compensate for data coming out of order.

comment:38 Changed 6 years ago by asn

Hey feynman,

thanks for all the new features, and sorry for being less active on this lately.

BTW, due to the encryption of TLS, I'm not sure how helpful the caching is, since all TLS records should look unique on the wire. For the same reason, zlib might not find much stuff to compress in your TLS traffic.

Also, could you document your TCP-like functionality in the spec? That is, how you calculate sequence identifiers and do ACKs, etc.

comment:39 Changed 6 years ago by asn

Also, where did the file transfer idea go? Does inbound file transfer (the one where files go through the server) work in Google's XMPP servers?

comment:40 Changed 6 years ago by asn

Also, check out this weird proposal that just hit the XMPP standards mailing list:
http://mail.jabber.org/pipermail/standards/2013-June/027690.html

It's probably not relevant to the transport, but might give you some nice ideas.

comment:41 in reply to:  38 ; Changed 6 years ago by feynman

Replying to asn:

Hey feynman,

thanks for all the new features, and sorry for being less active on this lately.

BTW, due to the encryption of TLS, I'm not sure how helpful the caching is, since all TLS records should look unique on the wire. For the same reason, zlib might not find much stuff to compress in your TLS traffic.

TLS encryption should be completely independent of caching. It is not caching the TLS packet, but the data it sends *before* it gets encrypted with TLS. The same goes for the zlib compression stuff.

Also, could you document your TCP-like functionality in the spec? That is, how you calculate sequence identifiers and do ACKs, etc.

I will document all this functionality ASAP (probably over the next couple of days). For now, here is a rundown of what happens:

  1. There is data to be read from the socket.
    a. Data is read from a socket and added to a buffer, which is periodically checked.
    b. When data is found in the buffer or the cache, the buffered data is added to the cached data, the length of the buffered data (if greater than zero) is appended to a separate list of cache lengths, and the current time is appended to a list of timestamps.
    c. All cached data is compressed, base64 encoded, and put in a "data" stanza.
    d. The lengths of the caches are listed, comma separated, in a "chunks" stanza.
    e. Local and remote IPs and ports are set in their respective stanzas.
    f. A comma-separated list of all the accounts that the computer controls and that are connected to the chat server is set in an "aliases" stanza.
    g. The socket's id variable is incremented by one (mod sys.maxsize).
    h. The IQ message's id is set to the socket's id variable.
    i. The above stanzas are appended to the IQ message in a "packet" stanza.
    j. The recipient of the message is selected from a list of potential addresses given during the connection phase (not described here).
    k. The sender of the message is selected from a list of accounts connected to the chat server.
    l. The message is sent over the chat server.
  2. A message containing data is received.
    a. The computer computes id_diff = "id in the message" - "last id received with the same local and remote IPs and ports and set of aliases".
    b. If id_diff <= 0 and id_diff >= -"peer's sys.maxsize"/2 (the latter quantity is established during the connection phase), then the message is declared redundant and a confirmation is sent for the id containing the most recent data (i.e. *not* the id of the message that was just received).
    c. If the message is not completely redundant, take id_diff mod "peer's sys.maxsize" to get the number of new chunks of data.
    d. Compute the number of bytes of data to ignore from the number of new chunks of data computed in (c) and the list of chunk sizes in the "chunks" stanza.
    e. Unzip the data, discarding the number of bytes computed in (d).
    f. Set the socket's "last id received" to the id of the current message and send a confirmation.
    g. Send the data to the socket.
  3. A confirmation of data is received.
    a. Compute the difference between the id of the message acknowledged and the appropriate socket's current id variable, storing the result as id_diff.
    b. Take the result of (a) mod sys.maxsize.
    c. Subtract the result of (b) from the number of caches stored.
    d. If the result of (c) is positive, move on to (e).
    e. Set the new throttle rate (the period the socket waits before checking its buffer) to a complicated function, "F", of the difference between the current timestamp and the timestamp recorded "result of (c) - 1" records ago. F rescales the throttle rate so that it never goes above a maximum throttle rate divided by the number of accounts connected to the chat server (so each account never sends messages slower than a certain rate) and never goes below a minimum throttle rate divided by the number of accounts connected to the chat server (so each account never sends messages faster than a certain rate).
    f. The rate at which the socket reads data is adjusted based on the new throttle rate so that garbage collection need not happen for a certain minimum amount of time. This minimum amount of time is computed from the new throttle rate, together with a global constant "MAXIMUM_DATA", which contains the number of bytes that can be safely sent over the chat server, and another global constant "NUM_CACHES", which contains the minimum number of times the system should cache data before the cache size reaches MAXIMUM_DATA (and garbage collection takes place).
    g. The appropriate number of caches are cleared along with their recorded data lengths and timestamps (see 1b).
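The receive path (step 2) is the subtle part. Below is a minimal Python sketch of 2(a) through 2(f) for a single socket, under these assumptions: the "data" stanza has already been base64-decoded and decompressed, ids lie in [0, modulus), and if more chunks appear to be new than were actually sent, all of them are treated as new (the outline does not say what happens in that case). The function name and argument layout are hypothetical, not hexchat's code.

```python
def accept_data_message(last_id, msg_id, chunk_sizes, payload, modulus):
    """Sketch of step 2 (a)-(f) for one socket.

    last_id     -- id of the last message accepted on this socket
    msg_id      -- id of the message that just arrived
    chunk_sizes -- integers parsed from the "chunks" stanza, oldest first
    payload     -- the already-decompressed "data" stanza contents
    modulus     -- the peer's id modulus (sys.maxsize in the outline)

    Returns (new_bytes, new_last_id); new_last_id is the id to confirm.
    """
    # (a) How far ahead of the last accepted message is this one?
    id_diff = msg_id - last_id
    # (b) Nothing new: re-confirm the most recent id we already hold.
    if -(modulus // 2) <= id_diff <= 0:
        return b"", last_id
    # (c) Python's % always returns a non-negative result for a positive
    # modulus, so a wrapped id (small msg_id after a large last_id) works.
    new_chunks = min(id_diff % modulus, len(chunk_sizes))
    # (d) Every cache before the last `new_chunks` ones was already seen.
    skip = sum(chunk_sizes[:len(chunk_sizes) - new_chunks])
    # (e)/(f) Keep only the fresh bytes and advance the last accepted id.
    return payload[skip:], msg_id
```

For example, with last_id 5, an incoming id 6, chunk sizes [3, 2, 4] and payload b"abcdeFGHI", only the final 4-byte chunk is new, so the call yields b"FGHI" and 6 becomes the id to confirm.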

I know that I could use a global constant as the id modulus rather than sys.maxsize (which varies from one architecture to another), but getting the system to run quickly and efficiently is more important at the moment. In the meantime, consider this an outline of the full protocol spec to come.

comment:42 in reply to:  41 ; Changed 6 years ago by rransom

Replying to feynman:

Replying to asn:

Hey feynman,

thanks for all the new features, and sorry for being less active on this lately.

BTW, due to the encryption of TLS, I'm not sure how helpful the caching is, since all TLS records should look unique on the wire. For the same reason, zlib might not find much stuff to compress in your TLS traffic.

TLS encryption should be completely independent of caching. It is not caching the TLS packet, but the data it sends *before* it gets encrypted with TLS. The same goes for the zlib compression stuff.

Tor connections are encrypted (and authenticated) using TLS before they reach your XMPP transport.

comment:43 in reply to:  42 Changed 6 years ago by feynman

Replying to rransom:

Replying to feynman:

Replying to asn:

Hey feynman,

thanks for all the new features, and sorry for being less active on this lately.

BTW, due to the encryption of TLS, I'm not sure how helpful the caching is, since all TLS records should look unique on the wire. For the same reason, zlib might not find much stuff to compress in your TLS traffic.

TLS encryption should be completely independent of caching. It is not caching the TLS packet, but the data it sends *before* it gets encrypted with TLS. The same goes for the zlib compression stuff.

Tor connections are encrypted (and authenticated) using TLS before they reach your XMPP transport.

That would imply the zlib compression would be quite useless when relaying Tor traffic, but the caching scheme should work all the same. The whole XMPP transport does no analysis on what it is reading. It simply passes on the data byte for byte. The caching scheme combined with id numbers for packets should help ensure chunks of data get to the proper destination consistently and in the right order.

Whatever Tor does when it reads and writes to a TCP socket should work independently from the mechanism that actually delivers the data to its destination. My understanding is that the packet would ordinarily be encoded with an IP header and sent directly through a gateway to the internet.

When using hexchat, the data is sent to a local TCP socket running hexchat (call it hexchat1). Hexchat1 then reads the data (thereby stripping it of its TCP header) and passes it over a chat server to another hexchat program (call it hexchat2) that sends the data to the appropriate ip:port (giving it a new TCP header in the process).

The client thinks it is sending the data to hexchat1, and the server thinks it is receiving data from hexchat2, but the data itself is never changed. It might be broken into smaller chunks or combined into bigger chunks, and it might be delivered at unpredictable rates, but it is never altered.

That at least is how this should work in principle.

comment:44 Changed 6 years ago by feynman

I updated the protocol spec here: https://raw.github.com/aeftimia/hexchat/master/doc/protocol-spec.txt

There is still work to be done.

JIDs are often given random strings for their so-called "resources" (or if a resource is requested, a random string is often appended to it). To send an IQ, one must know the recipient's resource. This is great for security, but bad for this particular application. To get around this, I use a message (which can be sent without a resource) to send a connection request to a JID with an unknown resource. When the recipient responds, thus disclosing their resource, their full JID (including the resource) is added to a table that keeps track of JIDs and resources.

The problem is that if one of the computers disconnects and reconnects, it acquires a new resource and there is no way (currently) for the other computer to update its table.
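The JID-to-resource table described above might look something like the following. This sketch is an assumption, not hexchat's implementation: it keys entries by bare JID and lets any later stanza from the same bare JID overwrite the stored full JID, which is one possible partial answer to the reconnect problem (it only helps once the reconnected peer sends something).

```python
class ResourceTable:
    """Map bare JIDs (user@host) to the full JID (user@host/resource)
    most recently seen from that peer (hypothetical helper)."""

    def __init__(self):
        self._full = {}

    @staticmethod
    def bare(jid):
        # Everything before the first "/" is the bare JID.
        return jid.split("/", 1)[0]

    def learn(self, full_jid):
        # Only full JIDs carry a resource worth remembering; a newer
        # resource for the same bare JID replaces the old one.
        if "/" in full_jid:
            self._full[self.bare(full_jid)] = full_jid

    def iq_target(self, jid):
        # IQs need a full JID; None means "send a message first to discover it".
        return self._full.get(self.bare(jid))
```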

Another problem is that messages that have no resource specified can only be sent to people on your contact list. Thus, I may have to carry on with the multi-user chat scheme and devise a secure way of acquiring the target's resource by first sending a message to everyone in the chat room. The obvious way of handling this would be to use asymmetric encryption to send initial connection messages in an encrypted form to everyone in the chat room, then have the recipient decrypt it and respond via IQ.

However, before I continue with this, I would like some feedback concerning the practicality of the protocol thus far. Here are some questions I want to consider:

Is the protocol lacking anything that has not been mentioned?
Is it too complicated?
Is the program still too slow to be useful?

comment:45 in reply to:  44 ; Changed 6 years ago by xnyhps

Replying to feynman:

JIDs are often given random strings for their so-called "resources" (or if a resource is requested, a random string is often appended to it).

(I just want to point out that this is pretty uncommon for XMPP servers except GTalk. Most normal XMPP servers just give you the resource you request.)

To send an IQ, one must know the recipient's resource. This is great for security, but bad for this particular application. To get around this, I use a message (which can be sent without a resource) to send a connection request to a JID with an unknown resource. When the recipient responds, thus disclosing their resource, their full JID (including the resource) is added to a table that keeps track of JIDs and resources.

The problem is that if one of the computers disconnects and reconnects, it acquires a new resource and there is no way (currently) for the other computer to update its table.

Another problem is that messages that have no resource specified can only be sent to people on your contact list.

This also sounds like a limitation set by GTalk.

Why do you want to avoid needing to have someone on your contact list to use this? If you want to properly exchange messages/iqs with someone, it helps to be able to know on which resources they are online. This should also make it much easier to automatically handle the case where the other side disconnected and reconnected on a different resource.

If you're worried about privacy... I don't really see why you would authorize someone to use your connection as a proxy to the internet when you don't want them to know when you're online. It sounds fair to inform them when you're available to proxy a connection for you.

comment:46 in reply to:  45 Changed 6 years ago by feynman

Replying to xnyhps:

Replying to feynman:

JIDs are often given random strings for their so-called "resources" (or if a resource is requested, a random string is often appended to it).

(I just want to point out that this is pretty uncommon for XMPP servers except GTalk. Most normal XMPP servers just give you the resource you request.)

To send an IQ, one must know the recipient's resource. This is great for security, but bad for this particular application. To get around this, I use a message (which can be sent without a resource) to send a connection request to a JID with an unknown resource. When the recipient responds, thus disclosing their resource, their full JID (including the resource) is added to a table that keeps track of JIDs and resources.

The problem is that if one of the computers disconnects and reconnects, it acquires a new resource and there is no way (currently) for the other computer to update its table.

Another problem is that messages that have no resource specified can only be sent to people on your contact list.

This also sounds like a limitation set by GTalk.

Why do you want to avoid needing to have someone on your contact list to use this? If you want to properly exchange messages/iqs with someone, it helps to be able to know on which resources they are online. This should also make it much easier to automatically handle the case where the other side disconnected and reconnected on a different resource.

If you're worried about privacy... I don't really see why you would authorize someone to use your connection as a proxy to the internet when you don't want them to know when you're online. It sounds fair to inform them when you're available to proxy a connection for you.

My main concern is not really for the sake of the user so much as for the person running the proxy service. I figured that people who run proxy services are not going to want to constantly log in to their chat accounts and accept strangers' requests to be added to their contact list. I do not think that would be a very scalable approach.

I imagined that this would work in a more automated fashion like other Tor plugins. Take, for example, obfsproxy. You do not need to give someone permission to connect to your IP address for obfsproxy to work. The user simply plugs the ip:port into Tor, and Tor connects. I think having to ask people to add you to their contact lists would discourage users from trying the software, and discourage people who manage proxies from running the service. It is just too much maintenance.

In case there was any doubt, I want to assert that I think that using your usual chat accounts to run proxy services is a bad idea. Your chat accounts are not only a piece of identifying information, they are an easy form of contact information--especially if you are using an email (like in the case of GTalk). That just sounds like a bad idea from the start.

comment:47 in reply to:  39 Changed 6 years ago by feynman

Replying to asn:

Also, where did the file transfer idea go? Does inbound file transfer (the one where files go through the server) work in Google's XMPP servers?

I looked into the file transfer protocol and it seemed easier to make my own protocol than try to sneak all the parameters (port numbers, ip addresses, etc) into fields of an existing one. As far as I can tell, the chat server does not treat the file transfer protocol any differently than any other xml protocol, and it is really up to the users of the file transfer protocol to manage the actual exchange of data.

I looked into what the file transfer protocol can do with regards to making sure data gets to the client and in the right order. It would seem I already have the same system of safeguards integrated into the hexchat protocol. I also have new things like dynamic throttling, dynamic rates at which sockets are read, and caching.

comment:48 Changed 6 years ago by feynman

I recently discovered that the caching and delivery confirmation were doing more harm than good. I think they were simply using too much bandwidth. It seems that by spawning a new thread for closing a socket and acquiring a lock that blocks the reading of other sockets, I could greatly improve the speed. It is still far from ideal, but I can usually get through a couple of minutes of low quality youtube videos at this point (even with a very slow internet connection).

The code and protocol specs are updated. The old code is stored in the misc directory of the git repository.

Unfortunately, using more than one JID is still very unreliable. I am beginning to think that rransom was on the right track in thinking that the messages were getting reordered--especially since I am no longer verifying anything with IDs. Youtube pages load when using more than one JID, but the video itself never plays (despite the loading bar swiftly moving across the screen).

I hope to find other ways to make the program faster.

comment:49 in reply to:  48 ; Changed 6 years ago by asn

Replying to feynman:

I recently discovered that the caching and delivery confirmation were doing more harm than good. I think they were simply using too much bandwidth. It seems that by spawning a new thread for closing a socket and acquiring a lock that blocks the reading of other sockets, I could greatly improve the speed. It is still far from ideal, but I can usually get through a couple of minutes of low quality youtube videos at this point (even with a very slow internet connection).

Ah. I see.

Have you also looked at whether compression actually helps the transport? It might just be wasting CPU cycles because of the TLS layer being encrypted.

The code and protocol specs are updated. The old code is stored in the misc directory of the git repository.

Unfortunately, using more than one JID is still very unreliable. I am beginning to think that rransom was on the right track in thinking that the messages were getting reordered--especially since I am no longer verifying anything with IDs. Youtube pages load when using more than one JID, but the video itself never plays (despite the loading bar swiftly moving across the screen).

Hm, I see.
This is not fun. A deployed hexchat would probably need to use different JIDs on the client and the server.

BTW, have you tried using hexchat with Tor? Does it work? Is that how you do testing?

Finally, the main problem with this transport seems to be Google rate-limiting their servers. I'm not sure what to do about this, and whether we can work around their throttling. After all, if they don't want hexchat to work on their servers, they can rate-limit them even more. Hm.

comment:50 in reply to:  49 Changed 6 years ago by feynman

Replying to asn:

Replying to feynman:

I recently discovered that the caching and delivery confirmation were doing more harm than good. I think they were simply using too much bandwidth. It seems that by spawning a new thread for closing a socket and acquiring a lock that blocks the reading of other sockets, I could greatly improve the speed. It is still far from ideal, but I can usually get through a couple of minutes of low quality youtube videos at this point (even with a very slow internet connection).

Ah. I see.

Have you also looked at whether compression actually helps the transport? It might just be wasting CPU cycles because of the TLS layer being encrypted.

You might be right. I can take out the compression. However, I do not think it will help the speed because I have it set to send out messages once a second. I do not think it takes more than a second to compress data.

The code and protocol specs are updated. The old code is stored in the misc directory of the git repository.

Unfortunately, using more than one JID is still very unreliable. I am beginning to think that rransom was on the right track in thinking that the messages were getting reordered--especially since I am no longer verifying anything with IDs. Youtube pages load when using more than one JID, but the video itself never plays (despite the loading bar swiftly moving across the screen).

Hm, I see.
This is not fun. A deployed hexchat would probably need to use different JIDs on the client and the server.

I might be able to distribute connections between different JIDs, but that will not guarantee that the load is evenly distributed between JIDs (since some connections could exchange more data than others).

I might be able to exchange data with multiple JIDs per connection by giving each message an incremental ID and letting the client order them appropriately. I suppose I should try this.

BTW, have you tried using hexchat with Tor? Does it work? Is that how you do testing?

I have not tried hexchat with Tor. I will have physical access to my desktop (which has Tor) tomorrow. I will begin testing with Tor then.

Finally, the main problem with this transport seems to be Google rate-limiting their servers. I'm not sure what to do about this, and whether we can work around their throttling. After all, if they don't want hexchat to work on their servers, they can rate-limit them even more. Hm.

As for Google rate-limiting their servers, they probably chose their current rate limit based on what they think their servers can handle (or at least what they want their servers to handle). As long as hexchat stays under their data limit, they should not have a problem with it (at least not because of the amount of data being exchanged). I calculate the maximum number of bytes a socket can read at once so that the buffer will not contain more than 2^17 bytes when hexchat sends a message. 2^17 bytes seems to be the maximum number of bytes I can send at once before Google disconnects me. If I leave the throttle rate at one second, that comes out to 131 kB/s, which is very slow but still bearable internet access.

comment:51 Changed 6 years ago by feynman

I added on a simple ID feature in which the server increments an id with each message it sends, and puts the id in its own stanza. This allows the client to determine where the message is in the sequence and store the message in a buffer if necessary. This also happens for disconnect requests. This way, the client will not receive the disconnect request until all the messages sent before it have been received.
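The ordering scheme just described amounts to a reorder buffer on the receiving side. Below is a minimal sketch, assuming ids start at a known value and increase by one per message on a connection, wrapping at a fixed modulus (comment:53 later mentions a global constant of 2^32-1). Because a disconnect request is just another id'd item, it cannot overtake data sent before it. The class name and the `deliver` callback are hypothetical.

```python
class InOrderDelivery:
    """Deliver items strictly in id order, holding early arrivals."""

    def __init__(self, deliver, modulus=2**32 - 1, first_id=0):
        self._deliver = deliver
        self._modulus = modulus
        self._expected = first_id
        self._pending = {}

    def receive(self, msg_id, item):
        # Ids behind the next expected id (within half the id space) were
        # already delivered: drop them rather than buffer them forever.
        ahead = (msg_id - self._expected) % self._modulus
        if ahead >= self._modulus // 2:
            return
        self._pending[msg_id] = item
        # Hand over every consecutive item we now hold.
        while self._expected in self._pending:
            self._deliver(self._pending.pop(self._expected))
            self._expected = (self._expected + 1) % self._modulus
```

A data message with id 1 that arrives before id 0 simply waits in `_pending`, and a disconnect request with id 2 is delivered only after both.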

I also took out zlib compression.

I will update the protocol spec ASAP (probably later today).

Note that with N connected JIDs, the bot's maximum bandwidth increases by a factor of N. So, with enough JIDs, it *should* be possible to run a proxy service for many people.
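As a back-of-the-envelope check, combining comment:50's figures (at most 2^17 bytes per message, one message per second) with the per-JID scaling noted here gives an upper bound on raw throughput. It is only a ceiling: it assumes every JID gets its own allowance and sends a full message every period, and it ignores base64 and stanza overhead. The constant and function names are hypothetical.

```python
BYTES_PER_MESSAGE = 2 ** 17  # per-message ceiling reported in comment:50
SEND_PERIOD_S = 1.0          # one message per second per JID


def ceiling_bytes_per_second(num_jids: int = 1) -> float:
    """Upper bound on raw throughput if each of num_jids JIDs sends one
    full-size message per period."""
    return BYTES_PER_MESSAGE * num_jids / SEND_PERIOD_S
```

With one JID this is 131072 bytes/s (the "131 kB/s" above); with the three accounts per computer used in later tests it would be three times that.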

The next step is probably to find a way around Google's preventing people from sending you messages if they are not on your contact list. That is going to be really important if this is to work with Gtalk. Gtalk is much faster than jabber (and probably many other XMPP services), so you would need a lot more accounts for it to work with other chat servers.

I would like to get this working properly with GTalk by using MUC rooms, but I cannot get sleekxmpp to send a message to everyone in a chat room (I always get a service unavailable error). I am currently quite stuck on this problem.

comment:52 Changed 6 years ago by feynman

Protocol spec has been updated.

comment:53 Changed 6 years ago by feynman

I added threading and blocking capabilities to the connection phase, analogous to the threading and blocking that takes place in the disconnect phase. Now I have much better reliability. Videos do not seem to stop loading in the middle. I just made it through a 5-minute YouTube video with buffering only at the beginning. I used two JIDs for the client and server. Although the video was only 140p, I still think this is a big step forward.

I also cleaned up the code. Though there are still few comments, I follow better object-oriented practices, with separate client_socket and server_socket classes. I also replaced sys.maxsize with a global constant, 2^32-1. This is mostly for security (so people cannot profile the OS based on sys.maxsize) but also makes for a slightly simpler protocol.

For now, I think this is about as fast and reliable as it is going to get. It is probably time to start working on error handling and full GTalk compatibility (i.e. being able to send connect requests to people who are not necessarily on your contact list).

comment:54 Changed 6 years ago by feynman

I just tried hexchat with tor and I can safely report success in getting basic webpages to load (very slowly), and even watching a video (with a lot of buffering). I am using the following configuration:

laptop browser=>laptop hexchat=>chat server=>desktop hexchat=>desktop tor=>desktop hexchat=>chat server=>laptop hexchat=>bridge

...and I have the wireshark logs to prove it.

Keep in mind that this test should be about twice as slow as a normal hexchat connection since I am crossing the chat server twice as many times as I would in a normal pluggable transport connection.

I am also using three gmail accounts per computer. Perhaps a real proxy server would/should use even more.

I also have some basic error handling if one computer is disconnected from one or more of its accounts.

It would appear that the only thing left to do is to get this to work with GTalk so that anyone can initiate a connection to anyone, regardless of whether the client is on the server's contact list. That, and to limit the ip:ports a server can connect to with a whitelist (but the latter should be quite easy).

comment:55 in reply to:  54 ; Changed 6 years ago by asn

Replying to feynman:

I just tried hexchat with tor and I can safely report success in getting basic webpages to load (very slowly), and even watching a video (with a lot of buffering). I am using the following configuration:

Nice! Do you see a big difference (speed-wise) from normal Tor browsing?

laptop browser=>laptop hexchat=>chat server=>desktop hexchat=>desktop tor=>desktop hexchat=>chat server=>laptop hexchat=>bridge

...and I have the wireshark logs to prove it.

Keep in mind that this test should be about twice as slow as a normal hexchat connection since I am crossing the chat server twice as many times as I would in a normal pluggable transport connection.

I am also using three gmail accounts per computer. Perhaps a real proxy server would/should use even more.

I also have some basic error handling if one computer is disconnected from one or more of its accounts.

It would appear that the only thing left to do is to get this to work with GTalk so that anyone can initiate a connection to anyone, regardless of whether the client is on the server's contact list. That, and to limit the ip:ports a server can connect to with a whitelist (but the latter should be quite easy).

Yeah. Are you sure you want to take the whitelist approach? Having the server forward all traffic to a single address might be more convenient.

Specifically, we will need hexchat to have a command-line interface similar to the one described in comment:18, if we want to deploy this without the managed-proxy interface.

comment:56 Changed 6 years ago by feynman

The bad news is that I cannot find a way to contact someone on a gmail account if you are not on their roster (I considered having an automated email request, but that seemed too complicated).

The good news is that gmail seems to be fine sending messages to jabber.org accounts regardless of whether they are on your contact list. So the trick is to always have at least one jabber.org account (though any non-gmail one will probably do) and make sure the client always routes connection requests through a jabber account. On the command line, this would look as follows:

python hexchat.py -c jid1 'password1' jid2 'password2' ... -s local_ip local_port user@… remote_ip remote_port

This does not mean all data is sent through that one jabber account, just the first connection request. When it replies, it sends a list of other JIDs in an "aliases" stanza.
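To make the shape of that invocation concrete, here is a hypothetical argparse parser for it: `-c` takes alternating JID/password pairs, and each `-s` takes a local ip and port, the peer JID to send the connection request to, and the remote ip and port. This is an illustration of the argument layout shown above only; hexchat's actual option handling is described in its README and may differ. One limitation of this sketch: a password beginning with "-" would be mistaken for an option.

```python
import argparse


def parse_args(argv=None):
    """Hypothetical parser for the invocation shown above."""
    parser = argparse.ArgumentParser(prog="hexchat.py")
    parser.add_argument("-c", nargs="+", required=True,
                        help="alternating JID and password pairs to log in with")
    parser.add_argument("-s", nargs=5, action="append",
                        metavar=("LOCAL_IP", "LOCAL_PORT", "PEER_JID",
                                 "REMOTE_IP", "REMOTE_PORT"),
                        help="listen locally; ask PEER_JID to connect to REMOTE_IP:REMOTE_PORT")
    args = parser.parse_args(argv)
    if len(args.c) % 2:
        parser.error("-c expects JID/password pairs")
    # Pair up the flat list: [jid1, pw1, jid2, pw2] -> [(jid1, pw1), (jid2, pw2)].
    accounts = list(zip(args.c[0::2], args.c[1::2]))
    forwards = [(lip, int(lport), peer, rip, int(rport))
                for lip, lport, peer, rip, rport in (args.s or [])]
    return accounts, forwards
```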

comment:57 in reply to:  55 Changed 6 years ago by feynman

Replying to asn:

Replying to feynman:

I just tried hexchat with tor and I can safely report success in getting basic webpages to load (very slowly), and even watching a video (with a lot of buffering). I am using the following configuration:

Nice! Do you see a big difference (speed-wise) from normal Tor browsing?

laptop browser=>laptop hexchat=>chat server=>desktop hexchat=>desktop tor=>desktop hexchat=>chat server=>laptop hexchat=>bridge

...and I have the wireshark logs to prove it.

Keep in mind that this test should be about twice as slow as a normal hexchat connection since I am crossing the chat server twice as many times as I would in a normal pluggable transport connection.

I am also using three gmail accounts per computer. Perhaps a real proxy server would/should use even more.

I also have some basic error handling if one computer is disconnected from one or more of its accounts.

It would appear that the only thing left to do is to get this to work with GTalk so that anyone can initiate a connection to anyone--regardless of whether the client is on the server's contact list. That, and to limit the ip:ports a server can connect to with a whitelist (but the latter should be quite easy).

Yeah. Are you sure you want to take the whitelist approach? Having the server forward all traffic to a single address might be more convenient.

I would strongly prefer using a whitelist approach to maximize versatility. You never know when you might want a server to be able to connect to different IPs.

Maybe one IP is down, and you want it to be able to connect to something else. If the hexchat server is only able to connect to one IP, you would have to restart the server to start working with a different IP. On the other hand, if it can connect to several IPs, then the client can run several instances of hexchat--each listening on a different local ip:port and configured to ask the server to connect to a different bridge. Then Tor can choose between several local IPs as though they were different bridges.
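A minimal sketch of the whitelist check being discussed, assuming the whitelist is simply a set of numeric (ip, port) pairs (no hostnames). `Whitelist` and `allows` are hypothetical names, not hexchat's actual code.

```python
import ipaddress


class Whitelist:
    """Hypothetical set of (ip, port) destinations a hexchat server may connect to."""

    def __init__(self, entries):
        # Normalise addresses so equivalent spellings (e.g. IPv6 '::1' and
        # '0:0:0:0:0:0:0:1') compare equal.
        self._allowed = {(ipaddress.ip_address(ip), int(port)) for ip, port in entries}

    def allows(self, ip, port):
        """Return True only if the client asked for a whitelisted destination."""
        try:
            key = (ipaddress.ip_address(ip), int(port))
        except (TypeError, ValueError):
            # Malformed address or port in the connection request: refuse it.
            return False
        return key in self._allowed


# Several bridges can be whitelisted; each client-side listener asks for one.
bridges = Whitelist([("192.0.2.10", 443), ("198.51.100.7", 9001)])
print(bridges.allows("192.0.2.10", "443"))   # True
print(bridges.allows("203.0.113.5", 443))    # False
```

Normalizing through `ipaddress` avoids a plain string comparison letting an equivalent spelling of an address through, or wrongly rejecting it.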

Specifically, we will need hexchat to have a command-line interface similar to the one described in comment:18, if we want to deploy this without the managed-proxy interface.

Just look at the readme file on the git repository. I have been using a command line interface the whole time. Adding a whitelist of servers should not be hard.

comment:58 Changed 6 years ago by feynman

I have the whitelist feature in place and documented in the readme file in the git repository (https://github.com/aeftimia/hexchat).

I *think* I have worked out all the bugs. Is there anything else I should do? I guess I need to comment the code, but other than that, is there anything else that needs to be done before this can become a pluggable transport?

comment:59 in reply to:  58 Changed 6 years ago by asn

Replying to feynman:

I have the whitelist feature in place and documented in the readme file in the git repository (https://github.com/aeftimia/hexchat).

I *think* I have worked out all the bugs. Is there anything else I should do? I guess I need to comment the code, but other than that, is there anything else that needs to be done before this can become a pluggable transport?

Hey there,

whitelist code looks good.

BTW, in your code, it seems like you are using 'return' as a function. In reality, it's simply a statement. This means that 'return()' doesn't return None; instead it returns an empty tuple.
I would suggest transforming 'return(x)' to 'return x'.
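A small illustration of the point about 'return': the parentheses are parsed as an expression, so an empty pair builds an empty tuple rather than meaning "return nothing". The function names here are purely illustrative.

```python
def old_style():
    # 'return' is a statement; '()' is an empty-tuple expression.
    return()


def new_style():
    # A bare 'return' yields None, which is usually what was intended.
    return


def with_value(x):
    # 'return(x)' happens to work because the parentheses are just grouping,
    # but 'return x' is the idiomatic spelling.
    return x


print(repr(old_style()))  # ()
print(repr(new_style()))  # None
```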

Also, would you be interested in turning your CLI parsing code to use argparse? Manually parsing 'sys.argv' is extremely ghetto, not well readable and it doesn't scale well. Cool kids use argparse these days: http://docs.python.org/dev/library/argparse.html

I tested the code a bit and it seems to work (used same JIDs for client/server though). I should test with different JID, then we should do some code cleanups and start thinking of deploying this transport. This means that we should setup a stable/fast bridge, and prepare some testing bundles.

comment:60 Changed 6 years ago by asn

(cont):

argparse will make your CLI parsing much simpler and readable. It will also give you a usage() function for free. For example, check this tutorial: http://pymotw.com/2/argparse/

BTW, I like the client_socket and server_socket separation. I also like how you split your code into more functions. It's easier to understand now. Next step would be to split it into multiple files. 650 lines of code in a single file is a lot for a Python project.

Cheers.

comment:61 in reply to:  60 Changed 6 years ago by feynman

Replying to asn:

(cont):

argparse will make your CLI parsing much simpler and readable. It will also give you a usage() function for free. For example, check this tutorial: http://pymotw.com/2/argparse/

BTW, I like the client_socket and server_socket separation. I also like how you split your code into more functions. It's easier to understand now. Next step would be to split it into multiple files. 650 lines of code in a single file is a lot for a Python project.

Cheers.

I just finished adding argparse, updating the readme file to match the modified flags, and split the project into multiple files.

I also noticed that sleekxmpp uses an RLock for blocking its stream from sending data. To make a long story short, that would probably explain why messages seemed to disappear every now and then. Now I have the client socket spawn a new thread every time it wants to send data. That should fix the problem.

comment:62 Changed 6 years ago by asn

Check out my branch 'argparse_cli'. It looks like this:
https://gitweb.torproject.org/user/asn/hexchat.git/shortlog/refs/heads/argparse_cli

You might or might not like it. It splits the hexchat.py launcher into two launchers, one for the client-side and another for the server-side. I believe the CLI is smoother now.

You don't see many CLIs with '--client' containing three different things. Also, in your latest code all your args are optional, which breaks badly.

The only functional change in my branch is that the client-side can't spawn multiple XMPP bots anymore (it only accepts a single jid/password pair). Does this matter? I imagined that only the server-side needs to control multiple XMPP bots. If that's not the case, I'll fix it somehow to allow multiple JID/passwords. (The server-side still supports multiple XMPP credentials).

comment:63 in reply to:  62 Changed 6 years ago by feynman

Replying to asn:

Check out my branch 'argparse_cli'. It looks like this:
https://gitweb.torproject.org/user/asn/hexchat.git/shortlog/refs/heads/argparse_cli

You might or might not like it. It splits the hexchat.py launcher into two launchers, one for the client-side and another for the server-side. I believe the CLI is smoother now.

You don't see many CLIs with '--client' containing three different things. Also, in your latest code all your args are optional, which breaks badly.

The only functional change in my branch is that the client-side can't spawn multiple XMPP bots anymore (it only accepts a single jid/password pair). Does this matter? I imagined that only the server-side needs to control multiple XMPP bots. If that's not the case, I'll fix it somehow to allow multiple JID/passwords. (The server-side still supports multiple XMPP credentials).

First of all, the logins and log file should be required command line arguments. I will fix that now.

As for your other changes, I generally think that cutting down on flexibility is a bad idea unless there is a clear tradeoff. Personally, I would rather have the ability to spawn multiple clients and not need it. I really do not think it takes anything away from the program, and it just adds some extra versatility that someone someday might need.

As for the client connecting with multiple JIDs, that is a must. Just like the server, the client should be able to distribute the load of an internet connection over multiple JIDs (even if it needs far less than a server).

Remember, the only functional difference between a client and a server is that a client can start a conversation. Once the server makes the requested connection, the two bots act identically. There is nothing stopping a bot running as a client from making a connection to another bot running as a client. This is how I do my testing. My laptop's hexchat acts as a client for my browser and a server for my desktop's tor. My desktop's hexchat acts as a client for tor and a server for my laptop's hexchat. This helps me increase the stress on both bots and determine how they react under heavier loads than I could provide otherwise.

comment:64 Changed 6 years ago by feynman

I just want to say that I made a lot of small changes to the git repository, but the net result was very little. I tried a few more things like logging into the same account multiple times (the server disconnected me when I tried to speed up the throttle rate accordingly), having a throttle rate that does not change with the number of accounts being used (I concluded that this was not helping), and different locking mechanisms (the final result has a slightly different locking mechanism).

So, the code now has a little different locking mechanism and a little more effort in keeping it from sending data after getting a disconnect request. These were minor changes, and the code as a whole is pretty much the same.

comment:65 Changed 6 years ago by asn

hi feynman,

I think you asked me in IRC what was the purpose of my 'argparse_cli' branch.

I hoped that it would make the code a bit more elegant, and also make the CLI parsing more meaningful (for example, the --client switch of the current CLI is very weird since it's three things in one). I also added some verbose argparse help strings and some examples.

The argparse configuration in the current code is not entirely correct, I think. For example, the '--login' switch should be marked as mandatory (both client and server always use it), but it's not. This means that if I omit the '--login' switch, I get a Python error at 'while index<len(args.login):' instead of a helpful argparse message.
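A minimal sketch of what marking '--login' as mandatory could look like. The flag names follow the command lines quoted in this ticket ('--login', '--logfile'); pairing a flat list of JIDs and passwords is an assumption about how the real parser consumes them, and `parse` is a hypothetical helper.

```python
import argparse


def build_parser():
    # Sketch only: the real hexchat parser may define more (or different) flags.
    parser = argparse.ArgumentParser(description="hexchat (sketch)")
    parser.add_argument("--login", nargs="+", required=True,
                        metavar="JID_OR_PASSWORD",
                        help="one or more 'jid password' pairs")
    parser.add_argument("--logfile", required=True,
                        help="file to write logs to")
    return parser


def parse(argv):
    parser = build_parser()
    args = parser.parse_args(argv)   # a missing --login exits with a usage message
    if len(args.login) % 2:
        # parser.error() prints usage to stderr and exits with status 2.
        parser.error("--login expects jid/password pairs")
    # ['a@x', 'pw1', 'b@y', 'pw2'] -> [('a@x', 'pw1'), ('b@y', 'pw2')]
    args.login = list(zip(args.login[0::2], args.login[1::2]))
    return args


args = parse(["--logfile", "a", "--login", "a@jabber.org", "pw1", "b@jabber.dk", "pw2"])
print(args.login)  # [('a@jabber.org', 'pw1'), ('b@jabber.dk', 'pw2')]
```

With `required=True`, omitting '--login' produces argparse's own "the following arguments are required" message instead of a traceback.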

In any case, it's perfectly fine if you don't merge my changes.

BTW, I'm extremely busy these days. However, I hope I can find some time next week (after the weekend) to look at the deployability of hexchat and also check out your last two messages.

Thanks!

comment:66 Changed 6 years ago by feynman

I thought of and implemented a couple more ideas for making the code faster and more robust. The main one is to have the bots actively work to distribute the load evenly rather than have each socket cycle through bots. I also have the bots working to keep the rate at which they send data below a certain threshold (currently 20kb/s).

I chose this method because I recently learned that many chat servers penalize you if you send more bytes than a certain threshold over a given time frame. I also noticed that connections established using hexchat seemed to get slower when used heavily. I figured that having the bots actively work to keep their data rate below a threshold could keep the overall bandwidth more consistent.
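One way to combine "keep each bot under a rate threshold" with "actively distribute load evenly" is a per-bot sliding window plus a least-loaded choice. This is a hypothetical sketch, not hexchat's implementation: `Bot` and `pick_bot` are invented names, and the limit is expressed in bytes per second since "kb/s" in the thread is ambiguous.

```python
import collections
import time


class Bot:
    """Hypothetical per-login sender that tracks how many bytes it sent recently."""

    def __init__(self, jid, limit_bytes_per_s, window_s=1.0, clock=time.monotonic):
        self.jid = jid
        self.limit = limit_bytes_per_s
        self.window = window_s
        self.clock = clock
        self._sent = collections.deque()  # (timestamp, nbytes) pairs

    def recent_bytes(self):
        # Forget sends that have slid out of the measurement window.
        now = self.clock()
        while self._sent and now - self._sent[0][0] >= self.window:
            self._sent.popleft()
        return sum(n for _, n in self._sent)

    def can_send(self, nbytes):
        return self.recent_bytes() + nbytes <= self.limit * self.window

    def record(self, nbytes):
        self._sent.append((self.clock(), nbytes))


def pick_bot(bots, nbytes):
    """Return the least-loaded bot that stays under its limit, or None to wait."""
    candidates = [b for b in bots if b.can_send(nbytes)]
    if not candidates:
        return None
    return min(candidates, key=lambda b: b.recent_bytes())
```

Returning None instead of overshooting lets the caller buffer the data until some bot's window frees up, which is what keeps every login below the server's throttling threshold.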

I also brought back the "login to each account multiple times" option.

I think this all might be helping because youtube videos seem to load faster (and they no longer prevent me from doing things in other tabs). The client is using 15 logins and the server is using 15 logins while logging into each of them 3 times.

Keep in mind, the code was quite stable before, and I am now just trying to make it better (rather than make it work).

The old code can still be found here: https://github.com/aeftimia/hexchat/tree/master/misc/round_robin

I will continue to update the code and describe the updates here if I think of any more ways of improving upon the design.

comment:67 Changed 6 years ago by feynman

I just made a small but important change in the protocol. With many (i.e. ~10) logins, the aliases field was taking up more space than many of the data stanzas. This means that just sending a list of aliases back and forth is using up a considerable portion of the allocated bandwidth. I had to modify the protocol so that it no longer sends aliases when sending data and disconnect requests. By that point, you can infer the list of aliases from the IQ's "from" field (which as far as I know cannot be spoofed on most chat servers). The alias list is established during the connection process and remembered until the socket disconnects.
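A sketch of the bookkeeping this protocol change implies: remember the peer's aliases once, at connection time, and afterwards only check each stanza's "from" JID against them. `ConnectionTable` is a hypothetical name; as noted above, the check is only meaningful if the chat server prevents "from" spoofing.

```python
class ConnectionTable:
    """Hypothetical record of the peer aliases agreed during the connection phase."""

    def __init__(self):
        self._peer_aliases = {}  # connection id -> frozenset of full JIDs

    def open(self, conn_id, peer_aliases):
        # Called once, when the stanza carrying the peer's aliases arrives.
        self._peer_aliases[conn_id] = frozenset(peer_aliases)

    def accepts(self, conn_id, from_jid):
        # Data and disconnect stanzas no longer carry aliases, so the check is
        # whether the sender's full JID was announced for this connection.
        return from_jid in self._peer_aliases.get(conn_id, ())

    def close(self, conn_id):
        # Forget the aliases when the socket disconnects.
        self._peer_aliases.pop(conn_id, None)
```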

This is important because although it scales well with the number of logins you use, it breaks compatibility with the old protocol.

Since this has not yet been deployed with Tor, I suggest using this newer protocol. The code also has the improvements mentioned earlier.

comment:68 Changed 6 years ago by asn

Any experience with this issue on the server-side? It only happens with gmail. Is it because GTalk is not compatible with xep-0030?

$ python hexchat.py --logfile a --login something@gmail.com password
Traceback (most recent call last):                                                                 
  File "hexchat.py", line 50, in <module>                                                          
    master0=master(username_passwords, whitelist, args.num_logins)                                 
  File "/hexchat/master.py", line 89, in __init__                              
    self.bots.append(bot(self, jid_password))                                                      
  File "/hexchat/bot.py", line 38, in __init__                                 
    self.process()                                                                                 
  File "/usr/lib/python2.7/dist-packages/sleekxmpp/basexmpp.py", line 147, in process              
    self.plugin[name].post_init()                                                                  
  File "/usr/lib/python2.7/dist-packages/sleekxmpp/plugins/xep_0199/ping.py", line 76, in post_init
    self.xmpp['xep_0030'].add_feature(Ping.namespace)                                              
AttributeError: 'bool' object has no attribute 'add_feature'

comment:69 in reply to:  68 Changed 6 years ago by feynman

Replying to asn:

Any experience with this issue on the server-side? It only happens with gmail. Is it because GTalk is not compatible with xep-0030?

$ python hexchat.py --logfile a --login something@gmail.com password
Traceback (most recent call last):                                                                 
  File "hexchat.py", line 50, in <module>                                                          
    master0=master(username_passwords, whitelist, args.num_logins)                                 
  File "/hexchat/master.py", line 89, in __init__                              
    self.bots.append(bot(self, jid_password))                                                      
  File "/hexchat/bot.py", line 38, in __init__                                 
    self.process()                                                                                 
  File "/usr/lib/python2.7/dist-packages/sleekxmpp/basexmpp.py", line 147, in process              
    self.plugin[name].post_init()                                                                  
  File "/usr/lib/python2.7/dist-packages/sleekxmpp/plugins/xep_0199/ping.py", line 76, in post_init
    self.xmpp['xep_0030'].add_feature(Ping.namespace)                                              
AttributeError: 'bool' object has no attribute 'add_feature'

I seem to recall having the same problem and resolving it by updating sleekxmpp.

comment:70 Changed 6 years ago by asn

(fixed the issue with my sleekxmpp)

btw, I kind of dislike the fact that we send our local ip:port through XMPP. it's a small but unneeded information leak.

Since (<remote ip/port>, <jid>) is not sufficient for your routing table, why don't you also add the source IP of the other side in there? You can probably get the client's IP using the sleekxmpp API; you don't need the client to send its IP to the server. If that doesn't work, you can get the client to generate a nonce and send it to the server.

Do you think that makes sense?

comment:71 Changed 6 years ago by asn

Also, do you know what this error is:

WARNING:root:<message xmlns="jabber:client" to="blabla@gmail.com/DDA289DC" type="error" from="wowowowow@gmail.com"><connect xmlns="hexchat:connect"><local_ip>127.0.0.1</local_ip><local_port>60776</local_port><remote_ip>32.1.35.12</remote_ip><remote_port>6061</remote_port><aliases>blabla@gmail.com/DDA289DC</aliases></connect><error code="503" type="cancel"><service-unavailable xmlns="urn:ietf:params:xml:ns:xmpp-stanzas" /></error></message>

It appeared on my client-side hexchat. On the client-side I used blabla@…, and on the server-side I used wowowowow@…. I seem to remember you talking about service-unavailable errors.

comment:72 Changed 6 years ago by asn

Also also I'm getting this warning in hexchat:

WARNING:sleekxmpp.xmlstream.cert:Could not find pyasn1 and pyasn1_modules. SSL certificate expiration COULD NOT BE VERIFIED.

I think it's a good idea to completely block hexchat from starting up if it can't validate SSL certificates. What do you say?

comment:73 in reply to:  70 Changed 6 years ago by feynman

Replying to asn:

(fixed the issue with my sleekxmpp)

btw, I kind of dislike the fact that we send our local ip:port through XMPP. it's a small but unneeded information leak.

Since (<remote ip/port>, <jid>) is not sufficient for your routing table, why don't you also add the source IP of the other side in there? You can probably get the client's IP using the sleekxmpp API; you don't need the client to send its IP to the server. If that doesn't work, you can get the client to generate a nonce and send it to the server.

Do you think that makes sense?

The local ip:port is used to uniquely identify a connection--even among several connections between the same client and server. Since the client thinks it is connecting directly to the server, the source ip:port seemed like the perfect unique identifier for that particular connection. Whatever identifier you use, you are going to end up with something that can be uniquely mapped to the client's source IP.

I just finished changing the source ip:port to a SHA512 hash of the source ip:port.
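A sketch of the two identifier options discussed here: a SHA512 hash of the accepted socket's ip:port (the approach described above) and the random-nonce alternative asn suggested. Both helper names are hypothetical.

```python
import hashlib
import secrets


def hashed_id(local_ip, local_port):
    """Hash the accepted socket's ip:port instead of sending it in the clear."""
    # The ip:port pair is unique per connection on the client host, so its
    # hash is too (barring collisions).
    return hashlib.sha512(f"{local_ip}:{local_port}".encode("ascii")).hexdigest()


def nonce_id():
    """Alternative: a random 128-bit identifier generated by the client."""
    return secrets.token_hex(16)
```

One design consideration: a loopback ip:port has a tiny input space (about 65,535 ports), so anyone who sees the hash can recover the port by hashing every candidate. A random nonce reveals nothing about the socket at all.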

comment:74 in reply to:  71 Changed 6 years ago by feynman

Replying to asn:

Also, do you know what this error is:

WARNING:root:<message xmlns="jabber:client" to="blabla@gmail.com/DDA289DC" type="error" from="wowowowow@gmail.com"><connect xmlns="hexchat:connect"><local_ip>127.0.0.1</local_ip><local_port>60776</local_port><remote_ip>32.1.35.12</remote_ip><remote_port>6061</remote_port><aliases>blabla@gmail.com/DDA289DC</aliases></connect><error code="503" type="cancel"><service-unavailable xmlns="urn:ietf:params:xml:ns:xmpp-stanzas" /></error></message>

It appeared on my client-side hexchat. On the client-side I used blabla@…, and on the server-side I used wowowowow@…. I seem to remember you talking about service-unavailable errors.

That could mean a couple of things:

  1. The server has not finished booting up.
  2. The server has been temporarily disconnected from that JID

Either way, it is not a big deal. Having read this comment, I changed that warning to "<JID> is not available" in the commit from my reply to your previous comment.
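A sketch of turning a raw error stanza like the one above into a short "<JID> is not available" line, using only the standard library. `describe_error` is a hypothetical helper, not the code in the hexchat commit; it relies on the standard XMPP stanza-error namespace shown in the logged message.

```python
import xml.etree.ElementTree as ET

CLIENT_NS = "jabber:client"
STANZAS_NS = "urn:ietf:params:xml:ns:xmpp-stanzas"


def describe_error(stanza_xml):
    """Summarise an XMPP error <message> as a one-line log entry (sketch)."""
    root = ET.fromstring(stanza_xml)
    if root.get("type") != "error":
        return None
    # <error> inherits the default jabber:client namespace from <message>.
    error = root.find(f"{{{CLIENT_NS}}}error")
    if error is None:
        return None
    # The defined condition is the child in the stanzas namespace,
    # e.g. <service-unavailable/>.
    conditions = [child.tag.split("}", 1)[1] for child in error
                  if child.tag.startswith(f"{{{STANZAS_NS}}}")]
    sender = root.get("from")
    if "service-unavailable" in conditions:
        return f"{sender} is not available"
    return f"{sender} returned error {error.get('code')}: {', '.join(conditions)}"


sample = ('<message xmlns="jabber:client" to="blabla@gmail.com/DDA289DC" type="error" '
          'from="wowowowow@gmail.com"><error code="503" type="cancel">'
          '<service-unavailable xmlns="urn:ietf:params:xml:ns:xmpp-stanzas" />'
          '</error></message>')
print(describe_error(sample))  # wowowowow@gmail.com is not available
```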

comment:75 in reply to:  72 Changed 6 years ago by feynman

Replying to asn:

Also also I'm getting this warning in hexchat:

WARNING:sleekxmpp.xmlstream.cert:Could not find pyasn1 and pyasn1_modules. SSL certificate expiration COULD NOT BE VERIFIED.

I think it's a good idea to completely block hexchat from starting up if it can't validate SSL certificates. What do you say?

I just checked the sleekxmpp code, and I think that happens automatically if you install pyasn1 and pyasn1_modules.
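If hexchat were to refuse to start without certificate checks, as asn proposes, a startup guard could look like the sketch below. It only verifies that the modules named in sleekxmpp's warning are importable; the function names are hypothetical.

```python
import importlib.util
import sys

REQUIRED_FOR_CERT_CHECKS = ("pyasn1", "pyasn1_modules")


def missing_modules(names=REQUIRED_FOR_CERT_CHECKS):
    """Return the module names that cannot be imported in this interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def require_cert_validation():
    # Refuse to start rather than continue with sleekxmpp's
    # "certificate expiration COULD NOT BE VERIFIED" warning.
    missing = missing_modules()
    if missing:
        sys.exit("hexchat: install %s to enable SSL certificate checks"
                 % ", ".join(missing))
```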

comment:76 in reply to:  70 Changed 6 years ago by feynman

Replying to asn:

(fixed the issue with my sleekxmpp)

btw, I kind of dislike the fact that we send our local ip:port through XMPP. it's a small but unneeded information leak.

Since (<remote ip/port>, <jid>) is not sufficient for your routing table, why don't you also add the source IP of the other side in there? You can probably get the client's IP using the sleekxmpp API; you don't need the client to send its IP to the server. If that doesn't work, you can get the client to generate a nonce and send it to the server.

Do you think that makes sense?

It just occurred to me that you might have thought that by local ip, I meant the external ip:port of the client. I was actually referring to the ip:port of the connected socket that is created after the client hexchat accepts tor's connection.

I wanted to start a new branch with the hashed ip:port protocol, but I gave up and just reverted back to the last commit before I changed that part of the protocol.

I would prefer to send the ip:port of the connected socket rather than hashing the address--it just makes the code cleaner. However, if you still think this is a security risk (even a minor one), I will gladly revert back to hashing the ip:port.

comment:77 Changed 6 years ago by feynman

I played around with storing message buffers in SQLite databases to save RAM, but it turned out the database files never went over 1Mb.

However, I decided to stick with using a separate thread to check a buffer for incoming messages. This is slightly more efficient than spawning a new thread every time data could be written (i.e. the old way).
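A sketch of the "one thread draining a buffer" pattern, using the standard `queue` module. `Sender` is a hypothetical class, and `send_func` stands in for whatever actually emits a stanza.

```python
import queue
import threading


class Sender:
    """Hypothetical single sender thread draining a buffer of outgoing chunks."""

    def __init__(self, send_func):
        self._send = send_func          # e.g. a function that emits one XMPP stanza
        self._buffer = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def write(self, data):
        # Callers never wait on the XMPP stream's lock; they only enqueue.
        self._buffer.put(data)

    def close(self):
        self._buffer.put(None)          # sentinel: stop after earlier data is sent
        self._thread.join()

    def _run(self):
        while True:
            data = self._buffer.get()
            if data is None:
                break
            self._send(data)
```

Besides avoiding the cost of a new thread per write, a single consumer sends chunks in FIFO order; with one thread per write, the scheduler may run the threads in any order.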

I also fixed a potential problem with the way the program checks a buffer for data to send over the chat server.

All in all, I have some minor changes in the git repository that might help scalability.

comment:78 Changed 6 years ago by feynman

I just added some more complicated (but hopefully better) error handling (e.g. for when a bot gets disconnected while another bot is trying to send it a message). This required adding a new type of stanza, "disconnect_error" and tweaking the protocol a bit. Except for error handling, the program remains backwards compatible.

comment:79 in reply to:  71 ; Changed 6 years ago by feynman

Replying to asn:

Also, do you know what this error is:

WARNING:root:<message xmlns="jabber:client" to="blabla@gmail.com/DDA289DC" type="error" from="wowowowow@gmail.com"><connect xmlns="hexchat:connect"><local_ip>127.0.0.1</local_ip><local_port>60776</local_port><remote_ip>32.1.35.12</remote_ip><remote_port>6061</remote_port><aliases>blabla@gmail.com/DDA289DC</aliases></connect><error code="503" type="cancel"><service-unavailable xmlns="urn:ietf:params:xml:ns:xmpp-stanzas" /></error></message>

It appeared on my client-side hexchat. On the client-side I used blabla@…, and on the server-side I used wowowowow@…. I seem to remember you talking about service-unavailable errors.

I just realized you set up your client to initiate connections to a gmail account. That will not work unless the client is on the server's contact list. You can still use gmail, just not for initiating connections. You need to have your client run like this:

python hexchat.py --login <lots of logins> --log_file <log file> --client <ip and port> <SOMETHING OTHER THAN GMAIL> <ip and port>

The server can use gmail accounts, but it needs at least one account that is not gmail so the client can initiate a connection.
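A sketch of a startup check for this rule, assuming a server's logins are given as JIDs and that gmail accounts can be recognised by the gmail.com domain (a heuristic: it would not catch other domains hosted on GTalk). Both function names are hypothetical.

```python
def connect_targets(server_jids):
    """Return the server JIDs a client may address with its first connection request."""
    # A bare gmail JID drops messages from senders outside its contact list,
    # so only non-gmail accounts can receive the initial request.
    return [jid for jid in server_jids
            if jid.split("@", 1)[-1].split("/", 1)[0].lower() != "gmail.com"]


def check_server_logins(server_jids):
    # Refuse a server configuration that no client could ever reach.
    if not connect_targets(server_jids):
        raise ValueError("at least one non-gmail login is required to accept connections")
```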

Sorry for the confusion. I should have realized what was going on.

comment:80 Changed 6 years ago by feynman

The good news is I think I found the maximum throughput you can send through a given JID.

The bad news is it seems to be 5kb/s.

The other good news is it seems you can log into that same JID as much as you want (use the --num_logins command line argument). So with a few thousand logins, it should be possible to run a tor relay over several chat servers.

The other bad news is that you create a couple of threads every time you login, and a couple more every time you establish a connection. I looked into nonblocking sockets, but that is definitely not an option for sleekxmpp (if you don't believe me, look at the read_xml function in the xmlstream.py file).

Nonblocking sockets might be an option for the client sockets (the ones that are not being used to talk to the chat server) but they would still require some locking for each write and possibly each read. I was worried that the locking would hold up reading/writing to other sockets, and I thought it would be safer to go with the blocking+threading technique.

Let me know if you think the blocking+threading technique is a terrible idea, and I will look into nonblocking+select+sequential read/writes.

Until this proves to not be scalable (or I am told otherwise), I am sticking with the blocking+threading technique.

The bootup takes around 15 minutes for ~300 connections, so testing the program with large numbers of connections is a very slow process. I am working my way up to a few thousand connections (which will have to boot up overnight), and I will keep this ticket updated with my progress.

comment:81 Changed 6 years ago by feynman

I just realized that with a few thousand connections, the aliases stanza would probably be too large to send. I updated the protocol to compress the aliases stanza before sending it. Since there is a lot of redundancy in the JIDs, this should help a lot.

Also, I just timed it: booting up ~300 connections takes about 5 minutes, not 15.

comment:82 Changed 6 years ago by feynman

This list may grow, but these chat servers do not like you logging into them 100 times from the same IP (regardless of how many JIDs you are using):

  * Gmail (unfortunately)
  * jabber.se

Therefore, they are probably not good candidates for logins the server could use.

These however are OK:

  * jabber.org
  * jabber.dk

comment:83 Changed 6 years ago by feynman

Here is a more complete list of jabber servers that let you login at least 100 times:

  1. jabber.org
  2. jabber.dk
  3. jabme.de
  4. jappix.com
  5. jabberzac.org
  6. twattle.net
  7. rkquery.de
  8. miqote.com
  9. jabber.hot-chilli.net (At least the first three servers listed in their registration page: http://jabber.hot-chilli.net/account/create/)

That should be enough to run a tor relay if you log into each server at least 100 times.

comment:84 in reply to:  83 Changed 6 years ago by feynman

Replying to feynman:

Here is a more complete list of jabber servers that let you login at least 100 times:

  1. jabber.org
  2. jabber.dk
  3. jabme.de
  4. jappix.com
  5. jabberzac.org
  6. twattle.net
  7. rkquery.de
  8. miqote.com
  9. jabber.hot-chilli.net (At least the first three servers listed in their registration page: http://jabber.hot-chilli.net/account/create/)

That should be enough to run a tor relay if you log into each server at least 100 times.

I just found out that you need to install dnspython to connect to some of the above chat servers. For python3, you need to install dnspython like this:

git clone http://github.com/rthalley/dnspython
cd dnspython
git checkout python3
python3 setup.py install

comment:85 Changed 6 years ago by feynman

Compressing the aliases was doing more harm than good. I might need to find another way to send the aliases if just separating them with commas proves impractical.

comment:86 Changed 6 years ago by feynman

I have updated the code and protocol-spec to provide aliases in an XML form as follows:

Make stanzas of the form:

<server>
  <user>
    resource1,resource2,...,resourceN
  </user>
</server>

for each JID of the form user@server/resource. This eliminates redundancy when many JIDs share the same server and username.
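A sketch of encoding and decoding the grouped alias format with `xml.etree`. It is a variant, not the spec itself: it puts server and user names in `name` attributes instead of using them as element names, because a JID localpart is not always a valid XML element name (it may start with a digit, for instance). It also assumes resources contain no commas. Both function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def encode_aliases(full_jids):
    """Group user@server/resource JIDs by server, then by user."""
    grouped = defaultdict(lambda: defaultdict(list))
    for jid in full_jids:
        bare, resource = jid.split("/", 1)
        user, server = bare.split("@", 1)
        grouped[server][user].append(resource)
    root = ET.Element("aliases")
    for server, users in grouped.items():
        server_el = ET.SubElement(root, "server", name=server)
        for user, resources in users.items():
            user_el = ET.SubElement(server_el, "user", name=user)
            user_el.text = ",".join(resources)
    return ET.tostring(root, encoding="unicode")


def decode_aliases(xml_text):
    """Inverse of encode_aliases: rebuild the list of full JIDs."""
    jids = []
    for server_el in ET.fromstring(xml_text).iter("server"):
        for user_el in server_el.iter("user"):
            for resource in (user_el.text or "").split(","):
                jids.append(f"{user_el.get('name')}@{server_el.get('name')}/{resource}")
    return jids
```

Each server and user name appears once no matter how many resources share it, which is where the savings come from when one account is logged into many times.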

comment:87 Changed 6 years ago by feynman

I experimented with nonblocking sockets and found that they clean up a lot of code. I decided to stick with them. The old code is in misc/blocking.
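A sketch of the nonblocking+select style using the standard `selectors` module, with a local socket pair standing in for Tor's connection to hexchat. Only reads are shown; writes would be registered with `EVENT_WRITE` in the same way. `pump` is a hypothetical name, not hexchat's loop.

```python
import selectors
import socket


def pump(sel, handlers, timeout=0.1):
    """One pass of a select-based loop over nonblocking sockets (sketch)."""
    for key, _events in sel.select(timeout):
        sock = key.fileobj
        try:
            data = sock.recv(4096)
        except BlockingIOError:
            continue                    # not actually readable yet; try next pass
        if data:
            handlers[sock](data)        # e.g. hand the bytes to the XMPP side
        else:
            sel.unregister(sock)        # empty read: the peer closed the socket
            del handlers[sock]
            sock.close()


sel = selectors.DefaultSelector()
tor_side, hexchat_side = socket.socketpair()
hexchat_side.setblocking(False)
received = []
handlers = {hexchat_side: received.append}
sel.register(hexchat_side, selectors.EVENT_READ)

tor_side.sendall(b"hello")
pump(sel, handlers)
print(received)                         # [b'hello']
```

Because every read happens sequentially in one thread, no per-socket lock is needed, which is the cleanup the blocking+threading version could not offer.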

I also added a feature in which clients and servers can only give out a maximum number of aliases. This helps scalability, and should help prevent a single connection from hogging too much bandwidth (since each alias can only provide 5kb/s). The aliases are chosen in such a way that all the aliases share connections pretty evenly.
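A sketch of handing out a bounded number of aliases per connection while keeping load even: always pick the aliases currently serving the fewest connections. `AliasPool` is a hypothetical class, and the exact selection rule in hexchat may differ.

```python
import heapq


class AliasPool:
    """Hypothetical allocator that hands out the least-used aliases first."""

    def __init__(self, aliases, max_per_connection):
        self.max = max_per_connection
        self.load = {alias: 0 for alias in aliases}  # alias -> open connections

    def allocate(self):
        # Choose the aliases with the fewest connections; ties are broken by
        # name so the choice is deterministic.
        chosen = heapq.nsmallest(self.max, self.load, key=lambda a: (self.load[a], a))
        for alias in chosen:
            self.load[alias] += 1
        return chosen

    def release(self, chosen):
        # Called when the connection that received these aliases closes.
        for alias in chosen:
            self.load[alias] -= 1
```

With max_per_connection=40 and the roughly 5kb/s-per-JID ceiling from comment:80, one connection would top out near 40 × 5 = 200kb/s, the figure quoted in comment:88.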

comment:88 Changed 6 years ago by feynman

I just loaded 4 and a half minutes of 360p youtube video with no buffering.

The server had 125 connections to chat servers, the client had 20, and I set the maximum number of aliases to 40 (giving a maximum bandwidth per connection of 200kb/s).

comment:89 in reply to:  79 ; Changed 6 years ago by asn

Replying to feynman:

Replying to asn:

Also, do you know what this error is:

WARNING:root:<message xmlns="jabber:client" to="blabla@gmail.com/DDA289DC" type="error" from="wowowowow@gmail.com"><connect xmlns="hexchat:connect"><local_ip>127.0.0.1</local_ip><local_port>60776</local_port><remote_ip>32.1.35.12</remote_ip><remote_port>6061</remote_port><aliases>blabla@gmail.com/DDA289DC</aliases></connect><error code="503" type="cancel"><service-unavailable xmlns="urn:ietf:params:xml:ns:xmpp-stanzas" /></error></message>

It appeared on my client-side hexchat. On the client-side I used blabla@…, and on the server-side I used wowowowow@…. I seem to remember you talking about service-unavailable errors.

I just realized you set up your client to initiate connections to a gmail account. That will not work unless the client is on the server's contact list. You can still use gmail, just not for initiating connections. You need to have your client run like this:

Ugh, I see.

Hm, what do you think is going to be the workflow of users who need to get an XMPP account to use with hexchat? If our server-side hexchat bot is a @gmail.com JID, how are users going to create non-gmail JIDs? And how do they connect with them to Gtalk's servers?

Also, if users want to have N aliases (to increase their speed), they also need to create N JIDs, right?

comment:90 in reply to:  89 Changed 6 years ago by feynman

Replying to asn:

Replying to feynman:

Replying to asn:

Also, do you know what this error is:

WARNING:root:<message xmlns="jabber:client" to="blabla@gmail.com/DDA289DC" type="error" from="wowowowow@gmail.com"><connect xmlns="hexchat:connect"><local_ip>127.0.0.1</local_ip><local_port>60776</local_port><remote_ip>32.1.35.12</remote_ip><remote_port>6061</remote_port><aliases>blabla@gmail.com/DDA289DC</aliases></connect><error code="503" type="cancel"><service-unavailable xmlns="urn:ietf:params:xml:ns:xmpp-stanzas" /></error></message>

It appeared on my client-side hexchat. On the client-side I used blabla@…, and on the server-side I used wowowowow@…. I seem to remember you talking about service-unavailable errors.

I just realized you set up your client to initiate connections to a gmail account. That will not work unless the client is on the server's contact list. You can still use gmail, just not for initiating connections. You need to have your client run like this:

Ugh, I see.

Hm, what do you think is going to be the workflow of users who need to get an XMPP account to use with hexchat? If our server-side hexchat bot is a @gmail.com JID, how are users going to create non-gmail JIDs? And how do they connect with them to Gtalk's servers?

The server-side hexchat bot can have one or more @gmail.com JIDs, but it needs to have at least one that is not gmail. The clients can then initiate connections to the non-gmail account with any type of account they like (gmail or not). When the server replies, it will send a subset of all the JIDs it has logged in with. The client can then send data and/or disconnect requests to any of these JIDs.

The server keeps track of which JIDs it sent for which connection, and only sends messages from one of the JIDs it sent during the connection phase.

Keep in mind that the client also sends a list of JIDs during the connection phase and keeps track of them in the same way the server does.

Here is how it works:

  1. Client sends connection request with a subset of all the JIDs it is using. Call this set "client JIDs".
  2. Server sends a connect_ack. If the connection was successful, this includes a subset of all the JIDs it is using. Call this set "server JIDs".
  3. When the client wants to send data, it sends a message from an element of "client JIDs" to an element of "server JIDs". When the server wants to send data, it sends a message from an element of "server JIDs" to an element of "client JIDs".
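The three steps can be modelled in a few lines to make the routing rule concrete. This is a hypothetical sketch (`Endpoint`, `offer`, `learn`, `route` are invented names) that ignores the actual stanzas and just tracks which JIDs each side advertised.

```python
import random


class Endpoint:
    """Hypothetical model of one side of the connect / connect_ack exchange."""

    def __init__(self, my_jids, rng=random):
        self.my_jids = list(my_jids)
        self.rng = rng
        self.offered = []      # our JIDs advertised for this connection
        self.peer_jids = []    # the other side's advertised JIDs

    def offer(self, k):
        # Steps 1 and 2: advertise a subset of our own full JIDs.
        self.offered = self.rng.sample(self.my_jids, k)
        return self.offered

    def learn(self, peer_jids):
        self.peer_jids = list(peer_jids)

    def route(self):
        # Step 3: each data stanza goes from one of our advertised JIDs
        # to one of the peer's advertised JIDs.
        return self.rng.choice(self.offered), self.rng.choice(self.peer_jids)


client = Endpoint(["c@jabber.org/1", "c@jabber.org/2", "c@jabber.dk/9"])
server = Endpoint(["s@jabber.org/a", "s@gmail.com/B7", "s@gmail.com/C3"])
server.learn(client.offer(2))   # step 1: connection request carries client JIDs
client.learn(server.offer(2))   # step 2: connect_ack carries server JIDs
src, dst = client.route()       # step 3: pick a (sender, recipient) pair
```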

After the connection is established, clients and servers can send messages to any JID in the other side's set, because "client JIDs" and "server JIDs" are full JIDs that include resources. Full JIDs are of the form user@server/resource. Resources are sometimes assigned randomly, even if you request one, as is the case for gmail.

You cannot send an IQ stanza to a JID of the form user@server (i.e. without the resource) but you can send a message stanza to a JID without a resource. That is how clients initiate connections--at least the first time. After it gets a response, it adds the responding JID to a dict that keeps track of possible full JIDs to send connection requests to. From then on, it sends connection requests via IQ--at least until it gets an error. When it gets an error, it deletes the JID from its dict.
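A sketch of the connection-request routing just described, assuming a dict that maps each bare JID to the full JIDs that have answered before. `ConnectRouter`, `route`, `on_response` and `on_error` are hypothetical names. `route` only returns which stanza type to use and where to send it, rather than sending anything, so the sketch runs without an XMPP library.

```python
from collections import defaultdict


def bare(jid):
    """Strip the resource: 'user@server/res' -> 'user@server'."""
    return jid.split("/", 1)[0]


class ConnectRouter:
    """Hypothetical sketch of how connect requests could be routed."""

    def __init__(self):
        # bare JID -> set of full JIDs that have answered a request before
        self.known = defaultdict(set)

    def route(self, target_bare):
        full = self.known.get(target_bare)  # .get() does not create an entry
        if full:
            # IQ stanzas need a full JID, so use one we have heard from.
            return ("iq", next(iter(full)))
        # Nothing known yet: only a message stanza may go to a bare JID.
        return ("message", target_bare)

    def on_response(self, from_full_jid):
        self.known[bare(from_full_jid)].add(from_full_jid)

    def on_error(self, full_jid):
        # Forget the failing full JID; fall back to messages if none remain.
        b = bare(full_jid)
        self.known[b].discard(full_jid)
        if not self.known[b]:
            del self.known[b]
```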

This all works fine for everything but gmail. You can only send messages to gmail JIDs without resources if you are on that JID's contact list. I found no way around that problem, so you cannot initiate connections to gmail accounts. Period.

Also, if users want to have N aliases (to increase their speed), they also need to create N JIDs, right?

Sort of. I found that many servers let you log into the same account about 20 times from the same IP address before refusing the connection. Here is a list of servers that let you log in at least 100 times before refusing the connection:

https://trac.torproject.org/projects/tor/ticket/9022?replyto=89#comment:84

Some of those servers are very slow, but the jabber.* ones seem fine.

You can get away with logging into the same account multiple times because chat servers give you a random resource by default--thus giving you a unique full JID of the form user@server/resource--which is what you are going to send data to anyway.
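To illustrate why repeated logins to one account still give distinct aliases, here is a small sketch. It splits a full JID into its parts and fakes a server assigning a random resource on each login. `split_jid` and `simulate_logins` are hypothetical helpers. The 8-hex-digit resource format is an assumption loosely modelled on the `DDA289DC` resource in the log above, because real servers choose their own format.

```python
import secrets


def split_jid(jid):
    """Split 'user@server/resource' into (user, server, resource or None).

    Assumes the user@server form; everything after the first '/' is the resource.
    """
    bare_jid, _, resource = jid.partition("/")
    user, _, server = bare_jid.partition("@")
    return user, server, resource or None


def simulate_logins(bare_jid, n):
    # Stand-in for a server handing out a random resource on every login.
    return [f"{bare_jid}/{secrets.token_hex(4)}" for _ in range(n)]
```

Each string returned by `simulate_logins("bot@jabber.example", 5)` shares the same user and server but has its own resource, so each one can serve as a separate alias.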

Note that messages sent to a JID without a resource will be sent to all JIDs of that form. So a message sent to user@server will go to everyone logged into user@server--regardless of their resource. I have already implemented code that guarantees only one of those resources will actually execute the connection process.
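The comment doesn't say how hexchat guarantees that only one resource acts on a fanned-out request. One possible approach, assuming all the logins live in the same process, is to remember which connect requests have already been handled. The request can be keyed on the child elements visible in the `hexchat:connect` stanza in the error log quoted above (`local_ip`, `local_port`, `remote_ip`, `remote_port`, `aliases`). This is a hypothetical sketch, not hexchat's mechanism.

```python
class ConnectDeduper:
    """Let only the first copy of a fanned-out connect request through.

    Hypothetical sketch: assumes every login shares this one object (one process).
    """

    FIELDS = ("local_ip", "local_port", "remote_ip", "remote_port", "aliases")

    def __init__(self):
        self.seen = set()

    def should_handle(self, connect):
        # `connect` maps each <connect xmlns="hexchat:connect"> child to its text.
        key = tuple(connect[name] for name in self.FIELDS)
        if key in self.seen:
            return False  # another resource already picked this request up
        self.seen.add(key)
        return True
```

As written, `seen` only grows. A real implementation would drop a key once its connection closes, so that a later request reusing the same ports is not ignored.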

You can specify the number of times to login to each JID with the --num_logins option. The default is 1.

comment:91 Changed 6 years ago by feynman

I just finished commenting/documenting the code and rewriting the protocol-spec.

The lines in the protocol-spec and README file should all be wrapped at 80 characters.

Next I am going to start working on integration testing.

My goals are:

a) Ensure every byte that is sent is either received in order or dropped (if the recipient disconnected as data was being sent).

b) Find the optimum bandwidth to allocate to each socket.

The latter can be done by sending data at different rates from one hexchat instance and measuring the rate at which it is received by another hexchat instance. I can then plot the data and see whether there is a well-defined maximum on the receiving end.
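A sketch of the receiving-end calculation for goal (b), assuming the receiver logs a `(timestamp_seconds, nbytes)` pair for every chunk it reads. `received_rate` and `best_send_rate` are hypothetical helpers. Picking the single best send rate is a simplification of "see if there is a well defined maximum" on a plot.

```python
def received_rate(samples):
    """Bytes per second seen by the receiving hexchat instance.

    `samples` is a list of (timestamp_seconds, nbytes) pairs in arrival order.
    The first chunk only marks the start of the window and is not counted.
    """
    if len(samples) < 2:
        return 0.0
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        return 0.0
    return sum(n for _, n in samples[1:]) / elapsed


def best_send_rate(results):
    """Return the send rate whose measured receive rate was highest.

    `results` maps send rate -> received rate, in the same units.
    """
    return max(results, key=results.get)
```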

comment:92 Changed 6 years ago by feynman

I completed an integration test confirming that every byte sent was either received or dropped because of a time delay during the disconnect process (see the comment above).
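One way to express goal (a) as a single check, assuming the only permitted loss is a tail cut off when the recipient disconnects (no reordering, duplication, or holes in the middle), is shown below. `delivered_in_order_or_dropped` is a hypothetical helper, not the actual test from the repository.

```python
def delivered_in_order_or_dropped(sent: bytes, received: bytes) -> bool:
    """True if `received` is exactly the first len(received) bytes of `sent`.

    Any bytes of `sent` beyond that prefix count as dropped at disconnect.
    """
    return sent[:len(received)] == received
```

Checked per socket, this rejects reordered, corrupted, duplicated, or extra bytes, and it still accepts a stream that stopped early.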

I also added and documented a new flag for running the program: --take_measurements. It periodically measures and records the rate at which sockets receive data to be sent over the chat server. This feature has been very useful for determining the optimum bandwidth to allocate to each socket.

I was able to confirm that you need about 64kb/s per connection to stream YouTube videos. When you watch a YouTube video, your browser opens a bunch of connections. The exact number changes a lot (most are very short-lived), but it seems to peak at around 60 and stabilize at around 10. However, YouTube only seems to use two or three of these connections heavily when streaming a video.
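Here is a back-of-the-envelope calculation that uses only the numbers above. It assumes a flat allocation, where every open socket gets the full per-connection rate, and keeps the units as reported (kb/s). It shows how much bandwidth that allocation would commit compared with the two or three connections that actually carry the video. `committed` is a hypothetical helper.

```python
PER_CONNECTION = 64      # kb/s per connection, as reported above
HEAVY = (2, 3)           # connections YouTube uses heavily
STEADY, PEAK = 10, 60    # open connections once settled / at peak


def committed(n_sockets, per_socket=PER_CONNECTION):
    """Bandwidth committed if every open socket gets the full allocation."""
    return n_sockets * per_socket


for label, n in [("heavy (low)", HEAVY[0]), ("heavy (high)", HEAVY[1]),
                 ("steady", STEADY), ("peak", PEAK)]:
    print(f"{label:>12}: {committed(n):5d} kb/s")
```

This prints 128 and 192 kb/s for the heavily used connections, 640 kb/s at the steady state, and 3840 kb/s at the peak.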

Keep in mind that this data was gathered through hexchat, and not an ordinary connection. So this may or may not apply to streaming videos directly.

Also, please let me know if my documentation is unsatisfactory. I am still a little unclear as to how much I should or should not say in the comments, so some feedback would be helpful.

comment:93 Changed 6 years ago by feynman

I made a slight change in how sockets receive data. They now calculate the maximum number of bytes they can receive before they actually read any data. This seems to have increased efficiency: I can now watch the same YouTube videos with only 32kb/s allocated to each socket.
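The comment doesn't say how the maximum is computed. A token bucket is one common way to do it. It is sketched here as an assumption, not as hexchat's actual code. Each socket earns `rate` bytes per second, up to a `burst` cap, and asks for its allowance before each read. `ReadBudget` and its parameters are hypothetical. The clock is injectable so the logic can be tested without sleeping.

```python
import time


class ReadBudget:
    """Token bucket capping how many bytes one socket may read (sketch)."""

    def __init__(self, rate_bytes_per_s, burst_bytes, clock=time.monotonic):
        self.rate = rate_bytes_per_s
        self.burst = burst_bytes
        self.clock = clock
        self.tokens = burst_bytes  # start with a full bucket
        self.last = clock()

    def allowance(self):
        # Refill for the time elapsed since the last call, capped at `burst`.
        now = self.clock()
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        return int(self.tokens)

    def consume(self, n):
        # Spend tokens for bytes actually read (n <= the last allowance).
        self.tokens -= n
```

A reader loop would call `n = budget.allowance()` and only call `sock.recv(n)` when `n > 0`, followed by `budget.consume(len(data))`. `recv(0)` returns `b""`, which is easy to mistake for the peer closing the connection.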

comment:94 Changed 21 months ago by teor

Severity: Normal

Set all open tickets without a severity to "Normal"
