Opened 8 years ago

Closed 7 years ago

#5280 closed defect (invalid)

obfsproxy transport functions do not understand TCP

Reported by: asn
Owned by: asn
Priority: Medium
Milestone:
Component: Archived/Obfsproxy
Version:
Severity:
Keywords:
Cc: hellais
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor:

Description

Imagine you are trying to pass a large amount of data (more than 4096
bytes) over the client-side of a transport. In this case, libevent
calls downstream_read_cb() every 4096 bytes of data (is it
EVBUFFER_MAX_READ?), which in turn calls proto_send(). This means that
if a transport has headers, they are appended every 4096 bytes. Then,
during proto_send(), the obfuscated data are appended to the 'dest'
evbuffer.

If the file was bigger than 4096 bytes, proto_send() will be called
again (for the rest of the data), and a header will be added
again. Then the new 4096 bytes of obfuscated data will also be added
to the 'dest' evbuffer. Then, on the next iteration of the libevent
loop, libevent will send all the buffered data to the wire. This means
that on the wire we will see a big fragmented TCP packet with multiple
transport headers in it.

How can we guarantee to our transports that their headers will be
appended only to new TCP packets? A fragmented TCP packet with two
bunches of HTTP headers in it doesn't happen in real life.

Should we implement a rule such as "don't add anything to 'dest' if
'dest' already has data in it"? Should we make it the default behavior
of proto_send()? How much will it cripple performance? Does this make
sense?

As a secondary question, does EVBUFFER_MAX_READ introduce a
fingerprint into our traffic when we send large amounts of data?

(Arturo encountered this issue when implementing his HTTP transport,
so I'm CCing him.)

Change History (4)

comment:1 Changed 8 years ago by nickm

You're a little mistaken about the rules for when libevent calls a read callback.

The only thing guaranteed about the read callback is that you will get one when a read adds data to a bufferevent's input buffer and that input buffer then holds at least as many bytes as the read low watermark for that bufferevent.

The 4096-bytes-at-a-time thing is an implementation detail that *will* change in future versions; you can't count on it at all. It's also not necessarily true: if read high watermarks are enabled, or rate limiting is enabled, Libevent may read less at a time. Also, if there are fewer than 4096 bytes to read, Libevent will read them, and then still call the read callback if the conditions in the previous paragraph are met.

Similarly, the amount of data that will get written at a time, or that will get written with a single syscall, is not specified by the Libevent API. Generally, it's "as much as possible," except if rate-limiting is enabled.

Bufferevents don't have access to the actual boundaries of TCP packets, since the Berkeley Sockets API doesn't provide that. (Unless you're building your own packets with raw sockets or something, but that's not so standard.) If I call send() twice in succession, that might make 3 TCP packets, or it might make 1, or it might make 0 if the kernel is waiting for the other side to ack data that's already been sent. So at best, all you can hope to do is get some control over what syscalls are called, not over what packets are generated directly. If you can rephrase what you need in terms of syscalls, there might be a prayer of getting it, but if you actually need direct control over packets, bufferevents can't get you that.
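
A minimal sketch of one consequence of the point above: since packet boundaries are out of reach, a transport that needs record boundaries has to encode them in the byte stream itself. The example assumes a hypothetical framing (a 4-byte big-endian length followed by the payload); frame_encode and FRAME_HDR_LEN are illustrative names, not part of obfsproxy or Libevent.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical framing: 4-byte big-endian payload length, then the payload. */
#define FRAME_HDR_LEN 4

/* Writes one framed record into 'out' (capacity 'cap').
 * Returns the number of bytes written, or 0 if 'out' is too small or the
 * payload does not fit in a 32-bit length field. */
size_t frame_encode(const uint8_t *payload, size_t len,
                    uint8_t *out, size_t cap)
{
    assert(out != NULL);
    assert(payload != NULL || len == 0);

    if (len > 0xFFFFFFFFu || cap < FRAME_HDR_LEN || cap - FRAME_HDR_LEN < len)
        return 0;

    /* Length header, most significant byte first. */
    out[0] = (uint8_t)((len >> 24) & 0xFF);
    out[1] = (uint8_t)((len >> 16) & 0xFF);
    out[2] = (uint8_t)((len >> 8) & 0xFF);
    out[3] = (uint8_t)(len & 0xFF);

    if (len > 0)
        memcpy(out + FRAME_HDR_LEN, payload, len);
    return FRAME_HDR_LEN + len;
}
```

With a frame like this, it does not matter whether two records leave in one send() call or end up spread over three TCP segments; the receiver recovers the boundaries from the length fields alone.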

comment:2 Changed 8 years ago by hellais

I am not sure how to rephrase this in terms of syscalls, but I can tell you how I ran into this problem.

Basically, when writing the HTTP transport I need to slap a header on top of the packets I send, which are managed inside an evbuffer. I add headers both at the beginning of the packet and at the end of it. The problem is that when a packet is bigger than 4096 bytes it gets fragmented, and I need to be able to slap the begin header on the first fragment and the end header on the last one.
If I don't do this, there are problems when stripping the headers off the packets and reading the content.

Is there a way to understand if the current chunk is the last one?

comment:3 Changed 8 years ago by nickm

By "packet" here, do you literally mean a TCP packet, or what? 4096 bytes is a pretty high MTU for stuff over the real internet.

What do you mean by "if the current chunk is the last one"? I guess that you don't mean "the last one because the connection has just closed". What does it mean for a piece of data to be the "last" within a TCP stream if it isn't at the end?

As I understand it, you are saying that data is coming in like

<AAAAAAA><AAAAAA><AAAAAAA>

and you want to stick a header and a footer around every <AAAAAAA>.

Is each of these <AAAAAA>s literally a TCP packet, or what? There is no reliable way to discover TCP packet boundaries through the Sockets API. You can *typically* infer that when you have read everything the kernel has to tell you, you have read some integer number of TCP packets, but you have no way to know whether you read one packet or more than one.
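
To illustrate why read-chunk boundaries need not matter, here is a rough sketch of a receiver that reassembles records from arbitrarily sized chunks. It assumes a hypothetical framing in which every record is preceded by a 4-byte big-endian length; struct frame_decoder, frame_decoder_feed, HDR_LEN and MAX_RECORD are made-up names for this example, not obfsproxy or Libevent APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HDR_LEN 4          /* assumed: 4-byte big-endian length prefix */
#define MAX_RECORD 8192    /* assumed upper bound on one record's payload */

struct frame_decoder {
    uint8_t hdr[HDR_LEN];
    size_t hdr_have;            /* header bytes collected so far */
    uint8_t body[MAX_RECORD];
    size_t body_len;            /* payload length announced by the header */
    size_t body_have;           /* payload bytes collected so far */
};

void frame_decoder_init(struct frame_decoder *d)
{
    d->hdr_have = 0;
    d->body_len = 0;
    d->body_have = 0;
}

/* Consumes bytes from 'chunk' until one record completes or the chunk ends.
 * Returns 1 if a complete record is in d->body (d->body_len bytes),
 * 0 if more input is needed, -1 if the announced length exceeds MAX_RECORD.
 * '*used' receives the number of bytes consumed from 'chunk'. */
int frame_decoder_feed(struct frame_decoder *d, const uint8_t *chunk,
                       size_t n, size_t *used)
{
    size_t i = 0;
    assert(d != NULL && used != NULL);
    assert(chunk != NULL || n == 0);

    /* The record returned by the previous call has been handed over. */
    if (d->hdr_have == HDR_LEN && d->body_have == d->body_len)
        frame_decoder_init(d);

    while (d->hdr_have < HDR_LEN && i < n)
        d->hdr[d->hdr_have++] = chunk[i++];
    *used = i;
    if (d->hdr_have < HDR_LEN)
        return 0;               /* header itself was split across reads */

    d->body_len = ((size_t)d->hdr[0] << 24) | ((size_t)d->hdr[1] << 16) |
                  ((size_t)d->hdr[2] << 8) | (size_t)d->hdr[3];
    if (d->body_len > MAX_RECORD) {
        frame_decoder_init(d);  /* stay memory-safe; caller should give up */
        return -1;
    }

    size_t take = n - i;
    if (take > d->body_len - d->body_have)
        take = d->body_len - d->body_have;
    if (take > 0)
        memcpy(d->body + d->body_have, chunk + i, take);
    d->body_have += take;
    *used = i + take;
    return d->body_have == d->body_len;
}
```

A caller would invoke frame_decoder_feed in a loop from its read callback, advancing by *used each time, until the chunk is exhausted. How the bytes were split across reads then has no effect on where the decoder sees record boundaries.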

comment:4 Changed 7 years ago by asn

Resolution: invalid
Status: new → closed

I think these days we understand TCP/IP a bit better. I guess I can close this ticket :)
