Opened 6 years ago

Last modified 4 years ago

#12857 new enhancement

Use streaming downloads

Reported by: dcf
Owned by: dcf
Priority: High
Milestone:
Component: Circumvention/meek
Version:
Severity: Normal
Keywords:
Cc: jab
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor:


Here's how meek works now. The client reads a small chunk of data from tor (up to 64 KB) and sends it in the body of a POST request. The server receives it, reads a small chunk of data from its own tor (up to 64 KB), and sends that back in the response. To keep the chunks in order, the client doesn't make another request until it has received the response to the previous one.
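To make the current round trip concrete, here is a minimal sketch of the server side of one request/response exchange. It is not the meek-server code: pollHandler, maxPayload, and the two in-memory buffers standing in for the tor connection are all assumptions made for illustration, and the sketch ignores sessions and locking.

```go
package main

import (
	"bytes"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// maxPayload mirrors the 64 KB chunk limit described in the ticket.
const maxPayload = 64 * 1024

// pollHandler is a hypothetical stand-in for one round trip.
// upstream collects bytes sent by the client; downstream holds
// bytes waiting to be sent back to the client.
type pollHandler struct {
	upstream   *bytes.Buffer
	downstream *bytes.Buffer
}

func (h *pollHandler) ServeHTTP(w http.ResponseWriter, req *http.Request) {
	// Accept at most one chunk of upstream data from the POST body.
	if _, err := io.Copy(h.upstream, io.LimitReader(req.Body, maxPayload)); err != nil {
		http.Error(w, "bad request body", http.StatusBadRequest)
		return
	}
	// Reply with at most one chunk of pending downstream data, then finish.
	chunk := h.downstream.Next(maxPayload)
	w.Header().Set("Content-Type", "application/octet-stream")
	w.Write(chunk)
}

// demoPoll runs a single round trip against an in-memory recorder and
// reports what went upstream, how many bytes came back, and how many
// bytes are still waiting for the next poll.
func demoPoll() (sent string, received int, left int) {
	h := &pollHandler{
		upstream:   new(bytes.Buffer),
		downstream: bytes.NewBuffer(make([]byte, maxPayload+100)),
	}
	req := httptest.NewRequest("POST", "/", strings.NewReader("hello"))
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return h.upstream.String(), rec.Body.Len(), h.downstream.Len()
}
```

With 64 KB + 100 bytes pending, one poll returns exactly 64 KB and the remaining 100 bytes have to wait for the client's next request, which is the polling cost this ticket is about.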

Here's what would be better. The client sends a small chunk of data. The server sends a response header and any data it has pending, and leaves the response channel open. The server can use the chunked transfer-encoding to send a body of indeterminate length. The server sends downstream data over the existing streaming response channel until it receives another request. At that point, it closes the response channel and opens up a new one (which will be the response to the just-received request).
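A rough sketch of the proposed server behavior, using Go's net/http. The type streamHandler and its two channels are hypothetical; in particular, how nextRequest gets signaled is left open here, since it depends on session bookkeeping the ticket does not specify. Because the handler sets no Content-Length and flushes as it goes, Go's HTTP/1.1 server sends the body with chunked transfer-encoding.

```go
package main

import (
	"net/http"
	"net/http/httptest"
)

// streamHandler is a hypothetical sketch of a streaming response.
// downstream carries data destined for the client; nextRequest is
// closed (or sent on) when another request for the same session arrives.
type streamHandler struct {
	downstream  <-chan []byte
	nextRequest <-chan struct{}
}

func (h *streamHandler) ServeHTTP(w http.ResponseWriter, req *http.Request) {
	flusher, ok := w.(http.Flusher)
	if !ok {
		http.Error(w, "streaming unsupported", http.StatusInternalServerError)
		return
	}
	w.Header().Set("Content-Type", "application/octet-stream")
	w.WriteHeader(http.StatusOK)
	flusher.Flush() // send the response header right away

	for {
		select {
		case chunk, ok := <-h.downstream:
			if !ok {
				return // no more downstream data
			}
			if _, err := w.Write(chunk); err != nil {
				return
			}
			flusher.Flush() // push each chunk out without waiting
		case <-h.nextRequest:
			return // end this response; the new request gets the next one
		case <-req.Context().Done():
			return // client went away
		}
	}
}

// demoStream feeds two chunks through the handler and closes the stream.
func demoStream() (body string, flushed bool) {
	downstream := make(chan []byte, 2)
	downstream <- []byte("a")
	downstream <- []byte("b")
	close(downstream)
	h := &streamHandler{downstream: downstream, nextRequest: make(chan struct{})}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest("POST", "/", nil))
	return rec.Body.String(), rec.Flushed
}

// demoPreempt shows the response ending as soon as a new request is signaled.
func demoPreempt() string {
	next := make(chan struct{})
	close(next)
	h := &streamHandler{downstream: make(chan []byte), nextRequest: next}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest("POST", "/", nil))
	return rec.Body.String()
}
```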

The advantages are mainly about performance: there's no client polling; there's less HTTP header overhead (see #12778); and the client can send data (send a request) whenever it feels like it, without waiting for the server's most recent response, if the underlying HTTP library supports pipelining.

Psiphon has already implemented something like this. There's a bit of difficulty in that the golang HTTP server doesn't notify you of new requests while you're still sending a response on the same keep-alive channel. Their workaround (and I think it is a good one) is to put a timeout on the download streaming, so that a long response won't block upstream data forever.
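The timeout idea could look something like the following. This is not Psiphon's code, only a sketch of the general technique; streamUntil and the durations in the demos are made up for illustration.

```go
package main

import (
	"bytes"
	"io"
	"time"
)

// streamUntil copies chunks from downstream to w until downstream is
// closed or maxDuration elapses. It returns the number of bytes written
// and whether the deadline cut the stream short.
func streamUntil(w io.Writer, downstream <-chan []byte, maxDuration time.Duration) (int, bool, error) {
	timer := time.NewTimer(maxDuration)
	defer timer.Stop()
	total := 0
	for {
		select {
		case chunk, ok := <-downstream:
			if !ok {
				return total, false, nil
			}
			n, err := w.Write(chunk)
			total += n
			if err != nil {
				return total, false, err
			}
		case <-timer.C:
			// Give up the response so the connection can carry
			// the client's next request.
			return total, true, nil
		}
	}
}

// demoTimeout streams from a source that never delivers anything.
func demoTimeout() (int, bool) {
	idle := make(chan []byte)
	var buf bytes.Buffer
	n, timedOut, _ := streamUntil(&buf, idle, 10*time.Millisecond)
	return n, timedOut
}

// demoDrain streams from a source that finishes well before the deadline.
func demoDrain() (string, bool) {
	ch := make(chan []byte, 2)
	ch <- []byte("ab")
	ch <- []byte("cd")
	close(ch)
	var buf bytes.Buffer
	_, timedOut, _ := streamUntil(&buf, ch, time.Minute)
	return buf.String(), timedOut
}
```

The deadline bounds how long a pending upstream request can be stuck behind a long response. Picking its value is a tradeoff between upstream latency and per-response header overhead.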

We'll need to overhaul the web browser extensions, because they currently assume requests and responses with sizes known in advance.

This approach won't work with Google App Engine, because App Engine doesn't support streaming downloads. But it should work with CloudFront. See #12428 for how to improve performance with App Engine.

Change History (6)

comment:1 Changed 6 years ago by jab

This approach won't work with Google App Engine, because App Engine doesn't support streaming downloads.


3.5 years after requesting this feature, apparently they're now taking it into consideration. Quoting rejea...@… 10 minutes ago:

I have forwarded this request to the engineering team. We will update this issue with any progress updates and a resolution.

For whatever it's worth!

comment:3 Changed 5 years ago by jab

Cc: jab added

comment:4 Changed 5 years ago by dcf

I've been working on this feature in a stream branch (currently at d3c344ea50).

At this point, the branch only does two things:

  1. Allows meek-client to read response bodies of any length (previously it was limited to 64 KB).
  2. Changes the protocol between meek-client and the helper to be able to express bodies of any length (previously the body was a base64-encoded part of the JSON blob, so it could not be too large and could not be streamed piece by piece).

By themselves, these changes have no visible effect (though they are safe to merge and compatible with the current network). They will also need a change to meek-server that allows it to send response bodies of any length (currently they are limited to 64 KB). But this will need a protocol change, as currently deployed meek-clients only look at the first 64 KB of a response. What I'm thinking of is sending a new version header field (like "X-Meek-Version: 1" or "X-V: 1"); when meek-server sees that, it knows it can send bodies without a length limit. If meek-server does not receive the header, then it will limit itself to 64 KB as before.
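The server-side check might be as small as this. It is only a sketch: the header name is one of the two candidates floated above, and responseLimit, legacyLimit, and the use of -1 to mean "no limit" are assumptions for illustration.

```go
package main

import "net/http"

const (
	// legacyLimit is the 64 KB cap that deployed clients expect.
	legacyLimit = 64 * 1024
	// versionHeader is one of the candidate names from the comment above.
	versionHeader = "X-Meek-Version"
)

// responseLimit returns the maximum response body size for a request
// with the given header, or -1 if the client signaled that it can
// handle bodies of any length.
func responseLimit(h http.Header) int64 {
	if h.Get(versionHeader) == "1" {
		return -1
	}
	return legacyLimit
}

// demoLimits compares an old client (no header) with a new one.
func demoLimits() (legacy int64, streaming int64) {
	oldClient := http.Header{}
	newClient := http.Header{}
	newClient.Set(versionHeader, "1")
	return responseLimit(oldClient), responseLimit(newClient)
}
```

Old clients never send the header, so they keep getting responses capped at 64 KB, which is what keeps the change compatible with the deployed network.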

comment:5 Changed 4 years ago by dcf

Severity: Normal

New link for Go not supporting notification of new requests on pipelined connections while it is still handling the previous request (the link in the description is broken):

		// HTTP cannot have multiple simultaneous active requests.[*]
		// Until the server replies to this request, it can't read another,
		// so we might as well run the handler in this goroutine.
		// [*] Not strictly true: HTTP pipelining.  We could let them all process
		// in parallel even if their responses need to be serialized.

Still the case in 1.6.2:

comment:6 in reply to:  5 Changed 4 years ago by elimisteve
