Opened 3 years ago

Closed 2 years ago

Last modified 2 years ago

#12778 closed enhancement (fixed)

Put meek HTTP headers on a diet

Reported by: dcf
Owned by: dcf
Priority: Medium
Component: Obfuscation/meek

Description

Let's shorten the headers added by meek-client and meek-server where we can, to reduce the overhead of each request. I did some calculations recently and the overhead was greater than I expected, about 85% when the client sends a single Tor cell.

Here's a header sent by the Firefox meek-http-helper in the 3.6.2-meek-1 bundles, which use meek 0.7:

POST / HTTP/1.1\r\n
X-Session-Id: RAIzBBZBR5FFKxii7TBOldDAXUsBYI5+GhSKQPaQO6s=\r\n
User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:24.0) Gecko/20100101 Firefox/24.0\r\n
Host: meek-reflect.appspot.com\r\n
Content-Type: application/octet-stream\r\n
Content-Length: 543\r\n
Connection: keep-alive\r\n
Accept-Language: en-US,en;q=0.5\r\n
Accept-Encoding: gzip, deflate\r\n
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\r\n
\r\n

It's 413 bytes (which can vary a bit depending on the Host and Content-Length headers). When it gets wrapped in its own TLS Application Data record, it adds about 50 bytes (the ciphersuite I get with Google is one that has to pad up to a block length).

(BTW I got the header by disabling the headlessness of the browser extension, opening the browser console with Ctrl+Shift+J, and clicking on a request.)

A Tor cell is 514 bytes, and inside a TLS Application Data record it is 543 bytes. Therefore the overhead for sending one cell is (413+≈50)/543 ≈ 85%. Of course, the overhead is less when several cells are sent at once: ≈43% for two, and ≈28% for three.

Stuff set by meek-client that we could reduce:

  • X-Session-Id: is 32 bytes (44 base64-encoded); could be 16 (24).
  • Content-Type: is unnecessary, I think; remove it.

Stuff added by Firefox that we could reduce:

  • User-Agent: could probably be removed.
  • Accept-Language: could probably be removed.
  • Accept-Encoding: could probably be removed.
  • Accept: could probably be removed.

Stuff we should leave alone:

  • Host
  • Content-Length
  • Connection

With meek-client changes we could save up to 60 bytes, and with meek-http-helper changes we could potentially save up to 217 bytes, leading to a header as small as 136 bytes, or an overhead of (136+≈50)/543 ≈ 34% when sending one Tor cell; ≈17% for two; and ≈11% for three.

We should also check what the server's response headers look like.

(NB: I don't think HTTP header overhead is the main cause of perceived slowness; I'll bet serialization of requests has a bigger effect.)

(What about SPDY? Does it have smaller headers? Yes, good thought. It is actually possible to use SPDY with the Chrome extension. But Chromium doesn't allow you to override the Host header when you use SPDY (Chromium #364319), so it doesn't work.)

Change History (7)

comment:1 Changed 3 years ago by dcf

b1f6a7ec removed Content-Type from meek-client.

I tried removing Content-Type from the Firefox extension (passing an empty string for the aContentType argument to nsIUploadChannel.setUploadStream), but it didn't work. Apparently setting an empty string also prevents the Content-Length header from being added, so the POST just hangs forever, at least on www.google.com.

I didn't try removing it from the Chrome extension.

Last edited 2 years ago by dcf

comment:2 follow-up: Changed 3 years ago by dcf

Here's what server headers look like.

Google App Engine, 174 bytes.

HTTP/1.1 200 OK\r\n
Server: Google Frontend\r\n
Date: Mon, 04 Aug 2014 07:08:40 GMT\r\n
Content-Type: application/octet-stream\r\n
Content-Length: 65536\r\n
Alternate-Protocol: 443:quic\r\n
\r\n

Amazon CloudFront, 321 bytes.

HTTP/1.1 200 OK\r\n
X-Cache: Miss from cloudfront\r\n
X-Amz-Cf-Id: Wq5WzKia5-NimKbhVoTr0SJ9D7i4RRtgWxqcqPniq-GqqNEhOjyyXA==\r\n
Via: 1.1 d368aa75357ba38c7d574850f7952d23.cloudfront.net (CloudFront)\r\n
Transfer-Encoding: chunked\r\n
Date: Mon, 04 Aug 2014 07:10:35 GMT\r\n
Content-Type: application/octet-stream\r\n
Connection: keep-alive\r\n
\r\n

It doesn't look like there's much in either case that we have control over. Both have Content-Type: application/octet-stream, even though the server does not set that header explicitly. It's probably being done implicitly by http.DetectContentType.

comment:4 in reply to: ↑ 2 Changed 3 years ago by dcf

It turns out that even if you disable the Content-Type header at meek-server (with w.Header()["Content-Type"] = nil), Google and Amazon will just add it back anyway. And when the response body is empty, they will set it to "text/plain; charset=utf-8", which is even longer than "application/octet-stream". So in fa5fbb807a5d81cd234caf402f262d4a5de0533e I just set the Content-Type header unconditionally.

Last edited 2 years ago by dcf

comment:5 Changed 2 years ago by dcf

  • Resolution set to fixed
  • Status changed from new to closed

I shortened the session ID (to 8 bytes, 11 base64-encoded) in 4812b9a8.

I removed extraneous headers from the Firefox helper in 0e6ced86.

In order to shorten meek-client's session ID, I had to lower meek-server's requirement for session ID length in c8f2dd1e. The requirement is a guard against client misimplementations (like someone forgetting to send the X-Session-Id header), but it was stricter than it needed to be. The operators of the meek-amazon, meek-azure, and meek-google backends have already upgraded. If someone uses a client with short IDs with an unupgraded server that requires longer IDs, the connection will fail with a 400 Bad Request error.

The client header, with the Firefox helper, now looks like this, about 162 bytes:

POST / HTTP/1.1\r\n
X-Session-Id: NTfizNtP+EU\r\n
Host: meek-reflect.appspot.com\r\n
Content-Type: application/octet-stream\r\n
Content-Length: 543\r\n
Connection: keep-alive\r\n
\r\n

comment:6 Changed 2 years ago by dcf

For posterity, here is the Microsoft Azure server header, 167 bytes:

HTTP/1.1 200 OK\r\n
X-Powered-By: ASP.NET\r\n
Server: Microsoft-IIS/8.0\r\n
Date: Fri, 26 Dec 2014 08:40:49 GMT\r\n
Content-Type: application/octet-stream\r\n
Content-Length: 543\r\n
\r\n

comment:7 Changed 2 years ago by dcf

A back-of-the-envelope calculation of potential savings: In November 2014 we had 167 M requests and 1269 GB on meek-amazon. If we saved 413 − 162 = 251 bytes on each request, that's 167 M × 251 B = 39 GB, or 3% of total bandwidth, or somewhere between $4.50 and $7.50 in cost, depending on the region. These numbers are approximate because TLS pads application data records so we don't save exactly 251 bytes on each request, and additionally the Host and Content-Length headers can change the length of the header.
