Opened 23 months ago

Last modified 3 weeks ago

#26923 new project

Intent to create Pluggable Transport: HTTPS proxy

Reported by: sf Owned by:
Priority: Medium Milestone:
Component: Circumvention/Pluggable transport Version:
Severity: Normal Keywords: httpsproxy
Cc: dcf, hiro, dmr, phw, cohosh Actual Points:
Parent ID: Points: 30
Reviewer: Sponsor: Sponsor28-can



The HTTP CONNECT method is one of the standard ways to proxy Internet traffic, and it is supported in both HTTP/1.1 and HTTP/2. HTTPS traffic is ubiquitous on the web, and pluggable transports could benefit from this fact: fully blocking HTTPS would cause very high collateral damage. It also adds diversity to PTs’ shapes, because most current PTs do not resemble HTTPS.

Using HTTPS proxies also helps against active probing: a proxy can be an actual web server that serves content, unlike circumvention servers that show no apparent collateral damage and do not respond at all when probed. To a prober without the correct credentials, an httpsproxy server looks like a real web server, because it is one.

Ways to use HTTPS proxies with Tor

Naive proxy

Given correct credentials, a user can ask any standard forward proxy on the web to connect to Tor. The client establishes a TLS connection to the web proxy and sends a request of the form

CONNECT <address> HTTP/1.1
Proxy-Authorization: Basic dXNlcjpwYXNz

where <address> is the address of an arbitrary vanilla Tor entry node. The web server establishes a TCP connection to this address and relays subsequent traffic to it.
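The client side of this naive-proxy handshake can be sketched in Go, the prototype's language. This is a minimal sketch, not the prototype's code: buildConnectRequest, dialViaProxy and bufferedConn are hypothetical names, and the proxy is assumed to answer a successful CONNECT with a 200 status.

```go
package main

import (
	"bufio"
	"crypto/tls"
	"encoding/base64"
	"fmt"
	"net"
	"net/http"
)

// buildConnectRequest formats an HTTP/1.1 CONNECT request for the Tor entry
// node at entryAddr ("host:port"), authenticated with Basic credentials.
func buildConnectRequest(entryAddr, user, pass string) string {
	cred := base64.StdEncoding.EncodeToString([]byte(user + ":" + pass))
	return fmt.Sprintf("CONNECT %s HTTP/1.1\r\nHost: %s\r\nProxy-Authorization: Basic %s\r\n\r\n",
		entryAddr, entryAddr, cred)
}

// bufferedConn keeps any bytes the bufio.Reader read past the proxy's
// response headers, so no tunnelled data is lost.
type bufferedConn struct {
	net.Conn
	r *bufio.Reader
}

func (c *bufferedConn) Read(p []byte) (int, error) { return c.r.Read(p) }

// dialViaProxy opens TLS to the web proxy, issues CONNECT, and returns a
// connection that carries the tunnelled bytes (the Tor link protocol).
func dialViaProxy(proxyAddr, serverName, entryAddr, user, pass string) (net.Conn, error) {
	conn, err := tls.Dial("tcp", proxyAddr, &tls.Config{ServerName: serverName})
	if err != nil {
		return nil, err
	}
	if _, err := conn.Write([]byte(buildConnectRequest(entryAddr, user, pass))); err != nil {
		conn.Close()
		return nil, err
	}
	br := bufio.NewReader(conn)
	// Only the status line and headers are parsed; the body is never read.
	resp, err := http.ReadResponse(br, &http.Request{Method: http.MethodConnect})
	if err != nil {
		conn.Close()
		return nil, err
	}
	if resp.StatusCode != http.StatusOK {
		conn.Close()
		return nil, fmt.Errorf("proxy refused CONNECT: %s", resp.Status)
	}
	return &bufferedConn{Conn: conn, r: br}, nil
}

func main() {
	// Dialing needs a real proxy, so only the request is printed here.
	fmt.Print(buildConnectRequest("192.0.2.1:443", "user", "pass"))
}
```

The credentials "user:pass" encode to the dXNlcjpwYXNz value shown above.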

Such an approach lets us use a diverse set of standard proxies: a web proxy is easy to set up and does not need to speak Tor. However, web proxy operators will likely want to whitelist Tor entry nodes to prevent abuse. They would therefore benefit from talking to some sort of https-proxy-authority, which would provide the entry node(s) to whitelist and let proxy operators tell the Tor Project that their servers can be used as proxies.
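On the proxy side, whitelisting could amount to checking each CONNECT target against the entry nodes obtained from such an authority. A minimal sketch, assuming entries are plain "ip:port" strings; allowlist, normalize and permits are hypothetical names, and rejecting hostnames (so DNS cannot be used to reach other destinations) is a design assumption.

```go
package main

import (
	"fmt"
	"net"
)

// allowlist holds normalized "ip:port" strings of Tor entry nodes that this
// web proxy is willing to CONNECT to.
type allowlist map[string]bool

// normalize parses "host:port", requires host to be an IP literal, and
// re-serializes it so that equivalent spellings compare equal.
func normalize(addr string) (string, error) {
	host, port, err := net.SplitHostPort(addr)
	if err != nil {
		return "", err
	}
	ip := net.ParseIP(host)
	if ip == nil {
		return "", fmt.Errorf("%q is not an IP literal", host)
	}
	return net.JoinHostPort(ip.String(), port), nil
}

func newAllowlist(entries []string) (allowlist, error) {
	a := allowlist{}
	for _, e := range entries {
		n, err := normalize(e)
		if err != nil {
			return nil, err
		}
		a[n] = true
	}
	return a, nil
}

// permits reports whether a CONNECT target may be relayed.
func (a allowlist) permits(target string) bool {
	n, err := normalize(target)
	return err == nil && a[n]
}

func main() {
	a, err := newAllowlist([]string{"192.0.2.1:443", "[2001:db8::1]:9001"})
	if err != nil {
		panic(err)
	}
	fmt.Println(a.permits("192.0.2.1:443"), a.permits("198.51.100.9:443"))
}
```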

While the lack of a server-side PT makes this approach easier to deploy, it also means we cannot collect metrics.

Full Bridge

A full bridge runs a Tor entry node, a pluggable transport, and an upstreaming frontend web server. The web server checks credentials and, instead of consuming CONNECT requests itself, forwards them upstream to the pluggable transport, stapling the client’s IP address to each request in a header. The PT parses the IP from the HTTP request header and passes it to tor's ExtORPort, thus enabling metrics collection.
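Passing the client address to tor could look like the following sketch. It assumes the command layout of tor's Extended ORPort specification (2-byte command, 2-byte body length, body) with USERADDR = 0x0001, TRANSPORT = 0x0002 and DONE = 0x0000; the SAFE_COOKIE authentication that must precede these commands is omitted, and clientInfoMessages is a hypothetical name.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"net"
)

// Extended ORPort command codes (transport to bridge direction).
const (
	extORCmdDone      uint16 = 0x0000
	extORCmdUserAddr  uint16 = 0x0001
	extORCmdTransport uint16 = 0x0002
)

// extORCommand encodes one command: 2-byte type, 2-byte body length, body.
func extORCommand(cmd uint16, body []byte) []byte {
	b := make([]byte, 4+len(body))
	binary.BigEndian.PutUint16(b[0:2], cmd)
	binary.BigEndian.PutUint16(b[2:4], uint16(len(body)))
	copy(b[4:], body)
	return b
}

// clientInfoMessages builds the USERADDR, TRANSPORT and DONE commands a
// server-side PT sends after authenticating. clientAddr is the "ip:port"
// the frontend web server stapled to the request.
func clientInfoMessages(clientAddr, transport string) ([]byte, error) {
	host, port, err := net.SplitHostPort(clientAddr)
	if err != nil {
		return nil, err
	}
	if net.ParseIP(host) == nil {
		return nil, fmt.Errorf("client address %q is not an IP", host)
	}
	addr := net.JoinHostPort(host, port) // IPv6 becomes "[addr]:port"
	var out []byte
	out = append(out, extORCommand(extORCmdUserAddr, []byte(addr))...)
	out = append(out, extORCommand(extORCmdTransport, []byte(transport))...)
	out = append(out, extORCommand(extORCmdDone, nil)...)
	return out, nil
}

func main() {
	msg, err := clientInfoMessages("192.0.2.7:5555", "httpsproxy")
	if err != nil {
		panic(err)
	}
	fmt.Printf("% x\n", msg)
}
```

The address string must come only from the trusted frontend; a client-supplied header would let anyone skew the metrics.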

Registering with BridgeDB

As it currently stands, bridges must have an ORPort open to be registered with BridgeDB (#7349).
This makes bridges easy to identify and block. However, we can still register bridge lines with BridgeDB if we add an extra hop through an intermediate proxy before the bridge: a censor would then only observe the address of the intermediate proxy.

Such a 2-hop setup is a natural property of the naive proxy approach described above. Bridge line example:

httpsproxy [vanilla entry addr] [entry fingerprint] url=

We can use the 2-hop approach with full bridges as well: the intermediate proxy would forward the HTTP request (preferably with the client IP in a “Forwarded: for=IP:port” header). In this case, the intermediate proxy simply forwards all requests with correct credentials to the chosen full bridge(s); it is essentially a reverse proxy, a widely supported technology.
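On the full bridge, the client address then has to be recovered from the Forwarded header (RFC 7239). A minimal sketch with a hypothetical forwardedFor helper: it trusts only the last element (the one appended by the nearest proxy, assumed to be the trusted intermediate hop) and, as a simplification, does not handle commas or semicolons inside quoted values.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"strings"
)

// forwardedFor extracts "ip:port" from the last element of a Forwarded
// header. Values containing ':' are quoted, e.g. for="[2001:db8::1]:4711".
func forwardedFor(header string) (string, error) {
	elems := strings.Split(header, ",")
	last := strings.TrimSpace(elems[len(elems)-1])
	for _, pair := range strings.Split(last, ";") {
		k, v, ok := strings.Cut(strings.TrimSpace(pair), "=")
		if !ok || !strings.EqualFold(k, "for") {
			continue
		}
		v = strings.Trim(v, `"`)
		host, port, err := net.SplitHostPort(v)
		if err != nil {
			return "", err
		}
		if net.ParseIP(host) == nil {
			return "", fmt.Errorf("for=%q is not an IP address", v)
		}
		return net.JoinHostPort(host, port), nil
	}
	return "", errors.New("no for= parameter in Forwarded header")
}

func main() {
	for _, h := range []string{
		`for="192.0.2.60:4711";proto=https`,
		`for="[2001:db8:cafe::17]:4711"`,
	} {
		addr, err := forwardedFor(h)
		fmt.Println(addr, err)
	}
}
```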

While the second hop adds overhead, it has the benefit of not requiring would-be proxy operators to run a full bridge. Configuring a proxy becomes substantially easier and, ideally, would amount to adding a few lines to a web server config file and registering with BridgeDB via some script. Not requiring operators to install, configure, and run both PT and Tor daemons may help us attract more volunteers to run entry servers.

However, it’s unclear which party would actually register the bridge line, and how. Perhaps a separate https-proxy-authority could do that (and provide web proxies with entry nodes to use).

Current prototype

The prototype works with standard HTTP/1.1 and HTTP/2 proxies, for both naive proxies and full bridges. If there's interest in seeing the current prototype, I would gladly share it; @dcf already created a ticket for creating the repo, #26793.


Both client and server are implemented in Golang, a relatively safe, cross-platform language.


Bandwidth overhead depends on how aggressive the padding is, but I would not expect goodput to drop below 80%, especially for high-bandwidth workloads, which should consist mostly of MTU-sized packets. A detailed evaluation will be done once padding is implemented.
Computational overhead amounts to one TLS handshake per flow plus the usual connection management.
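A back-of-the-envelope version of the goodput estimate, in Go. The sizes are assumptions, not measurements of the prototype: 514-byte Tor cells, a 9-byte HTTP/2 frame header, TLS 1.3 records with a 16-byte AEAD tag (22 bytes of record overhead), one DATA frame per record, and TCP/IP headers ignored.

```go
package main

import "fmt"

// Assumed sizes in bytes; none of these come from the prototype.
const (
	cellLen        = 514 // Tor cell with 4-byte circuit IDs (link protocol v4+)
	http2FrameHdr  = 9   // HTTP/2 frame header
	tls13RecordOvh = 22  // 5-byte record header + 1-byte inner content type + 16-byte AEAD tag
)

// goodput is payload bytes divided by TLS-record bytes for one record that
// holds one HTTP/2 DATA frame carrying `cells` Tor cells. pad > 0 sets the
// PADDED flag, which costs a 1-byte Pad Length field plus pad bytes.
func goodput(cells, pad int) float64 {
	payload := cells * cellLen
	frame := http2FrameHdr + payload
	if pad > 0 {
		frame += 1 + pad
	}
	return float64(payload) / float64(frame+tls13RecordOvh)
}

func main() {
	fmt.Printf("1 cell, no padding:    %.3f\n", goodput(1, 0))
	fmt.Printf("2 cells, no padding:   %.3f\n", goodput(2, 0))
	fmt.Printf("1 cell, 255 pad bytes: %.3f\n", goodput(1, 255))
}
```

Under these assumptions two unpadded cells per record give about 97% goodput, while one cell padded with 255 bytes drops to roughly 64%, so the padding policy largely determines whether the 80% expectation holds.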


Running a real web server helps; however, there are several potential ways to fingerprint this transport. Those include:

Probing web server with proxy requests without a secret

By default, web servers with forward proxying enabled respond to unauthenticated proxy requests with “407 Proxy Authentication Required”, whereas a web server without forward proxying enabled responds differently, indicating that it is not a proxy and will not accept CONNECT requests.
It would be beneficial to hide the fact that proxying is enabled (note that a 407 does not reveal that the proxy is a Tor proxy, only that forward proxying is enabled). This feature is already supported by the Caddy web server (see the "probe_resistance" option), which the current implementation uses.

TLS ClientHello fingerprinting

meek has been blocked based on its TLS ClientHello at least twice. A library called utls provides the ability to mimic arbitrary ClientHello messages. It uses real-world data from to learn what it should mimic based on collateral damage, and lets developers confirm that their mimicry is correct. In the event that any particular "fingerprint" is blocked or incorrectly mimicked, this transport would use multiple "fingerprints" and cycle through them until it finds an unblocked one.
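The cycling logic can be kept independent of utls. In the sketch below, fingerprint names and the handshake callback are hypothetical stand-ins; a real client would presumably map each name to a utls ClientHelloID (such as utls.HelloChrome_Auto) and perform the handshake with utls.UClient and UConn.Handshake.

```go
package main

import (
	"errors"
	"fmt"
)

// fingerprint names a ClientHello shape to mimic.
type fingerprint string

var errAllFailed = errors.New("every ClientHello fingerprint failed")

// firstWorking tries each fingerprint once, starting at index start (for
// example the last one that worked; must be non-negative), and returns the
// first one whose handshake succeeds together with its index.
func firstWorking(fps []fingerprint, start int, handshake func(fingerprint) error) (fingerprint, int, error) {
	for i := 0; i < len(fps); i++ {
		idx := (start + i) % len(fps)
		if err := handshake(fps[idx]); err == nil {
			return fps[idx], idx, nil
		}
	}
	return "", -1, errAllFailed
}

func main() {
	fps := []fingerprint{"chrome", "firefox", "ios"}
	// Simulated censor that blocks the Chrome-like ClientHello.
	blocked := map[fingerprint]bool{"chrome": true}
	fp, idx, err := firstWorking(fps, 0, func(f fingerprint) error {
		if blocked[f] {
			return errors.New("connection reset")
		}
		return nil
	})
	fmt.Println(fp, idx, err)
}
```

Starting from the last working index avoids re-triggering a known block on every dial.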

Other TLS fingerprinting

Evaluating other TLS handshake messages and TLS records, and how they may differ from the mimicked implementations, remains a TODO.

Traffic Size Patterns

The current prototype doesn't use padding yet, and its traces are extremely fingerprintable: it constantly generates packets of size CELL_SIZE * N plus a constant overhead.

We intend to address this problem shortly by splitting and padding HTTP/2 frames to resemble common web traffic.
There is no standard way to pad HTTP/1.1 that works with standard web proxies, but we can probably split the cells.
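Splitting and padding can be planned independently of how frames are written. The sketch below only computes frame sizes; target is a hypothetical stand-in for sampling frame lengths from real web traffic. Emitting such frames would need a layer that exposes padding; x/net/http2's low-level Framer has a WriteDataPadded method, but it is not reachable through the standard HTTP/2 client.

```go
package main

import "fmt"

const (
	maxFramePayload = 16384 // HTTP/2 default SETTINGS_MAX_FRAME_SIZE
	maxPad          = 255   // Pad Length is a single byte
)

// dataFrame describes one padded HTTP/2 DATA frame: Data bytes of Tor cells
// followed by Pad bytes of padding. With the PADDED flag set, the frame
// payload is 1 (Pad Length) + Data + Pad bytes.
type dataFrame struct{ Data, Pad int }

// planFrames splits n bytes of Tor cells into DATA frames whose payload
// lengths follow target() instead of multiples of the cell size. Padding is
// capped at 255 bytes, so a frame can fall short of its target.
func planFrames(n int, target func() int) []dataFrame {
	var frames []dataFrame
	for n > 0 {
		t := target()
		if t < 2 {
			t = 2 // room for Pad Length plus at least one data byte
		}
		if t > maxFramePayload {
			t = maxFramePayload
		}
		data := t - 1
		if data > n {
			data = n
		}
		pad := t - 1 - data
		if pad > maxPad {
			pad = maxPad
		}
		frames = append(frames, dataFrame{Data: data, Pad: pad})
		n -= data
	}
	return frames
}

func main() {
	// Two 514-byte cells with a fixed 700-byte target, for illustration only.
	for _, f := range planFrames(1028, func() int { return 700 }) {
		fmt.Printf("data=%d pad=%d frame payload=%d\n", f.Data, f.Pad, 1+f.Data+f.Pad)
	}
}
```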

Connection establishment traffic patterns

This is especially relevant to 2-hop approaches: the client might have to wait a long time for the first response while the proxy establishes its connection. Many proxies share this issue; it is solvable, but it needs attention.

Connection lifetime

Staying connected to the same server for prolonged periods (an HTTPS tunnel may work fine for hours, if not days) could be a distinguishing feature. The client should redial at least once an hour (TODO).

Child Tickets

#26793 (closed, owner: tor-gitadm): Create /pluggable-transports/httpsproxy repo (component: Internal Services/Service - git)

Attachments (1)

httpsproxy.tar.bz2 (14.6 KB) - added by sf 22 months ago.
httpsproxy PT source code


Change History (16)

comment:1 Changed 23 months ago by dcf

Keywords: pt httpsproxy added

Changed 22 months ago by sf

Attachment: httpsproxy.tar.bz2 added

httpsproxy PT source code

comment:2 Changed 22 months ago by dmr

Cc: dmr added

comment:3 Changed 17 months ago by arma

Sponsor: Sponsor19

comment:4 Changed 16 months ago by teor

Owner: asn deleted
Status: changed from new to assigned

asn does not need to own any obfuscation tickets any more. Default owners are trouble.

comment:5 Changed 16 months ago by cohosh

Status: changed from assigned to new

These tickets were assigned to asn; setting them as unassigned (new) again.

comment:6 Changed 13 months ago by phw

Cc: phw added
Points: 30

Thanks a lot for filing this, sf! Has there been any further development since you posted the code on here?

Here are some thoughts after I had a look at the code and design:

  • I just set up an httpsproxy bridge. It looks like the caddy dependency has an outdated dependency on but the current version seems to expose After updating this in my local caddy clone, I got httpsproxy to work. There's also a dependency on but this repository doesn't seem to exist anymore.
  • I quite like the idea of naive proxies because it solves the problem of "we need to expose a bridge's OR port" (#7349) and already-deployed web servers host "natural" content that won't look suspicious to censors. All of this, however, comes at a cost: Getting httpsproxy bridges registered in BridgeDB will be tricky. Bridges make it into BridgeDB by first sending their bridge descriptor to the bridge authority, which periodically copies all bridges it collected over to BridgeDB. The bridge authority doesn't know how to speak anything other than vanilla Tor to bridges. Naive proxies aren't bridges, and would thus have to make it into BridgeDB over a different channel. A snowflake-style broker may be a better solution here, so we don't have to deal with BridgeDB altogether, but we also have to have a bridge distribution strategy that makes it difficult to enumerate all httpsproxies. Snowflake doesn't have to deal with this because its proxies are short-lived.
  • For bridge operators who set up an HTTP server just for the sake of running httpsproxy, I worry that the hosted content will be easy to attribute to httpsproxy. I don't think we can expect bridge operators to be creative and figure out what non-fingerprintable content to host, so we should have an automated solution to this.
  • The re-dialing approach to fix the connection lifetime fingerprint sounds fine. We should also randomise the times at which clients re-dial. It also makes me wonder how much of the web would break if a censor were to reset TCP connections to web servers after, say, 30 seconds.
  • Your "probe a web server without credentials" attack worries me too. I would expect web servers that support CONNECT to be very rare, to a point where a censor is willing to block them all, but my intuition may be off here. Also, with its active probing infrastructure, the GFW seems well-equipped to start such an attack whenever it pleases.

comment:7 Changed 13 months ago by phw

Keywords: anti-censorship-roadmap-maybe added

comment:8 Changed 12 months ago by gaba

Keywords: anti-censorship-roadmap added

Adding these tickets to the backlog.

comment:9 Changed 12 months ago by gaba

Keywords: anti-censorship-roadmap-maybe removed

comment:10 Changed 12 months ago by phw

Sponsor: changed from Sponsor19 to Sponsor28-can

Moving from Sponsor 19 to Sponsor 28.

comment:11 Changed 11 months ago by gaba

Keywords: anti-censorship-roadmap-september added; pt anti-censorship-roadmap removed

comment:12 Changed 4 months ago by sf

Just noticed the response by phw.

Development of this transport was put on hold after I failed to convince the Golang developers to add padding support at a user-accessible layer in x/net/http2. That seemed like the easiest way to add padding, and since it didn't work out, we'll have to find another way.
I think the way to go is to use an additional inner layer that allows sending padding. The PT client and server could signal whether they support this layer in an HTTP header during the CONNECT request / 200 OK exchange, without losing interoperability with other clients and servers.

Seems like I'll have to update Caddy dependencies. Hopefully the "inithack" isn't necessary anymore, but we'll see.

I agree with your thoughts on BridgeDB, and that Snowflake's broker won't cut it, since it lacks sophisticated protection against enumeration. I'm not sure whether it would be better to roll a new BridgeLiteDB or to extend the regular one.

> For bridge operators who set up an HTTP server just for the sake of running httpsproxy, I worry that the hosted content will be easy to attribute to httpsproxy. I don't think we can expect bridge operators to be creative and figure out what non-fingerprintable content to host, so we should have an automated solution to this.

I share this concern, but this is something a censor would also want to automate, and I doubt they will have an easy time doing that. A censor would have to learn to crawl websites at scale and distinguish proxy-looking websites from legitimate ones. While exchanging gigabytes of traffic with a website that looks like the default Apache2 page is suspicious, I expect it will take censors some time and effort to implement blocking even for a low-effort default like that, while we work out better solutions.

  • A good idea might be to ask people to deploy websites with login forms, so that a censor can't be sure there isn't more content on a given website.
  • If and when encrypted SNI arrives, we can simply respond to probes with "wrong SNI"! An added benefit is that, to my knowledge, the usual way to signal a wrong SNI is a vague handshake_failure alert, so a censor may not even be sure whether the problem is the SNI or the ciphers/curves/etc. it offered. We could use that strategy before encrypted SNI arrives too; however, the censor can currently use its network access to see which SNI people use to connect to which IPs. This also defeats the "probe a web server without credentials" attack.
  • We can also reverse-proxy to a randomly chosen website if the client didn't demonstrate knowledge of the secret (somewhere in the ClientHello). This also defeats the "probe a web server without credentials" attack.
  • Automated website generation could be a useful direction, but I am not sure what it would look like. Simply fetching content from various randomly chosen places, with some randomized customization, may be practical.

comment:13 Changed 4 months ago by phw

Cc: cohosh added

comment:14 Changed 4 months ago by gaba

Keywords: anti-censorship-roadmap-2020Q1 added; anti-censorship-roadmap-september removed

comment:15 Changed 3 weeks ago by gaba

Keywords: anti-censorship-roadmap-2020Q1 removed

No more Q1 in 2020.
