Intent to create Pluggable Transport: HTTPS proxy

added component::circumvention/pluggable transport httpsproxy points::30 priority::medium reporter::sf severity::normal sponsor::28-can status::new type::project labels

Trac:
Keywords: N/A deleted, pt httpsproxy added

Trac:
Username: sf

httpsproxy.tar.bz2

httpsproxy PT source code

Trac:
Username: sf

Trac:
Cc: dcf, hiro to dcf, hiro, dmr

Trac:
Sponsor: N/A to Sponsor19

asn does not need to own any obfuscation tickets any more. Default owners are trouble.

Trac:
Owner: asn to N/A
Status: new to assigned

tickets were assigned to asn, setting them as unassigned (new) again.

Trac:
Status: assigned to new

Thanks a lot for filing this, sf! Has there been any further development since you posted the code on here?

Here are some thoughts after I had a look at the code and design:

I just set up an httpsproxy bridge. It looks like the caddy dependency has an outdated dependency on github.com/lucas-clemente/quic-go/h2quic but the current version seems to expose github.com/lucas-clemente/quic-go/http3. After updating this in my local caddy clone, I got httpsproxy to work. There's also a dependency on github.com/Jigsaw-Code/volunteer/server/inithack but this repository doesn't seem to exist anymore.
I quite like the idea of naive proxies because it solves the problem of "we need to expose a bridge's OR port" (#7349 (moved)), and already-deployed web servers host "natural" content that won't look suspicious to censors. All of this, however, comes at a cost: getting httpsproxy bridges registered in BridgeDB will be tricky. Bridges make it into BridgeDB by first sending their bridge descriptor to the bridge authority, which periodically copies all the bridges it has collected over to BridgeDB. The bridge authority doesn't know how to speak anything other than vanilla Tor to bridges. Naive proxies aren't bridges, and would thus have to make it into BridgeDB over a different channel. A Snowflake-style broker may be a better solution here, so that we don't have to deal with BridgeDB at all, but we would still need a bridge distribution strategy that makes it difficult to enumerate all httpsproxies. Snowflake doesn't have to deal with this because its proxies are short-lived.
For bridge operators who set up an HTTP server just for the sake of running httpsproxy, I worry that the hosted content will be easy to attribute to httpsproxy. I don't think we can expect bridge operators to be creative and figure out what non-fingerprintable content to host, so we should have an automated solution to this.
The re-dialing approach to fix the connection lifetime fingerprint sounds fine. We should also randomise the times at which clients re-dial. It also makes me wonder how much of the web would break if a censor were to reset TCP connections to web servers after, say, 30 seconds.
Your "probe a web server without credentials" attack worries me too. I would expect web servers that support CONNECT to be very rare, to a point where a censor is willing to block them all, but my intuition may be off here. Also, with its active probing infrastructure, the GFW seems well-equipped to start such an attack whenever it pleases.

Trac:
Points: N/A to 30
Cc: dcf, hiro, dmr to dcf, hiro, dmr, phw

Trac:
Keywords: pt httpsproxy deleted, pt httpsproxy anti-censorship-roadmap-maybe added

Adding this ticket to the backlog.

Trac:
Keywords: pt httpsproxy anti-censorship-roadmap-maybe deleted, pt, httpsproxy, anti-censorship-roadmap-maybe, anti-censorship-roadmap added

Trac:
Keywords: anti-censorship-roadmap-maybe deleted, N/A added

Moving from Sponsor 19 to Sponsor 28.

Trac:
Sponsor: Sponsor19 to Sponsor28-can

Trac:
Keywords: pt, anti-censorship-roadmap deleted, anti-censorship-roadmap-september added

Just noticed the response by phw.

Development of this transport was put on hold after I didn't manage to convince Golang devs to add [support at a user-accessible layer] in x/net/http2. This seemed to be the easiest way to add padding, and since that didn't work out, we'd have to find another way. I think the way to go is to use an additional inner layer that allows sending padding. The PT client and server could signal whether they support this layer in an HTTP header during the "HTTP CONNECT -> <- HTTP OK" phase, without losing interoperability with other clients and servers.

Seems like I'll have to update Caddy dependencies. Hopefully the "inithack" isn't necessary anymore, but we'll see.

I agree with your thoughts on BridgeDB, and that Snowflake's broker won't cut it due to not having sophisticated protection against enumeration. Not sure if it would be better to roll a new BridgeLiteDB or expand the regular one.

> For bridge operators who set up an HTTP server just for the sake of running httpsproxy, I worry that the hosted content will be easy to attribute to httpsproxy. I don't think we can expect bridge operators to be creative and figure out what non-fingerprintable content to host, so we should have an automated solution to this.

I do share this concern, but this is something the censor would also want to automate, and I doubt they will have an easy time doing that. The censor would have to learn how to crawl websites at scale and distinguish proxy-looking websites from legitimate ones. While it is suspicious to exchange GBs of traffic with a website that just looks like the Apache2 default page, I feel like it will take some time and effort for censors to catch up and implement blocking even for an effortless default like that, while we figure out better solutions.

A good idea might be to ask people to deploy websites with login forms, such that censor can't be too sure that there isn't more content on a given website.
When/if encrypted SNI comes, we can just respond to probes with "Wrong SNI"! The added benefit is that, to my knowledge, the usual way to say "Wrong SNI!" is a vague handshake_failure alert, so the censor may not even be sure whether it's the SNI or whether they're not offering the right ciphers/curves/etc. We could employ that strategy before encrypted SNI comes too; however, currently the censor is assumed to be able to use their network access to see which SNI people are using to connect to which IPs. This also defeats the "probe a web server without credentials" attack.
We can also reverse-proxy to a randomly chosen website if the client doesn't demonstrate knowledge of the secret (somewhere in the ClientHello). This also defeats the "probe a web server without credentials" attack.
Automated website generation could be a useful direction, but I am not sure what that would look like. Simply fetching content from various randomly chosen places with some bits of randomized customization may be practical.

Trac:
Username: sf

Trac:
Cc: dcf, hiro, dmr, phw to dcf, hiro, dmr, phw, cohosh

Trac:
Keywords: anti-censorship-roadmap-september deleted, anti-censorship-roadmap-2020Q1 added

No more Q1 in 2020.

Trac:
Keywords: anti-censorship-roadmap-2020Q1 deleted, N/A added

changed time estimate to 240h

mentioned in issue #29278 (moved)

mentioned in issue #29287 (moved)

mentioned in issue #32872 (moved)

moved to tpo/anti-censorship/pluggable-transports/trac#26923 (moved)


httpsproxy

Ways to use HTTPS proxies with Tor

Naive proxy

Full Bridge

Registering with BridgeDB

Current prototype

Language

Overhead

Fingerprinting

Probing web server with proxy requests without a secret

TLS ClientHello fingerprinting

Other TLS fingerprinting

Traffic Size Patterns

Connection establishment traffic patterns

Connection lifetime
