Opened 5 months ago

Closed 4 weeks ago

#29278 closed task (fixed)

Assess HTTP proxy

Reported by: cohosh Owned by: phw
Priority: Low Milestone:
Component: Circumvention/Pluggable transport Version:
Severity: Normal Keywords: anti-censorship-roadmap
Cc: cohosh, phw Actual Points: 1
Parent ID: Points: 2
Reviewer: Sponsor: Sponsor19

Description

Look at the status of HTTP proxy and see what it will take to integrate it with Tor.

Change History (8)

comment:1 Changed 5 months ago by teor

Owner: asn deleted
Status: new → assigned

asn does not need to own any obfuscation tickets any more. Default owners are trouble.

comment:2 Changed 5 months ago by cohosh

Status: assigned → new

Tickets were assigned to asn; setting them as unassigned (new) again.

comment:4 Changed 2 months ago by phw

Cc: phw added

comment:5 Changed 5 weeks ago by phw

Owner: set to phw
Points: 2
Status: new → assigned

comment:6 Changed 5 weeks ago by phw

Here are some general thoughts:

  • I quite like the concept. httpsproxy is the closest we've ever gotten to a transport that "looks like HTTP". It uses HTTP's CONNECT method (conceptually similar to SOCKS), which makes it flexible and low-overhead. It also means that anyone who runs a web server could turn on CONNECT (and, to prevent abuse, limit outgoing connections to IP addresses of guard relays), effectively turning the web server into a snowflake-like bridge that doesn't run a Tor client, which conveniently fixes #7349. This, however, requires non-trivial changes to BridgeDB as I explain below.
  • In my opinion, httpsproxy's biggest problem is that it still suffers from the proxy distribution problem. No matter how well httpsproxy can disguise Tor traffic, we still end up trying to distribute a small number of long-lived bridges while hoping that our adversaries have a hard time collecting them all. We don't know how many of our bridges have been collected (#9316 may shed light on this), but collecting them is certainly easier than we would like it to be.
  • I worry that the crowd that can run an httpsproxy bridge may be smaller than the crowd that can run an obfs4 bridge. httpsproxy supports two deployment scenarios: "naive proxy" and "full bridge". The "naive proxy" scenario is similar to snowflake and expects you to already be running a web server. We may have many motivated volunteers, but I'm afraid that only a small fraction of them run their own web server. A web server is not necessary in the "full bridge" scenario, but this comes at the cost of being less resistant to fingerprinting. In comparison, snowflake's barrier to entry is significantly lower, especially once we have a web extension (#23888).
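
To make the CONNECT idea from the first point more concrete, here is a minimal Go sketch of a handler that tunnels CONNECT requests only to an allowlist of relay addresses. It is not httpsproxy's code: the names allowedTargets, targetAllowed and handleConnect are hypothetical, the addresses come from documentation-only ranges rather than real guard relays, and TLS termination is left out entirely.

```go
package main

import (
	"fmt"
	"io"
	"net"
	"net/http"
	"time"
)

// allowedTargets is a hypothetical allowlist of guard relay addresses.
// The entries are documentation-only addresses (RFC 5737), not real relays.
var allowedTargets = map[string]bool{
	"192.0.2.10:443":    true,
	"198.51.100.7:9001": true,
}

// targetAllowed reports whether hostport is a literal IP:port on the allowlist.
func targetAllowed(hostport string) bool {
	host, port, err := net.SplitHostPort(hostport)
	if err != nil {
		return false
	}
	ip := net.ParseIP(host)
	if ip == nil {
		return false // hostnames are rejected so DNS cannot widen the allowlist
	}
	return allowedTargets[net.JoinHostPort(ip.String(), port)]
}

// handleConnect tunnels CONNECT requests to allowlisted targets and refuses
// everything else. Bytes the server already buffered from the client are
// ignored here, which a real implementation would have to handle.
func handleConnect(w http.ResponseWriter, r *http.Request) {
	if r.Method != http.MethodConnect || !targetAllowed(r.Host) {
		http.Error(w, "forbidden", http.StatusForbidden)
		return
	}
	hijacker, ok := w.(http.Hijacker)
	if !ok {
		http.Error(w, "tunnelling unsupported", http.StatusInternalServerError)
		return
	}
	upstream, err := net.DialTimeout("tcp", r.Host, 10*time.Second)
	if err != nil {
		http.Error(w, "bad gateway", http.StatusBadGateway)
		return
	}
	client, _, err := hijacker.Hijack()
	if err != nil {
		upstream.Close()
		return
	}
	if _, err := client.Write([]byte("HTTP/1.1 200 Connection established\r\n\r\n")); err != nil {
		client.Close()
		upstream.Close()
		return
	}
	// Relay bytes in both directions until either side closes.
	go func() {
		io.Copy(upstream, client)
		upstream.Close()
	}()
	io.Copy(client, upstream)
	client.Close()
}

func main() {
	// Only exercise the allowlist check; serving is described in the note below.
	for _, target := range []string{"192.0.2.10:443", "192.0.2.10:80", "example.com:443"} {
		fmt.Printf("%s allowed: %v\n", target, targetAllowed(target))
	}
}
```

To try the handler itself, it could be passed directly to a server, for example http.ListenAndServe("127.0.0.1:8080", http.HandlerFunc(handleConnect)). Passing it as the server's handler rather than registering it on a ServeMux avoids the mux's path handling for CONNECT requests.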

Here are my thoughts on what deployment would entail:

  • httpsproxy is written in golang. It's not a lot of code (the HTTP logic comes from the caddy module) and the concept behind it is relatively simple, meaning that we would be able to maintain it even if the original author were to vanish.
  • The "naive proxy" deployment scenario won't work with our bridge authority and BridgeDB because they assume that a tor client and its pluggable transport run on the same machine. To make the "naive proxy" scenario work, we would probably have to come up with a new channel that allows tor-less httpsproxies to announce themselves to BridgeDB. Since this is similar to the way snowflake works, a snowflake-style broker mechanism may come in handy here. Unlike snowflake, however, httpsproxy is affected by the bridge distribution problem, so the broker would need to get some of the smartness that BridgeDB already has (see #29296).
  • Alternatively (or in parallel), we can deploy httpsproxy in the orthodox "full bridge" scenario, which is similar to obfs4. In this case, a tor client ships with a web server (currently caddy). This will work out of the box with our bridge authority and BridgeDB, but we will have a number of additional issues:
    1. Bridges will expose a web server and an OR port. Because of #7349, this will enable confirmation attacks à la "Not sure if this web server runs a Tor bridge? Just port scan it and look for an OR port". This isn't a new problem but it somewhat defeats the purpose of shipping a well-designed pluggable transport.
    2. All bridges will run the same web server and if this web server isn't particularly popular on the Internet, censors could fingerprint and block them all. I don't know how popular caddy is, but I had never heard of it before I started learning about httpsproxy.
    3. The content hosted on the bridge's web server needs to look "natural". A web server that gives you a simple 404 or 403 for its landing page may look suspicious. Or maybe not? I don't think we can expect our bridge operators to be creative and serve "natural" content on their httpsproxy web servers.

The benefit of this scenario is that it doesn't require architectural changes to BridgeDB. In fact, we could move forward with deploying the "full bridge" scenario and start supporting the "naive proxy" approach later on.

  • There are a bunch of fingerprinting issues that we would have to think about. Sergey, the author of httpsproxy, already did a great job discussing them over here. I'm particularly worried about an active probing attack that allows a censor to confirm if a web server supports CONNECT.
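
One possible way to think about the active probing concern, sketched in Go: only tunnel CONNECT requests that carry a secret, and hand every other request to the ordinary site handler so that a prober sees the same response it would get from a server without CONNECT support. This is an assumption about how a mitigation could look, not a description of what httpsproxy does; sharedSecret, shouldProxy and probeResistant are hypothetical names, and carrying the secret in the Proxy-Authorization header is likewise an assumption.

```go
package main

import (
	"crypto/subtle"
	"fmt"
	"net/http"
)

// sharedSecret is a hypothetical credential that would be handed out
// together with the bridge address.
const sharedSecret = "example-secret"

// shouldProxy reports whether a request is a CONNECT carrying the secret.
// The comparison is constant-time for inputs of equal length.
func shouldProxy(method, proxyAuth string) bool {
	if method != http.MethodConnect {
		return false
	}
	return subtle.ConstantTimeCompare([]byte(proxyAuth), []byte(sharedSecret)) == 1
}

// probeResistant sends authenticated CONNECT requests to proxy and everything
// else, including unauthenticated CONNECT probes, to the regular site handler.
func probeResistant(site, proxy http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if shouldProxy(r.Method, r.Header.Get("Proxy-Authorization")) {
			proxy.ServeHTTP(w, r)
			return
		}
		site.ServeHTTP(w, r)
	})
}

func main() {
	fmt.Println(shouldProxy(http.MethodConnect, "example-secret"))
	fmt.Println(shouldProxy(http.MethodConnect, "wrong"))
	fmt.Println(shouldProxy(http.MethodGet, "example-secret"))
}
```

Whether this actually hides CONNECT support depends on the site handler answering a probe exactly as the underlying web server would, including headers and timing, which this sketch does not attempt to guarantee.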

comment:7 Changed 4 weeks ago by gaba

Keywords: anti-censorship-roadmap added

comment:8 Changed 4 weeks ago by phw

Actual Points: 1
Resolution: fixed
Status: assigned → closed

I'm closing this ticket because we now have an idea of what it takes to deploy HTTPS Proxy. For the actual deployment, see #26923.
