Improve the PT spec and how PTs interface with Tor


asn does not need to own any obfuscation tickets any more. Default owners are trouble.

Trac:
Owner: asn to N/A
Status: new to assigned

These tickets were assigned to asn; setting them back to unassigned (new).

Trac:
Status: assigned to new

Trac:
Description: Make it easier for developers (and academics) to do things with PTs so that we can improve the PT integration pipeline.

We need to assess current pain points, think about how the bridge distribution will factor into it, and talk with the community to see what they need.

to

We want to make it easier for developers (and academics) to design and implement new pluggable transports and get them easily integrated with Tor so that we can have a well-functioning PT integration pipeline.

This is a large project that will consist of several things:

- We need to assess pain points with the current PT spec and desired features from a variety of PT developers.
- We might want to take a look at the PTv2 specification to see where features differ from our v1 and also which features seem to be liked or used by PT developers.
- We should think about how bridge distribution should factor into the PT specification. For example, some transports such as meek and snowflake handle "bridge" information differently than transports whose bridges are distributed through BridgeDB. This results in a different interaction with Tor, and we might consider modifying the spec with the snowflake/broker model in mind (ticket #29296 (moved)).

In general, we should improve our communication with the pluggable transports community to see what they need and figure out how to get more PTs integrated with Tor.
Summary: Improve the PT interface with Tor to Improve the PT spec and how PTs interface with Tor

Trac:
Keywords: N/A deleted, network-team-roadmap-2019-Q1Q2 added

Here's an incomplete list of issues with our current spec:

- iOS applications aren't allowed to fork a subprocess, which means that "obfs4proxy will likely run in the same process as the app and tor." Mike managed to work around this in iObfs.
- Exposing a SOCKS proxy on Android is not future-proof. In fact, even Unix domain sockets may be a problem in the future. Still, a domain socket would be better than SOCKS-based IPC and, according to Yawning, will facilitate sandboxing.
- The PT should be able to communicate its bootstrap status to the invoking process.
- The spec should incorporate the proposed dormant mode (see #28849 (moved)).
- Some PTs such as meek and snowflake don't rely on an IP address. The current workaround is to use awkward pseudo IP addresses (#18611 (moved)).
- Other transports may want to rely on multiple IP addresses, or at least listen on both an IPv4 and an IPv6 address. We need to reconsider the outdated notion of a bridge line (#11211 (moved)).
- Transports are not allowed to emit bytes with the high bit set to stdout in messages such as PROXY-ERROR, but there is no guidance for how to handle/escape such bytes if they happen to appear in a user-provided message or filename, for example.
- SOCKS args can only hold a maximum of about 512 bytes (#10671 (moved)).
- There is an ambiguity in the encoding of SOCKS args that end in a NUL byte (comment:11:ticket:29627).
- There are multiple incompatible and hard-to-implement dictionary encodings.
  - §3.3.3 SMETHOD ARGS: comma-separated, must escape backslash, equals, and comma. {{{ key1=value1,key2=value2 }}}
  - §3.5 client per-connection arguments: semicolon-separated, must escape backslash, equals, and semicolon. {{{ key1=value1;key2=value2 }}}
  - §3.2.3 TOR_PT_SERVER_TRANSPORT_OPTIONS: semicolon-separated, technically a nested dictionary with each element additionally colon-prefixed with the transport it pertains to, must escape backslash, colon, and semicolon (but [[ticket:12931|not equals]]; in this encoding it's impossible for a key to contain an equals sign). {{{ transport1:key1=value1;transport2:key2=value2 }}}
  - §3.2.2 TOR_PT_SERVER_BINDADDR: comma-separated, uses - instead of = to separate key and value, no escaping necessary because of data types. {{{ transport1-1.2.3.4:1234,transport2-5.6.7.8:5678 }}}
  - §3.3.4 and §3.3.5 LOG and STATUS: space-separated with C-style escapes, [[comment:12:ticket:28940|ambiguous as to when quotes are required]]. {{{ key1="value1" key2="value2" }}}
  - If we're including PT 2.0, then there is also UTF-8 JSON (§1.4. Pluggable PT Client Per-Connection Arguments). {{{ {"key1": "value1", "key2": "value2"} }}}
- There is no way to run multiple instances of the same server transport with different options. This is because both TOR_PT_SERVER_BINDADDR and TOR_PT_SERVER_TRANSPORT_OPTIONS are keyed by transport name, with nothing to distinguish multiple instances that use the same name. It's an annoyance when, for example, you want to run multiple copies of obfs4 with different certificates for access control, or with different iat-mode settings. The only way to do it is to (1) run multiple independent instances of tor with their own configuration files, or (2) hack the PT source so that it recognizes multiple synonymous method names, e.g. obfs4a, obfs4b, obfs4c. There is a similar problem with torrc, in that ServerTransportPlugin, ServerTransportListenAddr, and ServerTransportOptions are also all keyed by transport name (#31228 (moved) and #11211 (moved)).
  - In comparison, the PT spec does support multiple instances of client transports with different options, because the options come in SOCKS args rather than an environment variable, so they are bound to a specific CMETHOD listener.
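
To make the incompatibility between the dictionary encodings concrete, here is a rough Python sketch of three of them, written only from the escaping rules as summarized in the list above. The function names are made up for illustration and are not taken from goptlib, pyptlib, or tor; the sketch also assumes transport names never need escaping.

```python
def escape(value, special):
    """Backslash-escape the backslash itself plus every character in `special`."""
    out = []
    for ch in value:
        if ch == "\\" or ch in special:
            out.append("\\")
        out.append(ch)
    return "".join(out)


def encode_smethod_args(args):
    # SMETHOD ARGS: comma-separated; escape backslash, equals, and comma.
    return ",".join(
        f"{escape(k, '=,')}={escape(v, '=,')}" for k, v in args.items()
    )


def encode_client_args(args):
    # Client per-connection arguments: semicolon-separated;
    # escape backslash, equals, and semicolon.
    return ";".join(
        f"{escape(k, '=;')}={escape(v, '=;')}" for k, v in args.items()
    )


def encode_server_transport_options(options):
    # TOR_PT_SERVER_TRANSPORT_OPTIONS: nested {transport: {key: value}};
    # escape backslash, colon, and semicolon, but not equals.
    # Assumption: transport names are plain identifiers and are left as-is.
    parts = []
    for transport, kv in options.items():
        for k, v in kv.items():
            parts.append(f"{transport}:{escape(k, ':;')}={escape(v, ':;')}")
    return ";".join(parts)
```

The same value, say `a=b,c;d`, comes out escaped differently under each function, so a PT library ends up carrying one encoder and one parser per message type.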

And here's an incomplete list of existing library implementations:

A seemingly unnamed Swift implementation of the v2.1 specification, maintained by the Operator Foundation.
PLUTO2 is a Java implementation of the v2.x specification, maintained by the Guardian Project.
goptlib is a Go implementation of the v1.0 specification, maintained by the Tor Project.
pyptlib is a Python implementation of the v1.0 specification, (formerly) maintained by the Tor Project.

Edit: remove duplicate issues

Trac:
Keywords: N/A deleted, anti-censorship-roadmap added
Cc: cohosh to cohosh, arma, gaba
Priority: Medium to High
Points: N/A to 15
Status: new to assigned
Owner: N/A to phw
Sponsor: Sponsor19 to Sponsor28-must

We now have a discussion thread on tor-dev@ and I started pointing some implementers to this thread in the hope that they will share their experience.

Replying to phw:

And here's an incomplete list of existing library implementations:

Really there are two types of PT implementations, or three if you count PT 2.0 additions. There aren't really standard names for these.

1. IPC manager/dispatcher. As far as I know, tor and [https://github.com/twisteroidambassador/ptadapter] are the only two implementations of this. This is the thing that sets e.g. TOR_PT_MANAGED_TRANSPORT_VER and manages subprocesses of type (2).
2. IPC transport/plugin. This is goptlib and pyptlib. It's a subprocess managed by an implementation of type (1). This is the thing that writes e.g. CMETHOD to stdout.
3. From PT 2.0, there are also plugin/transport implementations that you are meant to link with directly in the same executable, without going through the IPC interface. There are Go and Swift API specs. From talking to Brandon Wiley, my understanding is that everything that uses PT other than tor and ptadapter uses such an API, or something like it, not the IPC model. shapeshifter-dispatcher converts implementations of type (3) into type (2).

The Pluggable Transports Base Spec v2.1 calls types (1) and (2) "IPC" and type (3) "API".
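
For readers unfamiliar with the IPC model, the type (2) side of the exchange can be sketched roughly as below. This is a simplified illustration based on my reading of the v1 spec, not code from goptlib or pyptlib: `client_handshake` and its `listeners` argument are hypothetical, listener setup is out of scope, and real plugins also report ENV-ERROR and CMETHOD-ERROR cases that are skipped here.

```python
def client_handshake(env, out, listeners):
    """Write the managed-mode client handshake lines to `out`.

    `env` is a mapping such as os.environ, set up by the type (1) manager.
    `listeners` (hypothetical) maps a transport name to the (host, port)
    of an already bound local SOCKS5 listener.
    Returns the list of transport names that were announced.
    """
    versions = env.get("TOR_PT_MANAGED_TRANSPORT_VER", "").split(",")
    if "1" not in versions:
        # No protocol version in common with the manager.
        out.write("VERSION-ERROR no-version\n")
        out.flush()
        return []
    out.write("VERSION 1\n")

    requested = env.get("TOR_PT_CLIENT_TRANSPORTS", "").split(",")
    announced = []
    for name in requested:
        if name in listeners:
            host, port = listeners[name]
            out.write(f"CMETHOD {name} socks5 {host}:{port}\n")
            announced.append(name)
        # Simplification: unsupported transports are silently skipped.
    out.write("CMETHODS DONE\n")
    out.flush()
    return announced
```

A plugin would call this as `client_handshake(os.environ, sys.stdout, listeners)` once its listeners are bound; the type (1) manager reads these lines from the subprocess's stdout.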

Replying to dcf:

Replying to phw:

And here's an incomplete list of existing library implementations:

Really there are two types of PT implementations, or three if you count PT 2.0 additions. There aren't really standard names for these.

1. IPC manager/dispatcher. As far as I know, tor and [https://github.com/twisteroidambassador/ptadapter] are the only two implementations of this. This is the thing that sets e.g. TOR_PT_MANAGED_TRANSPORT_VER and manages subprocesses of type (2).

The module within Tor Launcher that implements Moat (interactive bridge retrieval) is another example of the above. We did that so we could reuse your Meek PT implementation. See https://gitweb.torproject.org/tor-launcher.git/tree/src/modules/tl-bridgedb.jsm?h=maint-0.2.18#n188

Trac:
Cc: cohosh, arma, gaba to cohosh, arma, gaba, brade, mcs

Replying to mcs:

The module within Tor Launcher that implements Moat (interactive bridge retrieval) is another example of the above. We did that so we could reuse your Meek PT implementation. See https://gitweb.torproject.org/tor-launcher.git/tree/src/modules/tl-bridgedb.jsm?h=maint-0.2.18#n188

Oh, good call. IIRC, on the server side of Moat there is also a half-baked shell script managing the meek-server process, which could be replaced with ptadapter (#29096 (moved)).

#21816 (moved) (Add support for Pluggable Transports 2.0) is related.

We also want to be able to assign multiple addresses to each pluggable transport, for example an IPv4 and an IPv6 address. See #30953 (moved) for details.
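
As a rough illustration of why the current format gets in the way, the Python sketch below parses a TOR_PT_SERVER_BINDADDR-style value the way a name-keyed implementation naturally would. Both functions are hypothetical. The second is only one conceivable relaxation, not what #30953 proposes, and the example value with a repeated transport name is not something the current spec permits.

```python
def parse_bindaddr(value):
    """Parse 'name-addr:port,name-addr:port' into a dict keyed by transport name."""
    result = {}
    for item in value.split(","):
        # Transport names contain no '-', so the first '-' ends the name.
        name, _, addr = item.partition("-")
        # A later entry with the same name overwrites the earlier one.
        result[name] = addr
    return result


def parse_bindaddr_multi(value):
    """Hypothetical variant that keeps every address listed for a name."""
    result = {}
    for item in value.split(","):
        name, _, addr = item.partition("-")
        result.setdefault(name, []).append(addr)
    return result


# Illustrative value only: one transport name with an IPv4 and an IPv6 address.
value = "obfs4-1.2.3.4:1234,obfs4-[2001:db8::1]:1234"
single = parse_bindaddr(value)
multi = parse_bindaddr_multi(value)
```

With name-keyed parsing, `single` retains only the last address for obfs4, while `multi` keeps both. Any fix on the spec side would need to decide how the corresponding SMETHOD lines and transport options map onto those addresses.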

Trac:
Cc: cohosh, arma, gaba, brade, mcs to cohosh, arma, gaba, brade, mcs, msherr

Trac:
Keywords: network-team-roadmap-2019-Q1Q2, anti-censorship-roadmap deleted, anti-censorship-roadmap-august added

Trac:
Keywords: anti-censorship-roadmap-august deleted, anti-censorship-roadmap-october added

Trac:
Keywords: anti-censorship-roadmap-october deleted, anti-censorship-roadmap-2020Q1 added

No more Q1 in 2020...

Trac:
Keywords: anti-censorship-roadmap-2020Q1 deleted, anti-censorship-roadmap-2020 added

changed time estimate to 120h

mentioned in issue #29296 (moved)

mentioned in issue #30707 (moved)
