Opened 7 weeks ago

Last modified 3 hours ago

#30716 assigned task

Improve the obfs4 obfuscation protocol

Reported by: phw
Owned by: phw
Priority: High
Milestone:
Component: Circumvention/Obfs4
Version:
Severity: Normal
Keywords: sponsor28, anti-censorship-roadmap-august
Cc: arma, cohosh, gaba, phw, robgjansen, msherr
Actual Points:
Parent ID:
Points: 20
Reviewer:
Sponsor: Sponsor28-must

Description (last modified by phw)

As part of our work for Sponsor 28, we will evaluate and improve the obfs4 obfuscation protocol, which may result in obfs5.

Roger started the discussion on our anti-censorship-team mailing list. Relevant reading is the CCS'15 paper Seeing through Network-Protocol Obfuscation and the S&P'16 paper SoK: Towards Grounding Censorship Circumvention in Empiricism.

Let's use this ticket to keep track of this effort. Below is a list of ideas that we may or may not want to incorporate in obfs5.

Randomisation

Obfs4 already implements randomisation for packet lengths and inter-arrival times but there are other protocol aspects that we can randomise. Note that the adoption of these strategies may complicate censorship analysis: if obfs5 instance X looks very different from obfs5 instance Y, then X may end up getting blocked while Y still works. Instead of saying "obfs5 is blocked," one may then have to be more specific and say "the obfs5 instances that rely on UDP are blocked."

  • Payload: All bytes that obfs4 writes to the wire look uniformly random. Such high-entropy packets may or may not be common on the Internet. We could evade a "high-entropy filter" by having obfs4 servers derive a formal language from the shared secret. This language could, say, use dummy clear-text headers.
  • Cover traffic: dcf explains that obfs4 only sends data when it's given data to send. To improve on this, as dcf suggests, we could make obfs5 send data even when the application has nothing to send.
  • Packet directions: An obfs4 flow begins with the client sending data to the server. We could randomise packet directions and have, say, the server talk first with a server-specific probability.
  • Transport protocol: An obfs4 server could talk either TCP or UDP or SCTP. This may very well not be worth the effort.
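
To make the idea of server-specific randomisation concrete, here is a minimal sketch of how a server could derive its parameters deterministically from the shared secret, so that client and server agree without extra signalling. Everything in it is an assumption for illustration: the function name derive_params, the labels, and the parameter names are hypothetical and not part of obfs4.

```python
import hashlib
import hmac

TRANSPORTS = ("tcp", "udp", "sctp")  # the candidates named in the list above


def derive_params(shared_secret: bytes) -> dict:
    """Map a shared secret to per-server randomisation parameters.

    Hypothetical scheme: labels and parameter names are illustrative only.
    """

    def prf(label: bytes) -> bytes:
        # HMAC-SHA256 keyed with the secret serves as a simple PRF.
        return hmac.new(shared_secret, label, hashlib.sha256).digest()

    # Use the top 53 bits so the float lies exactly in [0, 1).
    raw = int.from_bytes(prf(b"server-speaks-first")[:8], "big") >> 11
    speak_first = raw / 2**53

    # Pick a transport; the modulo introduces a small bias that a real
    # design would want to remove.
    transport = TRANSPORTS[prf(b"transport")[0] % len(TRANSPORTS)]

    return {"server_speaks_first_prob": speak_first, "transport": transport}


params = derive_params(b"example shared secret")
print(params)
```

Because the derivation is deterministic, two servers with different secrets end up with different behaviour, which is exactly what makes "obfs5 instance X" and "obfs5 instance Y" look different to a censor.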

Lessons learned from CCS'15 paper

  • DPI boxes tend to classify flows by only inspecting the first N packets of a flow. Keeping state is expensive, after all. We could exploit this by relaxing our obfuscation techniques after N packets to increase throughput.
  • The paper's data set may not be representative of what countries or ISPs would see:
    • It's "only" a university uplink. Universities typically have policies that prohibit file sharing such as BitTorrent, so the data set likely under-represents it. This matters because BitTorrent's "message stream encryption" may look similar to obfs3 and obfs4.
    • The data sets are from 2014, 2012, and 2010, respectively. That's a long time in Internet years.
    • The detectors' false positive rates are non-trivial and, as the authors point out themselves, would be problematic for a censor given that non-obfuscated traffic significantly outweighs obfuscated traffic.
    • Does the data set only contain one obfs4 server instance? This may have affected their results.
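
The point about false positive rates is a base-rate argument, and a short calculation shows why it hurts a censor. The numbers below are hypothetical and are not taken from the paper; they only illustrate what happens when non-obfuscated traffic vastly outweighs obfuscated traffic.

```python
def precision(base_rate: float, tpr: float, fpr: float) -> float:
    """Fraction of flagged flows that really are obfuscated (Bayes' rule)."""
    true_pos = base_rate * tpr
    false_pos = (1.0 - base_rate) * fpr
    return true_pos / (true_pos + false_pos)


# Hypothetical numbers: 1 in 10,000 flows is obfuscated, the detector
# catches 99% of them and misfires on 0.2% of ordinary flows.
p = precision(base_rate=1e-4, tpr=0.99, fpr=0.002)
print(f"{p:.1%} of flagged flows are actually obfuscated")
```

Under these assumptions fewer than one in twenty flagged flows is obfuscated, so blocking on the detector's verdict alone would mostly hit ordinary traffic.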

Miscellaneous

  • yawning writes that obfs4 doesn't easily support backward-incompatible protocol alterations.
  • Crazy idea: Use a modified TCP stack that ignores RST and FIN segments, so the GFW's on-path devices cannot tear down the connection. Instead, the obfs5 protocol could signal the end of the connection in an authenticated control frame. We could ignore RST and FIN segments by using firewall rules, or to get more crazy, by shipping a user space TCP stack (this may be easy to fingerprint, though).
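
As a rough sketch of the "authenticated control frame" part of this idea: once RST and FIN segments are ignored, an endpoint would close the connection only after receiving a frame whose MAC verifies under the session key. The frame layout, the CLOSE type value, and the function names below are assumptions made for illustration. A real design would also encrypt the frame so that it stays indistinguishable from the rest of the stream.

```python
import hashlib
import hmac

CLOSE = b"\x01"  # hypothetical frame type meaning "end of connection"
TAG_LEN = 32     # HMAC-SHA256 output size in bytes


def make_close_frame(session_key: bytes, seq: int) -> bytes:
    """Build a close frame whose tag covers the type and a sequence number."""
    body = CLOSE + seq.to_bytes(8, "big")
    return body + hmac.new(session_key, body, hashlib.sha256).digest()


def is_valid_close(session_key: bytes, frame: bytes, expected_seq: int) -> bool:
    """Accept the close only if the tag verifies and the sequence matches."""
    if len(frame) != 1 + 8 + TAG_LEN:
        return False
    body, tag = frame[:-TAG_LEN], frame[-TAG_LEN:]
    expected = hmac.new(session_key, body, hashlib.sha256).digest()
    return (
        hmac.compare_digest(tag, expected)  # constant-time comparison
        and body[:1] == CLOSE
        and int.from_bytes(body[1:], "big") == expected_seq
    )


key = b"k" * 32
frame = make_close_frame(key, seq=7)
print(is_valid_close(key, frame, expected_seq=7))
```

The sequence number is there so that an on-path device cannot replay an old close frame. A device that does not know the session key cannot forge one at all, which is what a bare RST lacks.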

Child Tickets

Ticket | Status | Owner | Summary | Component
#30986 | assigned | phw | Understand the "long tail" of unclassifiable network traffic | Circumvention

Change History (8)

comment:1 Changed 7 weeks ago by yawning

One of the design deficiencies of the obfs4 protocol is that it doesn't easily/efficiently support backward incompatible protocol alterations.

There are ways around this, but at that point, people are better off writing a new different/backwards incompatible protocol entirely, that fixes a number of the design flaws in the underlying protocol.

comment:2 Changed 6 weeks ago by dcf

The obfs4 framing format is pretty nice, in that it allows arbitrary shaping: both client and server can send any amount of data, at any time. The only exception is at one point during the handshake: after the client has sent the MAC indicating the end of its padding, the client must remain silent until after the server has sent its part of the handshake. You can see the gap in the bottom two graphs at https://people.torproject.org/~dcf/obfs4-timing/.

So one desideratum from me is that the protocol should allow either side to send any amount of data at any time, and have it correctly interpreted as padding or meaningful data. Ideally it should even be possible for the server to send data before the client has sent anything.
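
One way to picture this desideratum is a framing layer in which every frame is tagged as either padding or data, so the receiver can discard padding wherever it appears in the stream. The sketch below is not the obfs4 framing format: the header layout, type values, and function names are assumptions. It leaves the header in the clear, whereas a real protocol would encrypt and authenticate it.

```python
import struct
from typing import Tuple

PADDING, DATA = 0, 1           # hypothetical frame types
HEADER = struct.Struct("!BH")  # 1-byte type, 2-byte payload length


def encode(frame_type: int, payload: bytes) -> bytes:
    """Prefix a payload with its type and length."""
    return HEADER.pack(frame_type, len(payload)) + payload


def decode_stream(buf: bytes) -> Tuple[bytes, bytes]:
    """Return (application data, unconsumed tail) from a byte stream.

    Padding frames are discarded. An incomplete trailing frame is returned
    as the tail so the caller can retry once more bytes arrive.
    """
    data = bytearray()
    offset = 0
    while len(buf) - offset >= HEADER.size:
        frame_type, length = HEADER.unpack_from(buf, offset)
        end = offset + HEADER.size + length
        if end > len(buf):
            break  # frame not fully received yet
        if frame_type == DATA:
            data += buf[offset + HEADER.size:end]
        offset = end
    return bytes(data), buf[offset:]


stream = (
    encode(PADDING, b"\x00" * 5)
    + encode(DATA, b"hello")
    + encode(PADDING, b"")
    + encode(DATA, b" world")
)
print(decode_stream(stream))
```

Since padding and data use the same frame structure, either side can interleave them in any order and quantity, including before the peer has sent anything.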

comment:3 Changed 6 weeks ago by yawning

The framing could use better cryptography and a more sensible design overall, but there are larger deficiencies in the protocol.

> So one desideratum from me is that the protocol should allow either side to send any amount of data at any time, and have it correctly interpreted as padding or meaningful data. Ideally it should even be possible for the server to send data before the client has sent anything.

At one point I had thoughts on how I would like to implement something like this, but it's been years since I gave this problem serious thought. I personally would have felt uneasy about a responder-speaks-first design.

comment:4 Changed 4 weeks ago by phw

Description: modified (diff)

comment:5 Changed 3 weeks ago by phw

Description: modified (diff)

comment:6 Changed 2 weeks ago by robgjansen

Cc: robgjansen added

comment:7 Changed 6 days ago by phw

Cc: msherr added

comment:8 Changed 3 hours ago by gaba

Keywords: anti-censorship-roadmap-august added; anti-censorship-roadmap removed