Opened 9 months ago

Last modified 3 days ago

#28942 needs_review enhancement

Evaluate pion WebRTC

Reported by: backkem Owned by: cohosh
Priority: Medium Milestone:
Component: Circumvention/Snowflake Version:
Severity: Normal Keywords: anti-censorship-roadmap-august
Cc: ahf, dcf, arlolra, cohosh, backkem, sukhbir, boklm Actual Points:
Parent ID: Points: 5
Reviewer: Sponsor: Sponsor28-must

Description

We've made a pure Go WebRTC port over at pions/webrtc. This may provide a viable alternative to libwebrtc.

Disadvantages

  • We've not done much work on security yet. However, we definitely intend to work on this since hardening the security will be a requirement for many other use-cases.
  • Not entirely feature complete but this seems less important for your use-case.
  • We don't support TURN yet. We currently plan to build this for our next release.

Advantages

  • It is fully go-gettable, fixing the horrible build process. In addition, it should run everywhere Go runs.
  • We've tested our data channel implementation against multiple targets, including Chrome, Firefox and NodeJS, and aim to automate these in the future.
  • It will give you more freedom to make changes to the WebRTC stack and even allow experimentation, e.g. to reduce fingerprinting.
  • It may solve/invalidate some of your other problems, including #19026, #19315, #19569, #22718 and #25483.
  • We're working on exposing an idiomatic API based on io.ReadWriteCloser.
  • We have an active community and development team. We're more than happy to fix any problems that may arise. We're also open to prioritizing any features you consider blocking. Lastly, we have great PR response times.

I'm also interested in collaborating on NAT testing as mentioned in #25595.

Child Tickets

Attachments (4)

gomodtorbm (4.9 KB) - added by dcf 3 weeks ago.
Rough script to produce rbm config files from go mod graph output.
win8-firewall.png (48.8 KB) - added by dcf 3 weeks ago.
Screenshot of a Windows firewall popup on running commit a45ba4792f on Windows 8.
vendor.zip (2.3 MB) - added by cohosh 3 weeks ago.
A working directory of pion/webrtc dependencies
0001-Allow-gathering-of-candidates-to-generate-offer.patch (1.3 KB) - added by cohosh 2 weeks ago.
Patch for pion/webrtc bug

Change History (69)

comment:1 Changed 8 months ago by gaba

Sponsor: Sponsor19

comment:2 Changed 4 months ago by gaba

Keywords: ex-sponsor-19 added

Adding the keyword to mark everything that didn't fit into the time for sponsor 19.

comment:3 Changed 4 months ago by phw

Sponsor: Sponsor19 → Sponsor28-can

Moving from Sponsor 19 to Sponsor 28.

comment:4 Changed 3 months ago by cohosh

Owner: set to cohosh
Status: new → assigned

Taking a look at this now. The API is very similar to what we're using and the code has been actively maintained this whole time.

Although the API isn't a direct match, it looks like relatively little work to get it hooked up to what we have. I'm going to work on getting it building first, and then take a look at the code.

If this works, it will be a lot easier to build and maintain.

comment:5 Changed 3 months ago by cohosh

Cc: cohosh added

comment:6 Changed 3 months ago by cohosh

I got proxy-go building with pion/webrtc. The changes necessary were fairly small and can be seen here. Datachannel creation and teardown appear to be working as expected and some data is flowing through.

However, I'm having trouble bootstrapping the Tor client past 50%. Here's a log of the pion port:

Jun 14 22:44:25.188 [notice] Tor 0.2.9.16 (git-9ef571339967c1e5) running on Linux with Libevent 2.0.21-stable, OpenSSL 1.1.0j and Zlib 1.2.8.
Jun 14 22:44:25.188 [notice] Tor can't help you if you use it wrong! Learn how to be safe at https://www.torproject.org/download/download#warning
Jun 14 22:44:25.188 [notice] Read configuration file "/go/bin/torrc-5".
Jun 14 22:44:25.190 [warn] Path for DataDirectory (datadir5) is relative and will resolve to /go/bin/datadir5. Is this what you wanted?
Jun 14 22:44:25.191 [notice] Opening Socks listener on 127.0.0.1:9055
Jun 14 22:44:25.000 [notice] Parsing GEOIP IPv4 file /usr/share/tor/geoip.
Jun 14 22:44:25.000 [notice] Parsing GEOIP IPv6 file /usr/share/tor/geoip6.
Jun 14 22:44:25.000 [notice] Bootstrapped 0%: Starting
Jun 14 22:44:25.000 [notice] Delaying directory fetches: No running bridges
Jun 14 22:44:27.000 [notice] Bootstrapped 5%: Connecting to directory server
Jun 14 22:44:27.000 [notice] Bootstrapped 10%: Finishing handshake with directory server
Jun 14 22:44:30.000 [notice] Learned fingerprint 2B280B23E1107BB62ABFC40DDCC8824814F80A72 for bridge 0.0.3.0:1 (with transport 'snowflake').
Jun 14 22:44:30.000 [notice] Bootstrapped 15%: Establishing an encrypted directory connection
Jun 14 22:44:30.000 [notice] Bootstrapped 20%: Asking for networkstatus consensus
Jun 14 22:44:30.000 [notice] new bridge descriptor 'flakey' (fresh): $2B280B23E1107BB62ABFC40DDCC8824814F80A72~flakey at 0.0.3.0
Jun 14 22:44:30.000 [notice] I learned some more directory information, but not enough to build a circuit: We have no usable consensus.
Jun 14 22:44:32.000 [notice] Bootstrapped 25%: Loading networkstatus consensus
Jun 14 22:44:34.000 [notice] I learned some more directory information, but not enough to build a circuit: We have no usable consensus.
Jun 14 22:44:34.000 [notice] Bootstrapped 40%: Loading authority key certs
Jun 14 22:44:34.000 [notice] Bootstrapped 45%: Asking for relay descriptors
Jun 14 22:44:34.000 [notice] I learned some more directory information, but not enough to build a circuit: We need more microdescriptors: we have 0/6533, and can only build 0% of likely paths. (We have 0% of guards bw, 0% of midpoint bw, and 0% of exit bw = 0% of path bw.)
Jun 14 22:44:34.000 [notice] Bootstrapped 50%: Loading relay descriptors
Jun 14 22:45:07.000 [notice] Delaying directory fetches: No running bridges

compared to a log of using go-webrtc:

Jun 14 22:48:51.025 [notice] Tor 0.2.9.16 (git-9ef571339967c1e5) running on Linux with Libevent 2.0.21-stable, OpenSSL 1.1.0j and Zlib 1.2.8.
Jun 14 22:48:51.025 [notice] Tor can't help you if you use it wrong! Learn how to be safe at https://www.torproject.org/download/download#warning
Jun 14 22:48:51.025 [notice] Read configuration file "/go/bin/torrc-6".
Jun 14 22:48:51.026 [warn] Path for DataDirectory (datadir6) is relative and will resolve to /go/bin/datadir6. Is this what you wanted?
Jun 14 22:48:51.027 [notice] Opening Socks listener on 127.0.0.1:9056
Jun 14 22:48:51.000 [notice] Parsing GEOIP IPv4 file /usr/share/tor/geoip.
Jun 14 22:48:51.000 [notice] Parsing GEOIP IPv6 file /usr/share/tor/geoip6.
Jun 14 22:48:51.000 [notice] Bootstrapped 0%: Starting
Jun 14 22:48:51.000 [notice] Delaying directory fetches: No running bridges
Jun 14 22:48:53.000 [notice] Bootstrapped 5%: Connecting to directory server
Jun 14 22:48:53.000 [notice] Bootstrapped 10%: Finishing handshake with directory server
Jun 14 22:48:53.000 [notice] Learned fingerprint 2B280B23E1107BB62ABFC40DDCC8824814F80A72 for bridge 0.0.3.0:1 (with transport 'snowflake').
Jun 14 22:48:53.000 [notice] Bootstrapped 15%: Establishing an encrypted directory connection
Jun 14 22:48:53.000 [notice] Bootstrapped 20%: Asking for networkstatus consensus
Jun 14 22:48:53.000 [notice] new bridge descriptor 'flakey' (fresh): $2B280B23E1107BB62ABFC40DDCC8824814F80A72~flakey at 0.0.3.0
Jun 14 22:48:53.000 [notice] I learned some more directory information, but not enough to build a circuit: We have no usable consensus.
Jun 14 22:48:54.000 [notice] Bootstrapped 25%: Loading networkstatus consensus
Jun 14 22:48:58.000 [notice] I learned some more directory information, but not enough to build a circuit: We have no usable consensus.
Jun 14 22:48:58.000 [notice] Bootstrapped 40%: Loading authority key certs
Jun 14 22:48:58.000 [notice] Bootstrapped 45%: Asking for relay descriptors
Jun 14 22:48:58.000 [notice] I learned some more directory information, but not enough to build a circuit: We need more microdescriptors: we have 0/6533, and can only build 0% of likely paths. (We have 0% of guards bw, 0% of midpoint bw, and 0% of exit bw = 0% of path bw.)
Jun 14 22:48:58.000 [notice] Bootstrapped 50%: Loading relay descriptors
Jun 14 22:49:01.000 [notice] Bootstrapped 57%: Loading relay descriptors
Jun 14 22:49:01.000 [notice] Bootstrapped 65%: Loading relay descriptors
Jun 14 22:49:01.000 [notice] Bootstrapped 71%: Loading relay descriptors
Jun 14 22:49:01.000 [notice] Bootstrapped 78%: Loading relay descriptors
Jun 14 22:49:01.000 [notice] Bootstrapped 80%: Connecting to the Tor network
Jun 14 22:49:02.000 [notice] Bootstrapped 90%: Establishing a Tor circuit
Jun 14 22:49:02.000 [notice] Tor has successfully opened a circuit. Looks like client functionality is working.
Jun 14 22:49:02.000 [notice] Bootstrapped 100%: Done

I've reproduced this several times with no luck getting past 50%. Going to take a look at whether the messages are getting through as expected.

comment:7 Changed 3 months ago by cmm323

Have you looked at the snowflake logs? I forgot where they are, but snowflake produces its own logs, probably in the data dir.

Last edited 3 months ago by cmm323

comment:8 Changed 3 months ago by cohosh

Looks like something's up with reading from the data channel. I put a version of BytesSyncLogger at the proxy-go instance and at the client but changed it to count totals and not reset the inbound and outbound counts. I'm finding that the proxy-go instance (which runs pion webrtc) is not receiving all of the messages that the client (running go-webrtc) sends:

At the proxy:

2019/06/18 13:58:34 Traffic Bytes (in|out): 40279 | 650650 -- (34 OnMessages, 97 Sends)
2019/06/18 13:58:34 OnClose channel
ortc ERROR: 2019/06/18 13:58:34 Failed to read from data channel sending reset packet in non-established state: state=Closed

At the client:

2019/06/18 13:58:24 Traffic Bytes (in|out): 650650 | 335191 -- (97 OnMessages, 43 Sends)
2019/06/18 13:58:53 Traffic Bytes (in|out): 650650 | 339361 -- (97 OnMessages, 44 Sends)
2019/06/18 13:58:53 WebRTC: No messages received for 30 seconds -- closing stale connection.
2019/06/18 13:58:53 WebRTC: closing DataChannel

I'm going to dig into the pion/webrtc code to see if anything obvious comes up. As far as I can tell the datachannels being created are reliable just as before so maybe there's something weird with the signaling code they use to notify the datachannel that data has been read.

The client has 9 unreceived sends as of 13:58:24 and the proxy-go instance errors out 10 full seconds later.

Also note that the error message of the closed channel state occurs on the proxy-go side before the client closes their end of the datachannel. From 13:58:34 on, the client believes the channel is still open and is trying to read data.

Last edited 3 months ago by cohosh

comment:9 in reply to:  7 Changed 3 months ago by cohosh

Replying to cmm323:

Have you looked at the snowflake logs? I forgot where they are, but snowflake produces its own logs, probably in the data dir.

Sorry I just saw your question. Snowflake clients have logging turned off by default: https://gitweb.torproject.org/pluggable-transports/snowflake.git/tree/client/snowflake.go#n72
but you can set a log path using the -log option.

I'm actually using pion/webrtc at the proxy-go instance (which doesn't speak to little-t-tor directly and doesn't have a datadir). For this I'm again using the -log option to set my own logs. As you can see above, I'm using both to compare :)

If you're interested in contributing to snowflake you can check out the READMEs in each of the directories for more information, but you might have to look at the source code in some cases to see what the behaviour is.

Last edited 3 months ago by cohosh

comment:10 Changed 3 months ago by cohosh

Upon further investigation, it looks like the Snowflake bridge is actually closing the connection to the proxy at that 13:58:34 timestamp. That leads me to believe that the webrtc library is somehow misordering or mangling data, causing the bridge to error out the connection.

I checked the snowflake server logs and see these messages:

2019/06/18 13:58:34 error copying ORPort to WebSocket
2019/06/18 13:58:34 error copying WebSocket to ORPort

comment:11 Changed 3 months ago by cohosh

A byte-by-byte comparison of data sent by the client and data received from OnMessage by the proxy shows that they differ :/

Interestingly, data sent by the proxy (pion) and received by the client (go-webrtc) appears fine. So maybe pion/webrtc isn't ordering data upon receipt?

Last edited 3 months ago by cohosh

comment:12 Changed 3 months ago by cohosh

A closer inspection reveals that a big chunk of data (~298KB) went missing at the proxy side, but other than that the bytes seem to be in order. This shouldn't happen if the channel is reliable. I've confirmed the channel type is set to reliable and ordered, so this could be a bug.
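
For reference, a minimal sketch of the channel configuration in question (not Snowflake's actual code; assumes the pion/webrtc v2 API): setting Ordered and leaving MaxRetransmits/MaxPacketLifeTime unset yields an ordered, reliable channel, so a dropped chunk really would be a bug.

package main

import (
	"log"

	"github.com/pion/webrtc/v2"
)

func main() {
	pc, err := webrtc.NewPeerConnection(webrtc.Configuration{})
	if err != nil {
		log.Fatal(err)
	}
	// Ordered=true with no MaxRetransmits/MaxPacketLifeTime set means
	// the channel is both ordered and reliable.
	ordered := true
	dc, err := pc.CreateDataChannel("snowflake", &webrtc.DataChannelInit{
		Ordered: &ordered,
	})
	if err != nil {
		log.Fatal(err)
	}
	log.Println("created data channel:", dc.Label())
}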

comment:13 Changed 3 months ago by dcf

Great work on this, cohosh. Missing data does seem to point to a bug in the pions implementation. backkem should be Cc'ed on this ticket so they'll get the report.

My first guess is that the bug lies somewhere inside pion/sctp. As I understand it, a reliable dataconnection is SCTP inside DTLS. A working SCTP should make it impossible to drop a chunk of data.

comment:14 Changed 3 months ago by cohosh

Cc: backkem added

Found the issue. The reassembly queue is returning an io.ErrShortBuffer error. It seems the dataChannelBufferSize constant is too small for the data that the client is sending: datachannel.go#L16

The reassembly queue works by concatenating all of the fragments for a SCTP Stream Sequence Number (SSN) and trying to read them into the provided buffer all at once. If the buffer is too short, an io.ErrShortBuffer is returned from reassemblyQueue.read here, but the function calling it (Stream.ReadSCTP) doesn't return an error, and the data for that sequence number is simply lost here. Note in particular that the error returned from the reassemblyQueue read is being overwritten.

There are a few bugs here:

  1. If the buffer is too small, we should split up the reads into multiple subsequent reads.
  2. The error for the reassembly queue should be checked.

I can write up a patch for this, now that I have a good handle on what's going on.
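
To make the failure mode concrete, here is a self-contained toy (not pion's actual code; queue stands in for the reassembly queue) showing how discarding the read error silently loses the message:

package main

import (
	"fmt"
	"io"
)

// queue stands in for pion/sctp's reassembly queue: read copies one
// whole reassembled message into p, or fails with io.ErrShortBuffer.
type queue struct{ msg []byte }

func (q *queue) read(p []byte) (int, error) {
	if len(p) < len(q.msg) {
		return 0, io.ErrShortBuffer
	}
	return copy(p, q.msg), nil
}

// buggyRead mirrors the bug: the error is discarded, so the caller
// sees a successful zero-byte read and the message is gone.
func buggyRead(q *queue, p []byte) (int, error) {
	n, _ := q.read(p)
	return n, nil
}

// fixedRead propagates io.ErrShortBuffer so the caller can react.
func fixedRead(q *queue, p []byte) (int, error) {
	return q.read(p)
}

func main() {
	q := &queue{msg: make([]byte, 32*1024)} // one 32 KiB message
	buf := make([]byte, 16*1024)            // a 16 KiB read buffer
	fmt.Println(buggyRead(q, buf))          // 0 <nil>: silent loss
	fmt.Println(fixedRead(q, buf))          // 0 short buffer: visible
}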

comment:15 Changed 3 months ago by cohosh

Looking at the webrtc specification it seems to be a good idea to preserve user message boundaries when passing data to OnMessage().

It also looks like the pion webrtc implementation is set up to check for io.ErrShortBuffer errors: datachannel.go#L292, but it isn't handled.

I think I'll go about this by writing two patches:

  • a patch for pion/sctp that correctly forwards the error message
  • a patch for pion/webrtc that calls ReadDataChannel again with a larger buffer
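
For the second patch idea, a rough sketch of the caller side (hypothetical helper; assumes a detached pion data channel exposing ReadDataChannel, and that the SCTP layer now reports io.ErrShortBuffer instead of swallowing it):

package sketch

import (
	"io"

	"github.com/pion/datachannel"
)

// readWholeMessage grows its buffer until a complete message fits.
func readWholeMessage(dc datachannel.ReadWriteCloser) ([]byte, bool, error) {
	buf := make([]byte, 16*1024)
	for {
		n, isString, err := dc.ReadDataChannel(buf)
		if err == io.ErrShortBuffer {
			buf = make([]byte, 2*len(buf)) // retry with a bigger buffer
			continue
		}
		if err != nil {
			return nil, false, err
		}
		return buf[:n], isString, nil
	}
}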

comment:16 in reply to:  15 ; Changed 3 months ago by cohosh

Replying to cohosh:

Looking at the webrtc specification it seems to be a good idea to preserve user message boundaries when passing data to OnMessage().

It also looks like the pion webrtc implementation is set up to check for io.ErrShortBuffer errors: datachannel.go#L292, but it isn't handled.

I think I'll go about this by writing two patches:

  • a patch for pion/sctp that correctly forwards the error message

Fixed with https://github.com/pion/sctp/pull/51

  • a patch for pion/webrtc that calls ReadDataChannel again with a larger buffer

Fixed with https://github.com/pion/webrtc/pull/719

And Snowflake clients are now bootstrapping to 100% :)

comment:17 Changed 3 months ago by cmm323

Great! One concern I have is that there may be specific features in the pion implementation that differ from the native implementation, which would make it easy to block.

comment:18 in reply to:  17 Changed 3 months ago by Sean-Der

Replying to cmm323:

Great! One concern I have is that there may be specific features in the pion implementation that differ from the native implementation, which would make it easy to block.

Hi! I am Sean DuBois (one of the Pion WebRTC devs). We *should* have zero differences, and if any do pop up I will try my best to fix them :)

We run our test suite against both Pion and the browser implementation. We use the same Go code in both cases (but compile it to WASM for the browser).

Thanks again for the fixes cohosh! Go is amazing, it makes things so easy to fix.

comment:19 in reply to:  17 ; Changed 3 months ago by dcf

Replying to cmm323:

Great! One concern I have is that there may be specific features in the pion implementation that differ from the native implementation, which would make it easy to block.

I am not too worried about that at this point, because what little research we did using libwebrtc (doc/Snowflake/Fingerprinting) showed that even with the native library, Snowflake did not match other applications. I don't think swapping one library for another costs much at this point, in that respect.

Replying to Sean-Der:

We *should* have zero differences

I would be surprised if this is the case--unless pion has paid extraordinary attention to matching externally visible protocol implementation details, which goes farther than interoperability. What about the order of ciphersuites in the DTLS handshake, or the metadata inside STUN messages? One of the things we found is that there's no single "WebRTC" fingerprint, nor even a single "Chrome WebRTC" fingerprint--it depends on the specific application. That said, I'm glad that you are involved, and I am hopeful that pion will be easier to adapt if and when needed.

comment:20 in reply to:  16 ; Changed 3 months ago by dcf

Replying to cohosh:

And Snowflake clients are now bootstrapping to 100% :)

A next step is perhaps to run and monitor a pion-based proxy-go alongside the existing libwebrtc-based ones? We can check for crashes and see if there are anomalies with regard to number of clients handled, for example.

comment:21 in reply to:  20 ; Changed 3 months ago by cohosh

Replying to dcf:

Replying to cohosh:

And Snowflake clients are now bootstrapping to 100% :)

A next step is perhaps to run and monitor a pion-based proxy-go alongside the existing libwebrtc-based ones? We can check for crashes and see if there are anomalies with regard to number of clients handled, for example.

Sounds good. Do you want to do a code review first?

I've also started the process of switching over to pion/webrtc in the client.

comment:22 in reply to:  19 ; Changed 3 months ago by cohosh

Replying to dcf:

Replying to cmm323:

Great! One concern I have is that there may be specific features in the pion implementation that differ from the native implementation, which would make it easy to block.

I am not too worried about that at this point, because what little research we did using libwebrtc (doc/Snowflake/Fingerprinting) showed that even with the native library, Snowflake did not match other applications. I don't think swapping one library for another costs much at this point, in that respect.

There's also something to be said for the trade-off in complexity/adaptability, which I believe motivated the move from a headless Firefox helper to uTLS in meek: https://trac.torproject.org/projects/tor/ticket/29077

If pion/webrtc makes it so we can build snowflake for Windows and Android more easily and if they are amenable to us proposing changes based on observed blocking, then this seems like a good path forward to me.

comment:23 in reply to:  19 Changed 3 months ago by Sean-Der

Replying to dcf:

Replying to cmm323:

Great! One concern I have is that there may be specific features in the pion implementation that differ from the native implementation, which would make it easy to block.

I am not too worried about that at this point, because what little research we did using libwebrtc (doc/Snowflake/Fingerprinting) showed that even with the native library, Snowflake did not match other applications. I don't think swapping one library for another costs much at this point, in that respect.

Replying to Sean-Der:

We *should* have zero differences

I would be surprised if this is the case--unless pion has paid extraordinary attention to matching externally visible protocol implementation details, which goes farther than interoperability. What about the order of ciphersuites in the DTLS handshake, or the metadata inside STUN messages? One of the things we found is that there's no single "WebRTC" fingerprint, nor even a single "Chrome WebRTC" fingerprint--it depends on the specific application. That said, I'm glad that you are involved, and I am hopeful that pion will be easier to adapt if and when needed.

Oh yes you are 100% right about that. I haven't paid any attention to that, I have only been concerned about interoperability.

This hasn't been a concern for me, but I would love to make this work for you. Maybe we can write some sort of test suite that does ICE/DTLS/SCTP and compares libwebrtc/Pion nightly and brings down the drift.

Would it also be helpful to 'randomize' Pion? We could add features that help make Snowflake more resistant to fingerprinting. I am not up to date on concepts/needs around censorship circumvention.

comment:24 in reply to:  22 Changed 3 months ago by Sean-Der

Replying to cohosh:

If pion/webrtc makes it so we can build snowflake for Windows and Android more easily and if they are amenable to us proposing changes based on observed blocking, then this seems like a good path forward to me.

I would love to take any/all changes you propose! We have a few active contributors, and everyone just needs one sign-off to merge to master.

I am really excited to get your involvement/oversight. The quality of Tor's work is so high that I think it will end up making Pion a lot better :) Pion doesn't make any money and isn't a corporate project, so I only do this after work hours, but I try to move it quickly.

comment:25 Changed 3 months ago by backkem

Hi guys, sorry for the delay. Apparently, I didn't have an email address configured.

Really cool to see this moving forward. We've had users run pion/webrtc on all sorts of platforms, including Windows and Android. Since the entire stack is in pure Go with little to no dependencies, it should be highly portable. We even act as a wrapper around the JavaScript WebRTC implementation when compiling to the JS/WASM build target.

Some comments:

comment:26 Changed 3 months ago by cohosh

Status: assigned → accepted

Moving tickets I'm currently working on to "accepted"

comment:27 in reply to:  25 Changed 3 months ago by cohosh

Replying to backkem:

  • As mentioned in the original ticket and the SCTP PR, you can 'detach' a data channel to get access to the underlying idiomatic API based on the io.ReadWriteCloser.

Thanks! I replied there but I'm copying the answer here to keep everyone in the same loop. While this does solve the problem of how to handle an io.ErrShortBuffer, it doesn't fix the bug in SCTP. I think we'd still prefer to have access to the OnMessage callback anyway instead of implementing our own readLoop, but depending on how you decide to handle the io.ErrShortBuffer return in your readLoop we may need to go that route.

Thanks! This is really interesting; we'll take a look. We have a few tickets about alternative signaling as well: #25985 and #25985

comment:28 Changed 3 months ago by cohosh

Finished porting the client to pion/webrtc: https://github.com/cohosh/snowflake/commit/6bbb9a4b820f34aef4d45c14acff72374307da5e

Most of the changes were small, but there were some tricky differences to work around:

  • OnNegotiationNeeded is no longer supported, but putting the code from that callback into a goroutine and calling it immediately after creation of the PeerConnection seems to work just fine.
  • By default, something called the "trickle method" is set to false, which means the OnICEGatheringStateChange callback never gets called (see icegatherer.go#L262 vs icegatherer.go#L143). We get around this by using a SettingEngine here, but it was a bit confusing. The default API should probably have isTrickle set to true.
  • OnICECandidate returns a nil candidate when gathering is complete. This isn't really a problem, but I got a silent failure when I didn't check for it. It would be nice if this behaviour were documented in the GoDocs.
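
A minimal sketch of the last two points, assuming the pion/webrtc v2 API of the time:

package main

import (
	"log"

	"github.com/pion/webrtc/v2"
)

func main() {
	// The trickle method has to be enabled through a SettingEngine for
	// the ICE callbacks to fire; the default API leaves it off.
	se := webrtc.SettingEngine{}
	se.SetTrickle(true)
	api := webrtc.NewAPI(webrtc.WithSettingEngine(se))

	pc, err := api.NewPeerConnection(webrtc.Configuration{})
	if err != nil {
		log.Fatal(err)
	}

	gatherDone := make(chan struct{})
	pc.OnICECandidate(func(c *webrtc.ICECandidate) {
		if c == nil {
			close(gatherDone) // a nil candidate signals gathering is done
			return
		}
		log.Printf("candidate: %v", c)
	})

	// ... CreateDataChannel / CreateOffer / SetLocalDescription would go
	// here; wait on gatherDone before reading pc.LocalDescription().
}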

comment:29 in reply to:  21 Changed 3 months ago by dcf

Replying to cohosh:

Replying to dcf:

Replying to cohosh:

And Snowflake clients are now bootstrapping to 100% :)

A next step is perhaps to run and monitor a pion-based proxy-go alongside the existing libwebrtc-based ones? We can check for crashes and see if there are anomalies with regard to number of clients handled, for example.

Sounds good. Do you want to do a code review first?

The proxy-go changes look good to me.

comment:30 in reply to:  28 Changed 3 months ago by dcf

Replying to cohosh:

Finished porting the client to pion/webrtc: https://github.com/cohosh/snowflake/commit/6bbb9a4b820f34aef4d45c14acff72374307da5e

Most of the changes were small, but there were some tricky differences to work around:

  • OnNegotiationNeeded is no longer supported, but putting the code from that callback into a goroutine and calling it immediately after creation of the PeerConnection seems to work just fine.
  • By default, something called the "trickle method" is set to false, which means the OnICEGatheringStateChange callback never gets called (see icegatherer.go#L262 vs icegatherer.go#L143). We get around this by using a SettingEngine here, but it was a bit confusing. The default API should probably have isTrickle set to true.
  • OnICECandidate returns a nil candidate when gathering is complete. This isn't really a problem, but I got a silent failure when I didn't check for it. It would be nice if this behaviour were documented in the GoDocs.

This is great, thanks. I looked over the changes and didn't spot any problems. Here there's still a reference to OnNegotiationNeeded.

comment:31 Changed 3 months ago by cohosh

I just modified my PRs to pion/webrtc after listening to their feedback. They linked an interesting article https://lgrahl.de/articles/demystifying-webrtc-dc-size-limit.html about message size limits in different implementations of WebRTC (which was the root of our problem). Looks like no implementations handle this particularly well.

comment:32 Changed 3 months ago by backkem

I wanted to quickly reiterate that this is one of the reasons for the existence of our (Pion WebRTC) detach API. It provides you with an io.ReadWriter. With this pattern the upper layer protocol (your protocol) can supply the buffer for reading/writing. This means, since you likely know what the appropriate buffer size is for your use-case, you can allocate it accordingly. There is actually a long-running issue to expose this in the WebRTC API. More modern web APIs are modeled in a similar way, e.g. incoming-stream & outgoing-stream. To ensure compatibility with other WebRTC implementations it is of course still recommended to respect the buffer-size limits mentioned in Lennart's blog post.
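
A short sketch of that pattern, assuming the pion/webrtc v2 API (DetachDataChannels must be set before creating the PeerConnection, and Detach is only valid once the channel is open):

package main

import (
	"log"

	"github.com/pion/webrtc/v2"
)

func main() {
	se := webrtc.SettingEngine{}
	se.DetachDataChannels() // opt in to the detach API
	api := webrtc.NewAPI(webrtc.WithSettingEngine(se))

	pc, err := api.NewPeerConnection(webrtc.Configuration{})
	if err != nil {
		log.Fatal(err)
	}
	dc, err := pc.CreateDataChannel("snowflake", nil)
	if err != nil {
		log.Fatal(err)
	}

	dc.OnOpen(func() {
		raw, err := dc.Detach() // an io.ReadWriteCloser-style handle
		if err != nil {
			log.Fatal(err)
		}
		// The application supplies the read buffer, sidestepping the
		// fixed internal buffer behind OnMessage.
		buf := make([]byte, 64*1024)
		n, err := raw.Read(buf)
		if err != nil {
			log.Println("read:", err)
			return
		}
		log.Printf("read %d bytes", n)
	})
}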

comment:33 Changed 3 months ago by cohosh

Hi @backkem,

Thanks, the size limit change in my pull request actually affects reads only. I still think you want to make that change, since your implementation's write limit (64KiB) is strictly larger than its read limit (16KiB), meaning that peers that both use your implementation will frequently drop data from messages that are too large when using the normal Send/OnMessage API. Respecting the limits in the linked article would mean either increasing your read limit (https://github.com/pion/webrtc/issues/718) to match the write limit of 64KiB or decreasing your write limit to match the read limit of 16KiB. Either would help the problem on this end.

Also note that I modified my pull request from the original to not increase the buffer on an io.ErrShortBuffer, as explained here.
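
In the meantime, a hypothetical sender-side workaround (assuming the v2 API): since Snowflake treats the channel as a byte stream, message boundaries don't matter to it, so every Send can simply be kept under the smaller 16KiB read limit.

package sketch

import "github.com/pion/webrtc/v2"

// maxSafeMessage is the 16KiB read limit discussed above.
const maxSafeMessage = 16 * 1024

// sendChunked splits a large write into messages that a pion peer's
// fixed read buffer can always hold. This is fine for byte streams but
// does not preserve message boundaries.
func sendChunked(dc *webrtc.DataChannel, data []byte) error {
	for len(data) > 0 {
		n := len(data)
		if n > maxSafeMessage {
			n = maxSafeMessage
		}
		if err := dc.Send(data[:n]); err != nil {
			return err
		}
		data = data[n:]
	}
	return nil
}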

Last edited 3 months ago by cohosh

comment:34 Changed 2 months ago by cohosh

As an update to this, the pion/webrtc and pion/sctp pull requests have been approved and merged to master.

I'm going to deploy a proxy-go instance built with pion/webrtc and try running a client with pion/webrtc and see how it goes.

comment:35 in reply to:  34 Changed 2 months ago by cohosh

Replying to cohosh:

I'm going to deploy a proxy-go instance built with pion/webrtc and try running a client with pion/webrtc and see how it goes.

I've started a new proxy-go instance on snowflake.bamsoftware.com named pion-proxy which runs a pion library version of the proxy-go instance located in /usr/local/bin/pion-proxy-go. Logs are available in /home/snowflake-proxy/pion-proxy.d

comment:36 Changed 2 months ago by gaba

Keywords: anti-censorship-roadmap-august added; ex-sponsor-19 removed
Points: 5

comment:37 Changed 2 months ago by gaba

Sponsor: Sponsor28-can → Sponsor28-must

comment:38 Changed 6 weeks ago by sukhbir

Cc: sukhbir added

comment:39 Changed 6 weeks ago by cohosh

Just to give an update on this, building Tor Browser with this pion library is a bit painful right now. Our reproducible build system (rbm) doesn't work nicely with modules and, after a conversation with boklm, it's preferable to create a separate project for each go lib dependency. This means a total of 13 pion libraries plus an additional 14+ dependencies that these libraries have. There might be more; I stopped going down the rabbit hole after a while. I don't think creating 30-ish projects just to build this is a viable or sustainable option.

There's an open ticket for integrating go modules into rbm (#28325); it would be nice to know how fast or difficult that task is. Otherwise we could maybe hack together something with custom build commands. It will be a pain to keep track of versions/commits for all of the dependencies.

Last edited 6 weeks ago by cohosh

comment:40 in reply to:  39 ; Changed 3 weeks ago by dcf

Replying to cohosh:

Just to give an update on this, building Tor Browser with this pion library is a bit painful right now. Our reproducible build system (rbm) doesn't work nicely with modules and, after a conversation with boklm, it's preferable to create a separate project for each go lib dependency. This means a total of 13 pion libraries plus an additional 14+ dependencies that these libraries have. There might be more; I stopped going down the rabbit hole after a while. I don't think creating 30-ish projects just to build this is a viable or sustainable option.

I'm going to try brute-force packaging all the dependency projects. The go mod graph command outputs a tree of dependencies. I'm going to use that to try and automate the creation of most of the dependency projects, probably followed by some manual editing.

comment:41 Changed 3 weeks ago by boklm

Cc: boklm added

comment:42 in reply to:  40 Changed 3 weeks ago by boklm

Replying to dcf:

Replying to cohosh:

Just to give an update on this, building Tor Browser with this pion library is a bit painful right now. Our reproducible build system (rbm) doesn't work nicely with modules and, after a conversation with boklm, it's preferable to create a separate project for each go lib dependency. This means a total of 13 pion libraries plus an additional 14+ dependencies that these libraries have. There might be more; I stopped going down the rabbit hole after a while. I don't think creating 30-ish projects just to build this is a viable or sustainable option.

I'm going to try brute-force packaging all the dependency projects. The go mod graph command outputs a tree of dependencies. I'm going to use that to try and automate the creation of most of the dependency projects, probably followed by some manual editing.

One idea that I didn't try yet but that could maybe help with this would be to create a generic go-module project, in order to be able to list all go dependencies in input_files without having to create a separate project for each. For example a project requiring goxnet and goxsys would include this in its input_files:

input_files:
 - project: go-module
   git_url: https://go.googlesource.com/net
   git_hash: 7dbad50ab5b31073856416cdcfeb2796d682f844
   var:
     go_module_name: goxnet
     go_lib: golang.org/x/net
     go_lib_install:
       - golang.org/x/net/proxy
 - project: go-module
   git_url: https://github.com/golang/sys
   git_hash: 11f53e03133963fb11ae0588e08b5e0b85be8be5
   var:
     go_module_name: goxsys
     go_lib: golang.org/x/sys
     go_lib_install:
       - golang.org/x/sys/cpu

In order to avoid cloning all modules in the same git_clones directory, projects/go-module/config would need to define git_clone_dir to something like:

git_clone_dir: 'git_clones/go-module/[% c("var/go_module_name") %]'

Changed 3 weeks ago by dcf

Attachment: gomodtorbm added

Rough script to produce rbm config files from go mod graph output.

comment:43 in reply to:  40 ; Changed 3 weeks ago by dcf

Replying to dcf:

I'm going to try brute-force packaging all the dependency projects.

Using the attached script, I generated projects for pion-webrtc and all its dependencies. The result is in a branch.

The script doesn't do everything by itself. Its raw output, basically a transcription of go mod graph, is here. I added fixup commits here (adding missing dependencies, enumerating multiple modules within one repo, removing duplicates) and here (expanding version numbers into hashes).

I discovered by accident that one of the dependencies, golang.org/x/sync, was not actually necessary, so I removed it. It's possible that there are more that can be removed. I'm not sure if this is a maintainer neglecting to run go mod tidy to remove an unused dependency, or what.

Overall, my impression so far is that this is not the way we want to continue doing things. The problem I foresee is maintenance across upgrades: the pion-webrtc module upgrades one of its dependencies, which causes a cascade of updated version requirements down the dependency tree. boklm's suggestion from comment:42 would prevent a proliferation of rbm projects, but we'll still want something to semi-automatically handle upgrades for us. We could of course use go itself--but that's a discussion for #28325.


I did a make testbuild using the pion-webrtc branch at 0f31d1bcfd71bba9ef27fab3b5a6d3231c60213d and put the outputs, including checksums, here:

https://people.torproject.org/~dcf/pt-bundle/tor-browser-pion-webrtc-20190829-0f31d1bcfd/

Everything builds, and from the command line I can run snowflake-client -h and see that it produces output, but unfortunately it doesn't bootstrap for me. But then again, neither does 3cc240625c from cohosh's pion branch from comment:28. So whatever is going wrong for me is possibly not related to the rbm build.

This is what I see in the snowflake-client log. After this, there's no more output for at least several minutes (that's as long as I waited).

2019/08/29 01:43:47 Rendezvous using Broker at: https://snowflake-broker.bamsoftware.com/
2019/08/29 01:43:47 WebRTC: Collecting a new Snowflake. Currently at [0/3]
2019/08/29 01:43:47 snowflake-UQ9COqlX3fZ5JMmA  connecting...
2019/08/29 01:43:47 Started SOCKS listener.
2019/08/29 01:43:47 SOCKS listening...
2019/08/29 01:43:47 WebRTC: PeerConnection created.
2019/08/29 01:43:47 WebRTC: DataChannel created.
2019/08/29 01:43:47 WebRTC: Created offer
2019/08/29 01:43:47 WebRTC: Set local description
2019/08/29 01:43:48 SOCKS accepted:  {[scrubbed]   map[]}

comment:44 in reply to:  43 ; Changed 3 weeks ago by cohosh

Replying to dcf:
Thanks for looking into this!

I did a make testbuild using the pion-webrtc branch at 0f31d1bcfd71bba9ef27fab3b5a6d3231c60213d and put the outputs, including checksums, here:

https://people.torproject.org/~dcf/pt-bundle/tor-browser-pion-webrtc-20190829-0f31d1bcfd/

Everything builds, and from the command line I can run snowflake-client -h and see that it produces output, but unfortunately it doesn't bootstrap for me. But then again, neither does 3cc240625c from cohosh's pion branch from comment:28. So whatever is going wrong for me is possibly not related to the rbm build.

This is what I see in the snowflake-client log. After this, there's no more output for at least several minutes (that's as long as I waited).

2019/08/29 01:43:47 Rendezvous using Broker at: https://snowflake-broker.bamsoftware.com/
2019/08/29 01:43:47 WebRTC: Collecting a new Snowflake. Currently at [0/3]
2019/08/29 01:43:47 snowflake-UQ9COqlX3fZ5JMmA  connecting...
2019/08/29 01:43:47 Started SOCKS listener.
2019/08/29 01:43:47 SOCKS listening...
2019/08/29 01:43:47 WebRTC: PeerConnection created.
2019/08/29 01:43:47 WebRTC: DataChannel created.
2019/08/29 01:43:47 WebRTC: Created offer
2019/08/29 01:43:47 WebRTC: Set local description
2019/08/29 01:43:48 SOCKS accepted:  {[scrubbed]   map[]}

Noting that I can reproduce this issue seemingly 100% of the time, I'll investigate whether it's due to recent changes to any of the pion libraries, since bootstrapping used to work.

Changed 3 weeks ago by dcf

Attachment: win8-firewall.png added

Screenshot of a Windows firewall popup on running commit a45ba4792f on Windows 8.

comment:45 Changed 3 weeks ago by dcf

Here are minor changes to build on Windows.

On running it, I got a popup from Windows Firewall asking if I wanted to allow snowflake-client.exe to do something (bind a UDP socket, if I had to guess). It runs, but does not bootstrap, producing a log similar to the one in comment:43.

Screenshot of a Windows firewall popup on running commit a45ba4792f on Windows 8.

Changed 3 weeks ago by cohosh

Attachment: vendor.zip added

A working directory of pion/webrtc dependencies

comment:46 in reply to:  44 ; Changed 3 weeks ago by cohosh

Replying to cohosh:

Replying to dcf:

Everything builds, and from the command line I can run snowflake-client -h and see that it produces output, but unfortunately it doesn't bootstrap for me. But then again, neither does 3cc240625c from cohosh's pion branch from comment:28. So whatever is going wrong for me is possibly not related to the rbm build.

This is what I see in the snowflake-client log. After this, there's no more output for at least several minutes (that's as long as I waited).

Noting that I can reproduce this issue seemingly 100% of the time, I'll investigate whether it's due to recent changes to any of the pion libraries, since bootstrapping used to work.

Update on this: I tried building with pion/webrtc v2.0.23 (using pion/sctp v1.6.4 and otherwise sticking to the versions specified in go.mod for webrtc v2.0.23) and it bootstraps fully.

There must have been breaking changes introduced to the pion webrtc library since then. I can investigate them, but I suggest for now that we use the pion dependencies we know work.

Replying to dcf:

Overall, my impression so far is that this is not the way we want to continue doing things. The problem I foresee is maintenance across upgrades: the pion-webrtc module upgrades ones of its dependencies, which causes a cascade of updated version requirements down the dependency tree. boklm's suggestion from comment:42 would prevent a proliferation of rbm projects, but we'll still want something to semi-automatically handle upgrades for us. We could of course use go itself--but that's a discussion for #28325.

I think the easiest way to go forward here is to take boklm's suggestion in https://trac.torproject.org/projects/tor/ticket/28325#comment:5 and just package up the directory supplied by go mod vendor. I've attached a zip file of working dependencies in vendor.zip above.

comment:47 in reply to:  43 Changed 3 weeks ago by dcf

Replying to dcf:

I did a make testbuild using the pion-webrtc branch at 0f31d1bcfd71bba9ef27fab3b5a6d3231c60213d and put the outputs, including checksums, here:

https://people.torproject.org/~dcf/pt-bundle/tor-browser-pion-webrtc-20190829-0f31d1bcfd/

I did another build of 0f31d1bcfd71bba9ef27fab3b5a6d3231c60213d after wiping out my out/ directory, and reproduced the same checksums.

comment:48 in reply to:  46 Changed 3 weeks ago by dcf

Replying to cohosh:

I think the easiest way to go forward here is to take boklm's suggestion in https://trac.torproject.org/projects/tor/ticket/28325#comment:5 and just package up the directory supplied by go mod vendor. I've attached a zip file of working dependencies in vendor.zip above.

Downloading a premade vendor.zip is a workable idea, but it does reduce the reproducible build's resistance to targeted attacks somewhat. To plant a backdoor in vendor.zip, an attacker would only have to subvert the computer of the developer that produces it (or the small number of developers who produce it and compare their copies with each other's). Once the vendor.zip is "blessed" with a checksum in a build script, no further builds will have a chance to detect the subterfuge. Maybe we could run the go mod vendor step in a steps: fetch_sources: step in projects/pion-webrtc/config instead? Compare how it was done for webrtc: projects/webrtc/config has a custom fetch_sources script that outputs a webrtc-sources-XXX.tar.gz, which is then used by projects/webrtc/build.

comment:49 in reply to:  46 ; Changed 3 weeks ago by dcf

Replying to cohosh:

Update on this: I tried building with pion/webrtc v2.0.23 (using pion/sctp v1.6.4 and otherwise sticking to the versions specified in go.mod for webrtc v2.0.23) and it bootstraps fully.

I tried this hint here:

However, it still doesn't work for me :/ Maybe I misunderstood what you suggested. I re-ran the gomodtorbm script with pion-webrtc at v2.0.23 and restored the manual fixups. Then I upgraded only pion-sctp to v1.6.4. You can see exactly the changes here. Does it look right?

It's also possible there's something wrong with my local network setup. Can someone try one of the bundles in https://people.torproject.org/~dcf/pt-bundle/tor-browser-pion-webrtc-20190901-6bfc0beea2/ and see if it works? The log output I get is different from what I got in comment:43 and comment:45. No data flows after the initial Buffered X bytes --> WebRTC.

2019/08/31 18:35:51 Negotiating via BrokerChannel...
Target URL:  snowflake-broker.azureedge.net 
Front URL:   ajax.aspnetcdn.com
2019/08/31 18:35:52 SOCKS accepted:  {[scrubbed]  map[]}
2019/08/31 18:35:52 BrokerChannel Response:
200 OK

2019/08/31 18:35:52 Received Answer.
2019/08/31 18:35:52 ---- Handler: snowflake assigned ----
2019/08/31 18:35:53 Buffered 322 bytes --> WebRTC
2019/08/31 18:35:58 Traffic Bytes (in|out): 0 | 322 -- (0 OnMessages, 1 Sends)
2019/08/31 18:36:02 WebRTC: At capacity [1/1]  Retrying in 10 seconds...
2019/08/31 18:36:12 WebRTC: At capacity [1/1]  Retrying in 10 seconds...
2019/08/31 18:36:22 WebRTC: At capacity [1/1]  Retrying in 10 seconds...
2019/08/31 18:36:23 WebRTC: No messages received for 30 seconds -- closing stale connection.
2019/08/31 18:36:23 WebRTC: closing PeerConnection
2019/08/31 18:36:23 WebRTC: Closing
2019/08/31 18:36:23 copy loop ended
2019/08/31 18:36:23 ---- Handler: closed ---
2019/08/31 18:36:23 SOCKS listening...
2019/08/31 18:36:23 synthesizing SIGTERM because of stdin close
2019/08/31 18:36:23 WebRTC: melted all 0 snowflakes.
2019/08/31 18:36:23 snowflake is done.

comment:50 in reply to:  49 Changed 3 weeks ago by cohosh

Replying to dcf:

Replying to cohosh:

Update on this: I tried building with pion/webrtc v2.0.23 (using pion/sctp v1.6.4 and otherwise sticking to the versions specified in go.mod for webrtc v2.0.23) and it bootstraps fully.

I tried this hint here:

However, it still doesn't work for me :/ Maybe I misunderstood what you suggested. I re-ran the gomodtorbm script with pion-webrtc at v2.0.23 and restored the manual fixups. Then I upgraded only pion-sctp to v1.6.4. You can see exactly the changes here. Does it look right?

It's also possible there's something wrong with my local network setup. Can someone try one of the bundles in https://people.torproject.org/~dcf/pt-bundle/tor-browser-pion-webrtc-20190901-6bfc0beea2/ and see if it works?

Interesting...

I downloaded https://people.torproject.org/~dcf/pt-bundle/tor-browser-pion-webrtc-20190901-6bfc0beea2/tor-browser-linux64-9.0a4_en-US.tar.xz and extracted the snowflake-client executable and ran it in snowbox.

The client bootstrapped fully to 100%. Logs are:

2019/09/01 13:30:25 Negotiating via BrokerChannel...
Target URL:   
Front URL:   localhost:8080
2019/09/01 13:30:27 SOCKS accepted:  {[scrubbed]  map[]}
2019/09/01 13:30:30 BrokerChannel Response:
200 OK

2019/09/01 13:30:30 Received Answer.
2019/09/01 13:30:30 ---- Handler: snowflake assigned ----
2019/09/01 13:30:30 Buffered 316 bytes --> WebRTC
2019/09/01 13:30:30 WebRTC: DataChannel.OnOpen
2019/09/01 13:30:30 Flushed 316 bytes.
2019/09/01 13:30:30 Traffic Bytes (in|out): 742 | 316 -- (1 OnMessages, 1 Sends)
2019/09/01 13:30:35 Traffic Bytes (in|out): 925165 | 348793 -- (123 OnMessages, 49 Sends)
2019/09/01 13:30:40 WebRTC: At capacity [1/1]  Retrying in 10 seconds...
2019/09/01 13:30:41 Traffic Bytes (in|out): 1813256 | 54848 -- (179 OnMessages, 48 Sends)
2019/09/01 13:30:48 Traffic Bytes (in|out): 5430 | 5430 -- (10 OnMessages, 10 Sends)
2019/09/01 13:30:49 WebRTC: closing DataChannel
2019/09/01 13:30:49 WebRTC: closing PeerConnection
2019/09/01 13:30:49 ConnectLoop: stopped.
2019/09/01 13:30:49 WebRTC: DataChannel.OnClose [locally]
2019/09/01 13:30:49 WebRTC: Closing
2019/09/01 13:30:49 WebRTC: melted all 1 snowflakes.
2019/09/01 13:30:49 copy loop ended
2019/09/01 13:30:49 ---- Handler: closed ---
2019/09/01 13:30:49 SOCKS listening...
2019/09/01 13:30:49 snowflake is done.

This was with a standalone proxy also running the pion library and with a standalone proxy without pion.

Then I ran this in snowbox with only a browser-based proxy running and I got the same failure where it doesn't bootstrap past 10%. Log:

2019/09/01 13:42:28 Negotiating via BrokerChannel...
Target URL:   
Front URL:   localhost:8080
2019/09/01 13:42:28 BrokerChannel Response:
200 OK

2019/09/01 13:42:28 Received Answer.
2019/09/01 13:42:38 WebRTC: At capacity [1/1]  Retrying in 10 seconds...
2019/09/01 13:42:48 WebRTC: At capacity [1/1]  Retrying in 10 seconds...
2019/09/01 13:42:50 SOCKS accepted:  {[scrubbed]  map[]}
2019/09/01 13:42:50 ---- Handler: snowflake assigned ----
2019/09/01 13:42:50 Buffered 321 bytes --> WebRTC
2019/09/01 13:42:55 Traffic Bytes (in|out): 0 | 321 -- (0 OnMessages, 1 Sends)
2019/09/01 13:42:58 WebRTC: At capacity [1/1]  Retrying in 10 seconds...
2019/09/01 13:42:58 WebRTC: No messages received for 30 seconds -- closing stale connection.
2019/09/01 13:42:58 WebRTC: closing PeerConnection
2019/09/01 13:42:58 WebRTC: Closing
2019/09/01 13:42:58 copy loop ended
2019/09/01 13:42:58 ---- Handler: closed ---
2019/09/01 13:42:58 SOCKS listening...

So a pion client must not be working well with the browser-based proxies. I tried running a non-pion client just to make sure it wasn't a problem with the proxy code, and that client bootstrapped fine.

comment:51 Changed 3 weeks ago by dcf

I tried also with pion-webrtc v2.1.3, released 5 days ago.

With it, I get the same failure as in comment:43 with v2.1.2: not even a Buffered X bytes --> WebRTC in the log.

I'm looking at some possibly relevant commits:

Parse DTLS setup in SetRemoteDescription
Take into consideration if remote is running DTLS as a client/server. Before we ignored this value and we could enter cases where DTLS would never connect. Resolves #494
Fix for Safari and latest Firefox
This fixes the echo program so it works properly on Safari and Firefox, where the preferred offered dynamic media type is not 96/VP8. It loads MediaEngine with codecs found in the offer and then uses the payload type of the offer's preferred video codec in the answer.

The latter commit rearranges some API-using code in examples/echo/main.go. Maybe we need to do the same?

There must be a problem either in pion-webrtc, or in the way we are using it. I think a good debugging step now is to make a minimal reproducing example (text chat between a command-line pion-webrtc program and a browser) and see where it is failing.

Last edited 2 weeks ago by dcf

comment:52 Changed 3 weeks ago by cohosh

Well, I can at least narrow down one problem I'm having when using just the master versions of the pion dependencies from the snowflake pion branch: ICE is not gathering any candidates.

For me the client isn't even contacting the broker, but stalling in exchangeSDP L298, waiting for an offer that is only sent once ICE gathering is completed L174. I don't see any log messages from OnICECandidate.

comment:53 in reply to:  52 Changed 2 weeks ago by cohosh

Replying to cohosh:

Well, I can at least narrow down one problem I'm having when using just the master versions of the pion dependencies from the snowflake pion branch: ICE is not gathering any candidates.

For me the client isn't even contacting the broker, but stalling in exchangeSDP L298, waiting for an offer that is only sent once ICE gathering is completed L174. I don't see any log messages from OnICECandidate.

Found the introduction of at least one breaking change here

This is the cause for ICE not working at the client side. It turns out Gather() was never getting called because the condition there failed, since we're trying to generate the offer, not the answer. We use the new Trickle method of gathering ICE candidates (this was necessary to get pion working at the client, as outlined in comment:28) and this is the only place that Gather is called for that method. I'm not sure why these specific changes at R839 and R1034 were merged, since they don't seem to have anything to do with the commit message and I can't figure out their purpose.

Last edited 2 weeks ago by cohosh

Changed 2 weeks ago by cohosh

Attachment: 0001-Allow-gathering-of-candidates-to-generate-offer.patch added

Patch for pion/webrtc bug

comment:54 Changed 2 weeks ago by cohosh

In addition to the issues above, which can be solved with the attached patch, the proxy is very slow to generate SDP answers, causing the broker to time out. Here are some logs:

2019/09/04 18:01:04 broker returns: 504
2019/09/04 18:01:04 Received offer.
2019/09/04 18:01:24 Setting remote description
2019/09/04 18:01:24 sdp offer successfully received.
2019/09/04 18:01:24 Generating answer...
2019/09/04 18:01:24 error sending answer to client through broker: broker returned 410

I added the {Received offer} log message here as a local change.

You can see that it takes 20 seconds between receiving the offer and generating an answer and after further instrumenting, it appears to be caused by the NewPeerConnection function here.

Note that the proxy-go instances don't use the trickle method and so ICE gathering needs to complete before this function returns. I'll try using the trickle method and setting an OnICEGatheringStateChange callback to set the answer when it completes.

To be honest, starting ICE gathering in NewPeerConnection doesn't make sense to me from a design point of view. I would expect it to occur in CreateAnswer instead (with the trickle method it starts in SetLocalDescription, which I also find unintuitive). The patch above also removes a call to Gather from SetRemoteDescription, which also confused me.
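
A sketch of that plan (hypothetical helper, assuming the v2 trickle API described in comment:28):

package sketch

import "github.com/pion/webrtc/v2"

// completeAnswer returns only after ICE gathering finishes, so the
// answer sent to the broker already contains all candidates, without
// blocking inside NewPeerConnection the way the non-trickle path does.
func completeAnswer(pc *webrtc.PeerConnection, offer webrtc.SessionDescription) (*webrtc.SessionDescription, error) {
	done := make(chan struct{})
	pc.OnICEGatheringStateChange(func(s webrtc.ICEGathererState) {
		if s == webrtc.ICEGathererStateComplete {
			close(done)
		}
	})
	if err := pc.SetRemoteDescription(offer); err != nil {
		return nil, err
	}
	answer, err := pc.CreateAnswer(nil)
	if err != nil {
		return nil, err
	}
	// With trickle enabled, SetLocalDescription starts gathering.
	if err := pc.SetLocalDescription(answer); err != nil {
		return nil, err
	}
	<-done
	return pc.LocalDescription(), nil
}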

comment:55 in reply to:  54 ; Changed 2 weeks ago by cohosh

Replying to cohosh:

In addition to the issues above, which can be solved with the attached patch, the proxy is very slow to generate SDP answers, causing the broker to time out. Here are some logs:

2019/09/04 18:01:04 broker returns: 504
2019/09/04 18:01:04 Received offer.
2019/09/04 18:01:24 Setting remote description
2019/09/04 18:01:24 sdp offer successfully received.
2019/09/04 18:01:24 Generating answer...
2019/09/04 18:01:24 error sending answer to client through broker: broker returned 410

I added the {Received offer} log message here as a local change.

You can see that it takes 20 seconds between receiving the offer and generating an answer and after further instrumenting, it appears to be caused by the NewPeerConnection function here.

To be honest, starting ICE gathering in NewPeerConnection doesn't make sense to me from a design point of view. I would expect it to occur in CreateAnswer instead (with the trickle method it starts in SetLocalDescription, which I also find unintuitive). The patch above also removes a call to Gather from SetRemoteDescription, which also confused me.

This turned out to be a local issue, but I still think it's a weird design.

Now I'm back to the issues found in comment:49. The client successfully completes the rendezvous/signaling and then is failing to open the data channel (which causes the proxy to eventually time out and keep polling).

comment:56 in reply to:  55 Changed 2 weeks ago by cohosh

Replying to cohosh:

Now I'm back to the issues found in comment:49. The client successfully completes the rendezvous/signaling and then is failing to open the data channel (which causes the proxy to eventually time out and keep polling).

Okay it seems to be working fine for me now (with this patch).

I enabled logging of pion trace and debug messages to the snowflake logger with this commit but this caused a race condition that prevented the client from opening a data channel.

comment:57 in reply to:  54 ; Changed 2 weeks ago by dcf

Replying to cohosh:

In addition to the issues above, which can be solved with the attached patch

I've started a build using the patch. The exact commit I'm building from is f52281ae5bca107414a5292e74e2f1eca0608a3b. Specifically, it makes the following changes relative to comment:51:

Replying to cohosh:

Now I'm back to the issues found in comment:49. The client successfully completes the rendezvous/signaling and then is failing to open the data channel (which causes the proxy to eventually time out and keep polling).

Replying to cohosh:

Okay it seems to be working fine for me now (with this patch).

I'm not sure what the expected outcome is now. Is attachment:0001-Allow-gathering-of-candidates-to-generate-offer.patch only meant to fix the post-2.0.23 errors in pion-webrtc that manifested in comment:43 and comment:51 and bring us back to the status quo ante of comment:49? Or does it fix everything in your tests and allow a complete bootstrap? Or only with proxy-go, not with browser-based proxies?

comment:58 in reply to:  57 Changed 2 weeks ago by cohosh

Replying to dcf:

Replying to cohosh:

In addition to the issues above, which can be solved with the attached patch

I've started a build using the patch. The exact commit I'm building from is f52281ae5bca107414a5292e74e2f1eca0608a3b. Specifically, it makes the following changes relative to comment:51:

That's correct. I added the logging commit for debugging purposes; it was helpful in tracking down the ICE issue but caused a race condition.

Replying to cohosh:

Now I'm back to the issues found in comment:49. The client successfully completes the rendezvous/signaling and then is failing to open the data channel (which causes the proxy to eventually time out and keep polling).

Replying to cohosh:

Okay it seems to be working fine for me now (with this patch).

I'm not sure what the expected outcome is now. Is attachment:0001-Allow-gathering-of-candidates-to-generate-offer.patch only meant to fix the post-2.0.23 errors in pion-webrtc that manifested in comment:43 and comment:51 and bring us back to the status quo ante of comment:49? Or does it fix everything in your tests and allow a complete bootstrap? Or only with proxy-go, not with browser-based proxies?

So it's bootstrapping for me to 100% by following the above procedure. I don't think we've solved the issues you were seeing though. I want to add a lock to the safelog write functions since the race issue there was definitely causing trouble that resulted in logs that matched what you were showing. It's very possible there are more race issues present.
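
The kind of safelog change meant here, as a minimal sketch (names approximate to snowflake's safelog package; the actual address-scrubbing logic is elided):

package safelog

import (
	"io"
	"sync"
)

// LogScrubber scrubs addresses out of log lines before writing them.
type LogScrubber struct {
	Output io.Writer
	mu     sync.Mutex // serializes concurrent writers
}

func (ls *LogScrubber) Write(b []byte) (int, error) {
	scrubbed := scrub(b)
	ls.mu.Lock()
	defer ls.mu.Unlock()
	if _, err := ls.Output.Write(scrubbed); err != nil {
		return 0, err
	}
	return len(b), nil // report the caller's length per io.Writer
}

// scrub is a placeholder for the actual scrubbing logic.
func scrub(b []byte) []byte { return b }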

I've found the PR in pion that added these bugs and am talking to them about it there: https://github.com/pion/webrtc/pull/763

The PR introduced this change to get a trickle-based example working. I think it will take some work to build some proof-of-concept programs and demonstrate the issue. This should break things for everyone that relies on STUN to craft offers before they receive an answer (note that the example program they made here has the answering peer send the offering peer their candidates directly, so the offering peer does not rely on ICE).

So, my plans are to:

  • implement locks for safelog
  • investigate this problem they mention in the PR about "giving candidates too early" (might be a bug in chromium?)
  • implement their trickle example differently to (1) demonstrate why this change is a problem if the offering peer isn't allowed to perform ICE, and (2) solve the above problem without breaking other use cases
  • create a patch for pion that reverts this breaking change and includes a working trickle example
  • follow up on whether dcf's build is still producing failures
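For the first item, a minimal sketch of the locking idea (not the actual safelog code; scrub here is a trivial stand-in for the real IP-redaction logic):

package safelog

import (
    "io"
    "sync"
)

// LogScrubber redacts sensitive data before writing; the mutex keeps
// concurrent loggers (ours and pion's) from racing on Output.
type LogScrubber struct {
    Output io.Writer
    lock   sync.Mutex
}

// scrub is a stand-in for the real redaction logic.
func scrub(b []byte) []byte { return b }

func (ls *LogScrubber) Write(b []byte) (int, error) {
    scrubbed := scrub(b)

    ls.lock.Lock()
    defer ls.lock.Unlock()
    if _, err := ls.Output.Write(scrubbed); err != nil {
        return 0, err
    }
    // Report len(b) so callers see their full input as consumed, even
    // though scrubbing may have changed the byte count.
    return len(b), nil
}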
Last edited 2 weeks ago by cohosh

comment:59 in reply to:  57 ; Changed 2 weeks ago by dcf

Replying to dcf:

Replying to cohosh:

In addition to the issues above, which can be solved with the attached patch

I've started a build using the patch. The exact commit I'm building from is f52281ae5bca107414a5292e74e2f1eca0608a3b. Specifically, it makes the following changes relative to comment:51:

Here is the build. It's working!

I tested the Linux and Windows builds, and also tried extracting snowflake-client and running it headless on a server. There are still user experience problems, but I think this is the first time we've had an rbm-built Tor Browser bootstrapping on Windows.

  • Bootstrapping on Windows took a long time, about 10 minutes. I tried again, after deleting the installation directory to remove the consensus cache, and it took about 7 minutes.
  • I'm having trouble actually loading a web page after bootstrapping, though the browser does notice that a 9.0a5 update is available and presumably starts downloading it. I only got it to load example.com once. This may be caused by a slow proxy, or #25429, or something else. I see a lot of [NOTICE] We tried for 15 seconds to connect to '[scrubbed]'... in the logs. We could try a larger CircuitBuildTimeout (see the snippet after this list).
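On the second point, a hypothetical torrc tweak (not something settled in this ticket) that pins the circuit build timeout above the 60-second default and stops Tor from adaptively adjusting it:

# Give slow snowflake proxies more time to build circuits.
CircuitBuildTimeout 120
LearnCircuitBuildTimeout 0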

I went back and checked to see if perhaps pion-webrtc v2.0.23 from comment:49 had really been working all along, and I had just failed to test it properly. I extracted its snowflake-client again and tried bootstrapping about a dozen times: twice it got to 25% and then failed; the other times it behaved as in comment:49. So I don't know what to make of that. Maybe it happened to get lucky and hit a proxy-go instance those two times, which would be consistent with the observation in comment:50.

Last edited 2 weeks ago by dcf

comment:60 in reply to:  59 Changed 2 weeks ago by cypherpunks

Replying to dcf:

but I think this is the first time we've had an rbm-built Tor Browser bootstrapping on Windows.

🎉🎉🎉

comment:61 Changed 2 weeks ago by cohosh

Okay, I have a PR under submission for pion/webrtc: https://github.com/pion/webrtc/pull/816

As mentioned in https://github.com/pion/webrtc/pull/763/#issuecomment-528224065, it's worth it for us to revisit the problems we were having in comment:28 that caused us to use the trickle ICE gathering method.

I still think this library is the way to go, since it seems to be our best path forward for getting Snowflake building on Windows (and probably Android as well). However, in the short time we've been working with it, we've already seen a minor version upgrade require several days of development time on our end to get things working again. I pointed out a concern I have with how this specific change was made in my comment on the PR that merged it: https://github.com/pion/webrtc/pull/763/#issuecomment-528967735
The commit that broke things for us significantly altered the behavior of the core WebRTC API, and it was made only to get an unusually constructed example working. That development practice isn't what I'd prefer in a library we rely on so heavily.

Like I said, I still like this path forward: I think it will be far easier to maintain than libwebrtc, and it has many other advantages. The development team at pion has been very enthusiastic and helpful in addressing our issues and merging fixes.

comment:62 Changed 2 weeks ago by Sean-Der

@cohosh, your changes have landed in master! Thank you so much for fixing that, and I really apologize for the regression. I added a test in https://github.com/pion/webrtc/commit/9fa65c214765c65c118b03079b8f81ff365d4d6e to make sure we don't mess things up in the future.

OnNegotiationNeeded is no longer supported, but putting the code from that callback into a goroutine and calling it immediately after creation of the PeerConnection seems to work just fine.

This just doesn't exist yet because the work hasn't been put in (but it should!). I filed a ticket and will get that in.
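A rough sketch of that goroutine workaround, assuming a pion/webrtc v2 API object (import "github.com/pion/webrtc/v2"; the function name and the negotiation body are placeholders):

func dialPeer(api *webrtc.API, config webrtc.Configuration) (*webrtc.PeerConnection, error) {
    pc, err := api.NewPeerConnection(config)
    if err != nil {
        return nil, err
    }
    // In place of an OnNegotiationNeeded callback, kick off
    // negotiation immediately after the PeerConnection exists.
    go func() {
        // ... create the data channel, generate the offer, and hand
        // the SDP to the signaling channel ...
    }()
    return pc, nil
}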

By default, something called the "trickle method" is set to false

We do want to do this! It will just be an API breakage; when building the original version, it was just easier not to do Trickle ICE. We use the SettingEngine for non-portable behavior.
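A minimal sketch of opting in, assuming v2's SettingEngine API (fragment; imports and error handling elided):

s := webrtc.SettingEngine{}
s.SetTrickle(true) // v2 ships with trickle ICE disabled by default
api := webrtc.NewAPI(webrtc.WithSettingEngine(s))
pc, err := api.NewPeerConnection(webrtc.Configuration{})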

OnICECandidate returns a nil candidate when gathering is complete

This is to match the behavior of the W3C/ECMAScript implementation:

 If the event's candidate property is null, ICE gathering has finished

If this is a sharp edge we should change/fix it! Any ideas on how you would expect this to work?
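One way the nil sentinel can serve snowflake's non-trickle signaling, sketched under the v2 API above (the function name is illustrative; pc is an existing connection with trickle enabled): wait for gathering to finish so the local description carries every candidate before it goes to the broker in a single message.

func offerWithAllCandidates(pc *webrtc.PeerConnection) (*webrtc.SessionDescription, error) {
    gatherDone := make(chan struct{})
    pc.OnICECandidate(func(c *webrtc.ICECandidate) {
        if c == nil { // nil candidate == ICE gathering has finished
            close(gatherDone)
        }
    })

    offer, err := pc.CreateOffer(nil)
    if err != nil {
        return nil, err
    }
    // Gathering starts once the local description is set.
    if err := pc.SetLocalDescription(offer); err != nil {
        return nil, err
    }
    <-gatherDone
    // The local description now includes all gathered candidates and
    // can be sent through the rendezvous channel in one shot.
    return pc.LocalDescription(), nil
}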

comment:63 in reply to:  59 Changed 5 days ago by cohosh

Replying to dcf:

Here is the build. It's working!

I built tor browser using this branch and got the following sums:

a95656c0a1d220f51ebb7c02264ca14a6dfcaf1f49f9c1eb2b100b8912202fbb  tor-browser-9.0a4-linux-x86_64-95fcd7/mar-tools-linux64.zip
313467567d9f85b11f9fc075f2a450d0fa5daa575a17635a0852fbdecbe7163f  tor-browser-9.0a4-linux-x86_64-95fcd7/tor-browser-linux64-9.0a4_en-US.tar.xz
f2e28bd473b7fc21dc3a4ce0b1c4931de1cc2bbba643cb790a94cd23b5f8a44f  tor-browser-9.0a4-linux-x86_64-95fcd7/tor-browser-linux64-debug.tar.xz
05be2469134ab700ff06a464eaef4c5ae95252810158c6490ab1830d65c1e5df  tor-browser-9.0a4-linux-x86_64-95fcd7/tor-linux64-debug.tar.xz
3c44f8039334230540ca09eeea7b478eb6d2b94186b4f15f65b39d5b5afa8654  tor-browser-9.0a4-linux-i686-7088f9/mar-tools-linux32.zip
b154ec37b41f515a4618a4f2602c5fca082e7cbeb8c14ef6cc85788c156cd200  tor-browser-9.0a4-linux-i686-7088f9/tor-browser-linux32-9.0a4_en-US.tar.xz
08d05d0573f41f55d95bcca4c61374491469ddde220ddd51d9f32e754c395a16  tor-browser-9.0a4-linux-i686-7088f9/tor-browser-linux32-debug.tar.xz
0261e8109dc380f197a24e3e4799182916a060065a97bf66926f7e3f01a0523b  tor-browser-9.0a4-linux-i686-7088f9/tor-linux32-debug.tar.xz
aaf162ecc77e214c04ce78c4eb68ed2f7d239efbc6ad61daeb8af9f9e8d41018  tor-browser-9.0a4-osx-x86_64-bae6f1/TorBrowser-9.0a4-osx64_en-US.dmg
659e4e31ac2875ff24f667040f13a93caaddf2be73a742229bc64c6552af55dd  tor-browser-9.0a4-osx-x86_64-bae6f1/mar-tools-mac64.zip
7d5a8353bde39d0bafb40d9c03eb4f2b28cddd8306bf429e811f1caa9435189c  tor-browser-9.0a4-windows-i686-bc4573/mar-tools-win32.zip
27bee134cf8f4c700729c6d4662bee2c23db3ddf27164e878592091e9c5d6fc8  tor-browser-9.0a4-windows-i686-bc4573/torbrowser-install-9.0a4_en-US.exe
f3fadbac79c94e22eacb5902582bec5dbd303862768fc07c8d4f1bba469a8c5c  tor-browser-9.0a4-windows-x86_64-745fee/mar-tools-win64.zip
ed2a538d8e4b1aecc555e6b1683a93f4096184015549ed9992526b43759f0c9c  tor-browser-9.0a4-windows-x86_64-745fee/torbrowser-install-win64-9.0a4_en-US.exe

Weirdly, the sum for TorBrowser-9.0a4-osx64_en-US.dmg differs, but all the others look the same.

I couldn't build for Android due to #31293.

I'm going to update the version of pion used here, since our fixes have been upstreamed.

comment:64 Changed 4 days ago by cohosh

Status: accepted → needs_review

I'll put this into needs_review now. Here's a commit that bumps the version of webrtc and removes the need for the patch: https://github.com/cohosh/tor-browser-build/commit/873685ba2a4756176bf2a680563f09d297cd3a50

And the rebased pion branch of snowflake: https://github.com/cohosh/snowflake/compare/pion

While being able to use the pion logs would be nice, we need Go 1.13 to do it nicely, and we haven't yet bumped the Go version for Tor Browser builds. I would like to cherry-pick adding locks to safelog, though.

Note also that starting with v2.1.4, pion/webrtc is moving to Go 1.13. I'm not sure yet if it can still be built with Go 1.12.

comment:65 Changed 3 days ago by cohosh

I did a really quick speed check of the difference between the pion library and the standalone Chrome library we were using: I attempted to bootstrap through snowflakes 100 times each with a pion-based client and with the old version of the client.

EDIT: The time measured here is the time between starting the snowflake client and the moment a datachannel was opened.
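As a sketch of that measurement (assumed, not the actual probe code; dc is the client's pion data channel, and time/fmt imports are elided):

start := time.Now()
opened := make(chan struct{})
dc.OnOpen(func() { close(opened) })

select {
case <-opened:
    fmt.Printf("time %d\n", int(time.Since(start).Seconds()))
case <-time.After(20 * time.Second):
    fmt.Println("time 20") // bootstrap timed out before the channel opened
}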

> head(old)
                         runid case time
1  snowflake-snowflake-probe-0  old    2
2 snowflake-snowflake-probe-10  old   20
3 snowflake-snowflake-probe-11  old   20
4 snowflake-snowflake-probe-12  old    1
5 snowflake-snowflake-probe-13  old    2
6 snowflake-snowflake-probe-14  old   20
> pion = subset(data, case == "pion")
> head(pion)
                           runid case time
101  snowflake-snowflake-probe-0 pion   20
102 snowflake-snowflake-probe-10 pion    4
103 snowflake-snowflake-probe-11 pion   20
104 snowflake-snowflake-probe-12 pion   20
105 snowflake-snowflake-probe-13 pion    4
106 snowflake-snowflake-probe-14 pion    4
> mean(pion$time)
[1] 11.99
> mean(old$time)
[1] 10.12

A time of 20 means that the bootstrap timed out before a datachannel opened, so filtering out all timeouts:

> pion = subset(pion, time !=20)
> old = subset(old, time !=20)
> mean(pion$time)
[1] 4.886792
> mean(old$time)
[1] 2.357143
> sd(pion$time)
[1] 1.489197
> sd(old$time)
[1] 1.085919
> length(pion$time)
[1] 53
> length(old$time)
[1] 56

The larger issue here is clearly that about half of all connections, regardless of library, take longer than 20 seconds to open a datachannel. The pion library looks slower, but a difference of 2 seconds in the context of 20 is really not what we need to be worrying about.

Last edited 3 days ago by cohosh