Opened 12 months ago

Closed 3 months ago

#31278 closed defect (fixed)

Chrome proxies hang with open idle connection

Reported by: cypherpunks
Owned by:
Priority: Medium
Milestone:
Component: Circumvention/Snowflake
Version:
Severity: Normal
Keywords: snowflake-webextension
Cc: arlolra, cohosh, phw, dcf
Actual Points:
Parent ID:
Points: 2
Reviewer: dcf
Sponsor:

Description

Setup: Chrome 75.0.3770.142 on Windows, with addon 0.0.7 and the latest static page at https://snowflake.torproject.org/snowflake, committed 2019-07-27. I started Chrome, enabled the addon, and opened 3 tabs with the static page. The addon and all 3 tabs established good client connections within an hour or two, exchanged data in both directions for a while, and then stalled with the client connection still open, which prevented them from serving any other clients.

All 4 client connections have been stuck in this state for 2 days now. The addon shows 1 connected and 0 in the past 24 hours, never changing, and the 3 tabs just repeat "websocket --> WebRTC data: 543 bytes" every couple of minutes, with no change in transfer size and no data flowing in the other direction. The 3 tabs show connections to clients at different IP addresses, so I don't think it's just a user leaving their client window open forever.

A similar test on Firefox with the same deployment date shows more expected behavior: each tab or addon serves a few clients per day, closes each connection after transferring data for a while, and later serves another client (so the dropped broker connection bug seems fixed there).

Change History (10)

comment:1 Changed 12 months ago by dcf

There's a new ticket at https://github.com/keroserene/go-webrtc/issues/107 that has to do with Chrome.

According to https://bugs.chromium.org/p/webrtc/issues/detail?id=9484 new versions of Chrome are sending a new offer format, and the answer cannot be generated.

For example, for RemoteDescription with info

m=application 54111 UDP/DTLS/SCTP webrtc-datachannel
a=sctp-port:5000

pc.CreateAnswer does not produce any result - no error, no answer

Similar issue for the node implementation - node-webrtc/node-webrtc#483
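
To make the format difference concrete, here is a minimal sketch that classifies the data-channel media line of an SDP blob. The function name sctpOfferFormat is hypothetical, and the old-style sample (the "DTLS/SCTP 5000" media line with a=sctpmap) is my assumption of what older Chrome versions sent; it is not taken from go-webrtc or the Chromium bug.

```javascript
// Classify the data-channel ("m=application") section of an SDP blob.
// Returns "new" for the sctp-port style quoted above, "old" for the
// assumed sctpmap style, "unknown" for anything else, and "none" if
// there is no application section at all.
function sctpOfferFormat(sdp) {
  const lines = sdp.split(/\r?\n/);
  const media = lines.find((l) => l.startsWith("m=application "));
  if (media === undefined) return "none";
  // Media line layout: m=<media> <port> <proto> <fmt>
  const [, , proto, fmt] = media.split(" ");
  if (proto === "UDP/DTLS/SCTP" && fmt === "webrtc-datachannel") return "new";
  if (proto === "DTLS/SCTP" && /^\d+$/.test(fmt)) return "old";
  return "unknown";
}

// Sample from this ticket (new style).
const newStyle =
  "m=application 54111 UDP/DTLS/SCTP webrtc-datachannel\r\n" +
  "a=sctp-port:5000\r\n";
// Assumed old-style equivalent, for comparison only.
const oldStyle =
  "m=application 54111 DTLS/SCTP 5000\r\n" +
  "a=sctpmap:5000 webrtc-datachannel 1024\r\n";

console.log(sctpOfferFormat(newStyle), sctpOfferFormat(oldStyle));
```

A check like this only identifies which format an offer uses; it does not make an older library able to answer it.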

comment:2 Changed 12 months ago by cohosh

Points: 2

comment:3 in reply to:  1 Changed 3 months ago by cohosh

Replying to dcf:

This specific cause should no longer be an issue for us, since we added the datachannel timeout in #31100 (it also looks like it's been fixed upstream in wrtc at least). I'm still seeing that Chrome takes a very long time to realize the datachannel has been closed by the client. I started running a proxy in my local setup and killed the client once the datachannel opened. Two hours later, my Chrome proxy still hasn't detected that the channel has been closed.

My guess is this is a bug upstream that should be fixed, but in the short term, we could use something like the check for staleness at the client side to close the connection after 10-30 seconds of inactivity (especially since the client is supposed to be sending heartbeat messages every 10 seconds anyway).
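
A rough sketch of what such a proxy-side staleness check could look like. This is not the code from the eventual patch: the class name IdleWatchdog, the injectable timers argument, and the 20-second value in the usage comment are all assumptions for illustration.

```javascript
// Hypothetical helper: calls onStale if no message arrives within
// timeoutMs. The timer functions are injectable so the logic can be
// exercised without waiting in real time.
class IdleWatchdog {
  constructor(timeoutMs, onStale, timers) {
    this.timeoutMs = timeoutMs;
    this.onStale = onStale;
    this.timers = timers || {
      set: (fn, ms) => setTimeout(fn, ms),
      clear: (handle) => clearTimeout(handle),
    };
    this.handle = null; // null means no timer is armed
  }

  // Call when the channel opens and again on every incoming message.
  touch() {
    this.stop();
    this.handle = this.timers.set(() => {
      this.handle = null;
      this.onStale();
    }, this.timeoutMs);
  }

  // Call when the connection closes normally.
  stop() {
    if (this.handle !== null) {
      this.timers.clear(this.handle);
      this.handle = null;
    }
  }
}

// Possible wiring to a WebRTC data channel (browser only, for context;
// `channel` and `relay` are placeholders):
//   const dog = new IdleWatchdog(20000, () => channel.close());
//   channel.onopen = () => dog.touch();
//   channel.onmessage = (ev) => { dog.touch(); relay(ev.data); };
//   channel.onclose = () => dog.stop();
```

Because the client sends a heartbeat every 10 seconds, any timeout comfortably above that interval should only fire when the client has really gone away.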

comment:4 Changed 3 months ago by cohosh

I'm at 5.5 hours now and it's still going.

comment:5 Changed 3 months ago by cohosh

Status: new → needs_revision

Here's a workaround fix that takes advantage of our new keep-alive pings: https://github.com/cohosh/snowflake-webext/pull/4

I actually haven't been able to reproduce the behaviour today but I did test to make sure it doesn't interfere with regular proxy behaviour. Not sure if something changed upstream or if it's actually more difficult to reproduce than I thought.

comment:6 Changed 3 months ago by cohosh

Hmm, okay this isn't ready yet.

comment:7 Changed 3 months ago by cohosh

Status: needs_revision → needs_review

Okay now it's ready, just had to update some tests too.

comment:8 Changed 3 months ago by dcf

Reviewer: dcf

comment:9 Changed 3 months ago by dcf

Status: needs_review → merge_ready

Looks okay. I would suggest false for the default value of messageTimer rather than 0, because it's more obviously false in an if statement and less likely to be confused with a count of milliseconds. I would also increase messageTimeout by 5 seconds or so, so it's well above SnowflakeTimeout in the client--just so that the sequence of events is deterministic and we don't have the client and proxy racing to close the connection with equal timeouts.
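
To illustrate both suggestions, a small sketch. The 30-second client timeout is an assumed value used only for the arithmetic (the real SnowflakeTimeout lives in the client source and may differ), and firstToClose, arm, and disarm are hypothetical helpers, not functions from the patch.

```javascript
// Assumed values for illustration only.
const CLIENT_SNOWFLAKE_TIMEOUT_MS = 30000; // assumption, not read from the client
const CLOSE_MARGIN_MS = 5000;              // "5 seconds or so"
const MESSAGE_TIMEOUT_MS = CLIENT_SNOWFLAKE_TIMEOUT_MS + CLOSE_MARGIN_MS;

// Which side gives up first once traffic stops? Equal timeouts mean the
// outcome depends on scheduling, which is the race described above.
function firstToClose(clientTimeoutMs, proxyTimeoutMs) {
  if (clientTimeoutMs === proxyTimeoutMs) return "race";
  return clientTimeoutMs < proxyTimeoutMs ? "client" : "proxy";
}

// `false` as the "no timer armed" sentinel: it reads as a flag in an
// `if`, and cannot be mistaken for a duration of 0 milliseconds.
const state = { messageTimer: false };

function arm(state, handle) {
  state.messageTimer = handle;
}

function disarm(state, clear) {
  if (state.messageTimer) {
    clear(state.messageTimer);
    state.messageTimer = false;
  }
}

console.log(firstToClose(CLIENT_SNOWFLAKE_TIMEOUT_MS, MESSAGE_TIMEOUT_MS));
```

With the margin in place the client always times out first, and the proxy timeout only matters when the client has vanished without closing the channel.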

comment:10 Changed 3 months ago by cohosh

Resolution: fixed
Status: merge_ready → closed

Merged at 41009