Opened 5 weeks ago

Closed 5 weeks ago

Last modified 5 weeks ago

#29861 closed defect (fixed)

Snowflake is not working right now for some reason

Reported by: cypherpunks Owned by: cohosh
Priority: Medium Milestone:
Component: Obfuscation/Snowflake Version:
Severity: Normal Keywords:
Cc: dcf, arlolra, cohosh Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description


Child Tickets

Change History (9)

comment:1 Changed 5 weeks ago by cohosh

Owner: set to cohosh
Status: newassigned

Looks like the proxy-go instances are deadlocked (as reported in #25688). Not marking as duplicate yet because an attempt to restart them seems to be causing another deadlock.

comment:2 Changed 5 weeks ago by cohosh

We have three proxy-go instances set up to periodically restart every 17h, 23h, 29h (as per #25688). However from the logs it looks like the instances have deadlocked immediately upon the last few restarts:

$ sudo tail -f /home/snowflake-proxy/snowflake-proxy-standalone-23h.log.d/current 
INFO: peerconnection.go:504: fired OnIceConnectionStateChange:  41
INFO: peerconnection.go:436: fired OnSignalingStateChange:  41 Closed
2019/03/22 06:31:37 OnClose channel
2019/03/22 06:31:37 datachannelHandler ends
2019/03/22 09:05:25 starting
INFO: configuration.go:174: Created Configuration at  &{[{[stun:stun.l.google.com:19302]  }] All Balanced }
2019/03/22 16:21:26 starting
INFO: configuration.go:174: Created Configuration at  &{[{[stun:stun.l.google.com:19302]  }] All Balanced }
2019/03/22 16:21:43 starting
INFO: configuration.go:174: Created Configuration at  &{[{[stun:stun.l.google.com:19302]  }] All Balanced }

There was a deadlock around 06:31UTC and a restart at 09:05UTC that immediately deadlocked before I restarted again at 16:21UTC (which again immediately deadlocked)

comment:3 Changed 5 weeks ago by cohosh

A quick gdb investigation shows it's stalling while polling the broker:

    req, _ := http.NewRequest("POST", broker.String(), bytes.NewBuffer([]byte(sid)))
    req.Header.Set("X-Session-ID", sid)
    resp, err := client.Do(req)

More specifically, it stalls in the net/http library here: https://golang.org/src/net/http/client.go?s=17311:17363#L641

comment:4 Changed 5 weeks ago by cohosh

A manual POST to the broker's /proxy handler also hangs: $ curl -H "X-Session-ID: CDSNV6qqDTc6SCeA0BfWRQ==" -d "CDSNV6qqDTc6SCeA0BfWRQ==" https://snowflake-broker.bamsoftware.com/proxy.

Last edited 5 weeks ago by cohosh (previous) (diff)

comment:5 Changed 5 weeks ago by cohosh

This is definitely a problem with the broker. The last log output shows 2019/03/22 06:25:54 Received answer: and it is now 17:11UTC

Both the /proxy and /client handlers hang with curl. The /debug handler functions as expected, and the /answer handler returns a 410 status code:

$ curl -I https://snowflake-broker.bamsoftware.com/debug
HTTP/2 200 
access-control-allow-headers: Origin, X-Session-ID
access-control-allow-origin: *
content-type: text/plain; charset=utf-8
content-length: 52
date: Fri, 22 Mar 2019 17:16:43 GMT
$ curl -I https://snowflake-broker.bamsoftware.com/answer
HTTP/2 410 
access-control-allow-headers: Origin, X-Session-ID
access-control-allow-origin: *
date: Fri, 22 Mar 2019 17:15:47 GMT
Last edited 5 weeks ago by cohosh (previous) (diff)

comment:6 Changed 5 weeks ago by cohosh

Ahhh here is the problem: No space left on device

Investigating...

comment:7 Changed 5 weeks ago by dcf

Good catch on this. As you found, the cause is a disk full of broker logs.

$ df -h /
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvda1      5.0G  5.0G   20K 100% /
$ du -sh /var/log/snowflake-broker
4.2G    /var/log/snowflake-broker

The final lines in the log are (the last line is truncated):

2019/03/22 06:25:54 Proxy HqhbGCj+n6TCwOs/FB0HdA did not receive a Client offer.
2019/03/22 06:25:54 Received snowflake:  HqhbGCj+n6TCwOs/FB0HdA
2019/03/22 06:25:54 Received answer:

A similar thing happened with the proxy-go logs. comment:6:ticket:28390 describes what we did (rotation and compression) to fix it.

Do you want to try doing those steps here? Or I can do it. Stop the broker with sv stop snowflake-broker so it stops trying to write logs. Then free up some space; usually apt autoclean and apt autoremove work. Compress the old log with xz -v /var/log/snowflake-broker/current. Then modify the service files; on the snowflake bridge host, you can see the runit changes made in /etc for proxy-go with git show 18d2e70837f8434827ff2d92f1c1901f7be29569. Then, you'll also need to edit the config file in /var/log/snowflake-broker/ as shown. Finally restart with sv start snowflake-broker.

comment:8 Changed 5 weeks ago by cohosh

Resolution: fixed
Status: assignedclosed

Set up compressed and rotating logs. Things look like they are now back up and running.

comment:9 in reply to:  7 Changed 5 weeks ago by dcf

Replying to dcf:

Stop the broker with sv stop snowflake-broker so it stops trying to write logs. Then free up some space; usually apt autoclean and apt autoremove work. Compress the old log with xz -v /var/log/snowflake-broker/current. Then modify the service files; on the snowflake bridge host, you can see the runit changes made in /etc for proxy-go with git show 18d2e70837f8434827ff2d92f1c1901f7be29569. Then, you'll also need to edit the config file in /var/log/snowflake-broker/ as shown. Finally restart with sv start snowflake-broker.

A few changes we had to make to this plan: there wasn't enough disk space to even run apt autoclean, so we had to download the log file, delete it from the server, and re-upload a compressed version after we'd finished the other steps. We had to stop and start the snowflake-broker/log service separately from the snowflake-broker one. We didn't need to modify any service files, because it turns out that the snowflake-broker service was already set up to log using svlogd. So we only had to edit the /var/log/snowflake-broker/config file to add rotation and compression, as in comment:6:ticket:28390.

Timestamp for the first log message after being restarted:

2019/03/22 18:34:02 ACME hostnames: ["snowflake-broker.bamsoftware.com"]
Last edited 5 weeks ago by dcf (previous) (diff)
Note: See TracTickets for help on using tickets.