Opened 4 months ago

Last modified 3 months ago

#25688 new defect

proxy-go is still deadlocking occasionally

Reported by: dcf Owned by:
Priority: Low Milestone:
Component: Obfuscation/Snowflake Version:
Severity: Normal Keywords:
Cc: dcf, arlolra Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

The three fallback proxy-go instances are still hanging after varying delays of a few days, even after removing all the memory restrictions I discussed in comment:64:ticket:21312.

The more heavily used instances seem to deadlock sooner: those serving the currently used broker were more likely to stop than those serving the standalone broker, though the standalone-broker ones eventually stopped too.

In the meantime, I've put the fallback proxies back on periodic restarts. Before, the intervals were 1h, 2h, 10h; now I've increased them to 17h, 23h, 29h (prime numbers, so the average time before the next restart is < 17h).
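For illustration, here is a minimal Go sketch of what such a periodic-restart wrapper amounts to: run the proxy, kill it when the interval elapses, and start it again. The binary name and empty command line are assumptions; the ticket doesn't say how the restarts are actually scheduled.

// restartloop.go: hypothetical sketch of a periodic-restart wrapper (not the
// actual deployment): run the proxy, kill it after a fixed interval, repeat.
package main

import (
	"context"
	"log"
	"os"
	"os/exec"
	"time"
)

func main() {
	// One of the staggered intervals mentioned above (17h, 23h, 29h).
	const interval = 17 * time.Hour

	for {
		ctx, cancel := context.WithTimeout(context.Background(), interval)
		// "./proxy-go" with no flags is an assumption; the real command line
		// for the fallback proxies is not given in this ticket.
		cmd := exec.CommandContext(ctx, "./proxy-go")
		cmd.Stdout = os.Stdout
		cmd.Stderr = os.Stderr
		log.Println("starting proxy-go")
		if err := cmd.Run(); err != nil {
			log.Println("proxy-go exited:", err)
		}
		cancel()
	}
}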

I'll update this ticket with a graph showing uptimes when I have time.

Child Tickets

Attachments (4)

proxy-go-log.20180403.zip (2.0 MB) - added by dcf 4 months ago.
proxy-go-fd.20180403.png (30.7 KB) - added by dcf 4 months ago.
proxy-go-mem.20180403.png (49.5 KB) - added by dcf 4 months ago.
proxy-go-starting.20180403.png (130.8 KB) - added by dcf 4 months ago.

Change History (7)

Changed 4 months ago by dcf

Attachment: proxy-go-log.20180403.zip added

Changed 4 months ago by dcf

Attachment: proxy-go-fd.20180403.png added

Changed 4 months ago by dcf

Attachment: proxy-go-mem.20180403.png added

Changed 4 months ago by dcf

Attachment: proxy-go-starting.20180403.png added

comment:1 in reply to: description Changed 4 months ago by dcf

Replying to dcf:

In the meantime, I've put the fallback proxies back on periodic restarts. Before, the intervals were 1h, 2h, 10h; now I've increased them to 17h, 23h, 29h (prime numbers, so the average time before the next restart is < 17h).

I'll update this ticket with a graph showing uptimes when I have time.

This graph (attachment proxy-go-starting.20180403.png) shows restarts (╳) as well as polls and data transfer, so you can see where they got stuck after 3 or 4 days.

The rows 1h, 2h, 10h, 17h, 23h, 29h are set to periodically restart with no memory limit. The rows a, b, c ran with different memory limits at different times:

2018-03-22 03:19:19 all ulimit -v 400 MB
2018-03-22 05:16:59 a appengine unlimited, c appengine ulimit -v 800 MB, all others ulimit -v 400 MB
2018-03-23 01:02:18 a appengine unlimited, all others ulimit -m 200 MB
2018-03-27 19:26:12 a, b, c unlimited

ulimit -v caused deadlocks quickly; with ulimit -m they took longer but still happened, and they still happened even with no ulimit at all.
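For reference, ulimit -v corresponds to RLIMIT_AS (virtual address space) and ulimit -m to RLIMIT_RSS. A minimal Go sketch of imposing the same kind of cap from inside a process on Linux (hypothetical; the limits above were set from the shell, not inside proxy-go) could look like:

// setlimit.go: hypothetical sketch of applying the equivalent of a 400 MB
// "ulimit -v" (RLIMIT_AS, virtual address space) from inside a Linux process.
package main

import (
	"fmt"
	"log"
	"syscall"
)

func main() {
	limit := syscall.Rlimit{
		Cur: 400 << 20, // soft limit: 400 MB
		Max: 400 << 20, // hard limit: 400 MB
	}
	if err := syscall.Setrlimit(syscall.RLIMIT_AS, &limit); err != nil {
		log.Fatal("Setrlimit: ", err)
	}
	var got syscall.Rlimit
	if err := syscall.Getrlimit(syscall.RLIMIT_AS, &got); err != nil {
		log.Fatal("Getrlimit: ", err)
	}
	fmt.Printf("RLIMIT_AS soft=%d hard=%d\n", got.Cur, got.Max)
}

The Go runtime reserves a sizeable amount of virtual address space up front, which is one plausible reason an address-space cap like ulimit -v caused trouble so quickly.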

[memory use graph: proxy-go-mem.20180403.png]

Memory use plateaus around the time of the hangs, though that may be a result rather than a cause. The server has 2 GB and memory use doesn't get close to that. (Beware: the horizontal axis doesn't quite line up with the graph above.)

File descriptors seem stable.
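As an aside on where numbers like these can come from, here is a hypothetical Go sketch that samples heap size and open file descriptors once a minute; this is not a claim about how the attached graphs were actually produced, just an illustration of the kind of instrumentation involved.

// monitor.go: hypothetical sketch of sampling heap use and open file
// descriptors from inside a Go process on Linux.
package main

import (
	"log"
	"os"
	"runtime"
	"time"
)

func main() {
	for range time.Tick(time.Minute) {
		var m runtime.MemStats
		runtime.ReadMemStats(&m)
		// Counting entries in /proc/self/fd is a Linux-specific way to get
		// the number of open file descriptors.
		fds, err := os.ReadDir("/proc/self/fd")
		if err != nil {
			log.Println("reading /proc/self/fd:", err)
			continue
		}
		log.Printf("heap=%d MB sys=%d MB goroutines=%d fds=%d",
			m.HeapAlloc>>20, m.Sys>>20, runtime.NumGoroutine(), len(fds))
	}
}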

comment:2 Changed 3 months ago by arma

Is there some strace, ptrace, or gdb equivalent for Go that lets you figure out *where* it's deadlocking? :)

comment:3 Changed 3 months ago by arlolra

Is there some strace, ptrace, or gdb equivalent for Go that lets you figure out *where* it's deadlocking? :)

I'm working on that, but since we don't have any more clients, it's becoming harder to reproduce.
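For what it's worth on the tooling question: sending SIGQUIT (kill -QUIT) to a Go process makes the runtime print every goroutine's stack and exit, which is usually enough to see where things are blocked. A non-fatal variant is to install a handler that writes the goroutine profile on demand; the sketch below uses SIGUSR1, which is an assumption about how one might wire it up, not something proxy-go is known to do.

// stackdump.go: hypothetical sketch of dumping every goroutine's stack on
// SIGUSR1 without exiting, to see where goroutines are blocked.
package main

import (
	"log"
	"os"
	"os/signal"
	"runtime/pprof"
	"syscall"
)

// dumpStacksOnSignal installs a handler that writes a full stack trace for
// every goroutine to stderr whenever the process receives SIGUSR1.
func dumpStacksOnSignal() {
	c := make(chan os.Signal, 1)
	signal.Notify(c, syscall.SIGUSR1)
	go func() {
		for range c {
			// debug level 2 prints each goroutine's stack, including what
			// it is blocked on, in the same format as an unrecovered panic.
			pprof.Lookup("goroutine").WriteTo(os.Stderr, 2)
		}
	}()
}

func main() {
	dumpStacksOnSignal()
	log.Printf("pid %d: kill -USR1 this pid to dump goroutine stacks", os.Getpid())
	select {} // stand-in for the real work; blocks forever
}

Alternatively, importing net/http/pprof and fetching /debug/pprof/goroutine?debug=2 gives the same per-goroutine stacks over HTTP, if adding an HTTP listener is acceptable.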
