Opened 6 months ago

Last modified 7 weeks ago

#25688 new defect

proxy-go is still deadlocking occasionally

Reported by: dcf Owned by:
Priority: Low Milestone:
Component: Obfuscation/Snowflake Version:
Severity: Normal Keywords:
Cc: dcf, arlolra Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

The three fallback proxy-go instances are still hanging, after variable delays of a few days. This is even after removing all memory restrictions I discussed in comment:64:ticket:21312.

The more heavily used instances seem to deadlock sooner. Those for the currently used broker would be more likely to stop than those for the standalone broker. But the ones for the standalone broker would stop too.

In the meantime, I've put the fallback proxies back on periodic restarts. Before the intervals were 1h,2h,10h; now I increased them to 17h,23h,29h (prime numbers, so the average time before the next restart is < 17h).

I'll update this ticket with a graph showing uptimes when I have time.

Child Tickets

Attachments (4)

proxy-go-log.20180403.zip (2.0 MB) - added by dcf 6 months ago.
proxy-go-fd.20180403.png (30.7 KB) - added by dcf 6 months ago.
proxy-go-mem.20180403.png (49.5 KB) - added by dcf 6 months ago.
proxy-go-starting.20180403.png (130.8 KB) - added by dcf 6 months ago.

Change History (8)

Changed 6 months ago by dcf

Attachment: proxy-go-log.20180403.zip added

Changed 6 months ago by dcf

Attachment: proxy-go-fd.20180403.png added

Changed 6 months ago by dcf

Attachment: proxy-go-mem.20180403.png added

Changed 6 months ago by dcf

comment:1 in reply to:  description Changed 6 months ago by dcf

Replying to dcf:

In the meantime, I've put the fallback proxies back on periodic restarts. Before the intervals were 1h,2h,10h; now I increased them to 17h,23h,29h (prime numbers, so the average time before the next restart is < 17h).

I'll update this ticket with a graph showing uptimes when I have time.

This graph shows restarts ╳, as well as polls and data transfer , so you can see where they got stuck after 3 or 4 days.

The rows 1h, 2h, 10h, 17h, 23h, 29h are set to periodically restart with no memory limit. The rows a, b, c ran with different memory limits at different times:

2018-03-22 03:19:19 all ulimit -v 400 MB
2018-03-22 05:16:59 a appengine unlimited, c appengine ulimit -v 800 MB, all others ulimit -v 400 MB
2018-03-23 01:02:18 a appengine unlimited, all others ulimit -m 200 MB
2018-03-27 19:26:12 a, b, c unlimited

ulimit -v caused quick deadlocks; ulimit -m was less quick but it still happened, and it still happened even with no ulimit.

Scroll right →→→

Memory use plateaus around the time of the hangs, though that may be as a result rather than a cause. The server has 2 GB and it doesn't get close to that. (Beware the horizontal axis doesn't quite line up with the graph above.)

File descriptors seem stable.

comment:2 Changed 5 months ago by arma

Is there some strace or ptrace or gdb equivalent for go, that lets you figure out *where* it's deadlocking? :)

comment:3 Changed 5 months ago by arlolra

Is there some strace or ptrace or gdb equivalent for go, that lets you figure out *where* it's deadlocking? :)

I'm working on that, but since we don't have any more clients, it's becoming harder to produce.

comment:4 Changed 7 weeks ago by arlolra

Is there some strace or ptrace or gdb equivalent for go, that lets you figure out *where* it's deadlocking? :)

(gdb) info threads
 Id   Target Id         Frame 
 1    Thread 0x7fef23e90740 (LWP 10511) "proxy-go" runtime.epollwait () at /usr/lib/go-1.7/src/runtime/sys_linux_amd64.s:525
 2    Thread 0x7fef21e8e700 (LWP 10512) "proxy-go" runtime.futex () at /usr/lib/go-1.7/src/runtime/sys_linux_amd64.s:388
 3    Thread 0x7fef2168d700 (LWP 10513) "proxy-go" runtime.futex () at /usr/lib/go-1.7/src/runtime/sys_linux_amd64.s:388
 4    Thread 0x7fef20e8c700 (LWP 10514) "proxy-go" runtime.futex () at /usr/lib/go-1.7/src/runtime/sys_linux_amd64.s:388
 5    Thread 0x7fef1bfff700 (LWP 10515) "proxy-go" pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
 6    Thread 0x7fef1b7fe700 (LWP 10516) "proxy-go" runtime.futex () at /usr/lib/go-1.7/src/runtime/sys_linux_amd64.s:388
 7    Thread 0x7fef1affd700 (LWP 13632) "CGO Signalling" 0x00007fef227b83c3 in select () at ../sysdeps/unix/syscall-template.S:84
 8    Thread 0x7fef1a7fc700 (LWP 13633) "CGO Worker" 0x00007fef227b83c3 in select () at ../sysdeps/unix/syscall-template.S:84
 9    Thread 0x7fef2052b700 (LWP 13634) "VoiceProcessThr" pthread_cond_timedwait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:225
 10   Thread 0x7fef2042a700 (LWP 13635) "RtcEventLog thr" pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
 11   Thread 0x7fef197fa700 (LWP 14054) "CGO Signalling" 0x00007fef227b83c3 in select () at ../sysdeps/unix/syscall-template.S:84
 12   Thread 0x7fef19ffb700 (LWP 14055) "CGO Worker" 0x00007fef227b83c3 in select () at ../sysdeps/unix/syscall-template.S:84
 13   Thread 0x7fef20117700 (LWP 14056) "VoiceProcessThr" pthread_cond_timedwait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:225
 14   Thread 0x7fef18bdc700 (LWP 14057) "RtcEventLog thr" pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
 15   Thread 0x7feefbfff700 (LWP 14061) "SCTP timer" 0x00007fff62becdb8 in gettimeofday ()
 16   Thread 0x7feefb7fe700 (LWP 14112) "CGO Signalling" 0x00007fef227b83c3 in select () at ../sysdeps/unix/syscall-template.S:84
 17   Thread 0x7feefaffd700 (LWP 14113) "CGO Worker" 0x00007fef227b83c3 in select () at ../sysdeps/unix/syscall-template.S:84
 18   Thread 0x7fef182da700 (LWP 14114) "VoiceProcessThr" pthread_cond_timedwait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:225
 19   Thread 0x7fef181d9700 (LWP 14115) "RtcEventLog thr" pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
 20   Thread 0x7feef9ffb700 (LWP 14156) "CGO Signalling" 0x00007fef227b83c3 in select () at ../sysdeps/unix/syscall-template.S:84
 21   Thread 0x7feefa7fc700 (LWP 14157) "CGO Worker" 0x00007fef227b83c3 in select () at ../sysdeps/unix/syscall-template.S:84
 22   Thread 0x7feef97fa700 (LWP 14158) "VoiceProcessThr" pthread_cond_timedwait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:225
 23   Thread 0x7feef96f9700 (LWP 14159) "RtcEventLog thr" pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
 24   Thread 0x7feef95f8700 (LWP 14167) "CGO Signalling" runtime.futex () at /usr/lib/go-1.7/src/runtime/sys_linux_amd64.s:388
 25   Thread 0x7feef8df7700 (LWP 14168) "CGO Worker" 0x00007fef227b83c3 in select () at ../sysdeps/unix/syscall-template.S:84
 26   Thread 0x7feef85f6700 (LWP 14169) "VoiceProcessThr" pthread_cond_timedwait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:225
 27   Thread 0x7feef84f5700 (LWP 14170) "RtcEventLog thr" pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
 28   Thread 0x7feede7fc700 (LWP 14250) "proxy-go" runtime.futex () at /usr/lib/go-1.7/src/runtime/sys_linux_amd64.s:388
 29   Thread 0x7feedeffd700 (LWP 14251) "proxy-go" runtime.futex () at /usr/lib/go-1.7/src/runtime/sys_linux_amd64.s:388
 30   Thread 0x7feedffff700 (LWP 14287) "CGO Signalling" 0x00007fef227b83c3 in select () at ../sysdeps/unix/syscall-template.S:84
 31   Thread 0x7feedf7fe700 (LWP 14288) "CGO Worker" 0x00007fef227b83c3 in select () at ../sysdeps/unix/syscall-template.S:84
 32   Thread 0x7feef83f4700 (LWP 14289) "VoiceProcessThr" pthread_cond_timedwait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:225
 33   Thread 0x7feef82f3700 (LWP 14290) "RtcEventLog thr" pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
 34   Thread 0x7feedd7fa700 (LWP 14293) "CGO Signalling" 0x00007fef227b83c3 in select () at ../sysdeps/unix/syscall-template.S:84
 35   Thread 0x7feeddffb700 (LWP 14294) "CGO Worker" 0x00007fef227b83c3 in select () at ../sysdeps/unix/syscall-template.S:84
 36   Thread 0x7feef81f2700 (LWP 14295) "VoiceProcessThr" pthread_cond_timedwait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:225
 37   Thread 0x7feedcff9700 (LWP 14296) "RtcEventLog thr" pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
 38   Thread 0x7feed77fe700 (LWP 14506) "CGO Signalling" 0x00007fef227b83c3 in select () at ../sysdeps/unix/syscall-template.S:84
 39   Thread 0x7feed6ffd700 (LWP 14507) "CGO Worker" 0x00007fef227b83c3 in select () at ../sysdeps/unix/syscall-template.S:84
 40   Thread 0x7feedc4f5700 (LWP 14508) "VoiceProcessThr" pthread_cond_timedwait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:225
 41   Thread 0x7feedc3f4700 (LWP 14509) "RtcEventLog thr" pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
 42   Thread 0x7feed7fff700 (LWP 14923) "CGO Signalling" 0x00007fef227b83c3 in select () at ../sysdeps/unix/syscall-template.S:84
 43   Thread 0x7feedcef8700 (LWP 14924) "CGO Worker" 0x00007fef227b83c3 in select () at ../sysdeps/unix/syscall-template.S:84
 44   Thread 0x7feedc6f7700 (LWP 14925) "VoiceProcessThr" pthread_cond_timedwait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:225
 45   Thread 0x7feedc5f6700 (LWP 14926) "RtcEventLog thr" pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
 46   Thread 0x7feed5ffb700 (LWP 15038) "CGO Signalling" 0x00007fef227b83c3 in select () at ../sysdeps/unix/syscall-template.S:84
 47   Thread 0x7feed67fc700 (LWP 15039) "CGO Worker" 0x00007fef227b83c3 in select () at ../sysdeps/unix/syscall-template.S:84
 48   Thread 0x7feedc2f3700 (LWP 15040) "VoiceProcessThr" pthread_cond_timedwait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:225
 49   Thread 0x7feedc1f2700 (LWP 15041) "RtcEventLog thr" pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185


(gdb) thread 5
[Switching to thread 5 (Thread 0x7fef1bfff700 (LWP 10515))]
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
185	../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S: No such file or directory.
(gdb) bt
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x000055a388bd59cb in rtc::Event::Wait(int) ()
#2  0x000055a388cea279 in webrtc::DataChannelProxyWithInternal<webrtc::DataChannelInterface>::Send(webrtc::DataBuffer const&) ()
#3  0x000055a388bc7e88 in CGO_Channel_Send (channel=0x7feee401a060, data=0xc420116000, size=<optimized out>, binary=<optimized out>) at /home/[scrub]/go/src/github.com/keroserene/go-webrtc/datachannel.cc:19
#4  0x000055a388a10c70 in runtime.asmcgocall () at /usr/lib/go-1.7/src/runtime/asm_amd64.s:590
#5  0x000000c4204d9c20 in ?? ()
#6  0x000055a3889bb54d in runtime.cgocall (fn=0x0, arg=0xc4200bb1e0, ~r2=469757624) at /usr/lib/go-1.7/src/runtime/cgocall.go:115
#7  0x000000c420013500 in ?? ()
#8  0x0000000000000000 in ?? ()


(gdb) thread 17
[Switching to thread 17 (Thread 0x7feefaffd700 (LWP 14113))]
#0  0x00007fef227b83c3 in select () at ../sysdeps/unix/syscall-template.S:84
84	in ../sysdeps/unix/syscall-template.S
(gdb) bt
#0  0x00007fef227b83c3 in select () at ../sysdeps/unix/syscall-template.S:84
#1  0x000055a388c21e08 in rtc::PhysicalSocketServer::Wait(int, bool) ()
#2  0x000055a388bd4371 in rtc::MessageQueue::Get(rtc::Message*, int, bool) ()
#3  0x000055a388bd2a32 in rtc::Thread::Run() ()
#4  0x000055a388bd2743 in rtc::Thread::PreRun(void*) ()
#5  0x00007fef2331a494 in start_thread (arg=0x7feefaffd700) at pthread_create.c:333
#6  0x00007fef227bfacf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97


(gdb) thread 45
[Switching to thread 45 (Thread 0x7feedc5f6700 (LWP 14926))]
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
185	../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S: No such file or directory.
(gdb) bt
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x000055a388bd59cb in rtc::Event::Wait(int) ()
#2  0x000055a38900630f in webrtc::RtcEventLogHelperThread::ProcessEvents() ()
#3  0x000055a388bd783f in rtc::PlatformThread::Run() ()
#4  0x000055a388bd7746 in rtc::PlatformThread::StartThread(void*) ()
#5  0x00007fef2331a494 in start_thread (arg=0x7feedc5f6700) at pthread_create.c:333
#6  0x00007fef227bfacf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97
Note: See TracTickets for help on using tickets.