Opened 3 years ago

Closed 3 years ago

#12457 closed enhancement (fixed)

Memory request for cappadocicum

Reported by: atagar
Owned by: phobos
Priority: Low
Milestone:
Component: Internal Services/Tor Sysadmin Team
Version:
Severity:
Keywords:
Cc:
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor:

Description

Hi Andrew. Over the last day or so I've received 21 notices about cappadocicum running out of memory...

Traceback (most recent call last):
  File "/srv/doctor.torproject.org/doctor/consensus_health_checker.py", line 636, in <module>
    util.send("Script Error", body_text = msg, destination = util.ERROR_ADDRESS)
  File "/srv/doctor.torproject.org/doctor/util.py", line 97, in send
    stderr = subprocess.PIPE,
  File "/usr/lib/python2.7/subprocess.py", line 679, in __init__
    errread, errwrite)
  File "/usr/lib/python2.7/subprocess.py", line 1153, in _execute_child
    self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory

Strange. This indicates an OOM while sending the email at the end, not while fetching the descriptors (which should be the memory-intensive bit). Oh well...
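For what it's worth, the ENOMEM comes from os.fork(): on Python 2.7, subprocess forks the whole checker process before exec'ing the mail command, so the kernel briefly has to account for a copy-on-write duplicate of the checker's entire address space even though the child only runs a tiny mail binary. A minimal sketch of a workaround on our side (the wrapper name is hypothetical, nothing like it exists in util.py today) would be to catch the error and retry once after freeing what we can:

# Hypothetical helper, not part of util.py: retry a Popen call once if fork()
# fails with ENOMEM, after asking the garbage collector to release what it can.
import errno
import gc
import subprocess

def popen_with_retry(args, **kwargs):
    try:
        return subprocess.Popen(args, **kwargs)
    except OSError as exc:
        if exc.errno != errno.ENOMEM:
            raise
        gc.collect()  # drop cached descriptors and other garbage before retrying
        return subprocess.Popen(args, **kwargs)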

We ran into memory issues before and you said that it's trivial to add more, but that you want to keep the virt as lean as possible. Mind adding more? Unfortunately the above stacktrace doesn't give me any hint about how much is necessary, so I'll leave that to your judgment.

Child Tickets

Change History (10)

comment:2 Changed 3 years ago by phobos

Is this still happening?

comment:3 in reply to:  1 Changed 3 years ago by phobos

Replying to weasel:

Hmm. https://tor-guest@munin.torproject.org/torproject.org/cappadocicum.torproject.org/memory.html

I believe weasel is noticing that there is around 800 MB of free memory on average. Perhaps something else is hitting an internal limit which isn't tied to the kernel OOM limit.
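One thing that might be worth checking (speculation on my part): munin showing ~800 MB free doesn't rule out fork() failures, since fork() has to reserve the parent's full address space and a strict overcommit policy can reject that even while memory is free. A quick Linux-only look at the relevant /proc entries, just as a sketch:

# Print the overcommit policy plus the commit accounting from /proc/meminfo.
def first_line(path):
    with open(path) as f:
        return f.readline().strip()

print("overcommit_memory: " + first_line("/proc/sys/vm/overcommit_memory"))
print("overcommit_ratio: " + first_line("/proc/sys/vm/overcommit_ratio"))

with open("/proc/meminfo") as f:
    for line in f:
        if line.startswith(("CommitLimit", "Committed_AS", "MemFree", "SwapFree")):
            print(line.strip())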

comment:4 Changed 3 years ago by atagar

Resolution: worksforme
Status: new → closed

Is this still happening?

Nope. The last occurrence was on 6/25. I'll tentatively close this for now and we'll see if it reoccurs.

comment:5 Changed 3 years ago by atagar

Component: Tor Sysadmin Team → DocTor
Resolution: worksforme
Status: closed → reopened
Summary: Memory request for cappadocicum → consensus_health_checker repeatedly failing

Reopening as this has happened sixteen more times over the last couple days. One provided the aforementioned stacktrace but most simply say 'killed'. I'm watching the host a bit right now. First thing that I noticed is that there were multiple python processes running in parallel. Cron should be spacing them out to avoid that, but it looks like the consensus checker is taking a lot longer to run of late...

grep "Checks finished" /srv/doctor.torproject.org/doctor/logs/consensus_health_checker | tail -n 15
07/03/2014 14:49:02 [DEBUG] Checks finished, runtime was 241.29 seconds
07/03/2014 15:51:28 [DEBUG] Checks finished, runtime was 385.82 seconds
07/03/2014 16:48:15 [DEBUG] Checks finished, runtime was 193.36 seconds
07/03/2014 17:48:54 [DEBUG] Checks finished, runtime was 232.95 seconds
07/03/2014 18:51:13 [DEBUG] Checks finished, runtime was 371.43 seconds
07/03/2014 19:46:21 [DEBUG] Checks finished, runtime was 79.41 seconds
07/04/2014 03:46:56 [DEBUG] Checks finished, runtime was 113.61 seconds
07/04/2014 04:46:16 [DEBUG] Checks finished, runtime was 75.24 seconds
07/04/2014 05:51:23 [DEBUG] Checks finished, runtime was 381.72 seconds
07/04/2014 06:52:03 [DEBUG] Checks finished, runtime was 421.76 seconds
07/04/2014 08:00:32 [DEBUG] Checks finished, runtime was 930.34 seconds
07/04/2014 15:01:49 [DEBUG] Checks finished, runtime was 1006.77 seconds
07/04/2014 17:00:36 [DEBUG] Checks finished, runtime was 934.79 seconds
07/04/2014 17:58:50 [DEBUG] Checks finished, runtime was 827.66 seconds
07/04/2014 19:02:07 [DEBUG] Checks finished, runtime was 1025.27 seconds

Changing the title and reassigning to my component as I'm not really sure if this is actually memory related or not.
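If the slow runs keep overlapping, one option (just a sketch; the lock path below is made up) would be for the checker to bail out when a previous run still holds an advisory lock, rather than relying on cron spacing alone:

import fcntl
import sys

# Hypothetical lock path; any location writable by the doctor user would do.
LOCK_PATH = "/srv/doctor.torproject.org/doctor/consensus_health_checker.lock"

lock_file = open(LOCK_PATH, "w")

try:
    fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
except IOError:
    print("previous run still in progress, exiting")
    sys.exit(0)

# ... run the checks; the lock is released automatically when the process exits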

comment:6 Changed 3 years ago by atagar

Component: DocTor → Tor Sysadmin Team
Summary: consensus_health_checker repeatedly failing → Memory request for cappadocicum

Ok, sending this back over to Andrew. The host has 1 GB of memory and the consensus checker is gobbling up almost all of it. The host is eating into swap, which is probably why the checker is taking quite a bit longer now and overlapping with the other checks.

Would it be prohibitively costly to bump the memory to 1.5 GB? If so then maybe I should go back to running this from my desktop here at home.

KiB Mem:   1026500 total,   977456 used,    49044 free,     2956 buffers
KiB Swap:   724988 total,   535648 used,   189340 free,    17528 cached

  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND
20667 doctor    20   0 1168m 863m 1964 R   1.7 86.1   1:22.02 python
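To put a number on how much memory the checker actually needs (so the sizing isn't guesswork), the run could log its peak resident set size just before exiting; a minimal sketch:

import resource

# ru_maxrss is reported in kilobytes on Linux.
peak_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
print("peak resident set size: %.1f MB" % (peak_kb / 1024.0))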

comment:7 Changed 3 years ago by phobos

Owner: set to phobos
Status: reopened → accepted

comment:8 Changed 3 years ago by phobos

upgrade request submitted. sorry it took so long.

comment:9 Changed 3 years ago by atagar

No problem, and thanks!

comment:10 Changed 3 years ago by phobos

Resolution: fixed
Status: accepted → closed

upgraded to 2 GB.
