Hi, I'm presently trying to move DocTor from yatei to its new VM (cappadocicum). On yatei DocTor was owned by the metrics user and located at /srv/doctor.torproject.org. I've copied it over to cappadocicum but lack permission to move it into /srv (which is owned by root). Ideally I suspect we'd want a doctor user with /srv/doctor.
Once this is done it should be fine to revoke my permission to run as the metrics user. Also, yatei's /srv/doctor.torproject.org will be safe to delete (I've commented out DocTor's crontab there).
Thanks! -Damian
Karsten re-enabled the crontab on yatei, so scratch the part about its /srv/doctor.torproject.org being safe to delete. When this ticket is addressed I'll re-transfer the contents to cappadocicum and disable the copy on yatei again.
There is now an /srv/doctor.torproject.org on cappadocicum, owned by a doctor uid and gid.
Great! Disabled DocTor on yatei and spun it up on cappadocicum. Next steps are to...
1. Remove my access to yatei and the metrics user.
2. Remove DocTor from yatei (rm -rf /srv/doctor.torproject.org). I've removed it from yatei's crontab so that copy is now unused. The only reason I didn't delete it myself is that I wasn't sure if there was an automated backup process or something else acting on /srv that I should be wary of.
After that this can be resolved.
Sorry this fell through the cracks. Ping me if you need anything else.
No worries. I thought about pinging you but decided this wasn't very important (DocTor was still chugging along happily on yatei).
Hmmm, on reflection it looks like it might not be all roses. I saw one successful run, but since then I've started getting cron emails saying simply that consensus_health_checker.py was killed. I'll need to look into this tomorrow.
DocTor runs three checkers: descriptor validation, sybil checks, and consensus health checks. The first two are running fine since they only operate on a single consensus at a time. The last, however, is triggering the OOM killer.
DocTor's consensus health checker downloads both the vote and consensus from each authority, then runs a series of checks. This means 18 network status documents in memory at a time (2 documents x 9 authorities). This was fine for yatei, but cappadocicum only has 496 MB.
I've sunk an hour into seeing if there's a quick change I can make to sidestep this, but I'm not spotting anything. Can we boost cappadocicum's memory? If not then this might need to swap back to yatei.
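To make the memory footprint concrete, here's a rough sketch of that fetch pattern using stem's remote module. This is just an illustration rather than DocTor's actual code, and the DescriptorDownloader / Authority usage below is my assumption about how you'd reproduce the pattern:

# Rough sketch (not DocTor's code) of fetching each authority's copy of the
# current consensus plus the vote it cast, holding everything in memory.

from stem.descriptor import DocumentHandler
from stem.descriptor.remote import DescriptorDownloader
from stem.directory import Authority

downloader = DescriptorDownloader()
documents = []

for authority in Authority.from_cache().values():
  if authority.v3ident is None:
    continue  # skip authorities that don't vote

  # this authority's copy of the current consensus...
  documents += downloader.get_consensus(
    document_handler = DocumentHandler.DOCUMENT,
    endpoints = [(authority.address, authority.dir_port)],
  ).run()

  # ... and the vote it cast
  documents += downloader.get_vote(
    authority,
    document_handler = DocumentHandler.DOCUMENT,
  ).run()

print('%i network status documents held in memory' % len(documents))

Each of those documents is on the order of a couple megabytes of raw text, and the parsed NetworkStatusDocumentV3 objects are considerably larger, which is presumably what pushes a 496 MB VM into the OOM killer's sights.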
Re-enabled the consensus health checks on yatei, though the other two checkers are still running on cappadocicum. Peter suggested checking with Andrew about bumping the memory, so making it so.
atagar@cappadocicum:~$ free
             total       used       free     shared    buffers     cached
Mem:       1026504     492732     533772          0     100312     229212
-/+ buffers/cache:     163208     863296
Swap:       724988          0     724988
I gave our consensus tracker a whirl and it worked great (actually, it barely dipped into swap, which I found surprising). We should be good to go: I disabled DocTor's cron on yatei and reactivated it on cappadocicum.
Please wait a couple days so I can be sure DocTor is chugging along happily on cappadocicum. If I don't report any further issues then we can move forward with removing my access to metrics/yatei and cleaning DocTor off it.