Opened 3 years ago

Closed 3 years ago

#10413 closed enhancement (implemented)

Request for doctor user and /srv directory

Reported by: atagar
Owned by: phobos
Priority: Low
Milestone:
Component: Internal Services/Tor Sysadmin Team
Version:
Severity:
Keywords:
Cc: karsten
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor:

Description

Hi, I'm presently trying to move doctor from yatei to its new vm (cappadocicum). On yatei doctor was owned by the metrics user and located on /srv/doctor.torproject.org. I've copied doctor over to cappadocicum but lack permission to move it to /srv (which is owned by root). Ideally I suspect it would be nice to have a doctor user with /srv/doctor.

After this is done it should be fine to revoke my permissions to run as the metrics user. Also, yatei's /srv/doctor.torproject.org will be safe to delete (I've commented out the doctor crontab there).

Thanks! -Damian

Change History (16)

comment:1 Changed 3 years ago by atagar

Karsten re-enabled the crontab on yatei so scratch the part about that being safe to delete. When this ticket is addressed I'll re-transfer the contents to cappadocicum and disable the copy on yatei again.

comment:2 Changed 3 years ago by karsten

  • Cc karsten added

comment:3 Changed 3 years ago by weasel

  • Status changed from new to needs_review

There is now an /srv/doctor.torproject.org on cappadocicum, owned by a doctor uid and gid.

You can sudo to the doctor user.

Sorry this got lost between the cracks. Ping me if you need anything else.
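
One way to confirm the ownership described above is a short Python check. This is only a sketch: the function name `owner_of` is made up for illustration, and the commented-out path and the expected `doctor` user and group are taken from this ticket rather than verified against the host.

```python
import grp
import os
import pwd


def owner_of(path):
    """Return (user, group) owning path, falling back to numeric ids
    when the uid or gid has no entry in the passwd/group databases."""
    st = os.stat(path)
    try:
        user = pwd.getpwuid(st.st_uid).pw_name
    except KeyError:
        user = str(st.st_uid)
    try:
        group = grp.getgrgid(st.st_gid).gr_name
    except KeyError:
        group = str(st.st_gid)
    return user, group


# On cappadocicum one would expect something like this to hold
# (assumption based on the comment above, not checked here):
# owner_of("/srv/doctor.torproject.org") == ("doctor", "doctor")
```

Running the same check against /srv itself should show root, which matches the permission problem in the description.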

comment:4 Changed 3 years ago by atagar

  • Status changed from needs_review to new

> There is now an /srv/doctor.torproject.org on cappadocicum, owned by a doctor uid and gid.

Great! Disabled Doctor on yatei and spun it up on cappadocicum. Next steps are to...

  1. Remove my access to yatei and the metrics user.
  2. Remove Doctor from yatei (rm -rf /srv/doctor.torproject.org). I've removed it from the crontab so Doctor is now unused. I didn't delete it myself only because I wasn't sure whether an automated backup process or something else acting on /srv was something I should be wary of.

After that this can be resolved.

> Sorry this got lost between the cracks. Ping me if you need anything else.

No worries. I thought about pinging you but decided this wasn't very important (Doctor was still chugging along happily on yatei).

Cheers! -Damian

comment:5 Changed 3 years ago by atagar

Hmmm, on reflection it looks like it might not be all roses. I saw one successful run, but since then I've started getting cron emails saying simply that consensus_health_checker.py was killed. I'll need to look into this tomorrow.

comment:6 follow-up: Changed 3 years ago by atagar

DocTor runs three checkers: descriptor validation, sybil checks, and consensus health checks. The first two are running fine since they only operate on a single consensus at a time. The last, however, is triggering the OOM killer.

DocTor's consensus health checker downloads both the vote and consensus from each authority, then runs a series of checks. This means 18 network status documents in memory at a time (2 documents x 9 authorities). This was fine for yatei, but cappadocicum only has 496 MB.

I've sunk an hour into seeing if there's a quick change I can make to sidestep this, but I'm not spotting anything. Can we boost cappadocicum's memory? If not then this might need to swap back to yatei.

comment:7 Changed 3 years ago by atagar

Re-enabled the consensus health checks on yatei, though I'm still running the other two on cappadocicum. Peter suggested checking with Andrew about bumping the memory, so I'm doing that.

comment:8 Changed 3 years ago by phobos

  • Owner set to phobos
  • Status changed from new to accepted

comment:9 in reply to: ↑ 6 Changed 3 years ago by phobos

Replying to atagar:

> DocTor's consensus health checker downloads both the vote and consensus from each authority, then runs a series of checks. This means 18 network status documents in memory at a time (2 documents x 9 authorities). This was fine for yatei, but cappadocicum only has 496 MB.

How much memory does it need?

comment:10 Changed 3 years ago by atagar

Unfortunately I'm not really sure. Mind if we try 2-4 GB?
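
A rough way to answer the "how much memory" question is to run the checker once on a host where it completes and read back its peak resident set size. The following is a minimal sketch, not something DocTor provides: `peak_rss` is a hypothetical helper, and the example command is a stand-in, since the real invocation of consensus_health_checker.py is not given in this ticket.

```python
import resource
import subprocess
import sys


def peak_rss(command):
    """Run command to completion, then return ru_maxrss for waited-for
    children of this process.

    getrusage() reports the largest single child seen so far, in
    kilobytes on Linux (bytes on macOS), so call this from a fresh
    process for a clean reading.
    """
    subprocess.run(command, check=True)
    return resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss


if __name__ == "__main__":
    # Stand-in workload that allocates roughly 50 MB. Replace it with
    # the real checker command (not specified in this ticket).
    demo = [sys.executable, "-c", "data = b'x' * (50 * 1024 * 1024)"]
    print("peak RSS:", peak_rss(demo))
```

The figure only covers resident memory of the largest child, so it is best read as a lower bound when sizing the VM.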

comment:11 Changed 3 years ago by phobos

The VM is hosted in Iceland, so we have to pay for it. Can we try 1 GB first?

comment:12 Changed 3 years ago by atagar

Sure.

comment:13 Changed 3 years ago by phobos

Request to upgrade sent.

comment:14 Changed 3 years ago by phobos

Upgrade done.

comment:15 Changed 3 years ago by atagar

Thanks! Memory is indeed up...

atagar@cappadocicum:~$ free
             total       used       free     shared    buffers     cached
Mem:       1026504     492732     533772          0     100312     229212
-/+ buffers/cache:     163208     863296
Swap:       724988          0     724988

I gave our consensus tracker a whirl and it worked great (actually, it barely dipped into swap, which I found surprising). We should be good to go - I disabled DocTor's cron on yatei and reactivated it on cappadocicum.

Please wait a couple days so I can be sure DocTor is chugging along happily on cappadocicum. If I don't report any further issues then we can move forward with removing my access to metrics/yatei and cleaning DocTor off it.

Cheers! -Damian
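
For readers unfamiliar with the older `free` layout shown above, the "-/+ buffers/cache" row can be reproduced from the "Mem:" row: buffers and cache are subtracted from "used" and added to "free". The sketch below does that arithmetic on the figures pasted in this comment (values in KiB, which is what `free` prints by default). The function names are illustrative only.

```python
def parse_mem_line(output):
    """Parse the 'Mem:' row of old-style `free` output into a dict."""
    names = ("total", "used", "free", "shared", "buffers", "cached")
    for line in output.splitlines():
        fields = line.split()
        if fields and fields[0] == "Mem:":
            return dict(zip(names, map(int, fields[1:7])))
    raise ValueError("no 'Mem:' line found")


def without_cache(mem):
    """Return (used, free) with buffers and page cache counted as free."""
    reclaimable = mem["buffers"] + mem["cached"]
    return mem["used"] - reclaimable, mem["free"] + reclaimable


FREE_OUTPUT = """\
             total       used       free     shared    buffers     cached
Mem:       1026504     492732     533772          0     100312     229212
"""

used, free = without_cache(parse_mem_line(FREE_OUTPUT))
print(used, free)  # 163208 863296, matching the -/+ buffers/cache row
```

So of the roughly 1 GB now available, only about 160 MB was held by processes when the snapshot was taken.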

comment:16 Changed 3 years ago by phobos

  • Resolution set to implemented
  • Status changed from accepted to closed

Sounds like everything is working. Great.