Opened 8 years ago

Closed 8 years ago

Last modified 8 years ago

#3189 closed defect (fixed)

Find out why bwscanners break after a few days/weeks of operation

Reported by: karsten
Owned by: mikeperry
Priority: Medium
Milestone:
Component: Core Tor/Torflow
Version:
Severity:
Keywords:
Cc: aagbsn@…
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor:

Description

All four bwscanners on gabelmoo broke at different times in the past two weeks. When running cron.sh manually, I get:

WARN[Sun May 15 18:12:49 2011]:Bandwidth scanner scanner.1 stale. Possible dead bwauthority.py. Timestamp: Mon May  2 23:10:20 2011
WARN[Sun May 15 18:12:49 2011]:Bandwidth scanner scanner.3 stale. Possible dead bwauthority.py. Timestamp: Tue May 10 21:26:10 2011
WARN[Sun May 15 18:12:49 2011]:Bandwidth scanner scanner.2 stale. Possible dead bwauthority.py. Timestamp: Wed May  4 18:13:06 2011
WARN[Sun May 15 18:12:49 2011]:Bandwidth scanner scanner.4 stale. Possible dead bwauthority.py. Timestamp: Wed May 11 02:38:38 2011

Other than that, the logs look pretty normal to me. But maybe that's because Tor was only logging at notice level.
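To see at a glance how long each scanner has been dead, the stale warnings above can be parsed and compared against the time they were logged. The following is only a rough sketch: the function name `stale_ages`, the regular expression, and the `SAMPLE` constant are made up for illustration and are not part of cron.sh or Torflow; the only thing taken from the ticket is the warning format itself.

```python
import re
from datetime import datetime

# Timestamp layout used in the warnings, e.g. "Mon May  2 23:10:20 2011".
# Parsing %a and %b assumes an English (C) locale.
TIME_FORMAT = "%a %b %d %H:%M:%S %Y"

# Hypothetical pattern matching the warning lines quoted in this ticket.
WARN_RE = re.compile(
    r"WARN\[(?P<logged>[^\]]+)\]:Bandwidth scanner (?P<name>\S+) stale\."
    r".*Timestamp: (?P<stamp>.+)$"
)


def stale_ages(log_text):
    """Map each scanner name to (time of warning - last scanner timestamp)."""
    ages = {}
    for line in log_text.splitlines():
        match = WARN_RE.search(line.strip())
        if match is None:
            continue  # not a stale-scanner warning
        logged = datetime.strptime(match.group("logged"), TIME_FORMAT)
        stamp = datetime.strptime(match.group("stamp").strip(), TIME_FORMAT)
        ages[match.group("name")] = logged - stamp
    return ages


# The four warnings from the description, copied verbatim.
SAMPLE = """\
WARN[Sun May 15 18:12:49 2011]:Bandwidth scanner scanner.1 stale. Possible dead bwauthority.py. Timestamp: Mon May  2 23:10:20 2011
WARN[Sun May 15 18:12:49 2011]:Bandwidth scanner scanner.3 stale. Possible dead bwauthority.py. Timestamp: Tue May 10 21:26:10 2011
WARN[Sun May 15 18:12:49 2011]:Bandwidth scanner scanner.2 stale. Possible dead bwauthority.py. Timestamp: Wed May  4 18:13:06 2011
WARN[Sun May 15 18:12:49 2011]:Bandwidth scanner scanner.4 stale. Possible dead bwauthority.py. Timestamp: Wed May 11 02:38:38 2011
"""

if __name__ == "__main__":
    for name, age in sorted(stale_ages(SAMPLE).items()):
        print(name, "stale for", age)
```

For the four warnings quoted above, this gives roughly 12.8, 11.0, 4.9 and 4.6 days for scanner.1 through scanner.4, which matches the scanners having died at different times over the past two weeks.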

After staring at logs for an hour or two, I changed Tor logging to debug and redirected all of bwscanner's cron output to files. Depending on available disk space, I might reduce Tor logging to info soon.

Hopefully the logs will tell us something about the problem here. The last time I ran into this problem, Mike suggested just restarting the scanners, which I did. And Andrew says he's restarting his bwscanners every three days to keep them from breaking. I'd like to find out why they are breaking and get that fixed.

Change History (3)

comment:1 Changed 8 years ago by mikeperry

Aaron discovered #2947 while working on some tickets for upgrading the bw scanners to a newer SQLAlchemy and for fixing the PostgreSQL and MySQL backends. It seems that SQLAlchemy does not make it easy to clear all tables/objects without leaking some data.

I think that is the most likely culprit. He has been working on a fix for it and is currently testing it on his personal machine.

comment:2 Changed 8 years ago by aagbsn

Cc: aagbsn@… added
Resolution: fixed
Status: new → closed

#2947 outlines the approach taken to resolve this issue. The fixes have been merged. Can we close this ticket?

comment:3 Changed 8 years ago by mikeperry

Yes.