per commit msg:
Tests show that SQLSupport.reset_all() may clear too much: if BwAuthority calls ScanSupport.reset_stats() after each speedrace() run, only the first slice is properly recorded; the rest are empty.
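To make the failure mode concrete, here is a toy model (not torflow code; all names here are stand-ins) of a reset that clears too much: it drops the table that recording depends on rather than just zeroing counters, so everything after the first reset records empty.

```python
class StatsStore:
    """Toy analog of per-router stats accumulation across slices."""

    def __init__(self):
        self.routers = {"A": 0, "B": 0}   # per-router counters
        self.finished = {}                # slice number -> recorded results

    def record(self, router, value):
        if router in self.routers:        # measurements for unknown routers are dropped
            self.routers[router] += value

    def finish_slice(self, n):
        self.finished[n] = dict(self.routers)

    def reset_all(self):
        # Bug analog: wipes the router table entirely instead of
        # resetting the counters, so later record() calls hit nothing.
        self.routers = {}


store = StatsStore()
store.record("A", 5)
store.finish_slice(0)     # slice 0 is recorded properly
store.reset_all()         # called after each run, clears too much
store.record("A", 7)      # silently dropped: the router table is gone
store.finish_slice(1)     # slice 1 comes out empty
```

This is only an illustration of the symptom described above, not the actual logic in SQLSupport.py.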
We suspected that this bug was the cause of BwAuthority's memory leaks, since items were not being cleared after each slice completed. However, the leaks persisted after the fix.
mikeperry and I decided to refactor BwAuthority into a parent-child pair of processes so that memory leaks could not persist past each run. Rather than one long-running process, we split bwauthority.py into two: a parent process that passes slice parameters to a child process, and a child that actually scans that slice.
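The parent/child split can be sketched with the standard library; run_slice() here is a hypothetical stand-in for the real per-slice scan in bwauthority.py, not its actual interface.

```python
import multiprocessing


def run_slice(slice_num, result_queue):
    # Stand-in for the real scan: any memory allocated here is
    # reclaimed by the OS when this child process exits.
    result_queue.put((slice_num, "done"))


def scan_all(num_slices):
    """Parent loop: fork a fresh child per slice, collect its result,
    and let the child's entire heap die with it."""
    results = []
    for s in range(num_slices):
        q = multiprocessing.Queue()
        child = multiprocessing.Process(target=run_slice, args=(s, q))
        child.start()
        results.append(q.get())   # read before join() to avoid queue deadlock
        child.join()
    return results
```

The design point is that a leak in run_slice() is now bounded to one slice's lifetime, regardless of where in the scanning code it originates.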
After refactoring BwAuthority, we discovered more issues: in some cases, circuit status events referenced Router objects that were not stored in the database (SQLAlchemy raised sqlalchemy.orm.exc.NoResultFound), even though update_consensus() and _update_db() had supposedly inserted them. To restate: objects that were supposedly stored in the database were failing to appear in queries only seconds later.
This issue happens pretty rarely; my best estimate is about once every two weeks.
This is likely caused by a race between the sessions bound to Elixir models and another shared session (tc_session).
I believe this race occurs because of misuse of SQLAlchemy scoped_sessions. I refactored BwAuthority so that each function in torctl/SQLSupport.py that accesses the database uses a local session and calls tc_session.remove() before returning, as recommended at http://www.sqlalchemy.org/docs/orm/session.html#contextual-thread-local-sessions. The effect should be to flush mapped objects from the local session to the database so that queries from the Elixir-bound sessions succeed.
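A minimal sketch of the per-function local-session pattern, assuming a scoped_session named tc_session as in the text; the Router schema and function bodies below are stand-ins, not the real SQLSupport.py code.

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, scoped_session, sessionmaker

Base = declarative_base()


class Router(Base):
    """Stand-in for the Elixir-mapped Router model."""
    __tablename__ = "routers"
    id = Column(Integer, primary_key=True)
    idhex = Column(String, unique=True)


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
tc_session = scoped_session(sessionmaker(bind=engine))


def update_consensus(idhexes):
    """Insert routers via a session local to this call, then discard it
    with remove() so the rows are visible to other sessions."""
    session = tc_session()          # thread-local session for this function
    try:
        for h in idhexes:
            session.add(Router(idhex=h))
        session.commit()
    finally:
        tc_session.remove()         # drop the thread-local session


def lookup(idhex):
    session = tc_session()
    try:
        return session.query(Router).filter_by(idhex=idhex).one()
    finally:
        tc_session.remove()
```

Because every function commits and then calls remove(), no stale identity map survives the call, which is what should prevent one session from holding objects another session can't see.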
Now we wait and see if this race condition persists :-(.