Recursion limit problem in TaskManager
This issue was automatically migrated from GitHub issue https://github.com/TheTorProject/ooni-probe/issues/296.
The task manager is currently designed so that _fillSlots calls _run, which in turn calls _fillSlots again on success or failure. Each finished task therefore nests further calls on the stack instead of returning, so when many tasks fail very quickly the default Python recursion limit (1000) is very likely to be exceeded.
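The call pattern described above can be sketched in isolation. This is a toy model, not the code in ooni/managers.py: ToyTaskManager, failing_task and demo are hypothetical names, and the tasks complete synchronously, which stands in for "failing very quickly".

```python
import sys


def failing_task():
    """Hypothetical task that fails immediately, without doing any I/O."""
    return False


class ToyTaskManager(object):
    """Toy stand-in for the task manager; it only mimics the call pattern."""

    def __init__(self, tasks, concurrency=1):
        self.pending = list(tasks)
        self.concurrency = concurrency
        self.active = 0

    def _fillSlots(self):
        # Start tasks while there are free slots.
        while self.pending and self.active < self.concurrency:
            self._run(self.pending.pop())

    def _run(self, task):
        self.active += 1
        task()
        self.active -= 1
        # The slot is refilled from inside the completion path, so this
        # frame stays on the stack until every remaining task is done.
        self._fillSlots()


def demo(n_tasks):
    manager = ToyTaskManager([failing_task] * n_tasks)
    try:
        manager._fillSlots()
    except RuntimeError:
        # On Python 3 this is RecursionError, a subclass of RuntimeError.
        return "recursion limit hit"
    return "completed"
```

Each task adds two frames (_fillSlots and _run), so demo(10) completes while demo(sys.getrecursionlimit()) ends in the recursion error.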
To reproduce this bug, run a test with a long invalid input, for example http_requests:
ooniprobe blocking/http_requests -f data/complete.deck
Note that it is correct for this test to fail; however, it fails in a surprising manner:
Unhandled error in Deferred:
Unhandled Error
Traceback (most recent call last):
  File "/ooni-probe/ooni/managers.py", line 153, in _failed
    super(LinkedTaskManager, self)._failed(result, task)
  File "/ooni-probe/ooni/managers.py", line 44, in _failed
    task.done.errback(failure)
  File "/.virtualenvs/ooni-probe/lib/python2.7/site-packages/twisted/internet/defer.py", line 423, in errback
    self._startRunCallbacks(fail)
  File "/.virtualenvs/ooni-probe/lib/python2.7/site-packages/twisted/internet/defer.py", line 490, in _startRunCallbacks
    self._runCallbacks()
--- <exception caught here> ---
  File "/.virtualenvs/ooni-probe/lib/python2.7/site-packages/twisted/internet/defer.py", line 577, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/ooni-probe/ooni/director.py", line 188, in measurementFailed
    log.msg("Failed doing measurement: %s" % measurement)
  File "/ooni-probe/ooni/utils/log.py", line 62, in msg
    print "%s" % msg
  File "/.virtualenvs/ooni-probe/lib/python2.7/site-packages/twisted/python/log.py", line 505, in write
    msg(message, printed=1, isError=self.isError)
  File "/.virtualenvs/ooni-probe/lib/python2.7/site-packages/twisted/python/threadable.py", line 53, in sync
    return function(self, *args, **kwargs)
  File "/.virtualenvs/ooni-probe/lib/python2.7/site-packages/twisted/python/log.py", line 185, in msg
    actualEventDict = (context.get(ILogContext) or {}).copy()
  File "/.virtualenvs/ooni-probe/lib/python2.7/site-packages/twisted/python/context.py", line 121, in getContext
    return self.currentContext().getContext(key, default)
exceptions.RuntimeError: maximum recursion depth exceeded
I think this bug is perhaps a good opportunity to discuss some possible refactoring of the task scheduler related code. It may be a good idea to draw some inspiration from: https://github.com/terrycojones/txrdq
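As a starting point for that discussion, here is one possible shape for a scheduler whose stack depth does not grow with the number of tasks. It is a sketch under assumptions, not a proposal for the actual ooni code and not a description of how txrdq works: FlatTaskManager, _taskDone and the _filling flag are hypothetical, and the tasks are plain callables rather than Deferreds.

```python
import collections


class FlatTaskManager(object):
    """Hypothetical manager that refills slots from a single loop."""

    def __init__(self, tasks, concurrency=1):
        self.pending = collections.deque(tasks)
        self.concurrency = concurrency
        self.active = 0
        self.completed = 0
        self._filling = False

    def _fillSlots(self):
        if self._filling:
            # Re-entrant call from a task that finished synchronously:
            # return and let the outer loop below pick up the free slot.
            return
        self._filling = True
        try:
            while self.pending and self.active < self.concurrency:
                self._run(self.pending.popleft())
        finally:
            self._filling = False

    def _run(self, task):
        self.active += 1
        task()
        self._taskDone()

    def _taskDone(self):
        # Also usable as a completion hook for tasks that finish later;
        # in that case _filling is False and the loop starts again.
        self.active -= 1
        self.completed += 1
        self._fillSlots()
```

With this guard, a task that finishes synchronously returns to the loop in _fillSlots rather than nesting a new one, so the number of tasks can exceed the recursion limit. In the Twisted-based code, another option might be to schedule the refill with reactor.callLater(0, ...) so that it runs from the reactor loop instead of from inside the callback chain.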