best_priority() tries to measure unmeasured and failing relays first.
But if fraction_relays or min_relays always fail, those relays will always end up first in the priority queue. (More precisely, those relays will end up first in the priority queue, until the results of the good relays are discarded for being too old.)
Thinking about starvation is complicated, because of the freshness_reduction_factor on some errors.
Here's a very simple algorithm that avoids starving good relays for failed relays:
Count the number of times that sbws has attempted to get a result from each relay.
Test the relays with the lowest number of attempts first. (Don't check if the attempt succeeded or failed.)
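The two steps above can be sketched roughly as follows. This is only an illustration, not sbws code: prioritise_by_attempts(), the fingerprints, and the attempts mapping are hypothetical names made up for the example, and relays with no recorded attempts are assumed to count as zero.

```python
from collections import Counter


def prioritise_by_attempts(relays, attempts):
    """Return relays ordered by ascending number of measurement attempts.

    relays: iterable of relay fingerprints (hypothetical identifiers).
    attempts: mapping of fingerprint -> number of attempts so far.
    Relays that are missing from the mapping count as zero attempts.
    Success or failure of the attempts is deliberately ignored.
    """
    return sorted(relays, key=lambda fp: attempts.get(fp, 0))


# "D" has never been attempted, so it comes first.
attempts = Counter({"A": 3, "B": 1, "C": 5})
order = prioritise_by_attempts(["A", "B", "C", "D"], attempts)
print(order)  # ['D', 'B', 'A', 'C']
```

Because failures and successes both increase the count, a relay that always fails moves back in the order at the same rate as a relay that always succeeds.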
For this priority rule to work, every time a relay is queued, it must get a result. Here's how we can make that happen:
Modify result_putter_error() to store an error result to the queue.
Make sure timeouts store an error result to the queue.
Add a unit test and integration test that makes sure every queued relay has a result.
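As a rough sketch of the first item, an error callback passed to apply_async() can store an error result instead of only reporting the exception. Everything here is an assumption for illustration: measure(), on_success(), on_error(), run(), and the tuple-shaped results are hypothetical and do not reflect the real sbws functions or result classes. Timeouts are not covered by this sketch.

```python
import queue
from functools import partial
from multiprocessing.pool import ThreadPool


def measure(relay):
    # Hypothetical measurement: relays whose name starts with "bad" fail.
    if relay.startswith("bad"):
        raise RuntimeError("measurement failed for " + relay)
    return (relay, "success")


def on_success(result_queue, result):
    result_queue.put(result)


def on_error(result_queue, relay, exc):
    # Store an error result, so that the queued relay still gets a result.
    result_queue.put((relay, "error: " + str(exc)))


def run(relays):
    results = queue.Queue()
    pool = ThreadPool(2)
    for relay in relays:
        pool.apply_async(
            measure,
            (relay,),
            callback=partial(on_success, results),
            error_callback=partial(on_error, results, relay),
        )
    pool.close()
    pool.join()  # all callbacks have run once join() returns
    stored = {}
    while not results.empty():
        relay, status = results.get()
        stored[relay] = status
    return stored


stored = run(["good1", "bad1"])
```

The check that a unit test would make is the last item in the list: the set of relays with a stored result equals the set of relays that were queued.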
Here's an alternative that might be simpler to implement:
Before a relay is queued using pool.apply_async() in run_speedtest(), store a ResultAttempt to the queue.
Only count ResultAttempts when prioritising relays.
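A minimal sketch of this alternative, under stated assumptions: ResultAttempt here is a hypothetical placeholder class, stored_results is a plain list standing in for the result queue, and measure() is a made-up measurement function. None of this is the actual sbws implementation.

```python
from collections import Counter
from multiprocessing.pool import ThreadPool


class ResultAttempt:
    """Hypothetical marker recording that a measurement was queued."""

    def __init__(self, fingerprint):
        self.fingerprint = fingerprint


def measure(fingerprint):
    # Hypothetical measurement: relay "B" always fails.
    if fingerprint == "B":
        raise RuntimeError("measurement failed")
    return fingerprint


def queue_measurements(relays, stored_results):
    pool = ThreadPool(2)
    for fp in relays:
        # Record the attempt before queueing, so it exists even if the
        # measurement later raises or never returns a result.
        stored_results.append(ResultAttempt(fp))
        pool.apply_async(measure, (fp,))
    pool.close()
    pool.join()


def attempts_per_relay(stored_results):
    # Only ResultAttempt entries count towards prioritisation.
    return Counter(
        r.fingerprint for r in stored_results if isinstance(r, ResultAttempt)
    )


stored_results = []
queue_measurements(["A", "B"], stored_results)
queue_measurements(["B"], stored_results)
counts = attempts_per_relay(stored_results)
```

With this approach the attempt count does not depend on the error paths being complete, which is why it might be simpler to implement.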
There's one problem with this scheme: if many new relays join the network every hour, then they will starve older relays. But that's a problem for the bad relays people, not sbws.
To avoid this problem, we could have two queues/pools: one for unmeasured relays, and one for measured relays. (Torflow does something like this, by having ~8 measured partitions, and an unmeasured partition.)
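One possible shape for the two-pool idea is sketched below. It is only an assumption about how the pools could share measurement slots (strict alternation); it does not reproduce how Torflow partitions relays, and two_pool_order() and its arguments are hypothetical.

```python
from itertools import chain, zip_longest


def two_pool_order(relays, attempts):
    """Interleave measured and unmeasured relays.

    relays: list of relay fingerprints (hypothetical identifiers).
    attempts: mapping of fingerprint -> number of attempts so far.
    """
    unmeasured = [fp for fp in relays if attempts.get(fp, 0) == 0]
    measured = sorted(
        (fp for fp in relays if attempts.get(fp, 0) > 0),
        key=lambda fp: attempts[fp],
    )
    # Alternate between the pools, so that a burst of new relays can take
    # at most every second slot while measured relays remain.
    pairs = zip_longest(measured, unmeasured)
    return [fp for fp in chain.from_iterable(pairs) if fp is not None]


relays = ["old1", "old2", "new1", "new2", "new3"]
attempts = {"old1": 4, "old2": 2}
order = two_pool_order(relays, attempts)
print(order)  # ['old2', 'new1', 'old1', 'new2', 'new3']
```

The alternation ratio is arbitrary here; a different share between the pools would work the same way.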
best_priority() tries to measure unmeasured and failing relays first.
But if fraction_relays or min_relays always fail, those relays will always end up first in the priority queue. (More precisely, those relays will end up first in the priority queue, until the results of the good relays are discarded for being too old.)
Thinking about starvation is complicated, because of the freshness_reduction_factor on some errors.
Here's a very simple algorithm that avoids starving good relays for failed relays:
Count the number of times that sbws has attempted to get a result from each relay.
This is already done when writing the results: ResultError and ResultSuccess keep that.
Test the relays with the lowest number of attempts first. (Don't check if the attempt succeeded or failed.)
This is what I was proposing by commenting the part where it prioritizes ResultError measurements.
For this priority rule to work, every time a relay is queued, it must get a result. Here's how we can make that happen:
Modify result_putter_error() to store an error result to the queue.
result_putter already writes ResultError.
There are two other bugs here. result_putter_error() only happens when:
The relay being measured doesn't have a descriptor (#28870 (moved))
best_priority() tries to measure unmeasured and failing relays first.
But if fraction_relays or min_relays always fail, those relays will always end up first in the priority queue. (More precisely, those relays will end up first in the priority queue, until the results of the good relays are discarded for being too old.)
Thinking about starvation is complicated, because of the freshness_reduction_factor on some errors.
Here's a very simple algorithm that avoids starving good relays for failed relays:
Count the number of times that sbws has attempted to get a result from each relay.
This is already done when writing the results: ResultError and ResultSuccess keep that.
But some failures do not write a ResultError.
Test the relays with the lowest number of attempts first. (Don't check if the attempt succeeded or failed.)
This is what I was proposing by commenting the part where it prioritizes ResultError measurements.
I don't understand what you mean here.
Can you link to the comment, or quote it?
For this priority rule to work, every time a relay is queued, it must get a result. Here's how we can make that happen:
Modify result_putter_error() to store an error result to the queue.
result_putter already writes ResultError.
But result_putter_error() is called when there is an exception in apply_async(), and it does not write ResultError.
There are two other bugs here. result_putter_error() only happens when:
The relay being measured doesn't have a descriptor (#28870 (moved))
Here's a very simple algorithm that avoids starving good relays for failed relays:
Count the number of times that sbws has attempted to get a result from each relay.
This is already done when writing the results: ResultError and ResultSuccess keep that.
But some failures do not write a ResultError.
Test the relays with the lowest number of attempts first. (Don't check if the attempt succeeded or failed.)
This is what I was proposing by commenting the part where it prioritizes ResultError measurements.
I don't understand what you mean here.
Can you link to the comment, or quote it?
Sorry, I don't remember now where I said that, but I think I misunderstood you.
I think this adds more complexity but might help to get more eligible relays.
What if we open a new ticket for that?
For this priority rule to work, every time a relay is queued, it must get a result. Here's how we can make that happen:
Modify result_putter_error() to store an error result to the queue.
result_putter already writes ResultError.
But result_putter_error() is called when there is an exception in apply_async(), and it does not write ResultError.
Ah, I get you now, you're right.
This might need some more changes.
What if we also open a new ticket for this?
I created two child tickets, but there are still more things in the ticket description that I didn't implement in the PR https://github.com/torproject/sbws/pull/328.
What I implemented was basically not prioritizing relays that failed to be measured, which is one of the two things (the other is #28897 (moved)) that I believe make sbws stall.
Setting to needs_review again.
The code in PR328 looks reasonable to me. I added a minor comment to a boolean expression, but nothing blocking.
I still feel that I don't know the codebase well enough to say if changes are net positive/negative for the overall codebase, but I trust juga to get those details right.