Opened 6 months ago

Closed 2 months ago

#28652 closed defect (implemented)

When sbws stops making progress, log a warning

Reported by: teor Owned by: juga
Priority: Medium Milestone: sbws: 1.1.x-final
Component: Core Tor/sbws Version:
Severity: Normal Keywords: no-changes-version
Cc: juga, teor Actual Points:
Parent ID: #28547 Points: 1
Reviewer: asn Sponsor:

Description

In #28639, sbws stopped making progress, but didn't log anything.

sbws should detect when its progress rate falls beneath some minimum (for example, all relays measured min_result times in data_period days), and log a warning.

Child Tickets

Change History (11)

comment:1 Changed 5 months ago by juga

#28932 does not do what described here, since it logs the stack after 3 minutes not finishing with the results to start a new loop, but it's similar.
I think that sbws stalling (#28663) happens after the best_priority loop (potentially solved in #28897), maybe while checking the destination again (potentially solved in #28868), so this wouldn't be needed.
It still might be useful to implement this ticket.

comment:2 Changed 3 months ago by juga

Keywords: no-changes-version added

Add keyword to help planify releases/milestones.
Tickets that doesn't imply a change of version are tickets which do not affect the code (docs, tests) and some time of refactors.

comment:3 Changed 3 months ago by juga

Points: 1

Add points

comment:4 Changed 2 months ago by juga

Owner: set to juga
Status: newassigned

comment:5 Changed 2 months ago by juga

Status: assignedneeds_revision

https://github.com/torproject/sbws/pull/349

(In revision until other tickets are merged).

comment:6 Changed 2 months ago by juga

Status: needs_revisionneeds_review

I thought it would be better to detect there is no progress measuring unique relays in the scanner, since there we can keep track there of all the different relays seen and measured.

Also because if we try to detect it in the bandwidth file, it will continue making progress until the 5th day, but the number of relays reported will be approximately the same after 5 days. We also don't know in the bandwidth file whether the relays reported are different relays from the previous bandwidth files.

Note that the scanner is already either logging that some relays failed to be measured or stop, no matter whether the relays were unique or not.

https://github.com/torproject/sbws/pull/349

comment:7 Changed 2 months ago by asn

Reviewer: asn

comment:8 Changed 2 months ago by asn

Status: needs_reviewneeds_revision

Hey juga, did another review and left some comments on the PR. Also travis is failing, not sure what's up with that.

comment:9 Changed 2 months ago by juga

Status: needs_revisionneeds_review

should be fixed now

comment:10 in reply to:  9 Changed 2 months ago by asn

Status: needs_reviewmerge_ready

Replying to juga:

should be fixed now

Hm, better. And the test looks good.

I think there might be more stuff you can hide into the heartbeat module so that they are not exposed in the main loop. The goal is to simplify the main loop and hide the heartbeat details into the heartbeat module.

I did an implementation here: https://github.com/torproject/sbws/pull/354/commits/944bad5561dc92d04569ed06eceadd139315e0d4

However, I was not able to test it because my user file does not work anymore (something about country codes?), and I'm not sure how to run the unittest either.

If you like the code feel free to use it, otherwise, feel free to merge the current state of your branch. It's all good.

comment:11 Changed 2 months ago by juga

Resolution: implemented
Status: merge_readyclosed

i like the code, merged.

Note: See TracTickets for help on using tickets.