Opened 8 years ago

Closed 5 years ago

#3835 closed defect (user disappeared)

Scanners on bwscan+moria sometimes don't make progress

Reported by: mikeperry
Owned by: aagbsn
Priority: High
Milestone:
Component: Core Tor/Torflow
Version:
Severity:
Keywords:
Cc: aagbsn@…, arma
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor:

Description

Scanner 3 is still stale.

WARN[Sat Aug 27 22:40:12 2011]:Bandwidth scanner scanner.3 stale. Possible dead bwauthority.py. Timestamp: Wed Aug 24 17:11:27

Note the bwscan vm is running sqlalchemy 0.5.6 still. Perhaps we should just upgrade to 0.7.1 and restart?
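
To put a number on how stale the scanner in the warning above is, the two timestamps can be compared directly. This is a minimal sketch and not part of Torflow: the `staleness` helper is hypothetical, and because the scanner timestamp carries no year, the sketch assumes it falls in the same year as the warning.

```python
from datetime import datetime, timedelta

# Timestamp layout used in the WARN line, e.g. "Sat Aug 27 22:40:12 2011".
FMT = "%a %b %d %H:%M:%S %Y"


def staleness(warned_at: str, last_seen: str) -> timedelta:
    """Return how far last_seen lags behind warned_at.

    warned_at includes a year; last_seen does not, so we ASSUME it
    belongs to the same year as the warning.
    """
    warned = datetime.strptime(warned_at, FMT)
    seen = datetime.strptime(f"{last_seen} {warned.year}", FMT)
    return warned - seen


age = staleness("Sat Aug 27 22:40:12 2011", "Wed Aug 24 17:11:27")
print(age)  # 3 days, 5:28:45
```

Under that assumption, scanner.3 had not updated its timestamp for a little over three days when the warning fired.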

Change History (6)

comment:1 Changed 8 years ago by aagbsn

Cc: aagbsn@… added

Upgrading to 0.7.x is probably not a bad idea, but I am confused -- the scanner seems to be making progress:
{{
DEBUG[Sat Aug 27 10:45:39 2011]:Starting slice number 3
DEBUG[Sat Aug 27 11:11:49 2011]:Starting slice number 4
DEBUG[Sat Aug 27 12:15:31 2011]:Starting slice number 5
DEBUG[Sat Aug 27 13:19:29 2011]:Starting slice number 6
DEBUG[Sat Aug 27 14:26:42 2011]:Starting slice number 7
DEBUG[Sat Aug 27 15:04:15 2011]:Starting slice number 8
DEBUG[Sat Aug 27 15:36:56 2011]:Starting slice number 9
DEBUG[Sat Aug 27 16:23:38 2011]:Starting slice number 10
DEBUG[Sat Aug 27 16:49:48 2011]:Starting slice number 11
DEBUG[Sat Aug 27 17:47:58 2011]:Starting slice number 12
DEBUG[Sat Aug 27 18:29:50 2011]:Starting slice number 0
DEBUG[Sat Aug 27 19:43:41 2011]:Starting slice number 1
}}

I also checked through the logs and did not see anything unusual (did not appear stalled). Do we have any other theory as to what is happening?
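
One way to look for a stall in a log like the one above is to measure the gap between consecutive "Starting slice number" lines. The following is a rough sketch, not Torflow code: the helper names `slice_starts` and `gaps_in_seconds` are made up for illustration, and the regular expression assumes the exact line layout shown in the excerpt.

```python
import re
from datetime import datetime

# The log excerpt quoted in this comment.
LOG = """\
DEBUG[Sat Aug 27 10:45:39 2011]:Starting slice number 3
DEBUG[Sat Aug 27 11:11:49 2011]:Starting slice number 4
DEBUG[Sat Aug 27 12:15:31 2011]:Starting slice number 5
DEBUG[Sat Aug 27 13:19:29 2011]:Starting slice number 6
DEBUG[Sat Aug 27 14:26:42 2011]:Starting slice number 7
DEBUG[Sat Aug 27 15:04:15 2011]:Starting slice number 8
DEBUG[Sat Aug 27 15:36:56 2011]:Starting slice number 9
DEBUG[Sat Aug 27 16:23:38 2011]:Starting slice number 10
DEBUG[Sat Aug 27 16:49:48 2011]:Starting slice number 11
DEBUG[Sat Aug 27 17:47:58 2011]:Starting slice number 12
DEBUG[Sat Aug 27 18:29:50 2011]:Starting slice number 0
DEBUG[Sat Aug 27 19:43:41 2011]:Starting slice number 1
"""

# Assumed line layout: DEBUG[<timestamp>]:Starting slice number <n>
LINE_RE = re.compile(r"^DEBUG\[(.+?)\]:Starting slice number (\d+)$")


def slice_starts(log_text):
    """Return a list of (slice_number, start_time) in log order."""
    starts = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            when = datetime.strptime(match.group(1), "%a %b %d %H:%M:%S %Y")
            starts.append((int(match.group(2)), when))
    return starts


def gaps_in_seconds(starts):
    """Seconds elapsed between each pair of consecutive slice starts."""
    return [
        int((later[1] - earlier[1]).total_seconds())
        for earlier, later in zip(starts, starts[1:])
    ]


starts = slice_starts(LOG)
gaps = gaps_in_seconds(starts)
print("longest gap (minutes):", round(max(gaps) / 60, 1))
```

For this excerpt the longest gap between slice starts is just under 74 minutes, which is consistent with the scanner advancing through slices during that window.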

comment:2 Changed 8 years ago by mikeperry

No idea. Possibly it stalled out for a bit and then started going again? I have not gotten any mails lately, and the ones I did get were intermittent.

comment:3 Changed 8 years ago by mikeperry

Cc: arma added
Priority: normal → major

This could be caused by the new ratio grouping (#3444). moria is also not making progress on occasion, despite taking continued measurements and not crashing.

Basically, grouping by ratio may cause higher churn in the slices, since it depends on both measured value and observed value. If the churn rate is faster than slice progress, no progress will be made.
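
The churn argument above can be illustrated with a toy model. This is only a sketch under strong assumptions and does not reflect how Torflow actually tracks slices: it assumes a slice needs a fixed number of minutes of uninterrupted measurement, and that every reshuffle of the slices discards partial work. The function `slices_completed` and all of its parameters are hypothetical.

```python
def slices_completed(slice_minutes, reshuffle_every, total_minutes):
    """Count slices finished in a toy model of churn.

    ASSUMPTIONS (not taken from Torflow): a slice needs `slice_minutes`
    of uninterrupted work, and every `reshuffle_every` minutes the slices
    are regrouped, which throws away any partial progress.
    """
    completed = 0
    progress = 0
    for minute in range(1, total_minutes + 1):
        progress += 1
        if progress == slice_minutes:
            completed += 1
            progress = 0
        if minute % reshuffle_every == 0:
            # Churn: the current slice no longer exists in its old form.
            progress = 0
    return completed


# Slice takes longer than the interval between reshuffles: never finishes.
print(slices_completed(slice_minutes=65, reshuffle_every=60, total_minutes=600))
# Slice is shorter than the interval: steady progress.
print(slices_completed(slice_minutes=26, reshuffle_every=60, total_minutes=600))
```

In this model the first call completes no slices in ten hours, while the second completes twenty, which is the intuition behind making slices smaller.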

I also flipped FetchDirInfoEarly and FetchDirInfoExtraEarly in c86c1e134ae6d0b50b8b5344b1a25b01b4366302. Since they will cause consensus fetches every hour, this may be even worse.

We could make the slices smaller to ensure completion, but I think nothing currently ensures that we will get at least one exit node in a slice. Perhaps we should still make them smaller, and also fix the selection to ensure exits anyway...
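
To see how slice size interacts with the exit concern above, here is a small hypothetical check that reports which fixed-size slices contain no exit. The relay list, the `(name, is_exit)` tuple layout, and the `exitless_slices` helper are all invented for illustration; Torflow's real selection logic is not shown here.

```python
def exitless_slices(relays, slice_size):
    """Return the indices of slices that contain no exit relay.

    `relays` is a list of (name, is_exit) tuples, assumed to be already
    ordered the way the scanner would slice them (hypothetical layout).
    """
    missing = []
    for index, start in enumerate(range(0, len(relays), slice_size)):
        chunk = relays[start:start + slice_size]
        if not any(is_exit for _name, is_exit in chunk):
            missing.append(index)
    return missing


# Made-up relay list: only the first and last relays are exits.
relays = [
    ("a", True), ("b", False), ("c", False),
    ("d", False), ("e", False), ("f", True),
]

print(exitless_slices(relays, slice_size=3))  # []
print(exitless_slices(relays, slice_size=2))  # [1]
```

With slices of three relays every slice has an exit, but shrinking to two leaves the middle slice without one, so a smaller slice size would need to be paired with a selection fix.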

comment:4 Changed 8 years ago by mikeperry

Summary: Scanner on bwscan vm is not making progress. → Scanners on bwscan+moria sometimes don't make progress

comment:5 Changed 8 years ago by arma

My cron has stopped mailing me complaints like this. I'm not sure if that means they're still happening and I've stopped being told, or if they magically disappeared.

comment:6 Changed 5 years ago by arma

Resolution: user disappeared
Status: new → closed

No clue if the underlying bug was fixed, but anyway the symptom is three years old now. Closing.
