#23696 closed defect (fixed)

Bug: scheduler_kist.c:520: kist_scheduler_schedule: Non-fatal assertion !((diff < 0)) failed.

Reported by: cypherpunks Owned by: dgoulet
Priority: Medium Milestone: Tor: 0.3.2.x-final
Component: Core Tor/Tor Version: Tor: 0.3.2.1-alpha
Severity: Normal Keywords: tor-client, tor-sched, 0.3.2.2-alpha-must
Cc: pastly Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

Just upgraded to TBB 7.5a5 on Windows 10 and at startup:

Tor WARN: tor_bug_occurred_(): Bug: scheduler_kist.c:520: kist_scheduler_schedule: Non-fatal assertion !((diff < 0)) failed. (Future instances of this warning will be silenced.) (on Tor 0.3.2.1-alpha ) 
Tor WARN: Bug: Non-fatal assertion !((diff < 0)) failed in kist_scheduler_schedule at scheduler_kist.c:520. (Stack trace not available) (on Tor 0.3.2.1-alpha )

Child Tickets

Change History (19)

comment:1 Changed 15 months ago by arma

Version: Tor: 0.3.2.1-alpha

comment:2 Changed 15 months ago by arma

Cc: pastly added

Question for pastly:

A) This is on Windows -- is the kist scheduler supposed to be picked here? Is it supposed to be kistlite or something? (I still haven't read all the code.)

Question for network team person:

B) Is the monotonic time thing supposed to work on Windows too?

comment:3 Changed 15 months ago by cypherpunks

After third restart it changed to

Tor NOTICE: Scheduler type KISTLite has been enabled. 

comment:4 in reply to:  2 Changed 15 months ago by pastly

Replying to arma:

Question for pastly:

A) This is on Windows -- is the kist scheduler supposed to be picked here? Is it supposed to be kistlite or something? (I still haven't read all the code.)

I'm guessing KISTLite was used from the beginning. They both use the same code, except for ~1 function (not this one).

If Tor switched during runtime, it should have logged the switch and probably a reason why.

Regardless, I'm betting this is a failure of monotime actually being monotonic. Or ... if this was literally right away on startup and if we can't assume statically-allocated memory on Windows is initialized to 0, then it's possible scheduler_last_run was garbage and larger than now.

comment:5 Changed 15 months ago by pastly

Keywords: tor-sched added

comment:6 Changed 15 months ago by pastly

Keywords: easy added

nickm says we should just init scheduler_last_run in the scheduler init function, and init it to now.

My only protest is that we didn't just run and will have to wait 10ms in order to actually do so for the first time, but that probably doesn't really matter.

comment:7 in reply to:  6 Changed 15 months ago by dgoulet

Replying to pastly:

nickm says we should just init scheduler_last_run in the scheduler init function, and init it to now.

My only protest is that we didn't just run and will have to wait 10ms in order to actually do so for the first time, but that probably doesn't really matter.

Yeah impact should be very minimal but the other option is now - 1 ;)

comment:8 Changed 15 months ago by dgoulet

Keywords: 0.3.2.2-alpha-must added

I'm arguing that this MUST be in the next alpha. Trivial fix and would be good to avoid warnings for another 2 or 3 weeks ;).

comment:9 Changed 15 months ago by dgoulet

Status: newneeds_review

See branch: ticket23696_032_01.

comment:10 Changed 15 months ago by nickm

Status: needs_reviewmerge_ready

This seems okay to me. I'll merge it once isis/ahf say "go"

comment:11 in reply to:  10 Changed 15 months ago by isis

Replying to nickm:

This seems okay to me. I'll merge it once isis/ahf say "go"


go

comment:12 Changed 15 months ago by nickm

Resolution: fixed
Status: merge_readyclosed

merged

comment:13 Changed 13 months ago by cypherpunks

Resolution: fixed
Status: closedreopened

After hibernation on Windows 7:

Tor WARN: tor_bug_occurred_(): Bug: scheduler_kist.c:530: kist_scheduler_schedule: Non-fatal assertion !((diff < 0)) failed. (Future instances of this warning will be silenced.) (on Tor 0.3.2.4-alpha ) 
Tor WARN: Bug: Non-fatal assertion !((diff < 0)) failed in kist_scheduler_schedule at scheduler_kist.c:530. (Stack trace not available) (on Tor 0.3.2.4-alpha ) 

comment:14 in reply to:  13 ; Changed 12 months ago by dgoulet

Status: reopenedneeds_information

Replying to cypherpunks:

After hibernation on Windows 7:

But before that, it was running properly right? So it basically was running then hibernation and then came back and boom?

Do you recall the length of time that it was in hibernation? Was it in the range of minutes or days?

Tor WARN: tor_bug_occurred_(): Bug: scheduler_kist.c:530: kist_scheduler_schedule: Non-fatal assertion !((diff < 0)) failed. (Future instances of this warning will be silenced.) (on Tor 0.3.2.4-alpha ) 
Tor WARN: Bug: Non-fatal assertion !((diff < 0)) failed in kist_scheduler_schedule at scheduler_kist.c:530. (Stack trace not available) (on Tor 0.3.2.4-alpha ) 

This is very surprising, it basically means that we got the last scheduler run in the future from "now" but both should be monotonic. Even if the last run is 0, the diff should be positive...

Looking at our Windows compat call for monotonic_get(), I see this comment:

  /* Alas, QueryPerformanceCounter is not always monotonic: see bug list at

    https://www.python.org/dev/peps/pep-0418/#windows-queryperformancecounter
   */

Which means that I would assume here we aren't getting a monotonic time at all and this warning is printed. Fortunately, this is harmless because we adjust the diff value back to 0 in this case and the scheduler will become active which is what we want.

Considering that we can't guarantee monotonic time it seems, we should probably remove the IF_BUG_ONCE()... Or make it log_info() maybe...

comment:15 in reply to:  14 Changed 12 months ago by cypherpunks

Keywords: easy removed
Status: needs_informationnew

Replying to dgoulet:

Replying to cypherpunks:

After hibernation on Windows 7:

But before that, it was running properly right?

Visually - yes, but who knows ;)

So it basically was running then hibernation and then came back and boom?

No boom, just logs :)

Do you recall the length of time that it was in hibernation?

No. Is it meaningful?

Was it in the range of minutes or days?

Hours.
Here is a new sample:

Tor NOTICE: Your system clock just jumped 47582 seconds forward; assuming established circuits no longer work. 
Tor NOTICE: Heartbeat: Tor's uptime is 1 day 4:22 hours, with 0 circuits open. I've sent 10.84 MB and received 82.03 MB. 
Tor NOTICE: Average packaged cell fullness: 57.419%. TLS write overhead: 4% 
[12-04 07:00:02] Torbutton INFO: tor catchall circuit has been dirty for over 10 minutes. Rotating.
[12-04 07:00:02] Torbutton INFO: New domain isolation for --unknown--: c17295dbb878f7b7cdeacddaa85791f2
[12-04 07:00:02] Torbutton INFO: tor SOCKS: https://aus1.torproject.org/torbrowser/update_3/alpha/WINNT_x86-gcc3-x86/7.5a8/en-US via
                       --unknown--:c17295dbb878f7b7cdeacddaa85791f2
Tor WARN: Failed to find node for hop #1 of our path. Discarding this circuit. 
[12-04 07:02:03] Torbutton INFO: controlPort >> 650 STREAM 2081 FAILED 0 aus1.torproject.org:443 REASON=TIMEOUT
[12-04 07:02:03] Torbutton INFO: controlPort >> 650 STREAM 2081 CLOSED 0 aus1.torproject.org:443 REASON=TIMEOUT
Tor NOTICE: Tried for 120 seconds to get a connection to [scrubbed]:443. Giving up. (waiting for circuit) 
Tor WARN: tor_bug_occurred_(): Bug: scheduler_kist.c:530: kist_scheduler_schedule: Non-fatal assertion !((diff < 0)) failed. (Future instances of this warning will be silenced.) (on Tor 0.3.2.4-alpha ) 
Tor WARN: Bug: Non-fatal assertion !((diff < 0)) failed in kist_scheduler_schedule at scheduler_kist.c:530. (Stack trace not available) (on Tor 0.3.2.4-alpha ) 
Tor NOTICE: Tor has successfully opened a circuit. Looks like client functionality is working. 

comment:16 Changed 12 months ago by dgoulet

This is kind of confusing because your clock jumps 47582 seconds forward which means that this condition:

  diff = monotime_diff_msec(&scheduler_last_run, &now);
  IF_BUG_ONCE(diff < 0) {

... now should be way long after scheduler_last_run but yet the diff is negative so somehow the monotime_get(&now) is getting something _before_.

I do think more and more that the solution is to downgrade the warning to info log level since it seems some platform can't guarantee monotonic time properly so we should expect it.

comment:17 Changed 12 months ago by dgoulet

Owner: set to dgoulet
Status: newaccepted

comment:18 Changed 12 months ago by dgoulet

Status: acceptedmerge_ready

pastly did agree on IRC to downgrade the warning to info.

Branch: bug23696_032_01

Patch is pretty trivial.

comment:19 Changed 12 months ago by nickm

Resolution: fixed
Status: merge_readyclosed

merged

Note: See TracTickets for help on using tickets.