Opened 5 weeks ago

Closed 4 weeks ago

#24158 closed defect (fixed)

I get this error "Looks like our kernel doesn't have the support for KIST anymore." on my relay

Reported by: Dbryrtfbcbhgf Owned by: pastly
Priority: Medium Milestone: Tor: 0.3.2.x-final
Component: Core Tor/Tor Version:
Severity: Normal Keywords: tor-sched
Cc: Actual Points:
Parent ID: Points:
Reviewer: dgoulet Sponsor:

Description

Im running a tor relay, running tor-0.3.2.3-alpha and I keep getting this error.
[notice] Looks like our kernel doesn't have the support for KIST anymore. We will fallback to the naive approach. Remove KIST from the Schedulers list to disable.

Relay kernel, GNU/Linux 2.6.32-042stab111.12 x86_64

Child Tickets

Change History (19)

comment:1 Changed 5 weeks ago by Dbryrtfbcbhgf

In future messages I get.

Scheduler type KIST has been disabled by the consensus or no kernel support.
[notice] Scheduler type KISTLite has been enabled.

comment:2 Changed 5 weeks ago by Sebastian

Why are these notice messages? If the user hasn't explicitly configured anything wrt KIST, I don't think *any* of those should be logged at notice.

Also, while this is fixed, please add a better explanation to the manpage, it currently says:

     **Vanilla**: The scheduler that tor has always used that is do as much as
     possible or AMAP.

I have no idea what AMAP is and the term is also not explained in the manpage anywhere else. It doesn't even occur again in the entirety of the tor source code.

comment:3 Changed 5 weeks ago by Sebastian

Milestone: Tor: 0.3.2.x-final

comment:4 in reply to:  2 Changed 5 weeks ago by cypherpunks

Replying to Sebastian:

I have no idea what AMAP is and the term is also not explained in the manpage anywhere else. It doesn't even occur again in the entirety of the tor source code.

AMAP = As Much As Possible. e.g.:

https://i.imgur.com/L3YYCE0.gif

Last edited 5 weeks ago by cypherpunks (previous) (diff)

comment:5 Changed 5 weeks ago by pastly

Keywords: tor-sched added

Thank you for running alpha Tor and reporting this.

I keep getting this error.
[notice]

It's not an error. It's a friendly FYI.

Dbryrtfbcbhgf: do you get these messages almost immediately when Tor starts or a few minutes/hours/++ later? My thinking is this: if it happens immediately, then that's a good argument for moving these messages into info as there may be plenty more relays that run on these rather old (but stable and bug fixed, right?) kernels. If it happens later, it isn't as good as an argument IMHO.

Some open questions: is it important for people to know what scheduler they are using? What if they changed Schedulers in their torrc? What if their kernel suddenly stopped supporting a scheduler and they changed?

In general I think it is important for people to be able to tell what scheduler their relay is using. Maybe these fallback messages are too error-looking though. And maybe if the user didn't set Schedulers, they really don't care and we shouldn't log anything. Do we tell users that they're using EWMA?

So my proposed solution is to log this stuff at info by default, but if the user has set Schedulers, then to log it at notice.

Sebastian: I'm making the manual a bit better. I think you're right: no one needs to know the acronym "AMAP".

See my ticket24158 on gitweb.tpo for the manual language change.

comment:6 Changed 5 weeks ago by dgoulet

Status: newneeds_revision

@Dbryrtfbcbhgf: The notice shouldn't be logged more than once after KIST has been disabled. So if this log is being seen over and over (probably at each consensus), we have a bug. Are you seeing it constantly? If yes, what time rate?

[notice] Looks like our kernel doesn't have the support for KIST anymore. We will fallback to the naive approach. Remove KIST from the Schedulers list to disable.

kist_no_kernel_support is set to 1 in the case of no support for KIST so the fallback is KISTLite. That notice can't be logged after that unless that value changes back to 0 which it doesn't unless tor restarts.

I think that notice will be logged each time the conf changed or new consensus while having KIST disabled by no kernel support. We should probably log it once and only once.

        log_notice(LD_SCHED, "Scheduler type KIST has been disabled by "
                             "the consensus or no kernel support.");

With regards of logging the change even though the user never specified anything in Schedulers, I do think we should keep it because KIST is a default and if there was a fallback, user should get informed. This is especially important on non-Linux system where KIST is not supported which informs of the choice of Tor at startup. But it shouldn't be that loud especially after bootstrap.

@pastly: The man page change lgtm *except* that I wouldn't put and doesn't prioritize very well. It is good to be honest but then also it is kind of weird to document how shitty it is in the manual page but still offer the option :P.

comment:7 in reply to:  6 Changed 5 weeks ago by pastly

Status: needs_revisionneeds_information

Replying to dgoulet:

@pastly: The man page change lgtm *except* that I wouldn't put and doesn't prioritize very well. It is good to be honest but then also it is kind of weird to document how shitty it is in the manual page but still offer the option :P.

Pushed a fixup commit.

I believe we are now waiting on information from Dbryrtfbcbhgf where in the logs he sees these messages.

  • Startup, or later?
  • Once (per time you start Tor) or periodically?
Last edited 5 weeks ago by pastly (previous) (diff)

comment:8 Changed 5 weeks ago by pastly

Replying to dgoulet:

I think that notice will be logged each time the conf changed or new consensus while having KIST disabled by no kernel support. We should probably log it once and only once.

log_notice(LD_SCHED, "Scheduler type KIST has been disabled by "
                     "the consensus or no kernel support.");

I'm skimmed the code a bit and I think you're right. Every consensus we probably log the above message, but only on the first consensus after having to fall back do we log Scheduler type KISTLite has been enabled.

comment:9 in reply to:  5 Changed 5 weeks ago by Dbryrtfbcbhgf

Replying to pastly:

Thank you for running alpha Tor and reporting this.

I keep getting this error.
[notice]

It's not an error. It's a friendly FYI.

Dbryrtfbcbhgf: do you get these messages almost immediately when Tor starts or a few minutes/hours/++ later? My thinking is this: if it happens immediately, then that's a good argument for moving these messages into info as there may be plenty more relays that run on these rather old (but stable and bug fixed, right?) kernels. If it happens later, it isn't as good as an argument IMHO.

Some open questions: is it important for people to know what scheduler they are using? What if they changed Schedulers in their torrc? What if their kernel suddenly stopped supporting a scheduler and they changed?

In general I think it is important for people to be able to tell what scheduler their relay is using. Maybe these fallback messages are too error-looking though. And maybe if the user didn't set Schedulers, they really don't care and we shouldn't log anything. Do we tell users that they're using EWMA?

So my proposed solution is to log this stuff at info by default, but if the user has set Schedulers, then to log it at notice.

Sebastian: I'm making the manual a bit better. I think you're right: no one needs to know the acronym "AMAP".

See my ticket24158 on gitweb.tpo for the manual language change.

I get the log messages within a few minutes after I start Tor on the relay

Last edited 5 weeks ago by Dbryrtfbcbhgf (previous) (diff)

comment:10 Changed 5 weeks ago by dgoulet

Owner: set to pastly
Status: needs_informationassigned

OK so only once.

@pastly, I think we should go for only printing the log_notice from above, can you do it with your current branch that changes the man page? Else, I'll get to it, just assigned the ticket to me.

comment:11 Changed 5 weeks ago by pastly

dgoulet: can you clarify what message(s) you want to keep and which you want to get rid of?

We have

Scheduler type %s has been enabled every time the scheduler switches cleanly. I think we should keep this as notice.

Looks like our kernel doesn't have the support for KIST every time we try to use KIST but have to fallback to KISTLite. Should happen at most once unless you play games with your torrc and HUPing. I think we should keep this as notice (maybe even warn, but we don't want it to be too scary since it may not indicate a problem).

Scheduler type KIST has been disabled by the consensus or no kernel support which gets called every time we call select_scheduler(), have KIST in Schedulers, but can't use it. I think this should be moved to info or removed.

I will gladly take care of this.

comment:12 in reply to:  11 Changed 5 weeks ago by dgoulet

Replying to pastly:

dgoulet: can you clarify what message(s) you want to keep and which you want to get rid of?

We have

Scheduler type %s has been enabled every time the scheduler switches cleanly. I think we should keep this as notice.

Yes we keep this one.

Looks like our kernel doesn't have the support for KIST every time we try to use KIST but have to fallback to KISTLite. Should happen at most once unless you play games with your torrc and HUPing. I think we should keep this as notice (maybe even warn, but we don't want it to be too scary since it may not indicate a problem).

Yes keep this one.

Scheduler type KIST has been disabled by the consensus or no kernel support which gets called every time we call select_scheduler(), have KIST in Schedulers, but can't use it. I think this should be moved to info or removed.

This one I propose we keep it notice but print _once_ and after that we do not. However, in the case of no kernel support, this is detected at runtime and we do log notice about it when detected so we don't need it. Thus the only way to transition out of KIST while having KIST enabled in Schedulers then is through the consensus. We should log notice that it happened but only once, not everytime we get a consensus. And if we ever transition back to KIST after a while, we'll get the "Scheduler type %s has been enabled" so we'll be back on track.

I will gladly take care of this.

Makes sense to you?

comment:13 Changed 4 weeks ago by pastly

Status: assignedneeds_review

See ticket24158_01 for the documentation change and the change to logging "Scheduler type KIST has been disabled by the consensus or no kernel support" only once. Rebased on master!

comment:14 in reply to:  13 ; Changed 4 weeks ago by dgoulet

Reviewer: dgoulet
Status: needs_reviewneeds_information

Replying to pastly:

See ticket24158_01 for the documentation change and the change to logging "Scheduler type KIST has been disabled by the consensus or no kernel support" only once. Rebased on master!

Quick question. What if the consensus enables it back? And then days later disable it again? We won't have the logging. What if we set have_logged_kist_suddenly_disabled = 0 if we have KIST enabled and we can actually use it (the else of the if (!should use kist) )?

comment:15 in reply to:  14 Changed 4 weeks ago by pastly

Status: needs_informationneeds_review

Replying to dgoulet:

Replying to pastly:

See ticket24158_01 for the documentation change and the change to logging "Scheduler type KIST has been disabled by the consensus or no kernel support" only once. Rebased on master!

Quick question. What if the consensus enables it back? And then days later disable it again? We won't have the logging. What if we set have_logged_kist_suddenly_disabled = 0 if we have KIST enabled and we can actually use it (the else of the if (!should use kist) )?

I like the way you think.

Pushed another commit to that branch.

comment:16 Changed 4 weeks ago by dgoulet

Status: needs_reviewmerge_ready

lgtm;

comment:17 Changed 4 weeks ago by pastly

Status: merge_readyaccepted

Wait. Gotta sort out which ticket to modify the man page in.

comment:18 Changed 4 weeks ago by pastly

Status: acceptedneeds_review

Removed the man page changes. Squashed and rebased on master again. ticket24158_02

comment:19 Changed 4 weeks ago by nickm

Resolution: fixed
Status: needs_reviewclosed

Cherry-picked to maint-0.3.2; merged forward.

Note: See TracTickets for help on using tickets.