Opened 13 years ago

Last modified 7 years ago

#326 closed defect (Fixed)

New eventdns error messages w/svn 8289

Reported by: yancm
Owned by: nickm
Priority: Low
Milestone: 0.1.2.x-final
Component: Core Tor/Tor
Version:
Severity:
Keywords:
Cc: yancm
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor:

Description

I just started getting the following error messages with the latest svn builds, ~8287-8289 (current):

Aug 29 02:20:50.964 [warn] eventdns: Nameserver 127.0.0.1 has failed: request timed out.
Aug 29 02:20:50.964 [warn] eventdns: All nameservers have failed
Aug 29 02:20:52.381 [warn] eventdns: Nameserver 127.0.0.1 is back up
Aug 29 04:53:41.131 [warn] eventdns: Nameserver 127.0.0.1 has failed: Bad response 2
Aug 29 04:53:41.132 [warn] eventdns: All nameservers have failed
Aug 29 04:53:41.140 [warn] eventdns: Nameserver 127.0.0.1 is back up
Aug 29 04:53:43.140 [warn] eventdns: Nameserver 127.0.0.1 has failed: Bad response 2
Aug 29 04:53:43.140 [warn] eventdns: All nameservers have failed
Aug 29 04:53:43.172 [warn] eventdns: Nameserver 127.0.0.1 is back up
Aug 29 04:53:51.581 [warn] eventdns: Nameserver 127.0.0.1 has failed: request timed out.
Aug 29 04:53:51.581 [warn] eventdns: All nameservers have failed
Aug 29 04:53:51.886 [warn] eventdns: Nameserver 127.0.0.1 is back up
Aug 29 04:53:54.591 [warn] eventdns: Nameserver 127.0.0.1 has failed: request timed out.
Aug 29 04:53:54.591 [warn] eventdns: All nameservers have failed
Aug 29 04:53:55.110 [warn] eventdns: Nameserver 127.0.0.1 is back up
Aug 29 04:53:57.601 [warn] eventdns: Nameserver 127.0.0.1 has failed: request timed out.
Aug 29 04:53:57.602 [warn] eventdns: All nameservers have failed
Aug 29 04:53:57.886 [warn] eventdns: Nameserver 127.0.0.1 is back up
Aug 29 06:21:49.752 [warn] eventdns: Nameserver 127.0.0.1 has failed: request timed out.
Aug 29 06:21:49.752 [warn] eventdns: All nameservers have failed
Aug 29 06:21:56.882 [warn] eventdns: Nameserver 127.0.0.1 is back up
Aug 29 07:33:48.065 [warn] eventdns: Nameserver 127.0.0.1 has failed: Bad response 2
Aug 29 07:33:48.066 [warn] eventdns: All nameservers have failed
Aug 29 07:33:49.163 [warn] eventdns: Nameserver 127.0.0.1 is back up
Aug 29 07:33:51.873 [warn] eventdns: Nameserver 127.0.0.1 has failed: Bad response 2
Aug 29 07:33:51.874 [warn] eventdns: All nameservers have failed
Aug 29 07:33:52.445 [warn] eventdns: Nameserver 127.0.0.1 is back up
Aug 29 07:33:53.723 [warn] eventdns: Nameserver 127.0.0.1 has failed: Bad response 2
Aug 29 07:33:53.723 [warn] eventdns: All nameservers have failed
Aug 29 07:33:53.827 [warn] eventdns: Nameserver 127.0.0.1 is back up

Is this expected?

I'm on NetBSD 3_Stable

[Automatically added by flyspray2trac: Operating System: All]

Change History (10)

comment:1 Changed 13 years ago by nickm

A big pile of people are seeing this problem with eventdns; I think the timeout threshold
is set too low, and the threshold for declaring a server down is set too low. Also, it's
a little silly to declare a server down if we only have one server, since we then immediately
(I think) declare it up again and try again.

comment:2 Changed 13 years ago by Maschi

This bug still occurs in version 0.1.2.3-alpha on SuSE 10.0.

comment:3 Changed 13 years ago by nickm

Fabian Keil's diagnosis in his mail to or-talk on 26 October was, I think, completely right.
I've checked a fix into svn as of r9054: if the warnings go away for me, I'll close this bug.
Please re-open it if they don't go away for you with this change.

(Note to self: send r9054 to Niels for inclusion in libevent.)

comment:4 Changed 13 years ago by nickm

The warnings are still appearing, even with this fix, but the fix has slowed them down by a lot.

Possibilities:

  • Maybe there are more circumstances where we need to add ns->timedout = 0;
  • Maybe the timeout is still too low
  • Maybe we should require a larger number of timeouts in a row before we decide a nameserver is dead.
  • Maybe when a nameserver times out but comes back quickly, we should raise our timeout threshold automatically.
  • Maybe we should just never allow our sole nameserver to die.
  • Maybe we should warn less.
    • Maybe we shouldn't warn until the 2nd retry of the nameserver gets no answer
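
A couple of the possibilities above (requiring several timeouts in a row, and never letting the sole nameserver die) can be sketched together as follows. This is only an illustration: `struct ns_state`, `ns_note_timeout`, `ns_note_success`, and the threshold value are hypothetical names and numbers, not the actual eventdns code.

```c
#include <stdbool.h>

/* Hypothetical per-nameserver state; not the real eventdns struct. */
struct ns_state {
    int consecutive_timeouts; /* timeouts seen since the last good reply */
    bool is_down;             /* true once we have given up on this server */
};

/* Assumed threshold; the ticket does not say which value was chosen. */
#define MAX_CONSECUTIVE_TIMEOUTS 3

/* Record one timeout. Returns true if this call marked the server down. */
static bool ns_note_timeout(struct ns_state *ns, int total_nameservers)
{
    ns->consecutive_timeouts++;
    if (ns->is_down)
        return false; /* already down; nothing new to report */
    if (ns->consecutive_timeouts < MAX_CONSECUTIVE_TIMEOUTS)
        return false; /* not enough timeouts in a row yet */
    if (total_nameservers <= 1)
        return false; /* never let the sole nameserver die */
    ns->is_down = true;
    return true;
}

/* A successful reply clears the streak, similar in spirit to
 * resetting ns->timedout to 0. */
static void ns_note_success(struct ns_state *ns)
{
    ns->consecutive_timeouts = 0;
    ns->is_down = false;
}
```

With this shape, a single slow reply never takes a server out of rotation, and a configuration with one nameserver keeps retrying instead of flapping between "failed" and "back up".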

comment:5 Changed 13 years ago by nickm

Okay, I've downgraded single-nameserver failures to INFO...
...and raised the failure threshold when there's only one nameserver.
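
Roughly, the two changes amount to making both the log severity and the failure threshold depend on how many nameservers are configured. The sketch below shows that idea only; the enum, the function names, and the threshold numbers are assumptions for illustration and are not taken from the checked-in patch.

```c
/* Hypothetical severity levels, ordered from least to most alarming. */
enum log_severity { SEV_INFO, SEV_NOTICE, SEV_WARN };

/* Severity to use for a "nameserver has failed" message.
 * With a single nameserver the failure is logged quietly. */
static enum log_severity failure_severity(int total_nameservers)
{
    return total_nameservers <= 1 ? SEV_INFO : SEV_WARN;
}

/* Number of timeouts in a row tolerated before declaring failure.
 * Both values are assumed; the ticket does not state the real ones. */
static int failure_threshold(int total_nameservers)
{
    return total_nameservers <= 1 ? 10 : 3;
}
```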

I'll close this bug if these fixes work for me.

comment:6 Changed 12 years ago by nickm

Peacetime still gives me:
Feb 21 09:47:42.648 [warn] eventdns: All nameservers have failed
Feb 21 09:47:52.739 [notice] eventdns: Nameserver 18.244.0.188 is back up
Feb 21 09:47:57.647 [warn] eventdns: All nameservers have failed
Feb 21 09:47:59.597 [notice] eventdns: Nameserver 18.244.0.188 is back up
Feb 22 09:12:53.286 [warn] eventdns: All nameservers have failed
Feb 22 09:13:03.362 [notice] eventdns: Nameserver 18.244.0.188 is back up
Feb 23 00:58:05.179 [warn] eventdns: All nameservers have failed
Feb 23 00:58:12.252 [notice] eventdns: Nameserver 18.244.0.188 is back up
Feb 23 00:58:43.267 [warn] eventdns: All nameservers have failed
Feb 23 00:58:48.232 [notice] eventdns: Nameserver 18.244.0.188 is back up
Feb 23 06:04:29.378 [warn] eventdns: All nameservers have failed
Feb 23 06:04:39.414 [notice] eventdns: Nameserver 18.244.0.188 is back up
Feb 23 19:45:47.206 [warn] eventdns: All nameservers have failed
Feb 23 19:45:53.296 [notice] eventdns: Nameserver 18.244.0.188 is back up
Feb 23 19:46:07.923 [warn] eventdns: All nameservers have failed
Feb 23 19:46:17.938 [notice] eventdns: Nameserver 18.244.0.188 is back up

This is too annoying. It must get fixed.

comment:7 Changed 12 years ago by nickm

Okay, it looks like rcode 2 is the culprit. I checked in a patch that will (if I did it right)
treat rcode 2 as "try later," not "nameserver is dead."
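
For reference, rcode 2 is SERVFAIL in RFC 1035, which is what the "Bad response 2" warnings were reporting. A minimal sketch of the classification described here might look like the following; `enum reply_action` and `classify_rcode` are hypothetical names, and the handling of every code other than 2 is an assumption rather than what the patch does.

```c
/* DNS response codes as defined in RFC 1035, section 4.1.1. */
enum dns_rcode {
    RCODE_NOERROR  = 0,
    RCODE_FORMERR  = 1,
    RCODE_SERVFAIL = 2,
    RCODE_NXDOMAIN = 3,
    RCODE_NOTIMPL  = 4,
    RCODE_REFUSED  = 5
};

/* Hypothetical outcomes for a reply; not eventdns identifiers. */
enum reply_action {
    ACTION_DELIVER,        /* hand the answer (or NXDOMAIN) to the caller */
    ACTION_RETRY_LATER,    /* leave the server up and retry the request */
    ACTION_MARK_NS_FAILED  /* count this against the nameserver */
};

static enum reply_action classify_rcode(int rcode)
{
    switch (rcode) {
    case RCODE_NOERROR:
    case RCODE_NXDOMAIN:
        return ACTION_DELIVER;
    case RCODE_SERVFAIL:
        /* "try later," not "nameserver is dead" */
        return ACTION_RETRY_LATER;
    default:
        /* Assumed for illustration only. */
        return ACTION_MARK_NS_FAILED;
    }
}
```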

Running on peacetime; going to see if it explodes.

comment:8 Changed 12 years ago by nickm

My fix seems to help, but I'm still worried about causation and whether it's doing the right thing. These
serverfailed messages appear with really unpleasant clustering.

Example:

Feb 28 23:06:20.244 [info] eventdns: Removing timeout for request f1b2c0
Feb 28 23:06:22.779 [info] eventdns: Got a SERVERFAILED from nameserver 18.244.0.188; will allow the request to time out.
Feb 28 23:06:22.779 [info] eventdns: Removing timeout for request 5fd380
Feb 28 23:06:22.779 [info] eventdns: Got a SERVERFAILED from nameserver 18.244.0.188; will allow the request to time out.
Feb 28 23:06:22.779 [info] eventdns: Removing timeout for request 116c5f0
Feb 28 23:06:22.779 [info] eventdns: Got a SERVERFAILED from nameserver 18.244.0.188; will allow the request to time out.
Feb 28 23:06:22.779 [info] eventdns: Removing timeout for request 895e60
Feb 28 23:06:22.779 [info] eventdns: Got a SERVERFAILED from nameserver 18.244.0.188; will allow the request to time out.
Feb 28 23:06:22.779 [info] eventdns: Removing timeout for request 9726d0
Feb 28 23:06:22.779 [info] eventdns: Got a SERVERFAILED from nameserver 18.244.0.188; will allow the request to time out.

comment:9 Changed 12 years ago by nickm

flyspray2trac: bug closed.
Seems to be fixed in 0.1.2.13 and in svn.

comment:10 Changed 7 years ago by nickm

Component: Tor Relay → Tor