Opened 8 years ago

Closed 8 years ago

Last modified 7 years ago

#2746 closed defect (fixed)

libevent2 problems on CCC tor servers

Reported by: kaner Owned by:
Priority: Medium Milestone: Tor: 0.2.2.x-final
Component: Core Tor/Tor Version: Tor: 0.2.2.22-alpha
Severity: Keywords: tor-relay
Cc: Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

dannenberg reports the following problems with
0.2.22-alpha2 and libevent2:

Mar 07 14:45:19.645 [notice] Circuit build measurement period of 79360ms is more than twice the maximum build time we have ever observed. Capping it to 34274ms.
Mar 08 12:25:51.578 [warn] Warning from libevent: kevent: Bad file descriptor
Mar 08 12:25:51.579 [err] libevent call with kqueue failed: Bad file descriptor [9]

Child Tickets

Change History (17)

comment:1 Changed 8 years ago by kaner

Andreas says that its actually not dannenberg, but tor10.anonymizer.ccc.de to tor13.anonymizer.ccc.de that were having this problem.

The system is a FreeBSD 8.1-RELEASE.

comment:2 Changed 8 years ago by kaner

Summary: libevent2 problems on dannenberglibevent2 problems on CCC tor servers

comment:3 Changed 8 years ago by arma

I can't tell from the log line if this is: A) a bug in libevent2, B) a bug in freebsd's kqueue implementation, or C) a resource limitation that can be fixed by better ulimit choices.

comment:4 Changed 8 years ago by Sebastian

Component: Tor Directory AuthorityTor Relay

Can we learn more about the environment where the relays are run?

comment:5 Changed 8 years ago by nickm

Is this really with .22-alpha? And which libevent2 version?

The error implies that kevent() in libevent's kqueue.c is giving an error "EBADF", which implies that we are getting a failure because "the specified file descriptor is invalid." (See kqueue.c , in function kq_dispatch.)

That's quite weird! If a particular file descriptor that we were passing to kevent were bad, we would (in theory) be getting that error passed back in events[i].flags, not set with an error when kevent returns. So this sounds to me as if freebsd were complaining that the kqueue fd (the one that was returned from kqueue()) was bad!

Possibilities that come to mind if that analysis is right:

  • Bug in freebsd. (not so likely -- the freebsd kqueue code is usually pretty solid)
  • Something closed the kqueue fd.
  • Something corrupted the libevent kqop structure, so that it's looking at the wrong fd.

Reasons why that analysis might be wrong.

  • Maybe I am misunderstanding why kevent() would give EBADF.
  • Maybe there is a bug in the error-reporting code, and the error with which kevent is failing is not really EBADF.

comment:6 in reply to:  5 Changed 8 years ago by rransom

Replying to nickm:

Is this really with .22-alpha?

Yes.

And which libevent2 version?

Most likely libevent 2.0.10-stable (the version installed by the current FreeBSD devel/libevent2 port).

comment:7 Changed 8 years ago by nickm

Milestone: Tor: 0.2.2.x-final

comment:8 Changed 8 years ago by nickm

Hm. We should think about next steps on this. First thing to do is write test code and see if we can make kevent() report EBADF in the cases we think it should (and not in the cases where we think it shouldn't). We should also have a quick look at the kernel in question to see if there are other circumstances where kevent() can return EBADF that appear from reading the source.

Once *that's* done, next step is to make absolutely sure that libevent is reporting this bug right. We should look at the code, and make sure it's obviously correct. Also we should have a quick gander at which patches, if any, freebsd applies to libevent.

And once *that's* done, if we want to go on the theory that something is closing the kqueue fd, let's see if we can instrument close() in libevent and tor such that every time it's about to close, it asserts that the fd is not the kqueue fd. That'd be a hack thatwe would never want to merge, but if the bug *does* turn out to be a bogus close, we'll get a stack trace to tell us where it is.

Progress!

comment:9 Changed 8 years ago by arma

Do we know how often the bug shows up?

comment:10 in reply to:  9 Changed 8 years ago by joehall

Hi, I'm seeing this on the Mac PowerPC Vidalia bundle ("0.2.2.24-alpha-0.2.12-ppc"):

Apr 29 12:19:52.613 [Notice] Bandwidth soft limit reached; commencing hibernation. No new conncetions will be accepted
Apr 29 12:19:53.740 [Notice] Going dormant. Blowing away remaining connections.
Apr 29 12:19:53.743 [Warning] Warning from libevent: kevent: Bad file descriptor
Apr 29 12:19:53.745 [Error] libevent call with kqueue failed: Bad file descriptor [9]

comment:11 Changed 8 years ago by nickm

Having it show up right after the "going dormant" message is mildly suggestive, I guess, but it doesn't seem to have done so in the orriginal report's case. Joehall -- how often does this happen for you?

comment:12 Changed 8 years ago by nickm

Oh hey, here's a crazy idea. In compat.c, there's a line "#undef DEBUG_SOCKET_COUNTING". If we turn that into "#define DEBUG_SOCKET_COUNTING", it's supposed to track which sockets we open and close and look for descrepencies. It's not perfect by any means, and it only looks at fds which it thinks represents sockets, but it might turn up something here.

(This is an experiment only for people who are running into the bug and who know how to edit C and build from source.)

comment:13 in reply to:  11 Changed 8 years ago by joehall

Replying to nickm:

Having it show up right after the "going dormant" message is mildly suggestive, I guess, but it doesn't seem to have done so in the orriginal report's case. Joehall -- how often does this happen for you?

Hi, this happens everyday on 0.2.2.24 and 0.2.2.25 alphas, ... going to go back to 23 unless I can be helpful. best, Joe

comment:14 Changed 8 years ago by nickm

oho. I was wrong about kevent()!

Ordinarily, if there's a bad fd or something in its input array, it reports the error in its output array of events... but if the output array isn't big enough, kevent will return -1.

Libevent's kqueue.c backend needs to be made aware of this. Looks like it'll be time for a libevent 2.0.12-stable sometime in the next week or two.

comment:15 Changed 8 years ago by nickm

Resolution: fixed
Status: newclosed

Libevent 2.0.12-stable should have a fix for this when it comes out. The relevant patch is 28317a0 , which should apply cleanly to 2.0.11-stable and probably 2.0.10-stable too.

comment:16 Changed 7 years ago by nickm

Keywords: tor-relay added

comment:17 Changed 7 years ago by nickm

Component: Tor RelayTor
Note: See TracTickets for help on using tickets.