Opened 7 years ago

Closed 7 years ago

Last modified 6 years ago

#2933 closed defect (fixed)

Error from libevent: evdns.c:1360: Assertion req != port->pending_replies failed in server_port_flush

Reported by: mr-4
Owned by:
Priority: Medium
Milestone:
Component: Core Tor/Tor
Version: Tor: 0.2.2.24-alpha
Severity:
Keywords: tor-client
Cc:
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor:

Description

I am getting the above error when 1) the network interface tor uses (tun0) is temporarily disconnected for whatever reason (and openvpn tries to re-establish the connection), or 2) the connection is heavily congested (for example, when I run a bittorrent client that uses the same tun0 device and also uses tor for dns resolution).

The error seems to be triggered by "assert(port->pending_replies != req);" (line 1311 of src/or/eventdns.c), though I have no idea what to do to fix this - it is very annoying!

I am using libevent 2.0.10 with tor 2.2.24 (compiled from source using the standard options - nothing fancy).

I have used tor 2.2.23 and libevent 1.x previously on the same system without any problems at all!

Attachments (2)

evdns_circularity_patch.diff (424 bytes) - added by nickm 7 years ago.
libevent-2.0.10-stable-configure.patch (741 bytes) - added by mr-4 7 years ago.
libevent 2.0.10 patch (fedora src rpm)

Change History (20)

comment:1 Changed 7 years ago by Sebastian

On the libevent list you reported the error, but slightly differently. Where is the error triggered? eventdns.c shouldn't be used with libevent2 at all iirc, so I'm confused about this report.

comment:2 in reply to:  1 Changed 7 years ago by mr-4

Replying to Sebastian:

On the libevent list you reported the error, but slightly differently. Where is the error triggered? eventdns.c shouldn't be used with libevent2 at all iirc, so I'm confused about this report.

The error is triggered in one of the following two scenarios: when my openvpn connection is closed - about a minute after openvpn manages to reconnect me again I get the following in my syslog, after which tor bails out:

Apr 17 00:58:20 test1 openvpn[4773]: Initialization Sequence Completed
Apr 17 00:59:24 test1 Tor[6268]: Error from libevent: evdns.c:1360: Assertion req != port->pending_replies failed in server_port_flush

The second case is when my openvpn connection is pretty congested (i.e. it has a lot of traffic to deal with, mainly from my bittorrent client). This is what I get in my syslog (this is after I have restarted tor after it bailed out above, and when there is initially heavy traffic on my newly established openvpn connection):

Apr 17 01:01:23 test1 Tor[6635]: Error from libevent: evdns.c:1360: Assertion req != port->pending_replies failed in server_port_flush

After which tor bails out again.

Please note that when the traffic is fairly light-ish or openvpn does not disconnect me tor is happily running without any problems.

It is also true that I reported this on the libevent mailing list (I think it was yesterday) as I was not 100% sure whether this is a tor error or a libevent error (apologies, but I am not an expert in either of these packages!), though I am not aware of what I reported there that was "different" from my original submission here.

comment:3 Changed 7 years ago by Sebastian

the difference is that here you reported an error in eventdns.c, which would indicate Tor's code, whereas on the libevent list (and in your comment here) you spoke of evdns.c which is a libevent file.

comment:4 in reply to:  3 Changed 7 years ago by mr-4

Replying to Sebastian:

the difference is that here you reported an error in eventdns.c, which would indicate Tor's code, whereas on the libevent list (and in your comment here) you spoke of evdns.c which is a libevent file.

I see, though in my defence in the subject line of this report I indicated that it is evdns.c!

Also in all fairness tor reports this error and then bails out, which I am not certain is the right thing to do.

So, is this a definite bug with libevent2 or is this something tor should be capable of handling and not bailing out on me (genuine question!)?

comment:5 Changed 7 years ago by mr-4

Could you let me know whether this is tor-related problem or libevent2 one so that I could submit a bug with the libevent people please?

As it stands I cannot use tor as it issues abort() as soon as it gets above error and it then exits - there is currently nothing I could do to alter that, short of downgrading to libevent 1.x.

Thank you!

comment:6 Changed 7 years ago by Sebastian

I believe it is probably an error in how Tor interfaces with libevent. I'm currently hoping Nick will have a good idea where this might be coming from; my first few attempts to track this down haven't gone anywhere so far.

comment:7 Changed 7 years ago by nickm

Status: new → needs_review

Hm. I'm not currently able to figure out whether this is a Tor bug or a libevent bug. Let's try to deal with it here, though, since this is a bugtracker and there's more here already.

So looking at the code, the assert is there because, once it calls server_request_free() to remove a request, that request should have been removed from any evdns_server_port that it's on (that is to say, the current port counts).

So there's some data corruption going on here. Let's see what it could be...

Okay, this part of server_request_free looks suspicious:

			if (req->next_pending)
				req->port->pending_replies = req->next_pending;
			else
				req->port->pending_replies = NULL;

pending_replies is a circular list, so req->next_pending should always be set if req is on the list. Instead, the check should probably be something like this:

			if (req->next_pending && req->next_pending != req)
				req->port->pending_replies = req->next_pending;
			else
				req->port->pending_replies = NULL;

I'm attaching a patch to apply to libevent. With this patch, do you get a) no error, b) the same error, c) a different error?
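
To illustrate the reasoning above, here is a minimal standalone sketch (not libevent's actual code) of why the extra `req->next_pending != req` test matters on a circular list. The `struct request`, `struct port`, `pending_add`, `pending_remove` and `head_is_stale_after_removal` names are hypothetical stand-ins; only the head-update condition mirrors the snippets quoted in this comment. With a single pending request, the node's next pointer refers to itself, so the original check leaves the head pointing at the request that was just removed, which is the state the assertion in server_port_flush rejects.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for libevent's request and server-port structs. */
struct request {
    struct request *next_pending;
    struct request *prev_pending;
};

struct port {
    struct request *pending_replies; /* head of a circular list, or NULL */
};

/* Append req to the port's circular list of pending replies. */
void pending_add(struct port *port, struct request *req)
{
    struct request *head = port->pending_replies;
    if (head == NULL) {
        /* A lone node on a circular list points at itself. */
        req->next_pending = req->prev_pending = req;
        port->pending_replies = req;
    } else {
        req->next_pending = head;
        req->prev_pending = head->prev_pending;
        head->prev_pending->next_pending = req;
        head->prev_pending = req;
    }
}

/* Unlink req. patched != 0 uses the proposed check, 0 uses the original. */
void pending_remove(struct port *port, struct request *req, int patched)
{
    assert(req->next_pending != NULL); /* req must be on the list */
    if (port->pending_replies == req) {
        if (req->next_pending && (!patched || req->next_pending != req))
            port->pending_replies = req->next_pending;
        else
            port->pending_replies = NULL;
    }
    req->next_pending->prev_pending = req->prev_pending;
    req->prev_pending->next_pending = req->next_pending;
    req->next_pending = req->prev_pending = NULL;
}

/* Returns 1 if the head still points at the only request after removing it. */
int head_is_stale_after_removal(int patched)
{
    struct port port = { NULL };
    struct request req = { NULL, NULL };
    pending_add(&port, &req);
    pending_remove(&port, &req, patched);
    return port.pending_replies == &req;
}
```

Calling `head_is_stale_after_removal(0)` exercises the original check and `head_is_stale_after_removal(1)` the proposed one. With two or more requests on the list both variants behave the same, which would fit the failure showing up only under particular load patterns, though that link is an assumption rather than something established in this ticket.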

Changed 7 years ago by nickm

comment:8 in reply to:  7 Changed 7 years ago by mr-4

Replying to nickm:

I'm attaching a patch to apply to libevent. With this patch, do you get a) no error, b) the same error, c) a different error?

Thanks Nick for getting involved!

I will let you know when I get the chance to apply and test this patch (probably as early as tonight, time permitting). I take it I should apply this against libevent 2.0.10, right? Is there anything else you would like me to check/view/report apart from the above?

comment:9 Changed 7 years ago by nickm

Yup. Apply it to libevent-2.0.10, and make sure that the version of tor you're using is really linked against the updated libevent.

If the patch *doesn't* work, and you know how to use gdb, it would be good to get a stack trace here, and to see the values of "port", "port->pending_replies", and "*port->pending_replies".

Otherwise, if the patch doesn't work, and you *don't* know how to use gdb, I'll try to come up with another patch to dump a bunch of debugging info or something.

comment:10 in reply to:  9 Changed 7 years ago by mr-4

Replying to nickm:

Yup. Apply it to libevent-2.0.10, and make sure that the version of tor you're using is really linked against the updated libevent.

If the patch *doesn't* work, and you know how to use gdb, it would be good to get a stack trace here, and to see the values of "port", "port->pending_replies", and "*port->pending_replies".

Otherwise, if the patch doesn't work, and you *don't* know how to use gdb, I'll try to come up with another patch to dump a bunch of debugging info or something.

OK, I now inadvertently changed the description of this bug :((( Sorry about that, is there any way I could restore the previous description? Originally, I meant to post this instead:

Damn!

I just realised that I have to build this from source! Nick, if you are reading this I would (possibly) need some assistance!

The machine on which all this is going to run is very old i686 (Pentium2) box, which has a read-only (locked up) kernel code and libraries (it runs as part of my dmz). I normally build the entire image for this machine (Fedora kickstart file) on my dev machine (x86_64 Core2).

That means I would need to 1) be able to cross-compile libevent2 (never attempted that before); and 2) build Fedora rpm so that my kickstart could use it.

From what I remember Rawhide for FC15 has libevent2 (2.0.10), which means that I would be able to grab the source rpm. Even though I have successfully altered the .spec file for quite a few packages to enable them to cross-compile successfully (one reason I am banging on about libevent-devel to be changed to enable cross-compilation!) it is not a precise science.

Are there any peculiarities in libevent2 I should be aware of before I attempt to do cross-compilation of libevent2 (I will, of course, report back here if there are problems)?

As for your request above Nick, I haven't used gdb for about 8 years, so it is safe to say that I wouldn't be able to use it (grumble).

comment:11 Changed 7 years ago by nickm

AFAIK, libevent 2 should build cleanly cross-platform, though I haven't tried it in a while. If the SRPM business becomes inconvenient, you could just try cross-compiling a static libevent 2, and a tor linked to it statically.

comment:12 in reply to:  11 Changed 7 years ago by mr-4

Replying to nickm:

AFAIK, libevent 2 should build cleanly cross-platform, though I haven't tried it in a while. If the SRPM business becomes inconvenient, you could just try cross-compiling a static libevent 2, and a tor linked to it statically.

I just downloaded the source rpm (libevent-2.0.10-2.fc15.src.rpm) and see that Fedora applies one extra patch to disable various number & other type checks (long, void * etc), which, I think, are crucial for cross compilation to succeed (see that patch attached). I will try to first ignore that patch (i.e. build rpm without it) and if it doesn't work I then will attempt to cross-compile with it and see what happens.

Changed 7 years ago by mr-4

libevent 2.0.10 patch (fedora src rpm)

comment:13 Changed 7 years ago by mr-4

OK, just after an hour of moderate load I can confirm that the patch works - Tor doesn't fall over any more and is quite happy to serve floods of dns requests together with my bittorrent software, using the patched libevent2.

Nick, I was very successful in cross-compiling libevent with, I am pleased to confirm, minimal effort (just applied my "standard" set of patches on the .spec file for cross compilation and it worked first time) - I was even able to optimise one or two things in the rpm build process on that package as well.

The patch I attached in my last response above was, as it turned out, Fedora's attempt to deal with the multilib issue I highlighted to them more than 3 months ago.

That patch is very crude and it does not work, but since this is in the rawhide repository I wouldn't get too worried about this - not yet! Hope they will be able to fix it before FC15 comes about. Though it baffles me why they are not adopting the approach I suggested to them...

Anyway, tomorrow I intend to apply more solid loads on that machine and will post again. If everything is OK and everyone's happy this bug may be closed. Does everyone agree with this course of plan/action?

comment:14 Changed 7 years ago by nickm

That sounds plausible; let me know how it works out. In the meantime I'm applying my patch above to Libevent. Even if it doesn't fix _this_ bug, I am pretty sure that it will indeed fix _a_ bug in this code.

comment:15 Changed 7 years ago by mr-4

Right, very good news.

It turns out that the new version of tor + libevent2 (patched) + openvpn + my bittorrent client is even more reliable than using the previous version of tor with libevent 1.x!

Today I ran a custom-crafted test script to force tor to serve over 50 remote dns requests every minute, while resetting openvpn's connection (via SIGUSR1) every five minutes, combined with a consistent load of over 400KiB/s coming through my bittorrent client on the vpn end - all this was kept going for over 30 minutes.

Solid as a rock!

When I used the old system image (with tor 2.2.23 and libevent 1.4) and did the same test, it tipped openvpn over the edge after about 15 minutes and then the whole thing fell like a pack of cards.

So, it seems that the patch did a good job and I am well-impressed with the performance of the new versions I've got, so that's a bonus!

Nick, thank you very much for helping out! I am happy that this particular bug has been truly squashed.

comment:16 Changed 7 years ago by nickm

Resolution: fixed
Status: needs_review → closed

Thanks for your help testing this! Glad to see it got fixed. I'll mark the bug closed now.

comment:17 Changed 6 years ago by nickm

Keywords: tor-client added

comment:18 Changed 6 years ago by nickm

Component: Tor Client → Tor