Opened 8 years ago

Closed 5 years ago

#3390 closed defect (worksforme)

Tor exit node crashed

Reported by: whabib Owned by:
Priority: High Milestone: Tor: unspecified
Component: Core Tor/Tor Version:
Severity: Keywords: tor-relay
Cc: whabib@… Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

My tor exit node crashed out of the blue yesterday. I have since restarted. The router fingerprint is:

Krugman DFB1A14FD43679D57818AB706389892C4AB64A65

The line from the log file when it crashed was:

Jun 11 19:10:27.283 [err] Bug: routerlist.c:4783: routerlist_assert_ok: Assertion !memcmp(sd->extra_info_digest, d, DIGEST_LEN) failed; aborting.

I am running Tor 0.2.1.30 on CentOS linux.

Child Tickets

Change History (16)

comment:1 Changed 8 years ago by whabib

Cc: whabib@… added
Component: - Select a componentTor Relay

comment:2 Changed 8 years ago by arma

I wonder if this is another instance of the issue that Nickm commented starting with "Hoo boy". (I see the comment has survived into master too.)

comment:3 Changed 8 years ago by nickm

Milestone: Tor: 0.2.2.x-final
Priority: normalmajor

It's not an instance of that, I believe. The bug in the "hoo boy" comment is that nothing keeps two descriptors from having the same extrainfo_digest, and so there's no guarantee that a descriptor sd will appear in desc_by_eid_map[sd->extrainfo_digest], since some other descriptor might appear there.

But this bug is different: there's an entry sd in desc_by_eid_map[x] such sd->extrainfo_digest != x, which implies that desc_by_eid_map is inconsistent. This could happen because of three clauses:

  • Adding someting to desc_by_eid_map incorrectly
  • Changing an extrainfo_digest entry after adding it to desc_by_eid_map
  • Memory corruption.

Investigating further...

comment:4 Changed 8 years ago by nickm

.... as far as I can tell, none of the first 2 possible causes (not "clauses," apparently I can't type today) is happening. Revised possibilities seem to be:

  • Adding someting to desc_by_eid_map incorrectly -- except that everywhere we add to desc_by_eid_map, I believe we are doing it right.
  • Changing an extrainfo_digest entry after adding it to desc_by_eid_map -- except that nothing modifies an extrainfo_digest once it's set.
  • Freeing a descriptor without removing it from desc_by_eid_map, which poisons the whole routerinfo_t. This is likeliest to be happening with a routerinfo_t, since there's only two places that call signed_descriptor_free(), and they both look safe to me.
  • Generic corruption -- always conceivable, though if this is the cause here, it'll be hard to trace it down.

So the next step is to audit every routerinfo_free and make sure it's safe.

comment:5 Changed 8 years ago by nickm

suspicious: routerlist_insert_old and signed_descriptor_from_routerinfo both free a routerinfo without guaranteeing that it's been removed from desc_by_eid_map. If either one can be called on a routerinfo_t that's already in the desc_by_eid_map without first removing it, that would account for this bug.

comment:6 Changed 8 years ago by nickm

Hm, no. Every instance that calls signed_descriptor_from_routerinfo() seems to maintain desc_by_eid_map correctly, and everything that calls routerlist_insert_old seems to do so with a not-already-added descriptor. I'm hitting a wall here; will come back to look and think more later.

whabib -- if this ever happens again and you can get a stack trace, that would be great.

comment:7 Changed 8 years ago by rransom

wanoskarnet reports that routerlist_replace() overwrites map's pointer to a new cache_info if ri_new->cache_info.extra_info_digest and ri_old->cache_info.extra_info_digest are the same.

comment:8 Changed 8 years ago by nickm

Hm. We're talking about overwriting the desc_by_eid_map entry? I don't see how that can cause this bug. If both ri_new and ri_old have the same nonzero extra_info_digest, then:

  • First, we sdmap_set(rl->desc_by_eid_map, ri_new->cache_info.extra_info_digest, &ri_new->cache_info);
  • Next, if we decide to keep the old routerinfo, we make ri_old into an sd, and call sdmap_set(rl->desc_by_eid_map, sd->extra_info_digest, sd);. That makes the map entry point to sd, which shouldn't cause this particular assertion.
  • ...or if we decide NOT to keep the old routerinfo, and the descriptors are different, we call sdmap_remove(rl->desc_by_eid_map, ri_old->cache_info.extra_info_digest);. This would make there be no entry for this particular extrainfo_digest, which also wouldn't cause this assertion AFAICT.
  • OTOH, if we decide not to keep the old routerinfo, and the descriptors are the same, they necessarily have the same extrainfo_digest, so there can't be a dangling pointer in the map.

comment:9 Changed 8 years ago by nickm

Ah, and looking at backlog, it seems that wanoskarnet may have changed his mind shortly after you posted this. :/

comment:10 Changed 8 years ago by Sebastian

Status: newneeds_information

Did this ever happen again? Maybe we can get this reproduced somehow

comment:11 Changed 7 years ago by nickm

Milestone: Tor: 0.2.2.x-finalTor: 0.2.4.x-final

comment:12 Changed 7 years ago by nickm

Keywords: tor-relay added

comment:13 Changed 7 years ago by nickm

Component: Tor RelayTor

comment:14 Changed 7 years ago by nickm

Milestone: Tor: 0.2.4.x-finalTor: unspecified

comment:15 Changed 7 years ago by whabib

Just for the record, I have not seen a recurrence of this bug in a real long time. We can close it, unless you guys are using it to track something else.

comment:16 Changed 5 years ago by dgoulet

Resolution: worksforme
Status: needs_informationclosed

2 years old and whatbib did mention that the bug has not been seen in a long time.

Note: See TracTickets for help on using tickets.