Any other configuration changes or hints?
Tor version 0.2.3.22, Vidalia 0.2.20, no idea about the bundle; all Tor & Vidalia stuff was installed via the Synaptic package manager. OS: Ubuntu 64-bit 12.04 LTS, updated regularly, with all possible Unity software removed. I use the OS's (perhaps also related to Firefox somehow) default proxies.
I'm really a newbie with Tor: I have now been running a relay for about two months. On my previous main system (also a 64-bit Linux machine) I tried to run Tor, but it just kept crashing, so I gave up.
On the previous system, when it 'kept crashing', do you have any other hints for us?
Trac: Priority: normal to major Summary: Tor's request to send a bug report to Bug: microdesc_free() called, but md was still referenced 1 node(s); held_by_nodes == 1
When my old Tor crashed, it seemed random: sometimes it took 5 minutes, sometimes an hour. That was about 1½ years ago. I kept no records; I just thought I'd give Tor another try later.
I installed the Ubuntu updates suggested by the Update Manager some 8 hours ago. Tor and Vidalia went completely down, as did Firefox (it had Tor enabled). I had to remove Firefox completely and re-install it to get it back up. Vidalia and Tor are now up and running, but I had to start them as root (sudo was not enough). While trying, the whole system crashed several times, but recovered without any tricks, creating automatic bug reports for the Ubuntu team. I think the latter is Ubuntu's bug: it just would not let me start anything with a Gnome GUI as root, and as a result even told me: "...root does not exist, cant' do anything without it" (sic!). Please keep me informed; there might be some serious bugs spreading around.
Update: the latest message from the Vidalia Message Log: "Oct 24 13:57:27.610 [Notice] Heartbeat: Tor's uptime is 2 days 17:53 hours, with 36 circuits open. I've sent 10.95 GB and received 2.09 GB." That's quite a lot sent compared to received, maybe even a security failure. Only the relay has been running since the crash. I generally use the client services very little; basically I have only tested whether they work. Please keep me informed if there's anything new I should do in order to keep my relay running as it should. I've been away for a couple of days, so I could not report the current status earlier.
The "Interrupt, not accepting new connections, shutting down in 30 seconds" thing is what happens if Tor received a SIGINT signal, or a control-c at the command line. It usually doesn't indicate a bug.
Thanks nickm. To my surprise, Tor and Vidalia are now working as I expect: they started from the "onion" button in Gnome when I was logged in as a normal user. Not a single warning or error message; the latest: "Oct 31 13:22:33.228 [Notice] Heartbeat: Tor's uptime is 22:54 hours, with 15 circuits open. I've sent 810.66 MB and received 790.18 MB." There's been an update by the Ubuntu Update Manager: the current version is 0.2.3.24-rc. The one that I had problems with was 0.2.3.22-rc. Problem solved?
It happened again: Nov 06 20:00:46.279 [Warning] microdesc_free(): Bug: microdesc_free() called, but md was still referenced 1 node(s); held_by_nodes == 1. Latest normal log: Nov 06 19:22:33.228 [Notice] Heartbeat: Tor's uptime is 7 days 4:54 hours, with 58 circuits open. I've sent 16.31 GB and received 16.02 GB. I have no idea what's happening. Please help; I want to run Tor without causing any trouble to other users.
I want to run Tor without causing any trouble to other users.
For what it's worth, seeing this message is probably harmless. That is, we should track it down, but it's probably not causing any trouble to other users.
Tor 0.2.3.25; I don't recall offhand which bundle I'm using, but I keep them up to date; Ubuntu Linux 12.04.1 LTS; running as a non-exit relay. This is the first time I've encountered this particular error in my years of running Tor. The only thing I've been doing differently lately is running EtherApe.
I looked over all the microdesc_free() calls again, and none seems super likely here. Maaaybe the one in microdesc_cache_clean? But how would a currently live node reference a microdescriptor that was last listed over 7 days ago? And could something else be going on?
In an attempt to track this down, I did a quick patch to log the fname:lineno that's invoking microdesc_free(). See branch "bug7164_diagnostic" in my public repo. The branch is against 0.2.4.
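For anyone unfamiliar with the pattern, capturing the caller in C is normally done with a macro that forwards __FILE__ and __LINE__ into the real function. The fragment below is only a sketch of that idea, not the actual bug7164_diagnostic code; the microdesc_free_impl_ name is made up for illustration.

/* Sketch of call-site logging, not the actual diagnostic branch.
 * "microdesc_free_impl_" is a hypothetical name. */
#define microdesc_free(md) microdesc_free_impl_((md), __FILE__, __LINE__)

static void
microdesc_free_impl_(microdesc_t *md, const char *fname, int lineno)
{
  if (md && md->held_by_nodes) {
    /* Log who asked us to free a microdescriptor that nodes still hold. */
    log_warn(LD_BUG, "microdesc_free() called from %s:%d, but md was still "
             "referenced by %u node(s)", fname, lineno, md->held_by_nodes);
  }
  /* ... actual freeing elided ... */
}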
I looked over all the microdesc_free() calls again, and none seems super likely here. Maaaybe the one in microdesc_cache_clean? But how would a currently live node reference a microdescriptor that was last listed over 7 days ago? And could something else be going on?
In an attempt to track this down, I did a quick patch to log the fname:lineno that's invoking microdesc_free(). See branch "bug7164_diagnostic" in my public repo. The branch is against 0.2.4.
I think it's fine to merge this diagnostic branch.
I got the same error.
Tor version: 0.2.35
Vidalia version: 0.2.21
OS: Windows 7
Bundle: I installed the "Vidalia Relay Bundle" but I changed it in the settings to an exit relay!
I am a client and relay, mostly a relay; I use the client once in a while and have it running, but I don't use it as much as the relay!
I get the error about 10 times every hour!
Gonna update to 0.2.4 to test!
Trac: Status: needs_information to new Milestone: Tor: 0.2.5.x-final to Tor: 0.2.4.x-final Version: Tor: 0.2.3.22-rc to Tor: 0.2.4.19 Summary: Bug: microdesc_free() called, but md was still referenced 1 node(s); held_by_nodes == 1 to microdesc.c:378: Bug: microdesc_free() called, but md was still referenced 1 node(s); held_by_nodes == 1
I have asked the user for exact version information and OS version information.
[sex 14. mar 19:32:52 2014] Tor Software Error - The Tor software
encountered an internal bug. Please report the following error message to
the Tor developers at bugs.torproject.org: "microdesc_free(): Bug:
microdesc_free() called, but md was still referenced 1 node(s);
held_by_nodes == 1
"
So, there's a microdesc that is (probably) held by a node, but its last-listed is more than one week ago. Interesting!
In theory:
A node should not exist unless it has a routerstatus or a routerinfo.
A node should not have a microdescriptor unless it has a routerstatus.
Whenever a networkstatus is loaded, we should be updating the last_listed field of the microdescriptors.
So something has gone wrong with the theory.
I'm not too sure what -- if somebody has ideas, that would be great. I've tried to write an improved diagnostic branch. Please review "bug7164_diagnose_harder" in my public repository. It's more logs, not a fix.
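To make the third point above concrete, here is a rough sketch of what "refresh the last-listed times whenever a networkstatus is loaded" means. The real logic lives in update_microdescs_from_networkstatus() in microdesc.c; the loop below is only my approximation of it.

/* Approximation only; see update_microdescs_from_networkstatus(). */
static void
refresh_last_listed_sketch(networkstatus_t *ns, time_t now)
{
  SMARTLIST_FOREACH_BEGIN(ns->routerstatus_list, routerstatus_t *, rs) {
    microdesc_t *md =
      microdesc_cache_lookup_by_digest256(NULL, rs->descriptor_digest);
    if (md && md->last_listed < now)
      md->last_listed = now;  /* still listed, so not "old" */
  } SMARTLIST_FOREACH_END(rs);
}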
Trac: Status: new to needs_review Milestone: Tor: 0.2.4.x-final to Tor: 0.2.5.x-final
Please don't mess with the milestones. The "024-backport" tag is what means "backport this to 0.2.4 when it's done". All bugfixes for 0.2.4 or 0.2.3 need to get tested in 0.2.4 (the latest branch) before they can get backported.
Trac: Milestone: Tor: 0.2.4.x-final to Tor: 0.2.5.x-final
The line number suggests that this is happening in microdesc_cache_clean():
  for (mdp = HT_START(microdesc_map, &cache->map); mdp != NULL; ) {
    if ((*mdp)->last_listed < cutoff) {
      ++dropped;
      victim = *mdp;
      mdp = HT_NEXT_RMV(microdesc_map, &cache->map, mdp);
      victim->held_in_map = 0;
      bytes_dropped += victim->bodylen;
      microdesc_free(victim);
    } else {
      ++kept;
      mdp = HT_NEXT(microdesc_map, &cache->map, mdp);
    }
  }
Also, the wording of the log messages ("Microdescriptor seemed very old (last listed %d hours ago vs %d hour cutoff), but is still marked as being held by %d node(s). I found %d node(s) holding it.") suggests you only want to emit for old microdescriptors, but the enclosing test is just if (held_by_nodes) rather than if (is_old && held_by_nodes). Am I missing something here?
I think the actual logging code is okay as long as we're sure the tests are right; making them over-eager means a lot of searching the whole list of nodes in nodelist_find_nodes_with_microdesc() whenever we run microdesc_cache_clean().
The test that would match the old behavior would be just if (is_old), surely?
The old behavior was to call "microdesc_free()" if the microdescriptor was old... and then the warning would be produced because held_by_nodes was nonzero, and we wouldn't free the thing. The change here has the old behavior in the non-warning case, and the new behavior in the case where we would have given a warning.
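Put as a decision table, the behavior described above looks roughly like this (a paraphrase, not the literal diff):

/* Paraphrase of the cleanup logic discussed above. */
int is_old = ((*mdp)->last_listed < cutoff);
if (is_old) {
  if ((*mdp)->held_by_nodes) {
    /* Old behavior: warn from inside microdesc_free() and fail to free.
     * New behavior: keep the md and log the detailed diagnostic instead. */
    ++kept;
  } else {
    ++dropped;          /* unreferenced and past the cutoff: safe to free */
    microdesc_free(*mdp);
  }
} else {
  ++kept;               /* recently listed: keep it regardless */
}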
Also, the wording of the log messages ("Microdescriptor seemed very old (last listed %d hours ago vs %d hour cutoff), but is still marked as being held by %d node(s). I found %d node(s) holding it.") suggests you only want to emit for old microdescriptors, but the enclosing test is just if (held_by_nodes) rather than if (is_old && held_by_nodes). Am I missing something here?
Should we sneak in a log severity downgrade in 0.2.4.21?
It's my understanding that the more useful log message change went into 0.2.5, leaving 0.2.4 with nothing in its logs that is useful to us, while those logs still yell at the operators.
Seven days to trigger the bug:
day 0: user launches Tor, fetches an md (which later triggers the failure) and caches it to disk with last_listed = 0;
day 1: user online; restarting, shutting down, etc.;
day 2: user online; the relay is still announcing the same keys as in the cached md, so no cached md changes;
day 3: the same;
day 4: the same;
day 5: user offline;
day 6: user offline;
day 7: user starts Tor; the cached consensus is several days old; the md's last_listed stored on disk is zero; update_microdescs_from_networkstatus can no longer update the last_listed field in memory; if microdesc_cache_clean is launched before a new consensus arrives and "last_listed < today - 7 days", the bug triggers.
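As a standalone illustration of the arithmetic in that last step (168 hours is the cutoff the warnings report; last_listed = 0 stands in for the never-refreshed value read back from disk):

#include <stdio.h>
#include <time.h>

int main(void)
{
  const time_t now = time(NULL);
  const time_t cutoff = now - 7 * 24 * 60 * 60;  /* 168-hour cutoff */
  const time_t last_listed = 0;  /* never refreshed since being loaded */

  if (last_listed < cutoff)
    printf("md looks %ld hours old vs 168-hour cutoff; cleaner wants to free it\n",
           (long)((now - last_listed) / 3600));
  return 0;
}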
Corruption of nodelist_map could explain this bug. Consider the iteration:
for (iter = HT_START(nodelist_map, &the_nodelist->nodes_by_id); iter; ) {
The loop probably ends before the affected node->md is fixed.
That corruption probably does not affect HT_FIND or HT_INSERT, because the values of
(head)->hth_table[HT_ELT_HASH_(elm,field,hashfn) % head->hth_table_length] are not broken.
The corruption probably happens during HT_GROW or something similar.
It's only a theory, but it could be confirmed by new logs: node->rs will be NULL if nodelist_map was corrupted.
Corruption of nodelist_map
Just for clarity: all hash tables used by Tor use the general memory allocation functions (malloc, realloc). The result of malloc/realloc is checked, but only at the low level (by HT_GROW), and nothing checks the result of HT_GROW itself (it returns -1 if something went wrong and 0 both when things are fine and when the table is over its fill limit).
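To illustrate the unchecked-growth concern in code form (the names below follow ht.h's naming conventions, but this is not an actual Tor patch, just the shape of the missing check):

/* Hypothetical illustration: what checking HT_GROW's result would look like.
 * nodelist_map_HT_GROW is the kind of function ht.h generates; Tor's
 * insertion path currently ignores its return value. */
if (nodelist_map_HT_GROW(&the_nodelist->nodes_by_id, new_size) < 0) {
  /* Allocation failed: the table keeps its old size, and today nothing
   * upstream notices (the over-the-fill-limit case returns 0, so it would
   * not even reach this branch). */
  log_warn(LD_BUG, "Hashtable growth failed; nodelist_map may be overloaded");
}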
Also, if corruption is happening, we need to figure out why corruption only seems to be happening to this hashtable: the other ones seem to be working okay.
the other ones seem to be working okay.
Or maybe not. Perhaps corruption of this hashtable leads to visible effects, while corruption of other hashtables triggers nothing visible.
New instance with 0.2.5.8-rc reported in #13481 (moved):
microdesc_cache_clean(): Bug: Microdescriptor seemed very old (last listed 168 hours ago vs 168 hour cutoff), but is still marked as being held by 1 node(s). I found 1 node(s) holding it. Current networkstatus is 1 hours old. Hashtable badness is 0.
microdesc_cache_clean(): Bug: [0]: ID=99C6B1CB2E5E1030F40FE4AA55A1AA3566037507. md=0x7ff53e51ce50, rs=0x7ff53e5e4d20, ri=0x7ff53fc19c20. Microdesc digest in RS does match. RS okay in networkstatus.
To me, these things are suggestive, or at least interesting:
There is an 'ri' value set.
This is a server.
There are many, many of these messages occurring.
And it seems that there are a couple of problems with our diagnostic:
If we're a relay, looking at node->rs is pointless here, since it's from the FLAV_NS consensus. We need to look at the rs from our latest microdescriptor consensus.
Also, there's a typo: "Microdesc digest in RS does match" should be "...does not match". Fortunately, we give a different message when it does.
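A rough sketch of the corrected lookup, fetching the routerstatus from the microdesc-flavored consensus instead of relying on node->rs; whether the final diagnostic was written exactly this way is an assumption on my part.

/* Sketch: on a relay, node->rs belongs to the FLAV_NS consensus, so look
 * the node up in the md-flavored consensus explicitly. */
networkstatus_t *md_ns =
  networkstatus_get_latest_consensus_by_flavor(FLAV_MICRODESC);
const routerstatus_t *rs_md =
  md_ns ? networkstatus_vote_find_entry(md_ns, node->identity) : NULL;
int still_listed = rs_md &&
  tor_memeq(rs_md->descriptor_digest, md->digest, DIGEST256_LEN);
/* still_listed corresponds to "Microdesc digest in RS matches." */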
I'm not a technical enough user to understand the details in the discussion above, but I will give a couple more data points here just in case it helps in diagnosing the issue. I run a middle relay on version 0.2.5.10 (but I started seeing these about a week ago on 0.2.5.9). I had never seen them before, but I kept getting them every hour (about 20 at a time). After a few days of this I restarted my relay, both to update to 0.2.5.10 and to see if that cleared up the issue. After the restart the relay worked normally for about a week, and now I am seeing them again. Here is one pair of lines (again, I'm getting about 20 of these every hour).
Nov 12 07:25:07.334 [Warning] microdesc_cache_clean(): Bug: Microdescriptor seemed very old (last listed 168 hours ago vs 168 hour cutoff), but is still marked as being held by 1 node(s). I found 1 node(s) holding it. Current networkstatus is 0 hours old. Hashtable badness is 0.
Nov 12 07:25:07.338 [Warning] microdesc_cache_clean(): Bug: [0]: ID=18E0BC920307C0B79F576EDB3473D74D4D1BCA94. md=0x7f8a1bcdaf60, rs=0x7f8a1d09c290, ri=0x7f8a1d86c200. Microdesc digest in RS does match. RS okay in networkstatus.
Everything in the logs looks normal before these errors start cropping up; the only thing even a bit out of place is that in the last heartbeat before the errors start I am seeing about 8000 TAP handshakes, compared to 3 or 4 thousand normally (the number of circuits and NTor handshakes are normal, though).
Anyway, hope this information helps diagnose the problem. I will keep my relay up like this for a couple more days before restarting in case there is any other kind of information you would like gathered from it.
P.S. I copied the logs into a text file and did a bit of sed/grep work on them and compiled a list of relay ID numbers from the ID= field. There are a total of 414 of them listed and not a single one appears twice in the list. Out of 6000 possible relays this represents a sizeable fraction of the network.
P.P.S. Now, about 2 days later, I am up to 1800 relays showing a bad entry, and still exactly 0 repeats.
Last Update - I am now up to about 4200 relays listed in the error logs, however it has started to "loop back around" and I now see about 1000 or so with 2 occurrences each, and the rest with one only; none with 3 or more.
New instance with 0.2.5.8-rc reported in #13481 (moved):
#13481 (moved) is probably related and not fully duplicate of this bug.
Bunch of "md was still referenced" should be able to trigger if to launch tor as client and to change it to relay mode after all stuff fetched then wait for a week or so.
nodelist_set_consensus() called only for usable_consensus_flavor() (FLAV_MICRODESC for client and FLAV_NS for relay). While client mode enabled nodelist_set_consensus() changes node->md, and skips it if relay mode enabled, then if node->rs exist for given ns consensus then such node can't be purged.
Probably valid explanation, if nothing missed.
Where networkstatus_get_latest_consensus depends we_use_microdescriptors_for_circuits and would to return current_ns_consensus if relay mode enabled (generally it depends another options yet).
Bunch of "md was still referenced" should be able to trigger if to launch tor as client and to change it to relay mode after all stuff fetched then wait for a week or so.
Yet way to trigger bug is to use bridges, if one bridge supports md and another bridge have no md support. Client switches to ns-consensus after bridge that supports md fails circuit. Then client leaving pointers (node->md) and md unchanged, so bug depends cached stuff (@last-listed values) and client uptime. However, this scenario, if it was confirmed yet, shouldn't be actual today, md supported since 0.2.3.1-alpha version and 0.2.3.x is outdated.
It's still keep buggy for case switching in between client only and relay modes while both current_md_consensus and current_ns_consensus cached. Everything then depends timing till new consensus update, and md's last-listed values.
--- microdesc.c.original	2014-12-16 10:16:08.393137000 -0800
+++ microdesc.c	2014-12-18 08:30:27.637096984 -0800
@@ -847,6 +847,8 @@
 we_use_microdescriptors_for_circuits(const or_options_t *options)
 {
   int ret = options->UseMicrodescriptors;
+  static int prev_ret_we_use_md = -1;
+  networkstatus_t *ns = NULL;
   if (ret == -1) {
     /* UseMicrodescriptors is "auto"; we need to decide: */
     /* If we are configured to use bridges and none of our bridges
@@ -859,6 +861,24 @@
      * a partitioning issue here where bridges differ from clients. */
     ret = !server_mode(options) && !options->FetchUselessDescriptors;
   }
+
+  /* Detect if preferable consensus flavor changed,
+   * update nodelist according to choosen consensus then. */
+  if (prev_ret_we_use_md != -1 && prev_ret_we_use_md != ret) {
+
+    /* We can't to call networkstatus_get_latest_consensus(),
+     * it returns current_consensus that depends
+     * call of we_use_microdescriptors_for_circuits() */
+    if (ret) /* we use microdescriptors, we need md-consensus */
+      ns = networkstatus_get_latest_consensus_by_flavor(FLAV_MICRODESC);
+    else
+      ns = networkstatus_get_latest_consensus_by_flavor(FLAV_NS);
+
+    /* update nodelist with any latest consensus */
+    if (ns)
+      nodelist_set_consensus(ns);
+  }
+
+  prev_ret_we_use_md = ret;
   return ret;
 }
--- nodelist.c.original	2014-10-10 06:06:24.000000000 -0700
+++ nodelist.c	2014-12-18 08:23:30.969080066 -0800
@@ -229,6 +229,11 @@
         if (node->md)
           node->md->held_by_nodes++;
       }
+    } else { /* No md-consensus used, releasing md used by node if need */
+      if (node->md) {
+        node->md->held_by_nodes--;
+        node->md = NULL;
+      }
     }
 
     node_set_country(node);
The next patch tries to resolve the last edge case.
A more preferable and non-kludgy way is to drop the check of any_bridge_supports_microdescriptors() in we_use_microdescriptors_for_circuits, assuming every bridge supports md, and then to update the nodelist from options_act if config options affecting the preferred consensus flavor have changed.
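A hedged sketch of that alternative, hooking the flavor change where the options are applied rather than inside we_use_microdescriptors_for_circuits(); the placement in options_act() and the flavor_for_options() helper are my assumptions, not existing code.

/* Hypothetical fragment for options_act(): refresh the nodelist when the
 * preferred consensus flavor changes.  flavor_for_options() is a made-up
 * helper standing in for the real decision logic. */
static int last_flavor = -1;
const int flavor = flavor_for_options(options);
if (last_flavor != -1 && last_flavor != flavor) {
  networkstatus_t *ns = networkstatus_get_latest_consensus_by_flavor(flavor);
  if (ns)
    nodelist_set_consensus(ns);  /* re-point node->rs / node->md */
}
last_flavor = flavor;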
whoo, just hit this in 0.2.8, when starting up with a very old data directory. It said:
Jul 13 11:46:43.000 [notice] Bootstrapped 100%: Done
Jul 13 12:16:38.000 [warn] microdesc_cache_clean(): Bug: Microdescriptor seemed very old (last listed 1056 hours ago vs 168 hour cutoff), but is still marked as being held by 1 node(s). I found 1 node(s) holding it. Current networkstatus is 1 hours old. Hashtable badness is 0. (on Tor 0.2.8.5-rc-dev )
Jul 13 12:16:38.000 [warn] microdesc_cache_clean(): Bug: [0]: ID=3EFB929FA15E084A5F53C9C7A058C593A97DFD57. md=0x55b7d50ae050, rs=0x55b7d5443610, ri=(nil). Microdesc digest in RS matches. RS okay in networkstatus. (on Tor 0.2.8.5-rc-dev )
Jul 13 12:16:38.000 [warn] microdesc_cache_clean(): Bug: Microdescriptor seemed very old (last listed 1056 hours ago vs 168 hour cutoff), but is still marked as being held by 1 node(s). I found 1 node(s) holding it. Current networkstatus is 1 hours old. Hashtable badness is 0. (on Tor 0.2.8.5-rc-dev )
Jul 13 12:16:38.000 [warn] microdesc_cache_clean(): Bug: [0]: ID=179B10784BF8955C73313CCB195904AE133E5F53. md=0x55b7d50adc20, rs=0x55b7d5412f20, ri=(nil). Microdesc digest in RS matches. RS okay in networkstatus. (on Tor 0.2.8.5-rc-dev )
Jul 13 12:16:38.000 [warn] microdesc_cache_clean(): Bug: Microdescriptor seemed very old (last listed 1056 hours ago vs 168 hour cutoff), but is still marked as being held by 1 node(s). I found 1 node(s) holding it. Current networkstatus is 1 hours old. Hashtable badness is 0. (on Tor 0.2.8.5-rc-dev )
Jul 13 12:16:38.000 [warn] microdesc_cache_clean(): Bug: [0]: ID=1EA3292F7D9EB3A9CEBA0DE4C0C19895C7C449C5. md=0x55b7d5152820, rs=0x55b7d541af00, ri=(nil). Microdesc digest in RS matches. RS okay in networkstatus. (on Tor 0.2.8.5-rc-dev )
Jul 13 12:16:38.000 [warn] microdesc_cache_clean(): Bug: Microdescriptor seemed very old (last listed 1056 hours ago vs 168 hour cutoff), but is still marked as being held by 1 node(s). I found 1 node(s) holding it. Current networkstatus is 1 hours old. Hashtable badness is 0. (on Tor 0.2.8.5-rc-dev )
Jul 13 12:16:38.000 [warn] microdesc_cache_clean(): Bug: [0]: ID=7C0AA4E3B73E407E9F5FEB1912F8BE26D8AA124D. md=0x55b7d4f91a30, rs=0x55b7d548cde0, ri=(nil). Microdesc digest in RS matches. RS okay in networkstatus. (on Tor 0.2.8.5-rc-dev )
Jul 13 12:16:38.000 [warn] microdesc_cache_clean(): Bug: Microdescriptor seemed very old (last listed 1056 hours ago vs 168 hour cutoff), but is still marked as being held by 1 node(s). I found 1 node(s) holding it. Current networkstatus is 1 hours old. Hashtable badness is 0. (on Tor 0.2.8.5-rc-dev )
Jul 13 12:16:38.000 [warn] microdesc_cache_clean(): Bug: [0]: ID=E79699F226A6ED3B1D13B0F6B983D40779B8693E. md=0x55b7d50a9e90, rs=0x55b7d5513e50, ri=(nil). Microdesc digest in RS matches. RS okay in networkstatus. (on Tor 0.2.8.5-rc-dev )
Jul 13 12:16:38.000 [warn] microdesc_cache_clean(): Bug: Microdescriptor seemed very old (last listed 1056 hours ago vs 168 hour cutoff), but is still marked as being held by 1 node(s). I found 1 node(s) holding it. Current networkstatus is 1 hours old. Hashtable badness is 0. (on Tor 0.2.8.5-rc-dev )
Jul 13 12:16:38.000 [warn] microdesc_cache_clean(): Bug: [0]: ID=F378A4DD858B6B7BBB79C6EAB1CE6912AC0FF8BC. md=0x55b7d51523b0, rs=0x55b7d5522a20, ri=(nil). Microdesc digest in RS matches. RS okay in networkstatus. (on Tor 0.2.8.5-rc-dev )
Jul 13 12:16:38.000 [warn] microdesc_cache_clean(): Bug: Microdescriptor seemed very old (last listed 1056 hours ago vs 168 hour cutoff), but is still marked as being held by 1 node(s). I found 1 node(s) holding it. Current networkstatus is 1 hours old. Hashtable badness is 0. (on Tor 0.2.8.5-rc-dev )
Jul 13 12:16:38.000 [warn] microdesc_cache_clean(): Bug: [0]: ID=2679B51C906158F3DF4C59AFD73E2B1FDA6535E1. md=0x55b7d50aa890, rs=0x55b7d540d340, ri=(nil). Microdesc digest in RS matches. RS okay in networkstatus. (on Tor 0.2.8.5-rc-dev )
Jul 13 12:16:38.000 [warn] microdesc_cache_clean(): Bug: Microdescriptor seemed very old (last listed 1056 hours ago vs 168 hour cutoff), but is still marked as being held by 1 node(s). I found 1 node(s) holding it. Current networkstatus is 1 hours old. Hashtable badness is 0. (on Tor 0.2.8.5-rc-dev )
Jul 13 12:16:38.000 [warn] microdesc_cache_clean(): Bug: [0]: ID=B870A9F8085D9D63541CA1C73C82C5D2827919F1. md=0x55b7d50ac210, rs=0x55b7d54daeb0, ri=(nil). Microdesc digest in RS matches. RS okay in networkstatus. (on Tor 0.2.8.5-rc-dev )
Jul 13 17:46:38.000 [notice] Heartbeat: Tor's uptime is 5:59 hours, with 0 circuits open. I've sent 527 kB and received 4.90 MB.
Huh. Those microdescriptors supposedly have corresponding routerstatus objects with matching md digests. They shouldn't be "very old" now; we should consider them "last listed" one hour ago.
Trac: Milestone: Tor: 0.2.??? to Tor: 0.2.9.x-final Status: needs_revision to new
Okay. A consensus arrives, and in directory.c, in connection_dir_client_reached_eof, we update the consensus and call "update_microdescs_from_networkstatus()", which will eventually update the last-listed times. But before that function updates the last-listed times, it calls get_microdesc_cache().
If get_microdesc_cache() is being called for the first time, it loads the microdescs from disk, and then calls microdesc_cache_clean(). We have a new consensus, but we have not yet finished update_microdescs_from_networkstatus.
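Restating that call order as comments, with one obvious but hypothetical way to avoid the spurious warning; the helper names below do not exist in Tor and are only illustrative:

/* The problematic order described above:
 *
 *   consensus arrives
 *     -> update_microdescs_from_networkstatus()
 *          -> get_microdesc_cache()          (first call: loads mds from disk)
 *               -> microdesc_cache_clean()   (sees stale last_listed values)
 *          -> ...only now are last_listed times refreshed...
 *
 * One hypothetical way around it: load without cleaning, refresh, then clean. */
microdesc_cache_t *cache = get_microdesc_cache_no_clean();   /* hypothetical */
update_last_listed_from_consensus(cache, new_consensus);     /* hypothetical */
/* ...and only then let microdesc_cache_clean() run against fresh times. */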
If get_microdesc_cache() is being called for the first time, it loads the microdescs from disk, and then calls microdesc_cache_clean(). We have a new consensus, but we have not yet finished update_microdescs_from_networkstatus.
[warn] microdesc_cache_clean(): Bug: Microdescriptor seemed very old (last listed 3762 hours ago vs 168 hour cutoff), but is still marked as being held by 1 node(s). I found 1 node(s) holding it. Current networkstatus is 1 hours old. Hashtable badness is 0. (on Tor 0.2.8.6 )
Note that I can't really test this; I do not have an old Tor data set :S and using CollecTor files makes it a bit complicated, since the archive of microdescriptors is ordered by ID and not stored in one single fat file.
echo "the report of my death was an exaggeration" | openssl sha256
Deciphered
get_microdesc_cache() is called for the first time by nodelist_set_consensus(). md->held_by_nodes is incremented by nodelist_set_consensus() too (nodelist_add_microdesc skips that part, since no nodes have been created yet). So microdesc_cache_clean() is called for the first time while held_by_nodes is zero for every md, and with no held_by_nodes there is no bug.
Your case happens if the consensus was set by networkstatus_note_certs_arrived(), which misses all the update_*() functions.
Alas, that's not the last edge case that triggers these warnings. nodelist_set_consensus() must clear all node->md pointers, like it does for node->rs, to fix all the remaining cases.
The rest of this ticket seems to need a revision, based on the cypherpunks comments.
(Next time, please PGP encrypt an email for vulnerabilities, or make plain-text comments otherwise. It helps us fix all the issues at the same time.)