When we conclude a relay is unreachable, we give it free uptime
In researching #2714 (for #2709), I noticed:
Mar 10 15:44:21.000 [info] dirserv_orconn_tls_done(): Found router MesBotEU1 to be reachable. Yay.
Mar 10 16:24:04.000 [info] run_connection_housekeeping(): Expiring non-open OR connection to fd 1245 (78.47.251.152:667).
Mar 10 16:44:40.000 [info] run_connection_housekeeping(): Expiring non-open OR connection to fd 990 (78.47.251.152:667).
Mar 10 16:50:01.000 [info] rep_hist_note_router_unreachable(): Router EABCA5F5D71D926C4A425E09C8C7F3AA46850EF6 is now non-Running: it had previously been Running since 2011-03-09 19:41:41. Its total weighted uptime is 1412377/1426655.
Mar 10 17:02:48.000 [info] dirserv_orconn_tls_done(): Found router MesBotEU1 to be reachable. Yay.
Mar 10 17:02:48.000 [info] rep_hist_note_router_reachable(): Router EABCA5F5D71D926C4A425E09C8C7F3AA46850EF6 is now Running; it had been down since 2011-03-10 16:50:01.
We do a reachability test every 21.3 minutes, so with REACHABLE_TIMEOUT at 45 minutes, you can fail one or two reachability tests and still be counted as up. Fine.
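As a rough check of that arithmetic, here is a minimal sketch. It assumes tests are evenly spaced starting from the last successful one; the function and constant names are made up for illustration and are not Tor identifiers.

```python
import math

# Values quoted in this ticket; the names are illustrative only.
REACHABILITY_TEST_INTERVAL_MIN = 21.3
REACHABLE_TIMEOUT_MIN = 45

def failed_tests_tolerated(timeout_min: float, interval_min: float) -> int:
    """Number of evenly spaced tests that fit inside the timeout window
    after the last success (assumption: no jitter in test scheduling)."""
    return math.floor(timeout_min / interval_min)

tolerated = failed_tests_tolerated(REACHABLE_TIMEOUT_MIN,
                                   REACHABILITY_TEST_INTERVAL_MIN)
print(tolerated)  # 2
```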
But when you fail more than that, and we conclude you're down, should we assume you were up for the entire grace period?
Seems like we should conclude you went (retroactively) down at the first failed test -- 16:24:04 in this case.
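To put a number on the "free" uptime in this case, here is a small sketch using the timestamps from the log above. It assumes the 16:24:04 connection expiry marks the first failed test, as suggested here, and that the year is 2011 as in the rep_hist lines; the helper name is hypothetical.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

def free_uptime_seconds(first_failed_test: str, marked_down: str) -> int:
    """Seconds credited as uptime between the first failed test and the
    moment the router was finally marked non-Running."""
    start = datetime.strptime(first_failed_test, FMT)
    end = datetime.strptime(marked_down, FMT)
    return int((end - start).total_seconds())

# Timestamps taken from the log excerpt in this ticket.
free = free_uptime_seconds("2011-03-10 16:24:04", "2011-03-10 16:50:01")
print(free)  # 1557
```

That is about 26 minutes of uptime that would be dropped if the down time were backdated to the first failed test.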
While I'm at it, what happened to the reachability test around 16:04? I couldn't find any evidence of it in my logs. (But there are millions of lines here, so maybe I just didn't look carefully enough.)
(For the record, MesBotEU1 thinks it was up the whole time.)