CollecTor's RelayDescriptorDownloader only downloads server and extra-info descriptors that were published at most 24 hours before the current system time. This limit makes sense: it ensures that missing descriptors which cannot be obtained are not retried forever.

However, there are cases when a valid consensus or vote references a server descriptor that was published over 24 hours ago:

  • CollecTor may run at any time of the hour, at which point the valid-after time of current consensuses and votes may already be up to 1 hour behind the current system time.
  • The votes that a consensus is based on are generated 10 minutes before the valid-after time, and they may reference server descriptors that were published up to 24 hours before that.
  • Directory authorities may serve an older consensus than the current consensus, say, one that is already 2 hours older than the current one.

All in all, CollecTor should attempt to fetch descriptors that are up to 27:10 hours old (24:00 + 1:00 + 0:10 + 2:00), or let's say 30 hours for simplicity and to account for cases we didn't consider here.
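
For reference, the worst-case age can be added up as follows. This is only a sketch to check the arithmetic, not CollecTor code; the class, method, and variable names are made up for illustration, and the 2-hour figure for a stale consensus is the example value from the list above.

```java
import java.time.Duration;

public class CutOffMargin {

  /** Adds up the delays listed above (hypothetical helper, not in CollecTor). */
  static Duration worstCaseAge() {
    // Maximum age of a descriptor referenced by a vote.
    Duration maxDescriptorAge = Duration.ofHours(24);
    // CollecTor may run up to 1 hour after the valid-after time.
    Duration maxRunDelay = Duration.ofHours(1);
    // Votes are generated 10 minutes before the valid-after time.
    Duration voteLeadTime = Duration.ofMinutes(10);
    // Assumed example: an authority serves a consensus that is 2 hours old.
    Duration staleConsensus = Duration.ofHours(2);
    return maxDescriptorAge.plus(maxRunDelay).plus(voteLeadTime)
        .plus(staleConsensus);
  }

  public static void main(String[] args) {
    Duration worstCase = worstCaseAge();
    // Prints 27:10
    System.out.printf("%d:%02d%n", worstCase.toHours(),
        worstCase.toMinutes() % 60L);
  }
}
```

The proposed 30 hours leaves 2:50 hours of slack on top of that sum.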

The downside is that missing descriptors will be retried for 6 more hours, but that doesn't seem to be that much of a problem, given that missing descriptors will be retried in batches of up to 96.

Here's a trivial patch:

diff --git a/src/main/java/org/torproject/collector/relaydescs/ b/src/main/java/org/torproject/collector/relaydescs/
index f4e38f4..21b1ee4 100644
--- a/src/main/java/org/torproject/collector/relaydescs/
+++ b/src/main/java/org/torproject/collector/relaydescs/
@@ -185,7 +185,9 @@ public class RelayDescriptorDownloader {
    * Cut-off time for missing server and extra-info descriptors, formatted
    * "yyyy-MM-dd HH:mm:ss". This time is initialized as the current system
-   * time minus 24 hours.
+   * time minus 30 hours (24 hours for the maximum age of descriptors to be
+   * referenced plus 6 hours for the time between generating votes and
+   * processing a consensus).
   private String descriptorCutOff;
@@ -330,7 +332,7 @@ public class RelayDescriptorDownloader {
     long now = System.currentTimeMillis();
     this.currentValidAfter = format.format((now / (60L * 60L * 1000L))
         * (60L * 60L * 1000L));
-    this.descriptorCutOff = format.format(now - 24L * 60L * 60L * 1000L);
+    this.descriptorCutOff = format.format(now - 30L * 60L * 60L * 1000L);
     this.currentTimestamp = format.format(now);
     this.downloadAllDescriptorsCutOff = format.format(now
         - 23L * 60L * 60L * 1000L - 30L * 60L * 1000L);

Note: We could easily move this ticket to 1.1.0 and resolve it together with #8799.

I just tried out the patch and compared it to an unpatched version. The effect is that the patched version downloaded 14 additional server descriptors and 14 additional extra-info descriptors that were published between 24:03 and 24:30 hours before the current system time. I still think it makes sense to apply this change.
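
To illustrate why only the patched version picks up such descriptors, here is a minimal sketch. It assumes that the downloader compares a descriptor's publication time to the cut-off as "yyyy-MM-dd HH:mm:ss" strings in UTC; the class and the helpers cutOff and isStillWanted are hypothetical and not taken from RelayDescriptorDownloader.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class CutOffSketch {

  /** Formats "now minus the given number of hours" (hypothetical helper). */
  static String cutOff(long nowMillis, long hours) {
    SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
    // Assumption: timestamps are handled in UTC.
    format.setTimeZone(TimeZone.getTimeZone("UTC"));
    return format.format(new Date(nowMillis - hours * 60L * 60L * 1000L));
  }

  /** Assumed check: a missing descriptor is wanted if it was published
   * at or after the cut-off. Timestamps in this format sort the same
   * way lexicographically and chronologically. */
  static boolean isStillWanted(String published, String cutOff) {
    return published.compareTo(cutOff) >= 0;
  }

  public static void main(String[] args) {
    // Fixed example time: 1970-01-03 00:00:00 UTC.
    long now = 48L * 60L * 60L * 1000L;
    // Descriptor published 24:15 hours before the example time.
    String published = "1970-01-01 23:45:00";
    System.out.println("24h cut-off: "
        + isStillWanted(published, cutOff(now, 24L)));
    System.out.println("30h cut-off: "
        + isStillWanted(published, cutOff(now, 30L)));
  }
}
```

With the 24-hour cut-off the example descriptor is skipped, and with the 30-hour cut-off it is still considered for download.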

Please review commit 555ad13 in my task-19828 branch which is the same patch as above, based on master, plus a change log entry.

Merged. Closing.

