Distributing descriptors across CollecTor instances
Karsten's suggestions:
== Spam Prevention
... a working solution for fetching descriptors from other CollecTor hosts without risking being spammed forever: we simply add multiple @source annotations to a descriptor, one for each source (directory authority IP, other CollecTor host IP, etc.). If we later find out that one source was spamming us, we can easily delete all descriptors whose only @source annotation is the one with the spamming host's IP address.
Here's an example of how the tor daemon annotates descriptors:
@uploaded-at 2016-04-18 18:49:25
@source "81.17.16.43"
router pairoj 81.17.16.43 443 0 80
platform Tor 0.2.6.10 on Linux
[...]
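To make the multiple-@source idea concrete, here is a minimal Python sketch. It assumes annotations are simply the leading lines that start with "@", as in the example above. The function names and the address 192.0.2.7 (standing in for another CollecTor host) are hypothetical and not taken from CollecTor's code.

```python
def split_annotations(text):
    """Split descriptor text into (annotation lines, remaining lines)."""
    lines = text.splitlines()
    i = 0
    # Assumption: annotations are the leading lines starting with "@".
    while i < len(lines) and lines[i].startswith("@"):
        i += 1
    return lines[:i], lines[i:]


def sources_of(text):
    """Return the set of IP addresses named in @source annotations."""
    annotations, _ = split_annotations(text)
    return {
        line.split(" ", 1)[1].strip('"')
        for line in annotations
        if line.startswith("@source ")
    }


def add_source(text, ip):
    """Return text with one more @source line for ip, unless already there."""
    if ip in sources_of(text):
        return text
    annotations, body = split_annotations(text)
    annotations.append('@source "%s"' % ip)
    return "\n".join(annotations + body) + "\n"


descriptor = (
    '@uploaded-at 2016-04-18 18:49:25\n'
    '@source "81.17.16.43"\n'
    'router pairoj 81.17.16.43 443 0 80\n'
    'platform Tor 0.2.6.10 on Linux\n'
)
# 192.0.2.7 is a hypothetical CollecTor mirror address.
merged = add_source(descriptor, "192.0.2.7")
```

Adding the same source twice leaves the descriptor unchanged, so re-fetching from a mirror wouldn't keep growing the annotation list.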
It's important that we only add those @source annotations to archived descriptors, not to recent descriptors; otherwise we'd serve those descriptors as new every time we add a @source annotation.
It would also be useful to have stats on the number of newly added @source annotations per hour, so that we learn if we're getting spammed, and to have a script for deleting descriptors that only have a given @source annotation.
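Both the hourly stats and the deletion script could look roughly like the following Python sketch. It assumes a hypothetical in-memory archive that maps a descriptor digest to the set of its @source IPs, and a hypothetical log of (timestamp, IP) pairs recorded whenever a @source annotation is added. A real script would have to read these from the archived files.

```python
from collections import Counter
from datetime import datetime


def purge_spam(archive, spammer_ip):
    """Keep every descriptor except those whose only @source is spammer_ip.

    archive: dict mapping descriptor digest -> set of source IPs
    (a hypothetical representation, for illustration only).
    """
    return {
        digest: sources
        for digest, sources in archive.items()
        if set(sources) != {spammer_ip}
    }


def additions_per_hour(events):
    """Count newly added @source annotations per hour.

    events: iterable of (datetime, ip) pairs, one per added annotation.
    """
    counts = Counter()
    for when, _ip in events:
        counts[when.strftime("%Y-%m-%d %H:00")] += 1
    return counts


# Hypothetical data: 192.0.2.7 plays the spamming host.
archive = {
    "digest-a": {"192.0.2.7"},
    "digest-b": {"192.0.2.7", "81.17.16.43"},
    "digest-c": {"81.17.16.43"},
}
kept = purge_spam(archive, "192.0.2.7")

events = [
    (datetime(2016, 4, 18, 18, 5, 0), "192.0.2.7"),
    (datetime(2016, 4, 18, 18, 49, 25), "81.17.16.43"),
    (datetime(2016, 4, 18, 19, 0, 0), "192.0.2.7"),
]
hourly = additions_per_hour(events)
```

Descriptors that were also seen from another source survive the purge, which is the point of keeping one annotation per source.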
== Statistics
... one nice thing we could do here is get statistics on descriptor completeness out of the box: we just count how many descriptors have @source annotations only from known CollecTor mirrors vs. from directory authorities or wherever else we're fetching from. That will tell us immediately how many descriptors we'd have missed without mirrors.
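A rough Python sketch of that count, again assuming a hypothetical mapping from descriptor digest to its set of @source IPs and a hypothetical list of known mirror addresses. A descriptor counts as "missed without mirrors" here if every one of its sources is a mirror.

```python
def completeness(archive, mirror_ips):
    """Summarize how many descriptors we only know about thanks to mirrors.

    archive: dict mapping descriptor digest -> set of source IPs
    mirror_ips: IPs of known CollecTor mirrors (both hypothetical inputs).
    """
    mirror_ips = set(mirror_ips)
    total = len(archive)
    # Mirror-only: at least one source, and all sources are mirrors.
    mirror_only = sum(
        1 for sources in archive.values()
        if sources and set(sources) <= mirror_ips
    )
    return {
        "total": total,
        "mirror_only": mirror_only,
        "seen_elsewhere": total - mirror_only,
    }


# Hypothetical data: 198.51.100.1 plays a known CollecTor mirror.
archive = {
    "digest-a": {"198.51.100.1"},
    "digest-b": {"198.51.100.1", "81.17.16.43"},
    "digest-c": {"81.17.16.43"},
}
stats = completeness(archive, ["198.51.100.1"])
```

With this sample data, one of the three descriptors would have been missed without the mirror.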