Outline of CollecTor Descriptor Distribution

The sync-process will be available for the modules relaydescs, bridgedescs, exitlists, and torperf. The additional functionality should be generalized as far as possible, and module-dependent functionality should be part of the module's code.

Configuration

  1. General settings: Add the properties SyncRelayDescriptors, SyncBridgeDescriptors, SyncExitLists, and SyncTorperfFiles to the respective properties sections. These properties have the enum type SyncType with the following values: Sync, NoSync, and SyncOnly. The property SyncFolder contains the top path for storing the downloaded descriptors.
  2. Choice of sync-sources: Add the properties SyncSourcesRelayDescriptors, SyncSourcesBridgeDescriptors, and SyncSourcesExitLists to the respective properties sections. Each contains an array of strings specifying a source name and source URL for each CollecTor instance to retrieve descriptors from.
  3. Choice of descriptors: The entire substructure of 'recent' will be fetched, i.e. recent/exit-lists/* for exitlists, recent/relay-descriptors/**/* for relaydescs, and recent/bridge-descriptors/**/* for bridgedescs.
  4. Backup of replaced local files: There won't be a backup of replaced local files.
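The settings above could be read roughly as follows. This is a minimal sketch, not CollecTor's actual configuration code: the class SyncConfigSketch and its methods are hypothetical, treating an unset Sync* property as NoSync is an assumption, and so is the comma-separated layout of the SyncSources* value (the outline does not fix how source name and URL are encoded).

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** The three sync modes named in this outline. */
enum SyncType { Sync, NoSync, SyncOnly }

/** Hypothetical helper for reading the sync-related properties. */
class SyncConfigSketch {

  /** Parses properties text, e.g. the content of collector.properties. */
  static Properties load(String text) {
    Properties props = new Properties();
    try {
      props.load(new StringReader(text));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return props;
  }

  /** Reads a Sync* property. Assumption: an unset property means NoSync. */
  static SyncType syncType(Properties props, String key) {
    return SyncType.valueOf(props.getProperty(key, "NoSync").trim());
  }

  /** Reads a SyncSources* property. Assumption: entries are comma-separated. */
  static String[] syncSources(Properties props, String key) {
    String raw = props.getProperty(key, "").trim();
    return raw.isEmpty() ? new String[0] : raw.split("\\s*,\\s*");
  }
}
```

An unknown value such as "sync" (wrong case) makes SyncType.valueOf throw an IllegalArgumentException, which seems preferable to silently falling back to a default.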

Fetching and Merging

If Sync* has the value NoSync, nothing is done. With SyncOnly, the module is not started; fetching from the instances configured in SyncSources* begins immediately. With Sync, the module runs first and the sync begins afterwards.
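The three modes can be sketched as a small dispatch. The class SyncDispatchSketch and the two Runnable parameters are hypothetical stand-ins for the module run and the sync run. For NoSync the sketch follows the text literally and does nothing; whether the module still runs as usual in that case is not stated here.

```java
/** The three sync modes named in this outline. */
enum SyncType { Sync, NoSync, SyncOnly }

/** Hypothetical dispatcher for one module. */
class SyncDispatchSketch {

  /**
   * Runs the given actions in the order the outline describes.
   * module: stands in for the module's normal run.
   * sync: stands in for fetching from SyncSources* and merging.
   */
  static void run(SyncType type, Runnable module, Runnable sync) {
    switch (type) {
      case NoSync:
        // "Nothing is done" as far as this sketch is concerned.
        return;
      case SyncOnly:
        // The module is not started; fetch right away.
        sync.run();
        return;
      case Sync:
        // Run the module first, then sync.
        module.run();
        sync.run();
        return;
    }
  }
}
```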

Processing

  1. Retrieve descriptors from the CollecTor instances defined in SyncSources*. These descriptors are stored in SyncFolder under the host part of the instance's URL, e.g. my-sync-folder/collector.torproject.org/recent/exit-lists for exitlists from the main instance.
  2. Following retrieval, the fetched descriptors are examined:
    1. Discard descriptor files that do not contain what they should (see comment:11) and log a warning with sync-source info and reason (see criteria).
    2. Copy valid descriptors (see criteria) without a pre-existing local copy to the local *OutDirectory (cf. collector.properties) and 'recent' structure.
    3. If there is a local copy already, decide which copy to keep (see criteria).
      1. Local copy is kept: log debug message with source and reason.
      2. Local and fetched are identical: log debug message with source and reason.
      3. Maybe later: fetched copy should replace local descriptor. Copy fetched descriptor to local *OutDirectory and 'recent'. In all cases log debug message with source and reason.
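The processing steps above can be sketched as two small functions: one that derives the storage directory from the host part of a source URL, and one that decides what to do with a fetched file. All names (SyncMergeSketch, Decision, storageDir, decide) are hypothetical, the validity check is passed in as a boolean because the criteria are defined separately, and comparing raw bytes to detect identical copies is an assumption.

```java
import java.net.URI;
import java.nio.file.Path;
import java.util.Arrays;

/** Hypothetical helper covering steps 1 and 2 of the processing outline. */
class SyncMergeSketch {

  /** Possible outcomes for one fetched descriptor file. */
  enum Decision { DISCARD_INVALID, COPY_NEW, KEEP_LOCAL, IDENTICAL }

  /** Step 1: SyncFolder/(host of source URL)/(sub-path, e.g. recent/exit-lists). */
  static Path storageDir(Path syncFolder, String sourceUrl, String subPath) {
    String host = URI.create(sourceUrl).getHost();
    if (host == null) {
      throw new IllegalArgumentException("No host in URL: " + sourceUrl);
    }
    return syncFolder.resolve(host).resolve(subPath);
  }

  /**
   * Step 2: decide what happens to a fetched file.
   * fetchedValid: result of the validity criteria, computed elsewhere.
   * local: content of the local copy, or null if there is none.
   */
  static Decision decide(boolean fetchedValid, byte[] fetched, byte[] local) {
    if (!fetchedValid) {
      return Decision.DISCARD_INVALID; // 2.1: caller logs a warning
    }
    if (local == null) {
      return Decision.COPY_NEW; // 2.2: copy to *OutDirectory and 'recent'
    }
    if (Arrays.equals(fetched, local)) {
      return Decision.IDENTICAL; // 2.3.2: caller logs a debug message
    }
    return Decision.KEEP_LOCAL; // 2.3.1: replacing (2.3.3) is "maybe later"
  }
}
```

Returning a Decision instead of acting directly keeps logging and file copying in the caller, where the sync-source name needed for the log messages is available.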

Replacement criteria

As the replacement criteria are not fully defined yet, and it is very likely that there will be more criteria in the future, a modular/pluggable approach seems useful, i.e.:

  1. define KeepCriterium and ReplaceCriterium interfaces
  2. register implementing classes with CollecTor in order to facilitate the selection steps described above.

The only initial ReplaceCriterium will never allow replacing. The only initial KeepCriterium is that a valid descriptor is contained in the descriptor file. For the initial implementation it suffices to hard-code the *Criterium classes, with the option to easily make that configurable later.
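One possible shape for the pluggable criteria is sketched below. Only the interface names KeepCriterium and ReplaceCriterium come from this outline; the method signatures, the classes NeverReplace, ContainsValidDescriptor, and CriteriaRegistry, and the combination rules (all keep criteria must accept, any replace criterion may trigger) are assumptions. Descriptor parsing is represented by a caller-supplied predicate rather than a real parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Decides whether a fetched descriptor file is worth keeping at all. */
interface KeepCriterium {
  boolean keep(byte[] fetched);
}

/** Decides whether a fetched file should replace an existing local copy. */
interface ReplaceCriterium {
  boolean replace(byte[] local, byte[] fetched);
}

/** Initial ReplaceCriterium: never allow replacing. */
class NeverReplace implements ReplaceCriterium {
  @Override
  public boolean replace(byte[] local, byte[] fetched) {
    return false;
  }
}

/** Initial KeepCriterium: the file contains a valid descriptor. */
class ContainsValidDescriptor implements KeepCriterium {
  // Hypothetical stand-in for real descriptor parsing.
  private final Predicate<byte[]> validator;

  ContainsValidDescriptor(Predicate<byte[]> validator) {
    this.validator = validator;
  }

  @Override
  public boolean keep(byte[] fetched) {
    return validator.test(fetched);
  }
}

/** Hard-coded registration point that could be made configurable later. */
class CriteriaRegistry {
  private final List<KeepCriterium> keepCriteria = new ArrayList<>();
  private final List<ReplaceCriterium> replaceCriteria = new ArrayList<>();

  void register(KeepCriterium criterium) {
    keepCriteria.add(criterium);
  }

  void register(ReplaceCriterium criterium) {
    replaceCriteria.add(criterium);
  }

  /** Assumption: every registered KeepCriterium must accept the file. */
  boolean shouldKeep(byte[] fetched) {
    return keepCriteria.stream().allMatch(c -> c.keep(fetched));
  }

  /** Assumption: one accepting ReplaceCriterium is enough to replace. */
  boolean shouldReplace(byte[] local, byte[] fetched) {
    return replaceCriteria.stream().anyMatch(c -> c.replace(local, fetched));
  }
}
```

With only NeverReplace registered, shouldReplace always returns false, which matches the initial behavior described above; further criteria can be added later without touching the selection steps.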

Last modified on Oct 8, 2016, 10:39:00 AM