CollecTor Development

This is a living and changing document to accompany the current project for improving CollecTor.

Areas of Work

Over the course of this project, the following sections will gradually turn into descriptions and documentation. Currently, they are a mixture of well-defined improvements, sketches, wishes, and open questions.

Analyze Descriptor Completeness

The analysis will be based on log files and the downloaded files, and will address the following questions:

How many descriptors are missing?

How could this loss be avoided?

  • Actively monitor resources like available storage space (discussion in ticket #18865).
  • Verify and improve runtime statistics in order to get a clearer picture (discussion in ticket #19169).
  • Extra-info descriptors dropped because of parsing problems are counted as missing. This should be avoided (ticket #19170).
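The storage monitoring idea from the first bullet could be sketched roughly as follows. This is only an illustration, not CollecTor's actual implementation: the class name `StorageCheck`, the method `hasEnoughSpace`, and the 200 MiB threshold are assumptions made for this example. The only library call relied upon is `java.io.File.getUsableSpace()`.

```java
import java.io.File;

final class StorageCheck {

  /** Hypothetical threshold: warn when fewer than 200 MiB remain. */
  static final long MIN_USABLE_BYTES = 200L * 1024L * 1024L;

  /**
   * Returns true if the partition holding dir has at least minBytes usable.
   * getUsableSpace() returns 0 when the path does not name a partition,
   * so a missing directory is reported as "not enough space".
   */
  static boolean hasEnoughSpace(File dir, long minBytes) {
    return dir.getUsableSpace() >= minBytes;
  }

  public static void main(String[] args) {
    // The directory to check is taken from the command line (default: cwd).
    File dir = new File(args.length > 0 ? args[0] : ".");
    if (hasEnoughSpace(dir, MIN_USABLE_BYTES)) {
      System.out.println("Enough space on " + dir.getAbsolutePath());
    } else {
      System.err.println("Low disk space on " + dir.getAbsolutePath());
    }
  }
}
```

A check like this could run before each download so that a full disk is logged as a warning instead of silently causing missing descriptors.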

Next Steps

Continue analysis when sync-process is deployed.

Provide Guide Documents

These guides should be based on the previous work in Onionoo and metrics-lib. In detail:

  • Contributor's Guide: create as detailed in #18733 and place the new guide in a central location, which still needs to be identified; this could be a large document in the central place and a small document in CollecTor referencing the main document. (detailed discussion in #18730)
  • Release Process (defined in #18732)
  • Installation Guide for Operators (adapt the existing document), ticket #18734

Implement the Release Process

(according to the guide above)

Design Changes

This section describes improvements that ought to make CollecTor more maintainable, testable, and more efficient.

  1. Run CollecTor with an internal scheduler instead of using external scheduling (e.g. crontab), #19018.
  2. Add a shutdown hook to provide a controlled way of stopping. Discussion in #19016.
  3. Some parts of CollecTor's data processing are provided by bash scripts run via crontab. These should be integrated into the Java application.
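Items 1 and 2 can be illustrated together with a minimal sketch, assuming the standard `java.util.concurrent` scheduler and `Runtime.addShutdownHook`. The class `ModuleScheduler` and its methods are hypothetical names chosen for this example and do not describe CollecTor's actual scheduler; Java 8 lambdas are used for brevity.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class ModuleScheduler {

  private final ScheduledExecutorService executor =
      Executors.newScheduledThreadPool(1);

  /** Runs the given module periodically, taking the place of a crontab entry. */
  void schedule(Runnable module, long initialDelay, long period, TimeUnit unit) {
    executor.scheduleAtFixedRate(module, initialDelay, period, unit);
  }

  /**
   * Cancels future runs and waits for a currently running module to finish.
   * Returns true if everything terminated within the timeout.
   */
  boolean shutdown(long timeout, TimeUnit unit) {
    executor.shutdown();
    try {
      return executor.awaitTermination(timeout, unit);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }

  public static void main(String[] args) throws InterruptedException {
    ModuleScheduler scheduler = new ModuleScheduler();
    AtomicInteger runs = new AtomicInteger();
    // Placeholder module: a real module would download or process descriptors.
    scheduler.schedule(
        () -> System.out.println("module run " + runs.incrementAndGet()),
        0, 100, TimeUnit.MILLISECONDS);
    // The hook runs on JVM exit (e.g. SIGTERM) and lets a running module finish.
    Runtime.getRuntime().addShutdownHook(
        new Thread(() -> scheduler.shutdown(10, TimeUnit.SECONDS)));
    Thread.sleep(350);
    // Simulates a stop request; a deployed service would wait for a signal.
    System.exit(0);
  }
}
```

The point of the hook is that a stop request no longer interrupts a module in the middle of writing files, which is hard to guarantee when an external scheduler starts and kills processes.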

Improve CollecTor Operation and Setup

Once the executable jar, including the shutdown hook implementation, is available, CollecTor should be started as a Linux service, i.e., an appropriate shell script needs to be provided.

Further Sketches of Areas for Improvements

  • store unparsable descriptors rather than discarding them
    • add local storage for descriptors that cannot be parsed for review by the service operator and later reprocessing
  • synchronization between CollecTor instances (see #18910 and DescriptorDistribution)
  • improve the process of creating tarballs
    • reduce memory consumption throughout
  • consider using an embedded http server in order to reduce operating complexity
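The first sketch in the list above, keeping unparsable descriptors for later review, might look roughly like this. It is a sketch under stated assumptions: the class `UnparsableStore`, the directory name, and the choice to name files after the SHA-256 digest of their content are all hypothetical and not taken from CollecTor.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class UnparsableStore {

  private final Path dir;

  UnparsableStore(Path dir) {
    this.dir = dir;
  }

  /** Writes the raw bytes to a file named after their SHA-256 digest. */
  Path store(byte[] rawDescriptor) {
    try {
      Files.createDirectories(dir);
      // Identical content maps to the same file name, so re-storing is harmless.
      return Files.write(dir.resolve(sha256Hex(rawDescriptor)), rawDescriptor);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Returns the lower-case hex encoding of the SHA-256 digest of data. */
  static String sha256Hex(byte[] data) {
    try {
      MessageDigest md = MessageDigest.getInstance("SHA-256");
      StringBuilder sb = new StringBuilder();
      for (byte b : md.digest(data)) {
        sb.append(String.format("%02x", b & 0xff));
      }
      return sb.toString();
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException("SHA-256 should always be available", e);
    }
  }

  public static void main(String[] args) {
    // Hypothetical directory name, chosen only for this example.
    UnparsableStore store = new UnparsableStore(Paths.get("unparsable"));
    byte[] raw = "not a valid descriptor\n".getBytes(StandardCharsets.US_ASCII);
    System.out.println("Stored for review: " + store.store(raw));
  }
}
```

Naming files by digest is just one option; it avoids duplicates when the same broken descriptor is downloaded repeatedly, at the cost of losing the original file name.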


Release 1.1.0

Release date: tbd

Ticket Summary Status
#18910 distributing descriptors across CollecTor instances closed
#19822 set up a CollecTor mirror for synchronization with the main CollecTor closed
#19831 Change default for compressing descriptors to true closed
#20162 reduce configuration parameters in closed
#20179 Require absolute path for `$TARBALLTARGETDIR` in `src/main/resources/` closed
#20380 Expand to a more complete operator's guide closed
#20408 Move index.json* to index/ subdirectory closed

Release 1.2.0

Release date: tbd

Ticket Summary Status
#8799 collector's downloads: avoid httpurl-connection closed
#19755 improve code quality of bridgedescs module closed
#19778 Bridge descriptor sanitizer runs out of memory after 13.5 days closed
#19934 CollecTor should use new metrics-lib json classes closed
#20514 CollecTor's torperf module: replace HttpURLConnection closed
#20515 CollecTor's relaydescs module should avoid httpurlconnection closed
#20516 CollecTor's exitlists module should avoid httpurlconnection closed
#21443 CollecTor does not delete exit lists after three days anymore closed
#22216 Decide whether to sanitize padding-counts lines closed
#22247 Remove deprecation warnings as soon as metrics-lib 1.7.0 is released closed
#22652 Adapt CollecTor to metrics-lib 1.9.0 closed
#22754 Reference checker should only read relay descriptors closed
#22833 Either include or retain "fingerprint" line in bridge network statuses with @type bridge-network-status 1.2 closed

Release 2.0.0

Release date: tbd

Ticket Summary Status
#20350 Replace shell script with Java module assigned

Past Releases

Release 1.0.2, October 7, 2016

Ticket Summary Severity
#19016 add shutdown hook Normal
#19317 Sanitize TCP ports in bridge descriptors Normal
#19894 print message when no module is activated Minor
#19895 make CollecTor stop after RunOnce Normal
#19924 base url should not be in quotes Minor
#20079 Change log thresholds from TRACE to INFO Normal

Bugfix Release 1.0.1, August 22, 2016

Prevent out-of-memory error, cf. #19913.

First Release 1.0.0, August 11, 2016

Ticket Summary Severity
#18707 use java 7 Minor
#18719 provide executable jar Normal
#18727 refactor ernie before very first metrics-db release Normal
#18734 Installation Guide for Operators Normal
#18792 tweak build.xml for new tasks and java 7 Normal
#18793 add checkstyle task Normal
#18794 add cobertura task Normal
#18818 Stop using deprecated parts of metrics-lib. Normal
#18865 actively monitor resources like available storage space Normal
#18922 configure logging via properties file Normal
#18931 coding style polishing Normal
#18955 javadoc coverage checkstyle warnings Normal
#19005 make all data directories configurable Normal
#19015 use logging framework other than java.util.logging Normal
#19018 run CollecTor modules without crontab Normal
#19021 improve configuration process Normal
#19170 make parsing more robust (extra-info) Normal
#19373 write test that checks the default Normal
#19424 remove hard coded paths and set default properties to values used on the main CollecTor instance Minor
#19615 CollecTor should conform to style guide Normal
#19641 investigate and fix MainTest Normal
#19651 add missing scripts to collector.git Normal
#19720 CollecTor should be re-configurable without restart Normal
#19727 correct exitlist Normal
#19771 investigate halt of scheduling for one of many tasks in collector's scheduler Normal
#19776 Make minor improvements to scheduler Normal
#19813 define release process and do release of milestone 1.0.0 Normal
#19829 Update directory authority addresses to recent tor.git Normal
#19830 Check if recent directory exists before checking available space Normal
#19840 Change path defaults to match those of main CollecTor instance Normal

All Tasks in Trac

Active Tasks

Results (1 - 10 of 34)

Ticket Summary Status Priority Severity Reporter Modified
#25161 Fix another memory problem with the webstats bulk import assigned Medium Normal karsten 39 hours ago
#20228 Append all votes with same valid-after time to a single file in `recent/` assigned Medium Normal karsten 3 days ago
#20549 Implement SanitizedBridgeServerDescriptor class that encapsulates the sanitizing logic for bridge server descriptors needs_review High Normal iwakeh 3 days ago
#18798 Analyze descriptor completeness assigned Medium Normal iwakeh 3 weeks ago
#19169 Verify, correct, and extend runtime statistics assigned Medium Normal iwakeh 3 weeks ago
#19282 Avoid truncating descriptors while storing them assigned Medium Normal karsten 3 weeks ago
#19828 Extend descriptorCutOff in CollecTor's RelayDescriptorDownloader by 6 hours assigned Low Normal karsten 3 weeks ago
#20224 Fix `BridgeDescriptorMappingsLimit` config option assigned Low Normal karsten 3 weeks ago
#20325 Perform available space check using the partition recent is located on assigned Low Normal iwakeh 3 weeks ago
#20489 Add various tests for recently fixed issues assigned Medium Normal iwakeh 3 weeks ago

Completed Tasks

Results (1 - 10 of 190)

Ticket Summary Priority Severity Reporter Modified
#25100 Make CollecTor's webstats module use less RAM and wall time High Normal karsten 13 days ago
#24983 Inaccessible semi-recent consensus files Medium Normal robgjansen 3 weeks ago
#22428 Add webstats module High Normal iwakeh 3 weeks ago
#24792 Broken links on new collector page Medium Normal pastly 6 weeks ago
#24621 Exclude lastModifiedMillis in index.json Medium Normal karsten 2 months ago
#19873 Re-evaluate module configuration and logging with regard to operation Medium Normal iwakeh 4 months ago
#23981 Fix NPE and include "bridge-distribution-request" lines in sanitized bridge descriptors High Normal karsten 4 months ago
#21414 Include currently running software versions in responses (, and on the website ( Medium Normal iwakeh 4 months ago
#15846 Publish (hashes of) historic Onionoo details documents Medium Normal karsten 5 months ago
#21139 add javadoc overview page to CollecTor Medium Normal iwakeh 5 months ago

Last modified on Oct 8, 2016, 10:35:23 AM