CollecTor Development

This is a living and changing document to accompany the current project for improving CollecTor.

Areas of Work

Over the course of this project, the following sections will gradually turn into descriptions and documentation. Currently, they are a mixture of well-defined improvements, sketches, wishes, and open questions.

Analyze Descriptor Completeness

The analysis will be based on log files and the downloaded files, and will address the following questions:

How many descriptors are missing?

How could this loss be avoided?

  • actively monitor resources like available storage space (discussion in ticket #18865).
  • verify and improve runtime statistics in order to have a clearer picture (discussion in ticket #19169).
  • Extra-info descriptors dropped because of parsing problems are counted as missing; this should be avoided (ticket #19170).
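
The first item in the list above, monitoring available storage space, could be sketched in Java roughly as follows. This is only a minimal illustration, not CollecTor's actual implementation: the class name `StorageCheck`, the method `hasEnoughSpace`, and the 200 MiB threshold are hypothetical. Only `Files.getFileStore` and `FileStore.getUsableSpace` are standard Java 7 API.

```java
import java.io.IOException;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper for illustration; not part of CollecTor. */
final class StorageCheck {

  /**
   * Returns true if the file store that holds the given directory
   * has at least minBytes of usable space left.
   */
  static boolean hasEnoughSpace(Path dir, long minBytes) throws IOException {
    FileStore store = Files.getFileStore(dir);
    return store.getUsableSpace() >= minBytes;
  }

  public static void main(String[] args) throws IOException {
    // Directory to check; defaults to the current working directory.
    Path dir = Paths.get(args.length > 0 ? args[0] : ".");
    // Assumed threshold of 200 MiB, chosen only for this example.
    long limit = 200L * 1024L * 1024L;
    if (hasEnoughSpace(dir, limit)) {
      System.out.println("Enough usable space below " + dir.toAbsolutePath());
    } else {
      System.err.println("WARNING: less than 200 MiB usable below "
          + dir.toAbsolutePath());
    }
  }
}
```

A check like this could run before each download so that a full disk produces a warning in the logs rather than silently lost descriptors.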

Next Steps

Continue the analysis once the sync process is deployed.

Provide Guide Documents

These guides should be based on the previous work in Onionoo and metrics-lib. In detail:

  • Contributor's Guide: create as detailed in #18733 and place the new guide in a central location, which still needs to be identified. One option is a large document in the central location plus a small document in CollecTor that references it (detailed discussion in #18730).
  • Release Process (defined in #18732)
  • Installation Guide for Operators (adapt the existing document), ticket #18734

Implement the Release Process

(according to the guide above)

Design Changes

This section describes improvements intended to make CollecTor more maintainable, testable, and efficient.

  1. Run CollecTor with an internal scheduler instead of external scheduling (e.g., crontab), #19018.
  2. Add a shutdown hook to provide a controlled way of stopping. Discussion in #19016.
  3. Some parts of CollecTor's data processing are provided by bash scripts run via crontab. These should be integrated into the Java application.
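
Items 1 and 2 above could be combined along the following lines. This is a hedged sketch using the standard `java.util.concurrent` API, not the design adopted in #19018 or #19016: the class `SchedulerSketch` and its methods are hypothetical names, and the timeout values are left to the caller.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical internal scheduler for illustration; not CollecTor code. */
final class SchedulerSketch {

  private final ScheduledExecutorService executor =
      Executors.newScheduledThreadPool(1);

  /** Runs the given module periodically, replacing a crontab entry. */
  void schedule(final Runnable module, long initialDelay, long period,
      TimeUnit unit) {
    executor.scheduleAtFixedRate(new Runnable() {
      @Override
      public void run() {
        try {
          module.run();
        } catch (RuntimeException e) {
          // An uncaught exception would suppress all later runs of a
          // periodic task, so it is caught and reported here.
          System.err.println("Module run failed: " + e);
        }
      }
    }, initialDelay, period, unit);
  }

  /**
   * Cancels future runs, lets a run in progress finish, and waits for it.
   * Returns false if the timeout elapsed before termination.
   */
  boolean shutdown(long timeout, TimeUnit unit) throws InterruptedException {
    executor.shutdown();
    return executor.awaitTermination(timeout, unit);
  }

  /** Registers a JVM shutdown hook that stops the scheduler in a controlled way. */
  void installShutdownHook(final long timeout, final TimeUnit unit) {
    Runtime.getRuntime().addShutdownHook(new Thread("shutdown-hook") {
      @Override
      public void run() {
        try {
          shutdown(timeout, unit);
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      }
    });
  }
}
```

In this sketch, a main method would create one `SchedulerSketch`, call `installShutdownHook`, and then call `schedule` once per module. When the JVM receives a termination signal, the hook stops further runs and waits for the run in progress, which is what allows the process to be stopped without interrupting a module halfway through writing its output.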

Improve CollecTor Operation and Setup

Once the executable jar including the shutdown hook implementation is available, CollecTor should be started as a Linux service, i.e., an appropriate shell script needs to be provided.

Further Sketches of Areas for Improvements

  • store unparsable descriptors rather than discarding them
    • add local storage for descriptors that cannot be parsed, so that the service operator can review them and reprocess them later
  • synchronization between CollecTor instances, see #18910 and DescriptorDistribution
  • improve the process of creating tarballs
    • reduce memory consumption throughout
  • consider using an embedded HTTP server in order to reduce operating complexity
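
The first sketch in the list above, storing unparsable descriptors for later review, might look roughly like this. Everything here is an assumption made for illustration: the class `UnparsableStore`, the idea of naming each file after the SHA-256 digest of its content, and the directory layout are not taken from CollecTor.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical quarantine store for illustration; not CollecTor code. */
final class UnparsableStore {

  private final Path dir;

  UnparsableStore(Path dir) {
    this.dir = dir;
  }

  /**
   * Writes the raw bytes of a descriptor that failed to parse into the
   * quarantine directory and returns the path of the stored file.
   * Identical content maps to the same file, so it is stored only once.
   */
  Path store(byte[] rawDescriptor) throws IOException {
    Files.createDirectories(dir);
    Path target = dir.resolve(sha256Hex(rawDescriptor));
    if (!Files.exists(target)) {
      Files.write(target, rawDescriptor);
    }
    return target;
  }

  /** Returns the lower-case hex encoding of the SHA-256 digest of data. */
  static String sha256Hex(byte[] data) {
    try {
      MessageDigest md = MessageDigest.getInstance("SHA-256");
      StringBuilder sb = new StringBuilder();
      for (byte b : md.digest(data)) {
        sb.append(String.format("%02x", b & 0xff));
      }
      return sb.toString();
    } catch (NoSuchAlgorithmException e) {
      // SHA-256 must be supported by every Java platform implementation.
      throw new IllegalStateException(e);
    }
  }
}
```

Naming files by digest is one possible choice: it avoids storing the same broken descriptor repeatedly when it is downloaded again, at the cost of losing the original file name.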


Release 1.1.0

Release date: tbd

Ticket Summary Status
#18910 distributing descriptors across CollecTor instances closed
#19822 set up a CollecTor mirror for synchronization with the main CollecTor closed
#19831 Change default for compressing descriptors to true closed
#20162 reduce configuration parameters in closed
#20179 Require absolute path for `$TARBALLTARGETDIR` in `src/main/resources/` closed
#20380 Expand to a more complete operator's guide closed
#20408 Move index.json* to index/ subdirectory closed

Release 1.2.0

Release date: tbd

Ticket Summary Status
#8799 collector's downloads: avoid httpurl-connection closed
#19755 improve code quality of bridgedescs module closed
#19778 Bridge descriptor sanitizer runs out of memory after 13.5 days closed
#19934 CollecTor should use new metrics-lib json classes closed
#20514 CollecTor's torperf module: replace HttpURLConnection closed
#20515 CollecTor's relaydescs module should avoid httpurlconnection closed
#20516 CollecTor's exitlists module should avoid httpurlconnection closed
#21443 CollecTor does not delete exit lists after three days anymore closed
#22216 Decide whether to sanitize padding-counts lines closed
#22247 Remove deprecation warnings as soon as metrics-lib 1.7.0 is released closed
#22652 Adapt CollecTor to metrics-lib 1.9.0 closed
#22754 Reference checker should only read relay descriptors closed
#22833 Either include or retain "fingerprint" line in bridge network statuses with @type bridge-network-status 1.2 closed

Release 2.0.0

Release date: tbd

Ticket Summary Status
#20350 Replace shell script with Java module assigned

Past Releases

Release 1.0.2, October 7, 2016

Ticket Summary Severity
#19016 add shutdown hook Normal
#19317 Sanitize TCP ports in bridge descriptors Normal
#19894 print message when no module is activated Minor
#19895 make CollecTor stop after RunOnce Normal
#19924 base url should not be in quotes Minor
#20079 Change log thresholds from TRACE to INFO Normal

Bugfix Release 1.0.1, August 22, 2016

Prevent out-of-memory error, cf. #19913.

First Release 1.0.0, August 11, 2016

Ticket Summary Severity
#18707 use java 7 Minor
#18719 provide executable jar Normal
#18727 refactor ernie before very first metrics-db release Normal
#18734 Installation Guide for Operators Normal
#18792 tweak build.xml for new tasks and java 7 Normal
#18793 add checkstyle task Normal
#18794 add cobertura task Normal
#18818 Stop using deprecated parts of metrics-lib. Normal
#18865 actively monitor resources like available storage space Normal
#18922 configure logging via properties file Normal
#18931 coding style polishing Normal
#18955 javadoc coverage checkstyle warnings Normal
#19005 make all data directories configurable Normal
#19015 use logging framework other than java.util.logging Normal
#19018 run CollecTor modules without crontab Normal
#19021 improve configuration process Normal
#19170 make parsing more robust (extra-info) Normal
#19373 write test that checks the default Normal
#19424 remove hard coded paths and set default properties to values used on the main CollecTor instance Minor
#19615 CollecTor should conform to style guide Normal
#19641 investigate and fix MainTest Normal
#19651 add missing scripts to collector.git Normal
#19720 CollecTor should be re-configurable without restart Normal
#19727 correct exitlist Normal
#19771 investigate halt of scheduling for one of many tasks in collector's scheduler Normal
#19776 Make minor improvements to scheduler Normal
#19813 define release process and do release of milestone 1.0.0 Normal
#19829 Update directory authority addresses to recent tor.git Normal
#19830 Check if recent directory exists before checking available space Normal
#19840 Change path defaults to match those of main CollecTor instance Normal

