CollecTor Development

This is a living document that accompanies the current project to improve CollecTor.

Areas of Work

Over the course of this project, the following sections will gradually turn into descriptions and documentation. Currently, they are a mixture of well-defined improvements, sketches, wishes, and open questions.

Analyze Descriptor Completeness

The analysis will be based on log files and the downloaded files, and will address the following questions:

How many descriptors are missing?

How could this loss be avoided?

  • Actively monitor resources such as available storage space (discussion in ticket #18865).
  • Verify and improve runtime statistics to get a clearer picture (discussion in ticket #19169).
  • Extra-info descriptors that are dropped because of parsing problems are counted as missing. This should be avoided (ticket #19170).
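
As an illustration of the first point, the following is a minimal sketch of a storage check, not CollecTor's actual implementation. The class name StorageMonitor, its method names, and the 200 MiB threshold are assumptions made for this example; only the java.nio.file calls are standard Java API.

```java
import java.io.IOException;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper for illustration; not taken from CollecTor's code base. */
final class StorageMonitor {

  /** Returns true if the usable bytes reach or exceed the required minimum. */
  static boolean isSufficient(long usableBytes, long minimumBytes) {
    return usableBytes >= minimumBytes;
  }

  /** Returns the usable bytes of the file store that holds the given directory. */
  static long usableBytes(Path dir) throws IOException {
    // Files.getFileStore fails for a missing path, so check existence first.
    if (!Files.exists(dir)) {
      throw new IOException("Directory does not exist: " + dir);
    }
    FileStore store = Files.getFileStore(dir);
    return store.getUsableSpace();
  }

  public static void main(String[] args) throws IOException {
    Path dir = Paths.get(args.length > 0 ? args[0] : ".");
    long minimum = 200L * 1024L * 1024L; // assumed threshold: 200 MiB
    long usable = usableBytes(dir);
    if (isSufficient(usable, minimum)) {
      System.out.printf("OK: %d MiB usable in %s%n", usable >> 20, dir.toAbsolutePath());
    } else {
      System.err.printf("WARNING: only %d MiB usable in %s%n",
          usable >> 20, dir.toAbsolutePath());
    }
  }
}
```

Such a check could run before each download cycle, so that a warning reaches the log before descriptors are lost.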

Next Steps

Continue the analysis once the sync process is deployed.

Provide Guide Documents

These guides should be based on the previous work in Onionoo and metrics-lib. In detail:

  • Contributor's Guide: create as detailed in #18733 and place the new guide in a central location, which still needs to be identified. One option is a large document in the central location and a small document in CollecTor that references it (detailed discussion in #18730).
  • Release Process (defined in #18732)
  • Installation Guide for Operators (adapt the existing document), ticket #18734

Implement the Release Process

(according to the guide above)

Design Changes

This section describes improvements that ought to make CollecTor more maintainable, more testable, and more efficient.

  1. Run CollecTor with an internal scheduler instead of external scheduling (e.g., crontab), #19018.
  2. Add a shutdown hook to provide a controlled way of stopping (discussion in #19016).
  3. Some parts of CollecTor's data processing are provided by bash scripts run via crontab. These should be integrated into the Java application.
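
The first two items can be sketched with the standard java.util.concurrent scheduler and a JVM shutdown hook. This is only an illustration under assumptions: the class ModuleScheduler, its methods, and the intervals are hypothetical and not taken from CollecTor. The sketch uses lambdas and therefore assumes Java 8 or later.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch; class and method names are not taken from CollecTor. */
final class ModuleScheduler {

  private final ScheduledExecutorService executor =
      Executors.newScheduledThreadPool(1);

  /** Runs the module once right away and then again after every period. */
  void schedule(Runnable module, long period, TimeUnit unit) {
    executor.scheduleAtFixedRate(() -> {
      try {
        module.run();
      } catch (RuntimeException e) {
        // An uncaught exception would suppress all later runs of this task,
        // so report it and keep the schedule alive.
        System.err.println("Module run failed: " + e);
      }
    }, 0L, period, unit);
  }

  /** Cancels future runs and waits for a run in progress; true if it ended in time. */
  boolean shutdown(long timeout, TimeUnit unit) {
    executor.shutdown();
    try {
      return executor.awaitTermination(timeout, unit);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }

  public static void main(String[] args) throws InterruptedException {
    ModuleScheduler scheduler = new ModuleScheduler();
    // The hook runs on normal JVM exit and on signals such as SIGTERM.
    Runtime.getRuntime().addShutdownHook(
        new Thread(() -> scheduler.shutdown(10L, TimeUnit.MINUTES)));
    // Placeholder module; a real module would download and store descriptors.
    scheduler.schedule(() -> System.out.println("collecting ..."),
        200L, TimeUnit.MILLISECONDS);
    // Demo only: a long-running service would omit the next two lines
    // and rely on the shutdown hook alone.
    Thread.sleep(1000L);
    scheduler.shutdown(5L, TimeUnit.SECONDS);
  }
}
```

Because shutdown() lets a run in progress finish before the JVM exits, stopping the service should not leave half-written files behind, which is the point of a controlled stop.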

Improve CollecTor Operation and Setup

Once the executable jar with the shutdown hook implementation is available, CollecTor should be started as a Linux service, i.e., an appropriate shell script needs to be provided.

Further Sketches of Areas for Improvements

  • Store unparsable descriptors rather than discarding them.
    • Add local storage for descriptors that cannot be parsed, so that the service operator can review them and reprocess them later.
  • Synchronization between CollecTor instances; see #18910 and DescriptorDistribution.
  • Improve the process of creating tarballs.
    • Reduce memory consumption throughout.
  • Consider using an embedded HTTP server in order to reduce operating complexity.
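
The idea of keeping unparsable descriptors for later review could look roughly like the sketch below. It is an assumption-laden illustration, not an existing CollecTor class: the name UnparsableStore, the directory layout, and the choice of naming files by the SHA-256 digest of the raw bytes are all hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical sketch; not an existing CollecTor class. */
final class UnparsableStore {

  private final Path dir;

  UnparsableStore(Path dir) {
    this.dir = dir;
  }

  /** Derives a file name from the SHA-256 digest of the raw bytes (hex-encoded). */
  static String fileNameFor(byte[] rawDescriptor) {
    try {
      byte[] digest = MessageDigest.getInstance("SHA-256").digest(rawDescriptor);
      StringBuilder hex = new StringBuilder();
      for (byte b : digest) {
        hex.append(String.format("%02x", b & 0xff));
      }
      return hex.toString();
    } catch (NoSuchAlgorithmException e) {
      // SHA-256 is required on every Java platform, so this should not happen.
      throw new IllegalStateException(e);
    }
  }

  /** Writes the raw bytes unless an identical descriptor was stored before. */
  Path store(byte[] rawDescriptor) throws IOException {
    Files.createDirectories(dir);
    Path target = dir.resolve(fileNameFor(rawDescriptor));
    if (!Files.exists(target)) {
      Files.write(target, rawDescriptor);
    }
    return target;
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("unparsable");
    UnparsableStore store = new UnparsableStore(dir);
    byte[] raw = "garbled descriptor bytes".getBytes(StandardCharsets.US_ASCII);
    System.out.println("Stored for review: " + store.store(raw));
  }
}
```

Naming files by digest means that a descriptor that fails to parse on every run is stored only once, which keeps the review directory small.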

Releases

Release 1.1.0

Release date: tbd

Ticket Summary Status
#18910 distributing descriptors across CollecTor instances closed
#19822 set up a CollecTor mirror for synchronization with the main CollecTor closed
#19831 Change default for compressing descriptors to true closed
#20162 reduce configuration parameters in collector.properties closed
#20179 Require absolute path for `$TARBALLTARGETDIR` in `src/main/resources/create-tarballs.sh` closed
#20380 Expand INSTALL.md to a more complete operator's guide closed
#20408 Move index.json* to index/ subdirectory closed

Release 1.2.0

Release date: tbd

Ticket Summary Status
#8799 collector's downloads: avoid httpurl-connection closed
#19755 improve code quality of bridgedescs module closed
#19778 Bridge descriptor sanitizer runs out of memory after 13.5 days closed
#19934 CollecTor should use new metrics-lib json classes closed
#20514 CollecTor's torperf module: replace HttpURLConnection closed
#20515 CollecTor's relaydescs module should avoid httpurlconnection closed
#20516 CollecTor's exitlists module should avoid httpurlconnection closed
#21443 CollecTor does not delete exit lists after three days anymore closed
#22216 Decide whether to sanitize padding-counts lines closed
#22247 Remove deprecation warnings as soon as metrics-lib 1.7.0 is released closed
#22652 Adapt CollecTor to metrics-lib 1.9.0 closed
#22754 Reference checker should only read relay descriptors closed
#22833 Either include or retain "fingerprint" line in bridge network statuses with @type bridge-network-status 1.2 closed

Release 2.0.0

Release date: tbd

Ticket Summary Status
#20350 Replace create-tarball.sh shell script with Java module new

Past Releases

Release 1.0.2, October 7, 2016

Ticket Summary Severity
#19016 add shutdown hook Normal
#19317 Sanitize TCP ports in bridge descriptors Normal
#19894 print message when no module is activated Minor
#19895 make CollecTor stop after RunOnce Normal
#19924 collector.properties: base url should not be in quotes Minor
#20079 Change log thresholds from TRACE to INFO Normal

Bugfix Release 1.0.1, August 22, 2016

Prevent out-of-memory error, cf. #19913.

First Release 1.0.0, August 11, 2016

Ticket Summary Severity
#18707 use java 7 Minor
#18719 provide executable jar Normal
#18727 refactor ernie before very first metrics-db release Normal
#18734 Installation Guide for Operators Normal
#18792 tweak build.xml for new tasks and java 7 Normal
#18793 add checkstyle task Normal
#18794 add cobertura task Normal
#18818 Stop using deprecated parts of metrics-lib. Normal
#18865 actively monitor resources like available storage space Normal
#18922 configure logging via properties file Normal
#18931 coding style polishing Normal
#18955 javadoc coverage checkstyle warnings Normal
#19005 make all data directories configurable Normal
#19015 use logging framework other than java.util.logging Normal
#19018 run CollecTor modules without crontab Normal
#19021 improve configuration process Normal
#19170 make parsing more robust (extra-info) Normal
#19373 write test that checks the default collector.properties Normal
#19424 remove hard coded paths and set default properties to values used on the main CollecTor instance Minor
#19615 CollecTor should conform to style guide Normal
#19641 investigate and fix MainTest Normal
#19651 add missing scripts to collector.git Normal
#19720 CollecTor should be re-configurable without restart Normal
#19727 correct exitlist Normal
#19771 investigate halt of scheduling for one of many tasks in collector's scheduler Normal
#19776 Make minor improvements to scheduler Normal
#19813 define release process and do release of milestone 1.0.0 Normal
#19829 Update directory authority addresses to recent tor.git Normal
#19830 Check if recent directory exists before checking available space Normal
#19840 Change path defaults to match those of main CollecTor instance Normal

All Tasks in Trac

Active Tasks

Results (1 - 10 of 33)

Ticket Summary Status Priority Severity Reporter Modified
#22428 Add webstats module needs_information High Normal iwakeh 2 days ago
#21087 Separate truncated descriptor(s) from next complete descriptor new High Normal atagar 2 days ago
#20550 Implement SanitizedBridgeExtraInfoDescriptor class that encapsulates the sanitizing logic for bridge extra-info descriptors new Medium Normal iwakeh 2 days ago
#20549 Implement SanitizedBridgeServerDescriptor class that encapsulates the sanitizing logic for bridge server descriptors new Medium Normal iwakeh 2 days ago
#20546 Implement CleanUtils class for common file system operations assigned Medium Normal iwakeh 2 days ago
#20518 Make various architecture improvements and modernizations new Medium Normal iwakeh 2 days ago
#20489 Add various tests for recently fixed issues accepted Medium Normal iwakeh 2 days ago
#19169 Verify, correct, and extend runtime statistics accepted Medium Normal iwakeh 2 days ago
#20983 Stop sanitizing contact information from bridge descriptors new Medium Normal cypherpunks 3 days ago
#23421 Use persistence functionality throughout all modules needs_information Medium Normal iwakeh 3 days ago

Completed Tasks

Results (1 - 10 of 181)

Ticket Summary Priority Severity Reporter Modified
#21139 add javadoc overview page to CollecTor Medium Normal iwakeh 24 hours ago
#20080 Update the bridgedescs module Medium Normal karsten 3 days ago
#21759 Add persistence for torperf/onionperf Medium Normal iwakeh 3 days ago
#19621 use java 8 in CollecTor Medium Normal iwakeh 7 days ago
#23215 set annotation from descriptor during sync-runs Medium Normal iwakeh 2 weeks ago
#23286 use index-json classes from metrics-lib Medium Normal iwakeh 3 weeks ago
#23255 Fix a bug while sanitizing bridge network statuses without entries Very High Normal karsten 5 weeks ago
#21760 Onionperf deployment - CollecTor side Medium Normal iwakeh 2 months ago
#8799 collector's downloads: avoid httpurl-connection High Normal karsten 2 months ago
#20516 CollecTor's exitlists module should avoid httpurlconnection Medium Normal iwakeh 2 months ago

Last modified on Oct 8, 2016, 10:35:23 AM