CollecTor Development

This is a living document that accompanies the current project to improve CollecTor.

Areas of Work

Over the course of this project, the following sections will increasingly turn into descriptions and documentation. Currently, they are a mixture of well-defined improvements, sketches, wishes, and open questions.

Analyze Descriptor Completeness

The analysis will be based on log files and the downloaded files, and will address the following questions:

How many descriptors are missing?

How could this loss be avoided?

  • Actively monitor resources such as available storage space (discussion in ticket #18865).
  • Verify and improve runtime statistics to get a clearer picture (discussion in ticket #19169).
  • Extra-info descriptors dropped because of parsing problems are counted as missing. This should be avoided (ticket #19170).
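
Monitoring available storage space, the first point in the list above, could be approached roughly as follows. This is only a minimal sketch under stated assumptions: the class name `StorageCheck` and the method `hasEnoughSpace` are hypothetical and not taken from CollecTor's code; only the `java.nio.file` calls are standard Java (available since Java 7).

```java
import java.io.IOException;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper for illustration; not part of CollecTor's code base. */
final class StorageCheck {

  private StorageCheck() {
  }

  /**
   * Returns true if the file store that holds dir has at least
   * minFreeBytes of usable space. A directory that is missing or not
   * accessible is reported as insufficient instead of raising an exception.
   */
  static boolean hasEnoughSpace(Path dir, long minFreeBytes) {
    try {
      FileStore store = Files.getFileStore(dir.toAbsolutePath());
      return store.getUsableSpace() >= minFreeBytes;
    } catch (IOException e) {
      // Assumption: treating "cannot determine" as "not enough" is the
      // safer choice for a monitoring check.
      return false;
    }
  }
}
```

A module could call, for example, `StorageCheck.hasEnoughSpace(Paths.get("recent"), 200L * 1024 * 1024)` before each run and log a warning when the result is false. Both the path and the 200 MiB limit are illustrative assumptions, not values used by CollecTor.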

Next Steps

Continue the analysis once the sync process is deployed.

Provide Guide Documents

These guides should be based on the previous work in Onionoo and metrics-lib. In detail:

  • Contributor's Guide: create as detailed in #18733 and place the new guide in a central location, which still needs to be identified. This could be a large document in the central location plus a small document in CollecTor that references it (detailed discussion in #18730).
  • Release Process (defined in #18732)
  • Installation Guide for Operators: adapt the existing document (ticket #18734)

Implement the Release Process

(according to the Release Process guide above)

Design Changes

This section describes improvements intended to make CollecTor more maintainable, more testable, and more efficient.

  1. Run CollecTor with an internal scheduler instead of external scheduling (e.g., crontab); see #19018.
  2. Add a shutdown hook to provide a controlled way of stopping; see the discussion in #19016.
  3. Some parts of CollecTor's data processing are provided by bash scripts run via crontab. These should be integrated into the Java application.
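
The first two items can be illustrated together. The following is a minimal sketch, not CollecTor's actual implementation: `ModuleScheduler` and its method names are hypothetical, and a module is assumed to be representable as a plain `Runnable`. It relies only on the standard `java.util.concurrent` scheduler and on `Runtime.addShutdownHook`.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch; class and method names are not CollecTor's. */
final class ModuleScheduler {

  private final ScheduledExecutorService pool =
      Executors.newScheduledThreadPool(1);

  /**
   * Runs the module periodically inside the JVM, which is what a crontab
   * entry would otherwise do from the outside. Note that an exception
   * thrown by the module suppresses all of its later runs, so a module
   * should catch and log its own failures.
   */
  void schedule(Runnable module, long offset, long period, TimeUnit unit) {
    pool.scheduleAtFixedRate(module, offset, period, unit);
  }

  /**
   * Cancels future runs and waits up to the given timeout for a run in
   * progress to finish. Returns true if everything terminated in time.
   */
  boolean shutdown(long timeout, TimeUnit unit) {
    pool.shutdown();
    try {
      return pool.awaitTermination(timeout, unit);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }

  /** Registers a JVM shutdown hook that stops the scheduler in a controlled way. */
  void registerShutdownHook(final long timeout, final TimeUnit unit) {
    Runtime.getRuntime().addShutdownHook(new Thread(new Runnable() {
      @Override
      public void run() {
        shutdown(timeout, unit);
      }
    }));
  }
}
```

With such a class, a main method would create one scheduler, call `schedule(...)` once per activated module, and then call `registerShutdownHook(...)`, so that stopping the process lets a run in progress finish instead of being cut off mid-write. The timeout to use is a design choice that this sketch leaves open.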

Improve CollecTor Operation and Setup

Once the executable jar, including the shutdown hook implementation, is available, CollecTor should be started as a Linux service, i.e., an appropriate shell script needs to be provided.

Further Sketches of Areas for Improvements

  • store unparsable descriptors rather than discarding them
    • add local storage for descriptors that cannot be parsed, so that the service operator can review them and reprocess them later
  • synchronization between CollecTor instances, see #18910 and DescriptorDistribution
  • improve the process of creating tarballs
    • reduce memory consumption throughout
  • consider using an embedded HTTP server in order to reduce operating complexity
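
To make the first sketch in this list more concrete, storing unparsable descriptors instead of discarding them might look like the following. All names here (`UnparsableStore`, `store`, `sha256Hex`) are hypothetical, and naming the files after the SHA-256 digest of their content is an assumption made for this example, chosen so that the same broken descriptor is kept only once.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical sketch; not part of CollecTor's code base. */
final class UnparsableStore {

  private final Path dir;

  UnparsableStore(Path dir) {
    this.dir = dir;
  }

  /**
   * Writes the raw descriptor bytes to a file named after their SHA-256
   * digest and returns that file's path. Storing the same bytes again
   * does not create a second copy.
   */
  Path store(byte[] rawDescriptor) throws IOException {
    Files.createDirectories(dir);
    Path target = dir.resolve(sha256Hex(rawDescriptor));
    if (!Files.exists(target)) {
      Files.write(target, rawDescriptor);
    }
    return target;
  }

  /** Returns the lower-case hex encoding of the SHA-256 digest of data. */
  static String sha256Hex(byte[] data) {
    try {
      MessageDigest md = MessageDigest.getInstance("SHA-256");
      StringBuilder sb = new StringBuilder();
      for (byte b : md.digest(data)) {
        sb.append(String.format("%02x", b));
      }
      return sb.toString();
    } catch (NoSuchAlgorithmException e) {
      // SHA-256 must be supported by every Java platform implementation.
      throw new IllegalStateException(e);
    }
  }
}
```

A parsing step could call `store(rawBytes)` in the place where it currently drops a descriptor, and an operator could later feed the files in that directory back into processing. How reprocessing would be triggered is left open here.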

Releases

Release 1.1.0

Release date: tbd

Ticket Summary Status
#18910 distributing descriptors across CollecTor instances closed
#19822 set up a CollecTor mirror for synchronization with the main CollecTor closed
#19831 Change default for compressing descriptors to true closed
#20162 reduce configuration parameters in collector.properties closed
#20179 Require absolute path for `$TARBALLTARGETDIR` in `src/main/resources/create-tarballs.sh` closed
#20380 Expand INSTALL.md to a more complete operator's guide closed
#20408 Move index.json* to index/ subdirectory closed

Release 1.2.0

Release date: tbd

Ticket Summary Status
#8799 collector's downloads: avoid httpurl-connection closed
#19755 improve code quality of bridgedescs module closed
#19778 Bridge descriptor sanitizer runs out of memory after 13.5 days closed
#19934 CollecTor should use new metrics-lib json classes closed
#20514 CollecTor's torperf module: replace HttpURLConnection closed
#20515 CollecTor's relaydescs module should avoid httpurlconnection closed
#20516 CollecTor's exitlists module should avoid httpurlconnection closed
#21443 CollecTor does not delete exit lists after three days anymore closed
#22216 Decide whether to sanitize padding-counts lines closed
#22247 Remove deprecation warnings as soon as metrics-lib 1.7.0 is released closed
#22652 Adapt CollecTor to metrics-lib 1.9.0 closed
#22754 Reference checker should only read relay descriptors closed
#22833 Either include or retain "fingerprint" line in bridge network statuses with @type bridge-network-status 1.2 closed

Release 2.0.0

Release date: tbd

Ticket Summary Status
#20350 Replace create-tarball.sh shell script with Java module assigned

Past Releases

Release 1.0.2, October 7, 2016

Ticket Summary Severity
#19016 add shutdown hook Normal
#19317 Sanitize TCP ports in bridge descriptors Normal
#19894 print message when no module is activated Minor
#19895 make CollecTor stop after RunOnce Normal
#19924 collector.properties: base url should not be in quotes Minor
#20079 Change log thresholds from TRACE to INFO Normal

Bugfix Release 1.0.1, August 22, 2016

Prevent out-of-memory error, cf. #19913.

First Release 1.0.0, August 11, 2016

Ticket Summary Severity
#18707 use java 7 Minor
#18719 provide executable jar Normal
#18727 refactor ernie before very first metrics-db release Normal
#18734 Installation Guide for Operators Normal
#18792 tweak build.xml for new tasks and java 7 Normal
#18793 add checkstyle task Normal
#18794 add cobertura task Normal
#18818 Stop using deprecated parts of metrics-lib. Normal
#18865 actively monitor resources like available storage space Normal
#18922 configure logging via properties file Normal
#18931 coding style polishing Normal
#18955 javadoc coverage checkstyle warnings Normal
#19005 make all data directories configurable Normal
#19015 use logging framework other than java.util.logging Normal
#19018 run CollecTor modules without crontab Normal
#19021 improve configuration process Normal
#19170 make parsing more robust (extra-info) Normal
#19373 write test that checks the default collector.properties Normal
#19424 remove hard coded paths and set default properties to values used on the main CollecTor instance Minor
#19615 CollecTor should conform to style guide Normal
#19641 investigate and fix MainTest Normal
#19651 add missing scripts to collector.git Normal
#19720 CollecTor should be re-configurable without restart Normal
#19727 correct exitlist Normal
#19771 investigate halt of scheduling for one of many tasks in collector's scheduler Normal
#19776 Make minor improvements to scheduler Normal
#19813 define release process and do release of milestone 1.0.0 Normal
#19829 Update directory authority addresses to recent tor.git Normal
#19830 Check if recent directory exists before checking available space Normal
#19840 Change path defaults to match those of main CollecTor instance Normal

All Tasks in Trac

Active Tasks

Results (1 - 10 of 40)

Ticket Summary Status Priority Severity Reporter Modified
#21378 Archive bwauth bandwidth files new Medium Normal tom 3 days ago
#28324 Extend CollecTor to fetch recent, non-current consensuses and votes new Medium Normal karsten 4 days ago
#25644 Write white paper about CollecTor's data processing assigned Medium Normal iwakeh 8 days ago
#28320 Rewrite CollecTor relaydescs module using Stem/txtorcon new Medium Normal karsten 9 days ago
#28003 Consider refactoring various code that makes descriptors persistent new Medium Normal karsten 5 weeks ago
#27980 Missing server descriptors in recent/ but not in out/ needs_information High Normal karsten 5 weeks ago
#27716 Out of memory when loading in multiple years of relay descriptors new Medium Normal irl 2 months ago
#27055 Find out why syncing descriptors from collector2.tp.o did not time out new Medium Normal karsten 3 months ago
#2966 Include bridge country codes in sanitized bridge descriptors assigned Low Normal karsten 5 months ago
#21515 Add auxiliary data on Tor relays and bridges to CollecTor new Medium Normal karsten 5 months ago

Completed Tasks

Results (1 - 10 of 209)

Ticket Summary Priority Severity Reporter Modified
#28001 Release CollecTor 1.8.0 Medium Normal karsten 4 weeks ago
#27973 Update DirectoryAuthoritiesAddresses in default properties and on both instances Medium Normal karsten 5 weeks ago
#27390 Properly clean up sanitized web server logs in the recent/ directory Medium Normal karsten 5 weeks ago
#27076 Reconfigure collector2.tp.o to do less Medium Normal karsten 3 months ago
#26790 Release CollecTor 1.7.0 High Normal karsten 4 months ago
#24291 Rename CollecTor packages Medium Normal karsten 4 months ago
#26193 Tarballs are not compressed in a run following an aborted run Medium Normal karsten 4 months ago
#26786 CollecTor does not know about Serge Immediate Normal irl 4 months ago
#20224 Fix `BridgeDescriptorMappingsLimit` config option Low Normal karsten 4 months ago
#25624 Index 'contrib' directory Medium Normal iwakeh 6 months ago

Last modified on Oct 8, 2016, 10:35:23 AM