This should become a Tor Tech Report detailing Tor Metrics data collection, aggregation, and presentation, as well as an overview of why and how data is collected compared to other available frameworks.
This is activity 1.1 of Sponsor 13 and covers the data pipeline up to activity 2 (see ticket #24217 (moved)).
In 2013, JSR 352 (Batch Applications for the Java Platform) was finalized. Since its main implementations are Java EE 7 and Spring Batch, these two should be covered by this activity. Other suitable frameworks can be found in the streaming and data processing fields. These usually focus on real-time processing, which is not CollecTor's concern, but they also provide solutions for the main batch processing tasks: retrieving data from a source, processing it, and writing it out. Thus, we should also take a look at Apache Flink, a streaming framework that explicitly features its own batch DataSet API. Flink is also well integrated into Apache's Java tooling and framework environment.
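To make the comparison more concrete, here is a minimal sketch of JSR 352's chunk-oriented programming model, which maps directly onto the retrieve/process/write tasks above. Class names, input/output paths, and the trivial "processing" step are placeholders for illustration, not CollecTor code.

```java
// Each class would live in its own source file; names and paths are
// illustrative placeholders, not CollecTor code.
import java.io.Serializable;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Iterator;
import java.util.List;
import java.util.stream.Stream;

import javax.batch.api.chunk.AbstractItemReader;
import javax.batch.api.chunk.AbstractItemWriter;
import javax.batch.api.chunk.ItemProcessor;
import javax.inject.Named;

/** Retrieve: read raw descriptor files from a (hypothetical) input directory. */
@Named
public class DescriptorFileReader extends AbstractItemReader {

  private Stream<Path> fileStream;
  private Iterator<Path> files;

  @Override
  public void open(Serializable checkpoint) throws Exception {
    this.fileStream = Files.list(Paths.get("in", "relay-descriptors"));
    this.files = this.fileStream.iterator();
  }

  @Override
  public Object readItem() throws Exception {
    // Returning null tells the batch runtime that the input is exhausted.
    return this.files.hasNext() ? Files.readAllBytes(this.files.next()) : null;
  }

  @Override
  public void close() throws Exception {
    this.fileStream.close();
  }
}

/** Process: stand-in for parsing or sanitizing a single descriptor. */
@Named
public class DescriptorProcessor implements ItemProcessor {

  @Override
  public Object processItem(Object item) throws Exception {
    return new String((byte[]) item, StandardCharsets.UTF_8);
  }
}

/** Write: persist one chunk of processed items to an output directory. */
@Named
public class DescriptorFileWriter extends AbstractItemWriter {

  private int counter = 0;

  @Override
  public void writeItems(List<Object> items) throws Exception {
    for (Object item : items) {
      Path outFile = Paths.get("out", "descriptor-" + this.counter++);
      Files.write(outFile, ((String) item).getBytes(StandardCharsets.UTF_8));
    }
  }
}
```

A job XML placed under META-INF/batch-jobs would then wire the three artifacts into a chunk step and set the commit interval.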
In summary, the batch frameworks we evaluate are Java EE and Spring Batch (as JSR 352 implementations) and Flink.
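For comparison, the same retrieve/process/write pipeline expressed with Flink's batch DataSet API could look roughly like the sketch below. Paths, the job name, and the trivial map step are again placeholders; readTextFile treats each line as one record, which glosses over how descriptor files would actually be split.

```java
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;

public class DescriptorBatchJob {

  public static void main(String[] args) throws Exception {
    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

    // Retrieve: read input records from a (hypothetical) directory.
    DataSet<String> raw = env.readTextFile("in/relay-descriptors");

    // Process: stand-in for parsing or sanitizing.
    DataSet<String> processed = raw.map(new MapFunction<String, String>() {
      @Override
      public String map(String value) {
        return value.trim();
      }
    });

    // Write: persist the results and run the job.
    processed.writeAsText("out/processed");
    env.execute("descriptor-batch-sketch");
  }
}
```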
We changed the plan a bit by evaluating a rewrite of CollecTor's relaydescs module in Python (#28320 (moved)). The remaining parts of the report stay the same, though. Keeping this ticket for writing the report once a working Python prototype exists.
Trac: Sponsor: N/A to Sponsor13; Summary: "Write white paper about CollecTor's data processing (Sponsor13, 1)" to "Write white paper about CollecTor's data processing".