Add a module to CollecTor to archive the statistics produced in #21315 (moved). The actual statistics and format should be discussed in that ticket. The discussion in #29315 (moved) may help to inform that.
I started on implementing a handler to return metrics from snowflake-broker@bamsoftware.com/metrics in #31376 (moved). I have more questions about what exactly you need here. Is it only the metrics from the last full 24 hour measurement period?
Where would I find the spec? I found this comment, which is fine for me to start writing some metrics-lib code. But we'll have to link to something more permanent once we start collecting these statistics, ideally tor-spec.git. Thanks!
I'll work on adding it to tor-spec.git. Right now the source code is the ground truth on the spec.
One quick comment on the spec after writing the metrics-lib part: The "snowflake-stats-end" line should have multiplicity "[At start, exactly once.]" rather than "[At most once.]". Other than that, everything seemed reasonable from a parsing perspective. (I didn't start with the CollecTor part, yet.)
To answer your earlier question above: having just the last 24 hours of data might be problematic if the metrics host goes down for, say, a weekend. Ideally, there would be at least a week of statistics available. Or maybe just accumulate new statistics forever, given the tiny amount of statistics per day.
One quick comment on the spec after writing the metrics-lib part: The "snowflake-stats-end" line should have multiplicity "[At start, exactly once.]" rather than "[At most once.]". Other than that, everything seemed reasonable from a parsing perspective. (I didn't start with the CollecTor part, yet.)
Thanks! Filed a PR for torspec in #31407 (moved) with this change.
To answer your earlier question above: having just the last 24 hours of data might be problematic if the metrics host goes down for, say, a weekend. Ideally, there would be at least a week of statistics available. Or maybe just accumulate new statistics forever, given the tiny amount of statistics per day.
Okay, great. I have an implementation of this in #31376 (moved) that will respond with all logged metrics data.
Couple ideas after implementing the CollecTor and metrics-web parts:
As indicated before, I'm using snowflake-stats-end as descriptor type identifier, which means that future data format versions will have to keep that line as their first line. A better choice would have been to use something like snowflake-stats $version or similar. (If it's any relief, we're forced to use [None..None](../compare/None...None){10} as descriptor type identifier for bandwidth files, so there would have been plenty of room to do worse.)
The current format only supports a single snowflake broker. Maybe this is acceptable for the snowflake design. But just in case that you'll one day want to add a second broker, you'll have to include some sort of broker identifier in the format.
The current format is not signed, which is somewhat related to not having a broker identifier in the format.
As a consequence of the above, CollecTor needs to make a decision whether it wants to archive a newly downloaded snowflake-stats snippet, if it already has another snippet with the same timestamp and different contents. Possible strategies for this specific case are to a) never overwrite, b) always overwrite, c) keep all versions by including a digest in the file name, d) maybe something else. I implemented a) for now.
I think we can start with what we have, without changing anything of the above. Of course, if you want to change something with regard to future maintenance effort, now's the time!
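To make the overwrite strategies above concrete, here is a minimal sketch (not CollecTor's actual code) of a) never overwrite, b) always overwrite, and c) keep all versions by including a digest in the file name. The function name, the strategy labels, the file name layout, and the sample timestamp are all hypothetical and chosen only for illustration.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Optional


def archive_snippet(archive_dir: Path, timestamp: str, contents: bytes,
                    strategy: str = "never-overwrite") -> Optional[Path]:
    """Store one downloaded snippet; return the path written, or None if skipped.

    The file name layout is an assumption made for this sketch.
    """
    archive_dir.mkdir(parents=True, exist_ok=True)
    target = archive_dir / f"{timestamp}-snowflake-stats"
    if strategy == "keep-all":
        # Strategy c): different contents get different file names.
        digest = hashlib.sha256(contents).hexdigest()[:16]
        target = archive_dir / f"{timestamp}-snowflake-stats-{digest}"
    if target.exists() and strategy != "always-overwrite":
        # Strategy a), and identical contents under c): keep what is there.
        return None
    target.write_bytes(contents)
    return target


# Two snippets with the same (made-up) timestamp and different contents.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    first = archive_snippet(root, "2019-08-07-19-16-11", b"version one\n")
    second = archive_snippet(root, "2019-08-07-19-16-11", b"version two\n")
    kept = first.read_bytes()
    both = [
        archive_snippet(root / "all", "2019-08-07-19-16-11", data, "keep-all")
        for data in (b"version one\n", b"version two\n")
    ]
```

With the default strategy the second call is skipped and the first snippet stays untouched; with "keep-all" both versions end up in separate files.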
As indicated before, I'm using snowflake-stats-end as descriptor type identifier, which means that future data format versions will have to keep that line as their first line. A better choice would have been to use something like snowflake-stats $version or similar. (If it's any relief, we're forced to use [None..None](../compare/None...None){10} as descriptor type identifier for bandwidth files, so there would have been plenty of room to do worse.)
Oh this is a good point. I can make this change pretty easily now.
The current format only supports a single snowflake broker. Maybe this is acceptable for the snowflake design. But just in case that you'll one day want to add a second broker, you'll have to include some sort of broker identifier in the format.
This will definitely be the case for quite a while. I think a spec change (and a version bump of the spec) will be a good way to handle this. How difficult will it be to have data with different spec versions on your end?
The current format is not signed, which is somewhat related to not having a broker identifier in the format.
As a consequence of the above, CollecTor needs to make a decision whether it wants to archive a newly downloaded snowflake-stats snippet, if it already has another snippet with the same timestamp and different contents. Possible strategies for this specific case are to a) never overwrite, b) always overwrite, c) keep all versions by including a digest in the file name, d) maybe something else. I implemented a) for now.
I didn't think about signing; that would take a while to implement, I think. I agree that the best decision here is (a) for now.
The SnowflakeStatsUrl doesn't look right at all. There's an ampersat in it?
You mean the default URL in collector.properties? That's known. I need to put in the actual URL once that's available (#31376 (moved)).
'snowflakes' contains a single compressed tarball with snowflake statistics:
It is nice that we can have archives that after a while become read-only. Why are we not splitting this by month?
The reason is that these statistics are tiny and that processing these files would be simpler with a single tarball. But changing this is trivial, if you think that doing so has more advantages.
I would like to have things becoming immutable after we think they will not receive more data, as it makes it easier to manage archives in the longer term and do things like monitor for bitrot.
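As a rough illustration of the monthly split discussed here, the following sketch groups snippet file names by month so that each past month's tarball can be treated as immutable. It assumes file names start with a YYYY-MM-DD timestamp; the helper names, the tarball naming scheme, and the sample file names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_by_month(file_names: Iterable[str]) -> Dict[str, List[str]]:
    """Map 'YYYY-MM' to the sorted file names belonging to that month.

    Assumes every name begins with a YYYY-MM-DD timestamp.
    """
    months: Dict[str, List[str]] = defaultdict(list)
    for name in sorted(file_names):
        months[name[:7]].append(name)
    return dict(months)


def tarball_name(month: str) -> str:
    """Hypothetical per-month tarball name."""
    return f"snowflakes-{month}.tar.xz"


groups = group_by_month([
    "2019-08-07-19-16-11-snowflake-stats",
    "2019-07-31-19-16-11-snowflake-stats",
    "2019-08-08-19-16-11-snowflake-stats",
])
names = [tarball_name(month) for month in groups]
```

Once a month has passed, its group no longer changes, so a checksum recorded for that tarball could be rechecked later to detect bitrot.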
As indicated before, I'm using snowflake-stats-end as descriptor type identifier, which means that future data format versions will have to keep that line as their first line. A better choice would have been to use something like snowflake-stats $version or similar. (If it's any relief, we're forced to use [None..None](../compare/None...None){10} as descriptor type identifier for bandwidth files, so there would have been plenty of room to do worse.)
Oh this is a good point. I can make this change pretty easily now.
Okay, in that case please make this change and let me know where to find the new spec and a new sample.
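For reference while the spec change is pending, here is a minimal sketch of how a parser could tell the two identifier styles apart from the first line alone. This is not the metrics-lib implementation: the function name, the "legacy" label, and the dotted version syntax after snowflake-stats are assumptions; only the snowflake-stats-end keyword and the snowflake-stats $version idea come from the discussion above.

```python
import re
from typing import Optional, Tuple

LEGACY_KEYWORD = "snowflake-stats-end"
# Assumed syntax for the proposed versioned first line, e.g. "snowflake-stats 1.0".
VERSIONED_LINE = re.compile(r"^snowflake-stats (\d+(?:\.\d+)*)$")


def detect_descriptor_type(first_line: str) -> Optional[Tuple[str, str]]:
    """Return (type, version) for a snowflake stats document, else None."""
    line = first_line.rstrip("\n")
    match = VERSIONED_LINE.match(line)
    if match:
        return ("snowflake-stats", match.group(1))
    if line == LEGACY_KEYWORD or line.startswith(LEGACY_KEYWORD + " "):
        # Current format: the stats-end line doubles as the type identifier.
        return ("snowflake-stats", "legacy")
    return None


versioned = detect_descriptor_type("snowflake-stats 1.0\n")
legacy = detect_descriptor_type("snowflake-stats-end (rest of line)\n")
other = detect_descriptor_type("some-other-keyword 1.0\n")
```

Keeping the version in the first line means later format versions can reorder or rename the remaining lines without breaking type detection.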
The current format only supports a single snowflake broker. Maybe this is acceptable for the snowflake design. But just in case that you'll one day want to add a second broker, you'll have to include some sort of broker identifier in the format.
This will definitely be the case for quite a while. I think a spec change (and a version bump of the spec) will be a good way to handle this. How difficult will it be to have data with different spec versions on your end?
Not difficult at all. Okay, let's keep this unchanged then.