any KeyValue that is based on the last bandwidth file adds more code complexity, because it would mean to read and parse the last bandwidth file
i created 4 additional header KeyValues to report the reasons why the relays were exclued, that would make the KeyValue recent_measurement_exclusion_count unneeded, since it can be deduced from the previous.
relay_recent_measurement_exclusion_count is 0 all the relays reported, so it'd mean that any relay that gets measurements excluded doesn't make it to be reported.
By the initial headers obtained in the Tor network is already explained (or mostly) that sbws is reporting ~1000 relays less than torflow.
I wanted documentation for this pull request, similar to what nick wanted:
I think this is okay, but PLEASE document all new functions, fields, errors, etc. Right now I can only check that the code looks at least sort of reasonable -- not that it does what you want it to do. Documentation would let me know what you want the code to do, so I can check that it does that.
Here are some specific things to fix:
You made the bandwidth file header count relay exclusions, not result exclusions. That is a good choice: anyone who wants to know result totals can just add all the relay-level results. But the key names, key documentation, and comments need to say what you are counting.
It's not clear to me what each key is meant to be counting. Please update the spec in #29775 (moved), or write comments that define each kind of failure and each key (or both).
You made the bandwidth file header count relay exclusions, not result exclusions. That is a good choice: anyone who wants to know result totals can just add all the relay-level results. But the key names, key documentation, and comments need to say what you are counting.
It's not clear to me what each key is meant to be counting. Please update the spec in #29775 (moved), or write comments that define each kind of failure and each key (or both).
I added fixups extending documentation.
TBH, i was minimizing documentation because i think we should work on #28684 (moved), which would change part of the documentation (and code).
I don't think it is a good idea to start #28684 (moved) when you only have 10 paid days left. It will be harder to find and fix bugs when you don't have as much time.
You made the bandwidth file header count relay exclusions, not result exclusions. That is a good choice: anyone who wants to know result totals can just add all the relay-level results. But the key names, key documentation, and comments need to say what you are counting.
It's not clear to me what each key is meant to be counting. Please update the spec in #29775 (moved), or write comments that define each kind of failure and each key (or both).
I added fixups extending documentation.
TBH, i was minimizing documentation because i think we should work on #28684 (moved), which would change part of the documentation (and code).