The raw bwauth votes (sample: https://bwauth.ritter.vg/bwauth/bwscan.V3BandwidthsFile) contain information such as last measured time, circuit failures and (eventually) scanner information. This can be used for debugging purposes.
Blocked by #21377 (moved), possible next steps in [#comment:14 comment 14].
This is now fixed in sbws, though it is still not being used by any DirAuth.
It is not planned to add this to Torflow.
We can correlate the votes with the Torflow bandwidth measurement files by timestamp, right? If so, I wonder whether it would still be possible to archive the files produced by all the bwauths that are running Torflow.
Another, more exotic idea (which probably needs its own ticket if it makes sense) is to collect the data from the testnet, since we have DirAuths running sbws there. Would this require a lot of extra work?
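To make the timestamp correlation concrete, here is a minimal sketch. It assumes that the bandwidth file used for a vote is the newest one generated at or before the vote time; that matching rule, the helper names, and the timestamps are illustrative assumptions, not something Torflow or CollecTor defines.

```python
from bisect import bisect_right
from datetime import datetime, timezone


def utc(*args):
    """Shorthand for a timezone-aware UTC datetime."""
    return datetime(*args, tzinfo=timezone.utc)


def latest_file_before(file_times, vote_time):
    """Return the newest bandwidth file timestamp that is not after vote_time.

    file_times must be sorted in ascending order. Returns None when every
    file is newer than the vote.
    """
    idx = bisect_right(file_times, vote_time)
    return file_times[idx - 1] if idx else None


# Made-up timestamps, for illustration only.
files = [utc(2018, 11, 5, 7, 40), utc(2018, 11, 5, 8, 40)]
vote = utc(2018, 11, 5, 9, 0)
match = latest_file_before(files, vote)
```

If several bwauths produce files with similar timestamps, the match would also need to be restricted to files from the authority that cast the vote.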
This ticket is about archiving the entire v3bw file from each directory authority.
It's not enough to archive the files from the testnet.
We could implement this ticket by making the bandwidth file part of the directory protocol. We have a spec for the bandwidth file format, so all we need to do is specify the URL for the file in torspec, and implement it in the code.
Trac: Summary: "Archive bwauth votes" to "Archive bwauth bandwidth files"
So do you mean adding something like this to dir-spec.txt?
"bandwidth-file-url" [At most once] The Bandwidth file URL used to obtain the measured bandwidth. These files SHOULD be available at: http://<hostname>/tor/bwfiles/<bwfile>
Since Torflow and sbws use different names for the bandwidth files, I guess it's fine not to specify the file name, and just make the files available in a known directory/path?
Should all files be available or just the one used for the last vote or for some period of time?
Edit:
We could also store a temporary copy of the exact file we used, and serve it from http:///tor/status-vote/current/bandwidth.z
But that is more complicated, so let's get the next URL working first.
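For illustration, fetching the "next" bandwidth file from one authority could look roughly like the sketch below. The path is the one proposed in this ticket; the function names are hypothetical, and the sketch assumes that the ".z" suffix means a zlib-compressed body, as with other directory documents.

```python
import urllib.request
import zlib

# Path proposed in this ticket for the bandwidth file of the next vote.
NEXT_BANDWIDTH_PATH = "/tor/status-vote/next/bandwidth.z"


def bandwidth_url(host, port):
    """Build the URL for an authority's DirPort (host and port are inputs)."""
    return f"http://{host}:{port}{NEXT_BANDWIDTH_PATH}"


def decode_body(body):
    """Decompress a zlib-compressed body, falling back to the raw bytes."""
    try:
        return zlib.decompress(body)
    except zlib.error:
        return body


def fetch_bandwidth_file(host, port, timeout=30):
    """Download and decode the bandwidth file; raises on network errors."""
    with urllib.request.urlopen(bandwidth_url(host, port), timeout=timeout) as resp:
        return decode_body(resp.read())
```

A caller would invoke `fetch_bandwidth_file()` once per directory authority address and treat an HTTP error as "this authority does not serve a bandwidth file".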
Using the fixed URL http:///tor/status-vote/next/bandwidth.z sounds like it would be very easy to add this to CollecTor.
We have discussed in the Metrics team extending dir-spec.txt to allow fetching "recent" files as well as just next/current. If there is a wide CollecTor outage and we miss a file, it would be good to have those files cached (on a best-effort basis, not necessarily persisted to disk) and available via some URL.
I don't know if karsten already had some ideas about what these URLs would look like, but we should perhaps consider this before implementing changes to dir-spec.txt.
Using the fixed URL http:///tor/status-vote/next/bandwidth.z sounds like it would be very easy to add this to CollecTor.
Thanks for the feedback!
We have discussed in the Metrics team extending dir-spec.txt to allow fetching "recent" files as well as just next/current. If there is a wide CollecTor outage and we miss a file, it would be good to have those files cached (on a best-effort basis, not necessarily persisted to disk) and available via some URL.
How is this any different to losing descriptors or consensuses?
(Please answer this question on a separate ticket.)
I don't know if karsten already had some ideas about what these URLs would look like, but we should perhaps consider this before implementing changes to dir-spec.txt.
Please open a separate ticket for this feature. It's potentially a large feature. And it's not essential for the initial release of this feature.
If you open a separate ticket for historical directory documents, please make #26698 (moved) a child of that ticket. We'll need bandwidth file hashes to work out the exact file used in each vote.
I just went through the long discussion above and tried to identify next steps. irl's list of needed changes looks pretty good. I'll add some thoughts to these steps below that we need to discuss when implementing this.
Teach RelayDescriptorDownloader to download the new URL (in the downloadDescriptors function)
We can either attempt to fetch this file from each authority every time, or we can have a config option which authorities should have them. In the future, we can switch to fetching only those files that are referenced from votes, unless for some reason we want to have non-referenced files, too.
The relaydescs module runs twice per hour, so it's going to download the file twice every hour. Again, if we only fetch referenced files, we wouldn't download the same file more than once. But it sounds like the initial version will be rather simple in this regard. Which is fine.
I assume there are no plans that authorities serve bandwidth files of other authorities? That's different for votes which are cached by other authorities. Should be fine, but something to consider for the future.
While we're waiting for #21377 (moved), can we have a sample file to start writing some parsing code?
Teach ArchiveWriter where it should put the files in CollecTor's hierarchy
Let's discuss what should go into the file name. Timestamp, fingerprint, and digest? Maybe something similar to the vote file name format (with some parts shortened): 2018-11-05-09-00-00-vote-EFCBE720[...]-0D97EDB6[...]?
As part of this step, we might have to teach metrics-lib to recognize the new descriptor type. I believe that CollecTor will store it anyway, but it's going to complain loudly. Just in case it acts up, we can teach metrics-lib to just recognize the descriptor type without providing getters for descriptor contents.
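As a starting point for the file name discussion, here is a rough sketch that combines timestamp, authority fingerprint, and digest in the style of the vote file names. The "bandwidth" label, the choice of SHA-256, and the function name are assumptions for illustration; none of them has been decided in this ticket.

```python
import hashlib
from datetime import datetime, timezone


def bandwidth_file_name(timestamp, authority_fingerprint, content):
    """Build '<timestamp>-bandwidth-<fingerprint>-<digest>' with upper-case hex.

    timestamp is a datetime, authority_fingerprint a hex string, and
    content the raw bytes of the bandwidth file.
    """
    stamp = timestamp.strftime("%Y-%m-%d-%H-%M-%S")
    digest = hashlib.sha256(content).hexdigest().upper()
    return f"{stamp}-bandwidth-{authority_fingerprint.upper()}-{digest}"


# Placeholder fingerprint and content, for illustration only.
name = bandwidth_file_name(
    datetime(2018, 11, 5, 9, 0, 0, tzinfo=timezone.utc),
    "a" * 40,
    b"example bandwidth file\n",
)
```

Which timestamp to use (fetch time, vote time, or the timestamp inside the file) is still an open question.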
Trac: Description: The raw bwauth votes (sample: https://bwauth.ritter.vg/bwauth/bwscan.V3BandwidthsFile) contain information such as last measured time, circuit failures and (eventually) scanner information. This can be used for debugging purposes.
Blocked by #21377 (moved), possible next steps in [#comment:14 comment 14]. Cc: teor to teor, metrics-team Priority: Low to Medium
I just went through the long discussion above and tried to identify next steps. irl's list of needed changes looks pretty good. I'll add some thoughts to these steps below that we need to discuss when implementing this.
Teach RelayDescriptorDownloader to download the new URL (in the downloadDescriptors function)
We can either attempt to fetch this file from each authority every time, or we can have a config option which authorities should have them.
I suggest "each authority every time", because a hard-coded config will miss some of the bandwidth files on new bandwidth authorities.
In the future, we can switch to fetching only those files that are referenced from votes, unless for some reason we want to have non-referenced files, too.
Tor 0.3.5? and later add bandwidth file headers to each vote, and we may add a bandwidth file hash in future. Once all authorities upgrade, you can fetch the bandwidth file if the vote contains headers.
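A minimal sketch of that check, assuming the headers appear in the vote as a line starting with the keyword "bandwidth-file-headers" (my reading of dir-spec.txt; please verify). The function name and the sample vote lines are made up.

```python
def vote_has_bandwidth_headers(vote_text):
    """Return True if any line of the vote starts with the headers keyword."""
    return any(
        line.startswith("bandwidth-file-headers ")
        for line in vote_text.splitlines()
    )


# Made-up vote fragments, for illustration only.
with_headers = (
    "network-status-version 3\n"
    "bandwidth-file-headers timestamp=1541408400 version=1.2.0\n"
)
without_headers = "network-status-version 3\n"
```

CollecTor could then request the bandwidth file only from authorities whose latest vote passes this check.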
The relaydescs module runs twice per hour, so it's going to download the file twice every hour. Again, if we only fetch referenced files, we wouldn't download the same file more than once.
I am not sure if we plan on implementing "referenced files" in Tor. Can you explain what you mean?
But it sounds like the initial version will be rather simple in this regard. Which is fine.
I think Juga has written code for a more complex version. But we will focus on getting the simple version working first.
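Even in the simple version, storing the same file twice per hour can be avoided by remembering digests of files already stored. This is only a sketch: the class name is hypothetical, SHA-256 is an assumed choice, and a real implementation would need to persist the digests between runs.

```python
import hashlib


class SeenFiles:
    """Remember digests of stored files so that repeats can be skipped."""

    def __init__(self):
        self._digests = set()

    def is_new(self, content):
        """Return True the first time this exact content is seen."""
        digest = hashlib.sha256(content).hexdigest()
        if digest in self._digests:
            return False
        self._digests.add(digest)
        return True


seen = SeenFiles()
first = seen.is_new(b"bandwidth file contents")   # first download of the hour
second = seen.is_new(b"bandwidth file contents")  # same file fetched again
```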
I assume there are no plans that authorities serve bandwidth files of other authorities? That's different for votes which are cached by other authorities. Should be fine, but something to consider for the future.
Votes are posted, fetched, and cached by authorities so that each authority can create a consensus.
There's no equivalent for bandwidth files, so we probably won't implement bandwidth file caching.
But if you tell us you really need it, we could work something out.
We're working on version 1.2.0 of the format for sbws 1.0 in #28085 (moved). When sbws 1.0 is ready, we will update the spec with sample data from the latest sbws.
Teach ArchiveWriter where it should put the files in CollecTor's hierarchy
Let's discuss what should go into the file name. Timestamp, fingerprint, and digest? Maybe something similar to the vote file name format (with some parts shortened): 2018-11-05-09-00-00-vote-EFCBE720[...]-0D97EDB6[...]?
As part of this step, we might have to teach metrics-lib to recognize the new descriptor type. I believe that CollecTor will store it anyway, but it's going to complain loudly. Just in case it acts up, we can teach metrics-lib to just recognize the descriptor type without providing getters for descriptor contents.
In the future, we can switch to fetching only those files that are referenced from votes, unless for some reason we want to have non-referenced files, too.
Tor 0.3.5? and later add bandwidth file headers to each vote, and we may add a bandwidth file hash in future. Once all authorities upgrade, you can fetch the bandwidth file if the vote contains headers.
I guess I was thinking of 0.3.5 then. I'm not aware of any other plans.
The relaydescs module runs twice per hour, so it's going to download the file twice every hour. Again, if we only fetch referenced files, we wouldn't download the same file more than once.
I am not sure if we plan on implementing "referenced files" in Tor. Can you explain what you mean?
Same as above: bandwidth files referenced from votes.
I assume there are no plans that authorities serve bandwidth files of other authorities? That's different for votes which are cached by other authorities. Should be fine, but something to consider for the future.
Votes are posted, fetched, and cached by authorities so that each authority can create a consensus.
There's no equivalent for bandwidth files, so we probably won't implement bandwidth file caching.
But if you tell us you really need it, we could work something out.
We're working on version 1.2.0 of the format for sbws 1.0 in #28085 (moved). When sbws 1.0 is ready, we will update the spec with sample data from the latest sbws.
Thanks! I didn't look just yet, but this should be a good start to write some code.
In the future, we can switch to fetching only those files that are referenced from votes, unless for some reason we want to have non-referenced files, too.
Tor 0.3.5? and later add bandwidth file headers to each vote, and we may add a bandwidth file hash in future. Once all authorities upgrade, you can fetch the bandwidth file if the vote contains headers.
I guess I was thinking of 0.3.5 then. I'm not aware of any other plans.
The ticket for putting the bandwidth file hash in the votes is #26698 (moved).
Will you use a hexadecimal hash when you archive the bandwidth files?
If so, maybe we should switch to a hexadecimal hash in the vote.
(I said base64 when I did the initial design, but consistency is more important than saving a few bytes.)
Will you use a hexadecimal hash when you archive the bandwidth files?
If so, maybe we should switch to a hexadecimal hash in the vote.
(I said base64 when I did the initial design, but consistency is more important than saving a few bytes.)
Either works for us. We're converting base64 to hex and back in other places of the code, and it's fine to do that in this case, too.
Will you use a hexadecimal hash when you archive the bandwidth files?
If so, maybe we should switch to a hexadecimal hash in the vote.
(I said base64 when I did the initial design, but consistency is more important than saving a few bytes.)
Either works for us. We're converting base64 to hex and back in other places of the code, and it's fine to do that in this case, too.
I think we might use hex for the sake of the humans looking for files on collector.
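For reference, the base64/hex conversion is small either way. The sketch below assumes that the base64 digest may arrive without trailing "=" padding; the function names and the SHA-256 example digest are illustrative, not taken from existing code.

```python
import base64
import hashlib


def b64_to_hex(digest_b64):
    """Convert a base64 digest (padding optional) to upper-case hex."""
    padded = digest_b64 + "=" * (-len(digest_b64) % 4)
    return base64.b64decode(padded).hex().upper()


def hex_to_b64(digest_hex, strip_padding=True):
    """Convert a hex digest to base64, optionally dropping '=' padding."""
    encoded = base64.b64encode(bytes.fromhex(digest_hex)).decode("ascii")
    return encoded.rstrip("=") if strip_padding else encoded


# Round trip with a digest of made-up content.
raw = hashlib.sha256(b"example bandwidth file\n").digest()
as_b64 = base64.b64encode(raw).decode("ascii").rstrip("=")
as_hex = b64_to_hex(as_b64)
```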