Tor codebase supports annotating Tor directory documents with information that is useful to Tor implementation. However, in torspec there is no description about rationale and technical details of this feature. There should be.
To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information
Child items
0
Show closed items
No child items are currently assigned. Use child items to break down this issue into smaller parts.
Linked items
0
Link issues together to show that they're related.
Learn more.
Stem now has a couple tickets from irl that are pending this (#28503 (closed) and #28502 (closed)). I gave a summary of the situation here.
TL;DR annotations are not presently part of descriptors. Cached server descriptors have a couple (@downloaded-at and @source) and CollecTor adds one of its own (@type). However, none of these have been formalized in a spec and as such are not part of the descriptor per say. Just ancillary data of those services that must be stripped off to produce a valid descriptor.
For me there's a few questions here, this first of which I think is particularly important...
What is the goal here? I'm assuming there's a set of annotations we would like to add?
Do we want these to become a valid, optional part of the descriptor? That is to say, something parsers (tor, stem, metrics-lib) should consider valid and ignore if unneeded?
Finally, what do we want the spec to be? I suspect it'll be pretty simple. I could draft something if you'd like though first I'd particularly like to know the answer to #1.
to have metadata about where descriptors came from (@source). This could be:
an IP address: we learned this from a directory cache/authority (tor does this)
a full URL: we learned this from a directory cache/authority or a remote CollecTor instance (CollecTor would do this)
"file": we learned this from files imported locally
to have metadata about when we learned the descriptor (@downloaded-at):
the timestamp format used by tor looks fine for this
to know how to parse back a serialised descriptor (@type):
these are defined by CollecTor, not implemented in tor
tor would not need to understand all of these annotations. The cached-* files are written by tor and it's not expected that users would provide other data there.
Other parsers would need to understand these annotations, but they are not part of the descriptor unless the descriptor is on disk. They would never appear on the wire as nickm commented above.
Unknown annotations should be ignored. It may be useful to warn if there are unrecognised annotations, or provide some raw access to them, but it's not required. We would want stem to provide access to them. If we have metadata available we're going to want to store it, and everything should not break when we do.
We might want to use the definition of acceptable Unicode from [of proposal 285] instead of VCHAR, but I don't see a need for doing this for ANNOTATION-NAME.
I believe that the CollecTor protocol is the only place that we actually talk about descriptors stored on disk, so perhaps it would be good to add this in a section to the end of that.
Yup, agreed. What I meant was that I'd like for us to loosen the specification a bit so these can be included in descriptors without making them invalid. For example, for the spec to say something like...
All descriptors MAY be proceeded by any numberof annotations of the form... "@" Keyword SP 1*(VCHAR)For example... @type server-descriptor 1.0 @downloaded-at 2012-03-14 16:31:05 @source "145.53.65.130"These annotations MUST not have any behavioralimpact for the descriptor. Annotations consistof ancillary cross-service data and MUST beignoreable.
With this the proper behavior of all parsers (tor, stem, metrics-lib, etc) would be to read, ignore, or strip these annotations as they see fit. The only thing that is disallowed is to reject annotations as making the descriptor invalid (ie, what all our implementations do right now since they aren't a valid part of the descriptor).
We should also make clear that they are not to be included when calculating digests or when generating signatures.
They're not really part of the descriptor. They're more an encapsulation header. Think IP packets going through a GRE tunnel but the tunnel is time and they travel by being stored on a disk.
Interesting, I was almost certain that this was all long specified. But you're right, that's not the case. Huh.
Note that metrics-lib already handles annotations as suggested here, that is, skip them for digest computation, provide a list of them to the application, etc. Of course, if there was a specification of other annotations than @type, it could attempt to parse those other annotations, too.
Initially I think it is good to have a spec for the annotations that already exist, which are type, downloaded-at and source. For other annotations later we could update the spec.
It is good to know that metrics-lib is already coping with unknown annotations.
karsten: Do you think that CollecTor protocol is the right place for this spec? Or do you think that it is better to put it in dir-spec as an appendix? We'd probably be the main users of these annotations.
karsten: Do you think that CollecTor protocol is the right place for this spec? Or do you think that it is better to put it in dir-spec as an appendix? We'd probably be the main users of these annotations.
I think that the CollecTor protocol, in its current form, should go away. See also #25625 (moved).
I like the idea of making this an appendix in dir-spec.
Minor ask if/when this is done: I'd appreciate if we add a @type for detached signatures and individual router status entries. The lack of a @type annotation for these makes them a bit clunkier for me to deal with (since stem users specify how they'd like to parse documents by providing a type string).
{{{
All descriptors MAY be proceeded by any number
of annotations of the form...
}}}
When we do specify it, let's also try to be clear on which annotations are allowed to be sent across the network. Like, if you try to fetch a descriptor from a directory mirror, and it gives you "@apple red" at the top of the descriptor it returns, do you (Tor) dutifully store that to your cached-descriptors file even if you don't know what it means (hopefully no)? Or do you silently discard the annotation because hey, annotations are allowed, but we didn't expect one there (plausible plan)? Or we discard the whole descriptor because somebody tried to include an annotation when they shouldn't have (maybe, if we want to be strict)?