Opened 12 months ago

Last modified 6 weeks ago

#28067 new enhancement

Annotations should be documented in dir-spec.txt

Reported by: rl1987 Owned by:
Priority: Medium Milestone: Tor: unspecified
Component: Core Tor/Tor Version:
Severity: Normal Keywords:
Cc: irl, karsten Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

Tor codebase supports annotating Tor directory documents with information that is useful to Tor implementation. However, in torspec there is no description about rationale and technical details of this feature. There should be.

Child Tickets

Change History (15)

comment:1 Changed 12 months ago by nickm

This should be documented, but I don't think that dir-spec is the right place: annotations should never travel over the network.

comment:2 Changed 12 months ago by rl1987

Perhaps dir-spec.txt should have an appendix that documents the annotations?

comment:3 Changed 12 months ago by nickm

I'd be fine with that.

comment:4 Changed 11 months ago by irl

Cc: irl added

CollecTor uses annotations too. It would be nice if these were the same annotations as used by tor.

comment:5 Changed 11 months ago by atagar

Cc: karsten added

Stem now has a couple tickets from irl that are pending this (#28503 and #28502). I gave a summary of the situation here.

TL;DR annotations are not presently part of descriptors. Cached server descriptors have a couple (@downloaded-at and @source) and CollecTor adds one of its own (@type). However, none of these have been formalized in a spec and as such are not part of the descriptor per say. Just ancillary data of those services that must be stripped off to produce a valid descriptor.

For me there's a few questions here, this first of which I think is particularly important...

  1. What is the goal here? I'm assuming there's a set of annotations we would like to add?
  1. Do we want these to become a valid, optional part of the descriptor? That is to say, something parsers (tor, stem, metrics-lib) should consider valid and ignore if unneeded?
  1. Finally, what do we want the spec to be? I suspect it'll be pretty simple. I could draft something if you'd like though first I'd particularly like to know the answer to #1.

comment:6 Changed 11 months ago by irl

For Metrics, the goal is:

  • to have metadata about where descriptors came from (@source). This could be:
    • an IP address: we learned this from a directory cache/authority (tor does this)
    • a full URL: we learned this from a directory cache/authority or a remote CollecTor instance (CollecTor would do this)
    • "file": we learned this from files imported locally
  • to have metadata about when we learned the descriptor (@downloaded-at):
    • the timestamp format used by tor looks fine for this
  • to know how to parse back a serialised descriptor (@type):
    • these are defined by CollecTor, not implemented in tor

tor would not need to understand all of these annotations. The cached-* files are written by tor and it's not expected that users would provide other data there.

Other parsers would need to understand these annotations, but they are not part of the descriptor unless the descriptor is on disk. They would never appear on the wire as nickm commented above.

Unknown annotations should be ignored. It may be useful to warn if there are unrecognised annotations, or provide some raw access to them, but it's not required. We would want stem to provide access to them. If we have metadata available we're going to want to store it, and everything should not break when we do.

We can define annotations as (RFC5234):

  AT = "@"
  ANNOTATION-NAME = (ALPHA / DIGIT / "-") [ANNOTATION-NAME]
  ANNOTATION-VALUE = VCHAR [ANNOTATION-VALUE]
  ANNOTATION = AT ANNOTATION-NAME SP ANNOTATION-VALUE CRLF

We might want to use the definition of acceptable Unicode from §2.3 of proposal 285 instead of VCHAR, but I don't see a need for doing this for ANNOTATION-NAME.

I believe that the CollecTor protocol is the only place that we actually talk about descriptors stored on disk, so perhaps it would be good to add this in a section to the end of that.

https://spec.torproject.org/collector-protocol

comment:7 Changed 11 months ago by atagar

Sounds good to me!

They would never appear on the wire

Yup, agreed. What I meant was that I'd like for us to loosen the specification a bit so these can be included in descriptors without making them invalid. For example, for the spec to say something like...

All descriptors MAY be proceeded by any number
of annotations of the form...

  "@" Keyword SP 1*(VCHAR)

For example...

  @type server-descriptor 1.0
  @downloaded-at 2012-03-14 16:31:05
  @source "145.53.65.130"

These annotations MUST not have any behavioral
impact for the descriptor. Annotations consist
of ancillary cross-service data and MUST be
ignoreable.

With this the proper behavior of all parsers (tor, stem, metrics-lib, etc) would be to read, ignore, or strip these annotations as they see fit. The only thing that is disallowed is to reject annotations as making the descriptor invalid (ie, what all our implementations do right now since they *aren't* a valid part of the descriptor).

Last edited 11 months ago by atagar (previous) (diff)

comment:8 Changed 11 months ago by irl

We should also make clear that they are not to be included when calculating digests or when generating signatures.

They're not really part of the descriptor. They're more an encapsulation header. Think IP packets going through a GRE tunnel but the tunnel is time and they travel by being stored on a disk.

comment:9 Changed 11 months ago by karsten

Interesting, I was almost certain that this was all long specified. But you're right, that's not the case. Huh.

Note that metrics-lib already handles annotations as suggested here, that is, skip them for digest computation, provide a list of them to the application, etc. Of course, if there was a specification of other annotations than @type, it could attempt to parse those other annotations, too.

comment:10 Changed 11 months ago by irl

Initially I think it is good to have a spec for the annotations that already exist, which are type, downloaded-at and source. For other annotations later we could update the spec.

It is good to know that metrics-lib is already coping with unknown annotations.

karsten: Do you think that CollecTor protocol is the right place for this spec? Or do you think that it is better to put it in dir-spec as an appendix? We'd probably be the main users of these annotations.

comment:11 in reply to:  10 Changed 11 months ago by karsten

Replying to irl:

karsten: Do you think that CollecTor protocol is the right place for this spec? Or do you think that it is better to put it in dir-spec as an appendix? We'd probably be the main users of these annotations.

I think that the CollecTor protocol, in its current form, should go away. See also #25625.

I like the idea of making this an appendix in dir-spec.

comment:12 Changed 11 months ago by irl

Aaah, I had forgotten about #25625.

Ok then, I agree that this should be an appendix in dir-spec.

comment:13 Changed 11 months ago by atagar

Minor ask if/when this is done: I'd appreciate if we add a @type for detached signatures and individual router status entries. The lack of a @type annotation for these makes them a bit clunkier for me to deal with (since stem users specify how they'd like to parse documents by providing a type string).

comment:14 Changed 11 months ago by atagar

Actually, on reflection no reason to wait. Filed a separate issue requesting this: #28615

comment:15 in reply to:  7 Changed 6 weeks ago by arma

Replying to atagar:

All descriptors MAY be proceeded by any number
of annotations of the form...

When we do specify it, let's also try to be clear on which annotations are allowed to be sent across the network. Like, if you try to fetch a descriptor from a directory mirror, and it gives you "@apple red" at the top of the descriptor it returns, do you (Tor) dutifully store that to your cached-descriptors file even if you don't know what it means (hopefully no)? Or do you silently discard the annotation because hey, annotations are allowed, but we didn't expect one there (plausible plan)? Or we discard the whole descriptor because somebody tried to include an annotation when they shouldn't have (maybe, if we want to be strict)?

Note: See TracTickets for help on using tickets.