Tweak Stem's documentation

added component::archived/stem descriptors owner::atagar priority::medium resolution::fixed status::closed type::defect labels

I agree with amj703, and I have a few more tweaks. (Repeating amj703's issue here, so that it doesn't get lost):

The API documentation for stem.descriptor.networkstatus.NetworkStatusDocumentV3 states that the "params" variable has type "list", but it has type "dict".
BridgeNetworkStatusDocuments should contain RouterStatusEntryV2s, not V3s. Maybe this isn't exactly a documentation issue though.
The Descriptor documentation should state that @type bridge-extra-info 1.1 is supported, not just 1.0. There's a difference between supporting 1.0 and ignoring the 1.1 parts and supporting 1.1.
All fingerprints and digests, which are in most cases 160 bit value strings, should state exactly whether they're base64 or lower-case hex or upper-case hex encoded.
RouterStatusEntries should state exactly what fields the referenced "thin" document has and what fields it's missing. For example, none of the entries can possibly contain anything from the status footer, e.g., bandwidth weights. Maybe the example on the Network Status Documents page could list "print router.document.valid_after" as valid example and "print router.document.bandwidth_weights" as invalid example.
The addresses_v6 field in RouterStatusEntryV3 doesn't support port ranges, but only port lists.

Trac:
Summary: stem documentation error for NetworkStatusDocumentV3 to Tweak Stem's documentation

The API documentation for stem.descriptor.networkstatus.NetworkStatusDocumentV3 states that the "params" variable has type "list", but it has type "dict".

Good catch, amj703. Fixed.

https://gitweb.torproject.org/stem.git/commitdiff/0b1d47d92d447614095cf43fdb35214650db8d82

BridgeNetworkStatusDocuments should contain RouterStatusEntryV2s, not V3s. Maybe this isn't exactly a documentation issue though.

Hmmm, I definitely thought they were v3 at the time but I'm not sure why...

https://gitweb.torproject.org/stem.git/commitdiff/b236ac4e0ba830352c447537be6cf59d85650ae0

Changed...

https://gitweb.torproject.org/stem.git/commitdiff/7921a46ba8655baa7c4b81d900a9854444675564

The Descriptor documentation should state that @type bridge-extra-info 1.1 is supported, not just 1.0. There's a difference between supporting 1.0 and ignoring the 1.1 parts and supporting 1.1.

Please see the bit above:

"Descriptor types include the following, including further minor versions (ie. if we support 1.0 then we also support 1.1 and above)..."

Stem expects @type annotations to have a minor version, but ignores the value. I list minor versions in the table so users can copy and paste them for the 'descriptor_type' argument.

All fingerprints and digests, which are in most cases 160 bit value strings, should state exactly whether they're base64 or lower-case hex or upper-case hex encoded.

Mind writing a patch for this? Personally I scratched my head for quite a while trying to figure out what format users would find the most convenient for these values.

RouterStatusEntries should state exactly what fields the referenced "thin" document has and what fields it's missing.

Hm? The thin document should only be missing the routers attribute (as documented). Footer attributes like bandwidth_weights should be present.

The addresses_v6 field in RouterStatusEntryV3 doesn't support port ranges, but only port lists.

Mind providing an example router status entry that's being misparsed?

Replying to atagar:

The Descriptor documentation should state that @type bridge-extra-info 1.1 is supported, not just 1.0. There's a difference between supporting 1.0 and ignoring the 1.1 parts and supporting 1.1.

Please see the bit above:

"Descriptor types include the following, including further minor versions (ie. if we support 1.0 then we also support 1.1 and above)..."

Stem expects @type annotations to have a minor version, but ignores the value. I list minor versions in the table so users can copy and paste them for the 'descriptor_type' argument.

I understand that stem ignores the minor version, but isn't this information valuable to users? For example, if 1.1 adds a field that gets parsed by Stem, knowing whether that field will be in Stem's parsed object is useful to users.

All fingerprints and digests, which are in most cases 160 bit value strings, should state exactly whether they're base64 or lower-case hex or upper-case hex encoded.

Mind writing a patch for this? Personally I scratched my head for quite a while trying to figure out what format users would find the most convenient for these values.

Personally, I find hex most useful, but anything works as long as it's clearly documented. Yes, I can write a patch. But before I do that: we discussed in private mail that it's still break-the-API time. It would be convenient if those fields that are used as references between descriptors (e.g., from status entry to server descriptor) used the same encoding. If you agree, I'd submit a patch that documents encodings more clearly and that converts fingerprint and server descriptor digest in status entries from base64 to upper-case hex.

RouterStatusEntries should state exactly what fields the referenced "thin" document has and what fields it's missing.

Hm? The thin document should only be missing the routers attribute (as documented). Footer attributes like bandwidth_weights should be present.

Oh, okay. Never mind then.

The addresses_v6 field in RouterStatusEntryV3 doesn't support port ranges, but only port lists.

Mind providing an example router status entry that's being misparsed?

I don't know if something is misparsed, it's just that the documentation isn't correct. RouterStatusEntryV3 says: "addresses_v6 (dict) – * relay’s IPv6 OR addresses, this is a mapping of IPv6 addresses to a listing of [(min port, max port)...] it accepts". In theory, that field should be equivalent to "address_alt (list) – alternative for our address/or_port attributes, each entry is a tuple of the form (address (str), port (int), is_ipv6 (bool))" in ServerDescriptor. Note that "addresses_v6" is also not correct, it should be "address_alt" or "addresses_alt".

All fingerprints and digests, which are in most cases 160 bit value strings, should state exactly whether they're base64 or lower-case hex or upper-case hex encoded.

Mind writing a patch for this? Personally I scratched my head for quite a while trying to figure out what format users would find the most convenient for these values.

Personally, I find hex most useful, but anything works as long as it's clearly documented. Yes, I can write a patch. But before I do that: we discussed in private mail that it's still break-the-API time. It would be convenient if those fields that are used as references between descriptors (e.g., from status entry to server descriptor) used the same encoding. If you agree, I'd submit a patch that documents encodings more clearly and that converts fingerprint and server descriptor digest in status entries from base64 to upper-case hex.

I totally agree with Karsten here. I had to investigate how the various fingerprints (which, I believe, are encoded in Base64 in the consensus but in hex in the descriptors) were stored. As a related issue, the family field of descriptors contains fingerprints as a hex string but prepended with a '

' (as described in dir-spec.txt). stem.descriptor.server_descriptor.ServerDescriptor just reads these in, and thus includes the '

'. That may be reasonable behavior, but it would be very helpful to document that these fingerprint won't exactly match the values in stem.descriptor.server_descriptor.ServerDescriptor.fingerprint.

I understand that stem ignores the minor version, but isn't this information valuable to users? For example, if 1.1 adds a field that gets parsed by Stem, knowing whether that field will be in Stem's parsed object is useful to users.

Yup, it is though I think that it's a little redundant to say what minor versions mean on both the metrics site and this table. Patches welcome if there's a particular change that you think would benefit these docs.

Personally, I find hex most useful, but anything works as long as it's clearly documented. Yes, I can write a patch.

I spotted this last week that Thomas' visionion script converts the router entry digest to hex so it can be matched against the digest in the server descriptors...

b2a_hex(a2b_base64("%s==" % (entry.digest, ))).upper()

https://github.com/tomlurge/visionion/blob/master/import/import.py#L77

That's unfortunate. I'd love to get a patch so we avoid the need for this kind of conversion!

I don't know if something is misparsed, it's just that the documentation isn't correct.

Oops! I see what you mean. Fixed in part...

https://gitweb.torproject.org/stem.git/commitdiff/6c99a28e83490537615de9388484c574aa1b85dd

In looking at the spec addition that added 'a' lines (https://gitweb.torproject.org/torspec.git/commitdiff/e2aafe8) I have a couple questions...

Can 'a' lines have non-IPv6 addresses? If they're IPv6-only then we can make things a little nicer for our users (and should note it in the spec).
It looks like the part where it says "(Only included when the vote or consensus is generated with consensus-method 13 or later.)" is wrong. It was added in method consensus-method 14.

As a related issue, the family field of descriptors contains fingerprints as a hex string but prepended with a '
$' (as described in dir-spec.txt). stem.descriptor.server_descriptor.ServerDescriptor just reads these in, and thus includes the '$
'.

Unfortunately the family field can contain both fingerprints and nicknames, so the '$' is needed to differentiate the two. Mind providing a patch for what you would like the docs to say?

Cheers! -Damian

Replying to atagar:

I understand that stem ignores the minor version, but isn't this information valuable to users? For example, if 1.1 adds a field that gets parsed by Stem, knowing whether that field will be in Stem's parsed object is useful to users.

Yup, it is though I think that it's a little redundant to say what minor versions mean on both the metrics site and this table. Patches welcome if there's a particular change that you think would benefit these docs.

See the attached patch. Please tweak as you see fit.

Personally, I find hex most useful, but anything works as long as it's clearly documented. Yes, I can write a patch.

I spotted this last week that Thomas' visionion script converts the router entry digest to hex so it can be matched against the digest in the server descriptors...

b2a_hex(a2b_base64("%s==" % (entry.digest, ))).upper()

https://github.com/tomlurge/visionion/blob/master/import/import.py#L77

That's unfortunate. I'd love to get a patch so we avoid the need for this kind of conversion!

I know. I wrote that script above. :) Is b2a_hex and a2b_base64 what you'd use, too, or are there other methods that you prefer?

Are you fine with making sure that all hex strings are upper-case? (I actually didn't check if they already are upper-case.) Making sure they are and documenting it means developers can leave out a redundant upper() check in their code.

I don't know if something is misparsed, it's just that the documentation isn't correct.

Oops! I see what you mean. Fixed in part...

https://gitweb.torproject.org/stem.git/commitdiff/6c99a28e83490537615de9388484c574aa1b85dd

In looking at the spec addition that added 'a' lines (https://gitweb.torproject.org/torspec.git/commitdiff/e2aafe8) I have a couple questions...

Can 'a' lines have non-IPv6 addresses? If they're IPv6-only then we can make things a little nicer for our users (and should note it in the spec).

IPv4 addresses are okay, too. The current Tor code doesn't have IPv4 addresses in a or or-address lines, but that can be added at any point.

It looks like the part where it says "(Only included when the vote or consensus is generated with consensus-method 13 or later.)" is wrong. It was added in method consensus-method 14.

What is the second "it" in your sentence?

As a related issue, the family field of descriptors contains fingerprints as a hex string but prepended with a '
$' (as described in dir-spec.txt). stem.descriptor.server_descriptor.ServerDescriptor just reads these in, and thus includes the '$
'.

Unfortunately the family field can contain both fingerprints and nicknames, so the '$' is needed to differentiate the two. Mind providing a patch for what you would like the docs to say?

(amj703 said this, not I.) To make things even more fun, family entries can contain fingerprint and nickname at the same time. And to make things even more fun than that, there's one variant for named and one for unnamed relays. Here's how metrics-lib documents the method that returns parsed family entries:

  /* Return nicknames, ($-prefixed) fingerprints, $fingerprint=nickname,
   * or $fingerprint~nickname tuples contained in the family line of this
   * relay, or null if the descriptor does not contain a family line. */
  public List<String> getFamilyEntries();

Trac:
0001-Try-to-make-minor-descriptor-versions-clearer.patch

See the attached patch. Please tweak as you see fit.

Thanks, pushed.

I know. I wrote that script above. :)

Ahhhh. :P

Is b2a_hex and a2b_base64 what you'd use, too, or are there other methods that you prefer?

No, that's what I'd use. The '=' padding worries me a bit (can we be sure that we'll always need to pad by two?), but otherwise looks good.

Are you fine with making sure that all hex strings are upper-case?

Yup. We should standardize on something and most are uppercase, so I agree that we should use that.

IPv4 addresses are okay, too. The current Tor code doesn't have IPv4 addresses in a or or-address lines, but that can be added at any point.

Oh well. Changed...

https://gitweb.torproject.org/stem.git/commitdiff/c6a9cde0f4eb5c627f5ba41f9d263be0c5854ae8

What is the second "it" in your sentence?

See the spec change that I cited: https://gitweb.torproject.org/torspec.git/commitdiff/e2aafe8

... it says consensus-method 14 everywhere except once.

Cheers! -Damian

Replying to atagar:

Is b2a_hex and a2b_base64 what you'd use, too, or are there other methods that you prefer?

No, that's what I'd use. The '=' padding worries me a bit (can we be sure that we'll always need to pad by two?), but otherwise looks good.

We know the expected number of bits for these fields. I'll take another look at base64 to make sure we need exactly two ='s.

Are you fine with making sure that all hex strings are upper-case?

Yup. We should standardize on something and most are uppercase, so I agree that we should use that.

Okay. I'll prepare and test a patch.

What is the second "it" in your sentence?

See the spec change that I cited: https://gitweb.torproject.org/torspec.git/commitdiff/e2aafe8

... it says consensus-method 14 everywhere except once.

Looks like this has already been fixed in 5ddddcb.

Okay. I'll prepare and test a patch.

Hi Karsten. Any progress on this? Besides this please be thinking about any backward incompatible changes you think would make life nicer - after our release in March things will be far less flexible for quite some time. :)

Trac:
Keywords: N/A deleted, descriptors added
Status: new to needs_information

Trac:
0001-Change-fingerprints-and-digests-to-upper-case-hex.patch

See attached patch.

Trac:
Status: needs_information to needs_review

Thanks! Looks great, pushed...

https://gitweb.torproject.org/stem.git/commitdiff/be9a5323a37ea0f1b7d497d7fc33e101453eb2cf

Trac:
Resolution: N/A to fixed
Status: needs_review to closed

closed

Tweak Stem's documentation

Child items ...

Activity