Opened 3 years ago

Closed 2 years ago

#21934 closed defect (fixed)

Allow extra arguments in lines containing comma-separated key-value lists

Reported by: karsten Owned by: metrics-team
Priority: Medium Milestone: metrics-lib 1.7.0
Component: Metrics/Library Version:
Severity: Normal Keywords:
Cc: Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

In the context of simulating changes to directory-request statistics I added a few extra arguments to dirreq-v3-resp lines to include parameters like bin_size=8. But metrics-lib refuses to parse those descriptors, throwing a DescriptorParseException. I think this violates the specification which says:

   For forward compatibility, each item MUST allow extra arguments at the
   end of the line unless otherwise noted.  So if an item's description below
   is given as:
       "thing" int int int NL
   then implementations SHOULD accept this string as well:
       "thing 5 9 11 13 16 12" NL
   but not this string:
       "thing 5" NL
   and not this string:
       "thing 5 10 thing" NL
   .

   Whenever an item DOES NOT allow extra arguments, we will tag it with
   "no extra arguments".

I put in a quick fix which might even become the real fix:

diff --git a/src/main/java/org/torproject/descriptor/impl/ParseHelper.java b/src/main/java/org/torproject/descriptor/impl/ParseHelper.java
index f73a591..2de05b5 100644
--- a/src/main/java/org/torproject/descriptor/impl/ParseHelper.java
+++ b/src/main/java/org/torproject/descriptor/impl/ParseHelper.java
@@ -341,10 +341,6 @@ public class ParseHelper {
     if (partsNoOpt.length < index) {
       throw new DescriptorParseException("Line '" + line + "' does not "
           + "contain a key-value list at index " + index + ".");
-    } else if (partsNoOpt.length > index + 1 ) {
-      throw new DescriptorParseException("Line '" + line + "' contains "
-          + "unrecognized values beyond the expected key-value list at "
-          + "index " + index + ".");
     } else if (partsNoOpt.length > index) {
       if (!commaSeparatedKeyValueListPatterns.containsKey(keyLength)) {
         String keyPattern = "[0-9a-zA-Z?<>\\-_]"

I can confirm that this solves my issue. But does it produce any new issues by accepting lines that it shouldn't accept? And are there similar issues where we're throwing an exception instead of simply ignoring extra arguments?

Child Tickets

Change History (6)

comment:1 Changed 2 years ago by iwakeh

dir-spec defines the field dirreq-v3-resp more narrowly (the 'nul' there should be 'num'):

    "dirreq-v3-resp" status=nul,... NL
        [At most once.]

Which I would take to mean only status-types are allowed on the left of equal and a number to the right, but of these an arbitrary number. But, response statuses can be added any time, so the patch above seems to do the right thing. The further parsing doesn't verify the status strings, which is correct for 'dirreq-v3-resp'.

There seems to be an ambiguity in defining these things, for example:

"cell-circuits-per-decile" num NL
        [At most once.]

In this case only a single value makes sense, but other than not using the dots it is not expressed clearly in dir-spec. Could a value be added (e.g. the median) here without changing the protocol?

In order to check for all fields it might be necessary to go through all fields one by one and check.

What comes to mind is using a grammar for checking the spec definitions.
Were there reasons once upon a time not to use a grammar and automated checking for the spec?

comment:2 Changed 2 years ago by karsten

Status: newneeds_review

So, I spent more time on this ticket than I should have. :) Here's what I found:

According to the specification it's valid to not only add a new response status, but also to append one or more entirely new parameters. Example:

dirreq-v3-resp ok=88,not-modified=24,busy=8,not-enough-sigs=0,not-found=0,unavailable=0 delta_f=1 epsilon=0.30 bin_size=8

If this were not permitted, there'd have to be a [No extra arguments] tag.

The same applies to cell-circuits-per-decile which could indeed be extended to include another parameter than the number. For example:

cell-circuits-per-decile 105472 delta_f=10240 epsilon=0.30 bin_size=1024

In total, I found 17 lines that don't allow extra arguments:

  • Server descriptors:
    • "identity-ed25519"
    • "onion-key"
    • "onion-key-crosscert"
    • "ntor-onion-key-crosscert"
    • "signing-key"
    • "router-signature"
    • "caches-extra-info"
    • "allow-single-hop-exits"
    • "tunnelled-dir-server"
  • Extra-info descriptors:
    • "router-signature"
  • Key certificates/votes:
    • "dir-identity-key"
    • "dir-signing-key"
    • "dir-key-crosscert"
    • "dir-key-certification"
  • Microdescriptors:
    • "onion-key"
  • Consensuses:
    • "consensus-method"
  • Votes and consensuses:
    • "directory-footer"

The list of lines that do allow extra arguments is, of course, much longer. Now, I don't want to touch all code in metrics-lib for this relatively minor change, and I don't think we have to. The worst thing that can happen when an extra argument is added that metrics-lib doesn't understand is that it rejects the whole descriptor. When we find out we can still update metrics-lib and reprocess the descriptor. But we don't lose data, because CollecTor stores descriptors anyway.

I suggest we focus on a subset that we care most about, which is extra-info descriptor lines related to statistics. Please review my branch task-21934.

Forward compatibility, yay!

comment:3 Changed 2 years ago by karsten

Milestone: metrics-lib 1.7.0

Assigning to the 1.7.0 milestone, which seems realistic.

comment:4 Changed 2 years ago by karsten

Possibly related question that came to mind the other day: do we need to include lines with extra arguments in Descriptor#getUnrecognizedLines()? Or do we need another method Descriptor#getPartiallyUnrecognizedLines()? Or can we safely drop those extra arguments on the floor? See also TorperfResults#getUnrecognizedKeys() which is somewhat related here, but not exactly the same.

comment:5 Changed 2 years ago by iwakeh

Status: needs_reviewmerge_ready

This fix prevents a parsing failure for valid descriptors, which is an improvement. Maybe open a new ticket for discussing the possibilities for providing the unrecognized parts?
Currently, at least the bytes of the entire descriptor are available for further processing.

The patch looks fine, all checks and tests pass. merge ready!

comment:6 in reply to:  5 Changed 2 years ago by karsten

Resolution: fixed
Status: merge_readyclosed

Replying to iwakeh:

This fix prevents a parsing failure for valid descriptors, which is an improvement. Maybe open a new ticket for discussing the possibilities for providing the unrecognized parts?
Currently, at least the bytes of the entire descriptor are available for further processing.

Good idea. Added as #22208.

The patch looks fine, all checks and tests pass. merge ready!

Great, merged to master. Thanks for reviewing! Closing.

Note: See TracTickets for help on using tickets.