Opened 4 months ago

Closed 3 months ago

#30373 closed defect (fixed)

Most headers non-compliant with spec

Reported by: irl Owned by:
Priority: Medium Milestone: Tor: 0.4.1.x-final
Component: Core Tor/Tor Version: Tor: unspecified
Severity: Normal Keywords: bandwidth-file-spec, tor-spec
Cc: metrics-team, juga Actual Points:
Parent ID: Points: 1
Reviewer: asn Sponsor:

Description

Keyword is imported from dir-spec and

KeyValue ::= Keyword "=" Value
Keyword = KeywordChar+
KeywordChar ::= 'A' ... 'Z' | 'a' ... 'z' | '0' ... '9' | '-'

but... most headers have underscores in them.

My parser wasn't strict enough to catch this the first time. ):

I suggest this is fixed with a specification update to reflect reality rather than creating complexity we have to maintain forever parsing two sets of keys.

Child Tickets

Attachments (1)

graphviz-c87f837aa8efe0a4df10bd6fa2ac8ca0b38a7e0d.png (73.3 KB) - added by irl 4 months ago.

Download all attachments as: .zip

Change History (12)

comment:1 Changed 4 months ago by juga

Does it sounds fine that we just re-define Keyword and KeywordChar in the bandwidth file to include _?.

I think we would need to increment the version though.

comment:2 Changed 4 months ago by juga

Component: Core Tor/sbwsCore Tor/Tor
Keywords: bandwidth-file-spec tor-spec added
Points: 1
Version: Tor: unspecified

Specification changes belong to Core Tor/Tor

comment:3 in reply to:  1 ; Changed 4 months ago by irl

Replying to juga:

Does it sounds fine that we just re-define Keyword and KeywordChar in the bandwidth file to include _?.

I think we would need to increment the version though.

I don't think the version needs changing, this is a "typo fix" in my mind. Anyone that has an implementation that works with the current sbws files is already complying with the change that we haven't made yet.

Keyword and KeywordChar are already defined, so I wouldn't want to cause confusion be re-defining them, even though it's a different spec people have ideas in their heads about what these mean.

Maybe just a new Key:

Key ::= (KeywordChar | "_")+

comment:4 in reply to:  3 ; Changed 4 months ago by juga

Replying to irl:

Replying to juga:

Does it sounds fine that we just re-define Keyword and KeywordChar in the bandwidth file to include _?.

I think we would need to increment the version though.

I don't think the version needs changing, this is a "typo fix" in my mind. Anyone that has an implementation that works with the current sbws files is already complying with the change that we haven't made yet.

A complying version is 1.0.0. A parser that is ignoring anything (as it should) that contains _ will still parse the timestamp, which is the only required thing in 1.0.0.

Keyword and KeywordChar are already defined, so I wouldn't want to cause confusion be re-defining them, even though it's a different spec people have ideas in their heads about what these mean.

Maybe just a new Key:

Key ::= (KeywordChar | "_")+

I agree and that sounds good, for some reason i thought we were redefining something else, but checked and i don't think so.

comment:5 in reply to:  4 ; Changed 4 months ago by irl

Replying to juga:

Replying to irl:

Replying to juga:

Does it sounds fine that we just re-define Keyword and KeywordChar in the bandwidth file to include _?.

I think we would need to increment the version though.

I don't think the version needs changing, this is a "typo fix" in my mind. Anyone that has an implementation that works with the current sbws files is already complying with the change that we haven't made yet.

A complying version is 1.0.0. A parser that is ignoring anything (as it should) that contains _ will still parse the timestamp, which is the only required thing in 1.0.0.

The finite state automaton that I derived from the spec is shown here:


The spec requires that Keywords have only KeywordChar in them, so basically any file with headers *fails* to parse. Where Keywords are expected, they are not found. So no, the files with headers are not compliant with 1.0.0.

I think we fix this by fixing the spec though. Clearly everyone managed just fine until now with having _ in the headers, and people wrote parsers that handle them. The parsers I'm adding to bushel are just super strict, and designed to catch issues like this.

Keyword and KeywordChar are already defined, so I wouldn't want to cause confusion be re-defining them, even though it's a different spec people have ideas in their heads about what these mean.

Maybe just a new Key:

Key ::= (KeywordChar | "_")+

I agree and that sounds good, for some reason i thought we were redefining something else, but checked and i don't think so.

Can you come up with a patch for the spec that replaces Keyword with Key? We're essentially just pretending this is the way it was all along, because no one implemented the spec as it's defined.

comment:6 Changed 4 months ago by dgoulet

Milestone: Tor: unspecified

comment:7 in reply to:  5 Changed 3 months ago by juga

Status: newneeds_review

Replying to irl:

Replying to juga:

Replying to irl:

Replying to juga:

Does it sounds fine that we just re-define Keyword and KeywordChar in the bandwidth file to include _?.

I think we would need to increment the version though.

I don't think the version needs changing, this is a "typo fix" in my mind. Anyone that has an implementation that works with the current sbws files is already complying with the change that we haven't made yet.

A complying version is 1.0.0. A parser that is ignoring anything (as it should) that contains _ will still parse the timestamp, which is the only required thing in 1.0.0.

The finite state automaton that I derived from the spec is shown here:

Did you used a tool to automatically generate that?.
You reminded me to do something i thought time ago: define the grammar using a tool that can check it [0]. If we come out with a nice tool, it would be very useful for ensuring our grammars are correct and not repeating code in several places.

The spec requires that Keywords have only KeywordChar in them, so basically any file with headers *fails* to parse. Where Keywords are expected, they are not found. So no, the files with headers are not compliant with 1.0.0.

ok, i see.

I think we fix this by fixing the spec though. Clearly everyone managed just fine until now with having _ in the headers, and people wrote parsers that handle them. The parsers I'm adding to bushel are just super strict, and designed to catch issues like this.

Keyword and KeywordChar are already defined, so I wouldn't want to cause confusion be re-defining them, even though it's a different spec people have ideas in their heads about what these mean.

Maybe just a new Key:

Key ::= (KeywordChar | "_")+

I agree and that sounds good, for some reason i thought we were redefining something else, but checked and i don't think so.

Can you come up with a patch for the spec that replaces Keyword with Key? We're essentially just pretending this is the way it was all along, because no one implemented the spec as it's defined.

https://github.com/torproject/torspec/pull/82

[0] https://github.com/juga0/bandwidth_file_grammar/blob/master/bw_file_grammar.py#L4

comment:8 Changed 3 months ago by dgoulet

Reviewer: asn

comment:9 Changed 3 months ago by asn

Status: needs_reviewmerge_ready

LGTM! Thanks!

comment:10 Changed 3 months ago by nickm

Milestone: Tor: unspecifiedTor: 0.4.1.x-final

comment:11 Changed 3 months ago by nickm

Resolution: fixed
Status: merge_readyclosed

merged!

Note: See TracTickets for help on using tickets.