Opened 9 years ago

Closed 4 weeks ago

#1270 closed defect (wontfix)

Spec conformance issues on get_next_token() reported by outofwords

Reported by: Sebastian Owned by: nickm
Priority: Low Milestone: Tor: unspecified
Component: Core Tor/Tor Version: 0.2.1.22
Severity: Normal Keywords: tor-client parser tor-spec
Cc: Sebastian, nickm Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description (last modified by nickm)

I scrapped this from IRC. Don't really have time to clean it up or
implement any of the proposed fixes atm, but so we don't forget:

ideafix http://paste.debian.net/62206/
for extrainfo wrongly fix. yet one try.
http://paste.debian.net/62207/ ideafix for sure.
nah, yet cert need to fix, but hopes idea is trasmitted.
and rend stuff.
get_next_token() have a fragile logic, even if it "We go ahead whether there are arguments or not, so that tok->args is always set" while *s == eol. tok->args = memarea_memdup(area, args, 0x00) for such case. so any access to args[0] is invalid in theory.
if any try to access anything from tok->args[] with tok->n_args=0, it clearly should crash. bug should not be hidden. so here clarified func http://paste.debian.net/62216/
hah nah not such.
http://paste.debian.net/62219/ clear fix of get_next_token()
no, it's incomplete. ignore it.
http://paste.debian.net/62245/ I don't know why it need, but my brain can't release it.
Fix a spec conformance issue (complete version): http://paste.debian.net/62253/
"ArgumentChar ::= any printing ASCII character except NL." space is printing character, isn't?
so "KeywordLine ::= Keyword NL | Keyword WS ArgumentChar+ NL" means Keyword space space space NL is valid isn't?
then "router-signature" NL Signature NL means that "router-signature \n" valid. is true?
and "router-signature\t\n" is not valid. ouch.
so code is unrealy hard to properly implement of such spec.
I am give up with tor

Please do clarify work for the spec or for the code about ArgumentChar. Can be "\nrouter-signature \n" valid? If not here is my idea for fix the code. http://paste.debian.net/62277/


[Automatically added by flyspray2trac: Operating System: All]

Child Tickets

Change History (32)

comment:1 Changed 9 years ago by Sebastian

oh great, now they're all expired. yay.

comment:2 Changed 9 years ago by Sebastian

Seems like outofwords refused to repaste the patches. Sad.

comment:3 Changed 9 years ago by nickm

Reopening; the bug is sorta intelligible even if the reporter is unhelpful about it.

I think it has to do with trailing and redundant spaces in our format, which the spec allows inconsistently and our
parser processes inconsistently.

comment:4 Changed 9 years ago by Sebastian

So I got the patch repasted. http://sebastianhahn.net/tor/bug1270patch
for a more persistent resource. Will review and put in a branch.

At the same time, outofwords requests a clarification of the spec.

comment:6 Changed 9 years ago by nickm

See my get_next_token and dir-spec branches. For the security relevant part, drop me a line and I'll send you the
patch.

comment:7 Changed 9 years ago by nickm

Milestone: Tor: 0.2.2.x-final
Owner: set to nickm
Status: newassigned

We should shake these out and merge them for 0.2.2.x. We put them on the back burner for a while since the security part turned out not to be actually security-critical, I think, but we should still clean this code up.

comment:8 Changed 9 years ago by nickm

Description: modified (diff)

Now see branch "get_next_token_v2" in my public repostory, rebased to work against current master.

comment:9 Changed 9 years ago by yetonetime

What about A_PURPOSE keyword, such string parses by lex_token() as no keyword with zero keywordlen. get_next_token() convert A_PURPOSE token to "(unknown)".

There are still so much 'opportunities', as about number of spaces for different documents and so on.

comment:10 Changed 9 years ago by Sebastian

branch dir-spec comments:

in the second commit message, you say 'The ambiguity was in the parse of "Foo Bar": is the second space part of

WS, or part of ArgumentChar* ? We now say it's part of the WS.' Don't you mean "Foo Bar"?

You use OptArgument to specify KeywordLine, but it it not defined anywhere else. I guess you meant to say
just Argument there?

It seems overly verbose to say "This keyword MUST have no arguments, not even whitespace." after saying "It is also an error for whitespace to appear after a keyword that does not accept arguments."

comment:11 Changed 9 years ago by Sebastian

While I'm at it, branch typo in my repo has a fix for a typo that I noticed while reviewing

comment:12 Changed 9 years ago by Sebastian

The first problem I noticed with get_next_token is that it doesn't work for annotations. For example, a line starting with @downloaded-at will not be recognized (because @ isn't a keyword_char, and then the @ will be treated as an out of place character.

comment:13 Changed 9 years ago by Sebastian

While I was looking over this thing, _ename_ appeared and offered this patch (I slightly edited to make it apply, I did not edit any of its content): https://gitweb.torproject.org/sebastian/tor.git/commitdiff/d74f5fa02c55e019f53ffb56cb6d5115167fd1a4 (it goes on top of get_next_token_v2 in Nick's repo)

comment:14 in reply to:  13 Changed 9 years ago by Sebastian

Replying to Sebastian:

While I was looking over this thing, _ename_ appeared and offered this patch (I slightly edited to make it apply, I did not edit any of its content): https://gitweb.torproject.org/sebastian/tor.git/commitdiff/d74f5fa02c55e019f53ffb56cb6d5115167fd1a4 (it goes on top of get_next_token_v2 in Nick's repo)

Since I cannot edit this comment: _ename_ points out that the patch was offered by outofwords instead. Sorry.

comment:15 Changed 9 years ago by Sebastian

Comments on get_next_token_v2:

Some places check for get_next_token's return value, some don't. We should add the check to all places.

5 and 6 are magic numbers (usually either strlen("-----") or strlen("-----\n") that are used frequently. Maybe adding some #defines would be nicer?
+ /* Skip to the first character after the newline... */
+ s += 6;

strcmpstart_len() belongs in util.c, not routerparse.c

The dir-spec change says "It is also an error for whitespace to appear after a keyword that does not accept arguments." yet the comment for args_len in token_boundaries_t explains how the structure looks for this case. We also don't reject such documents later, so this is a spec violation.

We also don't do any verification that the arguments for keywords are really ascii only.

In lex_token(), we never accept @ as part of a keyword. This breaks when we use it for annotations.

What is b for in this code?
+ char b[64];
+ out->end_of_token = s;
+ strlcpy(b, s, MIN(eos-s,64));
+ return 1;

Add a comment to explain what's going on in

+ if (nl < s+6
0 != memcmp("-----\n", nl-5, 6)) {

took me a bit to see what it is trying to do

We should check we're not reading to far here
+ /* Point s (and the object body) to the first char on the next line.*/
+ s = nl+1;
and also here
+ /* Skip to the first character after the newline... */
+ s += 6;

comment:16 Changed 9 years ago by Sebastian

wanoskarnet points out that I was just confused with my last comment about the pointer safety.

comment:17 Changed 9 years ago by Sebastian

Status: assignedneeds_review

For branch dir-spec, we should also allow non-ascii in the version line. See #1258 for more info

comment:18 in reply to:  17 Changed 9 years ago by Sebastian

Replying to Sebastian:

For branch dir-spec, we should also allow non-ascii in the version line. See #1258 for more info

I meant platform line of course.

comment:19 Changed 9 years ago by nickm

Updated dir-spec fixes in branch dir_spec_v2.

comment:20 Changed 9 years ago by nickm

outofwords's changes look good. Somewhere along the line, looks like unit tests stopped liking me in this. Time to track that down... ah. See get_next_token_v3.

Now to address sebastian's comments.

comment:21 Changed 9 years ago by nickm

Okay, I've tried to fix what was wrong, and make what was right more obviously right. Latest branches are dir_spec_v2 and get_next_token_v3 in my public.

comment:22 Changed 9 years ago by Sebastian

Both branches look good to me now.

What got me confused initially was the comment

/** Try to scan a token from <b>s</b>, reading no farther then <b>eos</b>.

because we never actually read eos. Maybe that could be clarified somehow, though it probably isn't so important.

comment:23 Changed 9 years ago by nickm

So while there _is_ a conformance bug here, the ration of the complexity of the fix to the impact of the of the conformance bug is high enough that I'm not sure it's a great idea to put this into 0.2.2.x unless somebody feels very strongly.

I am inclined merge this early in 0.2.3.x and consider a backport.

Alternatively, we could consider moving to flex, which would get us faster and more reliable lexing in theory. (In practice, it would replace our C-is-hard-to-write bugs with flex-is-hard-to-write bugs. We have much more experience in C.)

comment:24 Changed 9 years ago by nickm

Milestone: Tor: 0.2.2.x-finalTor: 0.2.3.x-final

Bumping to 0.2.3.x-final

comment:25 Changed 9 years ago by arma

"damn the torpedos"

(aka I'm unlikely to review this usefully, so don't block on me)

comment:26 Changed 8 years ago by nickm

Milestone: Tor: 0.2.3.x-finalTor: unspecified

comment:27 Changed 7 years ago by nickm

Keywords: tor-client added

comment:28 Changed 7 years ago by nickm

Component: Tor ClientTor

comment:29 Changed 2 years ago by nickm

Cc: Sebastian,nickmSebastian, nickm
Keywords: parser tor-spec added
Severity: Normal

comment:30 Changed 2 years ago by nickm

Status: needs_reviewneeds_revision

So, the code here is pretty darn well tested and fuzzed, even if it isn't conformant. At this point, we're probably better of specifying what we do.

comment:31 Changed 2 years ago by nickm

Summary: Spec conformance issues reported by outofwordsSpec conformance issues on get_next_token() reported by outofwords

comment:32 Changed 4 weeks ago by nickm

Resolution: Nonewontfix
Status: needs_revisionclosed

If we do any changes here, they'll come as a result of rewriting the whole business in rust, or using automatically generated code, or something like that.

Note: See TracTickets for help on using tickets.