I scrapped this from IRC. Don't really have time to clean it up or
implement any of the proposed fixes atm, but so we don't forget:
ideafix http://paste.debian.net/62206/
for extrainfo wrongly fix. yet one try.
http://paste.debian.net/62207/ ideafix for sure.
nah, yet cert need to fix, but hopes idea is trasmitted.
and rend stuff.
get_next_token() have a fragile logic, even if it "We go ahead whether there are arguments or not, so that tok->args is always set" while *s == eol. tok->args = memarea_memdup(area, args, 0x00) for such case. so any access to args[0] is invalid in theory.
if any try to access anything from tok->args[] with tok->n_args=0, it clearly should crash. bug should not be hidden. so here clarified func http://paste.debian.net/62216/
hah nah not such.
http://paste.debian.net/62219/ clear fix of get_next_token()
no, it's incomplete. ignore it.
http://paste.debian.net/62245/ I don't know why it need, but my brain can't release it.
Fix a spec conformance issue (complete version): http://paste.debian.net/62253/
"ArgumentChar ::= any printing ASCII character except NL." space is printing character, isn't?
so "KeywordLine ::= Keyword NL | Keyword WS ArgumentChar+ NL" means Keyword space space space NL is valid isn't?
then "router-signature" NL Signature NL means that "router-signature \n" valid. is true?
and "router-signature\t\n" is not valid. ouch.
so code is unrealy hard to properly implement of such spec.
I am give up with tor
Please do clarify work for the spec or for the code about ArgumentChar. Can be "\nrouter-signature \n" valid? If not here is my idea for fix the code. http://paste.debian.net/62277/
[Automatically added by flyspray2trac: Operating System: All]
To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information
Child items ...
Show closed items
Linked items 0
Link issues together to show that they're related.
Learn more.
We should shake these out and merge them for 0.2.2.x. We put them on the back burner for a while since the security part turned out not to be actually security-critical, I think, but we should still clean this code up.
Trac: Milestone: N/Ato Tor: 0.2.2.x-final Owner: N/Ato nickm Status: new to assigned Parent: N/AtoN/A Keywords: N/Adeleted, N/Aadded
Now see branch "get_next_token_v2" in my public repostory, rebased to work against current master.
Trac: Description: I scrapped this from IRC. Don't really have time to clean it up or
implement any of the proposed fixes atm, but so we don't forget:
ideafix http://paste.debian.net/62206/
for extrainfo wrongly fix. yet one try.
http://paste.debian.net/62207/ ideafix for sure.
nah, yet cert need to fix, but hopes idea is trasmitted.
and rend stuff.
get_next_token() have a fragile logic, even if it "We go ahead whether there are arguments or not, so that tok->args is always set" while *s == eol. tok->args = memarea_memdup(area, args, 0x00) for such case. so any access to args[0] is invalid in theory.
if any try to access anything from tok->args[] with tok->n_args=0, it clearly should crash. bug should not be hidden. so here clarified func http://paste.debian.net/62216/
hah nah not such.
http://paste.debian.net/62219/ clear fix of get_next_token()
no, it's incomplete. ignore it.
http://paste.debian.net/62245/ I don't know why it need, but my brain can't release it.
Fix a spec conformance issue (complete version): http://paste.debian.net/62253/
"ArgumentChar ::= any printing ASCII character except NL." space is printing character, isn't?
so "KeywordLine ::= Keyword NL | Keyword WS ArgumentChar+ NL" means Keyword space space space NL is valid isn't?
then "router-signature" NL Signature NL means that "router-signature \n" valid. is true?
and "router-signature\t\n" is not valid. ouch.
so code is unrealy hard to properly implement of such spec.
I am give up with tor
Please do clarify work for the spec or for the code about ArgumentChar. Can be "\nrouter-signature \n" valid? If not here is my idea for fix the code. http://paste.debian.net/62277/
[Automatically added by flyspray2trac: Operating System: All]
to
I scrapped this from IRC. Don't really have time to clean it up or
implement any of the proposed fixes atm, but so we don't forget:
ideafix http://paste.debian.net/62206/
for extrainfo wrongly fix. yet one try.
http://paste.debian.net/62207/ ideafix for sure.
nah, yet cert need to fix, but hopes idea is trasmitted.
and rend stuff.
get_next_token() have a fragile logic, even if it "We go ahead whether there are arguments or not, so that tok->args is always set" while *s == eol. tok->args = memarea_memdup(area, args, 0x00) for such case. so any access to args[0] is invalid in theory.
if any try to access anything from tok->args[] with tok->n_args=0, it clearly should crash. bug should not be hidden. so here clarified func http://paste.debian.net/62216/
hah nah not such.
http://paste.debian.net/62219/ clear fix of get_next_token()
no, it's incomplete. ignore it.
http://paste.debian.net/62245/ I don't know why it need, but my brain can't release it.
Fix a spec conformance issue (complete version): http://paste.debian.net/62253/
"ArgumentChar ::= any printing ASCII character except NL." space is printing character, isn't?
so "KeywordLine ::= Keyword NL | Keyword WS ArgumentChar+ NL" means Keyword space space space NL is valid isn't?
then "router-signature" NL Signature NL means that "router-signature \n" valid. is true?
and "router-signature\t\n" is not valid. ouch.
so code is unrealy hard to properly implement of such spec.
I am give up with tor
Please do clarify work for the spec or for the code about ArgumentChar. Can be "\nrouter-signature \n" valid? If not here is my idea for fix the code. http://paste.debian.net/62277/
[Automatically added by flyspray2trac: Operating System: All]
What about A_PURPOSE keyword, such string parses by lex_token() as no keyword with zero keywordlen. get_next_token() convert A_PURPOSE token to "(unknown)".
There are still so much 'opportunities', as about number of spaces for different documents and so on.
in the second commit message, you say 'The ambiguity was in the parse of "Foo Bar": is the second space part of
WS, or part of ArgumentChar* ? We now say it's part of the WS.' Don't you mean "Foo Bar"?
You use OptArgument to specify KeywordLine, but it it not defined anywhere else. I guess you meant to say
just Argument there?
It seems overly verbose to say "This keyword MUST have no arguments, not even whitespace." after saying "It is also an error for whitespace to appear after a keyword that does not accept arguments."
The first problem I noticed with get_next_token is that it doesn't work for annotations. For example, a line starting with @downloaded-at will not be recognized (because @ isn't a keyword_char, and then the @ will be treated as an out of place character.
Some places check for get_next_token's return value, some don't. We should add the check to all places.
5 and 6 are magic numbers (usually either strlen("-----") or strlen("-----\n") that are used frequently. Maybe adding some #defines would be nicer?
/* Skip to the first character after the newline... */
s += 6;
strcmpstart_len() belongs in util.c, not routerparse.c
The dir-spec change says "It is also an error for whitespace to appear after a keyword that does not accept arguments." yet the comment for args_len in token_boundaries_t explains how the structure looks for this case. We also don't reject such documents later, so this is a spec violation.
We also don't do any verification that the arguments for keywords are really ascii only.
In lex_token(), we never accept @ as part of a keyword. This breaks when we use it for annotations.
What is b for in this code?
char b[64];
out->end_of_token = s;
strlcpy(b, s, MIN(eos-s,64));
return 1;
Add a comment to explain what's going on in
if (nl < s+6 || 0 != memcmp("-----\n", nl-5, 6)) {
took me a bit to see what it is trying to do
We should check we're not reading to far here
/* Point s (and the object body) to the first char on the next line.*/
s = nl+1;
and also here
/* Skip to the first character after the newline... */
outofwords's changes look good. Somewhere along the line, looks like unit tests stopped liking me in this. Time to track that down... ah. See get_next_token_v3.
Okay, I've tried to fix what was wrong, and make what was right more obviously right. Latest branches are dir_spec_v2 and get_next_token_v3 in my public.
So while there is a conformance bug here, the ration of the complexity of the fix to the impact of the of the conformance bug is high enough that I'm not sure it's a great idea to put this into 0.2.2.x unless somebody feels very strongly.
I am inclined merge this early in 0.2.3.x and consider a backport.
Alternatively, we could consider moving to flex, which would get us faster and more reliable lexing in theory. (In practice, it would replace our C-is-hard-to-write bugs with flex-is-hard-to-write bugs. We have much more experience in C.)
If we do any changes here, they'll come as a result of rewriting the whole business in rust, or using automatically generated code, or something like that.
Trac: Resolution: None to wontfix Status: needs_revision to closed