Opened 7 years ago

Closed 7 years ago

#7191 closed defect (fixed)

smartlist_bsearch_idx() is broken for short lists

Reported by: andrea Owned by: andrea
Priority: High Milestone: Tor: 0.2.3.x-final
Component: Core Tor/Tor Version: Tor: 0.2.4.3-alpha
Severity: Keywords: tor-relay denial-of-service
Cc: Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

Per asn:

---begin quote---

Hi Andrea,

this is a possible bug I was discussing with Nick. He is pretty busy these
days, so a third set of eyes could be useful:

<asn> hi
<asn> fwiw smartlist_bsearch_idx() seems a bit sloppy
<asn> it doesn't handle the case where the sl is empty (smartlist_len(sl)

  • 1, underflows)

<asn> and if sl has one element, there is still the danger of underflowing
'hi = mid-1;'.
<asn> from what I see, the function is only used with smartlist carrying
the whole routerlist, so it's "safe" till tor has only one relay.
<nickm> ...at which point we've got other problems, yeah.
<nickm> still a good idea to fix it
<nickm> hang on
<nickm> it's used in smartlist_bsearch, which is used in other places too
<asn> i think smartlist_bsearch() is also only used with the whole
routerlist.
<nickm> you mean networkstatus
<nickm> the routerlist is the list of routerinfo_t we know
<nickm> there are enough places where it's used that I think we should
have more eyes looking at it before we accidentally 0day ourselves. I'll
look through the code by thursday; you can also ask athena on
#tor-internal if you like
<asn> btw, the interface of smartlist_bsearch_idx() doesn't allow
particularly elegant error handling :(

--- end quote ---

This function is broken for lists of length zero or one and doesn't check the pointer arguments for nullness properly.

Child Tickets

Attachments (1)

container.c.gcov.txt (7.5 KB) - added by andrea 7 years ago.
Gcov output for test suite with proposed 7191 fix

Download all attachments as: .zip

Change History (25)

comment:1 Changed 7 years ago by andrea

Filing this ticket after discussing with nickm and concluding it isn't exploitable under reasonable conditions.

comment:2 in reply to:  1 ; Changed 7 years ago by nickm

Milestone: Tor: 0.2.4.x-finalTor: 0.2.3.x-final

Replying to andrea:

Filing this ticket after discussing with nickm and concluding it isn't exploitable under reasonable conditions.

Right. To be clear, you'd need to be an authority to exploit this, and you could only exploit it against other authorities. V2 authorities could exploit it against anybody cacheing v2 directory info.

I do claim this might be worth backporting to 0.2.3, but would like arma and andrea's feedback there.

comment:3 in reply to:  2 Changed 7 years ago by andrea

Replying to nickm:

I do claim this might be worth backporting to 0.2.3, but would like arma and andrea's feedback there.

Backporting sounds fine; the fix should be pretty local and easy to do this for.

comment:4 Changed 7 years ago by andrea

Status: newneeds_review

See my bug7191 branch.

comment:5 Changed 7 years ago by nickm

Needs to be rebased onto maint-0.2.3. (Example sequence: "git fetch origin ; git checkout -b bug7191_023; git rebase master --onto origin/maint-0.2.3")

For 0.2.3 purposes, I wonder if some of the assertions couldn't turn into LD_BUG entries. What do you think?

I'd like to see the unit tests expanded to the point where every branch of this function is covered (as tested with gcov); is that now the case?

Is there some well-documented binary search whose implementation we now match? It seems silly not to crib from Dijkstra or Knuth or whomever, given that we already screwed this up once, unless there's a great reason.

While we're bulletproofing this, it looks like (lo + hi)/2 will hit an integer overflow if we get an array of over one billion elements on an architecture where int is 32 bits. Might as well fix that too.

comment:6 in reply to:  description Changed 7 years ago by cypherpunks

Replying to andrea:

#tor-internal if you like

Where can I find it?

Changed 7 years ago by andrea

Attachment: container.c.gcov.txt added

Gcov output for test suite with proposed 7191 fix

comment:7 in reply to:  5 ; Changed 7 years ago by andrea

Replying to nickm:

Needs to be rebased onto maint-0.2.3. (Example sequence: "git fetch origin ; git checkout -b bug7191_023; git rebase master --onto origin/maint-0.2.3")

Okay.

For 0.2.3 purposes, I wonder if some of the assertions couldn't turn into LD_BUG entries. What do you think?

Hmm - if we adopt a policy like that, won't it make it a pain in the ass to change all asserts to LD_BUG logs when dev branches become stable in the future? Maybe we should have a macro that turns into an assert or an LD_BUG as appropriate?

I'd like to see the unit tests expanded to the point where every branch of this function is covered (as tested with gcov); is that now the case?

I've attached gcov output; looks like the only case we still need is a list element that compares less than the search key at the end of the list (line 620, etc.); I think that'll need a longer list to test. I'll add something to the test suite for it.

Is there some well-documented binary search whose implementation we now match? It seems silly not to crib from Dijkstra or Knuth or whomever, given that we already screwed this up once, unless there's a great reason.

Hmm - I didn't look. Possible.

While we're bulletproofing this, it looks like (lo + hi)/2 will hit an integer overflow if we get an array of over one billion elements on an architecture where int is 32 bits. Might as well fix that too.

Yeah.

comment:8 Changed 7 years ago by cypherpunks

You don't care to answer even. Great.

comment:9 in reply to:  8 Changed 7 years ago by nickm

Replying to cypherpunks:

You don't care to answer even. Great.

I don't understand your question, and I don't like to be hassled about stuff just because you didn't get an answer in 3 hours. You're looking for what, my analysis of the bug? I'd be happy to post that. The #tor-internal channel? It's an invite-only channel on OFTC.

comment:10 in reply to:  7 ; Changed 7 years ago by nickm

Replying to andrea:

For 0.2.3 purposes, I wonder if some of the assertions couldn't turn into LD_BUG entries. What do you think?

Hmm - if we adopt a policy like that, won't it make it a pain in the ass to change all asserts to LD_BUG logs when dev branches become stable in the future? Maybe we should have a macro that turns into an assert or an LD_BUG as appropriate?

Maybe we should forget that I said "for 0.2.3 purposes" then and just look whether any of these asserts could/should be LD_BUG.

"No they shouldn't'" is one possible answer.

comment:11 in reply to:  7 Changed 7 years ago by nickm

Replying to andrea:

Is there some well-documented binary search whose implementation we now match? It seems silly not to crib from Dijkstra or Knuth or whomever, given that we already screwed this up once, unless there's a great reason.

Hmm - I didn't look. Possible.

Knuth (AOCP 6.2.1) has, approximately:

Given a table of records R_1..R_N whose keys are in increasing order K_1 < K_2 < ... K_N, this algorithm searches for a given argument K.

1. Set lower to 1, upper to N.
2. (invariant: if K is in the table, K_lower <= K <= K_upper.) If upper < lower, terminate unsuccessfully*. Otherwise let i = floor((lower+upper ) / 2).
3. If K < K_i, goto 4. If K > K_i, goto 5. else K=K_i; terminate successfully.
4. Set upper = i - 1 and goto 2.
5. Set lower = i + 1 and goto 2.

So, this one outputs a match, but doesn't trivially output an insertion point.

Python has a nice simple:

lo = 0; hi = len(a)
while lo < hi:
   mid = (lo+hi)//2
   if a[mid] < x: lo = mid + 1
   else: hi = mid
return lo

and that one *does* return an insertion point.

comment:12 Changed 7 years ago by nickm

Keywords: tor-relay denial-of-service added

For completeness: there's a DOS opportunity here, but I am pretty sure you need to be a directory server, or able to replace somebody's geoip file, to do it. A networkstatus vote with 0 or 1 entries, or a geoip file with 0 or 1 entries, or a networkstatus consensus with 0 or 1 entries, or a v2 networkstatus with 0 or 1 entries would all provoke a crash.

I am pretty sure that in the networkstatus cases above, there isn't a way to provoke these against a regular client or relay except by controlling the consensus of authorities -- in which case you already win.

The v2 networkstatus one means that any of the v2 authorities can take down any node that's fetching or caching v2 networkstatus information, including the other authorities.

The authorities might also be able to crash each other during the voting process; I'm not sure there.

There shouldn't be a way to wind up with a hostile geoip file.

Given the authorities' collectively status, I'm not going to run in circles shouting here, but we need to decide whether there's an 0.2.2 backport.

comment:13 in reply to:  12 Changed 7 years ago by andrea

Replying to nickm:

For completeness: there's a DOS opportunity here, but I am pretty sure you need to be a directory server, or able to replace somebody's geoip file, to do it. A networkstatus vote with 0 or 1 entries, or a geoip file with 0 or 1 entries, or a networkstatus consensus with 0 or 1 entries, or a v2 networkstatus with 0 or 1 entries would all provoke a crash.

I am pretty sure that in the networkstatus cases above, there isn't a way to provoke these against a regular client or relay except by controlling the consensus of authorities -- in which case you already win.

The v2 networkstatus one means that any of the v2 authorities can take down any node that's fetching or caching v2 networkstatus information, including the other authorities.

The authorities might also be able to crash each other during the voting process; I'm not sure there.

There shouldn't be a way to wind up with a hostile geoip file.

Given the authorities' collectively status, I'm not going to run in circles shouting here, but we need to decide whether there's an 0.2.2 backport.

Hmm - how long has it been since that function has even been changed? I'm going to guess "a long time" and that backporting would be a pretty easy matter of replacing it in the old branch, no merging required. If so, it seems like an obvious thing to do.

comment:14 Changed 7 years ago by nickm

It appears to have been unchanged since 2007.

comment:15 in reply to:  10 Changed 7 years ago by andrea

Replying to nickm:

Maybe we should forget that I said "for 0.2.3 purposes" then and just look whether any of these asserts could/should be LD_BUG.

"No they shouldn't'" is one possible answer.

To repeat and amplify from IRC: all the asserts in the new smartlist_bsearch_idx() I wrote are either "someone passed us a NULL argument" or "something is inconsistent with what the logic of the function requires should be the case, so this can only happen if this is horribly buggy." I don't think there's any reasonable thing the function can return in those cases, so asserting is the right choice.

comment:16 in reply to:  14 Changed 7 years ago by andrea

Replying to nickm:

It appears to have been unchanged since 2007.

Then I'm in favor of backporting.

comment:17 in reply to:  12 ; Changed 7 years ago by arma

Replying to nickm:

I am pretty sure you need to be a directory server

By server do you mean authority?

comment:18 in reply to:  17 Changed 7 years ago by nickm

Replying to arma:

Replying to nickm:

I am pretty sure you need to be a directory server

By server do you mean authority?

Yes.

comment:19 Changed 7 years ago by nickm

There's talk on #tor-dev of doing a minimal version of this for 0.2.2 and/or 0.2.3, and doing the full fix in less stable versions. For my idea of what a minimal one might look like, see branch "bug7191_minimal" in my public repository, which is TOTALLY UNTESTED AND UNREVIEWED.

comment:20 Changed 7 years ago by andrea

I have a new bug7191_v2 branch that changes the mid calculation to avoid arithmetic overflow if len > max_int / 2, and eliminates one case that turned out to be impossible. The improved test suite has 100% coverage of this version with gcov.

comment:21 in reply to:  19 Changed 7 years ago by andrea

Replying to nickm:

There's talk on #tor-dev of doing a minimal version of this for 0.2.2 and/or 0.2.3, and doing the full fix in less stable versions. For my idea of what a minimal one might look like, see branch "bug7191_minimal" in my public repository, which is TOTALLY UNTESTED AND UNREVIEWED.

That patch looks correct to me. Maybe it'd be worth you cherrypicking my test suite improvements to check it?

comment:22 Changed 7 years ago by nickm

Good idea; I tried to backport them in that branch. If I did so right, the branch seems to pass its test.

comment:23 in reply to:  22 Changed 7 years ago by andrea

Replying to nickm:

Good idea; I tried to backport them in that branch. If I did so right, the branch seems to pass its test.

Looks right to me.

comment:24 Changed 7 years ago by nickm

Resolution: fixed
Status: needs_reviewclosed

okay. bug7191_minimal_v2 goes into 0.2.2 and 0.2.3; bug7191_v2 goes into master. yet more testing and review would not be amiss though.

Note: See TracTickets for help on using tickets.