Opened 7 years ago

Closed 7 years ago

#7801 closed defect (fixed)

Our one use of tor_weak_random() is subtly wrong

Reported by: nickm Owned by:
Priority: Low Milestone: Tor: 0.2.4.x-final
Component: Core Tor/Tor Version:
Severity: Keywords: tor-relay
Cc: Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

In relay.c , we try to use tor_weak_random() to generate a 1/N random event using the pattern:

   if ((tor_weak_random() % N) == 0)

But that's subtly wrong. Many popular libcs' versions of random() use a linear congruential generator with a modulus that's a power of two, for which the low-order bits tend to have a much shorter period than the high-order bits. So we'd probably be better off with something more like:

  if (tor_weak_random() < TOR_RAND_MAX / N)

modulo rounding issues. Perhaps a tor_rand_int(long maxval) would be smarter still.

This isn't too big a problem, since if we're ever in a place where we can't tolerate not-too-random values, we shouldn't be using tor_weak_random(). Still, it's worth fixing.

Child Tickets

Change History (29)

comment:1 in reply to:  description Changed 7 years ago by cypherpunks

Replying to nickm:

But that's subtly wrong. Many popular libcs' versions of random() use a linear congruential generator with a modulus that's a power of two, for which the low-order bits tend to have a much shorter period than the high-order bits.

It's actually total wrong, and not because random() implementations.

    int num_streams = 0;
    for (conn = first_conn; conn; conn = conn->next_stream) {
      num_streams++;
      if ((tor_weak_random() % num_streams)==0)
        chosen_stream = conn;

Every first checking of condition happens with num_streams = 1;
Any (random) number % 1 == 0
As result chosen_stream = first_conn. always.

comment:2 Changed 7 years ago by cypherpunks

While you exams circuit_resume_edge_reading_helper, FYI:

for (conn=chosen_stream; conn; conn = conn->next_stream) {
  if (conn->_base.marked_for_close || conn->package_window <= 0)

will segfault if func called with first_conn == NULL.
It likely remotely exploitable.

comment:3 Changed 7 years ago by cypherpunks

wrong code part quote. correct one:

  for (conn = first_conn; conn != chosen_stream; conn = conn->next_stream) {
    if (conn->_base.marked_for_close || conn->package_window <= 0)
      continue;

comment:4 Changed 7 years ago by cypherpunks

ok, no segfault happens actually. some flips during code reading occurred.

comment:5 Changed 7 years ago by nickm

Yup. Note that the code isn't

    int num_streams = 0;
    for (conn = first_conn; conn; conn = conn->next_stream) {
      num_streams++;
      if ((tor_weak_random() % num_streams)==0) {
        chosen_stream = conn; break;
      }
    }  

Instead, it was:

    int num_streams = 0;
    for (conn = first_conn; conn; conn = conn->next_stream) {
      num_streams++;
      if ((tor_weak_random() % num_streams)==0)
        chosen_stream = conn;
      // no break here.
    }

So the first time through the loop, chosen_stream is always first_conn, but the second time through the loop, it becomes the second connection with P=1/2, and so on. I've tried to make this even more explicit in the comment.

And wrt the segfault, it can't happen because if first_conn is NULL, chosen_stream will be NULL. So in the first loop (The one after "FY:I") the loop condition will be false immediately, since conn will be set to NULL. And in the second loop (the one after "correct one:"), conn=first_conn will set conn to NULL, and conn!=chosen_stream will compare conn with NULL, so again the loop body is never executed.

comment:6 Changed 7 years ago by nickm

Status: newneeds_review

Branch "bug7801" is my public repository. Please review.

comment:7 Changed 7 years ago by cypherpunks

#if INT_MAX > TOR_RAND_MAX
#error tor_weak_random_range requires INT_MAX <= TOR_RAND_MAX
#endif

comment:8 Changed 7 years ago by cypherpunks

http://msdn.microsoft.com/en-us/library/2dfe3bzd.aspx

The constant RAND_MAX is the maximum value that can be returned by the rand function. RAND_MAX is defined as the value 0x7fff.

comment:9 Changed 7 years ago by nickm

Wow, what a completely useless rand().

comment:10 Changed 7 years ago by nickm

Status: needs_reviewneeds_revision

To be constructive: looks like it's time to drop in a LCG.

comment:11 Changed 7 years ago by nickm

Status: needs_revisionneeds_review

Okay, just dropped in an LCG implementation for _WIN32.

comment:12 Changed 7 years ago by cypherpunks

To be clear, it's not only windows limits RAND_MAX to so slow values. No guarantees to find RAND_MAX >= INT_MAX for all another platforms too. Only one guarantee that RAND_MAX is no less than 0x7fff.

comment:13 Changed 7 years ago by nickm

Agreed, but I can't find any other platform we support that has that that limitation. If there is, they'll see an #error, and we can write LCG code for them too.

comment:14 Changed 7 years ago by nickm

Hm. looks like it'll break on irix. Not sure if I care about irix.

comment:15 Changed 7 years ago by rransom

Is there a valid reason to put this much effort into keeping a bad RNG in Tor instead of #define tor_weak_random() crypto_rand_int(INT_MAX + 1)?

comment:16 Changed 7 years ago by rransom

#define tor_weak_random() crypto_rand_int(INT_MAX) would be faster, because crypto_rand_int does an abysmal job of handling values of max of the form ((1<<n) - 1).

comment:17 Changed 7 years ago by cypherpunks

#2210

We used random() because we didn't want to drain cryptographic entropy for this, since it's not necessary.

comment:18 in reply to:  17 Changed 7 years ago by rransom

Replying to cypherpunks:

#2210

We used random() because we didn't want to drain cryptographic entropy for this, since it's not necessary.

I asked for a valid reason. You can't ‘drain entropy’ from OpenSSL's RNG.

comment:19 Changed 7 years ago by nickm

I'd assume it was performance.

Doing a 4-byte RAND_bytes() is about 100-200X slower than random() on my laptop, and probably similarly different elsewhere. That's enough to turn the "pick a random element of the list of streams" algorithm (for 1000 streams) from a negligable 8 microsec to a hefty 800 microsec -- comparable to an entire onion handshake.

That said, it would probably be faster still just to count the streams, then do one call to a rng function.

comment:20 in reply to:  14 ; Changed 7 years ago by cypherpunks

Replying to nickm:

Hm. looks like it'll break on irix. Not sure if I care about irix.

What about (Open) Solaris?

comment:21 in reply to:  20 Changed 7 years ago by nickm

Replying to cypherpunks:

Replying to nickm:

Hm. looks like it'll break on irix. Not sure if I care about irix.

What about (Open) Solaris?

http://www.unix.com/man-page/opensolaris/3/random/ suggests that opensolaris has a decent range of outputs from random().

comment:22 Changed 7 years ago by cypherpunks

If manuals says truth then except Linux nothing else used RAND_MAX to limit values returned by random(). It seems like for Solaris RAND_MAX defined as 32767 (can't prove but "internet" says so), while random() still implemented properly like documented without explicit maximum values.

comment:24 Changed 7 years ago by cypherpunks

Some Solaris versions do not care to define RAND_MAX either.
Tor do not used RAND_MAX actually

#define TOR_RAND_MAX (RAND_MAX)

Except define that compiler do not care till actual using by code.
Around 5 of Tor relays running Solaris with some unclear file headers. Do you want to drop them?

comment:25 Changed 7 years ago by cypherpunks

you just merge it and close ticket. lets drop yet one good platform if you suddenly need to use RAND_MAX for suddenly weak PRNG. no matter that fix is broken non generalized kludge. Lets drop every platform except Linux.

comment:26 Changed 7 years ago by nickm

Status: needs_reviewneeds_revision

comment:27 Changed 7 years ago by nickm

Status: needs_revisionneeds_review

okay, "bug7801_v2" in my public repository is a rewrite. It uses a built-in LCG everywhere, and turns "weak rng" into a first-class type.

comment:28 Changed 7 years ago by nickm

Added a comment at andrea's request. I'll merge this rsn unless there are objections/problems.

comment:29 Changed 7 years ago by nickm

Resolution: fixed
Status: needs_reviewclosed

Merging. Thanks, all!

Note: See TracTickets for help on using tickets.