Opened 8 years ago

Closed 8 years ago

# Our one use of tor_weak_random() is subtly wrong

Reported by: Owned by: nickm Low Tor: 0.2.4.x-final Core Tor/Tor tor-relay

### Description

In relay.c , we try to use tor_weak_random() to generate a 1/N random event using the pattern:

```   if ((tor_weak_random() % N) == 0)
```

But that's subtly wrong. Many popular libcs' versions of random() use a linear congruential generator with a modulus that's a power of two, for which the low-order bits tend to have a much shorter period than the high-order bits. So we'd probably be better off with something more like:

```  if (tor_weak_random() < TOR_RAND_MAX / N)
```

modulo rounding issues. Perhaps a tor_rand_int(long maxval) would be smarter still.

This isn't too big a problem, since if we're ever in a place where we can't tolerate not-too-random values, we shouldn't be using tor_weak_random(). Still, it's worth fixing.

## Child Tickets

### comment:1 in reply to:  description Changed 8 years ago by cypherpunks

Replying to nickm:

But that's subtly wrong. Many popular libcs' versions of random() use a linear congruential generator with a modulus that's a power of two, for which the low-order bits tend to have a much shorter period than the high-order bits.

It's actually total wrong, and not because random() implementations.

```    int num_streams = 0;
for (conn = first_conn; conn; conn = conn->next_stream) {
num_streams++;
if ((tor_weak_random() % num_streams)==0)
chosen_stream = conn;
```

Every first checking of condition happens with num_streams = 1;
Any (random) number % 1 == 0
As result chosen_stream = first_conn. always.

### comment:2 Changed 8 years ago by cypherpunks

While you exams circuit_resume_edge_reading_helper, FYI:

```for (conn=chosen_stream; conn; conn = conn->next_stream) {
if (conn->_base.marked_for_close || conn->package_window <= 0)
```

will segfault if func called with first_conn == NULL.
It likely remotely exploitable.

### comment:3 Changed 8 years ago by cypherpunks

wrong code part quote. correct one:

```  for (conn = first_conn; conn != chosen_stream; conn = conn->next_stream) {
if (conn->_base.marked_for_close || conn->package_window <= 0)
continue;
```

### comment:4 Changed 8 years ago by cypherpunks

ok, no segfault happens actually. some flips during code reading occurred.

### comment:5 Changed 8 years ago by nickm

Yup. Note that the code isn't

```    int num_streams = 0;
for (conn = first_conn; conn; conn = conn->next_stream) {
num_streams++;
if ((tor_weak_random() % num_streams)==0) {
chosen_stream = conn; break;
}
}
```

Instead, it was:

```    int num_streams = 0;
for (conn = first_conn; conn; conn = conn->next_stream) {
num_streams++;
if ((tor_weak_random() % num_streams)==0)
chosen_stream = conn;
// no break here.
}
```

So the first time through the loop, chosen_stream is always first_conn, but the second time through the loop, it becomes the second connection with P=1/2, and so on. I've tried to make this even more explicit in the comment.

And wrt the segfault, it can't happen because if first_conn is NULL, chosen_stream will be NULL. So in the first loop (The one after "FY:I") the loop condition will be false immediately, since conn will be set to NULL. And in the second loop (the one after "correct one:"), conn=first_conn will set conn to NULL, and conn!=chosen_stream will compare conn with NULL, so again the loop body is never executed.

### comment:6 Changed 8 years ago by nickm

Status: new → needs_review

Branch "bug7801" is my public repository. Please review.

### comment:7 Changed 8 years ago by cypherpunks

```#if INT_MAX > TOR_RAND_MAX
#error tor_weak_random_range requires INT_MAX <= TOR_RAND_MAX
#endif
```

### comment:8 Changed 8 years ago by cypherpunks

```The constant RAND_MAX is the maximum value that can be returned by the rand function. RAND_MAX is defined as the value 0x7fff.
```

### comment:9 Changed 8 years ago by nickm

Wow, what a completely useless rand().

### comment:10 Changed 8 years ago by nickm

Status: needs_review → needs_revision

To be constructive: looks like it's time to drop in a LCG.

### comment:11 Changed 8 years ago by nickm

Status: needs_revision → needs_review

Okay, just dropped in an LCG implementation for _WIN32.

### comment:12 Changed 8 years ago by cypherpunks

To be clear, it's not only windows limits RAND_MAX to so slow values. No guarantees to find RAND_MAX >= INT_MAX for all another platforms too. Only one guarantee that RAND_MAX is no less than 0x7fff.

### comment:13 Changed 8 years ago by nickm

Agreed, but I can't find any other platform we support that has that that limitation. If there is, they'll see an #error, and we can write LCG code for them too.

### comment:14 follow-up:  20 Changed 8 years ago by nickm

Hm. looks like it'll break on irix. Not sure if I care about irix.

### comment:15 Changed 8 years ago by rransom

Is there a valid reason to put this much effort into keeping a bad RNG in Tor instead of `#define tor_weak_random() crypto_rand_int(INT_MAX + 1)`?

### comment:16 Changed 8 years ago by rransom

`#define tor_weak_random() crypto_rand_int(INT_MAX)` would be faster, because `crypto_rand_int` does an abysmal job of handling values of `max` of the form `((1<<n) - 1)`.

### comment:17 follow-up:  18 Changed 8 years ago by cypherpunks

We used random() because we didn't want to drain cryptographic entropy for this, since it's not necessary.

### comment:18 in reply to:  17 Changed 8 years ago by rransom

Replying to cypherpunks:

We used random() because we didn't want to drain cryptographic entropy for this, since it's not necessary.

I asked for a valid reason. You can't ‘drain entropy’ from OpenSSL's RNG.

### comment:19 Changed 8 years ago by nickm

I'd assume it was performance.

Doing a 4-byte RAND_bytes() is about 100-200X slower than random() on my laptop, and probably similarly different elsewhere. That's enough to turn the "pick a random element of the list of streams" algorithm (for 1000 streams) from a negligable 8 microsec to a hefty 800 microsec -- comparable to an entire onion handshake.

That said, it would probably be faster still just to count the streams, then do one call to a rng function.

### comment:20 in reply to:  14 ; follow-up:  21 Changed 8 years ago by cypherpunks

Replying to nickm:

Hm. looks like it'll break on irix. Not sure if I care about irix.

What about (Open) Solaris?

### comment:21 in reply to:  20 Changed 8 years ago by nickm

Replying to cypherpunks:

Replying to nickm:

Hm. looks like it'll break on irix. Not sure if I care about irix.

What about (Open) Solaris?

http://www.unix.com/man-page/opensolaris/3/random/ suggests that opensolaris has a decent range of outputs from random().

### comment:22 Changed 8 years ago by cypherpunks

If manuals says truth then except Linux nothing else used RAND_MAX to limit values returned by random(). It seems like for Solaris RAND_MAX defined as 32767 (can't prove but "internet" says so), while random() still implemented properly like documented without explicit maximum values.

### comment:24 Changed 8 years ago by cypherpunks

Some Solaris versions do not care to define RAND_MAX either.
Tor do not used RAND_MAX actually

```#define TOR_RAND_MAX (RAND_MAX)
```

Except define that compiler do not care till actual using by code.
Around 5 of Tor relays running Solaris with some unclear file headers. Do you want to drop them?

### comment:25 Changed 8 years ago by cypherpunks

you just merge it and close ticket. lets drop yet one good platform if you suddenly need to use RAND_MAX for suddenly weak PRNG. no matter that fix is broken non generalized kludge. Lets drop every platform except Linux.

### comment:26 Changed 8 years ago by nickm

Status: needs_review → needs_revision

### comment:27 Changed 8 years ago by nickm

Status: needs_revision → needs_review

okay, "bug7801_v2" in my public repository is a rewrite. It uses a built-in LCG everywhere, and turns "weak rng" into a first-class type.

### comment:28 Changed 8 years ago by nickm

Added a comment at andrea's request. I'll merge this rsn unless there are objections/problems.

### comment:29 Changed 8 years ago by nickm

Resolution: → fixed needs_review → closed

Merging. Thanks, all!

Note: See TracTickets for help on using tickets.