Opened 4 years ago

Closed 19 months ago

#17799 closed defect (worksforme)

Use a better PRNG unless OpenSSL starts using a better one on their own.

Reported by: teor Owned by: nickm
Priority: Medium Milestone: Tor: unspecified
Component: Core Tor/Tor Version: Tor: unspecified
Severity: Normal Keywords: tor-relay, tor-client, prng, crypto, review-group-34
Cc: Actual Points: 5
Parent ID: Points: 5
Reviewer: asn Sponsor:

Description

#17694 hashes important PRNG output with some system randomness before use, so that observed PRNG outputs are resistant to PRNG state analysis.

But almost all of Tor's use of PRNG outputs is observable from one or more locations outside Tor, whether in salts or nonces sent to other machines on the wire, or in the random choices made in guard, directory, and path selection.

We could hash all of the bytes coming from the PRNG to avoid this state exposure. (Although we might not need to use the system randomness source each time.)

Child Tickets

Attachments (3)

dieharder_test1.txt (11.5 KB) - added by asn 3 years ago.
dieharder_test2.txt (10.9 KB) - added by asn 3 years ago.
fix-typos+change-unsigned-to-size_t.patch (3.3 KB) - added by cypherpunks 3 years ago.
Fix typos in comments. Change "remaining" from unsigned to size_t, since it refers to memory.

Download all attachments as: .zip

Change History (72)

comment:1 Changed 4 years ago by nickm

Here is the design I would suggest for such a thing.

Replace crypto_rand() with a construction that fills an internal buffer by taking bytes from RAND_byes() and then passing them through SHAKE128 to expand them. Then yield bytes from that buffer as required. As yielding them, clear the buffer. When the buffer is exhausted, refill it.

If we want, we can seed the initial buffer with crypto_strongest_rand(), and seed subsequent buffers with a mixture of the tail-end of the previous buffer and RAND_bytes().

This should meet the design criteria described above, and provide better performance and security than the current openssl nonsense.

I can take care of this once the SHA3 branch (#17783) is merged.

comment:2 Changed 4 years ago by nickm

Status: newneeds_revision

I accidentally another prng.

Branch is shake_prng based on Yawning's branch for #17783.

Ready for quick looks, but not for serious review. It needs more documentation, real testing, free-on-exit, windows testing, etc.

comment:3 Changed 4 years ago by yawning

I don't mind this in principle.

But if we're *that* scared of OpenSSL's RAND_bytes(), we may as well wrap our newfangled PRNG construct in the appropriate OpenSSL struct and set it as the engine so that TLS and all the OpenSSL internal entropy requests use our thing as well.

We're essentially replacing a SHA based DRBG with a SHAKE based one, which still feels rather silly to me. (The strongest_random call gets a pass because we don't go through OpenSSL there).

Last edited 4 years ago by yawning (previous) (diff)

comment:4 Changed 4 years ago by nickm

See the second patch on my branch. :)

comment:5 in reply to:  4 Changed 4 years ago by yawning

Replying to nickm:

See the second patch on my branch. :)

Spiffy. Minor quibble with the code, don't use KECCAK_MAX_RATE like that (Yeah, I should have renamed it/hid it). Since you're instantiating SHAKE128, KECCAK_TARGET_TO_RATE(128) is what you want (Or 168, which is the SHAKE128 rate in bytes).

It'll still work, but you're lowering your security level. My improved version of the branch will be more opaque to avoid this confusion/pitfall and provide shake128_init/absorb/squeeze/free functions.

comment:6 Changed 4 years ago by nickm

I added a whole bunch of fixes and changes in shake_prng, quite likely breaking something.

Benchmark says it's 6.6x faster than openssl's PRNG for short-ish outputs. IMO this means there's no need to jump into the wacky world of libottery's additional hacks.

The main change still needed would be getting #17783 merged, and then rebasing this and cleanup up the branch.

comment:7 Changed 4 years ago by yawning

The Keccak authors have a paper regarding how to construct a sponge based PRNG. http://sponge.noekeon.org/SpongePRNG.pdf

comment:8 Changed 4 years ago by nickm

The Keccak sponge function is an invertible permutation, right? If so, unless I'm crazy or missing something, I don't think their design would provide backtracking resistance except when new entropy is added. Looks a teeny bit faster though, but probably not a huge amount.

comment:9 in reply to:  8 Changed 4 years ago by yawning

Replying to nickm:

The Keccak sponge function is an invertible permutation, right? If so, unless I'm crazy or missing something, I don't think their design would provide backtracking resistance except when new entropy is added. Looks a teeny bit faster though, but probably not a huge amount.

Indeed, though see 4.3. Compared to the construct you use, the main difference seemed performance related, I linked the paper mainly for reference. Rebasing against my take2 branch should be easy, let me know if it's not and I can fix it further.

comment:10 Changed 4 years ago by nickm

I think their performance is likely to be worse, but their space usage is likely to be better. If I'm doing the math right, and we're using SHAKE-128 or equivalent, and we assume that we want to have 2256 security against backtracking attacks, I get 166.7 bytes per Keccak-f call with 4K memory usage, and they get 152 bytes,with somewhere between 200 and 400 bytes of memory usage.

(I'm also assuming that reseed events are comparatively uncommon. If that's wrong, oops!)

comment:11 Changed 4 years ago by nickm

Shake_prng_v2 is rebased and mostly squashed.

comment:12 Changed 4 years ago by nickm

(I just did a force-update on it to clean up a couple of things; please don't hate me.)

comment:13 Changed 4 years ago by nickm

Status: needs_revisionneeds_review

I made a couple more changes to shake_prng_v2. Notably the pointer arithmetic is much simpler.

comment:14 Changed 4 years ago by nickm

Rebased and squashed as shake_prng_v3.

comment:15 Changed 4 years ago by nickm

Owner: set to nickm
Status: needs_reviewassigned

I wrote the code here, so setting myself as owner.

comment:16 Changed 4 years ago by nickm

Status: assignedneeds_review

... but they still need_review

comment:17 Changed 4 years ago by nickm

Milestone: Tor: 0.2.8.x-finalTor: 0.2.9.x-final

comment:18 Changed 4 years ago by nickm

Points: small/medium-remaining

comment:19 Changed 4 years ago by nickm

Keywords: TorCoreTeam201604 added

Every postponed needs_review ticket should get a review in April

comment:20 Changed 4 years ago by asn

Reviewer: asn

comment:21 Changed 4 years ago by yawning

Casual review/thoughts:

  • REPLACE_OPENSSL_RAND probably should be enabled by default once we actually deploy this, because this isn't any worse than OpenSSL's current RNG, and is likely better, with a compile time flag to not do this.
  • (Abstraction) Should we have a generic ability to allocate mlock()ed pages useable by other places of code? (I think yes, because common rlimit configs allow for up to 64 KiB worth of locked pages, and we can keep long term keying material in such for example).
  • (Paranoia) The code should call getrlimit() to ensure that the mlock() has a chance of succeeding. Failure in either case should be loud, hard, and always fatal on platforms where locked memory is supported. A config option to allow using a non-mlocked() backing store perhaps.
  • (Paranoia) Is there any reason (that isn't performance) to not hit up the syscall entropy source on supported platforms on each refill call?
  • (Paranoia) The memset() call in shake_prng_getbytes_raw() should be a memwipe(). Since this routine is called with the mutex held, wiping the consumed portion of the buffer probably can be moved out of the while() loop to above the invariant check (Tiny optimization).
  • (Bug) Child side of a fork() does not inherit mlock() status. Need to re-mlock() in the post fork handler.
  • We should run this through dieharder/testU01/the NIST suite or similar, just to say we did. Most CSPRNGs (even broken/horribad ones) will pass both tests, but it's better than nothing.

comment:22 Changed 3 years ago by nickm

Status: needs_reviewneeds_revision

comment:23 Changed 3 years ago by asn

Some small stuff, accompanying yawning's review:

  • I did not entirely understand why sh is a special structure inside shake_prng_t? It seems like other fields like remaining and ptr are only useful when combined with sh.buf, but then why aren't they also in sh? Would it be terrible to kill sh, and spill its contents into shake_prng_t? Alternatively, maybe we can replace sh with a more readable variable name?
  • When we call openssl_RAND_bytes() we now assert that the retval is > 0. In the past, we asserted that retval is >= 0. I don't know how exactly the retvals of openssl_RAND_bytes() work so I'm not sure if this is a bug or a feature.

Changed 3 years ago by asn

Attachment: dieharder_test1.txt added

Changed 3 years ago by asn

Attachment: dieharder_test2.txt added

comment:24 in reply to:  21 Changed 3 years ago by asn

Replying to yawning:

  • We should run this through dieharder/testU01/the NIST suite or similar, just to say we did. Most CSPRNGs (even broken/horribad ones) will pass both tests, but it's better than nothing.

That seemed like a good idea, so I tried to do it.

I moded my Tor so that it writes a file with about 1.9GB of random data (/tmp/rng.bin) on startup. You can find the patch in my branch shake_prng_v3_diehard_gen at my repo. I pushed it in case there is some bug in my method.

Then I installed dieharder (there is package in debian) and ran that file through dieharder. Then I generated another random file, and run that through dieharder again. In this ticket I attach the two resulting reports.

FWIW, I ran dieharder like this: $ dieharder -a -g 201 -f tmp/rng.bin

WRT the results, the PRNG seems to be doing fine for most of the tests. Although for some reason there are a few WEAK and FAILED tests in both tries. Specifically from the first file, I got:

         rgb_bitdist|  10|    100000|     100|0.00264377|   WEAK
      rgb_lagged_sum|  24|   1000000|     100|0.00011748|   WEAK

and from the second file I got:

      rgb_lagged_sum|   9|   1000000|     100|0.00196700|   WEAK
      rgb_lagged_sum|  19|   1000000|     100|0.00355957|   WEAK
      ...
      rgb_lagged_sum|  24|   1000000|     100|0.00000000|  FAILED

The two tests that seem to fail are rgb_bitdist and especially rgb_lagged_sum.

I will try to run the test again, and maybe verify my tor patch.

Maybe someone else wants to reproduce as well.

comment:25 in reply to:  21 Changed 3 years ago by nickm

Replying to yawning:

Casual review/thoughts:

  • REPLACE_OPENSSL_RAND probably should be enabled by default once we actually deploy this, because this isn't any worse than OpenSSL's current RNG, and is likely better, with a compile time flag to not do this.

Ack; let's add a new ticket.

  • (Abstraction) Should we have a generic ability to allocate mlock()ed pages useable by other places of code? (I think yes, because common rlimit configs allow for up to 64 KiB worth of locked pages, and we can keep long term keying material in such for example).

Agreed; this should be a separate ticket. It needs to be done carefully, though, so that there isn't just one address range in memory where all the sensitive stuff lives.

  • (Paranoia) The code should call getrlimit() to ensure that the mlock() has a chance of succeeding. Failure in either case should be loud, hard, and always fatal on platforms where locked memory is supported. A config option to allow using a non-mlocked() backing store perhaps.

Hm. I think I'm okay with just tor_assert(r==0) on the mlock call. We can give a nicer warning if anybody ever sees it; I suspect nobody will.

  • (Paranoia) Is there any reason (that isn't performance) to not hit up the syscall entropy source on supported platforms on each refill call?

Not really.

  • (Paranoia) The memset() call in shake_prng_getbytes_raw() should be a memwipe(). Since this routine is called with the mutex held, wiping the consumed portion of the buffer probably can be moved out of the while() loop to above the invariant check (Tiny optimization).

I don't think that the compiler can eliminate that memset, but sure. I've added a fixup commit.

  • (Bug) Child side of a fork() does not inherit mlock() status. Need to re-mlock() in the post fork handler.

Huh, I didn't know that. Added another fixup commit for that.

  • We should run this through dieharder/testU01/the NIST suite or similar, just to say we did. Most CSPRNGs (even broken/horribad ones) will pass both tests, but it's better than nothing.

comment:26 in reply to:  23 Changed 3 years ago by nickm

Replying to asn:

Some small stuff, accompanying yawning's review:

  • I did not entirely understand why sh is a special structure inside shake_prng_t? It seems like other fields like remaining and ptr are only useful when combined with sh.buf, but then why aren't they also in sh? Would it be terrible to kill sh, and spill its contents into shake_prng_t? Alternatively, maybe we can replace sh with a more readable variable name?

Added a fixup commit to explain why it's there.

  • When we call openssl_RAND_bytes() we now assert that the retval is > 0. In the past, we asserted that retval is >= 0. I don't know how exactly the retvals of openssl_RAND_bytes() work so I'm not sure if this is a bug or a feature.

retval==0 is an error too; looks like that's a separate bug. (Looking at the openssl code, I don't think it can happen though.)

comment:27 Changed 3 years ago by nickm

wrt dieharder, see the "WARNING!" section in the dieharder manpage.

comment:28 Changed 3 years ago by nickm

I also added a program to fill stdout with random bytes as src/test/test-rng , for testing with dieharder.

I've been doing:

./src/test/test-rng --emit | dieharder -a -g 200

comment:29 Changed 3 years ago by nickm

On the first dieharder run I tried, I only got this:

          sts_serial|   1|    100000|     100|0.99922038|   WEAK   

I'm going to try two more times, with different input sizes, and with -Y 1

comment:30 Changed 3 years ago by nickm

On the -Y1, everything passed. Now trying with a shorter internal buffer size, to avoid use of the large-case items

comment:31 Changed 3 years ago by nickm

Status: needs_revisionneeds_review

damn, found a major bug. This is why we test, and care about coverage, I guess. Now even more concerned about merging precipitously though. Added a fixup commit.

comment:32 Changed 3 years ago by cypherpunks

#define PRNG_CARRYFORWARD (PRNG_SHAKE_BITS * 2)

Did you mean something like (PRNG_SHAKE_BITS / 8 * 2), by any chance?

I'll keep reading.

comment:33 Changed 3 years ago by cypherpunks

static void *
new_prng_page(void)
{
  const size_t sz = sizeof(shake_prng_t);
  void *result = mmap(NULL, sz,
                      PROT_READ | PROT_WRITE,
                      MAP_ANON | MAP_PRIVATE,
                      -1, 0);
  tor_assert(result);

Bug: Failure is indicated with MAP_FAILED (-1).

In free_prng_page you test for !page, so if you don't like null addresses maybe you would want to

tor_assert(result && result != MAP_FAILED)

Nit: The Linux manpage says MAP_ANON is deprecated in favor of MAP_ANONYMOUS.

I have other comments, but I'll wait until I've read more as there are things I don't fully understand yet.

comment:34 Changed 3 years ago by nickm

Thanks for those; I've fixed the two issues above in a fixup commit.

comment:35 Changed 3 years ago by cypherpunks

Here are the rest of my comments. I wrote them inline while reading so I'm going to put a diff here. I hope that's not a problem. Also, I'm embedding it, instead of attaching, so that people can quote, if they want to reply. You can comfortably read it in a diff viewer (emacs/vim/meld/whatever).

Remarks:

  • I'm not familiar with Tor's code base, forgive me if I question "obvious" things.
  • This is a read-only review.
  • I didn't look too hard into the openssl hookup thingie.
  • Winblows code was ignored.
--- crypto_rng.c	1970-01-01 00:00:00.000000000 +0000
+++ crypto_rng-review.c	1970-01-01 00:00:00.000000000 +0000
@@ -60,7 +60,7 @@
 /* Use SHAKE256 */
 #define PRNG_SHAKE_BITS ( 256 )
 
-/* Ideal rate to add or remove bytes bytes to avoid needless Keccak-f calls */
+/* Ideal rate to add or remove bytes to avoid needless Keccak-f calls */
 #define PRNG_SHAKE_RATE ( KECCAK_RATE(PRNG_SHAKE_BITS) )
 
 /* How many Keccak-f calls do we do per buffer fill? Designed to keep the
@@ -113,7 +113,7 @@ typedef struct shake_prng_t {
   /**
    * How many bytes are left for the user in buf?
    * Invariant: we never return to user code with remaining == 0.
-   * Invariant: remaining <= sizeof(sh.buf)
+   * Invariant: remaining <= sizeof(shake_output.buf)
    */
   uint16_t remaining;
   /**
@@ -140,6 +140,8 @@ typedef struct shake_prng_t {
     uint8_t buf[PRNG_SHAKE_RATE * PRNG_SHAKE_COUNT - PRNG_CARRYFORWARD];
   } shake_output;
 } shake_prng_t;
+// cpunk: Why is it important for the integer members of shake_prng_t to be
+// fixed-size?
 
 /**
  * After how many refills of the buffer should we reseed from the OS prng?
@@ -167,10 +169,13 @@ typedef struct shake_prng_t {
  */
 #define PRNG_LIBC_BYTES 64
 
-/** This mutex protects the field the_prng, and all members of the_prng. */
-static tor_mutex_t prng_mutex;
-/** This field is the singleton prng allocated for Tor. */
+/** This points to the singleton shake_prng_t allocated for Tor. */
+// -> cpunk: Bitrotted: it's a pointer and to a structure, not field.
 static shake_prng_t *the_prng = NULL;
+/** This mutex protects all members of the_prng. */
+// -> cpunk: It clearly isn't being used to protect the pointer itself; also,
+//    moved below the declaration of the_prng, since the comment references it.
+static tor_mutex_t prng_mutex;
 
 static void shake_prng_reseed(shake_prng_t *prng);
 static void shake_prng_refill(shake_prng_t *prng,
@@ -194,6 +199,7 @@ static void shake_prng_test_invariants(c
 /**
  * Allocate and return memory for use in a shake PRNG. Set *<b>sz_out</b> to
  * the actual number of bytes allocated.
+ * -> cpunk: What sz_out?
  */
 static void *
 new_prng_page(void)
@@ -235,6 +241,7 @@ free_prng_page(void *page)
   memwipe(page, 0, sz);
 #ifdef HAVE_MLOCK
   munlock(page, sz);
+  // cpunk: munmap will do this, just saying.
 #endif
   munmap(page, sz);
 }
@@ -296,6 +303,8 @@ free_prng_page(void *page)
 void
 crypto_init_shake_prng(void)
 {
+  tor_assert(the_prng == NULL);
+
   /* Initialize the mutex */
   tor_mutex_init_nonrecursive(&prng_mutex);
 
@@ -304,10 +313,10 @@ crypto_init_shake_prng(void)
   /* We're trying to be about one page. */
   tor_assert(sizeof(prng->shake_output) <= 4096);
 
-  /* Technically, the C compiler is allowed to pad prng->sh for alignment.
-   * It shouldn't; nobody reasonable would define alignof(char) to something
-   * other than 1.  But let's make sure that the structure is as big as we
-   * want.
+  /* Technically, the C compiler is allowed to pad prng->shake_output for
+   * alignment.  It shouldn't; nobody reasonable would define alignof(char) to
+   * something other than 1.  But let's make sure that the structure is as big
+   * as we want.
    */
   tor_assert(sizeof(prng->shake_output) % PRNG_SHAKE_RATE == 0);
 
@@ -315,8 +324,17 @@ crypto_init_shake_prng(void)
   shake_prng_reseed(prng);
 
   shake_prng_test_invariants(prng);
+  // cpunk: shake_prng_reseed() already checked the invariants before
+  // returning.  Explicitly stating post- and pre-conditions would help in
+  // reducing redundance (besides being a basic good thing everyone should do).
 
   /* THEN put it in the static variable. */
+  // -> cpunk: What's so remarkable about THEN setting it?  The functions
+  //    called above don't look at the_prng; also, this part is obviously not
+  //    thread-safe, so the API must require that this function be called and
+  //    completed _before_ any possible concurrent access, and so it's
+  //    completely irrelevant at which point in the function you set the
+  //    variable.
   the_prng = prng;
 
 #ifdef REPLACE_OPENSSL_RAND
@@ -340,10 +358,16 @@ shake_prng_reseed(shake_prng_t *prng)
    *
    * (We don't need to worry about threads seeing an unseeded PRNG, since
    * we don't expose it to them till after it's seeded at least once.)
+   * -> cpunk: I say the reason why threads don't see an unseeded PRNG is that
+   *    _init() has to be called before any other function and before any
+   *    potential concurrent access, and it will call _reseed(), that's it.  In
+   *    any case, this explanation is probably better in _init() or the header
+   *    file as API documentation.
    */
   if (crypto_strongest_rand_raw(buf, PRNG_OS_BYTES)) {
     log_err(LD_CRYPTO, "Couldn't get os entropy to reseed shake prng. Dying.");
     tor_assert(0);
+    // cpunk: What happens here on "coverage" builds?
   }
 
   tor_mutex_acquire(&prng_mutex);
@@ -354,6 +378,11 @@ shake_prng_reseed(shake_prng_t *prng)
   prng->reseeding = 0;
   prng->last_reseeded = time(NULL);
   /* We reseeded from the OS, so now we can forget any old pid we were in. */
+  // -> cpunk: Why?  There's already a dedicated api for doing this:
+  //    _postfork().  Doing it here prevents _test_invariants() from failing
+  //    (all other invariants holding) if after a fork _reseed() is called
+  //    instead of _postfork(), and that's bad: _postfork() does more than
+  //    _reseed(), for example it also relocks the mmapped region!
 #ifdef HAVE_PID
   prng->pid = getpid();
 #endif
@@ -364,12 +393,16 @@ shake_prng_reseed(shake_prng_t *prng)
 }
 
 
+// cpunk: Maybe note why this is done: to avoid closing the loop and causing
+// infinite recursion.
 #ifdef REPLACE_OPENSSL_RAND
 #define openssl_RAND_bytes(b,n) RAND_OpenSSL()->bytes((b), (n))
 #define openssl_RAND_add(b,n,e) RAND_OpenSSL()->add((b), (n), (e))
 #else
 #define openssl_RAND_bytes(b,n) RAND_bytes((b),(n))
 #define openssl_RAND_add(b,n,e) do {} while (0)
+// cpunk: _add() is not used in this case.  Better not to define it, let the
+// compilation fail if it's used when it shouldn't.
 #endif
 
 /**
@@ -391,6 +424,7 @@ shake_prng_refill(shake_prng_t *prng, co
 {
   /* Structure for our fixed-length inputs. We leave any unused fields set
    * to 0, since that's easier than adding them conditionally. */
+  // -> cpunk: What do you mean?  You do add one field conditionally.
   struct {
     uint8_t from_ourself[PRNG_CARRYFORWARD];
     uint8_t from_openssl[PRNG_OPENSSL_BYTES];
@@ -402,6 +436,8 @@ shake_prng_refill(shake_prng_t *prng, co
   const char tweak[] = "shake prng update";
 
   memset(&input, 0, sizeof(input));
+  // cpunk: You could do "... input = { {0}, {0}, {0} }" instead, especially if
+  // you're really not going to conditionally add fields.
 
   ++prng->refill_count;
 
@@ -452,7 +488,8 @@ shake_prng_getbytes_raw(shake_prng_t *pr
   /* While we still want any bytes... */
   while (n) {
     /* How many bytes should we extract this time through the loop? */
-    size_t sz = n > prng->remaining ? prng->remaining : n;
+    // cpunk: Much clearer:
+    size_t sz = MIN(n, prng->remaining);
 
     /* Extract the bytes and clear them from the buffer as we do
      * (for backtracking resistance) */
@@ -486,6 +523,21 @@ shake_prng_getbytes_raw(shake_prng_t *pr
  */
 static void
 shake_prng_getbytes_large(shake_prng_t *prng, uint8_t *out, size_t n)
+// cpunk: So, this function has a different behaviour, compared to _raw(), in
+// that it pulls in less external entropy (openssl, libc) for the same amount
+// of bytes.  Is this significant?  Does it change its cryptographic profile?
+// Compare the following two:
+//
+//   1)
+//     crypto_rand(buf, desired_size)
+//
+//   2)
+//     not_large_size = 128
+//     for (i = 0; i < desired_size/not_large_size; ++i)
+//       crypto_rand(buf + i*not_large_size, not_large_size)
+//     crypto_rand(buf + i*not_large_size, desired_size - i*not_large_size)
+//
+// Where desired_size is large enough to cause refill(s).
 {
   keccak_state shake;
   uint8_t buf[PRNG_CARRYFORWARD];
@@ -562,6 +614,8 @@ crypto_rand_unmocked(char *to, size_t n)
  * MUST be called before using the PRNG again.  Most likely, Tor will
  * detect that you've messed up and crash.  But if you're unlucky, the PRNG
  * output will be (unsecurely!) repeated.
+ * -> cpunk: Oh, I get it, the same state in both processes results in the same
+ *    output.  The output is /duplicated/ better than repeated.
  *
  * Callers MUST NOT hold the mutex.
  */
@@ -602,12 +665,10 @@ crypto_shake_prng_check_reseed(int force
   const time_t now = time(NULL);
 
   tor_mutex_acquire(&prng_mutex);
-  int should_reseed = 0;
-  if (! the_prng->reseeding) {
-    should_reseed = force ||
-      (the_prng->refill_count > PRNG_RESEED_AFTER) ||
-      (the_prng->last_reseeded < now - PRNG_RESEED_AFTER_TIME);
-  }
+  // cpunk: This is equivalent and IMO clearer:
+  int should_reseed = !the_prng->reseeding && (force
+     || (the_prng->refill_count > PRNG_RESEED_AFTER)
+     || (the_prng->last_reseeded < now - PRNG_RESEED_AFTER_TIME));
   if (should_reseed) {
     /* We set 'reseeding' here, and check it above, so that we don't launch two
      * simultaneous reseeds. */
@@ -618,10 +679,10 @@ crypto_shake_prng_check_reseed(int force
    */
   tor_mutex_release(&prng_mutex);
 
-  if (!should_reseed)
-    return;
-
-  shake_prng_reseed(the_prng);
+  // cpunk: Single points of exit are nice, especially when they are as
+  // readable as the alternative:
+  if (should_reseed)
+    shake_prng_reseed(the_prng);
 }
 
 /**
@@ -630,11 +691,16 @@ crypto_shake_prng_check_reseed(int force
 void
 crypto_teardown_shake_prng(void)
 {
+  // cpunk: I would tor_assert(the_prng != NULL) here, but maybe you would
+  // if (!the_prng) return;
   tor_mutex_acquire(&prng_mutex);
   shake_prng_t *prng = the_prng;
   the_prng = NULL;
   shake_prng_test_invariants(prng);
   tor_mutex_release(&prng_mutex);
+  // cpunk: As with the init(), it should be pretty obvious that this function
+  // cannot run concurrently with others in the api, locking or not; so maybe
+  // document that?
 
   free_prng_page(prng);
 }
@@ -693,6 +759,7 @@ static void
 ossl_shake_cleanup(void)
 {
   crypto_teardown_shake_prng(); /* ???? */
+                                // -> cpunk: What is it?
 }
 
 static void
@@ -703,8 +770,11 @@ ossl_shake_add(const void *buf, int num,
 
     /* And feed a little back to openssl. */
     uint8_t b[16];
-    shake_prng_getbytes(the_prng, b, 16);
-    openssl_RAND_add(b, 16, entropy > 16 ? 16.0 : entropy);
+    // cpunk: Using sizeof(b) and MIN() it's clearer, IMO.  (Whatever size you
+    // give b is going to be so small that conversion/comparison to double must
+    // be safe, or we would have much bigger problems.)
+    shake_prng_getbytes(the_prng, b, sizeof(b));
+    openssl_RAND_add(b, sizeof(b), MIN(entropy, sizeof(b)));
   } else {
     /* Openssl thinks it's being clever; it just told us what time it is
      * or something like that.  This isn't worth stopping for.
@@ -734,6 +804,25 @@ static void
 usurp_openssl_rand_method(void)
 {
   RAND_set_rand_method(&ossl_shake_method);
+  // cpunk: From https://www.openssl.org/docs/manmaster/crypto/RAND_set_rand_method.html
+  //
+  //   RAND_set_default_method() makes meth the method for PRNG use. NB: This
+  //   is true only whilst no ENGINE has been set as a default for RAND, so
+  //   this function is no longer recommended.
+  //
+  // And later (RAND_METHOD is a typedef for rand_meth_st):
+  //
+  //   RAND_METHOD implementations are grouped together with other algorithmic
+  //   APIs (eg. RSA_METHOD, EVP_CIPHER, etc) in ENGINE modules.  If a default
+  //   ENGINE is specified for RAND functionality using an ENGINE API function,
+  //   that will override any RAND defaults set using the RAND API (ie.
+  //   RAND_set_rand_method()). For this reason, the ENGINE API is the
+  //   recommended way to control default implementations for use in RAND and
+  //   other cryptographic algorithms.
+  //
+  // IIUC, openssl seems to be saying this should be implemented as an engine
+  // instead; or maybe it should be compiled with OPENSSL_NO_ENGINE to make
+  // sure?  Just a thought.
 }
 #endif
 

Edit: I removed a hunk where I think I may have spoken too early: _postfork() seems weird, the only explanation I can give to what you do with the_prng there is that you're assuming that, at that point, there's still a single thread (because of fork()), so you do similar to what you do in _init(). Same comments should apply in that regard. However, you're still grabbing the lock, which is not needed if there's a single thread. But more importantly, locks and fork() do not mix well. I'm going to have to look at that again.

Last edited 3 years ago by cypherpunks (previous) (diff)

comment:36 Changed 3 years ago by nickm

Status: needs_reviewneeds_revision

comment:37 Changed 3 years ago by nickm

Changes incorporated, I believe, except as noted below.

+    // cpunk: What happens here on "coverage" builds?

I'm assuming you mean "coverage" builds that also set --disable-asserts-in-tests. They break; --disable-asserts-in-tests makes your tor broken. I'm adding a ticket (#18888) to make them warn even more loudly, in case somebody thinks it's a good idea to actually use that in production.

 shake_prng_getbytes_large(shake_prng_t *prng, uint8_t *out, size_t n)
+// cpunk: So, this function has a different behaviour, compared to _raw(), in
+// that it pulls in less external entropy (openssl, libc) for the same amount
+// of bytes.  Is this significant?  Does it change its cryptographic profile?
+// Compare the following two:
+//
+//   1)
+//     crypto_rand(buf, desired_size)
+//
+//   2)
+//     not_large_size = 128
+//     for (i = 0; i < desired_size/not_large_size; ++i)
+//       crypto_rand(buf + i*not_large_size, not_large_size)
+//     crypto_rand(buf + i*not_large_size, desired_size - i*not_large_size)
+//
+// Where desired_size is large enough to cause refill(s).

So, the entire "reseed periodically after a certain amount of data" business is a bit voodoo; I've added notes to the check_reseed function. I've added a bit more commentary to the getbytes_large function itself too.

   crypto_teardown_shake_prng(); /* ???? */
+                                // -> cpunk: What is it?

I think I was wondering whether it causes a redundant call to crypto_teardown_shake_prng().

   RAND_set_rand_method(&ossl_shake_method);
+  // cpunk: From https://www.openssl.org/docs/manmaster/crypto/RAND_set_rand_method.html
+  //
+  //   RAND_set_default_method() makes meth the method for PRNG use. NB: This
+  //   is true only whilst no ENGINE has been set as a default for RAND, so
+  //   this function is no longer recommended.
+  //
+  // And later (RAND_METHOD is a typedef for rand_meth_st):
+  //
+  //   RAND_METHOD implementations are grouped together with other algorithmic
+  //   APIs (eg. RSA_METHOD, EVP_CIPHER, etc) in ENGINE modules.  If a default
+  //   ENGINE is specified for RAND functionality using an ENGINE API function,
+  //   that will override any RAND defaults set using the RAND API (ie.
+  //   RAND_set_rand_method()). For this reason, the ENGINE API is the
+  //   recommended way to control default implementations for use in RAND and
+  //   other cryptographic algorithms.
+  //
+  // IIUC, openssl seems to be saying this should be implemented as an engine
+  // instead; or maybe it should be compiled with OPENSSL_NO_ENGINE to make
+  // sure?  Just a thought.
 }

I've left this comment in-place while I investigate. When I last looked at the openssl code, it appeared to be the case that OpenSSL's engines overrode any previous calls to set_rand_method, but that subsequent calls to set_rand_method overrode them.

comment:38 Changed 3 years ago by nickm

Status: needs_revisionneeds_review

comment:39 Changed 3 years ago by nickm

Looking at the implementation of RAND_get_rand_method and RAND_set_rand_method in openssl 1.0.0 and openssl master, I am pretty confident that calling RAND_set_rand_method will replace default_RAND_meth; I've added a comment to explain why.

comment:40 Changed 3 years ago by asn

BTW, maybe we could rename the variables n and sz in shake_prng_getbytes_raw() to something more readable? e.g. total_bytes_needed and n_bytes_to_extract? There was already a serious bug in that function so making its logic as clear as possible seems worthwhile.

As another question, why do we use mmap() to allocate space for the shake_prng_t? Couldn't we just use tor_malloc() which would get translated to mmap() anyway if the requested space is big enough? Are we doing it so that we can pass MAP_PRIVATE to mmap() or something?

comment:41 Changed 3 years ago by nickm

maybe we could rename the variables

Fair point; done.

As another question, why do we use mmap() to allocate space for the shake_prng_t?

We're using mmap to ensure that we get our own page. If we have our own page, we can use madvise and mlock (and eventually mprotect and minherit as needed) on it, whereas if we use malloc, we'll be sharing the page with other parts of the heap. Expanded the comment a little there.

comment:42 Changed 3 years ago by nickm

Oh, I hadn't actually used those. I've made a new branch, shake_prng_v3_nofork, to include the madvise and minherit calls, and explanations about why an extra layer of protection isn't crazy.

comment:43 in reply to:  35 Changed 3 years ago by cypherpunks

Replying to cypherpunks:

Edit: I removed a hunk where I think I may have spoken too early: _postfork() seems weird, the only explanation I can give to what you do with the_prng there is that you're assuming that, at that point, there's still a single thread (because of fork()), so you do similar to what you do in _init(). Same comments should apply in that regard. However, you're still grabbing the lock, which is not needed if there's a single thread. But more importantly, locks and fork() do not mix well. I'm going to have to look at that again.

I tried to reach a conclusive assessment in this regard, but I realised I don't understand the intended use-cases of this module in relation to forking and threading, so I cannot. So instead I will just tell you what bothers me:

  1. The module is trying to be thread-safe. I assume, then, that it's in fact going to be used in a multi-threaded environment.
  1. The module also anticipates the posiblity of forking the process *and* having *both* copies (parent and child) continue to run the same executable. Otherwise, _postfork's efforts would be for naught: if the child is going to exec or the parent is going to exit, there's no worry of duplicated PRNG state.

So threading and forking, what could go wrong? Well, fork(2) tells us:

  • The child process is created with a single thread — the one that called fork(). The entire virtual address space of the parent is replicated in the child, including the states of mutexes, condition variables, and other pthreads objects; the use of pthread_atfork(3) may be helpful for dealing with problems that this can cause.

As it is, _postfork() could deadlock while trying to grab prng_mutex. This mutex is not exposed to other compilation units, so is not possible to prevent the problem "from the outside" (except by adding an external, redundant, layer of locking, which is silly).

Unless it's implied that you use forking xor you use threading. This is what I can't say, since I don't know the use-cases. (And if this is the case, it should be enforced by the code: currently it isn't.)

If instead forking and threading have to be contemplated in combination (particularly forking after having created threads) then the solution, for this module, would involve doing something like:

_init()
{
	...
	pthread_atfork(acquire_locks, release_locks, reinit_locks)
	...
}

But if you look at this for more than 3 seconds, you start to wonder: Why is _postfork part of the API at all? Why not simply pass it to pthread_atfork? That way it even gets called automatically.

_init()
{
	...
	pthread_atfork(acquire_locks, release_locks, _child_postfork)
	...
}

_child_postfork()
{
	reinit locks
	relock mmapped region or remap
	reseed prng
	reset pid
	...
}

And if you look at _child_postfork for more than 2.46 seconds, you realise that it is doing almost the same as _init. The canonical ADT/OOP behaviour here would be to simply destroy the previous object and create a new one.

Still the issue of forking and threading is more complicated. Above I said that would be the solution /for this module/. The problem is that when you fork *every* address mapping is copied. So this problem with locks in inconsistent/indeterminate state can come from any side. Think about libraries: libssl? libevent? libc even?

So, what ho?

comment:44 Changed 3 years ago by nickm

Keywords: TorCoreTeam201605 TorCoreTeam-postponed-201604 added; TorCoreTeam201604 removed

April is over; calling these april tickets postponed into may.

comment:45 Changed 3 years ago by nickm

Sounds like we're in for a few more batches of threading revision before merge. So I'm really asking for review of everything *but* the threading code: the crypto needs to be solid, or we absolutely shouldn't merge this one.

Changed 3 years ago by cypherpunks

Fix typos in comments. Change "remaining" from unsigned to size_t, since it refers to memory.

comment:47 Changed 3 years ago by nickm

Keywords: review-group-1 added

comment:48 Changed 3 years ago by isabela

Points: small/medium-remaining2

comment:49 Changed 3 years ago by asn

Status: needs_reviewneeds_revision

Hello,

I did another review of this branch (minus the threading code as suggested by comment:45). I didn't find anything major, but here are some comments:

  • I get the following double init warning message when I start tor with this branch:
    [warn] crypto_init_shake_prng(): Bug: Attempt to double-initialize the RNG! (on Tor 0.2.8.0-alpha-dev 3413cff1c862248d)
    
    Here are some backtraces:
    (gdb) bt
    #0  crypto_init_shake_prng () at src/common/crypto_rng.c:300
    #1  0x00005555556a5ef3 in crypto_seed_rng () at src/common/crypto.c:2702
    #2  0x00005555556a6071 in crypto_early_init () at src/common/crypto.c:315
    #3  0x00005555556a6161 in crypto_early_init () at src/common/crypto.c:324
    #4  0x0000555555592858 in tor_init (argc=argc@entry=3, argv=argv@entry=0x7fffffffe4e8) at src/or/main.c:2912
    #5  0x0000555555593e05 in tor_main (argc=3, argv=0x7fffffffe4e8) at src/or/main.c:3576
    #6  0x000055555558dfe9 in main (argc=<optimized out>, argv=<optimized out>) at src/or/tor_main.c:30
    
    (gdb) bt
    #0  crypto_init_shake_prng () at src/common/crypto_rng.c:300
    #1  0x00005555556a5ef3 in crypto_seed_rng () at src/common/crypto.c:2702
    #2  0x000055555558f099 in add_entropy_callback (now=<optimized out>, options=<optimized out>) at src/or/main.c:1634
    #3  0x00005555555a75b8 in periodic_event_dispatch (fd=fd@entry=-1, what=what@entry=1, data=data@entry=0x5555559851e0 <periodic_events+160>) at src/or/periodic.c:46
    #4  0x00005555555a7861 in periodic_event_launch (event=event@entry=0x5555559851e0 <periodic_events+160>) at src/or/periodic.c:108
    #5  0x000055555558e97c in initialize_periodic_events_cb (fd=<optimized out>, events=<optimized out>, data=<optimized out>) at src/or/main.c:1351
    #6  0x00007ffff768a584 in ?? () from /usr/lib/x86_64-linux-gnu/libevent-2.0.so.5
    #7  0x00007ffff76883dc in event_base_loop () from /usr/lib/x86_64-linux-gnu/libevent-2.0.so.5
    #8  0x00005555555922fc in run_main_loop_once () at src/or/main.c:2510
    #9  run_main_loop_until_done () at src/or/main.c:2556
    #10 do_main_loop () at src/or/main.c:2482
    #11 0x0000555555595805 in tor_main (argc=<optimized out>, argv=<optimized out>) at src/or/main.c:3599
    
  • I see this pattern everytime we call shake_prng_getbytes_raw() in crypto_rng.c:
      tor_mutex_acquire(&prng_mutex);
      shake_prng_getbytes_raw(prng, buf, sizeof(buf));
      tor_mutex_release(&prng_mutex);
    
    Maybe we can move the mutex acquisition to shake_prng_getbytes_raw(), instead of assuming that the caller will do it? Or maybe we can assert that the mutex is acquired inside that function?
  • I feel that the code can be almost unittestable with some mocking. For example, we could call shake_prng_getbytes_raw() and then make sure that the shake_prng_t elements/pointers are as expected. I think bugs like the one from comment:31 could be detected with a test like this. The integration tests are also very useful, so maybe unittests can be delayed?...

Marking ticket as needs_revision since the first comment should be addressed.

comment:50 Changed 3 years ago by nickm

Actual Points: 5
Points: 25

Merged cpunk's patch as 71e55898438bcef88060d0d32fc4c6b31f2cc4aa.

Fixed double-init in b6ec4d3a8ace6b49fb433f0d8e596c683ef6abee.

Lowered mutex management in 6f12d0c3177f8aea706a3581ebf97f979a334858

Fixed a never-actually-free-it bug in ac1b0027cdd5f261c21ff90f2648a5d2df179010

Added glass-box tests in ce71dbfbccbc8c42739f2738fa8e01e9aa294675 and 67f2154bc26917d70d57e43b7b543570395b2987.

(The glass-box tests make sure all the deterministic parts work the way we would hope. They don't test that shake_prng_refill does exactly as it should, because that takes feedback from external RNGs.)

comment:51 Changed 3 years ago by nickm

Status: needs_revisionneeds_review

Added a comment about the fork/threads stuff too. Short version: We're extra-afraid of forks, even though we never actually do any of the dangerous kind.

comment:52 Changed 3 years ago by asn

New code and tests looks good. Tests seem to run fine. Double init bug seems to be addressed fine as well.

I would turn this to merge_ready but maybe you want to get more review?

comment:53 Changed 3 years ago by nickm

Keywords: review-group-2 added; review-group-1 removed

Everything in review-group-1 has had at least 1 review.

comment:54 Changed 3 years ago by nickm

Keywords: TorCoreTeam201605 removed

Remove "TorCoreTeam201605" keyword. The time machine is broken.

comment:55 Changed 3 years ago by nickm

Keywords: review-group-3 added; review-group-2 removed

These have all had at least one review during review-group-2.

comment:56 Changed 3 years ago by nickm

Keywords: review-group-4 added

comment:57 Changed 3 years ago by nickm

Milestone: Tor: 0.2.9.x-finalTor: 0.2.???

comment:58 Changed 3 years ago by teor

Milestone: Tor: 0.2.???Tor: 0.3.???

Milestone renamed

comment:59 Changed 3 years ago by nickm

Keywords: tor-03-unspecified-201612 added
Milestone: Tor: 0.3.???Tor: unspecified

Finally admitting that 0.3.??? was a euphemism for Tor: unspecified all along.

comment:60 Changed 2 years ago by nickm

Keywords: tor-03-unspecified-201612 removed

Remove an old triaging keyword.

comment:61 Changed 2 years ago by nickm

Keywords: TorCoreTeam-postponed-201604 removed

comment:62 Changed 2 years ago by nickm

Keywords: review-group-3 removed

comment:63 Changed 2 years ago by nickm

Keywords: review-group-4 removed

comment:64 Changed 2 years ago by nickm

Keywords: tor-relay tor-client prng crypto added
Summary: Hash All PRNG output before useUse a better PRNG unless OpenSSL starts using a better one on their own.

comment:65 Changed 20 months ago by teor

Milestone: Tor: unspecifiedTor: 0.3.4.x-final

Move all needs_review tickets without a release to 0.3.4

comment:66 Changed 20 months ago by nickm

Keywords: review-group-34 added

comment:67 Changed 19 months ago by asn

Status: needs_reviewnew

Hmm, this ticket again! My intuition here is to NACK this for now even tho I feel like the code is probably OK, and that's for the following reasons:

  • The code here is 2 years old, and will add about 1k LoC to our PRNG subsystem. Even tho the code is probably OK, it's quite complicated and we will need to maintain it. This is also a pretty big feature for something not on the roadmap.
  • All this is done to protect against the threat of backdoored RNGs like Dual-EC, or horribly bad RNGs which get broken if some of their state gets leaked. I haven't heard of this threat since the Dual-EC thing happened, and not sure if trying to fix it on the Tor-layer is a good idea.
  • WRT fixing this on the Tor layer, this might be suboptimal since our other libraries (like our SSL lib) can also reveal the state of our RNG and hence leak it even tho we patched it in Tor. In general, I feel that if we don't trust OpenSSL or our crypto libraries we are already pretty-much screwed up...

In general, I feel inclined to put this back in Tor: unspecified or WONTFIX it and revive it in the future if we ever need to (hopefully not).

Nick also told me something about future OpenSSL releases changing their RNG algorithm too, but I could't find info about this...

Setting this is as new for now.
Nick, what do you think? This ticket was lots of work and I don't want to put it to waste so easily.

comment:68 Changed 19 months ago by asn

Milestone: Tor: 0.3.4.x-finalTor: unspecified

comment:69 Changed 19 months ago by nickm

Resolution: worksforme
Status: newclosed

Nick also told me something about future OpenSSL releases changing their RNG algorithm too, but I could't find info about this...

From the OpenSSL 1.1.1 changelog:

  *) Grand redesign of the OpenSSL random generator

     The default RAND method now utilizes an AES-CTR DRBG according to
     NIST standard SP 800-90Ar1. The new random generator is essentially
     a port of the default random generator from the OpenSSL FIPS 2.0
     object module. It is a hybrid deterministic random bit generator
     using an AES-CTR bit stream and which seeds and reseeds itself
     automatically using trusted system entropy sources.

     Some of its new features are:
      o Support for multiple DRBG instances with seed chaining.
      o Add a public DRBG instance for the default RAND method.
      o Add a dedicated DRBG instance for generating long term private keys.
      o Make the DRBG instances fork-safe.
      o Keep all global DRBG instances on the secure heap if it is enabled.
      o Add a DRBG instance to every SSL instance for lock free operation
        and to increase unpredictability.
     [Paul Dale, Benjamin Kaduk, Kurt Roeckx, Rich Salz, Matthias St. Pierre]

So yeah, I think it's fine for us to drop this. No worries; I had fun writing the code, but I don't need to maintain it forever.

Note: See TracTickets for help on using tickets.