Opened 2 years ago

Last modified 2 months ago

#23061 assigned defect

crypto_rand_double() should produce all possible outputs on platforms with 32-bit int

Reported by: teor Owned by:
Priority: Medium Milestone: Tor: unspecified
Component: Core Tor/Tor Version: Tor: 0.2.2.14-alpha
Severity: Normal Keywords: fast-fix, tor-relay, security-low, privcount, 029-backport, review-group-22, 034-triage-20180328, 034-removed-20180328, 031-unreached-backport, 035-roadmap-subtask, 035-triaged-in-20180711, 040-unreached-20190109
Cc: catalyst, mikeperry, rl1987 Actual Points: 0.5
Parent ID: #26637 Points: 0.1
Reviewer: Sponsor: SponsorV-can

Description

On 32-bit platforms, crypto_rand_double() only produces about 1 in every 2 million (1 in 2^21) of the possible values between 0 and 1.

This happens because:

  • crypto_rand_double() divides a random unsigned int by UINT_MAX
  • an unsigned int on 32-bit platforms is 32 bits
  • the mantissa on a double is 53 bits

So crypto_rand_double() doesn't fill the lower 21 bits with random values.

This makes the rep_hist_format_hs_stats() noise more predictable on 32-bit platforms.

This fix shouldn't affect the unit tests, because they pass on 64-bit.

Child Tickets

Ticket     Status    Owner    Summary                                               Component
#25368     closed             Update the Tor Rust coding standards for new types    Core Tor/Tor

Attachments (1)

uniform01.c (10.2 KB) - added by riastradh 10 months ago.
uniform [0,1] sampler


Change History (78)

comment:1 Changed 2 years ago by teor

Also, this macro is misnamed -- its value is really UINT_MAX_PLUS_ONE_AS_DOUBLE:

#define UINT_MAX_AS_DOUBLE 4294967296.0
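
As a small illustration (my sketch, not Tor's actual code): dividing by 4294967296.0 (i.e. 2^32) yields only dyadic values N/2^32, so the low 21 of the mantissa's 53 bits are always zero.

#include <stdio.h>

int main(void) {
  unsigned int r = 0xDEADBEEFu;  /* stand-in for a crypto_rand() output */
  double d = r / 4294967296.0;   /* what crypto_rand_double() computes */
  printf("%a\n", d);             /* e.g. 0x1.bd5b7ddep-1: 32 significant
                                  * bits, the remaining 21 are zero */
  return 0;
}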

comment:2 Changed 2 years ago by teor

Turns out that we only get 32 bits of random mantissa on 64-bit, so fixing this bug might be harder than I thought:

  /* Test for regression to bug 23061, where we produced excessively granular
   * random double values. */
  int low_bits_zero_count = 0;
  do {
    /* First check the value is within the range */
    d = crypto_rand_double();
    tt_assert(d >= 0);
    tt_assert(d < 1.0);
    /* Now check the granularity, by finding the last bit of the result.
     * doubles have a 53 bit mantissa. */
    d *= pow(2, 32);
    d -= trunc(d);
    low_bits_zero_count++;
    /* If the granularity is correct, this will fail with probability
     * 1 in 2**100, which is approximately the RAM bit error rate. */
    tt_assert(low_bits_zero_count <= 100);
  } while (d == 0.0);

comment:3 Changed 2 years ago by teor

Actual Points: 0.5
Keywords: 031-backport 030-backport 029-backport 028-backport-maybe 027-backport-maybe 026-backport-maybe added
Status: new → needs_review

See my branch rand-double-029, which is based on maint-0.2.9. It comes with unit tests and a rewritten crypto_rand_double() that uses rejection sampling. The new unit tests pass for me on macOS 10.12 x86_64 and i386.

We might want to backport this to 0.2.6, where we added laplace noise using this function. But when I tried to cherry-pick the commits, it was non-trivial. (Someone else is welcome to do this!)

The only users of crypto_rand_double() before 0.2.6 are tests, so we don't need to backport it any further.
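
(The branch itself isn't reproduced here, but one plausible shape for a rejection-sampling conversion -- my sketch, assuming Tor's crypto_rand() API -- is to draw 64 fresh bits, scale, and retry in the rare case where rounding pushes the result up to 1.0.)

#include <math.h>
#include <stddef.h>
#include <stdint.h>

extern void crypto_rand(char *out, size_t n); /* Tor's CSPRNG */

double
crypto_rand_double_sketch(void)
{
  uint64_t bits;
  double d;
  do {
    crypto_rand((char *)&bits, sizeof(bits));
    d = ldexp((double)bits, -64); /* scale into [0, 1] */
  } while (d >= 1.0);             /* reject the rare round-up to 1.0 */
  return d;
}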

comment:4 Changed 2 years ago by teor

Summary: crypto_rand_double() should produce all possible outputs on 32-bit platforms → crypto_rand_double() should produce all possible outputs on platforms with 32-bit int

The title is wrong: this actually affects LP64 platforms as well.
And the bias affects all platforms.

comment:5 Changed 2 years ago by nickm

Where applicable, we should use the algorithm from http://allendowney.com/research/rand/ . Zack Weinberg wrote it up in C++ with https://github.com/TheTorProject/stegotorus/blob/master/src/rng.cc#L86 ; that may be worth looking at for inspiration.

Teor says (copying with permission) :

23:34 <+teor> Yes, that code is definitely better. But we'll need the following 
              tweaks:
23:35 <+teor> 1. Check if the internal double representation is the one we 
              expect, and if not, fall back to the code in my existing branch
23:35 <+teor> (There's no requirement in the C standard that double has a 
              particular binary representation)
23:36 <+teor> 2. Add a unit test that shows that *if* we're using IEEE754 
              doubles, then we can get a few values in (0, 1/2**64)
23:37 <+teor> 3. Fix the crypto_rand() mocking to stop assuming that we only 
              ask for 64 bits

comment:6 Changed 2 years ago by nickm

Owner: set to nickm
Status: needs_review → accepted

comment:7 Changed 2 years ago by nickm

Status: accepted → needs_revision

comment:8 Changed 2 years ago by nickm

I have a rough attempt at implementing an approximation of Downey's algorithm in a "rand-double-029" branch in my public repo; how does the approach look? How scary are my shortcuts? Is there a better way to do this?

comment:9 Changed 2 years ago by teor

Yawning suggested https://github.com/libressl-portable/openbsd/blob/master/src/lib/libc/stdlib/drand48.c

It might be ok to only have 48 bits of entropy. Or we could extend that scheme to give us 52 (or >52 with subnormals).

comment:10 in reply to:  9 ; Changed 2 years ago by catalyst

Replying to teor:

It might be ok to only have 48 bits of entropy. Or we could extend that scheme to give us 52 (or >52 with subnormals).

Would we have to do something different to make sure the subnormal values are uniform with the others?

comment:11 in reply to:  10 Changed 2 years ago by teor

Replying to catalyst:

Replying to teor:

It might be ok to only have 48 bits of entropy. Or we could extend that scheme to give us 52 (or >52 with subnormals).

Would we have to do something different to make sure the subnormal values are uniform with the others?

If the architecture supports subnormals, ldexp() generates them, and addition handles them correctly, then this code will work:

unsigned short rseed[4]; /* assumes a 16-bit unsigned short */
crypto_rand((char *)rseed, sizeof rseed);
return ldexp((double) rseed[0], -64) +
       ldexp((double) rseed[1], -48) +
       ldexp((double) rseed[2], -32) +
       ldexp((double) rseed[3], -16);

It's much better than bit-stuffing, because it doesn't depend on how subnormals are implemented.

Modified from:
https://github.com/libressl-portable/openbsd/blob/master/src/lib/libc/stdlib/drand48.c

comment:12 Changed 2 years ago by catalyst

I think the multiple ldexp() approach is more understandable than Downey's algorithm. Are we performance-critical here?

comment:13 Changed 2 years ago by catalyst

In ba199f789922484b8a2b2efd909ad3dab124dd62, why define EXPECTED_RAND_MANTISSA_BITS as a hardcoded number instead of using DBL_MANT_DIG? (assuming you've already tested for FLT_RADIX==2)

Not directly related to any of the changes, but crypto_rand_double() could use a comment clarifying its contract. That would also help us make clear what we're trying to achieve. Are the double values assumed to be uniformly distributed in the range 0.0 <= d < 1.0? How much entropy is it supposed to have? Is it supposed to produce all representable double values? (assuming only the values that are more frequent than 2**-b where b is the claimed entropy; for IEC 60559 doubles, this means we can ignore subnormals unless we're using more than a thousand bits of entropy)

Regarding the ldexp() approach, we could save operations by doing them in 32-bit chunks (because I think we can assume at least 32 bits of mantissa for a double).
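
A sketch of that chunked variant (mine; assumes crypto_rand() and that 64 bits of entropy suffice):

#include <math.h>
#include <stddef.h>
#include <stdint.h>

extern void crypto_rand(char *out, size_t n);

double
rand_double_ldexp32(void)
{
  uint32_t chunk[2];
  crypto_rand((char *)chunk, sizeof(chunk));
  /* The high chunk lands in [0, 1); the low chunk fills bits 33..64.
   * Caveat: the rounded sum can still reach 1.0 (see comment:21). */
  return ldexp((double)chunk[0], -32) + ldexp((double)chunk[1], -64);
}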

comment:14 in reply to:  13 Changed 2 years ago by teor

Replying to catalyst:

In ba199f789922484b8a2b2efd909ad3dab124dd62, why define EXPECTED_RAND_MANTISSA_BITS as a hardcoded number instead of using DBL_MANT_DIG? (assuming you've already tested for FLT_RADIX==2)

Because we might decide that the contract is N bits of mantissa, rather than DBL_MANT_DIG.
(For example, on my machine, we get 56. Before the patch, we got 31 on i386 and 32 on x86_64.)

Not directly related to any of the changes, but crypto_rand_double() could use a comment clarifying its contract. That would also help us make clear what we're trying to achieve. Are the double values assumed to be uniformly distributed in the range 0.0 <= d < 1.0? How much entropy is it supposed to have? Is it supposed to produce all representable double values? (assuming only the values that are more frequent than 2**-b where b is the claimed entropy; for IEC 60559 doubles, this means we can ignore subnormals unless we're using more than a thousand bits of entropy)

Yes, I agree we need to define the contract better.
We need uniform distribution.
While it would be nice to deliver all values in the range, values below 10^-13 (or 2^-43) are more likely to be produced by RAM bit errors than entropy, so at some point we can ignore small values.

Regarding the ldexp() approach, we could save operations by doing them in 32-bit chunks (because I think we can assume at least 32 bits of mantissa for a double).

As long as ldexp() supports 32 bits of input.

And we might need to modify the ldexp() approach, because the current one only gives 64 bits of entropy. The algorithm could be:

  1. Generate a random 64-bit double using ldexp()
  2. If it's zero, generate a random 64-bit double and scale down using ldexp() to the next 64 bits (a sketch of this loop follows at the end of this comment)

Terminate when we reach our desired entropy (the contract), which should be based on:

  • the RAM bit error rate,
  • the maximum subnormal double bits (DBL_EXP_DIG + ceil(log2(DBL_MANT_DIG))).

Unless we expect really good RAM bit error rates in future RAM, I suggest we go for 64 or 128 bits of entropy, and leave it there. (Also, the likelihood of any smaller numbers is minuscule, and some calculations will turn them into zeroes or infinities anyway.)
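
A minimal sketch of that two-step loop (mine, capped at the 128 bits of entropy suggested above; rounding at the very top of the range is ignored here):

#include <math.h>
#include <stddef.h>
#include <stdint.h>

extern void crypto_rand(char *out, size_t n);

double
rand_double_iterative(void)
{
  /* Draw 64 bits into [0, 1); on the 2^-64 chance of zero, draw the
   * next lower 64 bits, then stop at the entropy cap. */
  for (int shift = 64; shift <= 128; shift += 64) {
    uint64_t bits;
    crypto_rand((char *)&bits, sizeof(bits));
    double d = ldexp((double)bits, -shift);
    if (d != 0.0)
      return d;
  }
  return 0.0; /* probability 2^-128 */
}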

comment:15 Changed 2 years ago by nickm

We have a lot of options here, depending on what we want! Let's try to collect the possible goals, and see which we care about.

Here are some goals I think we probably care about, but I could be wrong:

  • We should return a number uniformly at random in the range [0, 1.0). (That is, for all "suitable" x<y in [0,1.0), we should return a value in [x,y] with probability very close to y-x. Defining "suitable" and "very close" will be important, and might not include every possible double.)
  • Return outputs with at least some minimum granularity. (i.e, for some granularity delta, if x is a possible output, and x ± delta is in [0.0, 1.0), then there exists a possible output between x and x ± delta other than x.)
  • Run with reasonable efficiency.
  • Run in constant time.
  • Use the whole mantissa, or almost the whole mantissa.
  • Provide at least some number of bits of entropy in the output.
  • Work at least to a minimal degree on all c99 platforms.

Here are some goals I think we do not care about, but I could be wrong:

  • Work perfectly on systems where FLT_RADIX is not 2.
  • Provide identical output on all architectures regardless of floating-point implementation.
  • Return every possible output with some probability. (For example, values less than 1e-300 are _possible_ doubles. But they have cumulative probability of 1e-300, which is less likely than just guessing the RNG seed on the first try.)
  • Possibly return subnormal values.
  • Perfect behavior on corner cases with total probability less than some epsilon (maybe 2^-96)?
  • Run as fast as possible.

comment:16 Changed 2 years ago by teor

I think one way we could choose between our goals is to look at how the function is used (and could be used for privcount-in-tor's guassians, and could be used in other places where we synthesise a random double using a similar method).

For example, if we used the naïve algorithm that divides [0, UINT64_MAX] by (UINT64_MAX+1), I think we get a pattern like:
0, 1/2^64, 2/2^64, 3/2^64, ..., 2^53/2^64, 2^53/2^64, (2^53+2)/2^64, ..., (2^64 - 2^11 - 2^10)/2^64 (~2^11 times), 1.0 (~2^10 times)
due to representation limits (the details would vary depending on the rounding mode and possibly the hardware).

I wonder if this satisfies the requirements for our random noise distributions (which is where we mainly use this function) after being passed through the laplace and guassian transforms.

We should document their range and granularity in a similar level of detail, too.
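
A quick check (mine, not from the ticket) makes the collapse at the top of the range visible:

#include <stdint.h>
#include <stdio.h>

int main(void) {
  double scale = 1.0 / 18446744073709551616.0; /* 2^-64 */
  /* Small inputs stay distinct... */
  printf("%a %a\n", 1 * scale, 2 * scale);
  /* ...but adjacent inputs near UINT64_MAX collapse, here both to 1.0. */
  printf("%a %a\n", (double)(UINT64_MAX - 1) * scale,
                    (double)UINT64_MAX * scale);
  return 0;
}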

Last edited 2 years ago by teor.

comment:17 in reply to:  15 Changed 2 years ago by yawning

Replying to nickm:

Here are some goals I think we probably care about, but I could be wrong:

If that's what you want:

#include <float.h>   /* FLT_RADIX, DBL_MANT_DIG, DBL_EPSILON */
#include <limits.h>  /* CHAR_BIT */
#include <stdint.h>  /* uint64_t */

double uint64_to_dbl_0_1(uint64_t x) {
  /* Can't merely check for __STDC_IEC_559__ because not every compiler we care about defines it. */
#if FLT_RADIX != 2
#error FLT_RADIX != 2, your system is the most special of them all.
#endif
  x >>= (sizeof(double) * CHAR_BIT) - DBL_MANT_DIG;
  return (DBL_EPSILON/2) * x;
}
  • We should return a number uniformly at random in the range [0, 1.0). (That is, for all "suitable" x<y in [0,1.0), we should return a value in [x,y] with probability very close to y-x. Defining "suitable" and "very close" will be important, and might not include every possible double.)

Check.

  • Return outputs with at least some minimum granularity. (i.e, for some granularity delta, if x is a possible output, and x ± delta is in [0.0, 1.0), then there exists a possible output between x and x ± delta other than x.)

Check.

  • Run with reasonable efficiency.

Check.

  • Run in constant time.

Check.

  • Use the whole mantissa, or almost the whole mantissa.

Check.

  • Provide at least some number of bits of entropy in the output.

Check.

  • Work at least to a minimal degree on all c99 platforms.

If people want to run tor on something that is exotic to the point where this sort of approach breaks, they can send patches.

Yes this still leaves out "possible" values, but it trivially accomplishes uniform, fast, and constant time.

Edit: Left out a }.

Last edited 2 years ago by yawning.

comment:18 Changed 2 years ago by nickm

Status: needs_revision → needs_review

That actually _does_ look fairly good. Let's review that approach.

comment:19 in reply to:  18 Changed 2 years ago by yawning

Replying to nickm:

That actually _does_ look fairly good. Let's review that approach.

As an incremental improvement:

double uint64_to_dbl_0_1(uint64_t x) {
  /* Can't merely check for __STDC_IEC_559__ because not every compiler we care about defines it. */
#if FLT_RADIX != 2
#error FLT_RADIX != 2, your system is the most special of them all.
#endif

  uint64_t mask = ((uint64_t)1 << DBL_MANT_DIG) - 1;
  double half_epsilon = 1.0 / (mask + 1);

  x &= mask;
  return half_epsilon * x;
}

While the assumption I made about how DBL_EPSILON is defined in relation to FLT_RADIX and DBL_MANT_DIG should hold everywhere that we care about, this is cleaner.

If there are to be unit tests or build-time tests in autoconf, the following invariants should hold (a quick harness follows the list):

  • DBL_MANT_DIG > 0
  • half_epsilon == DBL_EPSILON/2, half_epsilon != DBL_EPSILON
  • uint64_to_dbl_0_1(UINT64_MAX) != 1.0
  • uint64_to_dbl_0_1(UINT64_MAX) + (DBL_EPSILON/2) == 1.0
  • uint64_to_dbl_0_1(UINT64_MAX) + DBL_EPSILON == 1.0
  • uint64_to_dbl_0_1(0) == 0
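
That harness might look like this (mine, not from the ticket), assuming the uint64_to_dbl_0_1() definition above is in scope:

#include <assert.h>
#include <float.h>
#include <stdint.h>

double uint64_to_dbl_0_1(uint64_t x); /* defined above */

int main(void) {
  uint64_t mask = ((uint64_t)1 << DBL_MANT_DIG) - 1;
  double half_epsilon = 1.0 / (double)(mask + 1);

  assert(DBL_MANT_DIG > 0);
  assert(half_epsilon == DBL_EPSILON / 2 && half_epsilon != DBL_EPSILON);
  assert(uint64_to_dbl_0_1(UINT64_MAX) != 1.0);
  assert(uint64_to_dbl_0_1(UINT64_MAX) + DBL_EPSILON / 2 == 1.0);
  assert(uint64_to_dbl_0_1(UINT64_MAX) + DBL_EPSILON == 1.0);
  assert(uint64_to_dbl_0_1(0) == 0.0);
  return 0;
}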

comment:20 Changed 2 years ago by nickm

Keywords: review-group-22 added

comment:21 Changed 2 years ago by nickm

Okay, I've tried it out, and gotten some results. Apparently when Yawning's code above is used, Teor's tests no longer report that 53 bits of mantissa are set -- only 52 bits are set. I think this is okay, but can we do better? See branch here for more: feature23061_v2_yawning.

I tried the ldexp() trick too, but that fails the test that uint64_to_dbl_0_1(UINT64_MAX) < 1.0 if we have all four ldexp() calls. If I remove the lowest-magnitude one, it's fine, but of course we only get 48 bits.

comment:22 in reply to:  21 ; Changed 2 years ago by catalyst

Replying to nickm:

I tried the ldexp() trick too, but that fails the test that uint64_to_dbl_0_1(UINT64_MAX) < 1.0 if we have all four ldexp() calls. If I remove the lowest-magnitude one, it's fine, but of course we only get 48 bits.

Oh right, it'll round off the bits past DBL_MANT_DIG. Maybe mask off all but the high 5 bits?

comment:23 Changed 2 years ago by catalyst

If we want more entropy than DBL_MANT_DIG we could look at the exponent to decide how many bits to mask when adding the last chunk of randomness, but of course that might not be constant time. (I'm not sure we can assume basic floating point arithmetic will be constant time, so we might not care.)

comment:25 in reply to:  21 Changed 2 years ago by yawning

Replying to nickm:

Okay, I've tried it out, and gotten some results. Apparently when Yawning's code above is used, Teor's tests no longer report that 53 bits of mantissa are set -- only 52 bits are set. I think this is okay, but can we do better? See branch here for more: feature23061_v2_yawning.

The routine I provided has this property on "sane" systems:

uint64_to_dbl_0_1(1) == 1.1102230246251565e-16 /* 2^-53 */

I don't think anything more is actually useful.

comment:26 Changed 2 years ago by catalyst

The potential problem is less about omitting rare small values and more about missing lots of representable intermediate values. I think with this kind of approach we're losing half of the representable values between 0.25 and 0.5 - 2^-54; 3/4 of the possible values between 0.125 and 0.25 - 2^-55, etc., which was kind of the motivation for the Downey paper. We might decide we don't care, but I think that's highly dependent on what the caller ends up doing with the results. (We should also document this choice; the comment currently focuses on small rare values instead of "holes" in the range.)

edit: fix some exponents I miscomputed due to a sign error

Last edited 2 years ago by catalyst.

comment:27 in reply to:  21 Changed 2 years ago by yawning

Replying to nickm:

but can we do better?

Yeah, fix the broken test.

do {
  d = crypto_rand_double(); // d = n * pow(2, -53) (Where n is the post mask sampled integer)

  /* Now check the granularity, by finding the lower bits of the result. */
  d *= pow(2, EXPECTED_RAND_MANTISSA_BITS);  // d *= pow(2, 53)
                                             // d  = n * pow(2, -53) * pow(2, 53)
                                             // d  = n
  d -= trunc(d);                             // d -= d (trunc(d) = d)
} while (d == 0.0);

comment:28 Changed 2 years ago by yawning

If the test assuming f(n-1) < f(n) < f(n+1) is ok, then you can do:

  double expected_increment = pow(FLT_RADIX, -DBL_MANT_DIG);

  assert(uint64_to_dbl_0_1(1) == expected_increment);

  // Hell, stick this in a for loop that randomly samples n, and asserts if you really care.
  assert(uint64_to_dbl_0_1(2) - uint64_to_dbl_0_1(1) == expected_increment);

comment:29 Changed 2 years ago by yawning

Replying to nickm:

but can we do better?

On a more serious attempt at giving an answer, as a non-portable "better" (that I still think is largely pointless), you can probably use a variant of my solution with long double and cast to double.

Note that the behavior of such a routine is entirely compiler and architecture dependent, and it'll probably be a lot slower when it works.

If it were up to me, I would fully document the behavior and be done with it, under the assumption that anyone doing something that requires a less leaky floating point abstraction can either scale their values correctly (and use integer math) or write their own sampler.

comment:30 in reply to:  22 Changed 2 years ago by teor

Replying to catalyst:

Replying to nickm:

I tried the ldexp() trick too, but that fails the test that uint64_to_dbl_0_1(UINT64_MAX) < 1.0 if we have all four ldexp() calls. If I remove the lowest-magnitude one, it's fine, but of course we only get 48 bits.

Oh right, it'll round off the bits past DBL_MANT_DIG. Maybe mask off all but the high 5 bits?

This is still susceptible to the rounding mode in use at the time.
So yawning's bit twiddling is going to be more reliable and robust.

comment:31 Changed 2 years ago by teor

Keywords: 027-backport-maybe removed

0.2.7 is EOL.

comment:32 Changed 2 years ago by teor

If we use uint64_to_dbl_0_1(), then each random double is of the form N*2^-53, N = 0..(2^53 - 1).

This is the image that sample_laplace_distribution() produces from that domain:

rand_double    Output Calculation                         Output Value
0              INT64_MIN                                  -2^63
1 * 2^-53      - ln(2*2^-53) * delta_f / epsilon          36.04 * delta_f / epsilon
2 * 2^-53      - ln(4*2^-53) * delta_f / epsilon          35.35 * delta_f / epsilon
...            ...                                        ...
0.5 - 2^-53    - ln(1.0 - 2*2^-53) * delta_f / epsilon    2.22 * delta_f / epsilon
0.5            - ln(1.0) * delta_f / epsilon              0.0 * delta_f / epsilon

0.5 to 1.0 - 2^-53 has the same pattern, but mirrored and negative.

The greatest delta_f / epsilon we currently have is 2048 / 0.3 = 6826.66, so the maximum noise is 246033, with a discontinuity that destroys the signal when 0.0 maps to INT64_MIN (let's track that bug in #23323).

The gap between 0.5 and 0.5 - 2^-53 concerns me, because I think we need the smallest possible noise to be 1.0, rather than 2.22 * delta_f / epsilon = 2.22 * 8 / 0.3 = 59.2. But maybe this granularity is OK.
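
For reference, the endpoint coefficients in the table fall straight out of the transform (my check, not from the branch):

#include <math.h>
#include <stdio.h>

int main(void) {
  printf("%.2f\n", -log(ldexp(2.0, -53))); /* 36.04, for input 1 * 2^-53 */
  printf("%.2f\n", -log(ldexp(4.0, -53))); /* 35.35, for input 2 * 2^-53 */
  printf("%.0f\n", 36.04 * 2048 / 0.3);    /* 246033, the maximum noise */
  return 0;
}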

comment:33 Changed 2 years ago by teor

Using the code in nickm/privcount_nm_v2_shake_032, this is the image that sample_unit_gaussian() would produce from that domain, if it used crypto_rand_double() to generate x and y (see #23147).

First, let's look at y and r:

y                r Output Calculation             r Output Value
0                sqrt(-2.0*ln(1.0 - 0.0))         -0.0
1 * 2^-53        sqrt(-2.0*ln(1.0 - 2^-53))       1.49*10^-8
2 * 2^-53        sqrt(-2.0*ln(1.0 - 2*2^-53))     2.11*10^-8
...              ...                              ...
1.0 - 2*2^-53    sqrt(-2.0*ln(2*2^-53))           8.49
1.0 - 2^-53      sqrt(-2.0*ln(2^-53))             8.57

Then x and sin(theta), near-zero points:

x              sin(theta) Output Calculation    sin(theta) Output Value
0              sin(2*pi*0.0)                    0.0
1 * 2^-53      sin(2*pi*2^-53)                  6.98*10^-16
...            ...                              ...
0.5 - 2^-53    sin(2*pi*(0.5 - 2^-53))          1.01*10^-15
0.5            sin(2*pi*0.5)                    1.22*10^-16
0.5 + 2^-53    sin(2*pi*(0.5 + 2^-53))          -7.66*10^-16
...            ...                              ...
1.0 - 2^-53    sin(2*pi*(1.0 - 2^-53))          -1.13*10^-15

Then x and sin(theta), near-one points:

x              sin(theta) Output Calculation    sin(theta) Output Value
0.25 - 2^-53   sin(2*pi*(0.25 - 2^-53))         1.0
0.25           sin(2*pi*0.25)                   1.0
0.25 + 2^-53   sin(2*pi*(0.25 + 2^-53))         1.0
...            ...                              ...
0.75 - 2^-53   sin(2*pi*(0.75 - 2^-53))         -1.0
0.75           sin(2*pi*0.75)                   -1.0
0.75 + 2^-53   sin(2*pi*(0.75 + 2^-53))         -1.0

This means that the largest values are:

x       y              r * sin(theta) Output Value
0.25    1.0 - 2^-53    8.57
0.75    1.0 - 2^-53    -8.57

And given the slope of the sine curve at those points, we should be able to get nearby values easily.

And the smallest values are:

x              y        r * sin(theta) Output Value
0.0            any      0.0
0.5            2^-53    1.82*10^-24
0.5 + 2^-53    2^-53    -1.14*10^-23

Typical standard deviations in experimental PrivCount go from 10^3 to 10^10. I think we need the smallest possible noise to be 1.0, so this granularity looks OK. (But it might limit our standard deviations to about 10^23 or so, which could be a problem. I need to check this with Aaron.)
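
(The extreme values in these tables can be reproduced directly; my check, not from the branch:)

#include <math.h>
#include <stdio.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

int main(void) {
  double eps = ldexp(1.0, -53);               /* 2^-53 */
  printf("%.2f\n", sqrt(-2.0 * log(eps)));    /* 8.57: the largest r */
  printf("%.3g\n", sqrt(-2.0 * log(1.0 - eps))
                   * sin(2.0 * M_PI * 0.5));  /* 1.82e-24: smallest nonzero */
  return 0;
}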

comment:34 Changed 2 years ago by teor

There isn't any point trying to include values between 0 and 2^-53 in the output of the function, because both the Laplace and Gaussian functions compute 1.0 - crypto_rand_double(), which rounds away values smaller than 2^-53.
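
A two-line check of that rounding, under the usual binary64 assumption:

#include <assert.h>

int main(void) {
  assert(1.0 - 1e-20 == 1.0);   /* anything below 2^-53 rounds away */
  assert(1.0 - 0x1p-53 != 1.0); /* 2^-53 itself survives the subtraction */
  return 0;
}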

comment:35 in reply to:  34 ; Changed 2 years ago by catalyst

Replying to teor:

There isn't any point trying to include values between 0 and 2^-53 in the output of the function, because both the Laplace and Gaussian functions compute 1.0 - crypto_rand_double(), which rounds away values smaller than 2^-53.

Is there also no point in trying to fill the holes of missing representable values that are between the steps of size 2^-53? (losing one bit of precision with probability 0.5; two bits with probability 0.25; etc.) Also intermediate results with greater precision may occur (e.g., on x87 -- this is probably how you got greater than 53 bits of precision reported in your tests with the "naive" approach) and maybe we should allow those? Or does calling into tor_mathlog() round it back down to double precision anyway?

comment:36 in reply to:  35 ; Changed 2 years ago by yawning

Replying to catalyst:

Also intermediate results with greater precision may occur (e.g., on x87 -- this is probably how you got greater than 53 bits of precision reported in your tests with the "naive" approach) and maybe we should allow those?

Unless teor's system or compiler is outdated garbage it will use the SSE2 unit for floating point math.

comment:37 in reply to:  36 Changed 2 years ago by cypherpunks

Replying to yawning:

Unless teor's system or compiler is outdated garbage it will use the SSE2 unit for floating point math.

Latest systems support x87, latest compilers (excluding some garbage) support x87. They will use what you ask them to. The SSE2 unit is outdated garbage. SSEx is deprecated. A compiler that uses the SSE2 unit for floating point math by default is outdated garbage.

comment:38 in reply to:  35 Changed 2 years ago by teor

Replying to catalyst:

Replying to teor:

There isn't any point trying to include values between 0 and 2^-53 in the output of the function, because both the Laplace and Gaussian functions compute 1.0 - crypto_rand_double(), which rounds away values smaller than 2^-53.

Is there also no point in trying to fill the holes of missing representable values that are between the steps of size 2-53? (losing one bit of precision with probability 0.5; two bits with probability 0.25; etc.) Also intermediate results with greater precision may occur (e.g., on x87 -- this is probably how you got greater than 53 bits of precision reported in your tests with the "naive" approach) and maybe we should allow those? Or does calling into tor_mathlog() round it back down to double precision anyway?

I don't believe there is any point in trying for any more precision than N*2^-53 (N integer) in a generic function. If we need more than that for a specific distribution, we should change the transform we use for the distribution, or feed it random bits directly.

Edit: be more precise

Last edited 2 years ago by teor.

comment:39 Changed 2 years ago by teor

Status: needs_review → needs_revision

I've split the issues with calling functions into their own tickets.

I suggest we fix the broken unit tests, and then merge feature23061_v2_yawning.

comment:40 in reply to:  39 Changed 2 years ago by yawning

Replying to teor:

I've split the issues with calling functions into their own tickets.

I suggest we fix the broken unit tests, and then merge feature23061_v2_yawning.

It's probably also worth checking FLT_RADIX and DBL_MANT_DIG at build time, and refusing to build if the compiler's idea of a double is something exotic.

comment:41 Changed 2 years ago by teor

Parent ID: #23126

We want to fix this before we deploy HS v3 stats.

comment:42 Changed 2 years ago by nickm

Defer more needs_revision items to 0.3.3

comment:43 Changed 2 years ago by nickm

Milestone: Tor: 0.3.2.x-final → Tor: 0.3.3.x-final

comment:44 Changed 22 months ago by catalyst

Cc: catalyst added

comment:45 Changed 22 months ago by catalyst

Section 5.2 of https://pdfs.semanticscholar.org/2f2b/7a0d5000a31f7f0713a3d20919f9703c9876.pdf describes one way to sample uniformly from all the representable floating point numbers in [0, 1). It's not clear to me whether including the numbers with ULPs less than 2^-53 is required for the success of the snapping mitigation described in that paper.

comment:46 in reply to:  45 Changed 22 months ago by teor

Replying to catalyst:

Section 5.2 of https://pdfs.semanticscholar.org/2f2b/7a0d5000a31f7f0713a3d20919f9703c9876.pdf describes one way to sample uniformly from all the representable floating point numbers in [0, 1). It's not clear to me whether including the numbers with ULPs less than 2^-53 is required for the success of the snapping mitigation described in that paper.

It isn't required, and might even be counterproductive.
The floating point numbers in the proof in section 5.2 are of the form N*2^-53, N integer.
(The proof uses a significand of 52 bits, and refers to 2^-53 repeatedly.)

comment:47 in reply to:  45 Changed 22 months ago by yawning

https://i.imgur.com/WTDnnwE.gif

Replying to catalyst:

Section 5.2 of https://pdfs.semanticscholar.org/2f2b/7a0d5000a31f7f0713a3d20919f9703c9876.pdf describes one way to sample uniformly from all the representable floating point numbers in [0, 1). It's not clear to me whether including the numbers with ULPs less than 2^-53 is required for the success of the snapping mitigation described in that paper.

See "Smoothing" under Section 5.1.

It isn't required, and might even be counterproductive.

Indeed.

comment:48 Changed 21 months ago by teor

We should revise or close these tickets based on Appendix C in the latest privcount-shamir spec:

https://github.com/teor2345/privcount_shamir/blob/noise-limits/doc/xxx-privcount-with-shamir.txt

The good news is that we probably don't need to care about extreme values or binning, because properly implemented noise has known limits, and doesn't need binning.

comment:49 Changed 20 months ago by teor

0.2.8 is EOL, so we won't be backporting to it.

comment:50 Changed 20 months ago by teor

Keywords: 028-backport-maybe removed

0.2.8 is EOL, so we won't be backporting to it.

comment:51 Changed 19 months ago by nickm

Milestone: Tor: 0.3.3.x-final → Tor: 0.3.4.x-final

comment:52 Changed 19 months ago by teor

Owner: changed from nickm to teor
Status: needs_revision → assigned

If you don't mind, Nick, I'll do this when I do the rest of the PrivCount in Tor code.
I have to write a function like this in Rust anyway, and it should be trivial to port to C.

comment:53 Changed 19 months ago by nickm

Sounds fine; I had fun programming the stuff above, and I look forward to what you do here.

comment:54 Changed 19 months ago by nickm

Keywords: 030-backport removed

Remove 030-backport from all open tickets that have it: 0.3.0 is now deprecated.

comment:55 Changed 19 months ago by nickm

Remove 026-backport from all open tickets that have it: 0.2.6 has been deprecated for some while.

comment:56 Changed 18 months ago by teor

Parent ID: #23126 → #25263

comment:57 Changed 17 months ago by nickm

Keywords: 034-triage-20180328 added

comment:58 Changed 17 months ago by nickm

Keywords: 034-removed-20180328 added

Per our triage process, these tickets are pending removal from 0.3.4.

comment:59 Changed 17 months ago by nickm

Milestone: Tor: 0.3.4.x-final → Tor: unspecified

These tickets, tagged with 034-removed-*, are no longer in-scope for 0.3.4. We can reconsider any of them, if time permits.

comment:60 Changed 16 months ago by cypherpunks

Keywords: 026-backport-maybe removed
Edited 16 months ago by cypherpunks.

comment:61 Changed 14 months ago by teor

Keywords: 031-unreached-backport added; 031-backport removed

0.3.1 is end of life, there are no more backports.
Tagging with 031-unreached-backport instead.

comment:62 Changed 14 months ago by nickm

Keywords: 035-roadmap-subtask added
Milestone: Tor: unspecified → Tor: 0.3.5.x-final

comment:63 Changed 14 months ago by teor

Parent ID: #25263 → #22898

comment:64 Changed 14 months ago by teor

Parent ID: #22898 → #26637

comment:65 Changed 13 months ago by nickm

Keywords: 035-triaged-in-20180711 added

comment:66 Changed 13 months ago by nickm

Sponsor: SponsorQ → SponsorX-can

comment:67 Changed 13 months ago by nickm

Sponsor: SponsorX-can → SponsorV-can

comment:68 Changed 11 months ago by nickm

Milestone: Tor: 0.3.5.x-final → Tor: 0.3.6.x-final

Deferring privcount tickets in 0.3.5 to 0.3.6

comment:69 Changed 10 months ago by riastradh

If you take the uniform distribution on [0,1] or (0,1), i.e. the Lebesgue measure, and round it to the nearest floating-point number, the distribution on the result is the natural notion of the uniform distribution on floating-point numbers in [0,1]: the probability of each floating-point number is the width of the interval of real numbers rounded to that floating-point number.

Note that even if you round a sample from the open interval (0,1) of real numbers, the result might be rounded to 0 or 1, if it is in (0, 2^(emin - p - 1)] or [1 - eps/4, 1):

  • The interval near 0 is of negligible size in any format larger than binary16 -- less than 2^-126 in binary32 even if you use flush-to-zero, less than 2^-1022 in binary64 -- so you don't need to worry about it and can safely compute, e.g., 1/x or log(x).
  • However, the probability of getting a real number in the interval near 1 is 2^-54 in binary64, which is small but not negligible, so I recommend not rejecting it unless you must avoid 1, for instance because you need to compute log1p(-x).

In infinite-precision floating-point, the exponent is geometrically distributed in {-1, -2, -3, ...}, and the fractional part of the significand is a uniformly distributed infinite bit string. If we round this to finite precision, the exponent remains the same (unless rounded to subnormal or zero) and the significand is rounded to p bits. The outcome of rounding is not actually affected by more than a finite number of significand bits, with one tricky exception, so you don't need to sample an infinite stream of bits and round it; to get this, you can:

  1. flip a coin until you get one heads, to pick a geometrically distributed exponent;
  2. flip a coin >p+1 times to choose the uniformly distributed fractional part of a significand, and round it to p bits with a trick to avoid a bias toward `even'.

If you drew p bits for the fractional part of the significand, there would be no rounding, and you would exclude from the possible significands the interval (2 - eps/2, 2] of real numbers that are rounded to 2 -- small but not negligible. If you draw >p bits and round it, then no real number in [1, 2] is excluded.

However, there is a slight bias toward `even' outcomes -- that is, outcomes whose least significant bit is zero -- because there's a nonzero probability that the finite significand you choose is a tie, exactly between two floating-point numbers, which is broken by choosing the even option. This event has probability zero in Lebesgue measure. You can simply break the ties by drawing >p+1 bits and forcing the least significant bit to be 1.

The attached file uniform01.c implements this algorithm in the function sample_uniform_01, with parameters chosen for IEEE 754 binary64 ('double-precision') floating-point.

As it happens, the variant algorithm that chooses exactly p bits and doesn't round them gives a uniform distribution on [0, 1 - eps/4), which incidentally effectively never returns zero, and so yields what you might think of as 'the uniform distribution on floating-point numbers in (0, 1)'. I named it sample_uniform_01_open, and added some paranoia to clamp the result at the smallest subnormal in case your PRNG is broken. But it also works to just do rejection sampling on sample_uniform_01 and reduce the amount of code to worry about.

Note: This code does not run in constant time -- the time it takes is proportional to the magnitude of the exponent. (In principle, it may also take variable time for subnormal arithmetic, but the probability of that is negligible for any format larger than binary16.)

You could make it run in constant time by always flipping the maximum number of coins for an exponent, and counting the leading zero bits. To improve performance, you can safely truncate the exponent at, say, 128 -- if you flip 128 coins and you get tails every time, either your coin flipper is broken or there's a glitch in the matrix.
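
Condensed into code (my sketch, not the attached uniform01.c), the exact-p variant from the last two paragraphs might look like this, with random_bits() standing in for the caller's CSPRNG, __builtin_clzll assumed available (GCC/Clang), and subnormal rounding at astronomically unlikely exponents glossed over:

#include <math.h>
#include <stdint.h>

extern uint64_t random_bits(void); /* hypothetical: 64 uniform random bits */

double
sample_uniform_01_open(void)
{
  int exponent = -1;
  uint64_t x;

  /* Geometric exponent: count tails (zero bits) before the first heads,
   * 64 coin flips at a time, truncated so a broken RNG can't loop. */
  while ((x = random_bits()) == 0) {
    exponent -= 64;
    if (exponent < -1074)
      return 0x1p-1074; /* clamp at the least subnormal: RNG is broken */
  }
  exponent -= __builtin_clzll(x);

  /* Exactly p-1 = 52 fresh fraction bits; the leading 1 is implicit, so
   * the significand is uniform in [1, 2) and no rounding occurs. */
  double significand = 1.0 + ldexp((double)(random_bits() >> 12), -52);
  return ldexp(significand, exponent);
}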

Last edited 10 months ago by riastradh.

Changed 10 months ago by riastradh

Attachment: uniform01.c added

uniform [0,1] sampler

comment:70 Changed 10 months ago by riastradh

I should add about sample_uniform_01 vs sample_uniform_01_open that, generally, the only reason you should use sample_uniform_01_open is as a drop-in replacement for the current crypto_rand_double to avoid breaking downstream code that assumes it can't get the floating-point number 1.

If you want to take the log of the sample without getting -infinity, for instance to generate an exponential sample, you want a floating-point number in (0, 1], not a floating-point number in [0, 1). (Even better, for an exponential sample, you can use -log(p/2) if a fair coin toss turns up heads, and -log1p(-p/2) if it turns up tails -- then the sampler supports high precision negative and positive outputs; flip another coin to pick the sign, for a Laplace sampler.) If for some reason you are tempted to compute log1p(-p) or 1/(1 - p) downstream where p is uniform in [0, 1), better to change what happens downstream to compute log(p) or 1/p where p is in (0, 1] -- log1p and 1/(1 - p) are ill-conditioned near 1, so it's always better to avoid relying on computations involving such intermediate quantities.

The Mironov paper analyzes the use of (a) a correct uniform distribution like I provided (actually, sometimes he vacillates between (0,1) and (0,1] but I don't think it matters for the proof except maybe requiring a small adjustment to Eq. (7)), together with (b) a correctly rounded natural logarithm, and proves a differential privacy result, Theorem 1, under those assumptions.

The system's libm is unlikely to have a correctly rounded log, with maximum 0.5ulp error, corresponding to 2^-53 relative error, but it is likely to have something very close to that -- almost certainly much better than 1ulp or 2^-52 relative error. As a rough approximation, I think you can just take eta in Sec. 5.2 of the paper to be 2^-52 instead of 2^-53, and instead of the (1/lambda + 2^-49*B/lambda)-differential privacy asserted in Theorem 1, you get (1/lambda + 2^-48*B/lambda)-differential privacy.

P.S. If in your Laplace sampler you use the exponential sampler I described above -- toss a fair coin to decide whether to use -log(p/2) or -log1p(-p/2), instead of the standard unconditional -log(p) -- I bet you'll get a slightly better differential privacy bound. But it would take a bit of work to identify the better bound and prove it, and really, most of this is just to get confidence that the bound doesn't explode and is reasonably small in concrete parameters, not to prove the bestest most tightest bound possible.
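
A sketch of that coin-toss trick (mine; sample_uniform_01_open() and random_bit() are hypothetical primitives along the lines above):

#include <math.h>

extern double sample_uniform_01_open(void); /* uniform in (0, 1) */
extern int random_bit(void);                /* one fair coin flip */

/* Exp(1): heads maps p into (0, 1/2], tails into [1/2, 1), so together
 * the branches cover a uniform variate with high precision at both tails. */
double
sample_exponential(void)
{
  double p = sample_uniform_01_open();
  return random_bit() ? -log(p / 2) : -log1p(-p / 2);
}

/* Laplace(0, b): an exponential magnitude with a coin-flipped sign. */
double
sample_laplace(double b)
{
  double e = b * sample_exponential();
  return random_bit() ? e : -e;
}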

comment:71 Changed 10 months ago by teor

Status: assigned → needs_revision

Thanks for this code! I'll integrate it into crypto_rand_double(), and make the necessary replacements in its callers.

This might not happen for a little while, because I have some rust code to write first.

comment:72 Changed 10 months ago by teor

Cc: mikeperry added

If Mike wants to modify crypto_rand_double(), I could do the review.

comment:73 Changed 10 months ago by nickm

Milestone: Tor: 0.3.6.x-final → Tor: 0.4.0.x-final

Tor 0.3.6.x has been renamed to 0.4.0.x.

comment:74 Changed 7 months ago by teor

Keywords: 040-unreached-20190109 added
Milestone: Tor: 0.4.0.x-final → Tor: unspecified

These tasks aren't essential for a PrivCount proof of concept: moving them to Tor: unspecified.

comment:75 Changed 5 months ago by teor

Keywords: fast-fix added

comment:76 Changed 5 months ago by rl1987

Cc: rl1987 added

comment:77 Changed 2 months ago by teor

Owner: teor deleted
Status: needs_revision → assigned