Opened 2 years ago
Last modified 6 months ago
#23061 assigned defect
crypto_rand_double() should produce all possible outputs on platforms with 32-bit int
Reported by: | teor | Owned by: | |
---|---|---|---|
Priority: | Medium | Milestone: | Tor: unspecified |
Component: | Core Tor/Tor | Version: | Tor: 0.2.2.14-alpha |
Severity: | Normal | Keywords: | fast-fix, tor-relay, security-low, privcount, 029-backport, review-group-22, 034-triage-20180328, 034-removed-20180328, 031-unreached-backport, 035-roadmap-subtask, 035-triaged-in-20180711, 040-unreached-20190109 |
Cc: | catalyst, mikeperry, rl1987 | Actual Points: | 0.5 |
Parent ID: | #26637 | Points: | 0.1 |
Reviewer: | Sponsor: | SponsorV-can |
Description
On 32-bit platforms, crypto_rand_double() only produces 1 in every 2 million possible values between 0 and 1.
This happens because:
- crypto_rand_double() divides a random unsigned int by UINT_MAX
- an unsigned int on 32-bit platforms is 32 bits
- the mantissa on a double is 53 bits
So crypto_rand_double() doesn't fill the lower 21 bits with random values.
This makes the rep_hist_format_hs_stats() noise more predictable on 32-bit platforms.
This fix shouldn't affect the unit tests, because they pass on 64-bit.
Child Tickets
Ticket | Status | Owner | Summary | Component |
---|---|---|---|---|
#25368 | closed | Update the Tor Rust coding standards for new types | Core Tor/Tor |
Attachments (1)
Change History (78)
comment:1 Changed 2 years ago by
comment:2 Changed 2 years ago by
Turns out that we only get 32 bits of random mantissa on 64-bit, so fixing this bug might be harder than I thought:
/* Test for regression to bug 23061, where we produced excessively granular * random double values. */ int low_bits_zero_count = 0; do { /* First check the value is within the range */ d = crypto_rand_double(); tt_assert(d >= 0); tt_assert(d < 1.0); /* Now check the granularity, by finding the last bit of the result. * doubles have a 53 bit mantissa. */ d *= pow(2, 32); d -= trunc(d); low_bits_zero_count++; /* If the granularity is correct, this will fail with probability * 1 in 2**100, which is approximately the RAM bit error rate. */ tt_assert(low_bits_zero_count <= 100); } while (d == 0.0);
comment:3 Changed 2 years ago by
Actual Points: | → 0.5 |
---|---|
Keywords: | 031-backport 030-backport 029-backport 028-backport-maybe 027-backport-maybe 026-backport-maybe added |
Status: | new → needs_review |
See my branch rand-double-029, which is based on maint-0.2.9. It comes with unit tests and a rewritten crypto_rand_double() that uses rejection sampling. The new unit tests pass for me on macOS 10.12 x86_64 and i386.
We might want to backport this to 0.2.6, where we added laplace noise using this function. But when I tried to cherry-pick the commits, it was non-trivial. (Someone else is welcome to do this!)
The only users of crypto_rand_double() before 0.2.6 are tests, so we don't need to backport it any further.
comment:4 Changed 2 years ago by
Summary: | crypto_rand_double() should produce all possible outputs on 32-bit platforms → crypto_rand_double() should produce all possible outputs on platforms with 32-bit int |
---|
The title is wrong, this actually affects LP64 platforms as well.
And the bias affects all platforms.
comment:5 Changed 2 years ago by
Where applicable, we should use the algorithm from http://allendowney.com/research/rand/ . Zack Weinberg wrote it up in C++ with https://github.com/TheTorProject/stegotorus/blob/master/src/rng.cc#L86 ; that may be worth looking at for inspiration.
Teor says (copying with permission) :
23:34 <+teor> Yes, that code is definitely better. But we'll need the following tweaks: 23:35 <+teor> 1. Check if the internal double representation is the one we expect, and if not, fall back to the code in my existing branch 23:35 <+teor> (There's no requirement in the C standard that double has a particular binary representation) 23:36 <+teor> 2. Add a unit test that shows that *if* we're using IEEE754 doubles, then we can get a few values in (0, 1/2**64) 23:37 <+teor> 3. Fix the crypto_rand() mocking to stop assuming that we only ask for 64 bits
comment:6 Changed 2 years ago by
Owner: | set to nickm |
---|---|
Status: | needs_review → accepted |
comment:7 Changed 2 years ago by
Status: | accepted → needs_revision |
---|
comment:8 Changed 2 years ago by
I have a rough attempt at implementing a rough approximation downey's algorithm in a "rand-double-029" branch in my public repo; how does the approach look? How scary are my shortcuts? Is there a better way to do this?
comment:9 follow-up: 10 Changed 2 years ago by
Yawning suggested https://github.com/libressl-portable/openbsd/blob/master/src/lib/libc/stdlib/drand48.c
It might be ok to only have 48 bits of entropy. Or we could extend that scheme to give us 52 (or >52 with subnormals).
comment:10 follow-up: 11 Changed 2 years ago by
Replying to teor:
It might be ok to only have 48 bits of entropy. Or we could extend that scheme to give us 52 (or >52 with subnormals).
Would we have to do something different to make sure the subnormal values are uniform with the others?
comment:11 Changed 2 years ago by
Replying to catalyst:
Replying to teor:
It might be ok to only have 48 bits of entropy. Or we could extend that scheme to give us 52 (or >52 with subnormals).
Would we have to do something different to make sure the subnormal values are uniform with the others?
If the architecture supports subnormals, ldexp() generates them, and + correctly handles them, then this code will work:
unsigned short rseed[4]; crypto_rand(rseed, sizeof rseed); return ldexp((double) rseed[0], -64) + ldexp((double) rseed[1], -48) + ldexp((double) rseed[2], -32) + ldexp((double) rseed[3], -16);
It's much better than bit-stuffing, because it doesn't depend on how subnormals are implemented.
Modified from:
https://github.com/libressl-portable/openbsd/blob/master/src/lib/libc/stdlib/drand48.c
comment:12 Changed 2 years ago by
I think the multiple ldexp()
approach is more understandable than Downey's algorithm. Are we performance-critical here?
comment:13 follow-up: 14 Changed 2 years ago by
In ba199f789922484b8a2b2efd909ad3dab124dd62, why define EXPECTED_RAND_MANTISSA_BITS
as a hardcoded number instead of using DBL_MANT_DIG
? (assuming you've already tested for FLT_RADIX==2
)
Not directly related to any of the changes, but crypto_rand_double()
could use a comment clarifying its contract. That would also help us make clear what we're trying to achieve. Are the double values assumed to be uniformly distributed in the range 0.0 <= d < 1.0? How much entropy is it supposed to have? Is it supposed to produce all representable double values? (assuming only the values that are more frequent than 2**-b
where b
is the claimed entropy; for IEC 60559 doubles, this means we can ignore subnormals unless we're using more than a thousand bits of entropy)
Regarding the ldexp()
approach, we could save operations by doing them in 32-bit chunks (because I think we can assume at least 32 bits of mantissa for a double).
comment:14 Changed 2 years ago by
Replying to catalyst:
In ba199f789922484b8a2b2efd909ad3dab124dd62, why define
EXPECTED_RAND_MANTISSA_BITS
as a hardcoded number instead of usingDBL_MANT_DIG
? (assuming you've already tested forFLT_RADIX==2
)
Because we might decide that the contract is N bits of mansissa, rather than DBL_MANT_DIG.
(For example, on my machine, we get 56. Before the patch, we got 31 on i386 and 32 on x86_64.)
Not directly related to any of the changes, but
crypto_rand_double()
could use a comment clarifying its contract. That would also help us make clear what we're trying to achieve. Are the double values assumed to be uniformly distributed in the range 0.0 <= d < 1.0? How much entropy is it supposed to have? Is it supposed to produce all representable double values? (assuming only the values that are more frequent than2**-b
whereb
is the claimed entropy; for IEC 60559 doubles, this means we can ignore subnormals unless we're using more than a thousand bits of entropy)
Yes, I agree we need to define the contract better.
We need uniform distribution.
While it would be nice to deliver all values in the range, values below 10^{-13} (or 2^{-43}) are more likely to be produced by RAM bit errors than entropy, so at some point we can ignore small values.
Regarding the
ldexp()
approach, we could save operations by doing them in 32-bit chunks (because I think we can assume at least 32 bits of mantissa for a double).
As long as ldexp() supports 32 bits of input.
And we might need to modify the ldexp() approach, because the current one only gives 64 bits of entropy. The algorithm could be:
- Generate a random 64-bit double using ldexp()
- If it's zero, generate a random 64-bit double and scale down using ldexp() to the next 64 bits
Terminate when we reach our desired entropy (the contract), which should be based on:
- the RAM bit error rate,
- the maximum subnormal double bits (DBL_EXP_DIG + ceil(log2(DBL_MANT_DIG)).
Unless we expect really good RAM bit error rates in future RAM, I suggest we go for 64 or 128 bits of entropy, and leave it there. (Also, the likelihood of any smaller numbers is miniscule, and some calculations will turn them into zeroes or infinities anyway.)
comment:15 follow-up: 17 Changed 2 years ago by
We have a lot of options here, depending on what we want! Let's try to collect the possible goals, and see which we care about.
Here are some goals I think we probably care about, but I could be wrong:
- We should return a number uniformly at random in the range [0, 1.0). (That is, for all "suitable" x<y in [0,1.0), we should return a value in [x,y] with probability very close to y-x. Defining "suitable" and "very close" will be important, and might not include every possible double.)
- Return outputs with at least some minimum granularity. (i.e, for some granularity delta, if x is a possible output, and x ± delta is in [0.0, 1.0), then there exists a possible output between x and x ± delta other than x.)
- Run with reasonable efficiency.
- Run in constant time.
- Use the whole mantissa, or almost the whole mantissa.
- Provide at least some number of bits of entropy in the output.
- Work at least to a minimal degree on all c99 platforms.
Here are some goals I think we do not care about, but I could be wrong:
- Work perfectly on systems where FLT_RADIX is not 2.
- Provide identical output on all architectures regardless of floating-point implementation.
- Return every possible output with some probability. (For example, values less than 1e-300 are _possible_ doubles. But they have cumulative probability of 1e-300, which is less likely than just guessing the RNG seed on the first try.)
- Possibly return subnormal values.
- Perfect behavior on corner cases with total probability less than some epsilon (maybe 2^{-96)? }
- Run as fast as possible.
comment:16 Changed 2 years ago by
I think one way we could choose between our goals is to look at how the function is used (and could be used for privcount-in-tor's guassians, and could be used in other places where we synthesise a random double using a similar method).
For example, if we used the naïve algorithm that divides [0, UINT64_MAX] by (UINT64_MAX+1), I think we get a pattern like:
0, 1/2^{64}, 2/2^{64}, 3/2^{64}, ... , 2^{53}/2^{64}, 2^{53}/2^{64}, (2^{53}+2)/2^{64}, ... , (2^{64} - 2^{11} - 2^{10})/2^{64} (~2^{11} times), 1.0 (~2^{10} times)
due to representation limits (the details would vary depending on the rounding mode and possibly the hardware).
I wonder if this satisfies the requirements for our random noise distributions (which is where we mainly use this function) after being passed through the laplace and guassian transforms.
We should document their range and granularity in a similar level of detail, too.
comment:17 Changed 2 years ago by
Replying to nickm:
Here are some goals I think we probably care about, but I could be wrong:
If that's what you want:
double uint64_to_dbl_0_1(uint64_t x) { /* Can't merely check for __STDC_IEC_559__ because not every compiler we care about defines it. */ #if FLT_RADIX != 2 #error FLT_RADIX != 2, your system is the most special of them all. #endif x >>= (sizeof(double) * CHAR_BIT) - DBL_MANT_DIG; return (DBL_EPSILON/2) * x; }
- We should return a number uniformly at random in the range [0, 1.0). (That is, for all "suitable" x<y in [0,1.0), we should return a value in [x,y] with probability very close to y-x. Defining "suitable" and "very close" will be important, and might not include every possible double.)
Check.
- Return outputs with at least some minimum granularity. (i.e, for some granularity delta, if x is a possible output, and x ± delta is in [0.0, 1.0), then there exists a possible output between x and x ± delta other than x.)
Check.
- Run with reasonable efficiency.
Check.
- Run in constant time.
Check.
- Use the whole mantissa, or almost the whole mantissa.
Check.
- Provide at least some number of bits of entropy in the output.
Check.
- Work at least to a minimal degree on all c99 platforms.
If people want to run tor on something that is exotic to the point where this sort of approach breaks, they can send patches.
Yes this still leaves out "possible" values, but it trivially accomplishes uniform, fast, and constant time.
Edit: Left out a }
.
comment:18 follow-up: 19 Changed 2 years ago by
Status: | needs_revision → needs_review |
---|
That actually _does_ look fairly good. Let's review that approach.
comment:19 Changed 2 years ago by
Replying to nickm:
That actually _does_ look fairly good. Let's review that approach.
As an incremental improvement:
double uint64_to_dbl_0_1(uint64_t x) { /* Can't merely check for __STDC_IEC_559__ because not every compiler we care about defines it. */ #if FLT_RADIX != 2 #error FLT_RADIX != 2, your system is the most special of them all. #endif uint64_t mask = ((uint64_t)1 << DBL_MANT_DIG) - 1; double half_epsilon = 1.0 / (mask + 1); x &= mask; return half_epsilon * x; }
While the assumption I made about how DBL_EPSILON
is defined in relation to FLT_RADIX
and DBL_MANT_DIG
should hold everywhere that we care about, this is cleaner.
If there is to be unit tests or build time tests in autoconf, the following invariants should hold:
DBL_MANT_DIG > 0
half_epsilon == DBL_EPSILON/2
,half_epsilon != DBL_EPSILON
uint64_to_dbl_0_1(UINT64_MAX) != 1.0
uint64_to_dbl_0_1(UINT64_MAX) + (DBL_EPSILON/2) == 1.0
uint64_to_dbl_0_1(UINT64_MAX) + DBL_EPSLION == 1.0
uint64_to_dbl_0_1(0) == 0
comment:20 Changed 2 years ago by
Keywords: | review-group-22 added |
---|
comment:21 follow-ups: 22 25 27 Changed 2 years ago by
Okay, I've tried it out, and gotten some results. Apparently when Yawning's code above is used, Teor's tests no longer report that 53 bits of mantissa are set -- only 52 bits are set. I think this is okay, but can we do better? See branch here for more: feature23061_v2_yawning
.
I tried the ldexp() trick too, but that fails the test that uint64_to_dbl_0_1(UINT64_MAX) < 1.0
if have all four ldexp() calls. If I remove the lowest-magnitude one, it's fine, but of course we only get 48 bits.
comment:22 follow-up: 30 Changed 2 years ago by
Replying to nickm:
I tried the ldexp() trick too, but that fails the test that
uint64_to_dbl_0_1(UINT64_MAX) < 1.0
if have all four ldexp() calls. If I remove the lowest-magnitude one, it's fine, but of course we only get 48 bits.
Oh right, it'll round off the bits past DBL_MANT_DIG
. Maybe mask off all but the high 5 bits?
comment:23 Changed 2 years ago by
If we want more entropy than DBL_MANT_DIG
we could look at the exponent to decide how many bits to mask when adding the last chunk of randomness, but of course that might not be constant time. (I'm not sure we can assume basic floating point arithmetic will be constant time, so we might not care.)
comment:24 Changed 2 years ago by
comment:25 Changed 2 years ago by
Replying to nickm:
Okay, I've tried it out, and gotten some results. Apparently when Yawning's code above is used, Teor's tests no longer report that 53 bits of mantissa are set -- only 52 bits are set. I think this is okay, but can we do better? See branch here for more:
feature23061_v2_yawning
.
The routine I provided has this property on "sane" systems:
uint64_to_dbl_0_1(1) == 1.1102230246251565e-16 /* 2^-53 */
I don't think anything more is actually useful.
comment:26 Changed 2 years ago by
The potential problem is less about omitting rare small values and more about missing lots of representable intermediate values. I think with this kind of approach we're losing half of the representable values between 0.25 and 0.5-2^{-54}; 3/4 of the possible values between 0.125 and 0.25-2^{-55}, etc., which was kind of the motivation for the Downey paper. We might decide we don't care, but I think that's highly dependent on what the caller ends up doing with the results. (We should also document this choice; the comment currently focuses on small rare values instead of "holes" in the range.)
edit: fix some exponents I miscomputed due to a sign error
comment:27 Changed 2 years ago by
Replying to nickm:
but can we do better?
Yeah, fix the broken test.
do { d = crypto_rand_double(); // d = n * pow(2, -53) (Where n is the post mask sampled integer) /* Now check the granularity, by finding the lower bits of the result. */ d *= pow(2, EXPECTED_RAND_MANTISSA_BITS); // d *= pow(2, 53) // d = n * pow(2, -53) * pow(2, 53) // d = n d -= trunc(d); // d -= d (trunc(d) = d) } while (d == 0.0);
comment:28 Changed 2 years ago by
If the test assuming f(n-1) < f(n) < f(n+1)
is ok, then you can do:
double expected_increment = pow(FLT_RADIX, -DBL_MANT_DIG); assert(uint64_to_dbl_0_1(1) == expected_increment); // Hell, stick this in a for loop that randomly samples n, and asserts if you really care. assert(uint64_to_dbl_0_1(2) - uint64_to_dbl_0_1(1) == expected_increment);
comment:29 Changed 2 years ago by
Replying to nickm:
but can we do better?
On a more serious attempt at giving an answer, as a non-portable "better" (that I still think is largely pointless), you can probably use a variant of my solution with long double
and cast to double
.
Note that the behavior of such a routine is entirely compiler and architecture dependent, and it'll probably be a lot slower when it works.
If it were up to me, I would fully document the behavior and be done with it, under the assumption that anyone doing something that requires a less leaky floating point abstraction can either scale their values correctly (and use integer math) or write their own sampler.
comment:30 Changed 2 years ago by
Replying to catalyst:
Replying to nickm:
I tried the ldexp() trick too, but that fails the test that
uint64_to_dbl_0_1(UINT64_MAX) < 1.0
if have all four ldexp() calls. If I remove the lowest-magnitude one, it's fine, but of course we only get 48 bits.
Oh right, it'll round off the bits past
DBL_MANT_DIG
. Maybe mask off all but the high 5 bits?
This is still susceptible to the rounding mode in use at the time.
So yawning's bit twiddling is going to be more reliable and robust.
comment:32 Changed 2 years ago by
If we use uint64_to_dbl_0_1()
, then each random double is of the form N*2^{-53}, N = 0..(2^{53} - 1).
This is the image that sample_laplace_distribution() produces from that domain:
rand_double | Output Calculation | Output Value |
0 | INT64_MIN | - 2^{63} |
1 * 2^{-53} | - ln(2*2^{-53}) * delta_f / epsilon | 36.04 * delta_f / epsilon |
2 * 2^{-53} | - ln(4*2^{-53}) * delta_f / epsilon | 35.35 * delta_f / epsilon |
... | ... | ... |
0.5 - 2^{-53} | - ln(1.0 - 2*2^{-53}) * delta_f / epsilon | 2.22 * delta_f / epsilon |
0.5 | - ln(1.0) * delta_f / epsilon | 0.0 * delta_f / epsilon |
0.5 to 1.0 - 2^{-53} has the same pattern, but mirrored and negative.
The greatest delta_f / epsilon
we currently have is 2048 / 0.3 = 6826.66
, so the maximum noise is 246033, with a discontinuity that destroys the signal when 0.0 maps to INT64_MIN (let's track that bug in #23323).
The gap between 0.5 and 0.5 - 2^{-53} concerns me, because I think we need the smallest possible noise to be 1.0
, rather than 2.22 * delta_f / epsilon = 2.22 * 8 / 0.3 = 59.2
. But maybe this granularity is OK.
comment:33 Changed 2 years ago by
Using the code in nickm/privcount_nm_v2_shake_032
, this is the image that sample_unit_gaussian()
would produce from that domain, if it used crypto_rand_double()
to generate x
and y
(see #23147).
First, let's look at y
and r
:
y | r Output Calculation | r Output Value |
0 | sqrt(-2.0*ln(1.0 - 0.0) | -0.0 |
1 * 2^{-53} | sqrt(-2.0*ln(1.0 - 2^{-53})) | 1.49*10^{-8} |
2 * 2^{-53} | sqrt(-2.0*ln(1.0 - 2*2^{-53})) | 2.11*10^{-8} |
... | ... | ... |
1.0 - 2*2^{-53} | sqrt(-2.0*ln(2*2^{-53})) | 8.49 |
1.0 - 2^{-53} | sqrt(-2.0*ln(2^{-53})) | 8.57 |
Then x
and sin(theta)
, near-zero points:
x | sin(theta) Output Calculation | sin(theta) Output Value |
0 | sin(2*pi*0.0) | 0.0 |
1 * 2^{-53} | sin(2*pi*2^{-53}) | 6.98*10^{-16} |
... | ... | ... |
0.5 - 2^{-53} | sin(2*pi*(0.5 - 2^{-53})) | 1.01*10^{-15} |
0.5 | sin(2*pi*0.5) | 1.22*10^{-16} |
0.5 + 2^{-53} | sin(2*pi*(0.5 + 2^{-53})) | -7.66*10^{-16} |
... | ... | ... |
1.0 - 2^{-53} | sin(2*pi*(1.0 - 2^{-53})) | -1.13*10^{-15} |
Then x
and sin(theta)
, near-one points:
x | sin(theta) Output Calculation | sin(theta) Output Value |
0.25 - 2^{-53} | sin(2*pi*(0.25 - 2^{-53})) | 1.0 |
0.25 | sin(2*pi*0.25) | 1.0 |
0.25 + 2^{-53} | sin(2*pi*(0.25 + 2^{-53})) | 1.0 |
... | ... | ... |
0.75 - 2^{-53} | sin(2*pi*(0.75 - 2^{-53})) | -1.0 |
0.75 | sin(2*pi*0.75) | -1.0 |
0.75 + 2^{-53} | sin(2*pi*(0.75 + 2^{-53})) | -1.0 |
This means that the largest values are:
x | y | r * sin(theta) Output Value |
0.25 | 1.0 - 2^{-53} | 8.57 |
0.75 | 1.0 - 2^{-53} | -8.57 |
And given the slope of the sine curve at those points, we should be able to get nearby values easily.
And the smallest values are:
x | y | r * sin(theta) Output Value |
0.0 | any | 0.0 |
0.5 | 2^{-53} | 1.82*10^{-24} |
0.5 + 2^{-53} | 2^{-53} | -1.14*10^{-23} |
Typical standard deviations in experimental PrivCount go from 10^{3} to 10^{10}. I think we need the smallest possible noise to be 1.0
, so this granularity looks OK. (But it might limit our standard deviations to about 10^{23} or so, which could be a problem. I need to check this with Aaron.)
comment:34 follow-up: 35 Changed 2 years ago by
There isn't any point trying to include values between 0 and 2^{-53} in the output of the function. Because both the Laplace and Gaussian functions do 1.0 - crypto_rand_double()
, which rounds values smaller than 2^{-53}.
comment:35 follow-ups: 36 38 Changed 2 years ago by
Replying to teor:
There isn't any point trying to include values between 0 and 2^{-53} in the output of the function. Because both the Laplace and Gaussian functions do
1.0 - crypto_rand_double()
, which rounds values smaller than 2^{-53}.
Is there also no point in trying to fill the holes of missing representable values that are between the steps of size 2^{-53}? (losing one bit of precision with probability 0.5; two bits with probability 0.25; etc.) Also intermediate results with greater precision may occur (e.g., on x87 -- this is probably how you got greater than 53 bits of precision reported in your tests with the "naive" approach) and maybe we should allow those? Or does calling into tor_mathlog()
round it back down to double precision anyway?
comment:36 follow-up: 37 Changed 2 years ago by
Replying to catalyst:
Also intermediate results with greater precision may occur (e.g., on x87 -- this is probably how you got greater than 53 bits of precision reported in your tests with the "naive" approach) and maybe we should allow those?
Unless teor's system or compiler is outdated garbage it will use the SSE2 unit for floating point math.
comment:37 Changed 2 years ago by
Replying to yawning:
Unless teor's system or compiler is outdated garbage it will use the SSE2 unit for floating point math.
Latest systems support x87, latest compilers (excl. some garbage) support x87. They will use what you ask them to. SSE2 unit is an outdated garbage. SSEx is deprecated. Compiler which will use the SSE2 unit for floating point math by default is an outdated garbage.
comment:38 Changed 2 years ago by
Replying to catalyst:
Replying to teor:
There isn't any point trying to include values between 0 and 2^{-53} in the output of the function. Because both the Laplace and Gaussian functions do
1.0 - crypto_rand_double()
, which rounds values smaller than 2^{-53}.
Is there also no point in trying to fill the holes of missing representable values that are between the steps of size 2^{-53}? (losing one bit of precision with probability 0.5; two bits with probability 0.25; etc.) Also intermediate results with greater precision may occur (e.g., on x87 -- this is probably how you got greater than 53 bits of precision reported in your tests with the "naive" approach) and maybe we should allow those? Or does calling into
tor_mathlog()
round it back down to double precision anyway?
I don't believe there is any point in trying for any more precision than N*2^{-53} (N integer) in a generic function. If we need more than that for a specific distribution, we should change the transform we use for the distribution, or feed it random bits directly.
Edit: be more precise
comment:39 follow-up: 40 Changed 2 years ago by
Status: | needs_review → needs_revision |
---|
I've split the issues with calling functions into their own tickets.
I suggest we fix the broken unit tests, and then merge feature23061_v2_yawning
.
comment:40 Changed 2 years ago by
Replying to teor:
I've split the issues with calling functions into their own tickets.
I suggest we fix the broken unit tests, and then merge
feature23061_v2_yawning
.
It's probably also worth checking FLT_RADIX
and DBL_MANT_DIG
at build time, and refusing to build if the compilers idea of a double
is something exotic.
comment:41 Changed 2 years ago by
Parent ID: | → #23126 |
---|
We want to fix this before we deploy HS v3 stats.
comment:43 Changed 2 years ago by
Milestone: | Tor: 0.3.2.x-final → Tor: 0.3.3.x-final |
---|
comment:44 Changed 2 years ago by
Cc: | catalyst added |
---|
comment:45 follow-ups: 46 47 Changed 2 years ago by
Section 5.2 of https://pdfs.semanticscholar.org/2f2b/7a0d5000a31f7f0713a3d20919f9703c9876.pdf describes one way to sample uniformly from all the representable floating point numbers in [0, 1). It's not clear to me whether including the numbers with ULPs less than 2^{-53} is required for the success of the snapping mitigation described in that paper.
comment:46 Changed 2 years ago by
Replying to catalyst:
Section 5.2 of https://pdfs.semanticscholar.org/2f2b/7a0d5000a31f7f0713a3d20919f9703c9876.pdf describes one way to sample uniformly from all the representable floating point numbers in [0, 1). It's not clear to me whether including the numbers with ULPs less than 2^{-53} is required for the success of the snapping mitigation described in that paper.
It isn't required, and might even be counterproductive.
The floating point numbers in the proof in section 5.2 are of the form N*2^{-53}, N integer.
(The proof uses a significand of 52 bits, and refers to 2^{-53} repeatedly.)
comment:47 Changed 2 years ago by
https://i.imgur.com/WTDnnwE.gif
Replying to catalyst:
Section 5.2 of https://pdfs.semanticscholar.org/2f2b/7a0d5000a31f7f0713a3d20919f9703c9876.pdf describes one way to sample uniformly from all the representable floating point numbers in [0, 1). It's not clear to me whether including the numbers with ULPs less than 2^{-53} is required for the success of the snapping mitigation described in that paper.
See "Smoothing" under Section 5.1.
It isn't required, and might even be counterproductive.
Indeed.
comment:48 Changed 2 years ago by
We should revise or close these tickets based on Appendix C in the latest privcount-shamir spec:
https://github.com/teor2345/privcount_shamir/blob/noise-limits/doc/xxx-privcount-with-shamir.txt
The good news is that we probably don't need to care about extreme values or binning, because properly implemented noise has known limits, and doesn't need binning.
comment:50 Changed 2 years ago by
Keywords: | 028-backport-maybe removed |
---|
0.2.8 is EOL, so we won't be backporting to it.
comment:51 Changed 23 months ago by
Milestone: | Tor: 0.3.3.x-final → Tor: 0.3.4.x-final |
---|
comment:52 Changed 23 months ago by
Owner: | changed from nickm to teor |
---|---|
Status: | needs_revision → assigned |
If you don't mind, Nick, I'll do this when I do the rest of the PrivCount in Tor code.
I have to wrote a function like this in Rust anyway, and it should be trivial to port to C.
comment:53 Changed 23 months ago by
Sounds fine; I had fun programming the stuff above, and I look forward to what you do here.
comment:54 Changed 22 months ago by
Keywords: | 030-backport removed |
---|
Remove 030-backport from all open tickets that have it: 0.3.0 is now deprecated.
comment:55 Changed 22 months ago by
Remove 026-backport from all open tickets that have it: 0.2.6 has been deprecated for some while.
comment:56 Changed 22 months ago by
Parent ID: | #23126 → #25263 |
---|
comment:57 Changed 21 months ago by
Keywords: | 034-triage-20180328 added |
---|
comment:58 Changed 21 months ago by
Keywords: | 034-removed-20180328 added |
---|
Per our triage process, these tickets are pending removal from 0.3.4.
comment:59 Changed 20 months ago by
Milestone: | Tor: 0.3.4.x-final → Tor: unspecified |
---|
These tickets, tagged with 034-removed-*, are no longer in-scope for 0.3.4. We can reconsider any of them, if time permits.
comment:60 Changed 20 months ago by
Keywords: | 026-backport-maybe removed |
---|
https://en.wikipedia.org/wiki/C99#IEEE_754_floating_point_support
When compiling C, if -fexcess-precision=standard is specified then excess precision follows the rules specified in ISO C99
comment:61 Changed 17 months ago by
Keywords: | 031-unreached-backport added; 031-backport removed |
---|
0.3.1 is end of life, there are no more backports.
Tagging with 031-unreached-backport instead.
comment:62 Changed 17 months ago by
Keywords: | 035-roadmap-subtask added |
---|---|
Milestone: | Tor: unspecified → Tor: 0.3.5.x-final |
comment:63 Changed 17 months ago by
Parent ID: | #25263 → #22898 |
---|
comment:64 Changed 17 months ago by
Parent ID: | #22898 → #26637 |
---|
comment:65 Changed 17 months ago by
Keywords: | 035-triaged-in-20180711 added |
---|
comment:66 Changed 17 months ago by
Sponsor: | SponsorQ → SponsorX-can |
---|
comment:67 Changed 17 months ago by
Sponsor: | SponsorX-can → SponsorV-can |
---|
comment:68 Changed 15 months ago by
Milestone: | Tor: 0.3.5.x-final → Tor: 0.3.6.x-final |
---|
Deferring privcount tickets in 0.3.5 to 0.3.6
comment:69 Changed 14 months ago by
If you take the uniform distribution on ^{}[0,1^{}] or (0,1), i.e. the Lebesgue measure, and round it to the nearest floating-point number, the distribution on the result is the natural notion of the uniform distribution on floating-point numbers in ^{}[0,1^{}]: the probability of each floating-point number is the width of the interval of real numbers rounded to that floating-point number.
Note that even if you round a sample from the open interval (0,1) of real numbers, the result might be rounded to 0 or 1, if it is in (0, 2^{emin - p - 1}] or [1 - eps/4, 1):
- The interval near 0 is of negligible size in any format larger than binary16 -- less than 2^{-126} in binary32 even if you use flush-to-zero, less than 2^{-1022} in binary64 -- so you don't need to worry about it and can safely compute, e.g., 1/x or log(x).
- However, the probability of getting a real number in the interval near 1 is 2^{-54} in binary64, which is small but not negligible, so I recommend not rejecting it unless you must avoid 1, for instance because you need to compute log1p(-x).
In infinite-precision floating-point, the exponent is geometrically distributed in {-1, -2, -3, ...}, and the fractional part of the significand is a uniformly distributed infinite bit string. If we round this to finite precision, the exponent remains the same (unless rounded to subnormal or zero) and the significand is rounded to p bits. The outcome of rounding is not actually affected by more than a finite number of significand bits, with one tricky exception, so you don't need to sample an infinite stream of bits and round it; to get this, you can:
- flip a coin until you get one heads, to pick a geometrically distributed exponent;
- flip a coin >p+1 times to choose the uniformly distributed fractional part of a significand, and round it to p bits with a trick to avoid a bias toward `even'.
If you drew p bits for the fractional part of the significand, there would be no rounding, and you would exclude from the possible significands the interval (2 - eps/2, 2] of real numbers that are rounded to 2 -- small but not nonnegligible. If you draw >p bits and round it, then no real number in [1, 2] is excluded.
However, there is a slight bias toward `even' outcomes -- that is, outcomes whose least significant bit is zero -- because there's a nonzero probability that the finite significand you choose is a tie, exactly between two floating-point numbers, which is broken by choosing the even option. This event has probability zero in Lebesgue measure. You can simply break the ties by drawing >p+1 bits and forcing the least significant bit to be 1.
The attached file uniform01.c implements this algorithm in the function sample_uniform_01
, with parameters chosen for IEEE 754 binary64 ('double-precision') floating-point.
As it happens, the variant algorithm that chooses exactly p bits and doesn't round them gives a uniform distribution on [0, 1 - eps/4), which incidentally effectively never returns zero, and so yields what you might think of 'the uniform distribution on floating-point numbers in (0, 1)'. I named it sample_uniform_01_open
, and added some paranoia to clamp the result at the smallest subnormal in case your PRNG is broken. But it also works to just do rejection sampling on sample_uniform_01
and reduce the amount of code to worry about.
Note: This code does not run in constant time -- the time it takes is proportional to the magnitude of the exponent. (In principle, it may also take variable time for subnormal arithmetic, but the probability of that is negligible for any format larger than binary16.)
You could make it run in constant time by always flipping the maximum number of coins for an exponent, and counting the leading zero bits. To improve performance, you can safely truncate the exponent at, say, 128 -- if you flip 128 coins and you get tails every time, either your coin flipper is broken or there's a glitch in the matrix.
comment:70 Changed 13 months ago by
I should add about sample_uniform_01
vs sample_uniform_01_open
that, generally, the only reason you should use sample_uniform_01_open
is as a drop-in replacement for the current crypto_rand_double
to avoid breaking downstream code that assumes it can't get the floating-point number 1.
If you want to take the log of the sample without getting -infinity, for instance to generate an exponential sample, you want a floating-point number in (0, 1], not a floating-point number in [0, 1). (Even better, for an exponential sample, you can use -log(p/2) if a fair coin toss turns up heads, and -log1p(-p/2) if it turns up tails -- then the sampler supports high precision negative and positive outputs; flip another coin to pick the sign, for a Laplace sampler.) If for some reason you are tempted to compute log1p(-p) or 1/(1 - p) downstream where p is uniform in [0, 1), better to change what happens downstream to compute log(p) or 1/p where p is in (0, 1] -- log1p and 1/(1 - p) are ill-conditioned near 1, so it's always better to avoid relying on computations involving such intermediate quantities.
The Mironov paper analyzes the use of (a) a correct uniform distribution like I provided (actually, sometimes he vacillates between (0,1) and (0,1] but I don't think it matters for the proof except maybe requiring a small adjustment to Eq. (7)), together with (b) a correctly rounded natural logarithm, and proves a differential privacy result, Theorem 1, under those assumptions.
The system's libm is unlikely to have a correctly rounded log, with maximum 0.5ulp error, corresponding to 2^{-53} relative error, but it is likely to have something very close to that -- almost certainly much better than 1ulp or 2^{-52} relative error. As a rough approximation, I think you can just take eta in Sec. 5.2 of the paper to be 2^{-52} instead of 2^{-53}, and instead of the (1/lambda + 2^{-49}B/lambda)-differential privacy asserted in Theorem 1, you get (1/lambda + 2^{-48}B/lambda)-differential privacy.
P.S. If in your Laplace sampler you use the exponential sampler I described above -- toss a fair coin to decide whether to use -log(p/2) or -log1p(-p/2), instead of the standard unconditional -log(p) -- I bet you'll get a slightly better differential privacy bound. But it would take a bit of work to identify the better bound and prove it, and really, most of this is just to get confidence that the bound doesn't explode and is reasonably small in concrete parameters, not to prove the bestest most tightest bound possible.
comment:71 Changed 13 months ago by
Status: | assigned → needs_revision |
---|
Thanks for this code, I'll integrate it into crypto_rand_double(), and make the necessary replacements in its callers.
This might not happen for a little while, because I have some rust code to write first.
comment:72 Changed 13 months ago by
Cc: | mikeperry added |
---|
If Mike wants to modify crypto_rand_double(), I could do the review.
comment:73 Changed 13 months ago by
Milestone: | Tor: 0.3.6.x-final → Tor: 0.4.0.x-final |
---|
Tor 0.3.6.x has been renamed to 0.4.0.x.
comment:74 Changed 11 months ago by
Keywords: | 040-unreached-20190109 added |
---|---|
Milestone: | Tor: 0.4.0.x-final → Tor: unspecified |
These tasks aren't essential for a PrivCount proof of concept: moving them to Tor: unspecified.
comment:75 Changed 9 months ago by
Keywords: | fast-fix added |
---|
comment:76 Changed 8 months ago by
Cc: | rl1987 added |
---|
comment:77 Changed 6 months ago by
Owner: | teor deleted |
---|---|
Status: | needs_revision → assigned |
Also, this is UINT_MAX_PLUS_ONE_AS_DOUBLE: