We should have constant-time comparison functions available in Tor, and use them whenever we compare cryptographic values or passwords.
We probably don't need to do all of our comparisons of digests in constant time, but once we have constant-time comparison functions around, it will be easier to use them than to convince ourselves that we don't need to in any particular spot.
As I said on IRC, I think that we might as well use constant-time comparisons for anything that includes a digest, a key, a password, etc. It's unlikely that we really need non-constant-time comparisons; I'd rather we do constant time by default and opt for something less strict only when we're totally sure it will have no security, privacy, or anonymity implications.
[I sent this as an email to tor-dev 45 minutes or so ago but haven't seen it come through. I just signed up for the list so perhaps first posts are modded.]
Greetings all,
I happened to download the tor-0.2.2.25-alpha.tar.gz source yesterday and I noticed something. Apologies in advance if this has already been discussed and resolved, I did a cursory web search and didn't see anything.
There are a lot of places in the code where memcmp() is called on memory buffers that look like they might contain various hashes or digests:
The built-in memcmp typically runs in time proportional to the common prefix of the two memory buffers being compared. In networked applications, this is often a significant source of timing information. Sometimes these are severe enough to result in disclosure of secret key material. A good resource for remote timing attacks is Nate Lawson's blog:
Some cases look particularly concerning. For example, in the following code 'handshake_reply' appears to be supplied by the remote party and it is compared with something called "key material".
src/or/onion.c:331:
if (memcmp(key_material, handshake_reply+DH_KEY_LEN, DIGEST_LEN)) {
/* H(K) does not match. Something fishy. */
log_warn(LD_PROTOCOL,"Digest DOES NOT MATCH on onion handshake. "
"Bug or attack.");
goto err;
}
In the worst-case, the attacker could simply supply different values of handshake_reply, observe how long each takes to compare, and figure out the value of key_material statistically byte-by-byte.
Perhaps this key material is never used again, or there are other reasons this timing information is not useful. But in general, it's very difficult to determine whether or not such a timing-info disclosure represents a vulnerability, or how difficult it would be to exploit. Expert programmers seem to consistently underestimate the severity of these weaknesses, perhaps because they are so subtle. I've just started learning about Tor, so it's not possible for me to review every instance.
I think the quickest and easiest solution would be to replace every usage of memcmp and str*cmp in the source with an alternate version that guarantees a constant time comparison.
I'd expect this could be done with negligible performance impact, especially for short comparisons like 20-byte hash values.
I agree with using data-independent memcmp (that is to say, dependent on the length parameter but not on the data) everywhere that it's not specifically shown to be safe. Personally, I'd suggest that we just outright switch all of the memcmps that we do to use a data-independent version, and have a fast_memcmp() that we use for cases where the length may be larger and we know that the operation is safe. It seems safer to audit for safe and critical cases than it does to try to audit for the risky ones.
But I'm not clear what a "constant-time" strcmp operation even means. It could be dependent on the length of the shorter string, or on the length of the longer string, or on the first or the second, but I'm not sure how you're supposed to implement true "data-independent" strcmp. This will want closer code auditing.
FWIW, the particular example above is safe. Even if the attacker somehow learned not only one byte but rather every byte in key_material by sending a bad handshake reply, the information would be useless: a bad reply means that the client closes the circuit immediately. The next circuit the client tries to build will have a different g^x value for its diffie hellman handshake, and the key_material that the client would have accepted last time will not be the key_material that it expects in any subsequent circuit extend handshake.
Actually implementing memcmp in a data-independent form that returns -1, 0, or 1 properly is harder. Fortunately, we almost never need that version. Dropping in mem_neq as a replacement for nearly every memcmp should do pretty well.
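For illustration, here is a minimal sketch of such a mem_neq (the name and return convention are assumptions, not the final Tor API):
{{{
#include <stddef.h>

/* Sketch of a data-independent inequality check: running time depends only
 * on len, never on where (or whether) the buffers first differ.  Returns
 * nonzero if the buffers differ, 0 if they are equal. */
static int
mem_neq_sketch(const void *a, const void *b, size_t len)
{
  const unsigned char *pa = a, *pb = b;
  unsigned char diff = 0;
  size_t i;
  for (i = 0; i < len; ++i)
    diff |= pa[i] ^ pb[i];   /* OR together every byte difference */
  return diff != 0;
}
}}}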
There are a lot of places in the code where memcmp() is called on memory buffers that look like they might contain various hashes or digests:
Applying memcmp to a hash or digest is not normally a problem. memcmp is only dangerous when applied to a MAC, password, or other secret value which the attacker can attempt to guess one byte at a time.
There are a few cases where we pretend that a hash value of non-secret information is hard for an attacker to guess; most notably, in our bridge design, anyone who knows the hash of a published bridge's identity public key can use that hash to retrieve the bridge's descriptor from the bridge authority. But the correct fix for that type of security hole is to stop using such an easily obtained value as a password of any kind.
Some cases look particularly concerning. For example, in the following code 'handshake_reply' appears to be supplied by the remote party and it is compared with something called "key material".
src/or/onion.c:331:
if (memcmp(key_material, handshake_reply+DH_KEY_LEN, DIGEST_LEN)) {
  /* H(K) does *not* match. Something fishy. */
  log_warn(LD_PROTOCOL,"Digest DOES NOT MATCH on onion handshake. "
           "Bug or attack.");
  goto err;
}
In the worst-case, the attacker could simply supply different values of handshake_reply, observe how long each takes to compare, and figure out the value of key_material statistically byte-by-byte.
Or the attacker could read HK out of the (plaintext) CREATED cell sent by the relay, exactly where the client is reading it from.
HK is a hash of the Diffie-Hellman shared secret between a client and a relay, and it is used in exactly two ways in Tor's protocol: for key confirmation (the use you are complaining about), and as a replay-prevention nonce in the hidden-service protocol. We know it is public information.
And as nickm just explained, the attacker cannot extract the correct value of HK for a handshake from the client using a timing attack on that memcmp because the attacker can only test one guess.
There are a lot of places in the code where memcmp() is called on memory buffers that look like they might contain various hashes or digests:
Applying memcmp to a hash or digest is not normally a problem. memcmp is only dangerous when applied to a MAC, password, or other secret value which the attacker can attempt to guess one byte at a time.
I'm not so sure that every digest we compare will be public, or that every digest comparison is not a problem. I think that, generally, we need to show that each one is safe based on some assumptions and then weigh the chances of those assumptions ever changing...
I. Memcmp

1. Define a new tor_memcmp and a new tor_memneq. Both should be data-independent. Define a fast_memcmp that aliases the platform memcmp(). That's just there so that any unadorned memcmp() in our code can be called "incorrect".
2. Starting with 0.2.1, do a global search-and-replace for memcmp->tor_memcmp. This should be automatable.
3. Do a hand-search for cases where we use tor_memcmp for equality-checking only. This should be most of them. Replace them with tor_memneq.
4. Identify cases that we think are performance-relevant and where we believe that data-dependence is safe. Move those back to use fast_memcmp.
5. In 0.2.2 and later, optimize the 16-byte, 20-byte, and 32-byte cases (I expect they come up a lot).
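To make step 5 concrete, here is a hedged sketch of what one specialized fixed-size path might look like; the function name and the choice of 32-bit loads are assumptions, not the actual implementation:
{{{
#include <stdint.h>
#include <string.h>

/* Sketch: data-independent equality check specialized for a 20-byte digest.
 * memcpy into local words avoids unaligned loads; the running time depends
 * only on the fixed length, not on where the inputs first differ. */
static int
mem_eq_20_sketch(const void *a, const void *b)
{
  uint32_t wa[5], wb[5], diff = 0;
  int i;
  memcpy(wa, a, 20);
  memcpy(wb, b, 20);
  for (i = 0; i < 5; ++i)
    diff |= wa[i] ^ wb[i];
  return diff == 0;
}
}}}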
II. String operations
1. List all of our relevant string operations and figure out how to identify any high-risk class of users. I think that 95% of our string operations are not on secret data, so we could say "string functions must not be called on secret stuff" or "string functions must not be called on stuff over the net" or something.
2. Write data-independent variants of our string functions.
3. Use them in some relatively easy-to-decide pattern.
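On the earlier question of what "data-independent" even means for strings, one possible convention, offered purely as a sketch (names and semantics are assumptions), is to make the running time depend only on the length of the expected secret, never on how much of the supplied string matches it:
{{{
#include <stddef.h>
#include <string.h>

/* Sketch: nonzero iff the two NUL-terminated strings are equal.  The loop
 * always walks the full secret, so timing does not reveal how long a prefix
 * of the supplied string matched (it can still reveal the secret's length). */
static int
str_eq_ct_sketch(const char *secret, const char *supplied)
{
  size_t alen = strlen(secret);
  size_t blen = strlen(supplied);
  unsigned char diff = (unsigned char)(alen != blen);
  size_t i;
  for (i = 0; i < alen; ++i) {
    unsigned char c = (i < blen) ? (unsigned char)supplied[i] : 0;
    diff |= (unsigned char)secret[i] ^ c;
  }
  return diff == 0;
}
}}}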
III. Other things
We need to look for other kinds of operations that alter control flow based on sensitive information. This includes at minimum auditing hash tables and lookup functions. This will be an ongoing thing.
Oh, add another step to part I. Memcmp: "4.1. On a case-by-case basis, change !tor_memneq(..) and tor_memneq(..)==0 into tor_memeq. This will need to be done very carefully; it is exactly where I expect to screw up."
We need to look for other kinds of operations that alter control flow based on sensitive information. This includes at minimum auditing hash tables and lookup functions. This will be an ongoing thing.
The solution here is to never use a secret string as a lookup key in an associative data structure. One easy way to do this is to HMAC the secret lookup key with an ephemeral secret HMAC key; the result is not so secret, although we would still use our constant-time comparison functions within the data structure's implementation purely for performance reasons.
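A rough sketch of that keyed-lookup pattern, using OpenSSL's HMAC purely for illustration (the helper name is made up, and Tor's own crypto wrappers would be the natural choice in practice):
{{{
#include <openssl/evp.h>
#include <openssl/hmac.h>
#include <openssl/sha.h>

/* Sketch: derive a non-secret lookup key from a secret one, so the hash
 * table never hashes, branches on, or indexes by the secret itself.
 * ephemeral_key is random, generated at startup, and never disclosed. */
static void
derive_lookup_key(const unsigned char *ephemeral_key, size_t ephemeral_key_len,
                  const unsigned char *secret, size_t secret_len,
                  unsigned char out[SHA256_DIGEST_LENGTH])
{
  unsigned int out_len = SHA256_DIGEST_LENGTH;
  HMAC(EVP_sha256(), ephemeral_key, (int)ephemeral_key_len,
       secret, secret_len, out, &out_len);
  /* 'out' can now serve as the key in an ordinary hash table; without
   * ephemeral_key, an attacker learns nothing useful from how the table
   * handles it. */
}
}}}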
Define a new tor_memcmp and a new tor_memneq. Both should be data-independent. Define a fast_memcmp that aliases the platform memcmp(). That's just there so that any unadorned memcmp() in our code can be called "incorrect".
Neat. Is anybody picking up the transition work on that (steps I.1 through I.4 above), or should I get on it?
Also, I am a little concerned with this:
{{{
equal_p >>= 8;
/* Thanks to (sign-preserving) arithmetic shift, equal_p is now
* equal to -(v1 == v2), which is exactly what we need below.
* (Since we're assuming two's-complement arithmetic, -1 is the
* same as ~0 (all bits set).) */
}}}
Right-shift of negative numbers is implementation-defined in C. If we're going to rely on it, we need to verify that its behavior is what we expect whenever we're building.
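For what it's worth, the same all-ones/all-zero mask can be built from unsigned arithmetic, whose behavior C fully defines, instead of relying on an arithmetic right shift; a hedged sketch (not the code in the branch):
{{{
#include <stdint.h>

/* Sketch: return ~0 (all bits set) when a == b, and 0 otherwise, using only
 * operations with fully defined behavior on unsigned types. */
static uint32_t
eq_mask_sketch(uint32_t a, uint32_t b)
{
  uint32_t x = a ^ b;                        /* 0 iff a == b */
  uint32_t nonzero = (x | (0u - x)) >> 31;   /* 1 iff x != 0 */
  return nonzero - 1;                        /* 0u - 1u wraps to all ones */
}
}}}
Whether that is actually preferable to just verifying the arithmetic-shift behavior at build time is of course debatable.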
equal_p >>= 8;
/* Thanks to (sign-preserving) arithmetic shift, equal_p is now
 * equal to -(v1 == v2), which is exactly what we need below.
 * (Since we're assuming two's-complement arithmetic, -1 is the
 * same as ~0 (all bits set).) */
Right-shift of negative numbers is implementation-defined in C. If we're going to rely on it, we need to verify that its behavior is what we expect whenever we're building.
It's not done -- it still needs a commit message and a changes file -- but I added a di_ops.[ch] with tests based on Robert's , based on 0.2.1, in a branch called bug3122_memcmp in my public repository.
I threw together some quick and dirty timing code at http://www.wangafu.net/~nickm/volatile/memcmp_timing.gitbundle, based on rransom's repo. The emphasis is on dirty: either I am testing it wrong, or I am doing the stats wrong on the output, or something.
Things to check: Did I get the T-value calculation about right? Should I just be piping the results to R to do my stats instead?
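For reference, the statistic I would expect here is Welch's t for two independent samples with unequal variances; a hedged sketch of the computation (the function name is made up):
{{{
#include <math.h>
#include <stddef.h>

/* Sketch: Welch's t statistic,
 *   t = (mean_a - mean_b) / sqrt(var_a/na + var_b/nb),
 * using the unbiased sample variance of each group of timings. */
static double
welch_t_sketch(const double *a, size_t na, const double *b, size_t nb)
{
  double mean_a = 0, mean_b = 0, var_a = 0, var_b = 0;
  size_t i;
  for (i = 0; i < na; ++i) mean_a += a[i];
  for (i = 0; i < nb; ++i) mean_b += b[i];
  mean_a /= na;
  mean_b /= nb;
  for (i = 0; i < na; ++i) var_a += (a[i] - mean_a) * (a[i] - mean_a);
  for (i = 0; i < nb; ++i) var_b += (b[i] - mean_b) * (b[i] - mean_b);
  var_a /= (na - 1);
  var_b /= (nb - 1);
  return (mean_a - mean_b) / sqrt(var_a / na + var_b / nb);
}
}}}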
Does the way that I make a change to the n'th position between calls to memcmp (to prevent the compiler from caring that memcmp is __attribute__((pure, const))) affect caching?
Suggestive datum: The reported T values are high even for the function do_nothing_at_all, which doesn't actually do anything but add all the bytes in its inputs and return the results. This suggests a bug in the harness or in the analysis.
Suggestive datum: The reported T values are high even for the function do_nothing_at_all, which doesn't actually do anything but add all the bytes in its inputs and return the results. This suggests a bug in the harness or in the analysis.
The harness is broken. The purpose of using a processor's cycle counter to time a function is to get a useful time value from a single execution of the function. If you repeat a function hundreds of times, you dramatically increase the likelihood that the OS will interrupt the process, thereby ruining your timings.
(That ruination is much worse if you use a cycle counter for timing on a multi-processor system -- since only one of the processors started when the computer was turned on, and the rest were activated later as the OS kernel initialized its hardware drivers, the cycle counters for different processors (even if they live on the same chip) will be skewed by many cycles. You can see this effect on some of DJB's SUPERCOP graphs.)
{{{
@@ -640,31 +640,36 @@ find_whitespace_eos(const char *s, const char *eos)
 int
 tor_mem_is_zero(const char *mem, size_t len)
 {
   static const char ZERO[] = {
     0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
   };
   while (len >= sizeof(ZERO)) {
-    if (tor_memcmp(mem, ZERO, sizeof(ZERO)))
+    /* It's safe to use fast_memcmp here, since the very worst thing an
+     * attacker could learn is how many initial bytes of a secret were zero */
+    if (fast_memcmp(mem, ZERO, sizeof(ZERO)))
       return 0;
}}}
I'm not distinguishing fast_memcmp from fast_memneq in this round of changes, since they're the same function under the covers. I created fast_memeq and fast_memneq mainly so we could transition between tor_* and fast_* without worrying about messing up when adding and removing ! s.
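Just to illustrate what "the same function under the covers" might look like (these exact macro definitions are an assumption, not necessarily what is in the branch):
{{{
#include <string.h>

/* Sketch: the fast_* helpers all bottom out in the platform memcmp; the
 * separate names exist only so call sites read unambiguously. */
#define fast_memcmp(a,b,n)   (memcmp((a),(b),(n)))
#define fast_memeq(a,b,n)    (0 == memcmp((a),(b),(n)))
#define fast_memneq(a,b,n)   (0 != memcmp((a),(b),(n)))
}}}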
I looked through all the different transformations of memcmp (the last two patches in the branch) and didn't spot anything obvious.
Maybe we should add something (maybe to make test, or make check-spaces, or even something new for this purpose) that prints a warning if we're using memcmp anywhere in src/ other than src/common/di_ops* with the possible exception of unit tests? The cleanup would be easy - only two comments (one in dirserv.c, one in routerlist.c) remain.
For 0.2.1: bug3122_memcmp_squashed (With the documentation fix above, and no other changes)
For 0.2.2: bug3122_memcmp_022
For 0.2.3: bug3122_memcmp_023
I think that the merging and updating were straightforward... but it's when I think it's straightforward that I'm likeliest to get careless. Thus, please review.
Moving to 0.2.3 milestone since I think it would be a poor plan to mess with 0.2.1.x at this point, and if we find something good we can decide then whether a backport to 0.2.2 is smart.
Trac: Milestone: Tor: 0.2.1.x-final to Tor: 0.2.3.x-final