Opened 3 years ago

Closed 2 years ago

Last modified 19 months ago

#3122 closed enhancement (fixed)

Write and use constant-time comparison functions

Reported by: rransom Owned by: nickm
Priority: major Milestone: Tor: 0.2.3.x-final
Component: Tor Version:
Keywords: tor-client Cc: desnacked@…, iang@…
Actual Points: Parent ID:
Points:

Description

We should have constant-time comparison functions available in Tor, and use them whenever we compare cryptographic values or passwords.

We probably don't need to do all of our comparisons of digests in constant time, but once we have constant-time comparison functions around, it will be easier to use them than to convince ourselves that we don't need to in any particular spot.

Child Tickets

Change History (33)

comment:1 Changed 3 years ago by ioerror

As I said on IRC, I think that we might as well use constant time comparisons for anything that includes a digest, a key, a password, etc - It's unlikely that we really need non constant time comparisons; I'd rather that we do constant time by default and opt for something less strict when we're totally sure it will have no security, privacy, or anonymity implications.

comment:2 follow-up: Changed 3 years ago by cypherpunks

[I sent this as an email to tor-dev 45 minutes or so ago but haven't seen it come through. I just signed up for the list so perhaps first posts are modded.]

Greetings all,

I happened to download the tor-0.2.2.25-alpha.tar.gz source yesterday and I noticed something. Apologies in advance if this has already been discussed and resolved, I did a cursory web search and didn't see anything.

There are a lot of places in the code where memcmp() is called on memory buffers that look like they might contain various hashes or digests:

~/tor-0.2.2.25-alpha$ grep -r memcmp . | grep -i digest | wc -l
137

~/tor-0.2.2.25-alpha$ grep -r memcmp . | grep -i key | wc -l
14

The built-in memcmp typically runs in time proportional to the common prefix of the two memory buffers being compared. In networked applications, this is often a significant source of timing information. Sometimes these are severe enough to result in disclosure of secret key material. A good resource for remote timing attacks is Nate Lawson's blog:

http://rdist.root.org/2010/07/19/exploiting-remote-timing-attacks/

Some cases look particularly concerning. For example, in the following code 'handshake_reply' appears to be supplied by the remote party and it is compared with something called "key material".

src/or/onion.c:331:

if (memcmp(key_material, handshake_reply+DH_KEY_LEN, DIGEST_LEN)) {

/* H(K) does *not* match. Something fishy. */

log_warn(LD_PROTOCOL,"Digest DOES NOT MATCH on onion handshake. "

"Bug or attack.");

goto err;

}

In the worst-case, the attacker could simply supply different values of handshake_reply, observe how long each takes to compare, and figure out the value of key_material statistically byte-by-byte.

Perhaps this key material is never used again, or there are other reasons this timing informaion is not useful. But in general, it's very difficult to determine whether or not such a timing info disclosure represents a vulnerability or how difficult it would be to exploit. Expert programmers seem to consistently underestimate the severtiy of these weaknesses, perhaps because they are so subtle. I've just started learning about Tor so it's not possible for me to review every instance.

I think the quickest and easiest solution would be to replace every usage of memcmp and str*cmp in the source with an alternate version that guarantees a constant time comparison.

I'd expect this could be done with negligible performance impact, especially for short comparisons like 20-byte hash values.

Regards,

  • Marsh

comment:4 Changed 3 years ago by nickm

I agree with using data-independent memcmp (that is to say, dependent on the length parameter but not on the data) everywhere that it's not specifically shown to be safe. Personally, I'd suggest that we just outright switch *all* of the memcmps that we do to use a data-independent version, and have a fast_memcmp() that we use for cases where the length may be larger and we know that the operation is safe. It seems safer to audit for safe and critical cases than it does to try to audit for the risky ones.

But I'm not clear what a "constant-time" strcmp operation even means. It could be dependent on the length of the shorter string, or on the length of the longer string, or on the first or the second, but I'm not sure how you're supposed to implement true "data-independent" strcmp. This will want closer code auditing.

FWIW, the particular example above is safe. Even if the attacker somehow learned not only one byte but rather *every* byte in key_material by sending a bad handshake reply, the information would be useless: a bad reply means that the client closes the circuit immediately. The next circuit the client tries to build will have a different gx value for its diffie hellman handshake, and the key_material that the client would have accepted last time will not be the key_material that it expects in any subsequent circuit extend handshake.

comment:5 Changed 3 years ago by nickm

FWIW, it's trivial to do a data-independent equality check to drop in for cases where we are just using memcmp for equality:

int mem_neq(const void *m1, const void *m2, size_t n)
{
  const uint8_t *b1 = m1, *b2 = m2;
  uint8_t diff = 0;
  while (n--)
    diff |= *b1++ ^ *b2++;
  return diff != 0;
}
#define mem_eq(m1, m2, n) (!mem_neq((m1), (m2),(n)))

Actually implementing memcmp in a data-independent form that returns -1, 0, or 1 properly is harder. Fortunately, we almost never need that version. Dropping in mem_neq as a replacement for nearly every memcmp should do pretty well.

comment:6 in reply to: ↑ 2 ; follow-up: Changed 3 years ago by rransom

Replying to cypherpunks:

There are a lot of places in the code where memcmp() is called on memory buffers that look like they might contain various hashes or digests:

Applying memcmp to a hash or digest is not normally a problem. memcmp is only dangerous when applied to a MAC, password, or other secret value which the attacker can attempt to guess one byte at a time.

There are a few cases where we pretend that a hash value of non-secret information is hard for an attacker to guess, most notably in our bridge design where anyone who knows the hash of a published bridge's identity public key can use that hash to retrieve the bridge's descriptor from the bridge authority, but the correct fix for that type of security hole is to stop using a value that is so easily obtained as a password of any kind.

Some cases look particularly concerning. For example, in the following code 'handshake_reply' appears to be supplied by the remote party and it is compared with something called "key material".

src/or/onion.c:331:

  if (memcmp(key_material, handshake_reply+DH_KEY_LEN, DIGEST_LEN)) {
    /* H(K) does *not* match. Something fishy. */
     log_warn(LD_PROTOCOL,"Digest DOES NOT MATCH on onion handshake. "
           "Bug or attack.");
    goto err;
  }

In the worst-case, the attacker could simply supply different values of handshake_reply, observe how long each takes to compare, and figure out the value of key_material statistically byte-by-byte.

Or the attacker could read HK out of the (plaintext) CREATED cell sent by the relay, exactly where the client is reading it from.

HK is a hash of the Diffie-Hellman shared secret between a client and a relay, and it is used in exactly two ways in Tor's protocol: for key confirmation (the use you are complaining about), and as a replay-prevention nonce in the hidden-service protocol. We know it is public information.

And as nickm just explained, the attacker cannot extract the correct value of HK for a handshake from the client using a timing attack on that memcmp because the attacker can only test one guess.

comment:7 in reply to: ↑ 6 Changed 3 years ago by ioerror

Replying to rransom:

Replying to cypherpunks:

There are a lot of places in the code where memcmp() is called on memory buffers that look like they might contain various hashes or digests:

Applying memcmp to a hash or digest is not normally a problem. memcmp is only dangerous when applied to a MAC, password, or other secret value which the attacker can attempt to guess one byte at a time.

I'm not so sure that every digest we compare will be public or that every digest comparison is not a problem. I think that generally, we need to show that each one is safe based on some assumptions and then weigh the changes of those assumptions ever changing...

comment:8 follow-ups: Changed 3 years ago by nickm

So here's what I'd suggest we do for starters:

  1. Memcmp
  1. Define a new tor_memcmp and a new tor_memneq. Both should be data-independent. Define a fast_memcmp that aliases the platform memcmp(). That's just there so that any unadorned memcmp() in our code can be called "incorrect".
  2. Starting with 0.2.1, do a global search-and-replace for memcmp->tor_memcmp. This should be automatable.
  3. Do a hand-search for cases where we use tor_memcmp for equality-checking only. This should be most of them. Replace them with tor_memneq.
  4. Identify cases that we think are performance-relevant and where we believe that data-dependence is safe. Move those back to use fast_memcmp.
  5. In 0.2.2 and later, optimize the 16-byte, 20-byte, and 32-byte cases (I expect they come up a lot).
  1. String operations
  1. List all of our relevant string operations and figure out how to identify any high-risk class of users. I think that 95% of our string operations are not on secret data, so we could say "string functions must not be called on secret stuff" or "string functions must not be called on stuff over the net" or something.
  2. Write data-independent variants of our string functions.
  3. Use them in some relatively easy-to-decide pattern.
  1. Other things

We need to look for other kinds of operations that alter control flow based on sensitive information. This includes at minimum auditing hash tables and lookup functions. This will be an ongoing thing.

comment:9 Changed 3 years ago by nickm

  • Milestone set to Tor: 0.2.1.x-final

comment:10 Changed 3 years ago by nickm

Oh, add another step to part I. Memcmp: "4.1. On a case-by-case basis, change !tor_memneq(..) and tor_memneq(..)==0 into tor_memeq. This will need to be done very carefully; it is exactly where I expect to screw up."

comment:11 in reply to: ↑ 8 Changed 3 years ago by rransom

Replying to nickm:

  1. Other things

We need to look for other kinds of operations that alter control flow based on sensitive information. This includes at minimum auditing hash tables and lookup functions. This will be an ongoing thing.

The solution here is to never use a secret string as a lookup key in an associative data structure. One easy way to do this is to HMAC the secret lookup key with an ephemeral secret HMAC key; the result is not so secret, although we would still use our constant-time comparison functions within the data structure's implementation purely for performance reasons.

comment:13 in reply to: ↑ 8 Changed 3 years ago by rransom

Replying to nickm:

  1. Memcmp
  1. Define a new tor_memcmp and a new tor_memneq. Both should be data-independent. Define a fast_memcmp that aliases the platform memcmp(). That's just there so that any unadorned memcmp() in our code can be called "incorrect".

See tor-safe-memcmp in my tor-utils repo ( ssh://mob@repo.or.cz/srv/git/tor-utils/rransom.git tor-safe-memcmp ) for tor_memcmp and a minimal unit test.

comment:14 Changed 3 years ago by nickm

Neat. Is anybody picking up the transition work on that (steps I.1 through I.4 above, or should I get on it?)

Also, I am a little concerned with this:

       equal_p >>= 8;
       /* Thanks to (sign-preserving) arithmetic shift, equal_p is now
        * equal to -(v1 == v2), which is exactly what we need below.
        * (Since we're assuming two's-complement arithmetic, -1 is the
        * same as ~0 (all bits set).) */
}}

Right-shift of negative numbers is implementation-defined in C.  If we're going to rely on it, we need to verify that its behavior is what we expect whenever we're building.

comment:15 Changed 3 years ago by nickm

oops. Bad formatting. That should have said:

Also, I am a little concerned with this:

       equal_p >>= 8;
       /* Thanks to (sign-preserving) arithmetic shift, equal_p is now
        * equal to -(v1 == v2), which is exactly what we need below.
        * (Since we're assuming two's-complement arithmetic, -1 is the
        * same as ~0 (all bits set).) */

Right-shift of negative numbers is implementation-defined in C. If we're going to rely on it, we need to verify that its behavior is what we expect whenever we're building.

comment:16 Changed 3 years ago by asn

  • Cc desnacked@… added

comment:17 Changed 3 years ago by nickm

It's not done -- it still needs a commit message and a changes file -- but I added a di_ops.[ch] with tests based on Robert's , based on 0.2.1, in a branch called bug3122_memcmp in my public repository.

comment:18 follow-up: Changed 3 years ago by nickm

I threw together some quick and dirty timing code at http://www.wangafu.net/~nickm/volatile/memcmp_timing.gitbundle , based on rransom's repo. The emphasis is on *dirty*: either i am testing it wrong, or I am doing the stats wrong on the output, or something.

Things to check: Did I get the T-value calculation about right? Should I just be piping the results to R to do my stats instead?

Does the way that I make a change to the n'th position between calling memcmp (to prevent the compiler from caring that memcmp is attribute((pure, const)) ) affect cacheing?

Suggestive datum: The reported T values are high even for the function do_nothing_at_all, which doesn't actually do anything but add all the bytes in its inputs and return the results. This suggests a bug in the harness or in the analysis.

comment:19 in reply to: ↑ 18 Changed 3 years ago by rransom

Replying to nickm:

Suggestive datum: The reported T values are high even for the function do_nothing_at_all, which doesn't actually do anything but add all the bytes in its inputs and return the results. This suggests a bug in the harness or in the analysis.

The harness is broken. The purpose of using a processor's cycle counter to time a function is to get a useful time value from a single execution of the function. If you repeat a function hundreds of times, you dramatically increase the likelihood that the OS will interrupt the process, thereby ruining your timings.

(That ruination is much worse if you use a cycle counter for timing on a multi-processor system -- since only one of the processors started when the computer was turned on, and the rest were activated later as the OS kernel initialized its hardware drivers, the cycle counters for different processors (even if they live on the same chip) will be skewed by many cycles. You can see this effect on some of DJB's SUPERCOP graphs.)

comment:20 Changed 3 years ago by nickm

  • Status changed from new to needs_review

Please review branch 3122_memcmp for 0.2.1; it needs careful attention. I want to merge it soon.

comment:21 Changed 3 years ago by rransom

@@ -640,31 +640,36 @@ find_whitespace_eos(const char *s, const char *eos)
 int
 tor_mem_is_zero(const char *mem, size_t len)
 {
   static const char ZERO[] = {
     0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
   };
   while (len >= sizeof(ZERO)) {
-    if (tor_memcmp(mem, ZERO, sizeof(ZERO)))
+    /* It's safe to use fast_memcmp here, since the very worst thing an
+     * attacker could learn is how many initial bytes of a secret were zero */
+    if (fast_memcmp(mem, ZERO, sizeof(ZERO)))
       return 0;

It's also safe to use fast_memneq here.

comment:22 Changed 3 years ago by rransom

In src/or/control.c:

@@ -508,15 +508,15 @@ connection_printf_to_buf(control_connection_t *conn, const char *format, ...)

...

-  if (tor_memcmp("\r\n\0", buf+len-2, 3)) {
+  if (fast_memcmp("\r\n\0", buf+len-2, 3)) {

This, too, can be fast_memneq.

comment:23 Changed 3 years ago by rransom

All of the fast_memcmp calls in src/or/control.c can be fast_memneq calls.

In src/or/directory.c:

@@ -3454,18 +3454,18 @@ dir_routerdesc_download_failed(smartlist_t *failed, int status_code,
 /** Helper.  Compare two fp_pair_t objects, and return -1, 0, or 1 as
  * appropriate. */
 static int
 _compare_pairs(const void **a, const void **b)

This function cannot be relied on to return -1, 0, or 1 even if it calls the system's memcmp function.

comment:24 Changed 3 years ago by nickm

I'm not distinguishing fast_memcmp from fast_memneq in this round of changes, since they're the same function under the covers. I created fast_memeq and fast_memneq mainly so we could transition between tor_* and fast_* without worrying about messing up when adding and removing ! s.

Fixing the docs is important though.

comment:25 Changed 3 years ago by Sebastian

I looked through all the different transformations of memcmp (the last two patches in the branch) and didn't spot anything obvious.

Maybe we should add something (maybe to make test, or make check-spaces, or even something new for this purpose) that prints a warning if we're using memcmp anywhere in src/ other than src/common/di_ops* with the possible exception of unit tests? The cleanup would be easy - only two comments (one in dirserv.c, one in routerlist.c) remain.

comment:26 Changed 3 years ago by nickm

More branches to look at!

For 0.2.1: bug3122_memcmp_squashed (With the documentation fix above, and no other changes)
For 0.2.2: bug3122_memcmp_022
For 0.2.3: bug3122_memcmp_023

I think that the merging and updating were straightforward... but it's when I think it's straightforward that I'm likeliest to get careless. Thus, please review.

comment:27 Changed 3 years ago by nickm

Merged those branches.

comment:28 Changed 3 years ago by nickm

  • Owner changed from ioerror to nickm
  • Status changed from needs_review to accepted

Moving this ticket to "accepted" to be a place to look for more opportunities to eliminate timing leaks.

comment:29 Changed 3 years ago by iang

  • Cc iang@… added

comment:30 Changed 3 years ago by arma

  • Milestone changed from Tor: 0.2.1.x-final to Tor: 0.2.3.x-final

Moving to 0.2.3 milestone since I think it would be a poor plan to mess with 0.2.1.x at this point, and if we find something good we can decide then whether a backport to 0.2.2 is smart.

comment:31 Changed 2 years ago by nickm

  • Resolution set to fixed
  • Status changed from accepted to closed

I don't currently believe we have any more sensitive time-variant comparisons. Closing this one as finished.

comment:32 Changed 19 months ago by nickm

  • Keywords tor-client added

comment:33 Changed 19 months ago by nickm

  • Component changed from Tor Client to Tor
Note: See TracTickets for help on using tickets.