Choose a faster memwipe implementation

changed milestone to %Tor: 0.2.8.x-final

added component::core tor/tor milestone::Tor: 0.2.8.x-final priority::medium resolution::implemented severity::normal status::closed tor-relay type::enhancement labels

Also see http://www.dwheeler.com/secure-programs/Secure-Programs-HOWTO/protect-secrets.html :

 void *guaranteed_memset(void *v,int c,size_t n)
  { volatile char *p=v; while (n--) *p++=c; return v; }

(Totally untested on modern compilers; the advice at the end of that page makes me think that the document could be a bit dated.)

Replying to nickm:

{ volatile char *p=v; while (n--) *p++=c; return v; }

I'm no expert here, but it seems to me that anything that doesn't use the changed memory after the change is just asking for trouble.

hoppy on #nottor:

so a while ago it was pointed out to me that I should not count on memset to zero sensitive memory because it could be optimized out, and was pointed to a Tor ticket (by NickM) that discussed this same thing in Tor and the solution suggested in that ticket was to use openssl cleanse instead but the ticket also pointed out that this was a heavy handed solution and there should be something better thought up and today I told this to somebody else who is developing a security program , to make sure to not use memset like I was in case they were, and to use openssl cleanse instead, and they said that a better solution is to use the word volatile in the cast because then the compiler doesn't optimize it out http://www.lix.polytechnique.fr/~liberti/public/computing/prog/c/C/SYNTAX/volatile.html so maybe that is better for Tor too , I dunno, just passing it on

The volatile-based thing is actually pretty sure to work, assuming that the compiler isn't busted. It turns out that it's pretty slow in practice, though. Another way to use volatile is:

    void * (* volatile memset_volatile)(void *, int, size_t) = memset; 
    ...
    memset_volatile(p, c, n);

(via a NetBSD mailing list at http://netbsd.2816.n7.nabble.com/Adding-memset-s-function-td43349.html .)

Another option is the memset_s() function added in the the C11 standard. Other than MS, though, I don't think any of the libcs we use support it yet.

Replying to nickm:

The volatile-based thing is actually pretty sure to work, assuming that the compiler isn't busted.

If the compiler isn't broken, a plain memset is sufficient. The point of this ticket is to find a memory-erasing function that broken compilers are not yet smart enough to recognize and remove.

This might actually be worth doing, since OPENSSL_cleanse is such a pile of voodoo.

See branch better_memwipe in my public repository.

Trac:
Milestone: Tor: unspecified to Tor: 0.2.6.x-final
Status: new to needs_review

Apply a nickm-patch keyword to tickets in needs_review in 0.2.6 where I wrote the patch, so I know which ones I can('t) review myself.

Trac:
Keywords: N/A deleted, nickm-patch added

OS X (10.9?) supports memset_s, as documented in the manual pages as of February 21, 2012. We'd have to #define STDC_WANT_LIB_EXT1 1 before (first) including string.h, which could be fun.

If we do something like:

void *guaranteed_memset(void *v,int c,size_t n) {

  char memset_failed = 0;
  int i;

  /* do the memset in a fast, safe, platform-specific way */

  for (i = 0; i < n; i++) {
    if (v[i] != c) {
      memset_failed = 1;
      break;
    }
  }

  assert(!memset_failed);

  return v;
}

we could prove the overwrite occurred, and ensure compilers don't/haven't optimise(d) it away.

This is the safest approach - one question: should we guard it with #if PARANOIA, or do it all the time?

(I favour doing it every time we overwrite, as it is a vital security property. However, it will be slower to read all the memory again.)

What do you think, nickm, arma, rransom?

Also, from http://www.securityfocus.com/archive/82/298061/2002-10-27/2002-11-02/0 linked from http://www.dwheeler.com/secure-programs/Secure-Programs-HOWTO/protect-secrets.html which is quoted in comment 1:

"If you don't like the "volatile" solution (I don't, because access to volatile objects may be significantly slowed, and because aliasing volatile objects with non-volatile-qualified pointers and accessing through such an alias invokes undefined behavior, and because volatile seems like the sort of thing broken implementations may get wrong), use the external-memset-wrapper one I proposed in my previous note."

I don't like undefined behaviour - it tends to make conforming, optimising compilers misbehave, and unpredictably as they add new optimisations. We don't want to do things for broken compilers that break conformant ones.

To be clear, I don't think any of my patch requires undefined behavior; do you? In most cases, it accesses non-volatile objects through a volatile pointer; not the other way around. Does the standard forbid that too? Or have I misunderstood?

Also fwiw, the "external memset wrapper" solution isn't going to work with any compiler that does whole-program optimization.

The check-after-memset thing you propose might work too .. but I think that a compiler is also technically allowed to optimize that whole thing out, along with the memset, if it can prove that nothing else will look at the buffer afterwards.

Apologies, I wasn't clear - I was commenting on the solution proposed in the article, not your patch.

However, this quote worries me: "volatile seems like the sort of thing broken implementations may get wrong"

But, ultimately, there is only so much we can do to work around broken compilers.

"The check-after-memset thing you propose might work too .. but I think that a compiler is also technically allowed to optimize that whole thing out, along with the memset, if it can prove that nothing else will look at the buffer afterwards."

The assert() guarantees that there will be output if the buffer isn't cleared.

However, do you think a compiler could prove to itself that:

if it executed the code, there would never be any output from the assert()
therefore, it doesn't need to execute the code or the assert?

I guess it could. Screwy logic though.

I think asserting on the value of a volatile pointer fixes this.

How do you feel about:

#if PARANOIA
static void
memwipe_checker(volatile char *p, char c, size_t sz)
{
  /* check we filled the block with the right values */
  while (sz--)
    assert(*p++ == c);
}
#endif

void
memwipe(void *mem, unsigned char byte, size_t sz)
{
  /* ... memory wiping code ... */
#if PARANOIA
  /* if we're paranoid, check we actually wiped the memory */
  memwipe_checker(mem, byte, sz);
#endif

We could also make this level of PARANOIA mandatory, at some cost to performance.

Re: the patch itself - test-memwipe.c

I think the buffers could all be declared volatile:

if char buf[2048] (secret) is declared volatile, this means that the compiler cannot predict the value of sum, and therefore isn't allowed to precompute it (I can imagine a series of optimisation steps: optimise memcpy to static initialisation, optimise sum based on static initialisation, notice code similarity and use same value for all three sums, avoid compiling most of the program...)
if char buf[1024] (check) is declared volatile, it is no longer "uninitialised" memory - it is externally modifiable memory, that the compiler can never predict the value of.
using volatile also avoids all caching of values from these buffers, making sure the data is really written to and read from memory, not just cached in a large L3 cache somewhere. volatile is a way of telling the compiler: "I know what I'm doing here."

If we were really paranoid, we could test with and without volatile on char buf[2048] (secret).

Also, should we test heap allocations as well as stack allocations? They could have very different behaviours, and therefore security characteristics.

Well, for test-memwipe.c, we want the compiler to remove the memwipe if it can. I worry that making 'secret' volatile might keep some compiler out there from using its memset optimization, and thereby make our test invalid.

Heap allocations are probably even simpler to test than stack allocations. Let me see what I can come up with.

I've made the tests a little more sophisticated wrt stuff above; does this seem worth merging now?

I think that memwipe.c already has the required #define for OSX; does it look okay to you?

Yes, the #define looked good when I first reviewed the code. I'll review after finishing #13538 (moved).

Deferring; I have no evidence that the performance issue here actually matters.

Trac:
Milestone: Tor: 0.2.6.x-final to Tor: 0.2.7.x-final

actually, the openssl implementation is plenty fast, it turns out. The horrible kludgy implementation is only used when there isn't an asm one available, and there appear to be asm ones for most platforms.

I'm going to suggest we take the test for this in 0.2.7 though; tests are good.

Trac:
Milestone: Tor: 0.2.7.x-final to Tor: 0.2.???

Choose a faster memwipe implementation

Child items ...

Activity