Opened 7 years ago

Closed 4 years ago

#7419 closed enhancement (implemented)

Choose a faster memwipe implementation

Reported by: nickm Owned by:
Priority: Medium Milestone: Tor: 0.2.8.x-final
Component: Core Tor/Tor Version:
Severity: Normal Keywords: tor-relay
Cc: Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

Our current memwipe implementation (see #7352) was chosen for being very-likely-to-be-safe, not for its speed. We could get much faster, probably: All we really need is a memwipe, plus some way to tell the compiler that the memwipe shouldn't be eliminated.

One GCC-only option is

void memwipe(void *p, uint8_t b, size_t sz) __attribtue__((noinline));
void memwipe(void *p, uint8_t b, size_t sz)
{
   memset(p, b, sz);
}

Another, crazier GCC-only option is

inline void 
memwipe(void *p, uint8_t b, size_t sz)
{
   memset(p, b, sz);
   asm("" ::: "memory");
}

And on windows, one could probably say

void
memwipe(void *p, uint8_t b, size_t sz)
{
   SecureZeroMemory(p, sz);
}

if b == 0.

Probably, we shouldn't bother to do any of these unless it turns out that OPENSSL_cleanse() is showing up in our profiles.

Child Tickets

Attachments (2)

tor_explicit_bzero.diff (884 bytes) - added by logan 4 years ago.
memset_s_explicit_bzero.diff (1.4 KB) - added by selven@… 4 years ago.
new diff, with changes files pointing to this ticket added.

Download all attachments as: .zip

Change History (33)

comment:1 Changed 7 years ago by nickm

Also see http://www.dwheeler.com/secure-programs/Secure-Programs-HOWTO/protect-secrets.html :

 void *guaranteed_memset(void *v,int c,size_t n)
  { volatile char *p=v; while (n--) *p++=c; return v; }

(Totally untested on modern compilers; the advice at the end of that page makes me think that the document could be a bit dated.)

comment:2 in reply to:  1 Changed 7 years ago by arma

Replying to nickm:

{ volatile char *p=v; while (n--) *p++=c; return v; }

I'm no expert here, but it seems to me that anything that doesn't use the changed memory after the change is just asking for trouble.

comment:3 Changed 7 years ago by mo

hoppy on #nottor:

so a while ago it was pointed out to me that I should not count on memset to zero sensitive memory because it could be optimized out, and was pointed to a Tor ticket (by NickM) that discussed this same thing in Tor and the solution suggested in that ticket was to use openssl cleanse instead
but the ticket also pointed out that this was a heavy handed solution and there should be something better thought up and today I told this to somebody else who is developing a security program , to make sure to not use memset like I was in case they were, and to use openssl cleanse instead, and they said that a better solution is to use the word volatile in the cast because then the compiler doesn't optimize it out http://www.lix.polytechnique.fr/~liberti/public/computing/prog/c/C/SYNTAX/volatile.html
so maybe that is better for Tor too , I dunno, just passing it on

comment:4 Changed 6 years ago by nickm

The volatile-based thing is actually pretty sure to work, assuming that the compiler isn't busted. It turns out that it's pretty slow in practice, though. Another way to use volatile is:

    void * (* volatile memset_volatile)(void *, int, size_t) = memset; 
    ...
    memset_volatile(p, c, n);

(via a NetBSD mailing list at http://netbsd.2816.n7.nabble.com/Adding-memset-s-function-td43349.html .)

Another option is the memset_s() function added in the the C11 standard. Other than MS, though, I don't think any of the libcs we use support it yet.

comment:5 in reply to:  4 Changed 6 years ago by rransom

Replying to nickm:

The volatile-based thing is actually pretty sure to work, assuming that the compiler isn't busted.

If the compiler isn't broken, a plain memset is sufficient. The point of this ticket is to find a memory-erasing function that broken compilers are not yet smart enough to recognize and remove.

comment:6 Changed 5 years ago by nickm

Milestone: Tor: unspecifiedTor: 0.2.6.x-final
Status: newneeds_review

This might actually be worth doing, since OPENSSL_cleanse is such a pile of voodoo.

See branch better_memwipe in my public repository.

comment:7 Changed 5 years ago by nickm

Keywords: nickm-patch added

Apply a nickm-patch keyword to tickets in needs_review in 0.2.6 where I wrote the patch, so I know which ones I can('t) review myself.

comment:8 Changed 5 years ago by teor

OS X (10.9?) supports memset_s, as documented in the manual pages as of February 21, 2012.
We'd have to #define STDC_WANT_LIB_EXT1 1 before (first) including string.h, which could be fun.

comment:9 Changed 5 years ago by teor

If we do something like:

void *guaranteed_memset(void *v,int c,size_t n) {

  char memset_failed = 0;
  int i;

  /* do the memset in a fast, safe, platform-specific way */

  for (i = 0; i < n; i++) {
    if (v[i] != c) {
      memset_failed = 1;
      break;
    }
  }

  assert(!memset_failed);

  return v;
}

we could prove the overwrite occurred, and ensure compilers don't/haven't optimise(d) it away.

This is the safest approach - one question: should we guard it with #if PARANOIA, or do it all the time?

(I favour doing it every time we overwrite, as it is a vital security property. However, it will be slower to read all the memory again.)

What do you think, nickm, arma, rransom?

comment:10 Changed 5 years ago by teor

Also, from http://www.securityfocus.com/archive/82/298061/2002-10-27/2002-11-02/0
linked from http://www.dwheeler.com/secure-programs/Secure-Programs-HOWTO/protect-secrets.html which is quoted in comment 1:

"If you don't like the "volatile" solution
(I don't, because access to volatile objects may be significantly slowed,
and because aliasing volatile objects with non-volatile-qualified pointers
and accessing through such an alias invokes undefined behavior, and because
volatile seems like the sort of thing broken implementations may get wrong),
use the external-memset-wrapper one I proposed in my previous note."

I don't like undefined behaviour - it tends to make conforming, optimising compilers misbehave, and unpredictably as they add new optimisations. We don't want to do things for broken compilers that break conformant ones.

comment:11 Changed 5 years ago by nickm

To be clear, I don't think any of my patch requires undefined behavior; do you? In most cases, it accesses non-volatile objects through a volatile pointer; not the other way around. Does the standard forbid that too? Or have I misunderstood?

Also fwiw, the "external memset wrapper" solution isn't going to work with any compiler that does whole-program optimization.

The check-after-memset thing you propose might work too .. but I think that a compiler is also technically allowed to optimize that whole thing out, along with the memset, if it can prove that nothing else will look at the buffer afterwards.

comment:12 Changed 5 years ago by teor

Apologies, I wasn't clear - I was commenting on the solution proposed in the article, not your patch.

However, this quote worries me:
"volatile seems like the sort of thing broken implementations may get wrong"

But, ultimately, there is only so much we can do to work around broken compilers.

"The check-after-memset thing you propose might work too .. but I think that a compiler is also technically allowed to optimize that whole thing out, along with the memset, if it can prove that nothing else will look at the buffer afterwards."

The assert() guarantees that there will be output if the buffer isn't cleared.

However, do you think a compiler could prove to itself that:

  1. if it executed the code, there would never be any output from the assert()
  2. therefore, it doesn't need to execute the code or the assert?

I guess it could. Screwy logic though.

I think asserting on the value of a volatile pointer fixes this.

How do you feel about:

#if PARANOIA
static void
memwipe_checker(volatile char *p, char c, size_t sz)
{
  /* check we filled the block with the right values */
  while (sz--)
    assert(*p++ == c);
}
#endif

void
memwipe(void *mem, unsigned char byte, size_t sz)
{
  /* ... memory wiping code ... */
#if PARANOIA
  /* if we're paranoid, check we actually wiped the memory */
  memwipe_checker(mem, byte, sz);
#endif

We could also make this level of PARANOIA mandatory, at some cost to performance.

comment:13 Changed 5 years ago by teor

Re: the patch itself - test-memwipe.c

I think the buffers could all be declared volatile:

  • if char buf[2048] (secret) is declared volatile, this means that the compiler cannot predict the value of sum, and therefore isn't allowed to precompute it (I can imagine a series of optimisation steps: optimise memcpy to static initialisation, optimise sum based on static initialisation, notice code similarity and use same value for all three sums, avoid compiling most of the program...)
  • if char buf[1024] (check) is declared volatile, it is no longer "uninitialised" memory - it is externally modifiable memory, that the compiler can never predict the value of.
  • using volatile also avoids all caching of values from these buffers, making sure the data is really written to and read from memory, not just cached in a large L3 cache somewhere.

volatile is a way of telling the compiler: "I know what I'm doing here."

If we were really paranoid, we could test with and without volatile on char buf[2048] (secret).

Also, should we test heap allocations as well as stack allocations?
They could have very different behaviours, and therefore security characteristics.

comment:14 Changed 5 years ago by nickm

Well, for test-memwipe.c, we _want_ the compiler to remove the memwipe if it can. I worry that making 'secret' volatile might keep some compiler out there from using its memset optimization, and thereby make our test invalid.

Heap allocations are probably even simpler to test than stack allocations. Let me see what I can come up with.

comment:15 Changed 5 years ago by nickm

I've made the tests a little more sophisticated wrt stuff above; does this seem worth merging now?

I think that memwipe.c already has the required #define for OSX; does it look okay to you?

comment:16 Changed 5 years ago by teor

Yes, the #define looked good when I first reviewed the code. I'll review after finishing #13538.

comment:17 Changed 5 years ago by nickm

Milestone: Tor: 0.2.6.x-finalTor: 0.2.7.x-final

Deferring; I have no evidence that the performance issue here actually _matters_.

comment:18 Changed 5 years ago by nickm

Milestone: Tor: 0.2.7.x-finalTor: 0.2.???

actually, the openssl implementation is plenty fast, it turns out. The horrible kludgy implementation is only used when there isn't an asm one available, and there appear to be asm ones for most platforms.

I'm going to suggest we take the test for this in 0.2.7 though; tests are good.

comment:19 Changed 4 years ago by logan

Severity: Normal

Here's a patch that checks if the platform supports explicit_bzero(). Tested on OpenBSD.

PASS: src/test/test-memwipe

comment:20 Changed 4 years ago by logan

Wrong patch sent. Wrong branch. Uploading in a few minutes.

Changed 4 years ago by logan

Attachment: tor_explicit_bzero.diff added

comment:21 in reply to:  19 Changed 4 years ago by teor

Replying to logan:

Here's a patch that checks if the platform supports explicit_bzero(). Tested on OpenBSD.

PASS: src/test/test-memwipe

Replying to logan:

Here's a patch that checks if the platform supports explicit_bzero(). Tested on OpenBSD.

PASS: src/test/test-memwipe

There are multiple changes suggested in this ticket:

nickm's original branch better_memwipe attempted to avoid using OpenSSL_cleanse for performance reasons by checking if a simple memset worked. I was happy to see it merged, but we never did because there was no evidence that performance was an issue.

The attached explicit_bzero has better semantics than OpenSSL_cleanse, and is faster. Let's merge it.

As discussed on IRC, we can also use memset_s on platforms that support it (NetBSD, OS X):

So we can do the following:

  • if memset_s is supported:
    • call memset_s to wipe the memory and set the bytes
  • if explicit_bzero is supported:
    • call explicit_bzero to wipe the memory
    • call memset to set the bytes
  • if SecureZeroMemory is supported: (Win32)
    • call SecureZeroMemory to wipe the memory
    • call memset to set the bytes
  • otherwise:
    • call OpenSSL_cleanse to wipe the memory
    • call memset to set the bytes

Edit: add Windows SecureZeroMemory

Last edited 4 years ago by teor (previous) (diff)

comment:22 Changed 4 years ago by teor

Win32 also apparently has memset_s.

comment:23 Changed 4 years ago by teor

Status: needs_reviewneeds_revision

comment:24 Changed 4 years ago by teor

Milestone: Tor: 0.2.???Tor: 0.2.8.x-final

comment:25 in reply to:  22 Changed 4 years ago by teor

Replying to teor:

Win32 also apparently has memset_s.

Windows has SecureZeroMemory in Windows.h.
https://msdn.microsoft.com/en-us/library/windows/desktop/aa366877.aspx

As long as we build with headers that contain SecureZeroMemory, the code will work on any Windows version:

"This function is defined as the RtlSecureZeroMemory function (see WinBase.h). The implementation of RtlSecureZeroMemory is provided inline and can be used on any version of Windows (see WinNT.h.)"

comment:26 Changed 4 years ago by selven@…

for the above diff https://trac.torproject.org/projects/tor/attachment/ticket/7419/memset_s_explicit_bzero.diff , i forgot to add a description

uname -a
Darwin selven@… 15.2.0 Darwin Kernel Version 15.2.0: Fri Nov 13 19:56:56 PST 2015; root:xnu-3248.20.55~2/RELEASE_X86_64 x86_64

configured with: ./configure --with-openssl-dir=/opt/local/ --disable-asciidoc
make check said:
PASS: src/test/test
PASS: src/test/test-slow
PASS: src/test/test-memwipe
PASS: src/test/test_workqueue
PASS: src/test/test_keygen.sh
PASS: src/test/test_zero_length_keys.sh
SKIP: src/test/test_switch_id.sh
PASS: src/test/test_ntor.sh
PASS: src/test/test_bt.sh

run:
Jan 03 xx:xx:xx.000 [notice] Bootstrapped 0%: Starting
Jan 03 xx:xx:xx.000 [notice] Bootstrapped 80%: Connecting to the Tor network
Jan 03 xx:xx:xx.000 [notice] Bootstrapped 85%: Finishing handshake with first hop
Jan 03 xx:xx:xx.000 [notice] Bootstrapped 90%: Establishing a Tor circuit
Jan 03 xx:xx:xx.000 [notice] Tor has successfully opened a circuit. Looks like client functionality is working.
Jan 03 xx:xx:xx.000 [notice] Bootstrapped 100%: Done

Happy new year
Thanks,
Selven

comment:27 Changed 4 years ago by logan

Tested Selven's patch on OpenBSD. Detects explicit_bzero, and uses it.

comment:28 in reply to:  26 Changed 4 years ago by teor

Keywords: nickm-patch removed
Status: needs_revisionneeds_review

Replying to selven@…:

for the above diff https://trac.torproject.org/projects/tor/attachment/ticket/7419/memset_s_explicit_bzero.diff , i forgot to add a description

Thanks for the patch and the testing!

It's a straightforward patch that works on later versions of OpenBSD, NetBSD, and OS X.
(Using memset_s(..., ..., 0, ...) then memset(..., ..., byte, ...) is redundant, but I'm ok with that. It makes our choice between alternative functions clearer, which is important in security-critical code.)

Let's get this merged, then close this ticket.
https://trac.torproject.org/projects/tor/attachment/ticket/7419/memset_s_explicit_bzero.diff

(I've split off #17986 for using SecureZeroMemory on Windows.)
(Removing nickm-patch because the latest patch is not by nickm.)

comment:29 Changed 4 years ago by teor

Status: needs_reviewneeds_revision

Awaiting a revision to the changes file.

comment:30 Changed 4 years ago by teor

This is going to require a manual merge with #17986 that preserves all the different cases.

Changed 4 years ago by selven@…

new diff, with changes files pointing to this ticket added.

comment:31 Changed 4 years ago by nickm

Resolution: implemented
Status: needs_revisionclosed

Merged this.; thanks!

Note: See TracTickets for help on using tickets.