Opened 4 years ago

Last modified 17 months ago

#13912 new defect

Key Security: Zeroing Buffers Is Insufficient (AES-NI leaves keys in SSE registers)

Reported by: teor
Owned by:
Priority: Medium
Milestone: Tor: unspecified
Component: Core Tor/Tor
Version: Tor: 0.2.6.1-alpha
Severity: Normal
Keywords: security registers aesni memwipe tor-relay
Cc: nickm, isis
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor:

Description

The article "Zeroing Buffers Is Insufficient" describes how AES-NI can leave keys in SSE registers for long periods of time. (It also describes issues with temporary variables on the stack, and in other registers.)

http://www.daemonology.net/blog/2014-09-06-zeroing-buffers-is-insufficient.html

Is there a way we can semi-portably fix this?

Change History (13)

comment:1 Changed 4 years ago by nickm

I think that with aesni, at least, it's a non-issue. Remember:

  • We're doing AES all over the place, with many keys, many bytes at a time.
  • Leaking SSE registers is not so simple as leaking memory.

But if it seems important to fix, options might include:

  • sticking in a pure-assembly "zero the SSE registers" call after each AES or SSL invocation
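
A rough sketch of what such a "zero the SSE registers" call could look like with GCC/Clang inline assembly. The function name `scrub_xmm_key_registers` is hypothetical (it is not a Tor or OpenSSL function), and the choice of xmm0 through xmm5 is an assumption based on the registers mentioned later in this ticket; a different AES implementation may use others. It makes no promise about what the compiler does with those registers afterwards.

```c
/* Hypothetical helper, not part of Tor or OpenSSL.
 * Returns 1 if the registers were cleared, 0 if this build cannot do it. */
int scrub_xmm_key_registers(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && defined(__SSE2__)
    /* Assumption: xmm0..xmm5 are the registers worth clearing.
     * XORing a register with itself sets it to zero. The clobber
     * list tells the compiler these registers were modified. */
    __asm__ __volatile__(
        "pxor %%xmm0, %%xmm0\n\t"
        "pxor %%xmm1, %%xmm1\n\t"
        "pxor %%xmm2, %%xmm2\n\t"
        "pxor %%xmm3, %%xmm3\n\t"
        "pxor %%xmm4, %%xmm4\n\t"
        "pxor %%xmm5, %%xmm5\n\t"
        : /* no outputs */
        : /* no inputs */
        : "xmm0", "xmm1", "xmm2", "xmm3", "xmm4", "xmm5");
    return 1;
#else
    /* No SSE2 or an unsupported compiler: nothing to scrub here. */
    return 0;
#endif
}
```

The idea would be to call this right after the AES or SSL call returns. On builds without SSE2 it compiles to a no-op that returns 0, so callers can tell whether anything was actually cleared.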

comment:2 Changed 4 years ago by teor

On compilers' habit of spilling register values onto the stack (GCC in particular), the article says:

Compilers are free to make copies of data, rearranging it for faster access. One of the worst culprits in this regard is GCC: Because its register allocator does not apply any backpressure to the common subexpression elimination routines, GCC can decide to load values from memory into "registers", only to end up spilling those values onto the stack when it discovers that it does not have enough physical registers (this is one of the reasons why gcc -O3 sometimes produces slower code than gcc -O2). Even without register allocation bugs, however, all compilers will store temporary values on the stack from time to time, and there is no legal way to sanitize these from within C.

Is the conclusion, that "there is no legal way to sanitize [compiler-created temporaries on the stack] from within C", correct?

If so, I could imagine the following strategies to address this issue:

  • avoid building tor with gcc -O3
  • allocate and zero buffers on the stack after returning from sensitive functions

Are there any others?

comment:3 Changed 4 years ago by teor

Are there compiler flags which keep the number of virtual registers allocated within physical limits?
Can we decorate only certain functions or files with these?
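
On the second question, GCC does have a per-function mechanism: the `optimize` function attribute (and `#pragma GCC optimize` for a whole file). A minimal sketch follows; `SENSITIVE_OPT` and `xor_fold_key` are hypothetical names made up for illustration. Note that this only lowers the optimization level for one function, which is the "avoid -O3" strategy applied narrowly. It does not bound register pressure or prevent spills, and Clang does not implement this attribute, so the macro expands to nothing there.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical macro: ask GCC to compile one function at -O2 even if the
 * rest of the file is built with -O3. Expands to nothing elsewhere. */
#if defined(__GNUC__) && !defined(__clang__)
#define SENSITIVE_OPT __attribute__((optimize("O2")))
#else
#define SENSITIVE_OPT
#endif

/* Hypothetical stand-in for a routine that touches key material:
 * folds the key bytes into a 32-bit value with XOR. */
SENSITIVE_OPT
uint32_t xor_fold_key(const uint8_t *key, size_t len)
{
    uint32_t acc = 0;
    size_t i;
    for (i = 0; i < len; i++)
        acc ^= (uint32_t)key[i] << (8 * (i % 4));
    return acc;
}
```

Whether this changes the generated code in a useful way would have to be checked by reading the assembly for each compiler version; the attribute is a request about optimization passes, not a guarantee about temporaries.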

comment:4 Changed 4 years ago by nickm

Well, there's no *portable*, *standards-guaranteed* way to sanitize the stack. But in practice you can probably call a function that does something like:

  void stomp_stack(void) __attribute__((noinline));
  void stomp_stack(void)
  {
      unsigned char huge[256*1024];
      memset_s(huge, sizeof(huge), 0, sizeof(huge));
  }

But of course, you wouldn't want to do that too often.
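
Since `memset_s` comes from the optional C11 Annex K and is missing from some C libraries, here is a sketch of the same idea using only volatile stores. The name `stomp_stack_portable` is hypothetical, and like the version above this is best-effort: nothing in the C standard guarantees that the scratch array overlaps the stack region used by earlier calls.

```c
#include <stddef.h>

#define STOMP_BYTES (256 * 1024)

/* Hypothetical variant of stomp_stack() that avoids memset_s.
 * Writes through a volatile pointer so the stores are not optimized away.
 * Returns the number of bytes that read back as nonzero (expected 0). */
size_t stomp_stack_portable(void)
{
    unsigned char scratch[STOMP_BYTES];
    volatile unsigned char *p = scratch;
    size_t i;
    size_t nonzero = 0;

    /* Overwrite a large stack region with zeros. */
    for (i = 0; i < sizeof(scratch); i++)
        p[i] = 0;

    /* Read back through the same volatile pointer as a sanity check. */
    for (i = 0; i < sizeof(scratch); i++)
        nonzero += (p[i] != 0);

    return nonzero;
}
```

The 256 KiB size mirrors the sketch above; it has to fit within the calling thread's stack, so threads created with small stacks would need a smaller value.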

WRT the original question of SSE registers, Yawning had some interesting points on IRC today. I hope he can summarize.

comment:5 Changed 4 years ago by yawning

Ooof. This is tricky to solve correctly, but the AES-NI case is probably not exploitable. From talking with nickm on IRC about this, the only way for this to actually leak AES keys would be:

  • Bugs that allow arbitrary code execution (we've lost in that case regardless)
  • Something that reads from an uninitialized XMM register in a way that spits it out onto heap/stack/the network, while displaying "correct" behavior otherwise.
  • Your kernel is compromised (we've lost in that case regardless) since the registers get saved on context switch.

These cases seem somewhat far-fetched to me. Skimming the OpenSSL code (warning: not comprehensive), it looks like the round keys are stored in xmm0/xmm1 (xmm0-5 are used for the key expansion), so we don't actually need to scrub *everything* if we want to go down this path. The compiler shouldn't be writing the contents of these registers out onto the stack/heap after a return back into our code.

It's also worth a minor sidenote that recent glibc will use vectorized memcpy() for sufficiently large copies, and will obliterate the contents of these registers, though I have not checked to see if we memcpy() enough data to trigger the vectorized codepath with any large frequency.

comment:6 Changed 3 years ago by isis

Cc: isis added

comment:7 Changed 2 years ago by teor

Milestone: Tor: 0.2.??? → Tor: 0.3.???

Milestone renamed

comment:8 Changed 2 years ago by nickm

Keywords: tor-03-unspecified-201612 added
Milestone: Tor: 0.3.??? → Tor: unspecified

Finally admitting that 0.3.??? was a euphemism for Tor: unspecified all along.

comment:9 Changed 19 months ago by nickm

Keywords: tor-03-unspecified-201612 removed

Remove an old triaging keyword.

comment:10 Changed 18 months ago by nickm

Keywords: registers aesni memwipe tor-relay added
Severity: Normal

comment:11 in reply to:  5 ; Changed 17 months ago by cypherpunks

Replying to yawning:

Ooof. This is tricky to solve correctly, but the AES-NI case is probably not exploitable. From talking with nickm on IRC about this, the only way for this to actually leak AES keys would be:

  • Bugs that allow arbitrary code execution (we've lost in that case regardless)
  • Something that reads from an uninitialized XMM register in a way that spits it out onto heap/stack/the network, while displaying "correct" behavior otherwise.
  • Your kernel is compromised (we've lost in that case regardless) since the registers get saved on context switch.

What about ROP gadgets that do not provide Turing-complete behavior (so no "arbitrary" code execution), but still expose the sensitive registers? There will certainly be gadgets for reading from these registers.

comment:12 in reply to:  11 ; Changed 17 months ago by yawning

Replying to cypherpunks:

What about ROP gadgets that do not provide Turing-complete behavior (so no "arbitrary" code execution), but still expose the sensitive registers?

I think you've likewise effectively lost at that point. Patch OpenSSL's assembly in strategic locations if you actually care about this, though there are a lot of other places in the code that don't scrub "sensitive" keying information, so IMO this is a lost cause.

comment:13 in reply to:  12 Changed 17 months ago by isis

Replying to yawning:

Replying to cypherpunks:

What about ROP gadgets that do not provide Turing-complete behavior (so no "arbitrary" code execution), but still expose the sensitive registers?

I think you've likewise effectively lost at that point. Patch OpenSSL's assembly in strategic locations if you actually care about this, though there are a lot of other places in the code that don't scrub "sensitive" keying information, so IMO this is a lost cause.

Agreed. I think if we're at the point that an adversary can somehow chain ROP gadgets to get a partial key read from an xmm register, I'd be way more worried about a ROP chain for full RCE.
