Opened 20 months ago

Last modified 16 months ago

#24400 new enhancement

Seccomp filter incorrectly tries to act on strings, allowing sandbox bypass

Reported by: Sebastian Owned by:
Priority: Medium Milestone: Tor: unspecified
Component: Core Tor/Tor Version:
Severity: Major Keywords: sandbox, 034-triage-20180328, 034-removed-20180328
Cc: Actual Points:
Parent ID: Points:
Reviewer: Sponsor:

Description

I am filing this on behalf of cypherpunks who cannot use our trac because of its javascript requirements for posting some stuff.

Currently, the Tor seccomp sandbox attempts to whitelist internal strings. This is a bug, because seccomp is incapable of whitelisting memory contents, only register contents (i.e. the pointer itself). That is, when you call open(path, mode), the first argument is actually an address, such as 0x368a227dec3. When you attempt to use the seccomp filter to whitelist this, you are not whitelisting the string, only the pointer to the string. The only reason it works is because compilers deduplicate identical strings, allowing two strings that are the same (e.g. the string given to seccomp for whitelisting, and the string being given to a syscall) point to the same location in .rodata.

A demonstration showing deduplication:

cat $ hello.c
#include <stdio.h>

void main(int argc, char *argv[])
{
    char *s1 = "Hello, world!";
    char *s2 = "Hello, world!";
    char *s3 = argv[1];

    printf("%p\t%s\n%p\t%s\n%p\t%s\n", s1, s1, s2, s2, s3, s3);
}

$ gcc hello.c

$ ./a.out "Hello, world!"
0x64a4b457b4	Hello, world!
0x64a4b457b4	Hello, world!
0x3eb7d22b37d	Hello, world!

$ readelf -x .rodata a.out

Hex dump of section '.rodata':
  0x000007b0 01000200 48656c6c 6f2c2077 6f726c64 ....Hello, world
  0x000007c0 21002570 0925730a 25700925 730a2570 !.%p.%s.%p.%s.%p
  0x000007d0 0925730a 00                         .%s..

Despite the strings being identical in all cases, only the first two, which were defined internally, point to the same data. Only when the string is accessed at runtime does the value change. This means that hardcoding a string with a seccomp sandbox may seem to work, but provides no protection, as all an attacker would have to do is write their own strings to those addresses (changing the permissions of .rodata first if it is stored there).

A demonstration which shows how a whitelisted string can be bypassed:

$ cat seccomp.c
#include <seccomp.h>
#include <unistd.h>
#include <string.h>
#include <stdlib.h>

void main(void)
{
    char *s = malloc(14);

    memcpy(s, "Hello, world!\n", 14);

    scmp_filter_ctx ctx = seccomp_init(SCMP_ACT_TRAP);
    seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(exit_group), 0);
    seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(write), 1,
        SCMP_A1(SCMP_CMP_EQ, (scmp_datum_t)s));
    seccomp_load(ctx);

    write(1, s, 14);
    memcpy(s, "Hallo, world!\n", 14);
    write(1, s, 14);
}

$ gcc seccomp.c -lseccomp

$ ./a.out
Hello, world!
Hallo, world!

See "Why does seccomp-filter is not enough?" [sic] from https://lwn.net/Articles/698226/, which explains why an LSM (currently in development) is required to provide seccomp with the ability to filter paths and other memory-resident values:

A seccomp filter can access to raw syscall arguments which means that it is not
possible to filter according to pointed data as a file path. As demonstrated
the first version of this patch series, filtering at the syscall level is
complicated (e.g. need to take care of race conditions). This is mainly because
the access control checkpoints of the kernel are not at this high-level but
more underneath, at LSM hooks level. The LSM hooks are designed to handle this
kind of checks. This series use this approach to leverage the ability of
unprivileged users to limit themselves.

It may seem like it would be enough to ensure the buffers are read-only and disallow mprotect() accessing that address. Unfortunately, other syscalls like brk() can bypass this. In the end, it's extremely difficult to ensure that a memory region cannot be written to by malicious code without disabling virtually all memory-related syscalls.

The entirety of src/common/sandbox.c:sandbox_intern_string is poorly thought out and relies on incorrect and dangerous assumptions. This results in 6 syscalls so far which devs believe to be filtered but which in fact can be bypassed:

  • chown - pathname argument
  • chmod - pathname argument
  • open - pathname argument
  • openat - pathname argument
  • rename - oldpath and newpath arguments
  • stat64 - pathname argument

The solution is to use the syscalls before enabling the sandbox, or whitelisting them (excluding arguments in memory) and eventually blacklisting them in a stage 2 sandbox.

Child Tickets

Change History (11)

comment:1 Changed 20 months ago by cypherpunks

I think cat $ hello.c should probably be $ cat hello.c instead.

Tested the demonstration files and they work. There is also a danger of thread concurrency bugs when trying to read memory in a filter context as well.

comment:2 Changed 20 months ago by cypherpunks

Severity: NormalMajor

comment:3 Changed 20 months ago by cypherpunks

@cypherpunks

While this is certainly an ugly issue, I wouldn't quite call it *major*. When I first found this issue, I looked for any syscalls that could break the whole sandbox, and didn't find any. Perhaps I should have asked for the title to be "allowing sandbox bypass of certain syscalls".

comment:4 Changed 20 months ago by cypherpunks

It seems like trying to set things like SocksPort to a UNIX domain socket will cause Tor to crash due to this bug. The path is whitelisted at compile-time, so changing it results in a violation and a confusing error suggesting that the path may not be readable.

Other affected parameters that are broken when the sandbox are in use include:

DirPortFrontPage
ServerDNSResolvConfFile
CookieAuthFile
ExtORPortCookieAuthFile

And likely more.

comment:5 Changed 20 months ago by cypherpunks

In case it's necessary, another PoC showing that .rodata, not just the heap, can be written to:

$ cat rodata.c
#include <unistd.h>
#include <string.h>
#include <sys/mman.h>

void main(void)
{
    static const unsigned char rodata[4096] = { 0 };
    long page_base = ((long)rodata / 4097) * 4096;

    mprotect((void *)page_base, 1, PROT_READ|PROT_WRITE);
    strcpy((void *)page_base, "Hello, world!\n");
    write(1, rodata, 4096);
}

$ gcc rodata.c

$ ./a.out
Hello, world!

comment:6 Changed 20 months ago by nickm

So, I'm sure that the orgininal reporter knows this, but I don't think that all the people on this thread know it: the current sandbox code tries to do what the the original reporter describes as

ensur[ing] the buffers are read-only and disallow mprotect() accessing that address.

That's what Tor tries to do right now with all the crazy logic in prot_strings(). It tries to prevent mprotect() changes, munmap() or mremap() of the memory region, and any mprotect() changes that would overlap with the protected region. It's sure not very clean code, though, and I can believe that there are ways around it that we don't know about. How does the brk() bypass work here? What are the other bypasses that we should know about?


But let's assume that the bypass issues do affect us here.

I think that in the longer timeframe, the right solution is to have a separate process that retains more syscall permissions. This will take more design and implementation work, though. It would have to be an optional thing that only happens when a sandbox needs it, or we'll make ios (and android?) sad. It's a good candidate for a rust thing IMO.

In the shorter term, we could remove the logic that tries to list all the files and only permit those, and instead permit open, openat, rename, etc more generally, if there's a benefit to that.

We should also figure out what timeframe we can do the "right" solution on.

comment:7 Changed 20 months ago by cypherpunks

It's sure not very clean code, though, and I can believe that there are ways around it that we don't know about. How does the brk() bypass work here? What are the other bypasses that we should know about?

I saw a demonstration when I proposed this idea to... I think it was TheJH? I'd have to ask again to remember the details.

(and android?)

Android works the same way as vanilla Linux in this respect.

In the shorter term, we could remove the logic that tries to list all the files and only permit those, and instead permit open, openat, rename, etc more generally, if there's a benefit to that.

While removal would fix some bugs, it still provides (I think) benefit for systems with PaX MPROTECT, since that prevents making rx pages writable (such as .text).

We should also figure out what timeframe we can do the "right" solution on.

This is an issue for many projects, so there is effort to remedy this (e.g. with an LSM). It might be best for the "right" solution to use that when it comes out. Having a separate process or greatly reworking the architecture of Tor doesn't seem likely.

comment:8 Changed 18 months ago by nickm

Milestone: Tor: 0.3.3.x-finalTor: 0.3.4.x-final
Type: defectenhancement

Label a bunch of (arguable and definite) enhancements as enhancements for 0.3.4.

comment:9 Changed 16 months ago by nickm

Keywords: 034-triage-20180328 added

comment:10 Changed 16 months ago by nickm

Keywords: 034-removed-20180328 added

Per our triage process, these tickets are pending removal from 0.3.4.

comment:11 Changed 16 months ago by nickm

Milestone: Tor: 0.3.4.x-finalTor: unspecified

These tickets, tagged with 034-removed-*, are no longer in-scope for 0.3.4. We can reconsider any of them, if time permits.

Note: See TracTickets for help on using tickets.