Linux bcc Tracing Security Capabilities

Which Linux security capabilities are your applications using? I recently developed a new tool, capable, to print out capability checks live:

# capable
TIME      UID    PID    COMM             CAP  NAME                 AUDIT
22:11:23  114    2676   snmpd            12   CAP_NET_ADMIN        1
22:11:23  0      6990   run              24   CAP_SYS_RESOURCE     1
22:11:23  0      7003   chmod            3    CAP_FOWNER           1
22:11:23  0      7003   chmod            4    CAP_FSETID           1
22:11:23  0      7005   chmod            4    CAP_FSETID           1
22:11:23  0      7005   chmod            4    CAP_FSETID           1
22:11:23  0      7006   chown            4    CAP_FSETID           1
22:11:23  0      7006   chown            4    CAP_FSETID           1
22:11:23  0      6990   setuidgid        6    CAP_SETGID           1
22:11:23  0      6990   setuidgid        6    CAP_SETGID           1
22:11:23  0      6990   setuidgid        7    CAP_SETUID           1
22:11:24  0      7013   run              24   CAP_SYS_RESOURCE     1
22:11:24  0      7026   chmod            3    CAP_FOWNER           1
[...]

capable uses bcc, a front-end and a collection of tools that use new Linux enhanced BPF tracing capabilities. capable works by using BPF with kprobes to dynamically trace the kernel cap_capable() function, and then uses a table to map the capability index to the name seen in the output. Here's the source code: it's pretty straightforward.

I wrote it as a colleague, Michael Wardrop, asked what security capabilities our applications were actually using. Given a list, we could use setcap(8) (or other software) to improve the security of applications by only allowing the necessary capabilities.

Non-audit Checks

The cap_capable() function has an audit argument, which directs whether the capability check should write an audit message or not, if that's configured. By default, I only print capability checks where this is true, but capable can also trace all checks with the -v option:

# capable -h
usage: capable [-h] [-v] [-p PID]

Trace security capability checks

optional arguments:
  -h, --help         show this help message and exit
  -v, --verbose      include non-audit checks
  -p PID, --pid PID  trace this PID only

examples:
    ./capable             # trace capability checks
    ./capable -v          # verbose: include non-audit checks
    ./capable -p 181      # only trace PID 181

Here's some non-audit events:

# capable -v
TIME      UID    PID    COMM             CAP  NAME                 AUDIT
20:53:45  60004  22061  lsb_release      21   CAP_SYS_ADMIN        0
20:53:45  60004  22061  lsb_release      21   CAP_SYS_ADMIN        0
20:53:45  60004  22061  lsb_release      21   CAP_SYS_ADMIN        0
20:53:45  60004  22061  lsb_release      21   CAP_SYS_ADMIN        0
20:53:45  60004  22061  lsb_release      21   CAP_SYS_ADMIN        0
20:53:45  60004  22061  lsb_release      21   CAP_SYS_ADMIN        0
[...]

What are all those?

I'll start by showing the cap_capable() function prototype, from security/commoncap.c:

int cap_capable(const struct cred *cred, struct user_namespace *targ_ns,
                int cap, int audit)

Now I can use bcc's trace program to inspect these calls (bear with me), given that cap will be arg3, and audit arg4 (from the above prototype):

# trace 'cap_capable "cap: %d, audit: %d", arg3, arg4'
TIME     PID    COMM         FUNC             -
20:56:18 25535  lsb_release  cap_capable      cap: 21, audit: 0
20:56:18 25535  lsb_release  cap_capable      cap: 21, audit: 0
20:56:18 25535  lsb_release  cap_capable      cap: 21, audit: 0
20:56:18 25535  lsb_release  cap_capable      cap: 21, audit: 0
20:56:18 25535  lsb_release  cap_capable      cap: 21, audit: 0
[...]

That one-liner is pretty similar to my capable program, except it lacks the "NAME" column with human readable translations.

I'm really doing this so I can add the (newly added) -K and -U options, which print kernel and user-level stack traces. I'll just use -K:

# trace -K 'cap_capable "cap: %d, audit: %d", arg3, arg4'
TIME     PID    COMM         FUNC             -
[...]
20:59:58 30607  lsb_release  cap_capable      cap: 21, audit: 0
    Kernel Stack Trace:
        ffffffff813659d1 cap_capable
        ffffffff813684bb security_vm_enough_memory_mm
        ffffffff811deda6 expand_downwards
        ffffffff811def64 expand_stack
        ffffffff81234321 setup_arg_pages
        ffffffff8128c10b load_elf_binary
        ffffffff81234cee search_binary_handler
        ffffffff8128b7ff load_script
        ffffffff81234cee search_binary_handler
        ffffffff8123635a do_execveat_common.isra.35
        ffffffff812367da sys_execve
        ffffffff81003bae do_syscall_64
        ffffffff81861ca5 return_from_SYSCALL_64

20:59:58 30607  lsb_release  cap_capable      cap: 21, audit: 0
    Kernel Stack Trace:
        ffffffff813659d1 cap_capable
        ffffffff813684bb security_vm_enough_memory_mm
        ffffffff811df623 mmap_region
        ffffffff811dff4b do_mmap
        ffffffff811c122a vm_mmap_pgoff
        ffffffff811c1295 vm_mmap
        ffffffff8128bb93 elf_map
        ffffffff8128c271 load_elf_binary
        ffffffff81234cee search_binary_handler
        ffffffff8128b7ff load_script
        ffffffff81234cee search_binary_handler
        ffffffff8123635a do_execveat_common.isra.35
        ffffffff812367da sys_execve
        ffffffff81003bae do_syscall_64
        ffffffff81861ca5 return_from_SYSCALL_64
[...]

Awesome. So these are coming from security_vm_enough_memory_mm(). By reading the source, I see it's used to reserve some memory for root. It's not a hard failure if the capability is missing. And it's not really a security check, hence why it disabled audit.

I should add a -K option to the capable tool, so it can print stack traces too.

Older Kernels

To use capable, you'll need a 4.4 kernel. To use the -K option, 4.6.

Here's a version using my older perf-tools collection, which uses ftrace and should work on much older kernels including the 3.x series:

# ./perf-tools/bin/kprobe -s 'p:cap_capable cap=%dx audit=%cx' 'audit != 0'
Tracing kprobe cap_capable. Ctrl-C to end.
             run-4440  [003] d... 6417394.367486: cap_capable: (cap_capable+0x0/0x70) cap=0x18 audit=0x1
             run-4440  [003] d... 6417394.367492: 
 => ns_capable_common
 => capable
 => do_prlimit
 => SyS_setrlimit
 => entry_SYSCALL_64_fastpath
           chmod-4453  [006] d... 6417394.399020: cap_capable: (cap_capable+0x0/0x70) cap=0x3 audit=0x1
           chmod-4453  [006] d... 6417394.399027: 
 => ns_capable_common
 => ns_capable
 => inode_owner_or_capable
 => inode_change_ok
 => xfs_setattr_nonsize
 => xfs_vn_setattr
 => notify_change
 => chmod_common
 => SyS_fchmodat
 => entry_SYSCALL_64_fastpath
           chmod-4453  [006] d... 6417394.399035: cap_capable: (cap_capable+0x0/0x70) cap=0x4 audit=0x1
           chmod-4453  [006] d... 6417394.399037: 
 => ns_capable_common
 => capable_wrt_inode_uidgid
 => inode_change_ok
 => xfs_setattr_nonsize
 => xfs_vn_setattr
 => notify_change
 => chmod_common
 => SyS_fchmodat
 => entry_SYSCALL_64_fastpath
           chmod-4455  [007] d... 6417394.402524: cap_capable: (cap_capable+0x0/0x70) cap=0x4 audit=0x1
           chmod-4455  [007] d... 6417394.402529: 
 => ns_capable_common
 => capable_wrt_inode_uidgid
 => inode_change_ok
 => xfs_setattr_nonsize
 => xfs_vn_setattr
 => notify_change
 => chmod_common
 => SyS_fchmodat
 => entry_SYSCALL_64_fastpath
[...]

It's a one-liner using my kprobe tool. It's also (currently) a bit harder to use: I need to know which registers those arguments will be in: the example above is for x86_64 only.

That's all for now. Happy hacking.

Brendan Gregg's Blog