IOVisor Summit: BPF Tools 2017

Slides from a discussion at the IOVisor (eBPF) workshop in Feb 2017 led by Brendan Gregg, about BPF performance and observability tools.

PDF: IOVisor2017_bpftools.pdf

Keywords (from pdftotext):

slide 1:
    27 Feb 2017
    Brendan Gregg
    Senior Performance Architect
    image: http://makeitstranger.com
    
slide 2:
    Observability
    
slide 3:
    Best Possible Performance
    Root Cause Analysis
    
slide 4:
    Needed:
    Observe Everything
    In Production
    Quickly
    
slide 5:
slide 6:
    Enhanced BPF is in Linux
    
slide 7:
    Version BPF support arrived
    Linux 4.7
    Linux 4.3
    Linux 4.1
    BPF output
    Linux 4.4
    BPF stacks
    Linux 4.6
    Linux 4.9
    Linux 4.9
    
slide 8:
    How do we
    use these
    superpowers?
    
slide 9:
    Methodologies
    Off-CPU Analysis
    Thread State Analysis
    etc.
    Pose questions for tools to answer
    
slide 10:
    Current Tools
    
slide 11:
    bcc: BPF Compiler Collection
    https://github.com/iovisor/bcc
    
slide 12:
slide 13:
    Single Purpose Tools
    Multi-Tools
    
slide 14:
    Single purpose vs Multi-tools
    # opensnoop
    PID    COMM   FD ERR PATH
    10085  sshd        0 /lib/x86_64-linux-gnu/libresolv.so.2
    10085  sshd        0 /lib/x86_64-linux-gnu/libgpg-error.so.0
    10085  sshd        0 /dev/urandom
    10085  sshd        2 /lib/x86_64-linux-gnu/.libcrypto.so.1.0.0.hmac
    10085  sshd        2 /proc/sys/crypto/fips_enabled
    # trace 'do_sys_open "%s", arg2' 'r::do_sys_open "ret:%d", retval'
    PID    TID    COMM         FUNC
                  redis-server do_sys_open  /proc/1651/stat
                  redis-server do_sys_open  /proc/1968/stat
                  redis-server do_sys_open  ret:5
                  redis-server do_sys_open  ret:5
                  snmp-pass    do_sys_open  /proc/cpuinfo
                  snmp-pass    do_sys_open  ret:4
                  snmp-pass    do_sys_open  /proc/stat
                  snmp-pass    do_sys_open  ret:4
    
slide 15:
    Single purpose tool usage
    # biolatency -h
    usage: biolatency [-h] [-T] [-Q] [-m] [-D] [interval] [count]
    Summarize block device I/O latency as a histogram
    positional arguments:
      interval            output interval, in seconds
      count               number of outputs
    optional arguments:
      -h, --help          show this help message and exit
      -T, --timestamp     include timestamp on output
      -Q, --queued        include OS queued time in I/O time
      -m, --milliseconds  millisecond histogram
      -D, --disks         print a histogram per disk device
    examples:
        ./biolatency            # summarize block I/O latency as a histogram
    [...]
    
slide 16:
    CLI
    Tool Design
    
slide 17:
    Template 1: Per Event Output
    # opensnoop
    PID    COMM   FD ERR PATH
    10085  sshd        0 /lib/x86_64-linux-gnu/libkeyutils.so.1
    10085  sshd        0 /lib/x86_64-linux-gnu/libresolv.so.2
    10085  sshd        0 /lib/x86_64-linux-gnu/libgpg-error.so.0
    10085  sshd        0 /dev/urandom
    10085  sshd        2 /lib/x86_64-linux-gnu/.libcrypto.so.1.0.0.hmac
    10085  sshd        2 /proc/sys/crypto/fips_enabled
    10085  sshd        0 /proc/filesystems
    10085  sshd        0 /dev/null
    10085  sshd        0 /proc/10085/fd
    10085  sshd        0 /usr/lib/ssl/openssl.cnf
    10085  sshd        0 /etc/gai.conf
    10085  sshd        0 /etc/nsswitch.conf
    10085  sshd        0 /etc/ld.so.cache
    10085  sshd        0 /lib/x86_64-linux-gnu/libnss_compat.so.2
    10085  sshd        0 /etc/ld.so.cache
    10085  sshd        0 /lib/x86_64-linux-gnu/libnss_nis.so.2
    […]
    
slide 18:
    Template 2: Filtered Event Output
    # ext4slower 1
    Tracing ext4 operations slower than 1 ms
    TIME     COMM   PID  T BYTES  OFF_KB LAT(ms) FILENAME
    06:49:17 bash        R 128              7.75 cksum
    06:49:17 cksum       R 39552            1.34 [
    06:49:17 cksum       R 96               5.36 2to3-2.7
    06:49:17 cksum       R 96              14.94 2to3-3.4
    06:49:17 cksum       R 10320            6.82 411toppm
    06:49:17 cksum       R 65536            4.01 a2p
    06:49:17 cksum       R 55400            8.77 ab
    06:49:17 cksum       R 36792           16.34 aclocal-1.14
    06:49:17 cksum       R 15008           19.31 acpi_listen
    06:49:17 cksum       R 6123            17.23 add-apt-repository
    06:49:17 cksum       R 6280            18.40 addpart
    06:49:17 cksum       R 27696            2.16 addr2line
    06:49:17 cksum       R 58080           10.11 ag
    06:49:17 cksum       R 906              6.30 ec2-meta-data
    06:49:17 cksum       R 6320            10.00 animate.im6
    […]
    
slide 19:
    Template 3: Interval Summary
    # dcstat
    TIME         REFS/s   SLOW/s   MISS/s     HIT%
    08:11:47:
    08:11:48:
    08:11:49:
    08:11:50:
    08:11:51:
    08:11:52:
    08:11:53:
    08:11:54:
    08:11:55:
    08:11:56:
    08:11:57:
    […]
    
slide 20:
    Template 4: Count Summary
    # funccount 'vfs_*'
    Tracing... Ctrl-C to end.
    ADDR             FUNC                COUNT
    ffffffff811efe81 vfs_create
    ffffffff811f24a1 vfs_rename
    ffffffff81215191 vfs_fsync_range
    ffffffff81231df1 vfs_lock_file
    ffffffff811e8dd1 vfs_fstatat
    ffffffff811e8d71 vfs_fstat
    ffffffff811e4381 vfs_write
    ffffffff811e8c71 vfs_getattr_nosec
    ffffffff811e8d41 vfs_getattr
    ffffffff811e3221 vfs_open
    ffffffff811e4251 vfs_read
    Detaching...
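The count-summary template can be sketched in plain Python. This is only a user-space illustration of the output shape: the real funccount counts events in kernel context with BPF maps. The function name `count_matching` and the sample event list are assumptions made up for this example.

```python
from collections import Counter
from fnmatch import fnmatchcase


def count_matching(events, pattern):
    """Count event names that match a shell-style pattern such as 'vfs_*'.

    Returns (name, count) pairs sorted by ascending count, then by name,
    so the most frequent entries appear last.
    """
    counts = Counter(name for name in events if fnmatchcase(name, pattern))
    return sorted(counts.items(), key=lambda kv: (kv[1], kv[0]))


def format_counts(rows):
    """Render the pairs as a two-column FUNC / COUNT summary."""
    lines = ["%-24s %8s" % ("FUNC", "COUNT")]
    for name, n in rows:
        lines.append("%-24s %8d" % (name, n))
    return lines
```

For example, `format_counts(count_matching(names, "vfs_*"))` returns the lines of a summary for whichever function names were collected in `names`.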
    
slide 21:
    Template 5: Histogram Summary
    # biolatency
    Tracing block device I/O... Hit Ctrl-C to end.
    usecs            : count distribution
         4 -> 7      : 0
         8 -> 15     : 0
        16 -> 31     : 0
        32 -> 63     : 0
        64 -> 127    : 1
       128 -> 255    : 12    |********
       256 -> 511    : 15    |**********
       512 -> 1023   : 43    |*******************************
      1024 -> 2047   : 52    |**************************************|
      2048 -> 4095   : 47    |**********************************
      4096 -> 8191   : 52    |**************************************|
      8192 -> 16383  : 36    |**************************
     16384 -> 32767  : 15    |**********
     32768 -> 65535  : 2
     65536 -> 131071 : 2
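A power-of-2 histogram like the one above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the biolatency implementation: the real tool buckets latencies in kernel context via BPF maps, and the names `log2_bucket`, `build_histogram` and `format_histogram` are invented here. The bucket boundaries chosen for the smallest values may also differ from bcc's.

```python
def log2_bucket(value):
    """Map a non-negative integer to a power-of-2 bucket index.

    0 -> 0, 1 -> 1, 2..3 -> 2, 4..7 -> 3, 8..15 -> 4, and so on.
    """
    return value.bit_length()


def build_histogram(values):
    """Return {bucket index: count} for the given values."""
    counts = {}
    for v in values:
        idx = log2_bucket(v)
        counts[idx] = counts.get(idx, 0) + 1
    return counts


def format_histogram(counts, unit="usecs", width=38):
    """Render the buckets as text rows with a scaled bar of asterisks."""
    lines = ["%-20s : %-8s %s" % (unit, "count", "distribution")]
    if not counts:
        return lines
    peak = max(counts.values())
    for idx in range(min(counts), max(counts) + 1):
        low = 0 if idx == 0 else 1 << (idx - 1)
        high = 0 if idx == 0 else (1 << idx) - 1
        n = counts.get(idx, 0)
        stars = "*" * (n * width // peak)  # the largest bucket fills the bar
        lines.append("%8d -> %-8d : %-8d |%-*s|" % (low, high, n, width, stars))
    return lines
```

Calling `print("\n".join(format_histogram(build_histogram(latencies))))` on a list of integer latencies prints one row per bucket between the smallest and largest observed.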
    
slide 22:
    Template 6: Heatmap Summary
    
slide 23:
    Template 7: Folded stack output for flame graphs
    offcputime -f
    offwaketime -f
    wakeuptime -f
    profile -f
    | flamegraph.pl > out.svg
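The folded format that flamegraph.pl reads is one line per unique stack: frames joined by semicolons, root first, followed by a space and a count. A minimal Python sketch of that folding step follows. It assumes the stacks have already been collected as lists of frame names; the function name `fold` and the frame names in the usage note are hypothetical.

```python
from collections import Counter


def fold(samples):
    """Collapse stack samples into folded lines.

    samples: iterable of stacks, each a list of frame names ordered
    root-first. Returns sorted lines of the form 'a;b;c count'.
    """
    counts = Counter(";".join(stack) for stack in samples)
    return ["%s %d" % (stack, n) for stack, n in sorted(counts.items())]
```

For instance, `fold([["main", "read"], ["main", "read"], ["main", "write"]])` yields the two lines `main;read 2` and `main;write 1`, which can be written to a file and passed to flamegraph.pl.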
    
slide 24:
    Valuable
    Know what already exists
    and what doesn't
    
slide 25:
    Documented
    code comments
    man pages
    example files
    
slide 26:
    Concise, intuitive
    self-explanatory
    # iolatency
    Tracing block I/O. Output every 1 seconds. Ctrl-C to end.
      >=(ms) .. <(ms)   : I/O      |Distribution
           0 -> 1       : 4381     |######################################|
           1 -> 2       : 9
           2 -> 4       : 5
           4 -> 8       : 0
           8 -> 16      : 1
    […]
    
slide 27:
    POSIX-style arguments
    # ./biolatency -h
    usage: biolatency [-h] [-T] [-Q] [-m] [-D] [interval] [count]
    Summarize block device I/O latency as a histogram
    positional arguments:
      interval            output interval, in seconds
      count               number of outputs
    optional arguments:
      -h, --help          show this help message and exit
      -T, --timestamp     include timestamp on output
      -Q, --queued        include OS queued time in I/O time
      -m, --milliseconds  millisecond histogram
      -D, --disks         print a histogram per disk device
    examples:
        ./biolatency            # summarize block I/O latency as a histogram
        ./biolatency 1 10       # print 1 second summaries, 10 times
        ./biolatency -mT 1      # 1s summaries, milliseconds, and timestamps
        ./biolatency -Q         # include OS queued time in I/O time
        ./biolatency -D         # show each disk device separately
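An interface of this shape can be sketched with Python's argparse. The option names and help strings below are taken from the usage message above; the function name `build_parser` and the `None` defaults for the positional arguments are assumptions for this sketch, and the real tool may handle defaults differently.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the biolatency usage message."""
    parser = argparse.ArgumentParser(
        prog="biolatency",
        description="Summarize block device I/O latency as a histogram")
    parser.add_argument("-T", "--timestamp", action="store_true",
                        help="include timestamp on output")
    parser.add_argument("-Q", "--queued", action="store_true",
                        help="include OS queued time in I/O time")
    parser.add_argument("-m", "--milliseconds", action="store_true",
                        help="millisecond histogram")
    parser.add_argument("-D", "--disks", action="store_true",
                        help="print a histogram per disk device")
    # Optional positionals: [interval] [count]
    parser.add_argument("interval", nargs="?", type=int, default=None,
                        help="output interval, in seconds")
    parser.add_argument("count", nargs="?", type=int, default=None,
                        help="number of outputs")
    return parser
```

With this parser, `build_parser().parse_args(["-mT", "1"])` sets `milliseconds` and `timestamp` to true and `interval` to 1, matching the bundled short options shown in the examples.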
    
slide 28:
    Option              Alternate               Expectation
                        --all                   all events
    -c CMD              --cmd …                 run this command
    -d SECONDS          --duration …            duration of tool execution
                        --help                  help
    -i FILE             --input …               input file
    -i SECONDS          --interval …            summary interval
    -n name             --name …                this process name only
    -o FILE             --output …              output file
    -p PID              --pid …                 this process ID only
                        --by-process            per-process ID breakdown
    -P PORT             --port …                this TCP port only
    -t or -T            --[no]timestamp         include or exclude timestamps
                        --verbose               verbose output
                        --extended, --errors    extended output, or only failures
    [interval [count]]                          summary interval, and # of outputs
    
slide 29:
    Tested
    If you can't write the workload,
    you can't write the tool
    
slide 30:
    Future
    Challenges
    
slide 31:
    State of bcc, Feb 2017
    State of BPF, Feb 2017
    Dynamic tracing, kernel-level (BPF support for kprobes)
    Dynamic tracing, user-level (BPF support for uprobes)
    Static tracing, kernel-level (BPF support for tracepoints)
    Timed sampling events (BPF with perf_event_open)
    PMC events (BPF with perf_event_open)
    Filtering (via BPF programs)
    Debug output (bpf_trace_printk())
    Per-event output (bpf_perf_event_output())
    Basic variables (global & per-thread variables, via BPF maps)
    Associative arrays (via BPF maps)
    Frequency counting (via BPF maps)
    Histograms (power-of-2, linear, and custom, via BPF maps)
    Timestamps and time deltas (bpf_ktime_get_ns() and BPF)
    Stack traces, kernel (BPF stackmap)
    Stack traces, user (BPF stackmap)
    Overwrite ring buffers
    String factory (stringmap)
    Optional: bounded loops, 
slide 32:
    Dynamic tracing stability
    need those smoke tests
    switch tools to static tracepoints
    
slide 33:
    Invalid Tools
    
slide 34:
    Overhead
    Especially current uprobes
    
slide 35:
    Ease of Coding
    
slide 36:
    bcc/BPF
    bcc examples/tracing/bitehist.py
    entire program
    
slide 37:
    ply/BPF
    https://github.com/wkz/ply/blob/master/README.md
    entire program
    
slide 38:
    Visualizations
    
slide 39:
    Visualizations and GUIs
    E.g., Netflix self-service UI:
    Flame Graphs
    Tracing Reports
    
slide 40:
    Ancient Linux
    Linux 3.18
    Linux 3.10
    Linux 3.2
    Linux 2.6.x
    
slide 41:
    (Some) More Tools
    
slide 42:
    Finish porting my old DTrace tools
    
slide 43:
    Links & References
    iovisor bcc:
    https://github.com/iovisor/bcc
    https://github.com/iovisor/bcc/tree/master/docs
    http://www.brendangregg.com/blog/ (search for "bcc")
    http://blogs.microsoft.co.il/sasha/2016/02/14/two-new-ebpf-tools-memleak-and-argdist/
    I'll change your view of Linux tracing: https://www.youtube.com/watch?v=GsMs3n8CB6g
    On designing tracing tools: https://www.youtube.com/watch?v=uibLwoVKjec
    BPF:
    • https://www.kernel.org/doc/Documentation/networking/filter.txt
    • https://github.com/iovisor/bpf-docs
    • https://suchakra.wordpress.com/tag/bpf/
    Flame Graphs:
    • http://www.brendangregg.com/flamegraphs.html
    • http://www.brendangregg.com/blog/2016-01-20/ebpf-offcpu-flame-graph.html
    • http://www.brendangregg.com/blog/2016-02-01/linux-wakeup-offwake-profiling.html
    Linux Performance: http://www.brendangregg.com/linuxperf.html
    
slide 44:
    Thanks
    Discussion?
    iovisor bcc: https://github.com/iovisor/bcc
    http://www.brendangregg.com
    http://slideshare.net/brendangregg
    bgregg@netflix.com
    @brendangregg
    Thanks to Alexei Starovoitov (Facebook), Brenden Blanco
    (PLUMgrid/VMware), Sasha Goldshtein (Sela), Daniel
    Borkmann (Cisco), Wang Nan (Huawei), and other BPF
    and bcc contributors!