OSSNA 2017: Performance Analysis Superpowers with Linux BPF

Talk by Brendan Gregg for OSSNA 2017.

Description: "Advanced performance observability and debugging have arrived built into the Linux 4.x series, thanks to enhancements to Berkeley Packet Filter (BPF, or eBPF) and the repurposing of its sandboxed virtual machine to provide programmatic capabilities to system tracing. Netflix has been investigating its use for new observability tools, monitoring, security uses, and more. This talk will be a deep dive on these new tracing, observability, and debugging capabilities, which sooner or later will be available to everyone who uses Linux. Whether you’re doing analysis over an ssh session, or via a monitoring GUI, BPF can be used to provide an efficient, custom, and deep level of detail into system and application performance.

This talk will also demonstrate the new open source tools that have been developed, which make use of kernel- and user-level dynamic tracing (kprobes and uprobes), and kernel- and user-level static tracing (tracepoints). These tools provide new insights for file system and storage performance, CPU scheduler performance, TCP performance, and a whole lot more. This is a major turning point for Linux systems engineering, as custom advanced performance instrumentation can be used safely in production environments, powering a new generation of tools and visualizations."


PDF: OSS2017_BPF_superpowers.pdf

Keywords (from pdftotext):

slide 1:
    Performance Analysis
    Superpowers with Linux BPF
    Brendan Gregg
    Sep 2017
    
slide 2:
slide 3:
    bcc/BPF tools
    
slide 4:
    DEMO
    
slide 5:
    Agenda
    1. eBPF & bcc
    2. bcc/BPF CLI Tools
    3. bcc/BPF Visualizations
    
slide 6:
    Take aways
    1. Understand Linux tracing and enhanced BPF
    2. How to use BPF tools
    3. Areas of future development
    
slide 7:
slide 8:
    Who at Netflix will use BPF?
    
slide 9:
    BPF
    Introducing enhanced BPF for tracing: kernel-level
    software
    
slide 10:
    Ye Olde BPF
    Berkeley Packet Filter
    # tcpdump host 127.0.0.1 and port 22 -d
    (000) ldh      [12]
    (001) jeq      #0x800           jt 2    jf 18
    (002) ld       [26]
    (003) jeq      #0x7f000001      jt 6    jf 4
    (004) ld       [30]
    (005) jeq      #0x7f000001      jt 6    jf 18
    (006) ldb      [23]
    (007) jeq      #0x84            jt 10   jf 8
    (008) jeq      #0x6             jt 10   jf 9
    (009) jeq      #0x11            jt 10   jf 18
    (010) ldh      [20]
    (011) jset     #0x1fff          jt 18   jf 12
    (012) ldxb     4*([14]&0xf)
    (013) ldh      [x + 14]
    [...]
    Optimizes packet filter performance
    2 x 32-bit registers & scratch memory
    User-defined bytecode executed by an in-kernel sandboxed virtual machine
    Steven McCanne and Van Jacobson, 1993
    
slide 11:
    Enhanced BPF
    aka eBPF or just "BPF"
    10 x 64-bit registers
    maps (hashes)
    actions
    Alexei Starovoitov, 2014+
    
slide 12:
    BPF for Tracing, Internals
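    [Diagram: an observability program (BPF program, event config, output) loads BPF bytecode into the kernel, where the verifier checks it and BPF attaches to static tracing (tracepoints), dynamic tracing (kprobes, uprobes), and sampling, PMCs (perf_events). Per-event data and statistics (maps) return to the program's output by async copy.]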
    Enhanced BPF is also now used for SDNs, DDOS mitigation, intrusion detection, container security, …
    
slide 13:
    Dynamic Tracing
    
slide 14:
    1999: Kerninst
    http://www.paradyn.org/html/kerninst.html
    
slide 15:
    Event Tracing Efficiency
    E.g., tracing TCP retransmits
    Old way: packet capture
    – Kernel: send and receive, copied to a buffer
    – tcpdump: 1. read, 2. dump (to the file system and disks)
    – Analyzer: 1. read, 2. process, 3. print
    New way: dynamic tracing
    – Tracer: 1. configure, 2. read
    – Kernel: tcp_retransmit_skb()
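
The difference in work between the two approaches can be sketched in a few lines of Python. This is only an illustration under made-up assumptions: the `Packet` type, the 1-in-1000 retransmit rate, and both function names are hypothetical and not from the talk. The packet-capture path handles every packet, while the tracing path only does work when the traced function would fire.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Packet:
    seq: int
    retransmit: bool  # hypothetical stand-in for "tcp_retransmit_skb() was called"


def packet_capture(packets: Iterable[Packet]) -> Tuple[int, List[int]]:
    """Old way: every packet is read, dumped, and inspected later."""
    handled = 0
    found: List[int] = []
    for pkt in packets:
        handled += 1                  # cost is paid for each packet seen
        if pkt.retransmit:
            found.append(pkt.seq)
    return handled, found


def dynamic_tracing(packets: Iterable[Packet]) -> Tuple[int, List[int]]:
    """New way: the tracer only runs when the retransmit function fires."""
    handled = 0
    found: List[int] = []
    for pkt in packets:
        if not pkt.retransmit:
            continue                  # no probe fires, so no tracing work
        handled += 1
        found.append(pkt.seq)
    return handled, found


# Made-up workload: one retransmit per 1000 packets.
workload = [Packet(seq=i, retransmit=(i % 1000 == 0)) for i in range(1, 100_001)]
capture_cost, capture_hits = packet_capture(workload)
trace_cost, trace_hits = dynamic_tracing(workload)
print(capture_cost, trace_cost, capture_hits == trace_hits)
```

Both paths find the same retransmits; the only thing that changes is how many events the tool has to touch.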
    
slide 16:
    Linux Events & BPF Support
    [Diagram: Linux event sources annotated with the kernel version in which BPF support arrived: Linux 4.1, Linux 4.3, Linux 4.4 (BPF output), Linux 4.6 (BPF stacks), Linux 4.7, Linux 4.9, Linux 4.9]
    
slide 17:
    A Linux Tracing Timeline
    1990’s: Static tracers, prototype dynamic tracers
    2000: LTT + DProbes (dynamic tracing; not integrated)
    2004: kprobes (2.6.9)
    2005: DTrace (not Linux), SystemTap (out-of-tree)
    2008: ftrace (2.6.27)
    2009: perf_events (2.6.31)
    2009: tracepoints (2.6.32)
    2010-2017: ftrace & perf_events enhancements
    2012: uprobes (3.5)
    2014-2017: enhanced BPF patches: supporting tracing events
    2016-2017: ftrace hist triggers
    also: LTTng, ktap, sysdig, ...
    
slide 18:
    BCC
    Introducing BPF Compiler Collection: user-level
    front-end
    
slide 19:
    bcc
    • BPF Compiler Collection
    – https://github.com/iovisor/bcc
    – Lead developer: Brenden Blanco
    • Includes tracing tools
    • Provides BPF front-ends:
    – Python
    – Lua
    – C++
    – C helper libraries
    – golang (gobpf)
    [Diagram: tracing layers. User level: bcc tools on top of the bcc front-ends (Python, lua). Kernel level: Events and BPF.]
    
slide 20:
    Raw BPF
    samples/bpf/sock_example.c
    87 lines truncated
    
slide 21:
    C/BPF
    samples/bpf/tracex1_kern.c
    58 lines truncated
    
slide 22:
    bcc/BPF (C & Python)
    bcc examples/tracing/bitehist.py
    entire program
    
slide 23:
    bpftrace
    https://github.com/ajor/bpftrace
    entire program
    
slide 24:
    The Tracing Landscape, Sep 2017
    [Chart (my opinion): tracers plotted by Ease of use (brutal to less brutal), Scope & Capability, and Stage of Development (alpha to mature). Tools shown: dtrace4L., sysdig, perf, LTTng, ktap, ftrace (hist triggers), bpftrace, ply/BPF, stap, bcc/BPF, C/BPF, Raw BPF. Other labels: "(many)", "recent changes".]
    
slide 25:
    BCC/BPF CLI Tools
    Performance Analysis
    
slide 26:
    Pre-BPF: Linux Perf Analysis in 60s
    1. uptime
    2. dmesg -T | tail
    3. vmstat 1
    4. mpstat -P ALL 1
    5. pidstat 1
    6. iostat -xz 1
    7. free -m
    8. sar -n DEV 1
    9. sar -n TCP,ETCP 1
    10. top
    http://techblog.netflix.com/2015/11/linux-performance-analysis-in-60s.html
    
slide 27:
    bcc Installation
    • https://github.com/iovisor/bcc/blob/master/INSTALL.md
    • eg, Ubuntu Xenial:
    # echo "deb [trusted=yes] https://repo.iovisor.org/apt/xenial xenial-nightly main" |\
    sudo tee /etc/apt/sources.list.d/iovisor.list
    # sudo apt-get update
    # sudo apt-get install bcc-tools
    – Also available as an Ubuntu snap
    – Ubuntu 16.04 is good, 16.10 better: more tools work
    • Installs many tools
    – In /usr/share/bcc/tools, and …/tools/old for older kernels
    
slide 28:
    bcc General Performance Checklist
    1. execsnoop
    2. opensnoop
    3. ext4slower (…)
    4. biolatency
    5. biosnoop
    6. cachestat
    7. tcpconnect
    8. tcpaccept
    9. tcpretrans
    10. gethostlatency
    11. runqlat
    12. profile
    
slide 29:
    Discover short-lived process issues using execsnoop
    # execsnoop -t
    TIME(s) PCOMM    PID    PPID   RET ARGS
            dirname                  0 /usr/bin/dirname /apps/tomcat/bin/catalina.sh
            run                      0 ./run
            run                     -2 /command/bash
            run                     -2 /usr/local/bin/bash
            run                     -2 /usr/local/sbin/bash
            bash                     0 /bin/bash
            svstat                   0 /command/svstat /service/nflx-httpd
            perl                     0 /usr/bin/perl -e $l=<>;$l=~/(\d+) sec/;print $1||0;
            ps                       0 /bin/ps --ppid 1 -o pid,cmd,args
            grep                     0 /bin/grep org.apache.catalina
            sed                      0 /bin/sed s/^ *//;
            cut                      0 /usr/bin/cut -d -f 1
            xargs                    0 /usr/bin/xargs
            xargs                   -2 /command/echo
            xargs                   -2 /usr/local/bin/echo
            xargs                   -2 /usr/local/sbin/echo
            echo                     0 /bin/echo
    [...]
    Efficient: only traces exec()
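
The RET column is the exec() return value, and -2 is -ENOENT, so each run of -2 rows followed by a 0 is a program being searched for along a path list. A minimal Python sketch of how such output could be summarized follows; the `failed_lookups` helper is a hypothetical name, and the rows are a subset of the (RET, ARGS) pairs from the output above.

```python
import errno
import os
from collections import Counter

# (RET, ARGS) pairs copied from the execsnoop output above (subset).
rows = [
    (0, "./run"),
    (-2, "/command/bash"),
    (-2, "/usr/local/bin/bash"),
    (-2, "/usr/local/sbin/bash"),
    (0, "/bin/bash"),
    (-2, "/command/echo"),
    (-2, "/usr/local/bin/echo"),
    (-2, "/usr/local/sbin/echo"),
    (0, "/bin/echo"),
]


def failed_lookups(rows):
    """Count exec() attempts that failed with ENOENT, keyed by program name."""
    misses = Counter()
    for ret, args in rows:
        if ret == -errno.ENOENT:                      # -2: no such file or directory
            program = os.path.basename(args.split()[0])
            misses[program] += 1
    return misses


misses = failed_lookups(rows)
print(dict(misses))
```

Each miss is a wasted exec() attempt, which is the kind of short-lived overhead that interval-based tools such as top tend not to show.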
    
slide 30:
    
slide 31:
    Exonerate or confirm storage latency outliers with ext4slower
    # /usr/share/bcc/tools/ext4slower 1
    Tracing ext4 operations slower than 1 ms
    TIME     COMM           PID    T BYTES   OFF_KB   LAT(ms) FILENAME
    17:31:42 postdrop       15523  S 0                   2.32 5630D406E4
    17:31:42 cleanup        15524  S 0                   1.89 57BB7406EC
    17:32:09 titus-log-ship 19735  S 0                   1.94 slurper_checkpoint.db
    17:35:37 dhclient              S 0                   3.32 dhclient.eth0.leases
    17:35:39 systemd-journa 504    S 0                  26.62 system.journal
    17:35:39 systemd-journa 504    S 0                   1.56 system.journal
    17:35:39 systemd-journa 504    S 0                   1.73 system.journal
    17:35:45 postdrop       16187  S 0                   2.41 C0369406E4
    17:35:45 cleanup        16188  S 0                   6.52 C1B90406EC
    […]
    Tracing at the file system is a more reliable and complete indicator than measuring disk I/O latency
    Also: btrfsslower, xfsslower, zfsslower
    
slide 32:
    
slide 33:
    Identify multimodal disk I/O latency and outliers with biolatency
    # biolatency -mT 10
    Tracing block device I/O... Hit Ctrl-C to end.
    19:19:04
         msecs          : count     distribution
           0 -> 1       : 238      |*********
           2 -> 3       : 424      |*****************
           4 -> 7       : 834      |*********************************
           8 -> 15      : 506      |********************
          16 -> 31      : 986      |****************************************|
          32 -> 63      : 97       |***
          64 -> 127     : 7
         128 -> 255     : 27
    19:19:14
         msecs          : count     distribution
           0 -> 1       : 427      |*******************
           2 -> 3       : 424      |******************
    […]
    The "count" column is summarized in-kernel
    Average latency (iostat/sar) may not be representative with multiple modes or outliers
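
The power-of-two histogram is what makes the in-kernel summary cheap: each I/O only increments one counter in a small table. Below is a minimal pure-Python sketch of that bucketing and of the star bars. The function names are hypothetical, and the bar scaling rule (count × width ÷ max, rounded down) is an assumption, although it appears consistent with the bars in the output above.

```python
def log2_bucket(value: int) -> int:
    """Index of the power-of-two bucket: 0 -> 1 is 0, 2 -> 3 is 1, 4 -> 7 is 2, ..."""
    return max(value, 1).bit_length() - 1


def bucket_range(index: int) -> tuple:
    """Inclusive (low, high) bounds for a bucket index."""
    if index == 0:
        return (0, 1)
    return (1 << index, (1 << (index + 1)) - 1)


def stars(count: int, max_count: int, width: int = 40) -> str:
    """Scale a count to a bar, with the largest count filling the full width."""
    if max_count <= 0:
        return ""
    return "*" * (count * width // max_count)


def print_log2_hist(values, unit="msecs"):
    counts = {}
    for v in values:
        idx = log2_bucket(v)
        counts[idx] = counts.get(idx, 0) + 1      # only this small table is stored
    if not counts:
        return
    top = max(counts.values())
    print(f"{unit:>10}          : count     distribution")
    for idx in range(max(counts) + 1):
        low, high = bucket_range(idx)
        n = counts.get(idx, 0)
        print(f"{low:>8} -> {high:<8} : {n:<8} |{stars(n, top):<40}|")


# Made-up latencies in milliseconds.
print_log2_hist([0, 1, 3, 5, 6, 7, 12, 40])
```

However many I/O events occur, the data copied to user space is one integer per bucket.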
    
slide 34:
    
slide 35:
    Efficiently trace TCP sessions with PID and bytes using tcplife
    # /usr/share/bcc/tools/tcplife
    PID   COMM       LADDR      LPORT RADDR           RPORT TX_KB RX_KB MS
    2509  java                  8078  100.82.130.159                  0 5.44
    2509  java                  8078  100.82.78.215                   0 135.32
    2509  java                  60778 100.82.207.252                 13 15126.87
    2509  java                  38884 100.82.208.178                  0 15568.25
    2509  java                  4243  127.0.0.1                       0 0.61
    2509  java                  42166 127.0.0.1                       0 0.67
    12030 upload-mes 127.0.0.1  34020 127.0.0.1                       0 3.38
    2509  java                  8078  127.0.0.1                      11 3.41
    12030 upload-mes 127.0.0.1  21196 127.0.0.1                       0 12.61
    3964  mesos-slav 127.0.0.1  7101  127.0.0.1                       0 12.64
    12021 upload-sys 127.0.0.1  34022 127.0.0.1                       0 15.28
    2509  java                  8078  127.0.0.1                     372 15.31
    2235  dockerd               13730 100.82.136.233                  4 18.50
    2235  dockerd               34314 100.82.64.53                    8 56.73
    [...]
    Dynamic tracing of TCP set state only; does not trace send/receive
    Also see: tcpconnect, tcpaccept, tcpretrans
    
slide 36:
    
slide 37:
    Identify DNS latency issues system wide with gethostlatency
    # /usr/share/bcc/tools/gethostlatency
    TIME     PID   COMM         LATms HOST
    18:56:36 5055  mesos-slave   0.01 100.82.166.217
    18:56:40 5590  java          3.53 ec2-…-79.compute-1.amazonaws.com
    18:56:51 5055  mesos-slave   0.01 100.82.166.217
    18:56:53 30166 ncat          0.21 localhost
    18:56:56 6661  java          2.19 atlas-alert-….prod.netflix.net
    18:56:59 5589  java          1.50 ec2-…-207.compute-1.amazonaws.com
    18:57:03 5370  java          0.04 localhost
    18:57:03 30259 sudo          0.07 titusagent-mainvpc-m…3465
    18:57:06 5055  mesos-slave   0.01 100.82.166.217
    18:57:10 5590  java          3.10 ec2-…-79.compute-1.amazonaws.com
    18:57:21 5055  mesos-slave   0.01 100.82.166.217
    18:57:29 5589  java         52.36 ec2-…-207.compute-1.amazonaws.com
    18:57:36 5055  mesos-slave   0.01 100.82.166.217
    18:57:40 5590  java          1.83 ec2-…-79.compute-1.amazonaws.com
    18:57:51 5055  mesos-slave   0.01 100.82.166.217
    […]
    Instruments using user-level dynamic tracing of getaddrinfo(), gethostbyname(), etc.
    
slide 38:
    
slide 39:
    Examine CPU scheduler latency as a histogram with runqlat
    # /usr/share/bcc/tools/runqlat 10
    Tracing run queue latency... Hit Ctrl-C to end.
         usecs          : count     distribution
           0 -> 1       : 2810
           2 -> 3       : 5248     |**
           4 -> 7       : 12369    |******
           8 -> 15      : 71312    |****************************************|
          16 -> 31      : 55705    |*******************************
          32 -> 63      : 11775    |******
          64 -> 127     : 6230     |***
         128 -> 255     : 2758
         256 -> 511     : 549
         512 -> 1023    : 46
        1024 -> 2047    : 11
        2048 -> 4095    : 4
        4096 -> 8191    : 5
    […]
    As efficient as possible: scheduler calls can become frequent
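
Run queue latency is the time a thread spends runnable but waiting for its turn on a CPU. As a rough sketch of the bookkeeping involved (not how runqlat itself is written), the Python below pairs each wakeup with the next switch onto a CPU for the same thread. The event names `wakeup` and `switch_in` and the sample trace are hypothetical.

```python
def run_queue_latencies(events):
    """events: iterable of (timestamp_us, kind, pid), kind in {"wakeup", "switch_in"}.

    Returns, in dispatch order, the delay between becoming runnable and running.
    """
    enqueued = {}            # pid -> timestamp when the thread became runnable
    latencies = []
    for ts, kind, pid in events:
        if kind == "wakeup":
            enqueued.setdefault(pid, ts)      # keep the earliest pending wakeup
        elif kind == "switch_in":
            start = enqueued.pop(pid, None)
            if start is not None:             # skip threads never seen enqueued
                latencies.append(ts - start)
    return latencies


# Hypothetical trace: pid 9 waits 3 us, pid 7 waits 12 us, then pid 7 waits 40 us.
trace = [
    (100, "wakeup", 7),
    (105, "wakeup", 9),
    (108, "switch_in", 9),
    (112, "switch_in", 7),
    (200, "wakeup", 7),
    (240, "switch_in", 7),
]
print(run_queue_latencies(trace))
```

Because scheduler events can be very frequent, the per-event work is kept to one table lookup and one subtraction; the resulting deltas would then be fed into a histogram rather than printed one by one.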
    
slide 40:
    
slide 41:
    Construct programmatic one-liners with trace
    e.g. reads over 20000 bytes:
    # trace 'sys_read (arg3 > 20000) "read %d bytes", arg3'
    TIME     PID    COMM         FUNC
    05:18:23 4490                sys_read     read 1048576 bytes
    05:18:23 4490                sys_read     read 1048576 bytes
    05:18:23 4490                sys_read     read 1048576 bytes
    # trace -h
    [...]
    trace -K blk_account_io_start
        Trace this kernel function, and print info with a kernel stack trace
    trace 'do_sys_open "%s", arg2'
        Trace the open syscall and print the filename being opened
    trace 'sys_read (arg3 > 20000) "read %d bytes", arg3'
        Trace the read syscall and print a message for reads >20000 bytes
    trace r::do_sys_return
        Trace the return from the open syscall
    trace 'c:open (arg2 == 42) "%s %d", arg1, arg2'
        Trace the open() call from libc only if the flags (arg2) argument is 42
    [...]
    argdist by Sasha Goldshtein
    
slide 42:
    Create in-kernel summaries with argdist
    e.g. histogram of tcp_cleanup_rbuf() copied:
    # argdist -H 'p::tcp_cleanup_rbuf(struct sock *sk, int copied):int:copied'
    [15:34:45]
         copied         : count     distribution
           0 -> 1       : 15088    |**********************************
           2 -> 3       : 0
           4 -> 7       : 0
           8 -> 15      : 0
          16 -> 31      : 0
          32 -> 63      : 0
          64 -> 127     : 4786     |***********
         128 -> 255     : 1
         256 -> 511     : 1
         512 -> 1023    : 4
        1024 -> 2047    : 11
        2048 -> 4095    : 5
        4096 -> 8191    : 27
        8192 -> 16383   : 105
       16384 -> 32767   : 0
    argdist by Sasha Goldshtein
    
slide 43:
    BCC/BPF
    Visualizations
    Coming to a GUI near you
    
slide 44:
    BPF metrics and analysis can be automated in GUIs
    Eg, Netflix Vector (self-service UI):
    Flame Graphs
    Heat Maps
    Tracing Reports
    Should be open sourced; you may also build/buy your own
    
slide 45:
    Latency heatmaps show histograms over time
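
A latency heat map can be thought of as one histogram per time interval, laid side by side. The following Python sketch shows that binning step only; the `heatmap` function, the step sizes, and the sample points are all hypothetical.

```python
from collections import Counter


def heatmap(samples, time_step_s=1, latency_step_ms=10):
    """samples: iterable of (time_s, latency_ms). Returns {(column, row): count}."""
    cells = Counter()
    for time_s, latency_ms in samples:
        column = int(time_s // time_step_s)          # x-axis: time interval
        row = int(latency_ms // latency_step_ms)     # y-axis: latency range
        cells[(column, row)] += 1                    # color depth: event count
    return cells


# Made-up samples: two seconds of I/O with a fast mode and a slow mode.
samples = [(0.1, 2.0), (0.4, 3.5), (0.9, 25.0), (1.2, 4.0), (1.7, 27.0), (1.8, 29.9)]
cells = heatmap(samples)
for (column, row), count in sorted(cells.items()):
    print(f"t={column}s latency={row * 10}-{row * 10 + 9}ms count={count}")
```

Each column is the same kind of histogram that biolatency prints per interval, which is why the two modes stay visible over time instead of being averaged away.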
    
slide 46:
    Optimize CPU flame graphs with BPF: count stacks in-kernel
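
Counting stacks in-kernel means that only one counter per unique stack needs to reach user space, instead of every sample. A minimal Python sketch of that aggregation follows, emitting the "folded" one-line-per-stack form that flamegraph.pl takes as input. The function names and sample stacks are hypothetical.

```python
from collections import Counter


def count_stacks(samples):
    """samples: iterable of stacks, each a sequence of frames from root to leaf."""
    counts = Counter()
    for stack in samples:
        counts[tuple(stack)] += 1      # one counter per unique stack, not per sample
    return counts


def folded(counts):
    """Render counts as 'frame;frame;frame count' lines, sorted by stack."""
    return [f"{';'.join(stack)} {n}" for stack, n in sorted(counts.items())]


# Made-up samples: three profiler ticks, two of them in the same code path.
samples = [
    ("main", "read"),
    ("main", "write"),
    ("main", "read"),
]
for line in folded(count_stacks(samples)):
    print(line)
```

With a profiler sampling many times per second across many CPUs, the number of unique stacks is usually far smaller than the number of samples, which is where the saving comes from.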
    
slide 47:
    Generic thread state diagram
    What about Off-CPU?
    
slide 48:
    Efficient Off-CPU flame graphs via scheduler tracing and BPF
    CPU
    Off-CPU
    Solve
    everything?
    
slide 49:
    Off-CPU Time (zoomed): gzip(1)
    Off-CPU doesn't always make sense:
    what is gzip blocked on?
    
slide 50:
    Wakeup time flame graphs show waker thread stacks
    
slide 51:
    Wakeup Time (zoomed): gzip(1)
    gzip(1) is blocked on tar(1)!
    tar cf - * | gzip > out.tar.gz
    Can't we associate off-CPU with wakeup stacks?
    
slide 52:
    Off-wake flame graphs: BPF can merge blocking plus waker stacks in-kernel
    [Diagram labels: Waker task, Waker stack, Stack Direction, Wokeup, Blocked stack, Blocked task]
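
One way to picture the merge is as building a single key from the blocked stack and the waker stack, then summing blocked time per key. The Python below is an illustrative sketch only: the `--` delimiter, the key layout, and the frame names are assumptions chosen for the example, not output copied from a tool.

```python
from collections import Counter


def offwake_key(blocked_task, blocked_stack, waker_stack, waker_task):
    """Join a blocked stack with the stack of the thread that woke it.

    Both stacks are given root to leaf. The waker stack is reversed so that it
    reads away from the wakeup point, with "--" marking the boundary.
    """
    frames = [blocked_task, *blocked_stack, "--", *reversed(waker_stack), waker_task]
    return ";".join(frames)


blocked_us = Counter()   # merged stack -> total off-CPU time in microseconds


def record(blocked_task, blocked_stack, waker_stack, waker_task, off_cpu_us):
    blocked_us[offwake_key(blocked_task, blocked_stack, waker_stack, waker_task)] += off_cpu_us


# Made-up frames for the gzip-blocked-on-tar case from the earlier slide.
record("gzip", ["sys_read", "pipe_read"], ["sys_write", "pipe_write"], "tar", 1500)
record("gzip", ["sys_read", "pipe_read"], ["sys_write", "pipe_write"], "tar", 500)

for key, total in blocked_us.items():
    print(key, total)
```

Keeping both halves in one key is what lets a single flame graph answer both "where was it blocked" and "who woke it up".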
    
slide 53:
    Another
    example
    
slide 54:
    Chain graphs: merge all wakeup stacks
    
slide 55:
    Future Work
    BPF
    
slide 56:
    BCC Improvements
    • Challenges
    Initialize all variables
    BPF_PERF_OUTPUT()
    Verifier errors
    Still explicit bpf_probe_read()s.
    It's getting better (thanks):
    • High-Level Languages
    – One-liners and scripts
    – Can use libbcc
    tcpconnlat.py
    
slide 57:
    ply
    • A new BPF-based language and tracer for Linux
    – Created by Tobias Waldekranz
    – https://github.com/iovisor/ply https://wkz.github.io/ply/
    – Promising, was in development
    # ply -c 'kprobe:do_sys_open { printf("opened: %s\n", mem(arg(1), "128s")); }'
    1 probe active
    opened: /sys/kernel/debug/tracing/events/enable
    opened: /etc/ld.so.cache
    opened: /lib/x86_64-linux-gnu/libselinux.so.1
    opened: /lib/x86_64-linux-gnu/libc.so.6
    opened: /proc/filesystems
    opened: /usr/lib/locale/locale-archive
    opened: .
    [...]
    
slide 58:
    ply programs are concise, such as measuring read latency
    # ply -A -c 'kprobe:SyS_read { @start[tid()] = nsecs(); }
    kretprobe:SyS_read /@start[tid()]/ { @ns.quantize(nsecs() - @start[tid()]);
    @start[tid()] = nil; }'
    2 probes active
    ^Cde-activating probes
    [...]
    @ns:
    [ 512,   1k)       3 |########
    [  1k,   2k)       7 |###################
    [  2k,   4k)      12 |################################|
    [  4k,   8k)       3 |########
    [  8k,  16k)       2 |#####
    [ 16k,  32k)       0 |
    [ 32k,  64k)       0 |
    [ 64k, 128k)       3 |########
    [128k, 256k)       1 |###
    [256k, 512k)       1 |###
    [512k,   1M)       2 |#####
    [...]
    
slide 59:
    bpftrace
    • Another new BPF-based language and tracer for Linux
    – Created by Alastair Robertson
    – https://github.com/ajor/bpftrace
    – In active development
    # bpftrace -e 'kprobe:sys_open { printf("opened: %s\n", str(arg0)); }'
    Attaching 1 probe...
    opened: /sys/devices/system/cpu/online
    opened: /proc/1956/stat
    opened: /proc/1241/stat
    opened: /proc/net/dev
    opened: /proc/net/if_inet6
    opened: /sys/class/net/eth0/device/vendor
    opened: /proc/sys/net/ipv4/neigh/eth0/retrans_time_ms
    [...]
    
slide 60:
    bpftrace programs are concise, such as measuring read latency
    # bpftrace -e 'kprobe:SyS_read { @start[tid] = nsecs; } kretprobe:SyS_read /@start[tid]/
    { @ns = quantize(nsecs - @start[tid]); @start[tid] = delete(); }'
    Attaching 2 probes...
    @ns:
    [0, 1]         0 |
    [2, 4)         0 |
    [4, 8)         0 |
    [8, 16)        0 |
    [16, 32)       0 |
    [32, 64)       0 |
    [64, 128)      0 |
    [128, 256)     0 |
    [256, 512)     0 |
    [512, 1k)      0 |
    [1k, 2k)       6 |@@@@@
    [2k, 4k)      20 |@@@@@@@@@@@@@@@@@@@
    [4k, 8k)       4 |@@@
    [8k, 16k)     14 |@@@@@@@@@@@@@
    [16k, 32k)    53 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
    [32k, 64k)     2 |@
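
Both the ply and bpftrace one-liners follow the same pattern: store a timestamp keyed by thread ID on entry, then on return compute the delta, add it to a power-of-two histogram, and delete the stored timestamp. A plain-Python sketch of that pattern is below; the `LatencyTracer` class and its method names are hypothetical, and timestamps are passed in rather than read from a clock so the example is deterministic.

```python
class LatencyTracer:
    """Entry/return timing keyed by thread ID, with a power-of-two histogram."""

    def __init__(self):
        self.start = {}   # tid -> entry timestamp (ns), like @start[tid]
        self.hist = {}    # bucket index -> count, like @ns = quantize(...)

    def on_entry(self, tid, now_ns):
        self.start[tid] = now_ns

    def on_return(self, tid, now_ns):
        began = self.start.pop(tid, None)     # the /@start[tid]/ filter plus the delete
        if began is None:
            return                            # return with no matching entry: skip
        delta = now_ns - began
        bucket = max(delta, 1).bit_length() - 1
        self.hist[bucket] = self.hist.get(bucket, 0) + 1


tracer = LatencyTracer()
tracer.on_return(5, 50)            # tracing began mid-call: ignored
tracer.on_entry(1, 1_000)
tracer.on_entry(2, 1_200)
tracer.on_return(1, 3_500)         # 2500 ns, lands in [2k, 4k)
tracer.on_return(2, 10_200)        # 9000 ns, lands in [8k, 16k)

for bucket, count in sorted(tracer.hist.items()):
    print(f"[{1 << bucket}, {1 << (bucket + 1)}) {count}")
```

The filter on the return probe matters: without it, calls that were already in flight when tracing started would produce deltas measured from a timestamp of zero.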
    
slide 61:
    New Tooling/Metrics
    
slide 62:
    New Visualizations
    
slide 63:
    Case Studies
    Use it
    Solve something
    Write about it
    Talk about it
    • Recent posts:
    – https://blogs.dropbox.com/tech/2017/09/optimizing-web-servers-for-high-throughput-and-low-latency/
    – https://josefbacik.github.io/kernel/scheduler/bcc/bpf/2017/08/03/sched-time.html
    
slide 64:
    Advanced Analysis
    • Find/draw a functional diagram
    • Apply performance methods
    http://www.brendangregg.com/methodology.html
    Workload Characterization
    USE Method
    Latency Analysis
    Start with the Q's,
    then find the A's
    • Use multi-tools:
    – funccount, trace, argdist, stackcount
    e.g., storage I/O subsystem
    
slide 65:
    Take aways
    1. Understand Linux tracing and enhanced BPF
    2. How to use eBPF tools
    Upgrade to Linux 4.9+!
    3. Areas of future development
    Please contribute:
    - https://github.com/iovisor/bcc
    - https://github.com/iovisor/ply
    BPF Tracing in Linux
    • 3.19: sockets
    • 3.19: maps
    • 4.1: kprobes
    • 4.3: uprobes
    • 4.4: BPF output
    • 4.6: stacks
    • 4.7: tracepoints
    • 4.9: profiling
    • 4.9: PMCs
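
The version list above can be turned into a quick check of what a given kernel release should support. This is a minimal Python sketch: the helper names and the example release string are hypothetical, and it only compares major.minor numbers against the list from the slide, so a distribution kernel with backported features may support more than it reports.

```python
import re

# Feature -> first Linux version with support, as listed on the slide.
BPF_TRACING_FEATURES = [
    ("sockets", (3, 19)),
    ("maps", (3, 19)),
    ("kprobes", (4, 1)),
    ("uprobes", (4, 3)),
    ("BPF output", (4, 4)),
    ("stacks", (4, 6)),
    ("tracepoints", (4, 7)),
    ("profiling", (4, 9)),
    ("PMCs", (4, 9)),
]


def kernel_version(release: str) -> tuple:
    """Extract (major, minor) from a release string such as '4.4.0-116-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def supported(release: str) -> list:
    have = kernel_version(release)
    return [name for name, needed in BPF_TRACING_FEATURES if have >= needed]


def missing(release: str) -> list:
    have = kernel_version(release)
    return [name for name, needed in BPF_TRACING_FEATURES if have < needed]


example = "4.4.0-116-generic"   # example release string, e.g. as printed by uname -r
print(supported(example))
print(missing(example))
```

On a 4.4 kernel this reports stacks, tracepoints, profiling, and PMCs as missing, which matches the slide's advice to upgrade to Linux 4.9 or later.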
    
slide 66:
    Links & References
    iovisor bcc:
    - https://github.com/iovisor/bcc https://github.com/iovisor/bcc/tree/master/docs
    - http://www.brendangregg.com/blog/ (search for "bcc")
    - http://www.brendangregg.com/ebpf.html#bcc
    - http://blogs.microsoft.co.il/sasha/2016/02/14/two-new-ebpf-tools-memleak-and-argdist/
    - On designing tracing tools: https://www.youtube.com/watch?v=uibLwoVKjec
    bcc tutorial:
    - https://github.com/iovisor/bcc/blob/master/INSTALL.md
    - …/docs/tutorial.md …/docs/tutorial_bcc_python_developer.md …/docs/reference_guide.md
    - .../CONTRIBUTING-SCRIPTS.md
    ply: https://github.com/iovisor/ply
    bpftrace: https://github.com/ajor/bpftrace
    BPF:
    - https://www.kernel.org/doc/Documentation/networking/filter.txt
    - https://github.com/iovisor/bpf-docs
    - https://suchakra.wordpress.com/tag/bpf/
    Flame Graphs:
    - http://www.brendangregg.com/flamegraphs.html
    - http://www.brendangregg.com/blog/2016-01-20/ebpf-offcpu-flame-graph.html
    - http://www.brendangregg.com/blog/2016-02-01/linux-wakeup-offwake-profiling.html
    Netflix Tech Blog on Vector:
    - http://techblog.netflix.com/2015/04/introducing-vector-netflixs-on-host.html
    Linux Performance: http://www.brendangregg.com/linuxperf.html
    
slide 67:
    BPF @ Open Source Summit
    • Making the Kernel's Networking Data Path Programmable with
    BPF and XDP
    – Daniel Borkmann, Tuesday, 11:55am @ Georgia I/II
    • Performance Analysis Superpowers with Linux BPF
    – Brendan Gregg, this talk
    • Cilium - Container Security and Networking using BPF and XDP
    – Thomas Graf, Wednesday, 2:50pm @ Diamond Ballroom 6
    
slide 68:
    Thank You
    Questions?
    iovisor bcc: https://github.com/iovisor/bcc
    http://www.brendangregg.com
    http://slideshare.net/brendangregg
    bgregg@netflix.com
    @brendangregg
    Thanks to Alexei Starovoitov (Facebook), Brenden Blanco (PLUMgrid/VMware),
    Sasha Goldshtein (Sela), Teng Qin (Facebook), Yonghong Song (Facebook),
    Daniel Borkmann (Cisco/Covalent), Wang Nan (Huawei), Vicent Martí (GitHub),
    Paul Chaignon (Orange), and other BPF and bcc contributors!