
SCaLE15x: Linux 4.x Tracing: Performance Analysis with bcc/BPF

Video: https://www.youtube.com/watch?v=w8nFRoFJ6EQ

A talk about bcc/eBPF at the Southern California Linux Expo (SCaLE 15x, 2017) by Brendan Gregg.

Description: "BPF (Berkeley Packet Filter) has been enhanced in the Linux 4.x series and now powers a large collection of performance analysis and observability tools ready for you to use, included in the bcc (BPF Compiler Collection) open source project. BPF nowadays can do system tracing, software defined networks, and kernel fast path: much more than just filtering packets! This talk will focus on the bcc/BPF tools for performance analysis, which make use of other built in Linux capabilities: dynamic tracing (kprobes and uprobes) and static tracing (tracepoints and USDT). There are now bcc tools for measuring latency distributions for file system I/O and run queue latency, printing details of storage device I/O and TCP retransmits, investigating blocked stack traces and memory leaks, and a whole lot more. These lead to performance wins large and small, especially when instrumenting areas that previously had zero visibility. Tracing superpowers have finally arrived, built in to Linux."


PDF: SCALE2017_perf_analysis_eBPF.pdf

Keywords (from pdftotext):

slide 1:
    Linux 4.x Tracing:
    Performance Analysis with bcc/BPF
    Brendan Gregg
    Senior Performance Architect
    Mar 2017
    
slide 2:
    Linux tracing
    in the last 3 years…
    
slide 3:
    How do we
    use these
    superpowers?
    
slide 4:
    Takeaways
    1. Understanding the value of Linux tracing superpowers
    2. Upgrade to Linux 4.4+ (4.9 is better)
    3. Ask for eBPF support in your perf analysis/monitoring tools
    
slide 5:
slide 6:
    Ye Olde BPF
    Berkeley Packet Filter
    Optimizes packet filter performance
    2 x 32-bit registers & scratch memory
    User-defined bytecode executed by an in-kernel sandboxed virtual machine
    Steven McCanne and Van Jacobson, 1993
    # tcpdump host 127.0.0.1 and port 22 -d
    (000) ldh      [12]
    (001) jeq      #0x800           jt 2    jf 18
    (002) ld       [26]
    (003) jeq      #0x7f000001      jt 6    jf 4
    (004) ld       [30]
    (005) jeq      #0x7f000001      jt 6    jf 18
    (006) ldb      [23]
    (007) jeq      #0x84            jt 10   jf 8
    (008) jeq      #0x6             jt 10   jf 9
    (009) jeq      #0x11            jt 10   jf 18
    (010) ldh      [20]
    (011) jset     #0x1fff          jt 18   jf 12
    (012) ldxb     4*([14]&0xf)
    (013) ldh      [x + 14]
    [...]
    
slide 7:
    Enhanced BPF
    aka eBPF or just "BPF"
    10 x 64-bit registers
    maps (hashes)
    actions
    Alexei Starovoitov, 2014+
    
slide 8:
    Enhanced BPF Use Cases
    User-Defined BPF Programs: SDN Configuration, DDoS Mitigation, Intrusion Detection,
    Container Security, Observability
    Kernel Runtime: verifier -> BPF -> BPF actions
    Event targets: sockets, kprobes, uprobes, tracepoints, perf_events
    
slide 9:
    Enhanced BPF
    is in Linux
    
slide 10:
    Demo
    
slide 11:
    New Observability Tools
    • Efficient, production safe, useful metrics:
    # biolatency -mT 1
    Tracing block device I/O... Hit Ctrl-C to end.

    06:20:16
        msecs           : count     distribution
          0 -> 1        : 36       |**************************************|
          2 -> 3        : 1        |*                                     |
          4 -> 7        : 3        |***                                   |
          8 -> 15       : 17       |*****************                     |
         16 -> 31       : 33       |**********************************    |
         32 -> 63       : 7        |*******                               |
         64 -> 127      : 6        |******                                |
    […]
    These CLI tools may be useful even if you never use them, as examples of what to implement in GUIs
    
slide 12:
    New Visualizations and GUIs
    E.g., Netflix self-service UI:
    Flame Graphs
    Tracing Reports
    Should be open sourced; you may also build/buy your own
    
slide 13:
    Introducing enhanced BPF
    BPF TRACING
    
slide 14:
    A Linux Tracing Timeline
    1990's: Static tracers, prototype dynamic tracers
    2000: LTT + DProbes (dynamic tracing; not integrated)
    2004: kprobes (2.6.9)
    2005: DTrace (not Linux), SystemTap (out-of-tree)
    2008: ftrace (2.6.27)
    2009: perf_events (2.6.31)
    2009: tracepoints (2.6.32)
    2010-2016: ftrace & perf_events enhancements
    2012: uprobes (3.5)
    2014-2017: enhanced BPF patches: supporting tracing events
    2016-2017: ftrace hist triggers
    also: LTTng, ktap, sysdig, ...
    
slide 15:
    Linux Events & BPF Support
    (version BPF support arrived)
    kprobes: Linux 4.1
    uprobes: Linux 4.3
    tracepoints: Linux 4.7
    BPF output: Linux 4.4
    BPF stacks: Linux 4.6
    sampling, PMCs: Linux 4.9
    
slide 16:
    Event Tracing Efficiency
    E.g., tracing TCP retransmits
    Old way: packet capture. tcpdump (1. read, 2. dump) writes via a buffer to the
    file system and disks, then an analyzer (1. read, 2. process, 3. print) reads the
    capture file: overhead on every send/receive.
    New way: dynamic tracing. A tracer (1. configure, 2. read) instruments
    tcp_retransmit_skb() in the kernel directly.
    
slide 17:
    BPF Tracing Internals
    Observability Program: BPF bytecode, event config; output: per-event data, statistics
    Kernel: load -> verifier -> BPF
    attach: static tracing (tracepoints), dynamic tracing (kprobes, uprobes),
    sampling/PMCs (perf_events)
    per-event data: async copy; statistics: maps
    
slide 18:
    Introducing bcc
    BPF COMPILER COLLECTION
    
slide 19:
    bcc
    • BPF Compiler Collection
    – https://github.com/iovisor/bcc
    – Lead developer: Brenden Blanco
    • Includes tracing tools
    • Provides BPF front-ends:
    Python
    Lua
    C++
    C helper libraries
    golang (gobpf)
    Tracing layers: bcc tool -> bcc (Python, lua front-ends) -> BPF -> Kernel Events
    (user/kernel)
    
slide 20:
    Raw BPF
    samples/bpf/sock_example.c
    87 lines truncated
    
slide 21:
    C/BPF
    samples/bpf/tracex1_kern.c
    58 lines truncated
    
slide 22:
    bcc/BPF (C & Python)
    bcc examples/tracing/bitehist.py
    entire program
    
slide 23:
    ply/BPF
    https://github.com/iovisor/ply/blob/master/README.md
    entire program
    
slide 24:
    The Tracing Landscape, Mar 2017
    (my opinion)
    Chart: Ease of use (brutal -> less brutal) vs Scope & Capability, with stage of
    development noted (alpha ... mature):
    perf, ftrace (recent changes: hist triggers), bcc/BPF, ply/BPF, stap, LTTng,
    sysdig, dtrace4L., ktap, C/BPF, Raw BPF
    
slide 25:
    For end-users
    PERFORMANCE ANALYSIS WITH
    BCC/BPF
    
slide 26:
    Pre-BPF: Linux Perf Analysis in 60s
    1. uptime
    2. dmesg -T | tail
    3. vmstat 1
    4. mpstat -P ALL 1
    5. pidstat 1
    6. iostat -xz 1
    7. free -m
    8. sar -n DEV 1
    9. sar -n TCP,ETCP 1
    10. top
    http://techblog.netflix.com/2015/11/linux-performance-analysis-in-60s.html
    
slide 27:
    bcc Installation
    • https://github.com/iovisor/bcc/blob/master/INSTALL.md
    • e.g., Ubuntu Xenial:
    # echo "deb [trusted=yes] https://repo.iovisor.org/apt/xenial xenial-nightly main" |\
    sudo tee /etc/apt/sources.list.d/iovisor.list
    # sudo apt-get update
    # sudo apt-get install bcc-tools
    – Also available as an Ubuntu snap
    – Ubuntu 16.04 is good, 16.10 better: more tools work
    • Installs many tools
    – In /usr/share/bcc/tools, and …/tools/old for older kernels
    
slide 28:
    bcc tools
    
slide 29:
    bcc General Performance Checklist
    1. execsnoop
    2. opensnoop
    3. ext4slower (…)
    4. biolatency
    5. biosnoop
    6. cachestat
    7. tcpconnect
    8. tcpaccept
    9. tcpretrans
    10. gethostlatency
    11. runqlat
    12. profile
    
slide 30:
    1. execsnoop
    • Trace new processes and debug short-lived process issues:
    # execsnoop
    PCOMM            PID    RET ARGS
    bash             …        0 /usr/bin/man ls
    preconv          …        0 /usr/bin/preconv -e UTF-8
    man              …        0 /usr/bin/tbl
    man              …        0 /usr/bin/nroff -mandoc -rLL=169n -rLT=169n -Tutf8
    man              …        0 /usr/bin/pager -s
    nroff            …        0 /usr/bin/locale charmap
    nroff            …        0 /usr/bin/groff -mtty-char -Tutf8 -mandoc -rLL=169n ...
    groff            …        0 /usr/bin/troff -mtty-char -mandoc -rLL=169n -rLT=16...
    groff            …        0 /usr/bin/grotty
    […]
    Efficient: only traces exec()
    
slide 31:
    2. opensnoop
    • Find config, log, and data files, and inefficient file usage:
    # opensnoop
    PID    COMM          FD ERR PATH
    27159  catalina.sh   …    0 /apps/tomcat8/bin/setclasspath.sh
    …      redis-server  …    0 /proc/4057/stat
    …      redis-server  …    0 /proc/2360/stat
    30668  sshd          …    0 /proc/sys/kernel/ngroups_max
    30668  sshd          …    0 /etc/group
    30668  sshd          …    0 /root/.ssh/authorized_keys
    30668  sshd          …    2 /var/run/nologin
    30668  sshd          …    2 /etc/nologin
    30668  sshd          …    0 /etc/login.defs
    30668  sshd          …    0 /etc/passwd
    […]
    Like "strace -feopen", but system-wide and low overhead
    
slide 32:
    3. ext4slower
    • Trace slow FS I/O, to better identify I/O issues and outliers:
    # ext4slower 1
    Tracing ext4 operations slower than 1 ms
    TIME     COMM   PID  T BYTES   OFF_KB  LAT(ms) FILENAME
    06:49:17 bash   …    R 128     …          7.75 cksum
    06:49:17 cksum  …    R 39552   …          1.34 [
    06:49:17 cksum  …    R 96      …          5.36 2to3-2.7
    06:49:17 cksum  …    R 96      …         14.94 2to3-3.4
    06:49:17 cksum  …    R 10320   …          6.82 411toppm
    06:49:17 cksum  …    R 65536   …          4.01 a2p
    06:49:17 cksum  …    R 55400   …          8.77 ab
    06:49:17 cksum  …    R 36792   …         16.34 aclocal-1.14
    […]
    More reliable and complete indicator than measuring disk I/O latency
    Also: btrfsslower, xfsslower, zfsslower
    
slide 33:
    4. biolatency
    • Identify multimodal latency and outliers with a histogram:
    # biolatency -mT 1
    Tracing block device I/O... Hit Ctrl-C to end.

    06:20:16
        msecs           : count     distribution
          0 -> 1        : 36       |**************************************|
          2 -> 3        : 1        |*                                     |
          4 -> 7        : 3        |***                                   |
          8 -> 15       : 17       |*****************                     |
         16 -> 31       : 33       |**********************************    |
         32 -> 63       : 7        |*******                               |
         64 -> 127      : 6        |******                                |
    […]
    The "count" column is summarized in-kernel
    Average latency (iostat/sar) may not be representative with multiple modes or outliers
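The in-kernel summarization above works by mapping each latency into a power-of-2 bucket and incrementing a counter in a BPF map, so only the small histogram crosses to user space instead of every event. A minimal Python sketch of that bucketing logic (illustrative only; in the tool this runs as BPF bytecode inside the kernel):

```python
from collections import Counter

def log2_bucket(msecs):
    """Return the power-of-2 histogram slot for a latency value,
    matching ranges like 0 -> 1, 2 -> 3, 4 -> 7, 8 -> 15, ..."""
    return max(msecs.bit_length() - 1, 0)

def bucket_label(slot):
    """Human-readable range for a slot, as biolatency prints it."""
    low = 0 if slot == 0 else 1 << slot
    high = (1 << (slot + 1)) - 1
    return f"{low} -> {high}"

# Aggregate a stream of latencies the way a BPF map would:
hist = Counter(log2_bucket(ms) for ms in [0, 1, 2, 5, 5, 9, 20, 70])
for slot in sorted(hist):
    print(f"{bucket_label(slot):>10} : {hist[slot]}")
```

Only the handful of (slot, count) pairs need to be read by the user-space tool, which is why these histograms stay cheap even at high event rates.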
    
slide 34:
    5. biosnoop
    • Dump disk I/O events for detailed analysis. tcpdump for disks:
    # biosnoop
    TIME(s)  COMM         PID  DISK   SECTOR  BYTES  LAT(ms)
    […]      supervise    …    xvda1  …       …      …
    …        supervise    …    xvda1  …       …      …
    …        xfsaild/md0  …    xvdc   …       …      …
    […]
    Can import this into a spreadsheet and do a scatter plot of time vs latency, etc.
    
slide 35:
    6. cachestat
    • Measure file system cache hit ratio statistics:
    # cachestat
    HITS  MISSES  DIRTIES  READ_HIT%  WRITE_HIT%  BUFFERS_MB  CACHED_MB
    …     …       …            80.4%       19.6%  …           …
    …     …       …            96.2%        3.7%  …           …
    …     …       …            89.6%       10.4%  …           …
    …     …       …           100.0%        0.0%  …           …
    …     …       …           100.0%        0.0%  …           …
    …     …       …            55.2%        4.5%  …           …
    …     …       …           100.0%        0.0%  …           …
    …     …       …            99.9%        0.0%  …           …
    …     …       …           100.0%        0.0%  …           …
    …     …       …           100.0%        0.0%  …           …
    […]
    These stats should be added to /proc
    
slide 36:
    7. tcpconnect
    • Trace active ("outbound") TCP connections:
    # tcpconnect
    PID    COMM          IP SADDR                    DADDR                    DPORT
    25333  recordProgra  4  127.0.0.1                …                        …
    25338  curl          4  100.66.3.172             …                        …
    25340  curl          4  100.66.3.172             …                        …
    25342  curl          4  100.66.3.172             …                        …
    25344  curl          4  100.66.3.172             …                        …
    25365  recordProgra  4  127.0.0.1                …                        …
    26119  ssh           6  ::1                      ::1                      …
    25388  recordProgra  4  127.0.0.1                …                        …
    25220  ssh           6  fe80::8a3:9dff:fed5:6b19 fe80::8a3:9dff:fed5:6b19 22
    […]
    Efficient: dynamic tracing of TCP connect functions only; does not trace send/receive
    
slide 37:
    8. tcpaccept
    • Trace passive ("inbound") TCP connections:
    # tcpaccept
    PID  COMM          IP RADDR                    LADDR                    LPORT
    …    sshd          4  11.16.213.254            …                        …
    …    redis-server  4  127.0.0.1                …                        …
    …    redis-server  4  127.0.0.1                …                        …
    …    redis-server  4  127.0.0.1                …                        …
    …    redis-server  4  127.0.0.1                …                        …
    …    sshd          6  ::1                      ::1                      …
    …    redis-server  4  127.0.0.1                …                        …
    …    redis-server  4  127.0.0.1                …                        …
    …    sshd          6  fe80::8a3:9dff:fed5:6b19 fe80::8a3:9dff:fed5:6b19 22
    […]
    Efficient: no send/receive tracing
    
slide 38:
    9. tcpretrans
    • Trace TCP retransmits with kernel state (not on the wire):
    # tcpretrans
    TIME     PID  IP LADDR:LPORT        T> RADDR:RPORT         STATE
    01:55:05 0    4  10.153.223.157:22  R> 69.53.245.40:34619  ESTABLISHED
    01:55:05 0    4  10.153.223.157:22  R> 69.53.245.40:34619  ESTABLISHED
    01:55:17 0    4  10.153.223.157:22  R> 69.53.245.40:22957  ESTABLISHED
    […]
    Efficient: no send/receive tracing
    
slide 39:
    10. gethostlatency
    • Trace DNS latency system-wide (it's always DNS):
    # gethostlatency
    TIME     PID   COMM  LATms HOST
    06:10:24 28011 wget  90.00 www.iovisor.org
    06:10:28 28127 wget   0.00 www.iovisor.org
    06:10:41 28404 wget   9.00 www.netflix.com
    06:10:48 28544 curl  35.00 www.netflix.com.au
    06:11:10 29054 curl  31.00 www.plumgrid.com
    06:11:16 29195 curl   3.00 www.facebook.com
    06:11:24 25313 wget   3.00 www.usenix.org
    06:11:25 29404 curl  72.00 foo
    06:11:28 29475 curl   1.00 foo
    […]
    Instruments getaddrinfo(), gethostbyname(), etc.
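gethostlatency gets these numbers by instrumenting the resolver library calls with uprobes, system-wide. The same latency can be observed from inside a single process by timing the call directly; a hedged userspace-only Python sketch (per-process, unlike the tool, and "localhost" is used here only because it resolves without a network):

```python
import socket
import time

def time_getaddrinfo(host):
    """Time one resolver call from the caller's side: the same interval
    gethostlatency measures system-wide via uprobes on getaddrinfo()."""
    start = time.monotonic()
    results = socket.getaddrinfo(host, None)
    lat_ms = (time.monotonic() - start) * 1000.0
    return lat_ms, results

lat_ms, results = time_getaddrinfo("localhost")
print(f"{lat_ms:8.2f} ms  localhost -> {len(results)} result(s)")
```

The advantage of the uprobe approach is that no application needs modifying: every process's resolver calls are timed at once.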
    
slide 40:
    11. runqlat
    • Examine CPU scheduler run queue latency as a histogram:
    # runqlat -m 5
    Tracing run queue latency... Hit Ctrl-C to end.

        msecs           : count     distribution
          0 -> 1        : 3818     |****************************************|
          2 -> 3        : 39       |                                        |
          4 -> 7        : 39       |                                        |
          8 -> 15       : 62       |                                        |
         16 -> 31       : 2214     |***********************                 |
         32 -> 63       : 226      |**                                      |
    […]
    As efficient as possible: scheduler calls can become frequent
    
slide 41:
    12. profile
    • An efficient profiler. Count unique stacks in kernel:
    # profile
    Sampling at 49 Hertz of all threads by user + kernel stack... Ctrl-C to end.
    […]
    ffffffff813d0af8 __clear_user
    ffffffff813d5277 iov_iter_zero
    ffffffff814ec5f2 read_iter_zero
    ffffffff8120be9d __vfs_read
    ffffffff8120c385 vfs_read
    ffffffff8120d786 sys_read
    ffffffff817cc076 entry_SYSCALL_64_fastpath
    00007fc5652ad9b0 read
    dd (25036)
    This stack was sampled 7 times
    […]
    
slide 42:
    Advanced Analysis
    • Find/draw a functional diagram
    • Apply performance methods
    http://www.brendangregg.com/methodology.html
    Workload Characterization
    Latency Analysis
    USE Method
    Performance Mantras
    • Start with the Q's, then find the A's
    • Use multi-tools:
    – funccount, trace, argdist, stackcount
    e.g., storage I/O subsystem:
    
slide 43:
    Performance Mantras
    1. Don't do it
    2. Do it, but don't do it again
    3. Do it less
    4. Do it later
    5. Do it when they're not looking
    6. Do it concurrently
    7. Do it cheaper
    eliminate unnecessary work
    from Craig Hanson and Pat Crain, and the performance engineering community
    
slide 44:
    trace
    # trace 'sys_read (arg3 > 20000) "read %d bytes", arg3'
    TIME     PID   COMM  FUNC
    05:18:23 4490  …     sys_read  read 1048576 bytes
    05:18:23 4490  …     sys_read  read 1048576 bytes
    05:18:23 4490  …     sys_read  read 1048576 bytes
    # trace -h
    [...]
    trace -K blk_account_io_start
        Trace this kernel function, and print info with a kernel stack trace
    trace 'do_sys_open "%s", arg2'
        Trace the open syscall and print the filename being opened
    trace 'sys_read (arg3 > 20000) "read %d bytes", arg3'
        Trace the read syscall and print a message for reads > 20000 bytes
    trace r::do_sys_return
        Trace the return from the open syscall
    trace 'c:open (arg2 == 42) "%s %d", arg1, arg2'
        Trace the open() call from libc only if the flags (arg2) argument is 42
    trace 't:block:block_rq_complete "sectors=%d", args->nr_sector'
        Trace the block_rq_complete kernel tracepoint and print # of tx sectors
    [...]
    trace custom events; trace -h lists example one-liners
    by Sasha Goldshtein
    
slide 45:
    argdist
    # argdist -H 'p::tcp_cleanup_rbuf(struct sock *sk, int copied):int:copied'
    [15:34:45]
        copied            : count     distribution
            0 -> 1        : 15088    |**********************************    |
            2 -> 3        : 0        |                                      |
            4 -> 7        : 0        |                                      |
            8 -> 15       : 0        |                                      |
           16 -> 31       : 0        |                                      |
           32 -> 63       : 0        |                                      |
           64 -> 127      : 4786     |***********                           |
          128 -> 255      : 1        |                                      |
          256 -> 511      : 1        |                                      |
          512 -> 1023     : 4        |                                      |
         1024 -> 2047     : 11       |                                      |
         2048 -> 4095     : 5        |                                      |
         4096 -> 8191     : 27       |                                      |
         8192 -> 16383    : 105      |                                      |
        16384 -> 32767    : 0        |                                      |
    function argument distributions
    by Sasha Goldshtein
    
slide 46:
    Coming to a GUI near you
    BCC/BPF VISUALIZATIONS
    
slide 47:
    Latency Heatmaps
    
slide 48:
    CPU + Off-CPU Flame Graphs
    • Can now be BPF optimized
    http://www.brendangregg.com/flamegraphs.html
    
slide 49:
    Conquer Performance
    On-CPU + off-CPU means we can measure everything
    Except sometimes one off-CPU stack isn't enough…
    
slide 50:
    Off-Wake Flame Graphs
    • Shows blocking stack with waker stack
    – Better understand why blocked
    – Merged in-kernel using BPF
    – Include multiple waker stacks == chain graphs
    • We couldn't do this before
    
slide 51:
    Overview for tool developers
    HOW TO PROGRAM BCC/BPF
    
slide 52:
    bcc Tutorials
    1. https://github.com/iovisor/bcc/blob/master/INSTALL.md
    2. …/docs/tutorial.md
    3. …/docs/tutorial_bcc_python_developer.md
    4. …/docs/reference_guide.md
    5. .../CONTRIBUTING-SCRIPTS.md
    
slide 53:
    bitehist.py Output
    # ./bitehist.py
    Tracing... Hit Ctrl-C to end.
        kbytes          : count     distribution
          0 -> 1        : 3        |                                      |
          2 -> 3        : 0        |                                      |
          4 -> 7        : 211      |**********                            |
          8 -> 15       : 0        |                                      |
         16 -> 31       : 0        |                                      |
         32 -> 63       : 0        |                                      |
         64 -> 127      : 1        |                                      |
        128 -> 255      : 800      |**************************************|
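bitehist prints its BPF map with bcc's print_log2_hist() table method; the star-bar rendering itself is simple to sketch in plain Python (format approximated from the output above, not bcc's exact code):

```python
def render_hist(counts, width=38):
    """Render log2 bucket counts as star bars scaled to the max count,
    approximating bcc's print_log2_hist() output format."""
    lines = []
    top = max(counts.values()) if counts else 1
    for slot in sorted(counts):
        low = 0 if slot == 0 else 1 << slot       # bucket lower bound
        high = (1 << (slot + 1)) - 1              # bucket upper bound
        stars = "*" * int(counts[slot] / top * width)
        lines.append(f"{low:>7} -> {high:<7} : {counts[slot]:<8} |{stars:<{width}}|")
    return "\n".join(lines)

# Buckets from the bitehist output above: 0->1: 3, 4->7: 211, 128->255: 800
print(render_hist({0: 3, 2: 211, 7: 800}))
```

All the heavy lifting (bucketing every I/O) has already happened in the kernel; user space only formats the final counts.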
    
slide 54:
    bitehist.py Code
    bcc examples/tracing/bitehist.py
    
slide 55:
    bitehist.py Annotated
    Map
    C BPF Program
    Python Program
    Event
    "kprobe__" is a shortcut for BPF.attach_kprobe()
    Statistics
    bcc examples/tracing/bitehist.py
    
slide 56:
    Current Complications
    Initialize all variables
    Extra bpf_probe_read()s
    BPF_PERF_OUTPUT()
    Verifier errors
    
slide 57:
    Bonus Round
    PLY
    
slide 58:
    File opens
    # ply -c 'kprobe:do_sys_open { printf("opened: %s\n", mem(arg(1), "128s")); }'
    1 probe active
    opened: /sys/kernel/debug/tracing/events/enable
    opened: /etc/ld.so.cache
    opened: /lib/x86_64-linux-gnu/libselinux.so.1
    opened: /lib/x86_64-linux-gnu/libc.so.6
    opened: /lib/x86_64-linux-gnu/libpcre.so.3
    opened: /lib/x86_64-linux-gnu/libdl.so.2
    opened: /lib/x86_64-linux-gnu/libpthread.so.0
    opened: /proc/filesystems
    opened: /usr/lib/locale/locale-archive
    opened: .
    [...]
    
slide 59:
    Count vfs calls
    # ply -c 'kprobe:vfs_* { @[func()].count(); }'
    WRN kprobe_attach_pattern: 'vfs_caches_init_early' will not be probed
    WRN kprobe_attach_pattern: 'vfs_caches_init' will not be probed
    49 probes active
    ^Cde-activating probes
    vfs_fstat
    vfs_getattr
    vfs_getattr_nosec
    vfs_open
    vfs_read
    vfs_write
    [...]
    
slide 60:
    Read return size
    # ply -c 'kretprobe:SyS_read { @ret.quantize(retval()); }'
    1 probe active
    ^Cde-activating probes
    @ret:
    [   0,    0]     7 ┤█████████▍
    [   1,    1]    24 ┤████████████████████████████████│
    [   2,    3]     5 ┤██████▋
    [   4,    7]     0 ┤
    [   8,   15]     1 ┤█▍
    [  16,   31]     1 ┤█▍
    [  32,   63]     3 ┤████
    [  64,  127]     3 ┤████
    [ 128,  255]     2 ┤██▋
    [ 256,  511]     1 ┤█▍
    [ 512,   1k)    11 ┤██████████████▋
    
slide 61:
    Read return size (ASCII)
    # ply -A -c 'kretprobe:SyS_read { @ret.quantize(retval()); }'
    1 probe active
    ^Cde-activating probes
    @ret:
    [   0,    0]     7 |###################
    [   1,    1]    12 |################################|
    [   2,    3]     7 |###################
    [   4,    7]     0 |
    [   8,   15]     1 |###
    [  16,   31]     2 |#####
    [  32,   63]     7 |###################
    [  64,  127]     3 |########
    [ 128,  255]     2 |#####
    [ 256,  511]     1 |###
    [ 512,   1k)    11 |#############################
    
slide 62:
    Read latency
    # ply -A -c 'kprobe:SyS_read { @start[tid()] = nsecs(); }
    kretprobe:SyS_read /@start[tid()]/ { @ns.quantize(nsecs() - @start[tid()]);
    @start[tid()] = nil; }'
    2 probes active
    ^Cde-activating probes
    [...]
    @ns:
    [ 512,   1k)     3 |########
    [  1k,   2k)     7 |###################
    [  2k,   4k)    12 |################################|
    [  4k,   8k)     3 |########
    [  8k,  16k)     2 |#####
    [ 16k,  32k)     0 |
    [ 32k,  64k)     0 |
    [ 64k, 128k)     3 |########
    [128k, 256k)     1 |###
    [256k, 512k)     1 |###
    [512k,   1M)     2 |#####
    [...]
    
slide 63:
    Counting Off-CPU stacks
    # ply -c 'kprobe:schedule { @[stack()].count() }'
    1 probe active
    ^Cde-activating probes
    schedule+0x1
    sys_exit+0x17
    do_syscall_64+0x5e
    return_from_SYSCALL_64
    schedule+0x1
    fuse_dev_read+0x63
    new_sync_read+0xd2
    __vfs_read+0x26
    vfs_read+0x96
    sys_read+0x55
    do_syscall_64+0x5e
    return_from_SYSCALL_64
    schedule+0x1
    do_syscall_64+0xa2
    return_from_SYSCALL_64
    [...]
    
slide 64:
    ply One-Liners
    # Trace file opens:
    ply -c 'kprobe:do_sys_open { printf("opened: %s\n", mem(arg(1), "128s")); }'
    # Counting vfs functions by process name:
    ply -c 'kprobe:vfs_* { @[comm(), func()].count(); }'
    # Counting off-CPU stacks:
    ply -c 'kprobe:schedule { @[stack()].count() }'
    # Syscall read return size as a histogram:
    ply -c 'kretprobe:SyS_read { @ret.quantize(retval()); }'
    # Syscall read latency as a histogram:
    ply -A -c 'kprobe:SyS_read { @start[tid()] = nsecs(); }
    kretprobe:SyS_read /@start[tid()]/ { @ns.quantize(nsecs() - @start[tid()]);
    @start[tid()] = nil; }'
    [...]
    also see ply/oneliners.md
    
slide 65:
    ply
    • A new BPF-based dynamic tracer for Linux
    – Created by Tobias Waldekranz
    – https://github.com/iovisor/ply  https://wkz.github.io/ply/
    • High-level language
    – Simple one-liners
    – Short scripts
    • In development
    – kprobes and tracepoints only, uprobes/perf_events not yet
    – Successful so far as a proof of concept
    – Not production tested yet (bcc is)
    
slide 66:
    Future work
    CHALLENGES
    
slide 67:
    Challenges
    MarkeVng
    DocumentaVon
    Training
    Community
    Without these, we may have another irace: a built in "secret" of Linux. Not good for adopVon!
    hDps://www.iovisor.org project helps, but tracing (observability) is only one part.
    
slide 68:
    Takeaways
    1. Understanding the value of Linux tracing superpowers
    2. Upgrade to Linux 4.4+ (4.9 is better)
    3. Ask for eBPF support in your perf analysis/monitoring tools
    Please contribute:
    - https://github.com/iovisor/bcc
    - https://github.com/iovisor/ply
    BPF Tracing in Linux
    • 3.19: sockets
    • 3.19: maps
    • 4.1: kprobes
    • 4.3: uprobes
    • 4.4: BPF output
    • 4.6: stacks
    • 4.7: tracepoints
    • 4.9: profiling
    • 4.9: PMCs
    
slide 69:
    Links & References
    iovisor bcc:
    - https://github.com/iovisor/bcc  https://github.com/iovisor/bcc/tree/master/docs
    - http://www.brendangregg.com/blog/ (search for "bcc")
    - http://www.brendangregg.com/ebpf.html#bcc
    - http://blogs.microsoft.co.il/sasha/2016/02/14/two-new-ebpf-tools-memleak-and-argdist/
    - I'll change your view of Linux tracing: https://www.youtube.com/watch?v=GsMs3n8CB6g
    - On designing tracing tools: https://www.youtube.com/watch?v=uibLwoVKjec
    BPF:
    - https://www.kernel.org/doc/Documentation/networking/filter.txt
    - https://github.com/iovisor/bpf-docs
    - https://suchakra.wordpress.com/tag/bpf/
    Flame Graphs:
    - http://www.brendangregg.com/flamegraphs.html
    - http://www.brendangregg.com/blog/2016-01-20/ebpf-offcpu-flame-graph.html
    - http://www.brendangregg.com/blog/2016-02-01/linux-wakeup-offwake-profiling.html
    Dynamic Instrumentation:
    - http://ftp.cs.wisc.edu/par-distr-sys/papers/Hollingsworth94Dynamic.pdf
    - https://en.wikipedia.org/wiki/DTrace
    - DTrace: Dynamic Tracing in Oracle Solaris, Mac OS X and FreeBSD, Brendan Gregg, Jim Mauro; Prentice Hall 2011
    Netflix Tech Blog on Vector:
    - http://techblog.netflix.com/2015/04/introducing-vector-netflixs-on-host.html
    Linux Performance: http://www.brendangregg.com/linuxperf.html
    
slide 70:
    Thanks
    – Questions?
    – iovisor bcc: https://github.com/iovisor/bcc
    – http://www.brendangregg.com
    – http://slideshare.net/brendangregg
    – bgregg@netflix.com
    – @brendangregg
    Thanks to Alexei Starovoitov (Facebook), Brenden Blanco (PLUMgrid/
    VMware), Sasha Goldshtein (Sela), Daniel Borkmann (Cisco), Wang
    Nan (Huawei), and other BPF and bcc contributors!