
Kernel Recipes 2017: Performance Analysis with BPF

Video: https://www.youtube.com/watch?v=nhxq6jLGc_w

Talk by Brendan Gregg at Kernel Recipes 2017 (Paris)

Description: "The in-kernel Berkeley Packet Filter (BPF) has been enhanced in recent kernels to do much more than just filtering packets. It can now run user-defined programs on events, such as on tracepoints, kprobes, uprobes, and perf_events, allowing advanced performance analysis tools to be created. These can be used in production as the BPF virtual machine is sandboxed and will reject unsafe code, and are already in use at Netflix.

Beginning with the bpf() syscall in 3.18, enhancements have been added in many kernel versions since, with major features for BPF analysis landing in Linux 4.1, 4.4, 4.7, and 4.9. Specific capabilities these provide include custom in-kernel summaries of metrics, custom latency measurements, and frequency counting kernel and user stack traces on events. One interesting case involves saving stack traces on wake up events, and associating them with the blocked stack trace: so that we can see the blocking stack trace and the waker together, merged in kernel by a BPF program (that particular example is in the kernel as samples/bpf/offwaketime).

This talk will discuss the new BPF capabilities for performance analysis and debugging, and demonstrate the new open source tools that have been developed to use it, many of which are in the Linux Foundation iovisor bcc (BPF Compiler Collection) project. These include tools to analyze the CPU scheduler, TCP performance, file system performance, block I/O, and more."


PDF: KernelRecipes_BPF_Perf_Analysis.pdf

Keywords (from pdftotext):

slide 1:
    Performance
    Analysis
    Superpowers
    with Linux BPF
    Brendan Gregg
    Senior Performance Architect
    Sep 2017
    
slide 2:
    DEMO
    
slide 3:
slide 4:
    bcc/BPF tools
    
slide 5:
    Agenda
    1. eBPF & bcc
    2. bcc/BPF CLI Tools
    3. bcc/BPF Visualizations
    
slide 6:
    Take aways
    1. Understand Linux tracing and enhanced BPF
    2. How to use eBPF tools
    3. Areas of future development
    
slide 7:
slide 8:
    Who at Netflix will use BPF?
    
slide 9:
    Introducing enhanced BPF for tracing: kernel-level software
    BPF
    
slide 10:
    Ye Olde BPF
    Berkeley Packet Filter
    # tcpdump host 127.0.0.1 and port 22 -d
    (000) ldh   [12]
    (001) jeq   #0x800        jt 2   jf 18
    (002) ld    [26]
    (003) jeq   #0x7f000001   jt 6   jf 4
    (004) ld    [30]
    (005) jeq   #0x7f000001   jt 6   jf 18
    (006) ldb   [23]
    (007) jeq   #0x84         jt 10  jf 8
    (008) jeq   #0x6          jt 10  jf 9
    (009) jeq   #0x11         jt 10  jf 18
    (010) ldh   [20]
    (011) jset  #0x1fff       jt 18  jf 12
    (012) ldxb  4*([14]&0xf)
    (013) ldh   [x + 14]
    [...]
    Optimizes packet filter performance
    2 x 32-bit registers & scratch memory
    User-defined bytecode executed by an in-kernel sandboxed virtual machine
    Steven McCanne and Van Jacobson, 1993
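The classic BPF model on this slide, user-defined bytecode run by a small in-kernel sandboxed interpreter, can be illustrated with a toy Python sketch. This is not the kernel implementation: the opcodes and encoding are simplified, and only the first two instructions of the tcpdump -d listing (check the EtherType at offset 12) are mirrored.

```python
def run_filter(prog, pkt):
    """Interpret a tiny subset of classic-BPF-style bytecode against a packet."""
    A = 0   # accumulator (classic BPF has two 32-bit registers; one suffices here)
    pc = 0
    while pc < len(prog):
        op, arg, jt, jf = prog[pc]
        if op == "ldh":      # load halfword (2 bytes, big-endian) into A
            A = int.from_bytes(pkt[arg:arg + 2], "big")
            pc += 1
        elif op == "jeq":    # conditional jump: jt if A == arg, else jf
            pc = jt if A == arg else jf
        elif op == "ret":    # return: nonzero accepts the packet
            return arg
        else:
            raise ValueError("unsupported opcode: " + op)
    return 0

# Accept only packets whose EtherType (offset 12) is IPv4 (0x800), like
# instructions (000)-(001) above:
prog = [
    ("ldh", 12, None, None),
    ("jeq", 0x0800, 2, 3),
    ("ret", 0xFFFF, None, None),  # match
    ("ret", 0, None, None),       # no match
]

ipv4_pkt = bytes(12) + b"\x08\x00" + bytes(20)
arp_pkt = bytes(12) + b"\x08\x06" + bytes(20)
print(run_filter(prog, ipv4_pkt))  # 65535 (accept)
print(run_filter(prog, arp_pkt))   # 0 (reject)
```

The sandboxing property comes from the interpreter only executing a fixed instruction set with bounded jumps; the kernel additionally verifies the bytecode before running it.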
    
slide 11:
    Enhanced BPF
    aka eBPF or just "BPF"
    10 x 64-bit registers
    maps (hashes)
    stack traces
    actions
    Alexei Starovoitov, 2014+
    
slide 12:
    BPF for Tracing, Internals
    Observability program: BPF bytecode -> load -> verifier -> BPF program (in kernel)
    Attach to events: static tracing (tracepoints), dynamic tracing (kprobes, uprobes), sampling/PMCs (perf_events)
    Output: per-event data (async copy), statistics (maps)
    Enhanced BPF is also now used for SDNs, DDoS mitigation, intrusion detection, container security, …
    
slide 13:
    Event Tracing Efficiency
    E.g., tracing TCP retransmits
    Old way: packet capture: kernel (send/receive -> buffer) -> tcpdump (1. read, 2. dump) -> file system, disks -> Analyzer (1. read, 2. process, 3. print)
    New way: dynamic tracing: Tracer (1. configure, 2. read) on tcp_retransmit_skb()
    
slide 14:
    Linux Events & BPF Support
    (version BPF support arrived)
    kprobes: Linux 4.1
    uprobes: Linux 4.3
    BPF output: Linux 4.4
    BPF stacks: Linux 4.6
    tracepoints: Linux 4.7
    profiling, PMCs: Linux 4.9
    
slide 15:
    A Linux Tracing Timeline
    1990’s: Static tracers, prototype dynamic tracers
    2000: LTT + DProbes (dynamic tracing; not integrated)
    2004: kprobes (2.6.9)
    2005: DTrace (not Linux), SystemTap (out-of-tree)
    2008: ftrace (2.6.27)
    2009: perf_events (2.6.31)
    2009: tracepoints (2.6.32)
    2010-2017: ftrace & perf_events enhancements
    2012: uprobes (3.5)
    2014-2017: enhanced BPF patches: supporting tracing events
    2016-2017: ftrace hist triggers
    also: LTTng, ktap, sysdig, ...
    
slide 16:
    Introducing BPF Compiler Collection: user-level front-end
    BCC
    
slide 17:
    bcc
    • BPF Compiler Collection
    – https://github.com/iovisor/bcc
    – Lead developer: Brenden Blanco
    • Includes tracing tools
    • Provides BPF front-ends: Python, Lua, C++, C helper libraries, golang (gobpf)
    Tracing layers: bcc tool -> bcc front-ends (Python, lua) -> user/kernel -> Kernel (BPF, Events)
    
slide 18:
    Raw BPF
    samples/bpf/sock_example.c
    87 lines truncated
    
slide 19:
    C/BPF
    samples/bpf/tracex1_kern.c
    58 lines truncated
    
slide 20:
    bcc/BPF (C & Python)
    bcc examples/tracing/bitehist.py
    entire program
    
slide 21:
    bpftrace
    https://github.com/ajor/bpftrace
    entire program
    
slide 22:
    The Tracing Landscape, Sep 2017
    (my opinion)
    Chart axes: ease of use vs scope & capability; stage of development marked from (alpha) to (mature); many recent changes
    Plotted tools: Raw BPF, C/BPF, stap, ktap (brutal); ftrace (hist triggers), perf, sysdig, LTTng, dtrace4L., ply/BPF, bpftrace, bcc/BPF (less brutal)
    
slide 23:
    Performance analysis
    BCC/BPF CLI TOOLS
    
slide 24:
    Pre-BPF: Linux Perf Analysis in 60s
    1. uptime
    2. dmesg -T | tail
    3. vmstat 1
    4. mpstat -P ALL 1
    5. pidstat 1
    6. iostat -xz 1
    7. free -m
    8. sar -n DEV 1
    9. sar -n TCP,ETCP 1
    10. top
    http://techblog.netflix.com/2015/11/linux-performance-analysis-in-60s.html
    
slide 25:
    bcc Installation
    • https://github.com/iovisor/bcc/blob/master/INSTALL.md
    • eg, Ubuntu Xenial:
    # echo "deb [trusted=yes] https://repo.iovisor.org/apt/xenial xenial-nightly main" |\
    sudo tee /etc/apt/sources.list.d/iovisor.list
    # sudo apt-get update
    # sudo apt-get install bcc-tools
    – Also available as an Ubuntu snap
    – Ubuntu 16.04 is good, 16.10 better: more tools work
    • Installs many tools
    – In /usr/share/bcc/tools, and …/tools/old for older kernels
    
slide 26:
    bcc General Performance Checklist
    1. execsnoop
    2. opensnoop
    3. ext4slower (…)
    4. biolatency
    5. biosnoop
    6. cachestat
    7. tcpconnect
    8. tcpaccept
    9. tcpretrans
    10. gethostlatency
    11. runqlat
    12. profile
    
slide 27:
    Discover short-lived process issues using execsnoop
    # execsnoop -t
    TIME(s) PCOMM            PID    PPID   RET ARGS
            dirname                          0 /usr/bin/dirname /apps/tomcat/bin/catalina.sh
            run                              0 ./run
            run                             -2 /command/bash
            run                             -2 /usr/local/bin/bash
            run                             -2 /usr/local/sbin/bash
            bash                             0 /bin/bash
            svstat                           0 /command/svstat /service/nflx-httpd
            perl                             0 /usr/bin/perl -e $l=<>;$l=~/(\d+) sec/;print $1||0;
            ps                               0 /bin/ps --ppid 1 -o pid,cmd,args
            grep                             0 /bin/grep org.apache.catalina
            sed                              0 /bin/sed s/^ *//;
            cut                              0 /usr/bin/cut -d -f 1
            xargs                            0 /usr/bin/xargs
            xargs                           -2 /command/echo
            xargs                           -2 /usr/local/bin/echo
            xargs                           -2 /usr/local/sbin/echo
            echo                             0 /bin/echo
    [...]
    Efficient: only traces exec()
    
slide 28:
    
slide 29:
    Exonerate or confirm storage latency outliers with ext4slower
    # /usr/share/bcc/tools/ext4slower 1
    Tracing ext4 operations slower than 1 ms
    TIME     COMM            PID    T BYTES  OFF_KB  LAT(ms) FILENAME
    17:31:42 postdrop        15523  S 0                 2.32 5630D406E4
    17:31:42 cleanup         15524  S 0                 1.89 57BB7406EC
    17:32:09 titus-log-ship  19735  S 0                 1.94 slurper_checkpoint.db
    17:35:37 dhclient               S 0                 3.32 dhclient.eth0.leases
    17:35:39 systemd-journa  504    S 0                26.62 system.journal
    17:35:39 systemd-journa  504    S 0                 1.56 system.journal
    17:35:39 systemd-journa  504    S 0                 1.73 system.journal
    17:35:45 postdrop        16187  S 0                 2.41 C0369406E4
    17:35:45 cleanup         16188  S 0                 6.52 C1B90406EC
    […]
    Tracing at the file system is a more reliable and complete indicator than measuring disk I/O latency
    Also: btrfsslower, xfsslower, zfsslower
    
slide 30:
    
slide 31:
    Identify multimodal disk I/O latency and outliers with biolatency
    # biolatency -mT 10
    Tracing block device I/O... Hit Ctrl-C to end.

    19:19:04
         msecs               : count     distribution
             0 -> 1          : 238      |*********                               |
             2 -> 3          : 424      |*****************                       |
             4 -> 7          : 834      |*********************************       |
             8 -> 15         : 506      |********************                    |
            16 -> 31         : 986      |****************************************|
            32 -> 63         : 97       |***                                     |
            64 -> 127        : 7        |                                        |
           128 -> 255        : 27       |*                                       |

    19:19:14
         msecs               : count     distribution
             0 -> 1          : 427      |*******************                     |
             2 -> 3          : 424      |******************                      |
    […]
    The "count" column is summarized in-kernel
    Average latency (iostat/sar) may not be representative with multiple modes or outliers
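The in-kernel summarization that biolatency relies on can be sketched in a few lines of Python: each event's latency is reduced to a power-of-two bucket and a per-bucket counter is incremented, so only the small count array (not every event) crosses to user space. This is a pure-Python stand-in for the BPF map, with simplified bucket boundaries, not the tool's actual code.

```python
def log2_bucket(value):
    """Power-of-two bucket index for a non-negative value (0 maps to bucket 0)."""
    return value.bit_length() if value > 0 else 0

def histogram(latencies_ms):
    """Aggregate latencies into bucket -> count, as a BPF map would in-kernel."""
    counts = {}
    for ms in latencies_ms:
        b = log2_bucket(ms)
        counts[b] = counts.get(b, 0) + 1
    return counts

def print_hist(counts):
    """Render buckets with ASCII bars, in the style of the bcc tools."""
    peak = max(counts.values())
    for b in sorted(counts):
        lo = 0 if b == 0 else 1 << (b - 1)
        hi = (1 << b) - 1
        n = counts[b]
        bar = "*" * (40 * n // peak)
        print(f"{lo:>6} -> {hi:<6}: {n:<6} |{bar:<40}|")

# Example latencies (ms), standing in for traced block I/O completion times:
print_hist(histogram([0, 1, 2, 3, 3, 5, 9, 17, 17, 18, 40]))
```

Keeping only log2 buckets is what makes the summary cheap enough to maintain on every event.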
    
slide 32:
    
slide 33:
    Efficiently trace TCP sessions with PID, bytes, and duration using tcplife
    # /usr/share/bcc/tools/tcplife
    PID   COMM       LADDR      LPORT RADDR           RPORT TX_KB RX_KB MS
    2509  java                   8078 100.82.130.159                0    5.44
    2509  java                   8078 100.82.78.215                 0    135.32
    2509  java                  60778 100.82.207.252               13    15126.87
    2509  java                  38884 100.82.208.178                0    15568.25
    2509  java                   4243 127.0.0.1                     0    0.61
    2509  java                  42166 127.0.0.1                     0    0.67
    12030 upload-mes 127.0.0.1  34020 127.0.0.1                     0    3.38
    2509  java                   8078 127.0.0.1                    11    3.41
    12030 upload-mes 127.0.0.1  21196 127.0.0.1                     0    12.61
    3964  mesos-slav 127.0.0.1   7101 127.0.0.1                     0    12.64
    12021 upload-sys 127.0.0.1  34022 127.0.0.1                     0    15.28
    2509  java                   8078 127.0.0.1                   372    15.31
    2235  dockerd               13730 100.82.136.233                4    18.50
    2235  dockerd               34314 100.82.64.53                  8    56.73
    [...]
    Dynamic tracing of TCP set state only; does not trace send/receive
    Also see: tcpconnect, tcpaccept, tcpretrans
    
slide 34:
    
slide 35:
    Identify DNS latency issues system wide with gethostlatency
    # /usr/share/bcc/tools/gethostlatency
    TIME     PID   COMM          LATms HOST
    18:56:36 5055  mesos-slave    0.01 100.82.166.217
    18:56:40 5590  java           3.53 ec2-…-79.compute-1.amazonaws.com
    18:56:51 5055  mesos-slave    0.01 100.82.166.217
    18:56:53 30166 ncat           0.21 localhost
    18:56:56 6661  java           2.19 atlas-alert-….prod.netflix.net
    18:56:59 5589  java           1.50 ec2-…-207.compute-1.amazonaws.com
    18:57:03 5370  java           0.04 localhost
    18:57:03 30259 sudo           0.07 titusagent-mainvpc-m…3465
    18:57:06 5055  mesos-slave    0.01 100.82.166.217
    18:57:10 5590  java           3.10 ec2-…-79.compute-1.amazonaws.com
    18:57:21 5055  mesos-slave    0.01 100.82.166.217
    18:57:29 5589  java          52.36 ec2-…-207.compute-1.amazonaws.com
    18:57:36 5055  mesos-slave    0.01 100.82.166.217
    18:57:40 5590  java           1.83 ec2-…-79.compute-1.amazonaws.com
    18:57:51 5055  mesos-slave    0.01 100.82.166.217
    […]
    Instruments using user-level dynamic tracing of getaddrinfo(), gethostbyname(), etc.
    
slide 36:
    
slide 37:
    Examine CPU scheduler latency as a histogram with runqlat
    # /usr/share/bcc/tools/runqlat 10
    Tracing run queue latency... Hit Ctrl-C to end.

         usecs               : count     distribution
             0 -> 1          : 2810     |*                                       |
             2 -> 3          : 5248     |**                                      |
             4 -> 7          : 12369    |******                                  |
             8 -> 15         : 71312    |****************************************|
            16 -> 31         : 55705    |*******************************         |
            32 -> 63         : 11775    |******                                  |
            64 -> 127        : 6230     |***                                     |
           128 -> 255        : 2758     |*                                       |
           256 -> 511        : 549      |                                        |
           512 -> 1023       : 46       |                                        |
          1024 -> 2047       : 11       |                                        |
          2048 -> 4095       : 4        |                                        |
          4096 -> 8191       : 5        |                                        |
    […]
    As efficient as possible: scheduler calls can become frequent
    
slide 38:
    
slide 39:
    Construct programmatic one-liners with trace
    e.g. reads over 20000 bytes:
    # trace 'sys_read (arg3 > 20000) "read %d bytes", arg3'
    TIME     PID   COMM  FUNC
    05:18:23 4490        sys_read  read 1048576 bytes
    05:18:23 4490        sys_read  read 1048576 bytes
    05:18:23 4490        sys_read  read 1048576 bytes
    # trace -h
    [...]
    trace -K blk_account_io_start
            Trace this kernel function, and print info with a kernel stack trace
    trace 'do_sys_open "%s", arg2'
            Trace the open syscall and print the filename being opened
    trace 'sys_read (arg3 > 20000) "read %d bytes", arg3'
            Trace the read syscall and print a message for reads > 20000 bytes
    trace r::do_sys_return
            Trace the return from the open syscall
    trace 'c:open (arg2 == 42) "%s %d", arg1, arg2'
            Trace the open() call from libc only if the flags (arg2) argument is 42
    [...]
    trace by Sasha Goldshtein
    
slide 40:
    Create in-kernel summaries with argdist
    e.g. histogram of tcp_cleanup_rbuf() copied:
    # argdist -H 'p::tcp_cleanup_rbuf(struct sock *sk, int copied):int:copied'
    [15:34:45]
         copied              : count     distribution
             0 -> 1          : 15088    |**********************************      |
             2 -> 3          : 0        |                                        |
             4 -> 7          : 0        |                                        |
             8 -> 15         : 0        |                                        |
            16 -> 31         : 0        |                                        |
            32 -> 63         : 0        |                                        |
            64 -> 127        : 4786     |***********                             |
           128 -> 255        : 1        |                                        |
           256 -> 511        : 1        |                                        |
           512 -> 1023       : 4        |                                        |
          1024 -> 2047       : 11       |                                        |
          2048 -> 4095       : 5        |                                        |
          4096 -> 8191       : 27       |                                        |
          8192 -> 16383      : 105      |                                        |
         16384 -> 32767      : 0        |                                        |
    argdist by Sasha Goldshtein
    
slide 41:
    Coming to a GUI near you
    BCC/BPF VISUALIZATIONS
    
slide 42:
    BPF metrics and analysis can be automated in GUIs
    E.g., Netflix Vector (self-service UI):
    Flame Graphs
    Heat Maps
    Tracing Reports
    Should be open sourced; you may also build/buy your own
    
slide 43:
    Latency heatmaps show histograms over time
    
slide 44:
    Efficient on- and off-CPU flame graphs via kernel stack aggregation
    CPU: via sampling
    Off-CPU: via sched tracing
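The stack aggregation behind these flame graphs can be sketched briefly: identical stack traces are frequency-counted (BPF does this in-kernel with a map keyed by stack ID), then emitted in the "folded" text format that flame graph tooling consumes. The function names below are illustrative, not from a real profile.

```python
from collections import Counter

def fold(samples):
    """Count identical stacks; each sample is a tuple of frames, root first."""
    return Counter(";".join(stack) for stack in samples)

# Hypothetical sampled stacks (root -> leaf):
samples = [
    ("main", "do_work", "compress"),
    ("main", "do_work", "compress"),
    ("main", "do_work", "read_input"),
    ("main", "idle"),
]

# One "stack count" line per unique stack, the folded flame graph input:
for stack, count in sorted(fold(samples).items()):
    print(f"{stack} {count}")
```

Because only the per-stack counts leave the kernel, the expensive part (copying every raw sample to user space) is avoided.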
    
slide 45:
    Generic thread state diagram
    Solve everything?
    
slide 46:
    Off-CPU Time (zoomed): gzip(1)
    Off-CPU doesn't always make sense:
    what is gzip blocked on?
    
slide 47:
    Wakeup time flame graphs show waker thread stacks
    gzip(1) is blocked on tar(1)!
    tar cf - * | gzip > out.tar.gz
    Can't we associate off-CPU with wakeup stacks?
    
slide 48:
    Off-wake flame graphs: BPF can merge blocking plus waker stacks in-kernel
    Waker task, waker stack (stack direction: wokeup)
    Blocked task, blocked stack
    
slide 49:
    Chain graphs: merge all wakeup stacks
    
slide 50:
    BPF
    FUTURE WORK
    
slide 51:
    BCC Improvements
    • Challenges:
    – Initialize all variables
    – BPF_PERF_OUTPUT()
    – Verifier errors
    – Still explicit bpf_probe_read()s. It's getting better (thanks):
    • High-Level Languages
    – One-liners and scripts
    – Can use libbcc
    tcpconnlat.py
    
slide 52:
    ply
    • A new BPF-based language and tracer for Linux
    – Created by Tobias Waldekranz
    – https://github.com/iovisor/ply https://wkz.github.io/ply/
    – Promising, was in development
    # ply -c 'kprobe:do_sys_open { printf("opened: %s\n", mem(arg(1), "128s")); }'
    1 probe active
    opened: /sys/kernel/debug/tracing/events/enable
    opened: /etc/ld.so.cache
    opened: /lib/x86_64-linux-gnu/libselinux.so.1
    opened: /lib/x86_64-linux-gnu/libc.so.6
    opened: /proc/filesystems
    opened: /usr/lib/locale/locale-archive
    opened: .
    [...]
    
slide 53:
    ply programs are concise, such as measuring read latency
    # ply -A -c 'kprobe:SyS_read { @start[tid()] = nsecs(); }
    kretprobe:SyS_read /@start[tid()]/ { @ns.quantize(nsecs() - @start[tid()]);
    @start[tid()] = nil; }'
    2 probes active
    ^Cde-activating probes
    [...]
    @ns:
    [ 512,   1k)    3 |########                        |
    [  1k,   2k)    7 |###################             |
    [  2k,   4k)   12 |################################|
    [  4k,   8k)    3 |########                        |
    [  8k,  16k)    2 |#####                           |
    [ 16k,  32k)    0 |                                |
    [ 32k,  64k)    0 |                                |
    [ 64k, 128k)    3 |########                        |
    [128k, 256k)    1 |###                             |
    [256k, 512k)    1 |###                             |
    [512k,   1M)    2 |#####                           |
    [...]
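The ply program above uses a common BPF tracing pattern: on function entry, record a start timestamp keyed by thread ID in a map; on return, compute the delta, quantize it into a histogram, and delete the key. Here is a pure-Python sketch of that pattern, with dicts standing in for the BPF maps (the tid value is illustrative):

```python
import time

start = {}   # @start[tid] -> entry timestamp (ns), as in the ply map
hist = {}    # latency histogram: power-of-two bucket -> count

def on_entry(tid):
    """kprobe handler: stamp the entry time for this thread."""
    start[tid] = time.monotonic_ns()

def on_return(tid):
    """kretprobe handler: quantize the latency, then drop the start record."""
    t0 = start.pop(tid, None)
    if t0 is None:          # missed the entry (probe attached mid-call): drop
        return
    delta = time.monotonic_ns() - t0
    bucket = delta.bit_length()      # quantize to a power-of-two bucket
    hist[bucket] = hist.get(bucket, 0) + 1

# Simulate one traced call on a hypothetical thread ID:
on_entry(1234)
on_return(1234)
print(hist)
```

Deleting the key on return is what keeps the map small: only threads currently inside the function have entries.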
    
slide 54:
    bpftrace
    • Another new BPF-based language and tracer for Linux
    – Created by Alastair Robertson
    – https://github.com/ajor/bpftrace
    – In active development
    # bpftrace -e 'kprobe:sys_open { printf("opened: %s\n", str(arg0)); }'
    Attaching 1 probe...
    opened: /sys/devices/system/cpu/online
    opened: /proc/1956/stat
    opened: /proc/1241/stat
    opened: /proc/net/dev
    opened: /proc/net/if_inet6
    opened: /sys/class/net/eth0/device/vendor
    opened: /proc/sys/net/ipv4/neigh/eth0/retrans_time_ms
    [...]
    
slide 55:
    bpftrace programs are concise, such as measuring read latency
    # bpftrace -e 'kprobe:SyS_read { @start[tid] = nsecs; } kretprobe:SyS_read /@start[tid]/
    { @ns = quantize(nsecs - @start[tid]); @start[tid] = delete(); }'
    Attaching 2 probes...
    @ns:
    [0, 1]        0 |                                                    |
    [2, 4)        0 |                                                    |
    [4, 8)        0 |                                                    |
    [8, 16)       0 |                                                    |
    [16, 32)      0 |                                                    |
    [32, 64)      0 |                                                    |
    [64, 128)     0 |                                                    |
    [128, 256)    0 |                                                    |
    [256, 512)    0 |                                                    |
    [512, 1k)     0 |                                                    |
    [1k, 2k)      6 |@@@@@                                               |
    [2k, 4k)     20 |@@@@@@@@@@@@@@@@@@@                                 |
    [4k, 8k)      4 |@@@                                                 |
    [8k, 16k)    14 |@@@@@@@@@@@@@                                       |
    [16k, 32k)   53 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
    [32k, 64k)    2 |@                                                   |
    
slide 56:
    New Tooling/Metrics
    
slide 57:
    New Visualizations
    
slide 58:
    Case Studies
    Use it
    Solve something
    Write about it
    Talk about it
    • Recent posts:
    – https://blogs.dropbox.com/tech/2017/09/optimizing-web-servers-for-high-throughput-and-low-latency/
    – https://josefbacik.github.io/kernel/scheduler/bcc/bpf/2017/08/03/sched-time.html
    
slide 59:
    Take aways
    1. Understand Linux tracing components
    2. Understand the role and state of enhanced BPF
    3. Discover opportunities for future development
    Please contribute:
    - https://github.com/iovisor/bcc
    - https://github.com/iovisor/ply
    BPF Tracing in Linux
    • 3.19: sockets
    • 3.19: maps
    • 4.1: kprobes
    • 4.3: uprobes
    • 4.4: BPF output
    • 4.6: stacks
    • 4.7: tracepoints
    • 4.9: profiling
    • 4.9: PMCs
    
slide 60:
    Links & References
    iovisor bcc:
    - https://github.com/iovisor/bcc https://github.com/iovisor/bcc/tree/master/docs
    - http://www.brendangregg.com/blog/ (search for "bcc")
    - http://www.brendangregg.com/ebpf.html#bcc
    - http://blogs.microsoft.co.il/sasha/2016/02/14/two-new-ebpf-tools-memleak-and-argdist/
    - On designing tracing tools: https://www.youtube.com/watch?v=uibLwoVKjec
    bcc tutorial:
    - https://github.com/iovisor/bcc/blob/master/INSTALL.md
    - …/docs/tutorial.md …/docs/tutorial_bcc_python_developer.md …/docs/reference_guide.md
    - .../CONTRIBUTING-SCRIPTS.md
    ply: https://github.com/iovisor/ply
    bpftrace: https://github.com/ajor/bpftrace
    BPF:
    - https://www.kernel.org/doc/Documentation/networking/filter.txt
    - https://github.com/iovisor/bpf-docs
    - https://suchakra.wordpress.com/tag/bpf/
    Dynamic tracing: ftp://ftp.cs.wisc.edu/paradyn/papers/Hollingsworth94Dynamic.pdf
    Flame Graphs:
    - http://www.brendangregg.com/flamegraphs.html
    - http://www.brendangregg.com/blog/2016-01-20/ebpf-offcpu-flame-graph.html
    - http://www.brendangregg.com/blog/2016-02-01/linux-wakeup-offwake-profiling.html
    Netflix Tech Blog on Vector:
    - http://techblog.netflix.com/2015/04/introducing-vector-netflixs-on-host.html
    Linux Performance: http://www.brendangregg.com/linuxperf.html
    
slide 61:
    Thank You
    – Questions?
    – iovisor bcc: https://github.com/iovisor/bcc
    – http://www.brendangregg.com
    – http://slideshare.net/brendangregg
    – bgregg@netflix.com
    – @brendangregg
    Thanks to Alexei Starovoitov (Facebook), Brenden Blanco (PLUMgrid/VMware), Sasha Goldshtein (Sela),
    Teng Qin (Facebook), Yonghong Song (Facebook), Daniel Borkmann (Cisco/Covalent), Wang Nan
    (Huawei), Vicent Martí (GitHub), Paul Chaignon (Orange), and other BPF and bcc contributors!