USENIX ATC 2017: Performance Superpowers with Enhanced BPF
Video: https://www.youtube.com/watch?v=oc9000dM9-k
Talk for USENIX ATC 2017 by Brendan Gregg.
Description: "The Berkeley Packet Filter (BPF) in Linux has been enhanced in very recent versions to do much more than just filter packets, and has become a hot area of operating systems innovation, with much more yet to be discovered. BPF is a sandboxed virtual machine that runs user-level defined programs in kernel context, and is part of many kernels. The Linux enhancements allow it to run custom programs on other events, including kernel- and user-level dynamic tracing (kprobes and uprobes), static tracing (tracepoints), and hardware events. This is finding uses for the generation of new performance analysis tools, network acceleration technologies, and security intrusion detection systems.
This talk will explain the BPF enhancements, then discuss the new performance observability tools that are in use and being created, especially from the BPF compiler collection (bcc) open source project. These tools provide new insights for file system and storage performance, CPU scheduler performance, TCP performance, and much more. This is a major turning point for Linux systems engineering, as custom advanced performance instrumentation can be used safely in production environments, powering a new generation of tools and visualizations.
Because these BPF enhancements are only in very recent Linux (such as Linux 4.9), most companies are not yet running kernels new enough to explore BPF. This will change in the next year or two, as companies including Netflix upgrade their kernels. This talk will give you a head start on this growing technology, and also discuss areas of future work and unsolved problems."
PDF: USENIX_ATC2017_BPF_superpowers.pdf
Keywords (from pdftotext):
slide 1:
2017 USENIX Annual Technical Conference. Performance Analysis Superpowers with Linux eBPF. Brendan Gregg, Senior Performance Architect. Jul 2017
slide 2:
slide 3:
Efficiently trace TCP sessions with PID, bytes, and duration using tcplife
# /usr/share/bcc/tools/tcplife
PID COMM LADDR 2509 java 2509 java 2509 java 2509 java 2509 java 2509 java 12030 upload-mes 127.0.0.1 2509 java 12030 upload-mes 127.0.0.1 3964 mesos-slav 127.0.0.1 12021 upload-sys 127.0.0.1 2509 java 2235 dockerd 2235 dockerd 12068 titus-reap 127.0.0.1 [...] LPORT RADDR 8078 100.82.130.159 8078 100.82.78.215 60778 100.82.207.252 38884 100.82.208.178 4243 127.0.0.1 42166 127.0.0.1 34020 127.0.0.1 8078 127.0.0.1 21196 127.0.0.1 7101 127.0.0.1 34022 127.0.0.1 8078 127.0.0.1 13730 100.82.136.233 34314 100.82.64.53 46476 127.0.0.1 RPORT TX_KB RX_KB MS 0 5.44 0 135.32 13 15126.87 0 15568.25 0 0.61 0 0.67 0 3.38 11 3.41 0 12.61 0 12.64 0 15.28 372 15.31 4 18.50 8 56.73 0 1.25
slide 4:
bcc/BPF tools
slide 5:
Agenda
1. eBPF & bcc
2. bcc/BPF CLI Tools
3. bcc/BPF Visualizations
slide 6:
Take aways
1. Understand Linux tracing components
2. Understand the role and state of enhanced BPF
3. Discover opportunities for future development
slide 7:
slide 8:
Who at Netflix will use BPF?
slide 9:
Introducing enhanced BPF for tracing: kernel-level software, BPF
slide 10:
Ye Olde BPF: Berkeley Packet Filter
# tcpdump host 127.0.0.1 and port 22 -d
(000) ldh [12]
(001) jeq #0x800 jt 2 jf 18
(002) ld [26]
(003) jeq #0x7f000001 jt 6 jf 4
(004) ld [30]
(005) jeq #0x7f000001 jt 6 jf 18
(006) ldb [23]
(007) jeq #0x84 jt 10 jf 8
(008) jeq #0x6 jt 10 jf 9
(009) jeq #0x11 jt 10 jf 18
(010) ldh [20]
(011) jset #0x1fff jt 18 jf 12
(012) ldxb 4*([14]&0xf)
(013) ldh [x + 14]
[...]
Optimizes packet filter performance. 2 x 32-bit registers & scratch memory. User-defined bytecode executed by an in-kernel sandboxed virtual machine. Steven McCanne and Van Jacobson, 1993
slide 11:
Enhanced BPF, aka eBPF or just "BPF": 10 x 64-bit registers, maps (hashes), actions. Alexei Starovoitov, 2014+
slide 12:
BPF for Tracing, Internals
Observability Program: BPF bytecode, BPF program, event config, output, per-event data, statistics. Kernel: load, verifier, static tracing (tracepoints), attach, dynamic tracing, BPF, kprobes, uprobes, async copy, sampling, PMCs, maps, perf_events.
Enhanced BPF is also now used for SDNs, DDOS mitigation, intrusion detection, container security, …
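For illustration of the statistics path above, here is a minimal bcc Python sketch (not one of the talk's tools): a kprobe program increments an in-kernel BPF map, and user space reads only the summary. The traced function (vfs_read) and the map name are arbitrary examples; a recent bcc and root privileges are assumed.

from bcc import BPF
from time import sleep

# Minimal sketch: summarize events in an in-kernel map (the "statistics" path).
b = BPF(text="""
#include <uapi/linux/ptrace.h>

BPF_HASH(counts, u32, u64);                    // map: PID -> call count

int kprobe__vfs_read(struct pt_regs *ctx) {
    u32 pid = bpf_get_current_pid_tgid() >> 32;
    counts.increment(pid);                     // summarized in kernel context
    return 0;
}
""")

sleep(10)
for pid, count in sorted(b["counts"].items(), key=lambda kv: kv[1].value):
    print("PID %-6d vfs_read() calls: %d" % (pid.value, count.value))

Only the aggregated counts cross into user space at the end, which is what makes this style of instrumentation cheap enough for production use.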
slide 13:
Dynamic Tracing
slide 14:
1999: Kerninst http://www.paradyn.org/html/kerninst.html
slide 15:
Event Tracing Efficiency: e.g., tracing TCP retransmits
Old way: packet capture. Kernel (send, receive) -> tcpdump (1. read, 2. dump) -> buffer (file system, disks) -> Analyzer (1. read, 2. process, 3. print)
New way: dynamic tracing. Tracer (1. configure, 2. read) <- tcp_retransmit_skb()
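As a concrete sketch of the "new way" above (not from the slides), the following bcc Python fragment attaches a kprobe to tcp_retransmit_skb() and prints one line per retransmit, so only retransmit events are instrumented rather than every packet. It assumes a recent bcc on a kernel with BPF kprobe support (4.1+).

from bcc import BPF

# Minimal sketch: dynamic tracing of tcp_retransmit_skb() with a kprobe.
b = BPF(text="""
int kprobe__tcp_retransmit_skb(struct pt_regs *ctx) {
    bpf_trace_printk("TCP retransmit\\n");
    return 0;
}
""")
print("Tracing TCP retransmits... Ctrl-C to end.")
b.trace_print()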
slide 16:
Linux Events & BPF Support (version BPF support arrived): Linux 4.1, Linux 4.3, BPF output Linux 4.4, BPF stacks Linux 4.6, Linux 4.7, Linux 4.9, Linux 4.9
slide 17:
A Linux Tracing Timeline
1990's: Static tracers, prototype dynamic tracers
2000: LTT + DProbes (dynamic tracing; not integrated)
2004: kprobes (2.6.9)
2005: DTrace (not Linux), SystemTap (out-of-tree)
2008: ftrace (2.6.27)
2009: perf_events (2.6.31)
2009: tracepoints (2.6.32)
2010-2016: ftrace & perf_events enhancements
2012: uprobes (3.5)
2014-2017: enhanced BPF patches: supporting tracing events
2016-2017: ftrace hist triggers
also: LTTng, ktap, sysdig, ...
slide 18:
Introducing BPF Compiler Collection: user-level front-end BCC
slide 19:
bcc
• BPF Compiler Collection
  – https://github.com/iovisor/bcc
  – Lead developer: Brenden Blanco
• Includes tracing tools
• Provides BPF front-ends: Python, Lua, C++, C helper libraries, golang (gobpf)
Tracing layers: bcc tool -> Python / lua front-ends -> bcc -> user/kernel -> Kernel Events, BPF
slide 20:
Raw BPF: samples/bpf/sock_example.c (87 lines truncated)
slide 21:
C/BPF: samples/bpf/tracex1_kern.c (58 lines truncated)
slide 22:
bcc/BPF (C & Python): bcc examples/tracing/bitehist.py (entire program)
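As a stand-in for the bitehist.py program referenced above, here is a short sketch in its spirit: a power-of-two histogram of block I/O sizes, summarized entirely in kernel context, using bcc's Python front end. The traced kernel function varies by kernel version, so treat the probe name as illustrative.

from bcc import BPF
from time import sleep

# Sketch in the spirit of examples/tracing/bitehist.py: block I/O size histogram.
b = BPF(text="""
#include <uapi/linux/ptrace.h>
#include <linux/blkdev.h>

BPF_HISTOGRAM(dist);

int kprobe__blk_account_io_completion(struct pt_regs *ctx,
                                      struct request *req, unsigned int bytes)
{
    dist.increment(bpf_log2l(bytes / 1024));   // bucket by I/O size in Kbytes
    return 0;
}
""")

print("Tracing block I/O... Hit Ctrl-C to end.")
try:
    sleep(99999999)
except KeyboardInterrupt:
    print()
b["dist"].print_log2_hist("kbytes")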
slide 23:
ply/BPF: https://github.com/iovisor/ply/blob/master/README.md (entire program)
slide 24:
The Tracing Landscape, Jul 2017 (my opinion)
Axes: Ease of use (brutal ... less brutal) vs Scope & Capability; Stage of Development (alpha ... mature)
Plotted: dtrace4L., sysdig, ktap, perf, LTTng, ply/BPF, ftrace (hist triggers), stap, bcc/BPF, C/BPF, Raw BPF; recent changes; (many)
slide 25:
Performance analysis: BCC/BPF CLI TOOLS
slide 26:
Pre-BPF: Linux Perf Analysis in 60s
1. uptime
2. dmesg -T | tail
3. vmstat 1
4. mpstat -P ALL 1
5. pidstat 1
6. iostat -xz 1
7. free -m
8. sar -n DEV 1
9. sar -n TCP,ETCP 1
10. top
http://techblog.netflix.com/2015/11/linux-performance-analysis-in-60s.html
slide 27:
bcc Installation
• https://github.com/iovisor/bcc/blob/master/INSTALL.md
• e.g., Ubuntu Xenial:
# echo "deb [trusted=yes] https://repo.iovisor.org/apt/xenial xenial-nightly main" |\
  sudo tee /etc/apt/sources.list.d/iovisor.list
# sudo apt-get update
# sudo apt-get install bcc-tools
  – Also available as an Ubuntu snap
  – Ubuntu 16.04 is good, 16.10 better: more tools work
• Installs many tools
  – In /usr/share/bcc/tools, and …/tools/old for older kernels
slide 28:
bcc General Performance Checklist
1. execsnoop
2. opensnoop
3. ext4slower (…)
4. biolatency
5. biosnoop
6. cachestat
7. tcpconnect
8. tcpaccept
9. tcpretrans
10. gethostlatency
11. runqlat
12. profile
slide 29:
Discover short-lived process issues using execsnoop
# execsnoop -t
TIME(s) PCOMM dirname run run run run bash svstat perl grep sed cut xargs xargs xargs xargs echo [...] PID PPID RET ARGS 0 /usr/bin/dirname /apps/tomcat/bin/catalina.sh 0 ./run -2 /command/bash -2 /usr/local/bin/bash -2 /usr/local/sbin/bash 0 /bin/bash 0 /command/svstat /service/nflx-httpd 0 /usr/bin/perl -e $l=<>;$l=~/(\d+) sec/;print $1||0; 0 /bin/ps --ppid 1 -o pid,cmd,args 0 /bin/grep org.apache.catalina 0 /bin/sed s/^ *//; 0 /usr/bin/cut -d -f 1 0 /usr/bin/xargs -2 /command/echo -2 /usr/local/bin/echo -2 /usr/local/sbin/echo 0 /bin/echo
Efficient: only traces exec()
slide 30:
(repeat of the previous slide)
slide 31:
Exonerate or confirm storage latency issues and outliers with ext4slower
# /usr/share/bcc/tools/ext4slower 1
Tracing ext4 operations slower than 1 ms
TIME COMM PID T BYTES OFF_KB 17:31:42 postdrop 15523 S 0 17:31:42 cleanup 15524 S 0 17:32:09 titus-log-ship 19735 S 0 17:35:37 dhclient S 0 17:35:39 systemd-journa 504 S 0 17:35:39 systemd-journa 504 S 0 17:35:39 systemd-journa 504 S 0 17:35:45 postdrop 16187 S 0 17:35:45 cleanup 16188 S 0 […] LAT(ms) FILENAME 2.32 5630D406E4 1.89 57BB7406EC 1.94 slurper_checkpoint.db 3.32 dhclient.eth0.leases 26.62 system.journal 1.56 system.journal 1.73 system.journal 2.41 C0369406E4 6.52 C1B90406EC
Tracing at the file system is a more reliable and complete indicator than measuring disk I/O latency
Also: btrfsslower, xfsslower, zfsslower
slide 32:
(repeat of the previous slide)
slide 33:
Identify multimodal disk I/O latency and outliers with biolatency
# biolatency -mT 10
Tracing block device I/O... Hit Ctrl-C to end.
19:19:04
msecs : count
0 -> 1 : 238
2 -> 3 : 424
4 -> 7 : 834
8 -> 15 : 506
16 -> 31 : 986
32 -> 63 : 97
64 -> 127 : 7
128 -> 255 : 27
19:19:14
msecs : count
0 -> 1 : 427
2 -> 3 : 424
[…]
The "count" column is summarized in-kernel
Average latency (iostat/sar) may not be representative with multiple modes or outliers
slide 34:
(repeat of the previous slide)
slide 35:
Efficiently trace TCP sessions with PID, bytes, and duration using tcplife
# /usr/share/bcc/tools/tcplife
PID COMM LADDR 2509 java 2509 java 2509 java 2509 java 2509 java 2509 java 12030 upload-mes 127.0.0.1 2509 java 12030 upload-mes 127.0.0.1 3964 mesos-slav 127.0.0.1 12021 upload-sys 127.0.0.1 2509 java 2235 dockerd 2235 dockerd [...] LPORT RADDR 8078 100.82.130.159 8078 100.82.78.215 60778 100.82.207.252 38884 100.82.208.178 4243 127.0.0.1 42166 127.0.0.1 34020 127.0.0.1 8078 127.0.0.1 21196 127.0.0.1 7101 127.0.0.1 34022 127.0.0.1 8078 127.0.0.1 13730 100.82.136.233 34314 100.82.64.53 RPORT TX_KB RX_KB MS 0 5.44 0 135.32 13 15126.87 0 15568.25 0 0.61 0 0.67 0 3.38 11 3.41 0 12.61 0 12.64 0 15.28 372 15.31 4 18.50 8 56.73
Dynamic tracing of TCP set state only; does not trace send/receive
Also see: tcpconnect, tcpaccept, tcpretrans
slide 36:
(repeat of the previous slide)
slide 37:
Identify DNS latency issues system wide with gethostlatency
# /usr/share/bcc/tools/gethostlatency
TIME PID COMM 18:56:36 5055 mesos-slave 18:56:40 5590 java 18:56:51 5055 mesos-slave 18:56:53 30166 ncat 18:56:56 6661 java 18:56:59 5589 java 18:57:03 5370 java 18:57:03 30259 sudo 18:57:06 5055 mesos-slave 18:57:10 5590 java 18:57:21 5055 mesos-slave 18:57:29 5589 java 18:57:36 5055 mesos-slave 18:57:40 5590 java 18:57:51 5055 mesos-slave […] LATms HOST 0.01 100.82.166.217 3.53 ec2-…-79.compute-1.amazonaws.com 0.01 100.82.166.217 0.21 localhost 2.19 atlas-alert-….prod.netflix.net 1.50 ec2-…-207.compute-1.amazonaws.com 0.04 localhost 0.07 titusagent-mainvpc-m…3465 0.01 100.82.166.217 3.10 ec2-…-79.compute-1.amazonaws.com 0.01 100.82.166.217 52.36 ec2-…-207.compute-1.amazonaws.com 0.01 100.82.166.217 1.83 ec2-…-79.compute-1.amazonaws.com 0.01 100.82.166.217
Instruments using user-level dynamic tracing of getaddrinfo(), gethostbyname(), etc.
slide 38:
(repeat of the previous slide)
slide 39:
Examine CPU scheduler run queue latency as a histogram with runqlat
# /usr/share/bcc/tools/runqlat 10
Tracing run queue latency... Hit Ctrl-C to end.
usecs : count
0 -> 1 : 2810
2 -> 3 : 5248
4 -> 7 : 12369
8 -> 15 : 71312
16 -> 31 : 55705
32 -> 63 : 11775
64 -> 127 : 6230
128 -> 255 : 2758
256 -> 511 : 549
512 -> 1023 : 46
1024 -> 2047 : 11
2048 -> 4095 : 4
4096 -> 8191 : 5
[…]
As efficient as possible: scheduler calls can become frequent
slide 40:
(repeat of the previous slide)
slide 41:
Advanced Analysis
• Find/draw a functional diagram (e.g., storage I/O subsystem)
• Apply performance methods: http://www.brendangregg.com/methodology.html
  1. Workload Characterization
  2. Latency Analysis
  3. USE Method
• Start with the Q's, then find the A's
• Use multi-tools:
  – funccount, trace, argdist, stackcount
slide 42:
Construct programmatic one-liners with trace, e.g. reads over 20000 bytes:
# trace 'sys_read (arg3 > 20000) "read %d bytes", arg3'
TIME PID COMM FUNC
05:18:23 4490 sys_read read 1048576 bytes
05:18:23 4490 sys_read read 1048576 bytes
05:18:23 4490 sys_read read 1048576 bytes
# trace -h
[...]
trace -K blk_account_io_start
        Trace this kernel function, and print info with a kernel stack trace
trace 'do_sys_open "%s", arg2'
        Trace the open syscall and print the filename being opened
trace 'sys_read (arg3 > 20000) "read %d bytes", arg3'
        Trace the read syscall and print a message for reads > 20000 bytes
trace r::do_sys_return
        Trace the return from the open syscall
trace 'c:open (arg2 == 42) "%s %d", arg1, arg2'
        Trace the open() call from libc only if the flags (arg2) argument is 42
[...]
argdist by Sasha Goldshtein
slide 43:
Create in-kernel summaries with argdist, e.g. histogram of tcp_cleanup_rbuf() copied:
# argdist -H 'p::tcp_cleanup_rbuf(struct sock *sk, int copied):int:copied'
[15:34:45]
copied : count distribution
0 -> 1 : 15088 |**********************************
2 -> 3 : 0
4 -> 7 : 0
8 -> 15 : 0
16 -> 31 : 0
32 -> 63 : 0
64 -> 127 : 4786 |***********
128 -> 255 : 1
256 -> 511 : 1
512 -> 1023 : 4
1024 -> 2047 : 11
2048 -> 4095 : 5
4096 -> 8191 : 27
8192 -> 16383 : 105
16384 -> 32767 : 0
argdist by Sasha Goldshtein
slide 44:
Coming to a GUI near you: BCC/BPF VISUALIZATIONS
slide 45:
BPF metrics and analysis can be automated in GUIs
E.g., Netflix Vector (self-service UI): Flame Graphs, Heat Maps, Tracing Reports
Should be open sourced; you may also build/buy your own
slide 46:
Latency heatmaps show histograms over time
slide 47:
Efficient CPU and off-CPU flame graphs by counting stacks in kernel context
CPU / Off-CPU
slide 48:
Advanced off-CPU analysis: BPF can merge the blocking stack with the waker stack in-kernel
Waker task, Waker stack, Stack Direction, Wokeup, Blocked stack, Blocked task
slide 49:
BPF FUTURE WORK
slide 50:
BCC Improvements
Challenges:
• Initialize all variables
• Extra bpf_probe_read()s
• BPF_PERF_OUTPUT()
• Verifier errors
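For context on the BPF_PERF_OUTPUT() item above, here is a hedged sketch of that pattern as commonly used in bcc tools (this exact program is not from the talk): a struct of per-event data is submitted to a perf buffer and printed by a Python callback. The traced syscall (sync) and the struct/field names are illustrative; a recent bcc is assumed.

from bcc import BPF

# Sketch of the BPF_PERF_OUTPUT() pattern: per-event records to user space.
b = BPF(text="""
#include <uapi/linux/ptrace.h>

struct data_t {
    u32 pid;
    char comm[16];
};
BPF_PERF_OUTPUT(events);

int trace_sync(struct pt_regs *ctx) {
    struct data_t data = {};                      // initialize all members
    data.pid = bpf_get_current_pid_tgid() >> 32;
    bpf_get_current_comm(&data.comm, sizeof(data.comm));
    events.perf_submit(ctx, &data, sizeof(data)); // per-event data to user space
    return 0;
}
""")
b.attach_kprobe(event=b.get_syscall_fnname("sync"), fn_name="trace_sync")

def print_event(cpu, data, size):
    event = b["events"].event(data)
    print("sync() called by PID %d (%s)" % (event.pid, event.comm.decode()))

b["events"].open_perf_buffer(print_event)
while True:
    try:
        b.perf_buffer_poll()
    except KeyboardInterrupt:
        exit()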
slide 51:
Higher-level Language
• bcc's Python/C interface is ok, but verbose
• Alternate higher-level language front end?
  – New front-ends can use existing libbcc, and can be added as part of bcc itself
  – We have a problem in search of a new language (instead of the other way around)
slide 52:
ply
• A new BPF-based language and tracer for Linux
  – Created by Tobias Waldekranz
  – https://github.com/iovisor/ply and https://wkz.github.io/ply/
• High-level language
  – Simple one-liners
  – Short scripts
• In development (?)
  – kprobes and tracepoints only, uprobes/perf_events not yet
  – Successful so far as a proof of concept
  – Not production tested yet (bcc is)
slide 53:
File opens can be traced using a short ply one-liner
# ply -c 'kprobe:do_sys_open { printf("opened: %s\n", mem(arg(1), "128s")); }'
1 probe active
opened: /sys/kernel/debug/tracing/events/enable
opened: /etc/ld.so.cache
opened: /lib/x86_64-linux-gnu/libselinux.so.1
opened: /lib/x86_64-linux-gnu/libc.so.6
opened: /lib/x86_64-linux-gnu/libpcre.so.3
opened: /lib/x86_64-linux-gnu/libdl.so.2
opened: /lib/x86_64-linux-gnu/libpthread.so.0
opened: /proc/filesystems
opened: /usr/lib/locale/locale-archive
opened: .
[...]
slide 54:
ply programs are concise, such as measuring read latency
# ply -A -c 'kprobe:SyS_read { @start[tid()] = nsecs(); }
kretprobe:SyS_read /@start[tid()]/ { @ns.quantize(nsecs() - @start[tid()]);
@start[tid()] = nil; }'
2 probes active
^Cde-activating probes
[...]
@ns:
[ 512, 1k)    3 |########
[ 1k, 2k)     7 |###################
[ 2k, 4k)    12 |################################|
[ 4k, 8k)     3 |########
[ 8k, 16k)    2 |#####
[ 16k, 32k)   0 |
[ 32k, 64k)   0 |
[ 64k, 128k)  3 |########
[128k, 256k)  1 |###
[256k, 512k)  1 |###
[512k, 1M)    2 |#####
[...]
slide 55:
New Tooling/Metrics
slide 56:
New Visualizations
slide 57:
Case Studies: Use it, Solve something, Write about it, Talk about it
slide 58:
Take aways
1. Understand Linux tracing components
2. Understand the role and state of enhanced BPF
3. Discover opportunities for future development
Please contribute:
- https://github.com/iovisor/bcc
- https://github.com/iovisor/ply
BPF Tracing in Linux
• 3.19: sockets
• 3.19: maps
• 4.1: kprobes
• 4.3: uprobes
• 4.4: BPF output
• 4.6: stacks
• 4.7: tracepoints
• 4.9: profiling
• 4.9: PMCs
slide 59:
Links & References
iovisor bcc:
- https://github.com/iovisor/bcc
- https://github.com/iovisor/bcc/tree/master/docs
- http://www.brendangregg.com/blog/ (search for "bcc")
- http://www.brendangregg.com/ebpf.html#bcc
- http://blogs.microsoft.co.il/sasha/2016/02/14/two-new-ebpf-tools-memleak-and-argdist/
- On designing tracing tools: https://www.youtube.com/watch?v=uibLwoVKjec
bcc tutorial:
- https://github.com/iovisor/bcc/blob/master/INSTALL.md
- …/docs/tutorial.md
- …/docs/tutorial_bcc_python_developer.md
- …/docs/reference_guide.md
- .../CONTRIBUTING-SCRIPTS.md
ply: https://github.com/iovisor/ply
BPF:
- https://www.kernel.org/doc/Documentation/networking/filter.txt
- https://github.com/iovisor/bpf-docs
- https://suchakra.wordpress.com/tag/bpf/
Flame Graphs:
- http://www.brendangregg.com/flamegraphs.html
- http://www.brendangregg.com/blog/2016-01-20/ebpf-offcpu-flame-graph.html
- http://www.brendangregg.com/blog/2016-02-01/linux-wakeup-offwake-profiling.html
Netflix Tech Blog on Vector:
- http://techblog.netflix.com/2015/04/introducing-vector-netflixs-on-host.html
Linux Performance: http://www.brendangregg.com/linuxperf.html
slide 60:
2017 USENIX Annual Technical Conference
Thank You – Questions?
– iovisor bcc: https://github.com/iovisor/bcc
– http://www.brendangregg.com
– http://slideshare.net/brendangregg
– bgregg@netflix.com
– @brendangregg
Thanks to Alexei Starovoitov (Facebook), Brenden Blanco (PLUMgrid/VMware), Sasha Goldshtein (Sela), Daniel Borkmann (Cisco), Wang Nan (Huawei), and other BPF and bcc contributors!