OSSNA 2017: Performance Analysis Superpowers with Linux BPF
Talk by Brendan Gregg for OSSNA 2017. Description: "Advanced performance observability and debugging have arrived built into the Linux 4.x series, thanks to enhancements to Berkeley Packet Filter (BPF, or eBPF) and the repurposing of its sandboxed virtual machine to provide programmatic capabilities to system tracing. Netflix has been investigating its use for new observability tools, monitoring, security uses, and more. This talk will be a deep dive into these new tracing, observability, and debugging capabilities, which sooner or later will be available to everyone who uses Linux. Whether you’re doing analysis over an ssh session, or via a monitoring GUI, BPF can be used to provide an efficient, custom, and deep level of detail into system and application performance.
This talk will also demonstrate the new open source tools that have been developed, which make use of kernel- and user-level dynamic tracing (kprobes and uprobes), and kernel- and user-level static tracing (tracepoints). These tools provide new insights for file system and storage performance, CPU scheduler performance, TCP performance, and a whole lot more. This is a major turning point for Linux systems engineering, as custom advanced performance instrumentation can be used safely in production environments, powering a new generation of tools and visualizations."
PDF: OSS2017_BPF_superpowers.pdf
Keywords (from pdftotext):
slide 1:
Performance Analysis Superpowers with Linux BPF Brendan Gregg Sep 2017slide 2:
slide 3:
bcc/BPF toolsslide 4:
DEMOslide 5:
Agenda 1. eBPF & bcc 2. bcc/BPF CLI Tools 3. bcc/BPF Visualizationsslide 6:
Takeaways 1. Understand Linux tracing and enhanced BPF 2. How to use BPF tools 3. Areas of future developmentslide 7:
slide 8:
Who at Netflix will use BPF?slide 9:
BPF Introducing enhanced BPF for tracing: kernel-level softwareslide 10:
Ye Olde BPF Berkeley Packet Filter # tcpdump host 127.0.0.1 and port 22 -d Optimizes packet filter (000) ldh [12] performance (001) jeq #0x800 jt 2 jf 18 (002) ld [26] (003) jeq #0x7f000001 jt 6 jf 4 (004) ld [30] 2 x 32-bit registers (005) jeq #0x7f000001 jt 6 jf 18 & scratch memory (006) ldb [23] (007) jeq #0x84 jt 10 jf 8 (008) jeq #0x6 jt 10 jf 9 (009) jeq #0x11 jt 10 jf 18 User-defined bytecode (010) ldh [20] executed by an in-kernel (011) jset #0x1fff jt 18 jf 12 sandboxed virtual machine (012) ldxb 4*([14]&0xf) (013) ldh [x + 14] Steven McCanne and Van Jacobson, 1993 [...]slide 11:
Enhanced BPF aka eBPF or just "BPF" 10 x 64-bit registers maps (hashes) actions Alexei Starovoitov, 2014+slide 12:
BPF for Tracing, Internals Observability Program BPF bytecode BPF program event config output per-event data statistics Kernel load verifier static tracing tracepoints attach dynamic tracing BPF kprobes uprobes async copy sampling, PMCs maps perf_events Enhanced BPF is also now used for SDNs, DDOS mitigation, intrusion detection, container security, …slide 13:
Dynamic Tracingslide 14:
1999: Kerninst http://www.paradyn.org/html/kerninst.htmlslide 15:
Event Tracing Efficiency E.g., tracing TCP retransmits Kernel Old way: packet capture tcpdump Analyzer 1. read 2. dump buffer 1. read 2. process 3. print file system send receive disks New way: dynamic tracing Tracer 1. configure 2. read tcp_retransmit_skb()slide 16:
Linux Events & BPF Support BPF output Linux 4.4 Linux 4.7 BPF stacks Linux 4.6 Linux 4.3 Linux 4.1 (version BPF support arrived) Linux 4.9 Linux 4.9slide 17:
A Linux Tracing Timeline 1990’s: Static tracers, prototype dynamic tracers 2000: LTT + DProbes (dynamic tracing; not integrated) 2004: kprobes (2.6.9) 2005: DTrace (not Linux), SystemTap (out-of-tree) 2008: ftrace (2.6.27) 2009: perf_events (2.6.31) 2009: tracepoints (2.6.32) 2010-2017: ftrace & perf_events enhancements 2012: uprobes (3.5) 2014-2017: enhanced BPF patches: supporting tracing events 2016-2017: ftrace hist triggers also: LTTng, ktap, sysdig, ...slide 18:
BCC Introducing BPF Compiler Collection: user-level front-endslide 19:
bcc • BPF Compiler Collection Tracing layers: – https://github.com/iovisor/bcc – Lead developer: Brenden Blanco bcc tool • Includes tracing tools • Provides BPF front-ends: Python Lua C++ C helper libraries golang (gobpf) bcc tool bcc Python user kernel lua front-ends Kernel Events BPFslide 20:
Raw BPF samples/bpf/sock_example.c 87 lines truncatedslide 21:
C/BPF samples/bpf/tracex1_kern.c 58 lines truncatedslide 22:
bcc/BPF (C & Python) bcc examples/tracing/bitehist.py entire programslide 23:
bpftrace https://github.com/ajor/bpftrace entire programslide 24:
The Tracing Landscape, Sep 2017 Ease of use (less brutal) (my opinion) dtrace4L. sysdig (many) perf LTTng recent changes (alpha) (brutal) ktap (hist triggers) ftrace (mature) bpftrace ply/BPF stap bcc/BPF C/BPF Stage of Development Raw BPF Scope & Capabilityslide 25:
BCC/BPF CLI Tools Performance Analysisslide 26:
Pre-BPF: Linux Perf Analysis in 60s 1. uptime 2. dmesg -T | tail 3. vmstat 1 4. mpstat -P ALL 1 5. pidstat 1 6. iostat -xz 1 7. free -m 8. sar -n DEV 1 9. sar -n TCP,ETCP 1 10. top http://techblog.netflix.com/2015/11/linux-performance-analysis-in-60s.htmlslide 27:
bcc Installation • https://github.com/iovisor/bcc/blob/master/INSTALL.md • e.g., Ubuntu Xenial: # echo "deb [trusted=yes] https://repo.iovisor.org/apt/xenial xenial-nightly main" |\ sudo tee /etc/apt/sources.list.d/iovisor.list # sudo apt-get update # sudo apt-get install bcc-tools – Also available as an Ubuntu snap – Ubuntu 16.04 is good, 16.10 better: more tools work • Installs many tools – In /usr/share/bcc/tools, and …/tools/old for older kernelsslide 28:
bcc General Performance Checklist 1. execsnoop 2. opensnoop 3. ext4slower (…) 4. biolatency 5. biosnoop 6. cachestat 7. tcpconnect 8. tcpaccept 9. tcpretrans 10. gethostlatency 11. runqlat 12. profileslide 29:
Discover short-lived process issues using execsnoop # execsnoop -t TIME(s) PCOMM dirname run run run run bash svstat perl grep sed cut xargs xargs xargs xargs echo [...] PID PPID RET ARGS 0 /usr/bin/dirname /apps/tomcat/bin/catalina.sh 0 ./run -2 /command/bash -2 /usr/local/bin/bash -2 /usr/local/sbin/bash 0 /bin/bash 0 /command/svstat /service/nflx-httpd 0 /usr/bin/perl -e $l=<>;$l=~/(\d+) sec/;print $1||0; 0 /bin/ps --ppid 1 -o pid,cmd,args 0 /bin/grep org.apache.catalina 0 /bin/sed s/^ *//; 0 /usr/bin/cut -d -f 1 0 /usr/bin/xargs -2 /command/echo -2 /usr/local/bin/echo -2 /usr/local/sbin/echo 0 /bin/echo Efficient: only traces exec()
Exonerate or confirm storage latency outliers with ext4slower # /usr/share/bcc/tools/ext4slower 1 Tracing ext4 operations slower than 1 ms TIME COMM PID T BYTES OFF_KB 17:31:42 postdrop 15523 S 0 17:31:42 cleanup 15524 S 0 17:32:09 titus-log-ship 19735 S 0 17:35:37 dhclient S 0 17:35:39 systemd-journa 504 S 0 17:35:39 systemd-journa 504 S 0 17:35:39 systemd-journa 504 S 0 17:35:45 postdrop 16187 S 0 17:35:45 cleanup 16188 S 0 […] LAT(ms) FILENAME 2.32 5630D406E4 1.89 57BB7406EC 1.94 slurper_checkpoint.db 3.32 dhclient.eth0.leases 26.62 system.journal 1.56 system.journal 1.73 system.journal 2.41 C0369406E4 6.52 C1B90406EC Tracing at the file system is a more reliable and complete indicator than measuring disk I/O latency Also: btrfsslower, xfsslower, zfsslowerslide 32:
Identify multimodal disk I/O latency and outliers with biolatency # biolatency -mT 10 Tracing block device I/O... Hit Ctrl-C to end. 19:19:04 msecs 0 -> 1 2 -> 3 4 -> 7 8 -> 15 16 -> 31 32 -> 63 64 -> 127 128 -> 255 19:19:14 msecs 0 -> 1 2 -> 3 […] The "count" column is summarized in-kernel : count : 238 : 424 : 834 : 506 : 986 : 97 : 7 : 27 distribution |********* |***************** |********************************* |******************** |****************************************| |*** : count : 427 : 424 distribution |******************* |****************** Average latency (iostat/sar) may not be representative with multiple modes or outliersslide 34:
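The note that the "count" column is summarized in-kernel is the key efficiency win: the BPF program keeps only power-of-two bucket counters in a map, and user space reads the finished histogram rather than every I/O event. A rough user-space Python sketch of that bucketing (illustrative only; the real aggregation runs inside the kernel, and bucket edges here may differ slightly from bcc's exact output):

```python
def log2_bucket(value):
    """Power-of-2 bucket index for a value, roughly what BPF's bpf_log2l()
    computes for histogram maps (0 -> 0, 1 -> 1, 2-3 -> 2, 4-7 -> 3, ...)."""
    return value.bit_length()

def hist(values):
    """Aggregate values into log2 buckets, like biolatency's in-kernel map:
    one counter per bucket, no per-event records."""
    buckets = {}
    for v in values:
        i = log2_bucket(v)
        buckets[i] = buckets.get(i, 0) + 1
    return buckets

def render(buckets, width=40):
    """Print an ASCII histogram in the bcc style."""
    peak = max(buckets.values())
    for i in sorted(buckets):
        lo = 0 if i == 0 else 1 << (i - 1)
        hi = max(lo, (1 << i) - 1)
        bar = "*" * (buckets[i] * width // peak)
        print(f"{lo:>8} -> {hi:<8} : {buckets[i]:<6} |{bar}")
```

render(hist(latencies_ms)) would print rows shaped like the biolatency output above; in the real tool only the small buckets table crosses the kernel/user boundary.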
Efficiently trace TCP sessions with PID and bytes using tcplife # /usr/share/bcc/tools/tcplife PID COMM LADDR 2509 java 2509 java 2509 java 2509 java 2509 java 2509 java 12030 upload-mes 127.0.0.1 2509 java 12030 upload-mes 127.0.0.1 3964 mesos-slav 127.0.0.1 12021 upload-sys 127.0.0.1 2509 java 2235 dockerd 2235 dockerd [...] LPORT RADDR 8078 100.82.130.159 8078 100.82.78.215 60778 100.82.207.252 38884 100.82.208.178 4243 127.0.0.1 42166 127.0.0.1 34020 127.0.0.1 8078 127.0.0.1 21196 127.0.0.1 7101 127.0.0.1 34022 127.0.0.1 8078 127.0.0.1 13730 100.82.136.233 34314 100.82.64.53 RPORT TX_KB RX_KB MS 0 5.44 0 135.32 13 15126.87 0 15568.25 0 0.61 0 0.67 0 3.38 11 3.41 0 12.61 0 12.64 0 15.28 372 15.31 4 18.50 8 56.73 Dynamic tracing of TCP set state only; does not trace send/receive Also see: tcpconnect, tcpaccept, tcpretransslide 36:
Identify DNS latency issues system wide with gethostlatency # /usr/share/bcc/tools/gethostlatency TIME PID COMM 18:56:36 5055 mesos-slave 18:56:40 5590 java 18:56:51 5055 mesos-slave 18:56:53 30166 ncat 18:56:56 6661 java 18:56:59 5589 java 18:57:03 5370 java 18:57:03 30259 sudo 18:57:06 5055 mesos-slave 18:57:10 5590 java 18:57:21 5055 mesos-slave 18:57:29 5589 java 18:57:36 5055 mesos-slave 18:57:40 5590 java 18:57:51 5055 mesos-slave […] LATms HOST 0.01 100.82.166.217 3.53 ec2-…-79.compute-1.amazonaws.com 0.01 100.82.166.217 0.21 localhost 2.19 atlas-alert-….prod.netflix.net 1.50 ec2-…-207.compute-1.amazonaws.com 0.04 localhost 0.07 titusagent-mainvpc-m…3465 0.01 100.82.166.217 3.10 ec2-…-79.compute-1.amazonaws.com 0.01 100.82.166.217 52.36 ec2-…-207.compute-1.amazonaws.com 0.01 100.82.166.217 1.83 ec2-…-79.compute-1.amazonaws.com 0.01 100.82.166.217 Instruments using user-level dynamic tracing of getaddrinfo(), gethostbyname(), etc.slide 38:
Examine CPU scheduler latency as a histogram with runqlat # /usr/share/bcc/tools/runqlat 10 Tracing run queue latency... Hit Ctrl-C to end. usecs 0 -> 1 2 -> 3 4 -> 7 8 -> 15 16 -> 31 32 -> 63 64 -> 127 128 -> 255 256 -> 511 512 -> 1023 1024 -> 2047 2048 -> 4095 4096 -> 8191 : count : 2810 : 5248 : 12369 : 71312 : 55705 : 11775 : 6230 : 2758 : 549 : 46 : 11 : 4 : 5 distribution |** |****** |****************************************| |******************************* |****** |*** […] As efficient as possible: scheduler calls can become frequentslide 40:
Construct programmatic one-liners with trace e.g. reads over 20000 bytes: # trace 'sys_read (arg3 > 20000) "read %d bytes", arg3' TIME PID COMM FUNC 05:18:23 4490 sys_read read 1048576 bytes 05:18:23 4490 sys_read read 1048576 bytes 05:18:23 4490 sys_read read 1048576 bytes # trace -h [...] trace -K blk_account_io_start Trace this kernel function, and print info with a kernel stack trace trace 'do_sys_open "%s", arg2' Trace the open syscall and print the filename being opened trace 'sys_read (arg3 > 20000) "read %d bytes", arg3' Trace the read syscall and print a message for reads > 20000 bytes trace r::do_sys_return Trace the return from the open syscall trace 'c:open (arg2 == 42) "%s %d", arg1, arg2' Trace the open() call from libc only if the flags (arg2) argument is 42 [...] argdist by Sasha Goldshteinslide 42:
Create in-kernel summaries with argdist e.g. histogram of tcp_cleanup_rbuf() copied: # argdist -H 'p::tcp_cleanup_rbuf(struct sock *sk, int copied):int:copied' [15:34:45] copied : count distribution 0 -> 1 : 15088 |********************************** 2 -> 3 : 0 4 -> 7 : 0 8 -> 15 : 0 16 -> 31 : 0 32 -> 63 : 0 64 -> 127 : 4786 |*********** 128 -> 255 : 1 256 -> 511 : 1 512 -> 1023 : 4 1024 -> 2047 : 11 2048 -> 4095 : 5 4096 -> 8191 : 27 8192 -> 16383 : 105 16384 -> 32767 : 0 argdist by Sasha Goldshteinslide 43:
BCC/BPF Visualizations Coming to a GUI near youslide 44:
BPF metrics and analysis can be automated in GUIs E.g., Netflix Vector (self-service UI): Flame Graphs Heat Maps Tracing Reports Should be open sourced; you may also build/buy your ownslide 45:
Latency heatmaps show histograms over timeslide 46:
Optimize CPU flame graphs with BPF: count stacks in-kernelslide 47:
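"Count stacks in-kernel" means BPF maintains a map keyed by stack trace, so user space reads a small table of (stack, count) pairs instead of every sample. The folding step that turns those counts into flame-graph input can be sketched in plain Python (the sample stacks below are made up for illustration):

```python
from collections import Counter

def fold(samples):
    """Collapse raw stack samples into flame-graph 'folded' lines:
    frames joined by ';' (root frame first), then the sample count.
    In the real BPF tools this counting happens in-kernel in a map."""
    counts = Counter(";".join(stack) for stack in samples)
    return [f"{stack} {n}" for stack, n in sorted(counts.items())]

# made-up samples: each tuple is one sampled stack, root first
samples = [
    ("java", "read", "vfs_read"),
    ("java", "read", "vfs_read"),
    ("java", "write", "vfs_write"),
]
for line in fold(samples):
    print(line)
```

Each output line is one input record for flamegraph.pl; the in-kernel optimization is that duplicate stacks are counted before they ever reach user space.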
Generic thread state diagram What about Off-CPU?slide 48:
Efficient Off-CPU flame graphs via scheduler tracing and BPF CPU Off-CPU Solve everything?slide 49:
Off-CPU Time (zoomed): gzip(1) Off-CPU doesn't always make sense: what is gzip blocked on?slide 50:
Wakeup time flame graphs show waker thread stacksslide 51:
Wakeup Time (zoomed): gzip(1) gzip(1) is blocked on tar(1)! tar cf - * | gzip > out.tar.gz Can't we associate off-CPU with wakeup stacks?slide 52:
Off-wake flame graphs: BPF can merge blocking plus waker stacks in-kernel Waker task Waker stack Stack Direction Wokeup Blocked stack Blocked taskslide 53:
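The merge shown on this slide can be sketched in user space: keep the blocked thread's stack, reverse the waker's stack so the two meet at the wakeup point, and join everything into one folded frame list. A hypothetical sketch (frame names invented; the real offwaketime tool works from stack IDs in BPF maps, and its exact frame ordering may differ):

```python
def offwake_fold(blocked_task, blocked_stack, waker_task, waker_stack):
    """Merge a blocked thread's stack with its waker's stack into one
    folded line. The waker frames are reversed so the two stacks meet
    at the wakeup point; '--' marks the divider between them."""
    frames = ([blocked_task] + list(blocked_stack)
              + ["--"]
              + list(reversed(list(waker_stack))) + [waker_task])
    return ";".join(frames)

# made-up frames for the tar | gzip example on the previous slides
print(offwake_fold("gzip", ["main", "read"],
                   "tar", ["main", "write", "pipe_wakeup"]))
```

Feeding such merged lines (with off-CPU time as the count) to flamegraph.pl gives the off-wake visualization: who was blocked, and who woke them, in a single graph.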
Another exampleslide 54:
Chain graphs: merge all wakeup stacksslide 55:
Future Work BPFslide 56:
BCC Improvements • Challenges Initialize all variables BPF_PERF_OUTPUT() Verifier errors Still explicit bpf_probe_read()s. It's getting better (thanks): • High-Level Languages – One-liners and scripts – Can use libbcc tcpconnlat.pyslide 57:
ply • A new BPF-based language and tracer for Linux – Created by Tobias Waldekranz – https://github.com/iovisor/ply https://wkz.github.io/ply/ – Promising, was in development # ply -c 'kprobe:do_sys_open { printf("opened: %s\n", mem(arg(1), "128s")); }' 1 probe active opened: /sys/kernel/debug/tracing/events/enable opened: /etc/ld.so.cache opened: /lib/x86_64-linux-gnu/libselinux.so.1 opened: /lib/x86_64-linux-gnu/libc.so.6 opened: /proc/filesystems opened: /usr/lib/locale/locale-archive opened: . [...]slide 58:
ply programs are concise, such as measuring read latency # ply -A -c 'kprobe:SyS_read { @start[tid()] = nsecs(); } kretprobe:SyS_read /@start[tid()]/ { @ns.quantize(nsecs() - @start[tid()]); @start[tid()] = nil; }' 2 probes active ^Cde-activating probes [...] @ns: [ 512, 1k) [ 1k, 2k) [ 2k, 4k) [ 4k, 8k) [ 8k, 16k) [ 16k, 32k) [ 32k, 64k) [ 64k, 128k) [128k, 256k) [256k, 512k) [512k, 1M) [...] 3 |######## 7 |################### 12 |################################| 3 |######## 2 |##### 0 | 0 | 3 |######## 1 |### 1 |### 2 |#####slide 59:
bpftrace • Another new BPF-based language and tracer for Linux – Created by Alastair Robertson – https://github.com/ajor/bpftrace – In active development # bpftrace -e 'kprobe:sys_open { printf("opened: %s\n", str(arg0)); }' Attaching 1 probe... opened: /sys/devices/system/cpu/online opened: /proc/1956/stat opened: /proc/1241/stat opened: /proc/net/dev opened: /proc/net/if_inet6 opened: /sys/class/net/eth0/device/vendor opened: /proc/sys/net/ipv4/neigh/eth0/retrans_time_ms [...]slide 60:
bpftrace programs are concise, such as measuring read latency # bpftrace -e 'kprobe:SyS_read { @start[tid] = nsecs; } kretprobe:SyS_read /@start[tid]/ { @ns = quantize(nsecs - @start[tid]); @start[tid] = delete(); }' Attaching 2 probes... @ns: [0, 1] [2, 4) [4, 8) [8, 16) [16, 32) [32, 64) [64, 128) [128, 256) [256, 512) [512, 1k) [1k, 2k) [2k, 4k) [4k, 8k) [8k, 16k) [16k, 32k) [32k, 64k) 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 6 |@@@@@ 20 |@@@@@@@@@@@@@@@@@@@ 4 |@@@ 14 |@@@@@@@@@@@@@ 53 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@| 2 |@slide 61:
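The @start[tid] idiom on these ply and bpftrace slides — stash a timestamp at function entry, compute the delta at return, delete the map entry — is the standard BPF latency-measurement pattern. The same bookkeeping in plain Python over a made-up event stream:

```python
def read_latencies(events):
    """Compute per-call latencies from a (tid, kind, ns) event stream,
    mirroring @start[tid] in the ply/bpftrace programs above: entry
    stores a timestamp, return computes the delta and deletes it."""
    start = {}      # the @start map: tid -> entry timestamp (ns)
    deltas = []
    for tid, kind, ns in events:
        if kind == "entry":
            start[tid] = ns
        elif tid in start:                      # the /@start[tid]/ filter
            deltas.append(ns - start.pop(tid))  # pop == @start[tid] = delete()
    return deltas
```

A return with no matching entry (e.g. a read already in flight when tracing started) is skipped, which is exactly what the /@start[tid]/ predicate guards against in the real one-liners.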
New Tooling/Metricsslide 62:
New Visualizationsslide 63:
Case Studies Use it Solve something Write about it Talk about it • Recent posts: – https://blogs.dropbox.com/tech/2017/09/optimizing-web-servers-for-high-throughput-and-low-latency/ – https://josefbacik.github.io/kernel/scheduler/bcc/bpf/2017/08/03/sched-time.htmlslide 64:
Advanced Analysis • Find/draw a functional diagram • Apply performance methods http://www.brendangregg.com/methodology.html Workload Characterization USE Method Latency Analysis Start with the Q's, then find the A's • Use multi-tools: – funccount, trace, argdist, stackcount e.g., storage I/O subsystemslide 65:
Takeaways 1. Understand Linux tracing and enhanced BPF 2. How to use eBPF tools Upgrade to Linux 4.9+! 3. Areas of future development Please contribute: - https://github.com/iovisor/bcc - https://github.com/iovisor/ply BPF Tracing in Linux • 3.19: sockets • 3.19: maps • 4.1: kprobes • 4.3: uprobes • 4.4: BPF output • 4.6: stacks • 4.7: tracepoints • 4.9: profiling • 4.9: PMCsslide 66:
Links & References iovisor bcc: - https://github.com/iovisor/bcc https://github.com/iovisor/bcc/tree/master/docs - http://www.brendangregg.com/blog/ (search for "bcc") - http://www.brendangregg.com/ebpf.html#bcc - http://blogs.microsoft.co.il/sasha/2016/02/14/two-new-ebpf-tools-memleak-and-argdist/ - On designing tracing tools: https://www.youtube.com/watch?v=uibLwoVKjec bcc tutorial: - https://github.com/iovisor/bcc/blob/master/INSTALL.md - …/docs/tutorial.md …/docs/tutorial_bcc_python_developer.md …/docs/reference_guide.md - .../CONTRIBUTING-SCRIPTS.md ply: https://github.com/iovisor/ply bpftrace: https://github.com/ajor/bpftrace BPF: - https://www.kernel.org/doc/Documentation/networking/filter.txt - https://github.com/iovisor/bpf-docs - https://suchakra.wordpress.com/tag/bpf/ Flame Graphs: - http://www.brendangregg.com/flamegraphs.html - http://www.brendangregg.com/blog/2016-01-20/ebpf-offcpu-flame-graph.html - http://www.brendangregg.com/blog/2016-02-01/linux-wakeup-offwake-profiling.html Netflix Tech Blog on Vector: - http://techblog.netflix.com/2015/04/introducing-vector-netflixs-on-host.html Linux Performance: http://www.brendangregg.com/linuxperf.htmlslide 67:
BPF @ Open Source Summit • Making the Kernel's Networking Data Path Programmable with BPF and XDP – Daniel Borkmann, Tuesday, 11:55am @ Georgia I/II • Performance Analysis Superpowers with Linux BPF – Brendan Gregg, this talk • Cilium - Container Security and Networking using BPF and XDP – Thomas Graf, Wednesday, 2:50pm @ Diamond Ballroom 6slide 68:
Thank You Questions? iovisor bcc: https://github.com/iovisor/bcc http://www.brendangregg.com http://slideshare.net/brendangregg bgregg@netflix.com @brendangregg Thanks to Alexei Starovoitov (Facebook), Brenden Blanco (PLUMgrid/VMware), Sasha Goldshtein (Sela), Teng Qin (Facebook), Yonghong Song (Facebook), Daniel Borkmann (Cisco/Covalent), Wang Nan (Huawei), Vicent Martí (GitHub), Paul Chaignon (Orange), and other BPF and bcc contributors!