Kernel Recipes 2017: Performance Analysis with BPF
Video: https://www.youtube.com/watch?v=nhxq6jLGc_w
Talk by Brendan Gregg at Kernel Recipes 2017 (Paris)
Description: "The in-kernel Berkeley Packet Filter (BPF) has been enhanced in recent kernels to do much more than just filtering packets. It can now run user-defined programs on events, such as on tracepoints, kprobes, uprobes, and perf_events, allowing advanced performance analysis tools to be created. These can be used in production as the BPF virtual machine is sandboxed and will reject unsafe code, and are already in use at Netflix.
Beginning with the bpf() syscall in 3.18, enhancements have been added in many kernel versions since, with major features for BPF analysis landing in Linux 4.1, 4.4, 4.7, and 4.9. Specific capabilities these provide include custom in-kernel summaries of metrics, custom latency measurements, and frequency counting kernel and user stack traces on events. One interesting case involves saving stack traces on wake up events, and associating them with the blocked stack trace: so that we can see the blocking stack trace and the waker together, merged in kernel by a BPF program (that particular example is in the kernel as samples/bpf/offwaketime).
This talk will discuss the new BPF capabilities for performance analysis and debugging, and demonstrate the new open source tools that have been developed to use it, many of which are in the Linux Foundation iovisor bcc (BPF Compiler Collection) project. These include tools to analyze the CPU scheduler, TCP performance, file system performance, block I/O, and more."
PDF: KernelRecipes_BPF_Perf_Analysis.pdf
Keywords (from pdftotext):
slide 1:
Performance Analysis Superpowers with Linux BPF. Brendan Gregg, Senior Performance Architect, Sep 2017
slide 2:
DEMO
slide 3:
slide 4:
bcc/BPF tools
slide 5:
Agenda: 1. eBPF & bcc 2. bcc/BPF CLI Tools 3. bcc/BPF Visualizations
slide 6:
Takeaways: 1. Understand Linux tracing and enhanced BPF 2. How to use eBPF tools 3. Areas of future development
slide 7:
slide 8:
Who at Netflix will use BPF?
slide 9:
Introducing enhanced BPF for tracing: kernel-level software (BPF)
slide 10:
Ye Olde BPF: Berkeley Packet Filter. Running "tcpdump host 127.0.0.1 and port 22 -d" emits the optimized packet-filter bytecode (ldh, jeq, ld, ldb, ldxb, jset, ...): user-defined bytecode executed by an in-kernel sandboxed virtual machine, with 2 x 32-bit registers and scratch memory. Steven McCanne and Van Jacobson, 1993.
slide 11:
Enhanced BPF, aka eBPF or just "BPF": 10 x 64-bit registers, maps (hashes), stack traces, actions. Alexei Starovoitov, 2014+
slide 12:
BPF for Tracing, Internals. An observability program is compiled to BPF bytecode; the kernel loads and verifies it, then attaches it to events (static tracing: tracepoints; dynamic tracing: kprobes, uprobes; sampling/PMCs: perf_events). Output reaches user space as per-event data (async copy) or as in-kernel statistics (maps). Enhanced BPF is also now used for SDNs, DDoS mitigation, intrusion detection, container security, ...
slide 13:
Event Tracing Efficiency. E.g., tracing TCP retransmits. Old way, packet capture: the kernel (send/receive) feeds tcpdump (1. read 2. dump) through a buffer to the file system and disks, then an analyzer must 1. read 2. process 3. print everything. New way, dynamic tracing: a tracer simply 1. configures 2. reads, instrumenting tcp_retransmit_skb() directly.
slide 14:
Linux Events & BPF Support (version BPF support arrived): kprobes Linux 4.1, uprobes Linux 4.3, BPF output Linux 4.4, BPF stacks Linux 4.6, tracepoints Linux 4.7, profiling and PMCs Linux 4.9.
slide 15:
A Linux Tracing Timeline. 1990's: static tracers, prototype dynamic tracers. 2000: LTT + DProbes (dynamic tracing; not integrated). 2004: kprobes (2.6.9). 2005: DTrace (not Linux), SystemTap (out-of-tree). 2008: ftrace (2.6.27). 2009: perf_events (2.6.31). 2009: tracepoints (2.6.32). 2010-2017: ftrace & perf_events enhancements. 2012: uprobes (3.5). 2014-2017: enhanced BPF patches: supporting tracing events. 2016-2017: ftrace hist triggers. Also: LTTng, ktap, sysdig, ...
slide 16:
Introducing BPF Compiler Collection: user-level front-end BCC
slide 17:
bcc: BPF Compiler Collection. https://github.com/iovisor/bcc; lead developer: Brenden Blanco. Includes tracing tools, and provides BPF front-ends: Python, Lua, C++, C helper libraries, golang (gobpf). Tracing layers: bcc tool -> bcc Python/lua front-ends (user level) -> BPF -> kernel events.
slide 18:
Raw BPF: samples/bpf/sock_example.c (87 lines, truncated)
slide 19:
C/BPF: samples/bpf/tracex1_kern.c (58 lines, truncated)
slide 20:
bcc/BPF (C & Python): bcc examples/tracing/bitehist.py (entire program)
slide 21:
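A side note on bitehist's in-kernel summaries: the histogram it maintains is power-of-2 bucketed, so the kernel only ships a handful of counters to user space instead of every event. A minimal Python sketch of that bucketing, where a plain dict stands in for the BPF histogram map (the bucket math is along the lines of bcc's bpf_log2l() helper, though exact bucket offsets may differ by one):

```python
def log2_bucket(value):
    """Power-of-2 bucket index: bucket b covers [2**(b-1), 2**b)."""
    return 0 if value <= 0 else value.bit_length()

def add_sample(hist, value):
    # In bcc this increment happens in-kernel, in a BPF map.
    b = log2_bucket(value)
    hist[b] = hist.get(b, 0) + 1

hist = {}
for size in (1, 3, 500, 700, 4096):   # e.g., I/O sizes in bytes
    add_sample(hist, size)
# hist now holds one counter per power-of-2 bucket
```

Only the final dict would cross the kernel/user boundary, which is why these tools stay cheap even at high event rates.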
bpftrace: https://github.com/ajor/bpftrace (entire program)
slide 22:
The Tracing Landscape, Sep 2017 (my opinion). A chart of Ease of use ("less brutal" to "brutal") vs Scope & Capability, annotated by Stage of Development (alpha to mature). Tools plotted include: sysdig, perf, LTTng, ftrace (with recent hist-triggers changes), ktap, dtrace4linux, stap, ply/BPF, bpftrace, bcc/BPF, C/BPF, Raw BPF.
slide 23:
Performance analysis: BCC/BPF CLI TOOLS
slide 24:
Pre-BPF: Linux Perf Analysis in 60s: 1. uptime 2. dmesg -T | tail 3. vmstat 1 4. mpstat -P ALL 1 5. pidstat 1 6. iostat -xz 1 7. free -m 8. sar -n DEV 1 9. sar -n TCP,ETCP 1 10. top. http://techblog.netflix.com/2015/11/linux-performance-analysis-in-60s.html
slide 25:
bcc Installation: https://github.com/iovisor/bcc/blob/master/INSTALL.md. E.g., Ubuntu Xenial:
# echo "deb [trusted=yes] https://repo.iovisor.org/apt/xenial xenial-nightly main" | sudo tee /etc/apt/sources.list.d/iovisor.list
# sudo apt-get update
# sudo apt-get install bcc-tools
Also available as an Ubuntu snap. Ubuntu 16.04 is good, 16.10 better: more tools work. Installs many tools, in /usr/share/bcc/tools, and .../tools/old for older kernels.
slide 26:
bcc General Performance Checklist: 1. execsnoop 2. opensnoop 3. ext4slower (...) 4. biolatency 5. biosnoop 6. cachestat 7. tcpconnect 8. tcpaccept 9. tcpretrans 10. gethostlatency 11. runqlat 12. profile
slide 27:
Discover short-lived process issues using execsnoop. Example: "execsnoop -t" prints TIME(s), PCOMM, PID, PPID, RET, ARGS per exec. During a catalina.sh startup it catches dirname, run, bash, svstat, perl -e '$l=<>;$l=~/(\d+) sec/;print $1||0;', ps --ppid 1 -o pid,cmd,args, grep org.apache.catalina, sed 's/^ *//;', cut -d -f 1, xargs, and echo; RET -2 (ENOENT) rows show failed $PATH lookups (/command/bash, /usr/local/bin/bash, /usr/local/sbin/bash) before the successful /bin/bash exec. Efficient: only traces exec().
slide 28: (repeat of slide 27)
slide 29:
Exonerate or confirm storage latency outliers with ext4slower. Example: "/usr/share/bcc/tools/ext4slower 1" traces ext4 operations slower than 1 ms, printing TIME, COMM, PID, T (operation type; S = sync here), BYTES, OFF_KB, LAT(ms), FILENAME; e.g., systemd-journal syncs of system.journal at 1.56-26.62 ms, postdrop/cleanup syncs around 2-6.5 ms, and dhclient.eth0.leases at 3.32 ms. Tracing at the file system is a more reliable and complete indicator than measuring disk I/O latency. Also: btrfsslower, xfsslower, zfsslower.
slide 30: (repeat of slide 29)
slide 31:
Identify multimodal disk I/O latency and outliers with biolatency. Example: "biolatency -mT 10" traces block device I/O and prints a latency histogram every 10 seconds. The "count" column is summarized in-kernel. At 19:19:04:
msecs        : count
0 -> 1       : 238
2 -> 3       : 424
4 -> 7       : 834
8 -> 15      : 506
16 -> 31     : 986
32 -> 63     : 97
64 -> 127    : 7
128 -> 255   : 27
At 19:19:14: 0 -> 1 : 427, 2 -> 3 : 424 [...]
Average latency (iostat/sar) may not be representative with multiple modes or outliers.
slide 32: (repeat of slide 31)
slide 33:
Efficiently trace TCP sessions with PID, bytes, and duration using tcplife. Example: /usr/share/bcc/tools/tcplife prints one line per session with PID, COMM, LADDR, LPORT, RADDR, RPORT, TX_KB, RX_KB, MS; e.g., java (PID 2509) sessions on port 8078 lasting from 5.44 ms up to 15568.25 ms, loopback sessions from upload-mes, upload-sys, and mesos-slave, and dockerd sessions of 18.50 and 56.73 ms. Dynamic tracing of TCP set state only; does not trace send/receive. Also see: tcpconnect, tcpaccept, tcpretrans.
slide 34: (repeat of slide 33)
slide 35:
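The tcplife accounting above can be sketched in plain Python: record a timestamp when a socket enters ESTABLISHED, and emit the duration when it closes. This is only an illustration of the bookkeeping; the real tool does it in-kernel by tracing TCP state-change events, keyed by the socket. The function and dict names here are made up for the sketch:

```python
# birth stands in for the BPF map tcplife keeps, keyed by socket.
birth = {}

def on_state_change(sock_id, new_state, now_s):
    """Return session duration in ms when a tracked socket closes."""
    if new_state == "ESTABLISHED":
        birth[sock_id] = now_s          # session starts
        return None
    if new_state == "CLOSE" and sock_id in birth:
        return (now_s - birth.pop(sock_id)) * 1000.0
    return None                          # untracked or intermediate state

on_state_change("sk1", "ESTABLISHED", 10.0)
ms = on_state_change("sk1", "CLOSE", 10.135)   # ~135 ms session
```

Because only state changes are instrumented (not every send/receive), the overhead stays low even on busy TCP workloads, which is the design point the slide calls out.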
Identify DNS latency issues system wide with gethostlatency. Example: /usr/share/bcc/tools/gethostlatency prints TIME, PID, COMM, LATms, HOST for every lookup; e.g., cached/local lookups at 0.01-0.21 ms (100.82.166.217, localhost), typical remote lookups at 1.5-3.5 ms (ec2-...-79.compute-1.amazonaws.com, atlas-alert-....prod.netflix.net), and one 52.36 ms outlier. Instruments using user-level dynamic tracing of getaddrinfo(), gethostbyname(), etc.
slide 36: (repeat of slide 35)
slide 37:
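For comparison, resolver latency can be timed from inside a single process with plain Python; gethostlatency gets the same measurement system-wide, for every process and without code changes, by placing uprobes on libc's resolver functions. The helper name below is made up for illustration:

```python
import socket
import time

def gethost_latency_ms(host, port=None):
    """Time one resolver call from user level (per-process analogue
    of what gethostlatency observes via uprobes on getaddrinfo())."""
    t0 = time.monotonic()
    socket.getaddrinfo(host, port)
    return (time.monotonic() - t0) * 1e3

lat = gethost_latency_ms("localhost")   # local lookups should be sub-millisecond-ish
```

The uprobe approach wins when the slow lookups come from some other daemon you can't modify, which is exactly the system-wide scenario the slide shows.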
Examine CPU scheduler latency as a histogram with runqlat. Example: "/usr/share/bcc/tools/runqlat 10" traces run queue latency:
usecs          : count
0 -> 1         : 2810
2 -> 3         : 5248
4 -> 7         : 12369
8 -> 15        : 71312
16 -> 31       : 55705
32 -> 63       : 11775
64 -> 127      : 6230
128 -> 255     : 2758
256 -> 511     : 549
512 -> 1023    : 46
1024 -> 2047   : 11
2048 -> 4095   : 4
4096 -> 8191   : 5
As efficient as possible: scheduler calls can become frequent.
slide 38: (repeat of slide 37)
slide 39:
Construct programmatic one-liners with trace. E.g., reads over 20000 bytes:
# trace 'sys_read (arg3 > 20000) "read %d bytes", arg3'
TIME PID COMM FUNC: 05:18:23 4490 sys_read "read 1048576 bytes" (x3)
From trace -h:
trace -K blk_account_io_start -- trace this kernel function, and print info with a kernel stack trace
trace 'do_sys_open "%s", arg2' -- trace the open syscall and print the filename being opened
trace 'sys_read (arg3 > 20000) "read %d bytes", arg3' -- trace the read syscall and print a message for reads > 20000 bytes
trace r::do_sys_return -- trace the return from the open syscall
trace 'c:open (arg2 == 42) "%s %d", arg1, arg2' -- trace the open() call from libc only if the flags (arg2) argument is 42
argdist by Sasha Goldshtein
slide 40:
Create in-kernel summaries with argdist. E.g., histogram of tcp_cleanup_rbuf() copied:
# argdist -H 'p::tcp_cleanup_rbuf(struct sock *sk, int copied):int:copied'
[15:34:45]
copied          : count  distribution
0 -> 1          : 15088  |**********************************
2 -> 3          : 0
4 -> 7          : 0
8 -> 15         : 0
16 -> 31        : 0
32 -> 63        : 0
64 -> 127       : 4786   |***********
128 -> 255      : 1
256 -> 511      : 1
512 -> 1023     : 4
1024 -> 2047    : 11
2048 -> 4095    : 5
4096 -> 8191    : 27
8192 -> 16383   : 105
16384 -> 32767  : 0
argdist by Sasha Goldshtein
slide 41:
Coming to a GUI near you: BCC/BPF VISUALIZATIONS
slide 42:
BPF metrics and analysis can be automated in GUIs. E.g., Netflix Vector (self-service UI): Flame Graphs, Heat Maps, Tracing Reports. Should be open sourced; you may also build/buy your own.
slide 43:
Latency heatmaps show histograms over time
slide 44:
Efficient on- and off-CPU flame graphs via kernel stack aggregation: CPU via sampling, off-CPU via sched tracing.
slide 45:
Generic thread state diagram. Solve everything?
slide 46:
Off-CPU Time (zoomed): gzip(1). Off-CPU doesn't always make sense: what is gzip blocked on?
slide 47:
Wakeup time flame graphs show waker thread stacks. gzip(1) is blocked on tar(1)! tar cf - * | gzip > out.tar.gz. Can't we associate off-CPU with wakeup stacks?
slide 48:
Off-wake flame graphs: BPF can merge blocking plus waker stacks in-kernel. Diagram labels: Waker task, Waker stack, Stack Direction, Wokeup, Blocked stack, Blocked task.
slide 49:
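The in-kernel merge behind off-wake flame graphs can be illustrated in Python: join the blocked thread's stack and the waker's stack into one folded record, with a divider frame between them. The frame ordering and "--" separator below are one plausible folding convention for flame graph input, not necessarily the exact record layout of the kernel's samples/bpf/offwaketime example:

```python
def offwake_fold(blocked_task, blocked_stack, waker_task, waker_stack):
    """Merge a blocked stack with its waker's stack into one folded
    frame string (semicolon-separated, root first), the kind of
    record a flame graph script consumes. Stacks are given leaf-first."""
    frames = ([blocked_task] + blocked_stack[::-1]   # blocked side, root up
              + ["--"]                               # divider frame
              + waker_stack[::-1] + [waker_task])    # waker side on top
    return ";".join(frames)

folded = offwake_fold(
    "gzip", ["read", "pipe_wait"],    # gzip blocked waiting on its pipe
    "tar",  ["write", "pipe_write"],  # tar's write is what wakes it
)
```

Doing this merge in-kernel means only one aggregated map entry per unique stack pair is copied out, rather than every wakeup event, which is the efficiency argument the slide makes.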
Chain graphs: merge all wakeup stacks
slide 50:
BPF FUTURE WORK
slide 51:
BCC Improvements. Challenges: initialize all variables, BPF_PERF_OUTPUT(), verifier errors, still explicit bpf_probe_read()s; it's getting better (thanks). High-Level Languages: one-liners and scripts; can use libbcc. (Example shown: tcpconnlat.py)
slide 52:
ply: a new BPF-based language and tracer for Linux. Created by Tobias Waldekranz. https://github.com/iovisor/ply https://wkz.github.io/ply/. Promising, was in development.
# ply -c 'kprobe:do_sys_open { printf("opened: %s\n", mem(arg(1), "128s")); }'
1 probe active
opened: /sys/kernel/debug/tracing/events/enable
opened: /etc/ld.so.cache
opened: /lib/x86_64-linux-gnu/libselinux.so.1
opened: /lib/x86_64-linux-gnu/libc.so.6
opened: /proc/filesystems
opened: /usr/lib/locale/locale-archive
opened: .
[...]
slide 53:
ply programs are concise, such as measuring read latency:
# ply -A -c 'kprobe:SyS_read { @start[tid()] = nsecs(); }
    kretprobe:SyS_read /@start[tid()]/ { @ns.quantize(nsecs() - @start[tid()]); @start[tid()] = nil; }'
2 probes active
^C de-activating probes
[...]
@ns:
[ 512,   1k)    3  |########
[  1k,   2k)    7  |###################
[  2k,   4k)   12  |################################|
[  4k,   8k)    3  |########
[  8k,  16k)    2  |#####
[ 16k,  32k)    0  |
[ 32k,  64k)    0  |
[ 64k, 128k)    3  |########
[128k, 256k)    1  |###
[256k, 512k)    1  |###
[512k,   1M)    2  |#####
[...]
slide 54:
bpftrace: another new BPF-based language and tracer for Linux. Created by Alastair Robertson. https://github.com/ajor/bpftrace. In active development.
# bpftrace -e 'kprobe:sys_open { printf("opened: %s\n", str(arg0)); }'
Attaching 1 probe...
opened: /sys/devices/system/cpu/online
opened: /proc/1956/stat
opened: /proc/1241/stat
opened: /proc/net/dev
opened: /proc/net/if_inet6
opened: /sys/class/net/eth0/device/vendor
opened: /proc/sys/net/ipv4/neigh/eth0/retrans_time_ms
[...]
slide 55:
bpftrace programs are concise, such as measuring read latency:
# bpftrace -e 'kprobe:SyS_read { @start[tid] = nsecs; }
    kretprobe:SyS_read /@start[tid]/ { @ns = quantize(nsecs - @start[tid]); @start[tid] = delete(); }'
Attaching 2 probes...
@ns:
[0, 1]         0  |
[2, 4)         0  |
[4, 8)         0  |
[8, 16)        0  |
[16, 32)       0  |
[32, 64)       0  |
[64, 128)      0  |
[128, 256)     0  |
[256, 512)     0  |
[512, 1k)      0  |
[1k, 2k)       6  |@@@@@
[2k, 4k)      20  |@@@@@@@@@@@@@@@@@@@
[4k, 8k)       4  |@@@
[8k, 16k)     14  |@@@@@@@@@@@@@
[16k, 32k)    53  |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[32k, 64k)     2  |@
slide 56:
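The @start[tid] idiom shared by the ply and bpftrace one-liners above is worth spelling out: an entry probe stores a timestamp keyed by thread ID, and the return probe computes the delta and feeds a log2 histogram. A Python sketch of the same two-probe pattern, with dicts standing in for the BPF maps (the log2 bucketing is similar in spirit to quantize(), not its exact output format):

```python
start = {}   # stands in for the @start BPF map, keyed by tid
hist = {}    # stands in for the @ns histogram

def on_entry(tid, now_ns):
    # kprobe side: remember when this thread entered the function
    start[tid] = now_ns

def on_return(tid, now_ns):
    # kretprobe side: only act if we saw the matching entry
    t0 = start.pop(tid, None)
    if t0 is None:
        return                       # return without entry: ignore
    delta = now_ns - t0
    bucket = delta.bit_length()      # power-of-2 bucket, quantize-style
    hist[bucket] = hist.get(bucket, 0) + 1

on_entry(1234, 1_000)
on_return(1234, 4_000)   # 3000 ns falls in the [2048, 4096) bucket
```

Keying by tid is what lets the pattern handle many threads in the same function concurrently; clearing the entry on return keeps the map from growing without bound.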
New Tooling/Metrics
slide 57:
New Visualizations
slide 58:
Case Studies: use it, solve something, write about it, talk about it. Recent posts:
- https://blogs.dropbox.com/tech/2017/09/optimizing-web-servers-for-high-throughput-and-low-latency/
- https://josefbacik.github.io/kernel/scheduler/bcc/bpf/2017/08/03/sched-time.html
slide 59:
Takeaways: 1. Understand Linux tracing components 2. Understand the role and state of enhanced BPF 3. Discover opportunities for future development. Please contribute: https://github.com/iovisor/bcc and https://github.com/iovisor/ply. BPF Tracing in Linux: 3.19: sockets; 3.19: maps; 4.1: kprobes; 4.3: uprobes; 4.4: BPF output; 4.6: stacks; 4.7: tracepoints; 4.9: profiling; 4.9: PMCs.
slide 60:
Links & References.
iovisor bcc:
- https://github.com/iovisor/bcc
- https://github.com/iovisor/bcc/tree/master/docs
- http://www.brendangregg.com/blog/ (search for "bcc")
- http://www.brendangregg.com/ebpf.html#bcc
- http://blogs.microsoft.co.il/sasha/2016/02/14/two-new-ebpf-tools-memleak-and-argdist/
- On designing tracing tools: https://www.youtube.com/watch?v=uibLwoVKjec
bcc tutorial:
- https://github.com/iovisor/bcc/blob/master/INSTALL.md
- .../docs/tutorial.md, .../docs/tutorial_bcc_python_developer.md, .../docs/reference_guide.md
- .../CONTRIBUTING-SCRIPTS.md
ply: https://github.com/iovisor/ply
bpftrace: https://github.com/ajor/bpftrace
BPF:
- https://www.kernel.org/doc/Documentation/networking/filter.txt
- https://github.com/iovisor/bpf-docs
- https://suchakra.wordpress.com/tag/bpf/
Dynamic tracing: ftp://ftp.cs.wisc.edu/paradyn/papers/Hollingsworth94Dynamic.pdf
Flame Graphs:
- http://www.brendangregg.com/flamegraphs.html
- http://www.brendangregg.com/blog/2016-01-20/ebpf-offcpu-flame-graph.html
- http://www.brendangregg.com/blog/2016-02-01/linux-wakeup-offwake-profiling.html
Netflix Tech Blog on Vector: http://techblog.netflix.com/2015/04/introducing-vector-netflixs-on-host.html
Linux Performance: http://www.brendangregg.com/linuxperf.html
slide 61:
Thank You. Questions?
- iovisor bcc: https://github.com/iovisor/bcc
- http://www.brendangregg.com
- http://slideshare.net/brendangregg
- bgregg@netflix.com
- @brendangregg
Thanks to Alexei Starovoitov (Facebook), Brenden Blanco (PLUMgrid/VMware), Sasha Goldshtein (Sela), Teng Qin (Facebook), Yonghong Song (Facebook), Daniel Borkmann (Cisco/Covalent), Wang Nan (Huawei), Vicent Martí (GitHub), Paul Chaignon (Orange), and other BPF and bcc contributors!