SCALE17x: eBPF Perf Tools 2019
Video: https://youtu.be/P2hbiWTB2w4?t=158
eBPF Performance Tools 2019, by Brendan Gregg for SCaLE17x. This talk includes a live demo of tracing Minecraft using eBPF (this demo is not in the slides).
PDF: SCALE2019_eBPF_Perf_Tools.pdf
Keywords (from pdftotext):
slide 1:
eBPF Perf Tools 2019
Brendan Gregg
SCaLE Mar 2019

# biolatency.bt
Attaching 3 probes...
Tracing block device I/O... Hit Ctrl-C to end.

@usecs:
[256, 512)        2 |
[512, 1K)        10 |@
[1K, 2K)        426 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[2K, 4K)        230 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@
[4K, 8K)          9 |@
[8K, 16K)       128 |@@@@@@@@@@@@@@@
[16K, 32K)       68 |@@@@@@@@
[32K, 64K)        0 |
[64K, 128K)       0 |
[128K, 256K)     10 |@
slide 2:
LIVE DEMO: eBPF Minecraft Analysis
slide 3:
Enhanced BPF, also known as just "BPF" (Linux 4.*)
User-Defined BPF Programs: SDN Configuration, DDoS Mitigation, Intrusion Detection, Container Security, Observability, Firewalls (bpfilter), Device Drivers
Kernel Runtime: verifier, BPF, BPF actions
Event Targets: sockets, kprobes, uprobes, tracepoints, perf_events
slide 4:
eBPF bcc (Linux 4.4+)
https://github.com/iovisor/bcc
slide 5:
eBPF bpftrace (aka BPFtrace), Linux 4.9+

# Files opened by process
bpftrace -e 't:syscalls:sys_enter_open { printf("%s %s\n", comm, str(args->filename)) }'

# Read size distribution by process
bpftrace -e 't:syscalls:sys_exit_read { @[comm] = hist(args->ret) }'

# Count VFS calls
bpftrace -e 'kprobe:vfs_* { @[func]++ }'

# Show vfs_read latency as a histogram
bpftrace -e 'k:vfs_read { @[tid] = nsecs } kr:vfs_read /@[tid]/ { @ns = hist(nsecs - @[tid]); delete(@[tid]) }'

# Trace user-level function
bpftrace -e 'uretprobe:bash:readline { printf("%s\n", str(retval)) }'

https://github.com/iovisor/bpftrace
slide 6:
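Looking back at the vfs_read latency one-liner on slide 5: it stores a timestamp per thread on entry, then computes the delta on return and deletes the entry. The following is a rough Python model of that pairing logic only, not of bpftrace itself. The event tuples and their values are hypothetical, and `tid in start` approximates the `/@[tid]/` filter.

```python
# Hypothetical event stream: (probe, tid, timestamp in ns). Values are made up.
EVENTS = [
    ("k:vfs_read", 101, 1_000),
    ("k:vfs_read", 102, 1_200),
    ("kr:vfs_read", 101, 1_900),
    ("kr:vfs_read", 103, 2_000),  # return without a recorded entry: filtered out
    ("kr:vfs_read", 102, 5_200),
]


def vfs_read_latencies(events):
    """Pair entry and return events per thread and collect latencies in ns."""
    start = {}      # plays the role of the @[tid] map
    latencies = []  # stands in for the values passed to hist()
    for probe, tid, nsecs in events:
        if probe == "k:vfs_read":
            start[tid] = nsecs
        elif probe == "kr:vfs_read" and tid in start:  # approximates /@[tid]/
            latencies.append(nsecs - start[tid])
            del start[tid]  # corresponds to delete(@[tid])
    return latencies


print(vfs_read_latencies(EVENTS))
```

The filter matters because tracing can begin while a read is already in flight; without it, a return with no stored start time would produce a bogus latency.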
eBPF is solving new things: off-CPU + wakeup analysis
slide 7:
Raw BPF: samples/bpf/sock_example.c (87 lines, truncated)
slide 8:
C/BPF: samples/bpf/tracex1_kern.c (58 lines, truncated)
slide 9:
bcc/BPF (C & Python): bcc examples/tracing/bitehist.py (entire program)
slide 10:
bpftrace (entire program):
bpftrace -e 'kr:vfs_read { @ = hist(retval); }'
https://github.com/iovisor/bpftrace
slide 11:
[Figure: The Tracing Landscape, Mar 2019 (my opinion). Axes: Ease of use, from (brutal) to (less brutal); Stage of Development, from (alpha) to (mature); Scope & Capability. Tools plotted: bpftrace (eBPF, 0.9), ply/BPF, sysdig, perf, stap, LTTng, ftrace (hist triggers), bcc/BPF, C/BPF, Raw BPF. Other annotations: (many), recent changes.]
slide 12:
e.g., identify multimodal disk I/O latency and outliers with bcc/eBPF biolatency

# biolatency -mT 10
Tracing block device I/O... Hit Ctrl-C to end.

19:19:04
     msecs          : count     distribution
       0 -> 1       : 238      |*********
       2 -> 3       : 424      |*****************
       4 -> 7       : 834      |*********************************
       8 -> 15      : 506      |********************
      16 -> 31      : 986      |****************************************|
      32 -> 63      : 97       |***
      64 -> 127     : 7        |
     128 -> 255     : 27       |

19:19:14
     msecs          : count     distribution
       0 -> 1       : 427      |*******************
       2 -> 3       : 424      |******************
[…]
slide 13:
bcc/eBPF programs can be laborious: biolatency

# define BPF program
bpf_text = """
#include <uapi/linux/ptrace.h>
#include <linux/blkdev.h>

typedef struct disk_key {
    char disk[DISK_NAME_LEN];
    u64 slot;
} disk_key_t;
BPF_HASH(start, struct request *);
STORAGE

// time block I/O
int trace_req_start(struct pt_regs *ctx, struct request *req)
{
    u64 ts = bpf_ktime_get_ns();
    start.update(&req, &ts);
    return 0;
}

// output
int trace_req_completion(struct pt_regs *ctx, struct request *req)
{
    u64 *tsp, delta;

    // fetch timestamp and calculate delta
    tsp = start.lookup(&req);
    if (tsp == 0) {
        return 0;   // missed issue
    }
    delta = bpf_ktime_get_ns() - *tsp;
    FACTOR

    // store as histogram
    STORE

    start.delete(&req);
    return 0;
}
"""

# code substitutions
if args.milliseconds:
    bpf_text = bpf_text.replace('FACTOR', 'delta /= 1000000;')
    label = "msecs"
else:
    bpf_text = bpf_text.replace('FACTOR', 'delta /= 1000;')
    label = "usecs"
if args.disks:
    bpf_text = bpf_text.replace('STORAGE',
        'BPF_HISTOGRAM(dist, disk_key_t);')
    bpf_text = bpf_text.replace('STORE',
        'disk_key_t key = {.slot = bpf_log2l(delta)}; ' +
        'void *__tmp = (void *)req->rq_disk->disk_name; ' +
        'bpf_probe_read(&key.disk, sizeof(key.disk), __tmp); ' +
        'dist.increment(key);')
else:
    bpf_text = bpf_text.replace('STORAGE', 'BPF_HISTOGRAM(dist);')
    bpf_text = bpf_text.replace('STORE',
        'dist.increment(bpf_log2l(delta));')
if debug or args.ebpf:
    print(bpf_text)
    if args.ebpf:
        exit()

# load BPF program
b = BPF(text=bpf_text)
if args.queued:
    b.attach_kprobe(event="blk_account_io_start", fn_name="trace_req_start")
else:
    b.attach_kprobe(event="blk_start_request", fn_name="trace_req_start")
    b.attach_kprobe(event="blk_mq_start_request", fn_name="trace_req_start")
b.attach_kprobe(event="blk_account_io_completion",
    fn_name="trace_req_completion")

print("Tracing block device I/O... Hit Ctrl-C to end.")

# output
exiting = 0 if args.interval else 1
dist = b.get_table("dist")
while (1):
    try:
        sleep(int(args.interval))
    except KeyboardInterrupt:
        exiting = 1

    print()
    if args.timestamp:
        print("%-8s\n" % strftime("%H:%M:%S"), end="")

    dist.print_log2_hist(label, "disk")
    dist.clear()

    countdown -= 1
    if exiting or countdown == 0:
        exit()
slide 14:
… rewritten in bpftrace (launched Oct 2018)!

#!/usr/local/bin/bpftrace

BEGIN
{
    printf("Tracing block device I/O... Hit Ctrl-C to end.\n");
}

kprobe:blk_account_io_start
{
    @start[arg0] = nsecs;
}

kprobe:blk_account_io_completion
/@start[arg0]/
{
    @usecs = hist((nsecs - @start[arg0]) / 1000);
    delete(@start[arg0]);
}
slide 15:
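To make the logic of the biolatency.bt script on slide 14 concrete, here is a small Python sketch of the same idea: remember a start time per request, then bucket the elapsed microseconds by power of two. It is an illustration under assumptions, not how bpftrace implements `hist()`: the event tuples are hypothetical, and the bucket edges are inferred from labels such as `[256, 512)` and `[1K, 2K)` in the tool's output.

```python
def log2_bucket(value):
    """Return (low, high) of the power-of-two bucket [low, high) holding value."""
    if value < 1:
        return (0, 1)
    low = 1 << (value.bit_length() - 1)
    return (low, low * 2)


def fmt(n):
    """Abbreviate a bucket edge in the style of the output (1024 -> 1K)."""
    return f"{n // 1024}K" if n >= 1024 else str(n)


def biolatency(events):
    """events: (kind, request_id, nsecs) tuples, kind is 'start' or 'done'."""
    start, usecs = {}, {}
    for kind, req, nsecs in events:
        if kind == "start":
            start[req] = nsecs                      # @start[arg0] = nsecs
        elif req in start:                          # approximates /@start[arg0]/
            delta_us = (nsecs - start.pop(req)) // 1000
            bucket = log2_bucket(delta_us)
            usecs[bucket] = usecs.get(bucket, 0) + 1
    return usecs


# Hypothetical block I/O events; request 3 completes without a recorded start.
events = [
    ("start", 1, 0),
    ("start", 2, 100_000),
    ("done", 1, 300_000),
    ("done", 2, 1_600_000),
    ("done", 3, 9),
]
for (low, high), count in sorted(biolatency(events).items()):
    print(f"[{fmt(low)}, {fmt(high)}) {count}")
```

Log2 buckets keep the map small while still separating modes that differ by an order of magnitude, which is what makes multimodal latency visible.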
… rewritten in bpftrace

# biolatency.bt
Attaching 3 probes...
Tracing block device I/O... Hit Ctrl-C to end.

@usecs:
[256, 512)        2 |
[512, 1K)        10 |@
[1K, 2K)        426 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[2K, 4K)        230 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@
[4K, 8K)          9 |@
[8K, 16K)       128 |@@@@@@@@@@@@@@@
[16K, 32K)       68 |@@@@@@@@
[32K, 64K)        0 |
[64K, 128K)       0 |
[128K, 256K)     10 |@
slide 16:
bcc: canned complex tools, agents
bpftrace: one-liners, custom scripts
slide 17:
bcc
slide 18:
eBPF bcc (Linux 4.4+)
https://github.com/iovisor/bcc
slide 19:
bpftrace
slide 20:
eBPF bpftrace (Linux 4.9+)
https://github.com/iovisor/bcc
slide 21:
[Timeline: bpftrace Development. Dates: Dec 2016, Oct 2018. Releases: v0.80 (Jan-2019), v0.90 (Mar?2019), v1.0 (?2019). Work items: Major Features (v1), Minor Features (v1), Stable Docs, API Stability, Known Bug Fixes, Packaging, More Bug Fixes.]
slide 22:
bpftrace Syntax
bpftrace -e 'k:do_nanosleep /pid > 100/ { @[comm]++ }'
Probe: k:do_nanosleep
Filter (optional): /pid > 100/
Action: { @[comm]++ }
slide 23:
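The probe, filter, and action structure from the syntax slide (slide 22) can be modelled in a few lines of Python. This is only a conceptual sketch: the `run` helper and the event tuples are hypothetical, and real probes fire in the kernel rather than iterating over a list.

```python
from collections import Counter

# Hypothetical events seen by the probe: (pid, comm). Values are made up.
EVENTS = [(50, "init"), (181, "sshd"), (2301, "java"), (2301, "java"), (99, "cron")]


def run(events, predicate, action):
    """Apply the action to every event that passes the optional filter."""
    for event in events:
        if predicate is None or predicate(event):
            action(event)


counts = Counter()  # plays the role of the @[comm] map
run(
    EVENTS,
    predicate=lambda e: e[0] > 100,          # /pid > 100/
    action=lambda e: counts.update([e[1]]),  # { @[comm]++ }
)
print(dict(counts))
```

Passing `predicate=None` mirrors omitting the filter, in which case the action runs for every event.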
Probes
slide 24:
Probe Type Shortcuts
tracepoint: Kernel static tracepoints
usdt: User-level statically defined tracing
kprobe: Kernel function tracing
kretprobe: Kernel function returns
uprobe: User-level function tracing
uretprobe: User-level function returns
profile: Timed sampling across all CPUs
interval: Interval output
software: Kernel software events
hardware: Processor hardware events
slide 25:
Filters
● /pid == 181/
● /comm != "sshd"/
● /@ts[tid]/
slide 26:
Actions
Per-event output: printf(), system(), join(), time()
Map Summaries: @ = count() or @++, @ = hist()
The following is in the reference guide: https://github.com/iovisor/bpftrace/blob/master/docs/reference_guide.md
slide 27:
Functions
hist(n): Log2 histogram
lhist(n, min, max, step): Linear hist.
count(): Count events
sum(n): Sum value
min(n): Minimum value
max(n): Maximum value
avg(n): Average value
stats(n): Statistics
str(s): String
sym(p): Resolve kernel addr
usym(p): Resolve user addr
kaddr(n): Resolve kernel symbol
uaddr(n): Resolve user symbol
printf(fmt, ...): Print formatted
print(@x[, top[, div]]): Print map
delete(@x): Delete map element
clear(@x): Delete all keys/values
reg(n): Register lookup
join(a): Join string array
time(fmt): Print formatted time
system(fmt): Run shell command
exit(): Quit bpftrace
slide 28:
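Two of the summary functions in the Functions list (slide 27), `lhist(n, min, max, step)` and `stats(n)`, can be approximated in Python as below. This is a sketch under assumptions: the handling of values outside `[min, max)` and the integer average are choices made here for illustration and may not match bpftrace exactly. The sample values are hypothetical.

```python
def lhist(values, lo, hi, step):
    """Count values into linear buckets of width step between lo and hi."""
    counts = {}
    for n in values:
        if n < lo:
            key = ("<", lo)    # assumed underflow bucket
        elif n >= hi:
            key = (">=", hi)   # assumed overflow bucket
        else:
            start = lo + ((n - lo) // step) * step
            key = (start, start + step)
        counts[key] = counts.get(key, 0) + 1
    return counts


def stats(values):
    """Return (count, average, total), using an integer average."""
    total = sum(values)
    count = len(values)
    return (count, total // count if count else 0, total)


# Hypothetical read sizes in bytes.
sizes = [5, 12, 18, 250, -1]
print(lhist(sizes, 0, 100, 10))
print(stats([2, 4, 6]))
```

A linear histogram suits values with a known, narrow range; the log2 `hist()` is the better default when the range is unknown.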
Variable Types
Basic Variables: @global, @thread_local[tid], $scratch
Associative Arrays: @array[key] = value
Builtins: pid
slide 29:
Builtin Variables
pid: Process ID (kernel tgid)
tid: Thread ID (kernel pid)
cgroup: Current Cgroup ID
uid: User ID
gid: Group ID
nsecs: Nanosecond timestamp
cpu: Processor ID
comm: Process name
stack: Kernel stack trace
ustack: User stack trace
arg0, arg1, ...: Function arguments
retval: Return value
func: Function name
probe: Full name of the probe
curtask: Current task_struct (u64)
rand: Random number (u32)
slide 30:
biolatency (again)

#!/usr/local/bin/bpftrace

BEGIN
{
    printf("Tracing block device I/O... Hit Ctrl-C to end.\n");
}

kprobe:blk_account_io_start
{
    @start[arg0] = nsecs;
}

kprobe:blk_account_io_completion
/@start[arg0]/
{
    @usecs = hist((nsecs - @start[arg0]) / 1000);
    delete(@start[arg0]);
}
slide 31:
bpftrace Internals
slide 32:
Issues
All major capabilities exist
Many minor things
https://github.com/iovisor/bpftrace/issues
slide 33:
Other Tools
slide 34:
Netflix Vector: BPF heat maps
https://medium.com/netflix-techblog/extending-vector-with-ebpf-to-inspect-host-and-container-performance-5da3af4c584b
slide 35:
Anticipated Worldwide Audience
BPF Tool Developers:
– Raw BPF: […] >200
– bpftrace: >5,000
BPF Tool Users:
– CLI tools (of any type): >20,000
– GUIs (fronting any type): >200,000
slide 36:
Other Tools
– cloudflare/ebpf_exporter
– kubectl-trace
– sysdig eBPF support
slide 37:
Take Aways
– Easily explore systems with bcc/bpftrace
– Contribute: see bcc/bpftrace issue list
– Share: posts, talks
slide 38:
URLs
- https://github.com/iovisor/bcc
  https://github.com/iovisor/bcc/blob/master/docs/tutorial.md
  https://github.com/iovisor/bcc/blob/master/docs/reference_guide.md
- https://github.com/iovisor/bpftrace
  https://github.com/iovisor/bpftrace/blob/master/docs/tutorial_one_liners.md
  https://github.com/iovisor/bpftrace/blob/master/docs/reference_guide.md
slide 39:
Thanks
bpftrace
– Alastair Robertson (creator)
– Netflix: myself so far
– Sthima: Mary Marchini, Willian Gaspar
– Facebook: Jon Haslam, Dan Xu
– Augusto Mecking Caringi, Dale Hamel, ...
eBPF/bcc
– Facebook: Alexei Starovoitov, Teng Qin, Yonghong Song, Martin Lau, Mark Drayton, …
– Netflix: myself
– VMware: Brenden Blanco
– Sasha Goldshtein, Paul Chaignon, ...