YOW! 2022: Visualizing Performance: The Developer's Guide to Flame Graphs
Talk by Brendan Gregg for YOW! 2022. Description: "Flame graphs are a visualization that helps developers easily find performance bottlenecks to cut computing costs and improve end-user experience. They can be used to answer many questions, including how software is consuming resources, especially CPUs, and how that consumption has changed since the last software version. Flame graphs are now a standard for CPU profiling and have been adopted in many programming languages and observability products, and are the basis for multiple startups. They were defined in "The Flame Graph" in the Communications of the ACM, by their creator, Brendan Gregg.
This talk covers the origins of flame graphs, how you can create them using open source software, and how to interpret them. In practice, flame graphs don't always work completely due to problems walking stack traces, resolving symbols, and other issues; this talk explains the problems and shows you the latest techniques for fixing them.
Flame graphs are a tool for a bigger mission: To understand the performance of everything, all software and hardware. Advanced types of flame graphs that help further this goal will be explained, including differential, off-CPU, memory, disk, and network events. Many of these advanced flame graph types require newer kernel technologies to make practical, especially extended BPF (eBPF), and will see adoption in the years ahead."
PDF: YOW2022_flame_graphs.pdf
Keywords (from pdftotext):
slide 1:
YOW! 2022 Visualizing Performance The Developer’s Guide to Flame Graphs Brendan Gregg Intel Fellow Dec 2022
slide 2:
Statement from the heart I’d like to begin by acknowledging the Traditional Owners of this land and pay my respects to Elders past and present.
slide 3:
My Dream To Completely Understand the Performance of Everything
slide 4:
Flame Graphs
A visualization of software
Can also visualize CPU and other resource usage
Now a staple in performance engineering
Figure labels: Kernel, Java, User-level
slide 5:
Agenda
1. Implementations
2. CPU Flame graphs
3. Stacks & Symbols
4. Advanced flame graphs
slide 6:
Take Aways
1. Interpret CPU flame graphs
2. Understand runtime challenges
3. Why eBPF for advanced flame graphs
A new tool to lower your cost, latency, and carbon
Slides online: https://www.brendangregg.com/Slides/YOW2022_flame_graphs.pdf
slide 7:
1. IMPLEMENTATIONS
slide 8:
Quick Tour of Some Examples
More examples in later “bonus slides” section.
(Note: This is not an endorsement of any company/product or sponsored by anyone.)
slide 9:
My original flamegraph.pl (2011; using Perl/SVG/JavaScript) https://github.com/brendangregg/FlameGraph
slide 10:
Martin Spier d3-flame-graph (my colleague at Netflix; 2015; D3) Source: https://github.com/spiermar/d3-flame-graph https://martinspier.io/
slide 11:
Facebook: Strobelight (2014) Source: https://tracingsummit.org/ts/2014/files/TracingSummit2014-Tracing-at-Facebook-Scale.pdf
slide 12:
Node.js: 0x (2016) Source: https://github.com/davidmarkclements/0x (David Mark Clements)
slide 13:
Qt: Creator (2016) Source: https://www.qt.io/blog/2016/05/11/qt-creator-4-0-0-released
slide 14:
Python: vprof (2016) Source: https://github.com/nvdv/vprof (Nick Volynets)
slide 15:
Microsoft: WPA / ETW (2016) Source: https://learn.microsoft.com/en-us/windows-hardware/test/wpt/graphs#flame_graphs
slide 16:
LinkedIn: ODP (2017) Source: https://engineering.linkedin.com/blog/2017/01/odp--an-infrastructure-for-on-demand-serviceprofiling
slide 17:
Oracle: Developer Studio Performance Analyzer (2017) Source: https://www.oracle.com/technetwork/server-storage/solarisstudio/documentation/o11-151-perf-analyzer-brief-1405338.pdf
slide 18:
Windows: PerfView (2017) Source: https://github.com/Microsoft/perfview/pull/440 (Adam Sitnik)
slide 19:
Google: pprof (2017) Source: https://github.com/google/pprof/pull/188 (Martin Spier)
slide 20:
Linux: hotspot (2017) Source: https://github.com/KDAB/hotspot (Milian Wolff)
slide 21:
Eclipse Foundation: TraceCompass (2018) Source: https://www.eclipse.org/tracecompass/index.html
slide 22:
Java: Java Mission Control (2018) Source: https://github.com/thegreystone/jmc-flame-view (Marcus Hirt)
slide 23:
Netflix: FlameScope (2018) Source: https://netflixtechblog.com/netflix-flamescope-a57ca19d47bb (Brendan Gregg, Martin Spier)
slide 24:
Netflix: FlameCommander (2019) Source: https://www.youtube.com/watch?v=L58GrWcrD00 (Martin Spier, Jason Koch, Susie Xia, Brendan Gregg)
slide 25:
AMD: uProf (2019) Source: https://developer.amd.com/amd-uprof/?sf215410082=1
slide 26:
Java: YourKit (2019) Source: https://www.yourkit.com/docs/java/help/cpu_flame_graph.jsp
slide 27:
Java: IntelliJ IDEA (2019) Source: https://blog.jetbrains.com/idea/2019/06/intellij-idea-2019-2-eap-4-profiling-tools-structural-searchpreview-and-more/
slide 28:
Firefox: Profiler (2019) Source: https://profiler.firefox.com
slide 29:
Linux: perf script flamegraph (2020) Source: https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/monitoring_and_managing_system_status_and_performance/getting-started-withflamegraphs_monitoring-and-managing-system-status-and-performance (Andreas Gerstmayr)
slide 30:
MathWorks: MATLAB Profiler (2020) Source: https://www.mathworks.com/help/matlab/matlab_prog/profiling-for-improving-performance.html
slide 31:
AWS: CodeGuru (2020) Source: https://aws.amazon.com/codeguru/features/
slide 32:
Google: Cloud Profiler (2020) Source: https://cloud.google.com/profiler/docs/focusing-profiles
slide 33:
Intel: vTune (2021) Source: https://www.intel.com/content/www/us/en/develop/documentation/vtune-help/top/reference/user-interface-reference/window-flame-graph.html
slide 34:
Splunk: AlwaysOn Profiling flame graph (2021) Source: https://docs.splunk.com/Observability/apm/profiling/using-the-flamegraph.html
slide 35:
New Relic: flame graphs (2021) Source: https://docs.newrelic.com/whats-new/2021/07/whats-new-july-8-realtime-profiling-java/
slide 36:
DataDog: profiling flame graph (2021) Source: https://www.datadoghq.com/knowledge-center/distributed-tracing/flame-graph/
slide 37:
Granulate: gprofiler (2022; now Intel) Source: https://docs.gprofiler.io/about-gprofiler/gprofiler-features/views/flame-graph
slide 38:
Microsoft Visual Studio: Flame Graph (2022) Source: https://learn.microsoft.com/en-us/visualstudio/profiling/flame-graph
slide 39:
GrafanaLabs: Grafana flame graph (2022) Source: https://grafana.com/docs/grafana/next/panels-visualizations/visualizations/flame-graph
slide 40:
Flame Graph Adoption
Implementations: >80
Related open source projects: >400
Commercial product adoptions: >30
New startups: 4 (so far)
Startup exits: 1 (so far)
Industry investment: >AUD$1B
End users: ? (a lot)
slide 41:
2. CPU PROFILING An Introduction to Flame Graphs
slide 42:
Stack Traces
A code path snapshot. e.g., from jstack(1):
$ jstack 1819
[…]
"main" prio=10 tid=0x00007ff304009000 nid=0x7361 runnable [0x00007ff30d4f9000]
   java.lang.Thread.State: RUNNABLE
    at Func_abc.func_c(Func_abc.java:6)
    at Func_abc.func_b(Func_abc.java:16)
    at Func_abc.func_a(Func_abc.java:23)
    at Func_abc.main(Func_abc.java:27)
Frame annotations: running, parent, g.parent, g.g.parent
slide 43:
CPU Profiling
Record stacks at a timed interval
Pros: Low (deterministic) overhead
Cons: Coarse accuracy, but usually sufficient
stack samples: syscall on-CPU time off-CPU block interrupt
slide 44:
Stack Depth Stack Samples g() g() g() f() f() f() d() d() d() d() e() d() c() i() c() c() c() i() c() b() h() b() b() b() h() b() a() a() a() a() a() a() a() Time
slide 45:
Stack Depth Stack Samples g() g() g() f() f() f() d() d() d() d() e() d() c() i() c() c() c() i() c() b() h() b() b() b() h() b() a() a() a() a() a() a() a() Time
slide 46:
Stack Depth Stack Samples g() g() g() f() f() f() d() d() d() d() e() d() c() i() c() c() c() i() c() b() h() b() b() b() h() b() a() a() a() a() a() a() a() Time
slide 47:
Example Profile (“hair graph”)
slide 48:
Stack Depth Stack Samples: Merged e() g() g() f() f() d() d() d() c() i() c() i() c() b() h() b() h() b() a() Time
slide 49:
Example Profile: Merged
slide 50:
Stack Depth Alphabet Sort g() g() g() e() f() f() f() d() d() d() d() d() c() c() c() c() c() i() i() b() b() b() b() b() h() h() a() a() a() a() a() a() a() Alphabet
slide 51:
Alphabet Merged (“Flame Graph”) Stack Depth g() e() f() d() c() i() b() h() a() Alphabet
slide 52:
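The sort-and-merge steps on the preceding slides can be sketched in a few lines of Python. This is only an illustration: the seven sample stacks are hypothetical (they reuse the a() through i() names from the slides but are not a transcription of the diagram), and `fold_stacks`, `merge_tree`, and `render` are made-up helper names, not part of flamegraph.pl.

```python
from collections import Counter

# Hypothetical stack samples in time order, root frame first.
SAMPLES = [
    ["a", "b", "c", "d", "e"],
    ["a", "h", "i"],
    ["a", "b", "c", "d", "f", "g"],
    ["a", "b", "c", "d", "f", "g"],
    ["a", "h", "i"],
    ["a", "b", "c", "d", "f", "g"],
    ["a", "b", "c", "d"],
]


def fold_stacks(samples):
    """Count identical stacks. The time ordering is deliberately discarded."""
    return Counter(";".join(stack) for stack in samples)


def merge_tree(folded):
    """Merge stacks that share a prefix into one tree.

    Each node's count is the number of samples containing that frame at that
    position, which is what a flame graph draws as the frame's width.
    """
    root = {"count": 0, "children": {}}
    # Sorting mirrors the slide's alphabetical sort; a dict-based tree would
    # merge correctly in any order.
    for stack, count in sorted(folded.items()):
        node = root
        node["count"] += count
        for frame in stack.split(";"):
            node = node["children"].setdefault(
                frame, {"count": 0, "children": {}})
            node["count"] += count
    return root


def render(node, depth=0, lines=None):
    """Return indented text lines, one per merged frame, children sorted by name."""
    lines = [] if lines is None else lines
    for name in sorted(node["children"]):
        child = node["children"][name]
        lines.append(f"{'  ' * depth}{name}() {child['count']}")
        render(child, depth + 1, lines)
    return lines


folded = fold_stacks(SAMPLES)
tree = merge_tree(folded)
print("\n".join(render(tree)))
```

Each printed line is one rectangle of the merged graph, with its sample count standing in for its width.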
Example Profile: Flame Graph
slide 53:
Example Profile: Flame Graph (with code hues)
slide 54:
Replay 1/3: Time Columns
slide 55:
Replay 2/3: Time Merged (aka “Flame Chart”)
slide 56:
Replay 3/3: Flame Graph
slide 57:
Origin (2011): CPU Profiling
# dtrace -x ustackframes=100 -n 'profile-997 /execname == "mysqld"/ { @[ustack()] = count(); } tick-60s { exit(0); }'
[… over 500,000 lines truncated …]
libc.so.1`__priocntlset+0xa
libc.so.1`getparam+0x83
libc.so.1`pthread_getschedparam+0x3c
libc.so.1`pthread_setschedprio+0x1f
mysqld`_Z16dispatch_command19enum_server_commandP3THDPcj+0x9ab
mysqld`_Z10do_commandP3THD+0x198
mysqld`handle_one_connection+0x1a6
libc.so.1`_thrp_setup+0x8d
libc.so.1`_lwp_start
mysqld`_Z13add_to_statusP17system_status_varS0_+0x47
mysqld`_Z22calc_sum_of_all_statusP17system_status_var+0x67
mysqld`_Z16dispatch_command19enum_server_commandP3THDPcj+0x1222
mysqld`_Z10do_commandP3THD+0x198
mysqld`handle_one_connection+0x1a6
libc.so.1`_thrp_setup+0x8d
libc.so.1`_lwp_start
slide 58:
Full output
slide 59:
… as a Flame Graph
slide 60:
Linux example: perf Profiling
# perf record -F 99 -ag -- sleep 30
[ perf record: Woken up 9 times to write data ]
[ perf record: Captured and wrote 2.745 MB perf.data (~119930 samples) ]
# perf report -n --stdio
[…]
# Overhead Samples Command Shared Object Symbol
# ........ ............ ....... ................. .............................
20.42% bash [kernel.kallsyms] [k] xen_hypercall_xen_version
--- xen_hypercall_xen_version
check_events
|--44.13%-- syscall_trace_enter
tracesys
|--35.58%-- __GI___libc_fcntl
|--65.26%-- do_redirection_internal
do_redirections
execute_builtin_or_function
execute_simple_command
[… ~13,000 lines truncated …]
call tree summary
slide 61:
Full perf Output
slide 62:
… as a Flame Graph
slide 63:
Inspiration
Neelakanth Nadgir’s function_call_graph.rb (2007): It was inspired by Roch Bourbonnais’s CallStackAnalyzer, which was inspired by Jan Boerhout’s vftrace. The x-axis is time, and it shows a complete function trace.
Flame graphs are different: The x-axis is the population, and they can show function traces or stack samples.
# more flamegraph.pl
[…]
# This was inspired by Neelakanth Nadgir's excellent function_call_graph.rb
# program, which visualized function entry and return trace events. As Neel
# wrote: "The output displayed is inspired by Roch's CallStackAnalyzer which
# was in turn inspired by the work on vftrace by Jan Boerhout". See:
# https://blogs.oracle.com/realneel/entry/visualizing_callstacks_via_dtrace_and
[…]
Image source: https://blogs.oracle.com/realneel/entry/visualizing_callstacks_via_dtrace_and
slide 64:
Flame Graph Summary
Visualizes a collection of stack traces
– x-axis: population: e.g., alphabetical sort to maximize merging
– y-axis: stack depth
– color: random (default) or a dimension
Original implementation: Perl + SVG + JavaScript https://github.com/brendangregg/FlameGraph
Takes input from many different profilers
References: http://www.brendangregg.com/flamegraphs.html http://queue.acm.org/detail.cfm?id=2927301 "The Flame Graph" CACM, June 2016
slide 65:
Flame Graph Interpretation
g() e() f() d() c() i() b() h() a()
slide 66:
Flame Graph Interpretation (1/4)
Top edge shows who is running on-CPU, and how much (width)
g() e() f() d() c() i() b() h() a()
slide 67:
Flame Graph Interpretation (2/4)
Top-down shows ancestry e.g., from g():
g() e() f() d() c() i() b() h() a()
slide 68:
Flame Graph Interpretation (3/4)
Widths are proportional to presence in samples e.g., comparing b() to h() (incl. children)
g() e() f() d() c() i() b() h() a()
slide 69:
Flame Graph Interpretation (4/4)
Colors randomized to differentiate frames
Or used for code type; e.g.: Kernel Java JVM (C++) C
green == JIT (e.g., Java)
aqua == inlined
red == user-level
orange == kernel
yellow == C++
magenta == search term
slide 70:
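The reading rules on the interpretation slides above boil down to two numbers per function: how often it is the top frame (running on-CPU), and how often it appears anywhere in a stack (its width including children). A minimal sketch, assuming hypothetical folded stacks that reuse the slide's function names; `on_cpu_and_total` is an illustrative helper, not a flamegraph.pl feature:

```python
from collections import Counter

# Hypothetical folded stacks: "root;...;leaf" -> number of samples.
FOLDED = {
    "a;b;c;d;e": 1,
    "a;b;c;d;f;g": 3,
    "a;b;c;d": 1,
    "a;h;i": 2,
}


def on_cpu_and_total(folded):
    """Return (on_cpu, total) sample counts keyed by function name."""
    on_cpu = Counter()  # samples where the function is the leaf (top edge)
    total = Counter()   # samples where the function is anywhere in the stack
    for stack, count in folded.items():
        frames = stack.split(";")
        on_cpu[frames[-1]] += count
        for frame in set(frames):  # a recursive function counts once per sample
            total[frame] += count
    return on_cpu, total


on_cpu, total = on_cpu_and_total(FOLDED)
samples = sum(FOLDED.values())
for name in sorted(total):
    print(f"{name}(): on-CPU {100 * on_cpu[name] / samples:.0f}%, "
          f"incl. children {100 * total[name] / samples:.0f}%")
```

Note that this aggregates by function name, whereas a flame graph draws a separate frame for each distinct ancestry; the two agree for this data because every function has only one call path.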
CPU Flame Graph Tips & Tricks
A) Check sample count (bottom frame): idle system? E.g., 49 Hertz x 30 sec x 16 CPUs == 23,520 samples at 100% CPU utilization.
slide 71:
Icicle Graph with Leaf Merge
"hair"
flamegraph.pl --inverted --reverse
Reveals common functions called from many locations
slide 72:
Flame Graph Interactivity
Essentials:
– Mouse-over for frame info (tool tips, status bar)
– Click to zoom
– Search (Ctrl-F or button)
search button
Nice to have:
– Merge control: root, leaf, middle
– Y-axis direction: flame or icicle
– Flame chart toggle
– Canned searches
– Collapse filters
– Code links
search matches in magenta
slide 73:
Which way up?
stack depth stack depth
My original flamegraph.pl has --inverted for an “icicle graph”
Either way is fine!
Icicle layout helps avoid scrolling when starting at the top
Let the end-user choose
Source: https://learn.microsoft.com/en-us/visualstudio/profiling/flame-graph
slide 74:
Differential Flame Graphs
Hues: Differential
– red == more samples
– blue == less samples
Intensity:
– Degree of difference
Other examples
– flamegraphdiff
This spectrum can show other metrics, like CPI.
more less
Remember to show elided frames!
slide 75:
Poor Man’s Differential Flame Graphs
Toggle between tabs in your browser
- Like searching for Pluto!
Or flip between slides
- Cue exciting demo!
slide 76:
Poor Man’s Differential Flame Graphs
Toggle between tabs in your browser
- Like searching for Pluto!
Or flip between slides
- Cue exciting demo!
slide 77:
System CPU Profilers
• Linux
– perf_events (aka "perf")
– bcc profile (eBPF-based)
• Windows
– XPerf, WPA (now has flame graphs!)
• OS X
– Instruments
• And many others…
Tip: use system profilers whenever possible. Runtime profilers (e.g., Java JVMTI-based) are user space and typically don't include kernel CPU time or kernel stacks.
slide 78:
Linux CPU Flame Graphs
Linux 5.8+ via perf for simplicity (2020):
perf script flamegraph -F 49 -a -- sleep 30
Generates flamegraph.html. One command! Thanks Andreas Gerstmayr.
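The perf command above samples all CPUs at 49 Hertz for 30 seconds, the same rate and duration used in the sample-count tip earlier in this section. A small sketch of that sanity check follows; the 16-CPU figure comes from the slide's example, while the observed sample count and the helper names are hypothetical:

```python
def max_samples(hertz, seconds, cpus):
    """Samples expected if every CPU is busy for the whole profile."""
    return hertz * seconds * cpus


def estimated_utilization(observed, hertz, seconds, cpus):
    """Fraction of the possible on-CPU samples that were actually captured."""
    return observed / max_samples(hertz, seconds, cpus)


limit = max_samples(hertz=49, seconds=30, cpus=16)
print(limit)  # 23520, the figure quoted on the slide

observed = 2352  # hypothetical count read from the bottom frame
print(f"{estimated_utilization(observed, 49, 30, 16):.0%}")
```

A bottom-frame count far below the limit suggests a mostly idle system, in which case the flame graph describes only a small amount of CPU time.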
Linux 4.9+ via eBPF for efficiency (2016):
apt-get install bpfcc-tools
git clone https://github.com/brendangregg/FlameGraph
profile-bcc.py -dF 49 30 | ./FlameGraph/flamegraph.pl > perf.svg
eBPF (no longer an acronym) is the name of an in-kernel execution environment, used in this case for aggregating stack samples in kernel context
Most efficient: no perf.data file, summarizes in-kernel
Some runtimes (e.g., JVM) require extra steps for stacks & symbols (next section)
slide 79:
Older Linux CPU Flame Graphs
Linux 2.6+ via perf.data and perf script (2009):
git clone https://github.com/brendangregg/FlameGraph; cd FlameGraph
perf record -F 49 -a -g -- sleep 30
perf script | ./stackcollapse-perf.pl | ./flamegraph.pl > perf.svg
Linux 4.5 can use folded output (2016):
Skips the CPU-costly stackcollapse-perf.pl step; see: http://www.brendangregg.com/blog/2016-04-30/linux-perf-folded.html
slide 80:
Linux Profiling Optimizations
Linux 2.6: capture stacks → perf record → write samples → perf.data → reads samples → perf script → write text → stackcollapse-perf.pl → folded output → flamegraph.pl
Linux 4.5: capture stacks → perf record → write samples → perf.data → reads samples → perf report -g folded → folded report → awk → folded output → flamegraph.pl
Linux 4.9: count stacks (BPF) → profile.py → folded output → flamegraph.pl
slide 81:
GUI Automation
There are many options nowadays. I’ve worked on five:
Netflix Vector (now retired!):
Netflix FlameScope (covered later)
Netflix FlameCommander (continuous profiling; not open source yet)
I’m now helping with Intel vTune and Intel gProfiler
Open source examples include Granulate gProfiler, Eclipse TraceCompass, Grafana flame graphs, Firefox profiler, and more (see implementation slides).
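For readers building their own tooling, it helps to see roughly what the stackcollapse-perf.pl step in the older pipeline does: it turns multi-line stack samples into one "folded" line per unique stack with a count. The sketch below is a simplified stand-in written in Python, not the real script. It assumes a perf-script-like layout (an unindented header line starting with the command name, indented "address symbol (object)" frames listed leaf first, and a blank line between samples); the input text is hypothetical, and symbols containing spaces are not handled.

```python
from collections import Counter

# Hypothetical input in a perf-script-like layout (addresses are made up).
PERF_SCRIPT = """\
bash 1234 cpu-clock:
        ffffffff8100 check_events ([kernel.kallsyms])
        ffffffff8200 syscall_trace_enter ([kernel.kallsyms])
        7f0000001000 main (/bin/bash)

bash 1234 cpu-clock:
        7f0000002000 execute_simple_command (/bin/bash)
        7f0000001000 main (/bin/bash)

bash 1234 cpu-clock:
        7f0000002000 execute_simple_command (/bin/bash)
        7f0000001000 main (/bin/bash)
"""


def collapse(text):
    """Fold samples into 'comm;root;...;leaf' -> count."""
    folded = Counter()
    comm, frames = None, []
    # The extra empty string acts as a final blank line so the last sample is flushed.
    for line in text.splitlines() + [""]:
        if line.strip() and line[0].isspace():
            # Indented frame line: "address symbol (object)".
            parts = line.split()
            symbol = parts[1] if len(parts) > 1 else "[unknown]"
            frames.append(symbol.split("+0x")[0])  # drop any "+0x..." offset
        else:
            # A header or blank line closes the sample collected so far.
            if comm is not None and frames:
                # Frames arrive leaf first; folded output is root first.
                folded[";".join([comm] + frames[::-1])] += 1
            comm = line.split()[0] if line.strip() else None
            frames = []
    return folded


folded = collapse(PERF_SCRIPT)
for stack, count in sorted(folded.items()):
    print(stack, count)
```

The printed "stack count" lines are in the same general shape that flamegraph.pl reads, which is why the later pipelines can skip this step when the profiler emits folded output directly.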
Build your own!
slide 82:
Flame Charts (2013) Inspired by flame graphs: https://bugs.webkit.org/show_bug.cgi?id=111162
slide 83:
Chrome DevTools Flame Charts (2022)
slide 84:
Firefox Profiler Flame Graph (2022) flame graph flame chart
slide 85:
Flame Charts x-axis: time
Flame Graphs x-axis: population alphabet sort or another frame merging algorithm
slide 86:
3. STACKS AND SYMBOLS And Other Issues
slide 87:
Broken Stack Traces are Common
# perf record -F 99 -a -g -- sleep 30
# perf script
[…]
java 4579 cpu-clock:
7f417908c10b [unknown] (/tmp/perf-4458.map)
java 4579 cpu-clock:
7f41792fc65f [unknown] (/tmp/perf-4458.map)
a2d53351ff7da603 [unknown] ([unknown])
[…]
should probably have more frames
slide 88:
… as a Flame Graph broken java stacks “grass”
slide 89:
Fixing Stack Walking
A. Frame pointer-based
Fix by disabling that compiler optimization: gcc's -fno-omit-frame-pointer
Pros: simple, supported by many tools
Cons: might cost a little extra CPU (usually
slide 90:
Fixing Java Stack Traces
# perf script
[…]
java 4579 cpu-clock:
7f417908c10b [unknown] (/tmp/…
java 4579 cpu-clock:
7f41792fc65f [unknown] (/tmp/…
a2d53351ff7da603 [unknown] ([unkn…
[…]
I prototyped JVM frame pointers.
Oracle rewrote it and included it in Java as -XX:+PreserveFramePointer (JDK 8 u60b19)
# perf script
[…]
java 8131 cpu-clock:
7fff76f2dce1 [unknown] ([vdso])
7fd3173f7a93 os::javaTimeMillis() (/usr/lib/jvm…
7fd301861e46 [unknown] (/tmp/perf-8131.map)
7fd30184def8 [unknown] (/tmp/perf-8131.map)
7fd30174f544 [unknown] (/tmp/perf-8131.map)
7fd30175d3a8 [unknown] (/tmp/perf-8131.map)
7fd30166d51c [unknown] (/tmp/perf-8131.map)
7fd301750f34 [unknown] (/tmp/perf-8131.map)
7fd3016c2280 [unknown] (/tmp/perf-8131.map)
7fd301b02ec0 [unknown] (/tmp/perf-8131.map)
7fd3016f9888 [unknown] (/tmp/perf-8131.map)
7fd3016ece04 [unknown] (/tmp/perf-8131.map)
7fd30177783c [unknown] (/tmp/perf-8131.map)
7fd301600aa8 [unknown] (/tmp/perf-8131.map)
7fd301a4484c [unknown] (/tmp/perf-8131.map)
7fd3010072e0 [unknown] (/tmp/perf-8131.map)
7fd301007325 [unknown] (/tmp/perf-8131.map)
7fd301007325 [unknown] (/tmp/perf-8131.map)
7fd3010004e7 [unknown] (/tmp/perf-8131.map)
7fd3171df76a JavaCalls::call_helper(JavaValue*,…
7fd3171dce44 JavaCalls::call_virtual(JavaValue*…
7fd3171dd43a JavaCalls::call_virtual(JavaValue*…
7fd31721b6ce thread_entry(JavaThread*, Thread*)…
7fd3175389e0 JavaThread::thread_main_inner() (/…
7fd317538cb2 JavaThread::run() (/usr/lib/jvm/nf…
7fd3173f6f52 java_start(Thread*) (/usr/lib/jvm/…
7fd317a7e182 start_thread (/lib/x86_64-linux-gn…
slide 91:
Fixed Stacks Flame Graph Java stacks (but no symbols, yet)
slide 92:
Fixing Native Symbols
A. Add a -dbgsym package, if available
B. Recompile from source
slide 93:
Fixing JIT Symbols (Java, Node.js, …)
Just-in-time runtimes don't have a pre-compiled symbol table
So Linux perf looks for an externally provided symbol file
# perf script
Failed to open /tmp/perf-8131.map, continuing without symbols
[…]
java 8131 cpu-clock:
7fff76f2dce1 [unknown] ([vdso])
7fd3173f7a93 os::javaTimeMillis() (/usr/lib/jvm…
7fd301861e46 [unknown] (/tmp/perf-8131.map)
[…]
This can be created by runtimes; e.g., Java's perf-map-agent
Not the only solution; can also integrate with JIT-based walkers, or have an external symbol translator (perf script, or eBPF-based).
slide 94:
Fixed Symbols (zoom)
slide 95:
2014: Java Profiling (broken stacks) Java Profilers System Profilers
slide 96:
2018: Java Profiling (fixed stacks) Kernel Java JVM CPU Mixed-mode Flame Graph
slide 97:
Mixed-Mode Case Study Exception handling consuming CPU
slide 98:
Other Issues
• JIT Symbol churn
Take before and after snapshots, or use perf’s timestamped symbol logs.
• Containers
Are symbol files read from the right namespace? Should now work.
• Stack Depth limits
Linux perf had a 127 frame limit, now tunable. Thanks Arnaldo Carvalho de Melo!
broken stacks A Java microservice with a stack depth of > 900 perf_event_max_stack=1024
slide 99:
Inlining
• Many frames may be missing (inlined)
Flame graph may still make enough sense
• Inlining can often be tuned
e.g. Java's -XX:-Inline to disable, but can be 80% slower
Java's -XX:MaxInlineSize and -XX:InlineSmallCode can be tuned a little to reveal more frames: can even improve performance!
• Runtimes can un-inline on demand
So that exception stack traces make sense
e.g. Java's perf-map-agent can un-inline (unfoldall option)
slide 100:
Language/Runtime Issues
Each may have special stack/symbol instructions
Java, Node.js, Python, Ruby, C++, Go, …
See: https://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html
Check if flame graphs are already in the “official” profiler
Try an Internet search
slide 101:
4. ADVANCED FLAME GRAPHS
slide 102:
Flame graphs can visualize any stack trace collection
On Linux, stacks from any of these events:
slide 103:
Page Faults
Show what triggered main memory (resident) to grow:
# perf record -e page-faults -p PID -g -- sleep 120
"fault" as (physical) main memory is allocated on-demand, when a virtual page is first populated
Low overhead tool to solve some types of memory leak
RES column in top(1) grows because
slide 104:
Other Memory Sources http://www.brendangregg.com/FlameGraphs/memoryflamegraphs.html
slide 105:
Disk I/O Requests
Shows who issued disk I/O (sync reads & writes):
# perf record -e block:block_rq_insert -a -g -- sleep 60
GC? This JVM has swapped out!
slide 106:
Context Switches
Show why Java blocked and stopped running on-CPU:
Identifies locks, I/O, sleeps
If code path shouldn't block and looks random, it's an involuntary context switch. I often filter these, but I’ve usually already solved this type of issue (CPU load) long before trying advanced flame graphs.
E.g., analyzing framework differences: epoll sys_poll futex futex rxNetty Tomcat
slide 107:
TCP Events
TCP transmit, using eBPF:
# bpftrace -e 'kprobe:tcp_sendmsg { @[kstack, ustack] = count(); }'
For eBPF, can cost noticeable overhead for high packet rates (test and measure)
For perf, can have prohibitive overhead due to the trace, dump, post-process cycle
Note that TCP receive is async, so stack traces are meaningless. Trace socket read instead.
Can also trace TCP connect, accept
Lower frequency, therefore lower overhead
TCP sends
slide 108:
CPU Cache Misses
In this example, sampling via Last Level Cache loads:
# perf record -e LLC-loads -c 10000 -a -g -- sleep 5; jmaps
# perf script -f comm,pid,tid,cpu,time,event,ip,sym,dso > out.stacks
-c is the count (samples once per count)
Can also sample hits, misses, stalls
Needs PEBS for IP accuracy
Precise Event Based Sampling for Instruction Pointer accuracy.
Not yet enabled in AWS EC2 VMs.
slide 109:
CPI Flame Graph
Cycles Per Instruction (CPI)
– red == instruction heavy
– blue == cycle heavy (likely memory stall cycles)
zoomed:
slide 110:
Off-CPU Analysis
Off-CPU analysis is the study of blocking states, or the code-path (stack trace) that led to them
slide 111:
Off-CPU Time Flame Graph Off-CPU time More info http://www.brendangregg.com/blog/2016-02-01/linux-wakeup-offwake-profiling.html
slide 112:
Off-CPU Time (zoomed): tar(1) directory read from disk Only showing kernel stacks in this example file read from disk fstat from disk path read from disk pipe write
slide 113:
CPU + Off-CPU Flame Graphs: See Everything CPU Off-CPU Everything (All thread time)
slide 114:
Off-Wake Time Flame Graph Waker stack(s) Wokeup Blocked stack Uses Linux enhanced BPF to merge off-CPU and waker stack in kernel context
slide 115:
Chain Graphs Waker stack(s) Walking the chain of wakeup stacks to reach root cause
slide 116:
FlameScope
Flame graphs can hide time-based issues of variation and perturbations. FlameScope uses subsecond-offset heat maps to show these issues. They can then be selected for the corresponding flame graph.
https://brendangregg.com/blog/2018-12-15/flamescope-origin.html
https://www.brendangregg.com/HeatMaps/subsecondoffset.html
slide 117:
FlameScope Example
How many patterns can you see?
https://www.brendangregg.com/blog/2018-11-08/flamescope-pattern-recognition.html
slide 118:
Agenda Recap
1. Implementations
2. CPU Flame graphs
3. Stacks & Symbols
4. Advanced flame graphs
slide 119:
Take Aways
1. Interpret CPU flame graphs
2. Understand runtime challenges
3. Why eBPF for advanced flame graphs
A new tool to lower your cost, latency, and carbon
slide 120:
Links & References
Flame Graphs
"The Flame Graph" Communications of the ACM, Vol. 59, No. 6 (June 2016)
http://queue.acm.org/detail.cfm?id=2927301
http://www.brendangregg.com/flamegraphs.html
http://www.brendangregg.com/flamegraphs.html#Updates
http://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html
http://www.brendangregg.com/FlameGraphs/memoryflamegraphs.html
http://www.brendangregg.com/FlameGraphs/offcpuflamegraphs.html
http://techblog.netflix.com/2015/07/java-in-flames.html
http://techblog.netflix.com/2016/04/saving-13-million-computational-minutes.html
http://www.brendangregg.com/blog/2014-11-09/differential-flame-graphs.html
http://www.brendangregg.com/blog/2016-01-20/ebpf-offcpu-flame-graph.html
http://www.brendangregg.com/blog/2016-02-01/linux-wakeup-offwake-profiling.html
http://www.brendangregg.com/blog/2016-02-05/ebpf-chaingraph-prototype.html
https://brendangregg.com/blog/2018-12-15/flamescope-origin.html
https://github.com/brendangregg/FlameGraph
https://github.com/spiermar/d3-flame-graph
https://github.com/Netflix/flamescope
http://corpaul.github.io/flamegraphdiff/
https://www.intel.com/content/www/us/en/develop/documentation/vtune-help/top/reference/user-interface-reference/window-flame-graph.html
https://gprofiler.io/
Linux perf
https://perf.wiki.kernel.org/index.php/Main_Page
http://www.brendangregg.com/perf.html
Linux eBPF
https://ebpf.io/
https://www.brendangregg.com/ebpf.html
These slides: https://www.brendangregg.com/Slides/YOW2022_flame_graphs.pdf
slide 121:
YOW! 2022 Thank you! http://www.brendangregg.com brendan@intel.com @brendangregg Questions?
slide 122:
BONUS SLIDES
slide 123:
More Implementations
These are in addition to the earlier examples.
(Note: This is not an endorsement of any company/product or sponsored by anyone.)
slide 124:
Java: SPF4J (2012) Source: http://zolyfarkas.github.io/spf4j/#
slide 125:
OSX: Instruments (2012; converter) Source: https://schani.wordpress.com/2012/11/16/flame-graphs-for-instruments/
slide 126:
Ruby: mini-profiler (2013) Source: https://samsaffron.com/archive/2013/03/19/flame-graphs-in-ruby-miniprofiler
slide 127:
Julia: ProfileView.jl (2013) Source: https://github.com/timholy/ProfileView.jl (Tim Holy)
slide 128:
Windows: Xperf (2013; converter) Source: https://randomascii.wordpress.com/2013/03/26/summarizing-xperf-cpu-usage-with-flame-graphs/ (Bruce Dawson)
slide 129:
Perl: NYTProf (2013) Source: https://blog.timbunce.org/2013/04/08/nytprof-v5-flaming-precision/ (Tim Bunce)
slide 130:
Erlang: Eflame (2013) Source: https://github.com/proger/eflame (Volodymyr Ky)
slide 131:
Ruby: ruby-prof-flamegraph (2014) Source: https://github.com/oozou/ruby-prof-flamegraph
slide 132:
Node.js: flamegraph (2015) Source: https://github.com/thlorenz/flamegraph (Thorsten Lorenz)
slide 133:
Haskell: ghc-prof-flamegraph (2015) Source: https://www.fpcomplete.com/blog/2015/04/ghc-prof-flamegraph/ (Francesco Mazzoli)
slide 134:
Differentials: Flamegraphdiff (2015) Source: http://corpaul.github.io/flamegraphdiff/ (Cor-Paul Bezemer)
slide 135:
Java: jfr-flame-graph (2015) Source: http://isuru-perera.blogspot.com/2015/05/flame-graphs-with-java-flight-recordings.html (M. Isuru Tharanga Chrishantha Perera)
slide 136:
Clojure: Flames (2015) Source: https://github.com/jstepien/flames/ (Jan Stępień)
slide 137:
Python: python-flamegraph (2015) Source: https://github.com/evanhempel/python-flamegraph (Evan Hempel)
slide 138:
Strongloop: Arc (2015) Source: https://es.slideshare.net/jguerrero999/nodejs-transaction-tracing-root-cause-analysis-withstrongloop-arc
slide 139:
Java: perfj (2015) Source: https://github.com/coderplay/perfj (Min Zhou)
slide 140:
Golang: Uber go-torch (2015) Source: https://github.com/uber-archive/go-torch
slide 141:
Intel: processor trace converter (2015) Source: http://halobates.de/blog/p/329 (Andi Kleen)
slide 142:
Nylas: perftools (2015) Source: https://www.nylas.com/blog/performance/ (code by Eben Freeman)
slide 143:
Django: djdt-flamegraph (2015) Source: https://github.com/blopker/djdt-flamegraph (Bo Lopker)
slide 144:
NodeSource: Nsolid (Node.js; 2015) Source: https://nodesource.com/blog/understanding-cpu-flame-graphs
slide 145:
D3: d3-flame-graphs (2015) Source: https://cimi.io/d3-flame-graphs/ (Alex Ciminian)
slide 146:
Golang: Goprofui (2015) Source: https://github.com/wirelessregistry/goprofui (Srdjan Marinovic, Julia Allyce)
slide 147:
Rust: flame (2016) Source: https://github.com/llogiq/flame (Ty Overby)
slide 148:
Dell Cloud Manager: Gumshoe Load Investigator (2016) "This haystack is looking more like a needle every minute" -- source: https://youtu.be/GGJFZfwXJ44?t=225 Source: https://github.com/worstcase/gumshoe (Jonathan Newbrough)
slide 149:
Uber: pyflame (Python; 2016) Source: https://www.uber.com/en-AU/blog/pyflame-python-profiler/
slide 150:
Android: erlang-atrace-flamegraphs (2017) Source: https://blog.rhye.org/post/android-profiling-flamegraphs/ (Ross Schlaikjer)
slide 151:
Java: grav (heap allocations; 2017) Source: https://epickrram.blogspot.com/2017/09/heap-allocation-flamegraphs.html (Mark Price)
slide 152:
Nudge: APM (for Java; 2017) Source: https://nudge-apm.com/features/#profiling
slide 153:
Java: clj-async-profiler (2017) Source: http://clojure-goes-fast.com/blog/profiling-tool-async-profiler/ (Alexander Yakushev)
slide 154:
.NET: codetrack (2017) Source: https://www.getcodetrack.com/
slide 155:
Node.js: Flamebearer (2018) Source: https://github.com/mapbox/flamebearer (Volodymyr Agafonkin)
slide 156:
Opsian: always-on flame graphs (2018) Source: https://www.opsian.com/blog/always-on-production-flame-graphs/
slide 157:
Speedscope: left heavy view (2018) Source: https://jamie-wong.com/post/speedscope/ (Jamie Wong)
slide 158:
AppDynamics: flame graph (2018; now Cisco) Source: https://docs.appdynamics.com/appd/20.x/en/application-monitoring/troubleshooting-applications/event-loop-blocking-in-node-js#EventLoopBlockinginNode.js-FlameGraph
slide 159:
Inferno: flame graph (Rust port; 2019) Source: https://github.com/jonhoo/inferno (Jon Gjengset)
slide 160:
SAP: HANA Dump Analyzer (2019) Source: https://blogs.sap.com/2019/04/22/visualizing-olap-requests-on-sap-hana-system-with-concurrencyflame-graph-using-sap-hana-dump-analyzer/
slide 161:
Backtrace: flame graph (2019) Source: https://support.backtrace.io/hc/en-us/articles/360040515971-Flame-graphs
slide 162:
Instana: flame graph (2020; now IBM) Source: https://www.ibm.com/docs/en/instana-observability/current?topic=processes-analyzing-profiles
slide 163:
ej-technologies: JProfiler Flame Graph (for Java; 2020) Source: https://www.ej-technologies.com/resources/jprofiler/help/doc/main/cpu.html
slide 164:
Samsung: QA-Board (2020) Source: https://samsung.github.io/qaboard/blog/2020/06/24/flame-graphs
slide 165:
Microsoft Visual Studio: vscode-js-profile-flame (for JavaScript; 2020) Left Heavy view Source: https://marketplace.visualstudio.com/items?itemName=ms-vscode.vscode-js-profile-flame
slide 166:
Pyroscope: flame graph (2020) Source: https://pyroscope.io/blog/what-is-a-flamegraph/
slide 167:
Uber: pprof++ (2021; for Golang) Source: https://www.uber.com/en-AU/blog/pprof-go-profiler/ (Pengfei Su)
slide 168:
Lightstep: flame graph (2021) Source: https://www.instana.com/blog/instana-announces-the-industrys-first-commercial-continuousproduction-profiler/
slide 169:
Dynatrace: allocation flame graph (2021) Source: https://www.dynatrace.com/support/help/how-to-use-dynatrace/diagnostics/memory-profiling
slide 170:
Pixie Labs: pod performance flamegraph (2021) Source: https://docs.pixielabs.ai/tutorials/pixie-101/profiler/
slide 171:
Apache Flink: flame graphs (2021) off-CPU Source: https://nightlies.apache.org/flink/flink-docs-master/docs/ops/debugging/flame_graphs/
slide 172:
Embrace: Application-Not-Responding flame graph (2021) Source: https://blog.embrace.io/solve-anrs-with-flame-graphs/
slide 173:
Polar Signals: parca Continuous Profiling (2021) Source: https://www.polarsignals.com/blog/posts/2022/08/30/optimizing-with-continuous-profiling/
slide 174:
Dockyard: Flame On (for Elixir apps; 2022) Source: https://dockyard.com/blog/2022/02/22/profiling-elixir-applications-with-flame-graphs-and-flame-on (Mike Binns)
slide 175:
OpenResty: Xray (2022) Source: https://openresty.com/en/xray (Yichun Zhang)
slide 176:
Elastic: universal profiling (2022) Source: https://www.elastic.co/observability/universal-profiling
slide 177:
… and more (Dec 2022) Thanks for all the open source contributions!