YOW! 2022: Visualizing Performance: The Developer's Guide to Flame Graphs

Talk by Brendan Gregg for YOW! 2022.

Description: "Flame graphs are a visualization that helps developers easily find performance bottlenecks to cut computing costs and improve end-user experience. They can be used to answer many questions, including how software is consuming resources, especially CPUs, and how that consumption has changed since the last software version. Flame graphs are now a standard for CPU profiling and have been adopted in many programming languages and observability products, and are the basis for multiple startups. They were defined in "The Flame Graph" in the Communications of the ACM, by their creator, Brendan Gregg.

This talk covers the origins of flame graphs, how you can create them using open source software, and how to interpret them. In practice, flame graphs don't always work completely due to problems walking stack traces, resolving symbols, and other issues; this talk explains the problems and shows you the latest techniques for fixing them.

Flame graphs are a tool for a bigger mission: To understand the performance of everything, all software and hardware. Advanced types of flame graphs that help further this goal will be explained, including differential, off-CPU, memory, disk, and network events. Many of these advanced flame graph types require newer kernel technologies to make practical, especially extended BPF (eBPF), and will see adoption in the years ahead."

PDF: YOW2022_flame_graphs.pdf

Keywords (from pdftotext):

slide 1:
    YOW! 2022
    Visualizing Performance
    The Developer’s Guide to Flame Graphs
    Brendan Gregg
    Intel Fellow
    Dec 2022
    
slide 2:
    Statement from the heart
    I’d like to begin by acknowledging the Traditional Owners of this land and pay my
    respects to Elders past and present.
    
slide 3:
    My Dream
    To Completely Understand
    the Performance of Everything
    
slide 4:
    Flame Graphs
    Kernel
    A visualization of software
    Can also visualize CPU and
    other resource usage
    Now a staple in performance
    engineering
    Java
    User-level
    
slide 5:
    Agenda
    1. Implementations
    2. CPU Flame graphs
    3. Stacks & Symbols
    4. Advanced flame graphs
    
slide 6:
    Take Aways
    1. Interpret CPU flame graphs
    2. Understand runtime challenges
    3. Why eBPF for advanced flame graphs
    A new tool to lower your cost, latency, and carbon
    Slides online:
    https://www.brendangregg.com/Slides/YOW2022_flame_graphs.pdf
    
slide 7:
    1. IMPLEMENTATIONS
    
slide 8:
    Quick Tour of Some Examples
    More examples in later “bonus slides” section.
    (Note: This is not an endorsement of any company/product or sponsored by anyone.)
    
slide 9:
    My original flamegraph.pl (2011; using Perl/SVG/JavaScript)
    https://github.com/brendangregg/FlameGraph
    
slide 10:
    Martin Spier d3-flame-graph (my colleague at Netflix; 2015; D3)
    Source: https://github.com/spiermar/d3-flame-graph https://martinspier.io/
    
slide 11:
    Facebook: Strobelight (2014)
    Source: https://tracingsummit.org/ts/2014/files/TracingSummit2014-Tracing-at-Facebook-Scale.pdf
    
slide 12:
    Node.js: 0x (2016)
    Source: https://github.com/davidmarkclements/0x (David Mark Clements)
    
slide 13:
    Qt: Creator (2016)
    Source: https://www.qt.io/blog/2016/05/11/qt-creator-4-0-0-released
    
slide 14:
    Python: vprof (2016)
    Source: https://github.com/nvdv/vprof (Nick Volynets)
    
slide 15:
    Microsoft: WPA / ETW (2016)
    Source: https://learn.microsoft.com/en-us/windows-hardware/test/wpt/graphs#flame_graphs
    
slide 16:
    LinkedIn: ODP (2017)
    Source: https://engineering.linkedin.com/blog/2017/01/odp--an-infrastructure-for-on-demand-service-profiling
    
slide 17:
    Oracle: Developer Studio Performance Analyzer (2017)
    Source: https://www.oracle.com/technetwork/server-storage/solarisstudio/documentation/o11-151-perf-analyzer-brief-1405338.pdf
    
slide 18:
    Windows: PerfView (2017)
    Source: https://github.com/Microsoft/perfview/pull/440 (Adam Sitnik)
    
slide 19:
    Google: pprof (2017)
    Source: https://github.com/google/pprof/pull/188 (Martin Spier)
    
slide 20:
    Linux: hotspot (2017)
    Source: https://github.com/KDAB/hotspot (Milian Wolff)
    
slide 21:
    Eclipse Foundation: TraceCompass (2018)
    Source: https://www.eclipse.org/tracecompass/index.html
    
slide 22:
    Java: Java Mission Control (2018)
    Source: https://github.com/thegreystone/jmc-flame-view (Marcus Hirt)
    
slide 23:
    Netflix: FlameScope (2018)
    Source: https://netflixtechblog.com/netflix-flamescope-a57ca19d47bb (Brendan Gregg, Martin Spier)
    
slide 24:
    Netflix: FlameCommander (2019)
    Source: https://www.youtube.com/watch?v=L58GrWcrD00 (Martin Spier, Jason Koch, Susie Xia, Brendan Gregg)
    
slide 25:
    AMD: uProf (2019)
    Source: https://developer.amd.com/amd-uprof/?sf215410082=1
    
slide 26:
    Java: YourKit (2019)
    Source: https://www.yourkit.com/docs/java/help/cpu_flame_graph.jsp
    
slide 27:
    Java: IntelliJ IDEA (2019)
    Source: https://blog.jetbrains.com/idea/2019/06/intellij-idea-2019-2-eap-4-profiling-tools-structural-search-preview-and-more/
    
slide 28:
    Firefox: Profiler (2019)
    Source: https://profiler.firefox.com
    
slide 29:
    Linux: perf script flamegraph (2020)
    Source: https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/monitoring_and_managing_system_status_and_performance/getting-started-with-flamegraphs_monitoring-and-managing-system-status-and-performance (Andreas Gerstmayr)
    
slide 30:
    MathWorks: MATLAB Profiler (2020)
    Source: https://www.mathworks.com/help/matlab/matlab_prog/profiling-for-improving-performance.html
    
slide 31:
    AWS: CodeGuru (2020)
    Source: https://aws.amazon.com/codeguru/features/
    
slide 32:
    Google: Cloud Profiler (2020)
    Source: https://cloud.google.com/profiler/docs/focusing-profiles
    
slide 33:
    Intel: vTune (2021)
    Source: https://www.intel.com/content/www/us/en/develop/documentation/vtune-help/top/reference/user-interface-reference/window-flame-graph.html
    
slide 34:
    Splunk: AlwaysOn Profiling flame graph (2021)
    Source: https://docs.splunk.com/Observability/apm/profiling/using-the-flamegraph.html
    
slide 35:
    New Relic: flame graphs (2021)
    Source: https://docs.newrelic.com/whats-new/2021/07/whats-new-july-8-realtime-profiling-java/
    
slide 36:
    DataDog: profiling flame graph (2021)
    Source: https://www.datadoghq.com/knowledge-center/distributed-tracing/flame-graph/
    
slide 37:
    Granulate: gprofiler (2022; now Intel)
    Source: https://docs.gprofiler.io/about-gprofiler/gprofiler-features/views/flame-graph
    
slide 38:
    Microsoft Visual Studio: Flame Graph (2022)
    Source: https://learn.microsoft.com/en-us/visualstudio/profiling/flame-graph
    
slide 39:
    GrafanaLabs: Grafana flame graph (2022)
    Source: https://grafana.com/docs/grafana/next/panels-visualizations/visualizations/flame-graph
    
slide 40:
    Flame Graph Adoption
    Implementations: >80
    Related open source projects: >400
    Commercial product adoptions: >30
    New startups: 4 (so far)
    Startup exits: 1 (so far)
    Industry investment: >AUD$1B
    End users: ? (a lot)
    
slide 41:
    2. CPU PROFILING
    An Introduction to Flame Graphs
    
slide 42:
    Stack Traces
    A code path snapshot. e.g., from jstack(1):
    $ jstack 1819
    […]
    "main" prio=10 tid=0x00007ff304009000
    nid=0x7361 runnable [0x00007ff30d4f9000]
    java.lang.Thread.State: RUNNABLE
    at Func_abc.func_c(Func_abc.java:6)
    at Func_abc.func_b(Func_abc.java:16)
    at Func_abc.func_a(Func_abc.java:23)
    at Func_abc.main(Func_abc.java:27)
    running
    parent
    g.parent
    g.g.parent
    
slide 43:
    CPU Profiling
    Record stacks at a timed interval
    Pros: Low (deterministic) overhead
    Cons: Coarse accuracy, but usually sufficient
    stack
    samples:
    syscall
    on-CPU
    time
    off-CPU
    block
    interrupt
    
slide 44:
    Stack Samples
    [Figure: seven stack samples drawn as columns over time. y-axis: Stack Depth; x-axis: Time. Frames a() through i(), with a() at the base of every sample]
    
slide 45:
    Stack Samples
    [Figure: the stack-sample columns from slide 44 again. y-axis: Stack Depth; x-axis: Time]
    
slide 46:
    Stack Samples
    [Figure: the stack-sample columns from slide 44 again. y-axis: Stack Depth; x-axis: Time]
    
slide 47:
    Example Profile (“hair graph”)
    
slide 48:
    Stack Samples: Merged
    [Figure: identical frames in adjacent samples merged into wider boxes, keeping time order. y-axis: Stack Depth; x-axis: Time]
    
slide 49:
    Example Profile: Merged
    
slide 50:
    Alphabet Sort
    [Figure: the seven stack samples with their columns sorted alphabetically instead of by time. y-axis: Stack Depth; x-axis: Alphabet]
    
slide 51:
    Alphabet Merged (“Flame Graph”)
    [Figure: the sorted columns merged into a flame graph. y-axis: Stack Depth; x-axis: Alphabet. Frames shown: a(), b(), c(), d(), e(), f(), g(), h(), i()]
    
slide 52:
    Example Profile: Flame Graph
    
slide 53:
    Example Profile: Flame Graph (with code hues)
    
slide 54:
    Replay 1/3: Time Columns
    
slide 55:
    Replay 2/3: Time Merged (aka “Flame Chart”)
    
slide 56:
    Replay 3/3: Flame Graph
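The steps replayed above (collect stacks, sort, merge) can be sketched in a few lines of Python. This is a minimal illustration and not how flamegraph.pl is implemented: the sample data is hypothetical (loosely modelled on the a() to i() example), and the names `fold_stacks`, `build_tree`, and `render` are invented for this sketch. The "folded" form used here is one line per unique stack, frames joined by semicolons, root first.

```python
from collections import Counter

# Hypothetical stack samples in time order, root frame first.
SAMPLES = [
    ["a", "b", "c", "d", "f", "g"],
    ["a", "h", "i"],
    ["a", "b", "c", "d", "f", "g"],
    ["a", "b", "c", "d", "e"],
    ["a", "b", "c", "d"],
    ["a", "h", "i"],
    ["a", "b", "c", "d", "f", "g"],
]


def fold_stacks(samples):
    """Count identical stacks; keys are 'root;...;leaf' folded strings."""
    return Counter(";".join(stack) for stack in samples)


def build_tree(folded):
    """Merge folded stacks into nested dicts of {"value", "children"}."""
    root = {"value": 0, "children": {}}
    for stack, count in folded.items():
        node = root
        node["value"] += count
        for frame in stack.split(";"):
            node = node["children"].setdefault(frame, {"value": 0, "children": {}})
            node["value"] += count
    return root


def render(node, name="all", depth=0, lines=None):
    """Return one text line per frame; siblings are visited alphabetically."""
    lines = [] if lines is None else lines
    lines.append(f"{'  ' * depth}{name} ({node['value']})")
    for child in sorted(node["children"]):
        render(node["children"][child], child, depth + 1, lines)
    return lines


if __name__ == "__main__":
    print("\n".join(render(build_tree(fold_stacks(SAMPLES)))))
```

Each node's value is the number of samples that contain that frame at that position, which is what a flame graph draws as the box width. The time order of the samples is discarded by design.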
    
slide 57:
    Origin (2011): CPU Profiling
    # dtrace -x ustackframes=100 -n 'profile-997 /execname == "mysqld"/ {
    @[ustack()] = count(); } tick-60s { exit(0); }'
    [… over 500,000 lines truncated …]
    libc.so.1`__priocntlset+0xa
    libc.so.1`getparam+0x83
    libc.so.1`pthread_getschedparam+0x3c
    libc.so.1`pthread_setschedprio+0x1f
    mysqld`_Z16dispatch_command19enum_server_commandP3THDPcj+0x9ab
    mysqld`_Z10do_commandP3THD+0x198
    mysqld`handle_one_connection+0x1a6
    libc.so.1`_thrp_setup+0x8d
    libc.so.1`_lwp_start
    mysqld`_Z13add_to_statusP17system_status_varS0_+0x47
    mysqld`_Z22calc_sum_of_all_statusP17system_status_var+0x67
    mysqld`_Z16dispatch_command19enum_server_commandP3THDPcj+0x1222
    mysqld`_Z10do_commandP3THD+0x198
    mysqld`handle_one_connection+0x1a6
    libc.so.1`_thrp_setup+0x8d
    libc.so.1`_lwp_start
    
slide 58:
    Full output
    
slide 59:
    … as a Flame Graph
    
slide 60:
    Linux example: perf Profiling
    # perf record -F 99 -ag -- sleep 30
    [ perf record: Woken up 9 times to write data ]
    [ perf record: Captured and wrote 2.745 MB perf.data (~119930 samples) ]
    # perf report -n --stdio
    […]
    # Overhead  Samples  Command  Shared Object  Symbol
    # ........ ............ ....... ................. .............................
    20.42% bash [kernel.kallsyms] [k] xen_hypercall_xen_version
    --- xen_hypercall_xen_version
    check_events
    |--44.13%-- syscall_trace_enter
    tracesys
    |--35.58%-- __GI___libc_fcntl
    |--65.26%-- do_redirection_internal
    do_redirections
    execute_builtin_or_function
    execute_simple_command
    [… ~13,000 lines truncated …]
    call tree
    summary
    
slide 61:
    Full perf Output
    
slide 62:
    … as a Flame Graph
    
slide 63:
    Inspiration
    Neelakanth Nadgir’s function_call_graph.rb (2007):
    It was inspired by Roch Bourbonnais’s CallStackAnalyzer,
    which was inspired by Jan Boerhout’s vftrace.
    The x-axis is time, and it shows a complete function trace.
    Flame graphs are different: The x-axis is the population,
    and they can show function traces or stack samples.
    # more flamegraph.pl
    […]
    # This was inspired by Neelakanth Nadgir's excellent function_call_graph.rb
    # program, which visualized function entry and return trace events. As Neel
    # wrote: "The output displayed is inspired by Roch's CallStackAnalyzer which
    # was in turn inspired by the work on vftrace by Jan Boerhout". See:
    # https://blogs.oracle.com/realneel/entry/visualizing_callstacks_via_dtrace_and
    […]
    Image source: https://blogs.oracle.com/realneel/entry/visualizing_callstacks_via_dtrace_and
    
slide 64:
    Flame Graph Summary
    Visualizes a collection of stack traces
    – x-axis: population: e.g., alphabetical sort to maximize merging
    – y-axis: stack depth
    – color: random (default) or a dimension
    Original implementation: Perl + SVG + JavaScript
    https://github.com/brendangregg/FlameGraph
    Takes input from many different profilers
    References:
    http://www.brendangregg.com/flamegraphs.html
    http://queue.acm.org/detail.cfm?id=2927301
    "The Flame Graph" CACM, June 2016
    
slide 65:
    Flame Graph Interpretation
    g()
    e()
    f()
    d()
    c()
    i()
    b()
    h()
    a()
    
slide 66:
    Flame Graph Interpretation (1/4)
    Top edge shows who is running on-CPU,
    and how much (width)
    g()
    e()
    f()
    d()
    c()
    i()
    b()
    h()
    a()
    
slide 67:
    Flame Graph Interpretation (2/4)
    Top-down shows ancestry
    e.g., from g():
    g()
    e()
    f()
    d()
    c()
    i()
    b()
    h()
    a()
    
slide 68:
    Flame Graph Interpretation (3/4)
    Widths are proportional to presence in samples
    e.g., comparing b() to h() (incl. children)
    g()
    e()
    f()
    d()
    c()
    i()
    b()
    h()
    a()
    
slide 69:
    Flame Graph Interpretation (4/4)
    Colors randomized to
    differentiate frames
    Or used for code type;
    e.g.:
    Kernel
    Java
    JVM
    (C++) C
    green == JIT (e.g., Java)
    aqua == inlined
    red == user-level
    orange == kernel
    yellow == C++
    magenta == search term
    
slide 70:
    CPU Flame Graph Tips & Tricks
    A) Check sample count (bottom frame): idle system?
    E.g., 49 Hertz x 30 sec x 16 CPUs == 23,520 samples at 100% CPU utilization.
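The sample-count check above is simple arithmetic, sketched here in Python. The function names are invented for this example, and the observed count of 2,352 is a hypothetical value. The utilization estimate assumes the profiler records no samples for idle CPUs, which depends on the profiler and event in use.

```python
def expected_samples(hertz, seconds, cpus):
    """Samples a timed profiler would collect if every CPU were busy throughout."""
    return hertz * seconds * cpus


def estimated_utilization(observed_samples, hertz, seconds, cpus):
    """Fraction of CPU capacity seen on-CPU, assuming idle CPUs yield no samples."""
    return observed_samples / expected_samples(hertz, seconds, cpus)


if __name__ == "__main__":
    print(expected_samples(49, 30, 16))             # 23520, as on the slide
    print(estimated_utilization(2352, 49, 30, 16))  # hypothetical observed count
```

A bottom-frame count far below the expected figure suggests a mostly idle system, so the flame graph describes only a small amount of CPU time.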
    
slide 71:
    Icicle Graph with Leaf Merge
    "hair"
    flamegraph.pl --inverted --reverse
    Reveals common functions
    called from many locations
    
slide 72:
    Flame Graph Interactivity
    Essentials:
    – Mouse-over for frame info (tool tips, status bar)
    – Click to zoom
    – Search (Ctrl-F or button)
    search
    button
    Nice to have:
    Merge control: root, leaf, middle
    Y-axis direction: flame or icicle
    Flame chart toggle
    Canned searches
    Collapse filters
    Code links
    search matches in magenta
    
slide 73:
    Which way up?
    stack depth
    stack depth
    My original flamegraph.pl has --inverted for an “icicle graph”
    Either way is fine!
    Icicle layout helps avoid scrolling
    when starting at the top
    Let the end-user choose
    Source: https://learn.microsoft.com/en-us/visualstudio/profiling/flame-graph
    
slide 74:
    Differential Flame Graphs
    Hues:
    Differential
    – red == more samples
    – blue == less samples
    Intensity:
    – Degree of difference
    Other examples
    – flamegraphdiff
    This spectrum can show other metrics, like CPI.
    more
    less
    Remember to show elided frames!
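The coloring rule above can be sketched as a comparison of two folded profiles. This is an illustrative Python sketch, not the code used by any particular differential flame graph tool: the function names are invented, the input profiles are hypothetical, and using "white" for unchanged frames is an assumption. Frames that exist only in the "before" profile are kept in the result, in line with the reminder about elided frames.

```python
from collections import Counter


def parse_folded(text):
    """Parse 'frame;frame;... count' lines into a Counter keyed by folded stack."""
    counts = Counter()
    for line in text.strip().splitlines():
        stack, _, count = line.rpartition(" ")
        counts[stack] += int(count)
    return counts


def frame_totals(folded):
    """Total samples per frame path (each frame includes its children)."""
    totals = Counter()
    for stack, count in folded.items():
        frames = stack.split(";")
        for depth in range(1, len(frames) + 1):
            totals[";".join(frames[:depth])] += count
    return totals


def diff_colors(before, after):
    """Map each frame path to (delta, hue, intensity between 0 and 1)."""
    b, a = frame_totals(before), frame_totals(after)
    deltas = {path: a[path] - b[path] for path in set(a) | set(b)}
    largest = max((abs(d) for d in deltas.values()), default=0) or 1
    result = {}
    for path, delta in deltas.items():
        hue = "red" if delta > 0 else "blue" if delta < 0 else "white"
        result[path] = (delta, hue, abs(delta) / largest)
    return result


if __name__ == "__main__":
    before = parse_folded("a;b;c 10\na;h 5\na;k 2\n")  # hypothetical profile 1
    after = parse_folded("a;b;c 14\na;h 3\n")          # hypothetical profile 2
    for path, info in sorted(diff_colors(before, after).items()):
        print(path, info)
```

Here `a;k` disappears from the second profile but still appears in the output as a blue frame, rather than silently vanishing.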
    
slide 75:
    Poor Man’s Differential Flame Graphs
    Toggle between tabs in
    your browser
    - Like searching for Pluto!
    Or flip between slides
    - Cue exciting demo!
    
slide 76:
    Poor Man’s Differential Flame Graphs
    
slide 77:
    System CPU Profilers
    • Linux
    – perf_events (aka "perf")
    – bcc profile (eBPF-based)
    • Windows
    – XPerf, WPA (now has flame graphs!)
    • OS X
    – Instruments
    • And many others…
    Tip: use system profilers
    whenever possible. Runtime
    profilers (e.g., Java JVMTI-based)
    are user space and typically
    don't include kernel CPU time or
    kernel stacks.
    
slide 78:
    Linux CPU Flame Graphs
    Linux 5.8+ via perf for simplicity (2020):
    perf script flamegraph -F 49 -a -- sleep 30
    Generates flamegraph.html. One command! Thanks Andreas Gerstmayr.
    Linux 4.9+ via eBPF for efficiency (2016):
    apt-get install bpfcc-tools
    git clone https://github.com/brendangregg/FlameGraph
    profile-bcc.py -dF 49 30 | ./FlameGraph/flamegraph.pl > perf.svg
    eBPF (no longer an acronym) is the name of an in-kernel execution environment, used in this
    case for aggregating stack samples in kernel context
    Most efficient: no perf.data file, summarizes in-kernel
    Some runtimes (e.g., JVM) require extra steps for stacks & symbols (next section)
    
slide 79:
    Older Linux CPU Flame Graphs
    Linux 2.6+ via perf.data and perf script (2009):
    git clone https://github.com/brendangregg/FlameGraph; cd FlameGraph
    perf record -F 49 -a -g -- sleep 30
    perf script | ./stackcollapse-perf.pl | ./flamegraph.pl > perf.svg
    Linux 4.5 can use folded output (2016):
    Skips the CPU-costly stackcollapse-perf.pl step; see:
    http://www.brendangregg.com/blog/2016-04-30/linux-perf-folded.html
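To show what the stack-collapse step in this pipeline does, here is a deliberately simplified Python sketch. It is not a replacement for stackcollapse-perf.pl: the input is a hypothetical excerpt in the general shape of `perf script` output (a header line per sample, indented frame lines printed leaf first, a blank line between samples), and real output has more fields, symbols containing spaces, and other variations this sketch ignores.

```python
from collections import Counter

# Hypothetical excerpt shaped like `perf script` output; names are invented.
PERF_SCRIPT_TEXT = """\
app 100 10.000001: cpu-clock:
    401136 func_c+0x10 (/opt/app/bin/app)
    401150 func_b+0x8 (/opt/app/bin/app)
    401170 main+0x20 (/opt/app/bin/app)

app 100 10.020409: cpu-clock:
    401136 func_c+0x14 (/opt/app/bin/app)
    401150 func_b+0x8 (/opt/app/bin/app)
    401170 main+0x20 (/opt/app/bin/app)

app 100 10.040817: cpu-clock:
    401150 func_b+0x2 (/opt/app/bin/app)
    401170 main+0x20 (/opt/app/bin/app)
"""


def collapse(text):
    """Fold samples into 'comm;root;...;leaf' -> count."""
    folded = Counter()
    comm, frames = None, []
    # The extra empty string makes sure the final sample is flushed.
    for line in text.splitlines() + [""]:
        if not line.strip():                      # blank line ends a sample
            if comm is not None and frames:
                # Frames are printed leaf first; folded output is root first.
                folded[";".join([comm] + frames[::-1])] += 1
            comm, frames = None, []
        elif not line[0].isspace():               # header: "comm pid time: event:"
            comm = line.split()[0]
        else:                                     # frame: "addr symbol+offset (dso)"
            parts = line.split()
            if len(parts) >= 2:
                frames.append(parts[1].split("+")[0])
    return folded


if __name__ == "__main__":
    for stack, count in sorted(collapse(PERF_SCRIPT_TEXT).items()):
        print(stack, count)
```

The output lines ("stack count") are in the folded format that flamegraph.pl reads on standard input.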
    
slide 80:
    Linux Profiling Optimizations
    Linux 2.6: perf record (capture stacks) -> write samples -> perf.data -> reads samples -> perf script -> write text -> stackcollapse-perf.pl -> folded output -> flamegraph.pl
    Linux 4.5: perf record (capture stacks) -> write samples -> perf.data -> reads samples -> perf report -g folded -> folded report -> awk -> folded output -> flamegraph.pl
    Linux 4.9: profile.py (count stacks, BPF) -> folded output -> flamegraph.pl
    
slide 81:
    GUI Automation
    There are many options nowadays. I’ve worked on five:
    Netflix Vector (now retired!):
    Netflix FlameScope (covered later)
    Netflix FlameCommander (continuous profiling; not open source yet)
    I’m now helping with Intel vTune and Intel gProfiler
    Open source examples include Granulate gProfiler, Eclipse TraceCompass, Grafana flame graphs, Firefox
    profiler, and more (see implementation slides). Build your own!
    
slide 82:
    Flame Charts (2013)
    Inspired by flame graphs: https://bugs.webkit.org/show_bug.cgi?id=111162
    
slide 83:
    Chrome DevTools Flame Charts (2022)
    
slide 84:
    Firefox Profiler Flame Graph (2022)
    flame graph
    flame chart
    
slide 85:
    Flame Charts
    x-axis: time
    Flame Graphs
    x-axis: population
    alphabet sort or another frame merging algorithm
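The difference between the two x-axes can be made concrete with a small Python sketch. It computes the boxes for one row (one stack depth) of each visualization from the same samples. The data and the function name `boxes_at_depth` are hypothetical, and the sort used here is a plain alphabetical sort, which is only one of the possible frame-merging orders.

```python
from itertools import groupby

# Hypothetical stack samples in time order, root frame first.
SAMPLES = [
    ["a", "b", "c", "d", "f", "g"],
    ["a", "h", "i"],
    ["a", "b", "c", "d", "f", "g"],
    ["a", "b", "c", "d", "e"],
    ["a", "b", "c", "d"],
    ["a", "h", "i"],
    ["a", "b", "c", "d", "f", "g"],
]


def boxes_at_depth(samples, depth, sort_first):
    """Return [(frame, width)] for one row of the visualization.

    sort_first=False keeps time order (flame chart): only adjacent samples merge.
    sort_first=True sorts the stacks (flame graph): all equal code paths merge.
    """
    columns = sorted(samples) if sort_first else list(samples)
    # None marks samples that are too shallow to have a frame at this depth.
    paths = [tuple(s[: depth + 1]) if len(s) > depth else None for s in columns]
    return [
        (path[-1], len(list(group)))
        for path, group in groupby(paths)
        if path is not None
    ]


if __name__ == "__main__":
    print("flame chart:", boxes_at_depth(SAMPLES, 1, sort_first=False))
    print("flame graph:", boxes_at_depth(SAMPLES, 1, sort_first=True))
```

With time on the x-axis, b() is split into three boxes by the h() samples between them. After sorting, it becomes a single box whose width is its total sample count.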
    
slide 86:
    3. STACKS AND SYMBOLS
    And Other Issues
    
slide 87:
    Broken Stack Traces are Common
    # perf record -F 99 -a -g -- sleep 30
    # perf script
    […]
    java 4579 cpu-clock:
    7f417908c10b [unknown] (/tmp/perf-4458.map)
    java 4579 cpu-clock:
    7f41792fc65f [unknown] (/tmp/perf-4458.map)
    a2d53351ff7da603 [unknown] ([unknown])
    […]
    should probably have more frames
    
slide 88:
    … as a Flame Graph
    broken java stacks
    “grass”
    
slide 89:
    Fixing Stack Walking
    A. Frame pointer-based
    Fix by disabling that compiler optimization: gcc's -fno-omit-frame-pointer
    Pros: simple, supported by many tools
    Cons: might cost a little extra CPU (usually 
slide 90:
    Fixing Java Stack Traces
    # perf script
    […]
    java 4579 cpu-clock:
    7f417908c10b [unknown] (/tmp/…
    java 4579 cpu-clock:
    7f41792fc65f [unknown] (/tmp/…
    a2d53351ff7da603 [unknown] ([unkn…
    […]
    I prototyped JVM frame pointers. Oracle
    rewrote it and included it in Java as
    -XX:+PreserveFramePointer
    (JDK 8 u60b19)
    # perf script
    […]
    java 8131 cpu-clock:
    7fff76f2dce1 [unknown] ([vdso])
    7fd3173f7a93 os::javaTimeMillis() (/usr/lib/jvm…
    7fd301861e46 [unknown] (/tmp/perf-8131.map)
    7fd30184def8 [unknown] (/tmp/perf-8131.map)
    7fd30174f544 [unknown] (/tmp/perf-8131.map)
    7fd30175d3a8 [unknown] (/tmp/perf-8131.map)
    7fd30166d51c [unknown] (/tmp/perf-8131.map)
    7fd301750f34 [unknown] (/tmp/perf-8131.map)
    7fd3016c2280 [unknown] (/tmp/perf-8131.map)
    7fd301b02ec0 [unknown] (/tmp/perf-8131.map)
    7fd3016f9888 [unknown] (/tmp/perf-8131.map)
    7fd3016ece04 [unknown] (/tmp/perf-8131.map)
    7fd30177783c [unknown] (/tmp/perf-8131.map)
    7fd301600aa8 [unknown] (/tmp/perf-8131.map)
    7fd301a4484c [unknown] (/tmp/perf-8131.map)
    7fd3010072e0 [unknown] (/tmp/perf-8131.map)
    7fd301007325 [unknown] (/tmp/perf-8131.map)
    7fd301007325 [unknown] (/tmp/perf-8131.map)
    7fd3010004e7 [unknown] (/tmp/perf-8131.map)
    7fd3171df76a JavaCalls::call_helper(JavaValue*,…
    7fd3171dce44 JavaCalls::call_virtual(JavaValue*…
    7fd3171dd43a JavaCalls::call_virtual(JavaValue*…
    7fd31721b6ce thread_entry(JavaThread*, Thread*)…
    7fd3175389e0 JavaThread::thread_main_inner() (/…
    7fd317538cb2 JavaThread::run() (/usr/lib/jvm/nf…
    7fd3173f6f52 java_start(Thread*) (/usr/lib/jvm/…
    7fd317a7e182 start_thread (/lib/x86_64-linux-gn…
    
slide 91:
    Fixed Stacks Flame Graph
    Java stacks
    (but no symbols, yet)
    
slide 92:
    Fixing Native Symbols
    A. Add a -dbgsym package, if available
    B. Recompile from source
    
slide 93:
    Fixing JIT Symbols (Java, Node.js, …)
    Just-in-time runtimes don't have a pre-compiled symbol table
    So Linux perf looks for an externally provided symbol file
    # perf script
    Failed to open /tmp/perf-8131.map, continuing without symbols
    […]
    java 8131 cpu-clock:
    7fff76f2dce1 [unknown] ([vdso])
    7fd3173f7a93 os::javaTimeMillis() (/usr/lib/jvm…
    7fd301861e46 [unknown] (/tmp/perf-8131.map)
    […]
    This can be created by runtimes; e.g., Java's perf-map-agent
    Not the only solution; can also integrate with JIT-based walkers, or have an external symbol
    translator (perf script, or eBPF-based).
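As a sketch of how such a symbol file is used, the Python below parses the text format perf expects in /tmp/perf-PID.map (one line per symbol: start address and size in hexadecimal, then the symbol name) and resolves an address to a name. The map contents and symbol names are hypothetical, and the function names are invented for this example. It illustrates the lookup only and is not how perf itself is implemented.

```python
import bisect

# Hypothetical map file contents: "START SIZE symbol", START and SIZE in hex.
PERF_MAP_TEXT = """\
7fd301600000 1a0 Interpreter
7fd301750000 400 Lcom/example/Handler;::handle
7fd301861e00 80 Ljava/lang/String;::hashCode
"""


def parse_perf_map(text):
    """Return a list of (start, end, symbol) sorted by start address."""
    entries = []
    for line in text.splitlines():
        fields = line.split(None, 2)  # the symbol name may contain spaces
        if len(fields) != 3:
            continue
        start, size = int(fields[0], 16), int(fields[1], 16)
        entries.append((start, start + size, fields[2]))
    return sorted(entries)


def resolve(entries, address):
    """Return the symbol whose [start, end) range contains address, else None."""
    starts = [entry[0] for entry in entries]
    index = bisect.bisect_right(starts, address) - 1
    if index >= 0 and address < entries[index][1]:
        return entries[index][2]
    return None


if __name__ == "__main__":
    entries = parse_perf_map(PERF_MAP_TEXT)
    print(resolve(entries, 0x7FD301861E46))
    print(resolve(entries, 0x7FD301750F34))  # falls in a gap, so unresolved
```

Addresses that fall outside every range stay unresolved, which is what shows up as [unknown] when the map file is missing or stale.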
    
slide 94:
    Fixed Symbols (zoom)
    
slide 95:
    2014: Java Profiling (broken stacks)
    Java Profilers
    System Profilers
    
slide 96:
    2018: Java Profiling (fixed stacks)
    Kernel
    Java
    JVM
    CPU Mixed-mode Flame Graph
    
slide 97:
    Mixed-Mode Case Study
    Exception handling consuming CPU
    
slide 98:
    Other Issues
    • JIT Symbol churn
    Take before and after snapshots, or use perf’s timestamped symbol logs.
    • Containers
    Are symbol files read from the right namespace? Should now work.
    • Stack Depth limits
    Linux perf had a 127 frame limit, now tunable. Thanks Arnaldo Carvalho de Melo!
    broken stacks
    A Java microservice with a stack depth of > 900
    perf_event_max_stack=1024
    
slide 99:
    Inlining
    • Many frames may be missing (inlined)
    Flame graph may still make enough sense
    • Inlining can often be tuned
    e.g. Java's -XX:-Inline to disable, but can be 80% slower
    Java's -XX:MaxInlineSize and -XX:InlineSmallCode can be tuned
    a little to reveal more frames: can even improve performance!
    • Runtimes can un-inline on demand
    So that exception stack traces make sense
    e.g. Java's perf-map-agent can un-inline (unfoldall option)
    
slide 100:
    Language/Runtime Issues
    Each may have special stack/symbol instructions
    Java, Node.js, Python, Ruby, C++, Go, …
    See: https://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html
    Check if flame graphs are already in the “official” profiler
    Try an Internet search
    
slide 101:
    4. ADVANCED FLAME GRAPHS
    
slide 102:
    Flame graphs can visualize any stack trace collection
    On Linux, stacks from any of
    these events:
    
slide 103:
    Page Faults
    Show what triggered main memory (resident) to grow:
    # perf record -e page-faults -p PID -g -- sleep 120
    "fault" as (physical) main memory is allocated on-demand, when a virtual page is first populated
    Low overhead tool to solve some types of memory leak
    RES column in top(1)
    grows
    because
    
slide 104:
    Other Memory Sources
    http://www.brendangregg.com/FlameGraphs/memoryflamegraphs.html
    
slide 105:
    Disk I/O Requests
    Shows who issued disk I/O (sync reads & writes):
    # perf record -e block:block_rq_insert -a -g -- sleep 60
    GC? This JVM has swapped out!
    
slide 106:
    Context Switches
    Show why Java blocked and stopped running on-CPU:
    Identifies locks, I/O, sleeps
    If code path shouldn't block and looks random, it's an involuntary context switch. I often filter these, but I’ve
    usually already solved this type of issue (CPU load) long before trying advanced flame graphs.
    E.g., analyzing framework differences:
    epoll
    sys_poll
    futex
    futex
    rxNetty
    Tomcat
    
slide 107:
    TCP Events
    TCP transmit, using eBPF:
    # bpftrace -e 'kprobe:tcp_sendmsg { @[kstack, ustack] = count(); }'
    For eBPF, can cost noticeable overhead for high packet rates (test and measure)
    For perf, can have prohibitive overhead due to the trace, dump, post-process cycle
    Note that TCP receive is async, so stack traces are meaningless. Trace socket read instead.
    Can also trace TCP connect, accept
    Lower frequency, therefore lower overhead
    TCP sends
    
slide 108:
    CPU Cache Misses
    In this example, sampling via Last Level Cache loads:
    # perf record -e LLC-loads -c 10000 -a -g -- sleep 5; jmaps
    # perf script -F comm,pid,tid,cpu,time,event,ip,sym,dso > out.stacks
    -c is the count (samples once per count)
    Can also sample hits, misses, stalls
    Needs PEBS for IP accuracy
    Precise Event Based Sampling for Instruction Pointer
    accuracy. Not yet enabled in AWS EC2 VMs.
    
slide 109:
    CPI Flame Graph
    Cycles Per Instruction (CPI)
    – red == instruction heavy
    – blue == cycle heavy
    (likely memory stall cycles)
    zoomed:
    
slide 110:
    Off-CPU Analysis
    Off-CPU analysis is the study of blocking states,
    or the code-path (stack trace) that led to them
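The idea can be sketched as bookkeeping over scheduler events: note the stack and time when a thread leaves the CPU, and add the elapsed time to that stack when the thread runs again. The Python below is a user-space illustration with hypothetical event tuples and invented names. Real tools do this accounting in kernel context (for example with eBPF) to keep overhead practical.

```python
from collections import defaultdict

# Hypothetical events, sorted by time:
# (timestamp_us, thread_id, "off" or "on", folded stack when going off-CPU).
EVENTS = [
    (1000, 1, "off", "main;read_file;sys_read"),
    (1500, 2, "off", "main;lock;futex_wait"),
    (4000, 1, "on", None),
    (4200, 1, "off", "main;read_file;sys_read"),
    (5200, 1, "on", None),
    (5600, 2, "on", None),
]


def off_cpu_time(events):
    """Sum blocked time in microseconds per folded stack."""
    blocked = {}               # thread_id -> (timestamp, stack) while off-CPU
    totals = defaultdict(int)  # folded stack -> total microseconds off-CPU
    for timestamp, tid, kind, stack in events:
        if kind == "off":
            blocked[tid] = (timestamp, stack)
        elif kind == "on" and tid in blocked:
            start, blocked_stack = blocked.pop(tid)
            totals[blocked_stack] += timestamp - start
    return dict(totals)


if __name__ == "__main__":
    for stack, micros in sorted(off_cpu_time(EVENTS).items()):
        print(stack, micros)
```

The result is folded output whose value is time rather than a sample count, so the box widths in the resulting flame graph represent blocked time.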
    
slide 111:
    Off-CPU Time Flame Graph
    Off-CPU time
    More info http://www.brendangregg.com/blog/2016-02-01/linux-wakeup-offwake-profiling.html
    
slide 112:
    Off-CPU Time (zoomed): tar(1)
    directory read
    from disk
    Only showing kernel stacks in this example
    file read
    from disk
    fstat from disk
    path read from disk
    pipe write
    
slide 113:
    CPU + Off-CPU Flame Graphs: See Everything
    CPU
    Off-CPU
    Everything
    (All thread time)
    
slide 114:
    Off-Wake Time Flame Graph
    Waker stack(s)
    Wokeup
    Blocked stack
    Uses Linux enhanced BPF to merge off-CPU and waker stack in kernel context
    
slide 115:
    Chain Graphs
    Waker stack(s)
    Walking the chain of wakeup stacks to reach root cause
    
slide 116:
    FlameScope
    Flame graphs can hide time-based issues of variation and perturbations.
    FlameScope uses subsecond-offset heat maps to show these issues.
    They can then be selected for the corresponding flame graph.
    https://brendangregg.com/blog/2018-12-15/flamescope-origin.html
    https://www.brendangregg.com/HeatMaps/subsecondoffset.html
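A subsecond-offset heat map can be sketched as a two-dimensional histogram of sample timestamps: one column per second, one row per slice of that second. The Python below is an illustrative sketch and not FlameScope's code. The function name, the choice of 50 rows, and the timestamps are all assumptions for the example.

```python
def subsecond_offset_heatmap(timestamps, rows=50):
    """Return {(second, row): count} for sample timestamps given in seconds.

    second is the whole-second column counted from the earliest sample;
    row is the bucket of the fractional offset within that second.
    Assumes non-negative timestamps.
    """
    if not timestamps:
        return {}
    start = int(min(timestamps))
    cells = {}
    for ts in timestamps:
        second = int(ts) - start
        row = min(int((ts - int(ts)) * rows), rows - 1)
        cells[(second, row)] = cells.get((second, row), 0) + 1
    return cells


if __name__ == "__main__":
    samples = [100.01, 100.015, 100.51, 101.01, 101.99]  # hypothetical
    for cell, count in sorted(subsecond_offset_heatmap(samples).items()):
        print(cell, count)
```

Selecting a range of cells then amounts to filtering the samples by timestamp before building a flame graph from the matching stacks.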
    
slide 117:
    FlameScope Example
    How many patterns can you see?
    https://www.brendangregg.com/blog/2018-11-08/flamescope-pattern-recognition.html
    
slide 118:
    Agenda Recap
    1. Implementations
    2. CPU Flame graphs
    3. Stacks & Symbols
    4. Advanced flame graphs
    
slide 119:
    Take Aways
    1. Interpret CPU flame graphs
    2. Understand runtime challenges
    3. Why eBPF for advanced flame graphs
    A new tool to lower your cost, latency, and carbon
    
slide 120:
    Links & References
    Flame Graphs
    Linux perf
    Linux eBPF
    "The Flame Graph" Communications of the ACM, Vol. 59, No. 6 (June 2016)
    http://queue.acm.org/detail.cfm?id=2927301
    http://www.brendangregg.com/flamegraphs.html
    http://www.brendangregg.com/flamegraphs.html#Updates
    http://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html
    http://www.brendangregg.com/FlameGraphs/memoryflamegraphs.html
    http://www.brendangregg.com/FlameGraphs/offcpuflamegraphs.html
    http://techblog.netflix.com/2015/07/java-in-flames.html
    http://techblog.netflix.com/2016/04/saving-13-million-computational-minutes.html
    http://www.brendangregg.com/blog/2014-11-09/differential-flame-graphs.html
    http://www.brendangregg.com/blog/2016-01-20/ebpf-offcpu-flame-graph.html
    http://www.brendangregg.com/blog/2016-02-01/linux-wakeup-offwake-profiling.html
    http://www.brendangregg.com/blog/2016-02-05/ebpf-chaingraph-prototype.html
    https://brendangregg.com/blog/2018-12-15/flamescope-origin.html
    https://github.com/brendangregg/FlameGraph
    https://github.com/spiermar/d3-flame-graph
    https://github.com/Netflix/flamescope
    http://corpaul.github.io/flamegraphdiff/
    https://www.intel.com/content/www/us/en/develop/documentation/vtune-help/top/reference/user-interface-reference/window-flame-graph.html
    https://gprofiler.io/
    https://perf.wiki.kernel.org/index.php/Main_Page
    http://www.brendangregg.com/perf.html
    https://ebpf.io/
    https://www.brendangregg.com/ebpf.html
    These slides: https://www.brendangregg.com/Slides/YOW2022_flame_graphs.pdf
    
slide 121:
    YOW! 2022
    Thank you!
    http://www.brendangregg.com
    brendan@intel.com
    @brendangregg
    Questions?
    
slide 122:
    BONUS SLIDES
    
slide 123:
    More Implementations
    These are in addition to the earlier examples.
    (Note: This is not an endorsement of any company/product or sponsored by anyone.)
    
slide 124:
    Java: SPF4J (2012)
    Source: http://zolyfarkas.github.io/spf4j/#
    
slide 125:
    OSX: Instruments (2012; converter)
    Source: https://schani.wordpress.com/2012/11/16/flame-graphs-for-instruments/
    
slide 126:
    Ruby: mini-profiler (2013)
    Source: https://samsaffron.com/archive/2013/03/19/flame-graphs-in-ruby-miniprofiler
    
slide 127:
    Julia: ProfileView.jl (2013)
    Source: https://github.com/timholy/ProfileView.jl (Tim Holy)
    
slide 128:
    Windows: Xperf (2013; converter)
    Source: https://randomascii.wordpress.com/2013/03/26/summarizing-xperf-cpu-usage-with-flame-graphs/
    (Bruce Dawson)
    
slide 129:
    Perl: NYTProf (2013)
    Source: https://blog.timbunce.org/2013/04/08/nytprof-v5-flaming-precision/ (Tim Bunce)
    
slide 130:
    Erlang: Eflame (2013)
    Source: https://github.com/proger/eflame (Volodymyr Ky)
    
slide 131:
    Ruby: ruby-prof-flamegraph (2014)
    Source: https://github.com/oozou/ruby-prof-flamegraph
    
slide 132:
    Node.js: flamegraph (2015)
    Source: https://github.com/thlorenz/flamegraph (Thorsten Lorenz)
    
slide 133:
    Haskell: ghc-prof-flamegraph (2015)
    Source: https://www.fpcomplete.com/blog/2015/04/ghc-prof-flamegraph/ (Francesco Mazzoli)
    
slide 134:
    Differentials: Flamegraphdiff (2015)
    Source: http://corpaul.github.io/flamegraphdiff/ (Cor-Paul Bezemer)
    
slide 135:
    Java: jfr-flame-graph (2015)
    Source: http://isuru-perera.blogspot.com/2015/05/flame-graphs-with-java-flight-recordings.html
    (M. Isuru Tharanga Chrishantha Perera)
    
slide 136:
    Clojure: Flames (2015)
    Source: https://github.com/jstepien/flames/ (Jan Stępień)
    
slide 137:
    Python: python-flamegraph (2015)
    Source: https://github.com/evanhempel/python-flamegraph (Evan Hempel)
    
slide 138:
    Strongloop: Arc (2015)
    Source: https://es.slideshare.net/jguerrero999/nodejs-transaction-tracing-root-cause-analysis-withstrongloop-arc
    
slide 139:
    Java: perfj (2015)
    Source: https://github.com/coderplay/perfj (Min Zhou)
    
slide 140:
    Golang: Uber go-torch (2015)
    Source: https://github.com/uber-archive/go-torch
    
slide 141:
    Intel: processor trace converter (2015)
    Source: http://halobates.de/blog/p/329 (Andi Kleen)
    
slide 142:
    Nylas: perftools (2015)
    Source: https://www.nylas.com/blog/performance/ (code by Eben Freeman)
    
slide 143:
    Django: djdt-flamegraph (2015)
    Source: https://github.com/blopker/djdt-flamegraph (Bo Lopker)
    
slide 144:
    NodeSource: Nsolid (Node.js; 2015)
    Source: https://nodesource.com/blog/understanding-cpu-flame-graphs
    
slide 145:
    D3: d3-flame-graphs (2015)
    Source: https://cimi.io/d3-flame-graphs/ (Alex Ciminian)
    
slide 146:
    Golang: Goprofui (2015)
    Source: https://github.com/wirelessregistry/goprofui (Srdjan Marinovic, Julia Allyce)
    
slide 147:
    Rust: flame (2016)
    Source: https://github.com/llogiq/flame (Ty Overby)
    
slide 148:
    Dell Cloud Manager: Gumshoe Load Investigator (2016)
    "This haystack is looking more like a needle every minute" -- source: https://youtu.be/GGJFZfwXJ44?t=225
    Source: https://github.com/worstcase/gumshoe (Jonathan Newbrough)
    
slide 149:
    Uber: pyflame (Python; 2016)
    Source: https://www.uber.com/en-AU/blog/pyflame-python-profiler/
    
slide 150:
    Android: erlang-atrace-flamegraphs (2017)
    Source: https://blog.rhye.org/post/android-profiling-flamegraphs/ (Ross Schlaikjer)
    
slide 151:
    Java: grav (heap allocations; 2017)
    Source: https://epickrram.blogspot.com/2017/09/heap-allocation-flamegraphs.html (Mark Price)
    
slide 152:
    Nudge: APM (for Java; 2017)
    Source: https://nudge-apm.com/features/#profiling
    
slide 153:
    Java: clj-async-profiler (2017)
    Source: http://clojure-goes-fast.com/blog/profiling-tool-async-profiler/ (Alexander Yakushev)
    
slide 154:
    .NET: codetrack (2017)
    Source: https://www.getcodetrack.com/
    
slide 155:
    Node.js: Flamebearer (2018)
    Source: https://github.com/mapbox/flamebearer (Volodymyr Agafonkin)
    
slide 156:
    Opsian: always-on flame graphs (2018)
    Source: https://www.opsian.com/blog/always-on-production-flame-graphs/
    
slide 157:
    Speedscope: left heavy view (2018)
    Source: https://jamie-wong.com/post/speedscope/ (Jamie Wong)
    
slide 158:
    AppDynamics: flame graph (2018; now Cisco)
    Source: https://docs.appdynamics.com/appd/20.x/en/application-monitoring/troubleshooting-applications/event-loop-blocking-in-node-js#EventLoopBlockinginNode.js-FlameGraph
    
slide 159:
    Inferno: flame graph (Rust port; 2019)
    Source: https://github.com/jonhoo/inferno (Jon Gjengset)
    
slide 160:
    SAP: HANA Dump Analyzer (2019)
    Source: https://blogs.sap.com/2019/04/22/visualizing-olap-requests-on-sap-hana-system-with-concurrencyflame-graph-using-sap-hana-dump-analyzer/
    
slide 161:
    Backtrace: flame graph (2019)
    Source: https://support.backtrace.io/hc/en-us/articles/360040515971-Flame-graphs
    
slide 162:
    Instana: flame graph (2020; now IBM)
    Source: https://www.ibm.com/docs/en/instana-observability/current?topic=processes-analyzing-profiles
    
slide 163:
    ej-technologies: JProfiler Flame Graph (for Java; 2020)
    Source: https://www.ej-technologies.com/resources/jprofiler/help/doc/main/cpu.html
    
slide 164:
    Samsung: QA-Board (2020)
    Source: https://samsung.github.io/qaboard/blog/2020/06/24/flame-graphs
    
slide 165:
    Microsoft Visual Studio: vscode-js-profile-flame (for JavaScript; 2020)
    Left Heavy view
    Source: https://marketplace.visualstudio.com/items?itemName=ms-vscode.vscode-js-profile-flame
    
slide 166:
    Pyroscope: flame graph (2020)
    Source: https://pyroscope.io/blog/what-is-a-flamegraph/
    
slide 167:
    Uber: pprof++ (2021; for Golang)
    Source: https://www.uber.com/en-AU/blog/pprof-go-profiler/ (Pengfei Su)
    
slide 168:
    Lightstep: flame graph (2021)
    Source: https://www.instana.com/blog/instana-announces-the-industrys-first-commercial-continuousproduction-profiler/
    
slide 169:
    Dynatrace: allocation flame graph (2021)
    Source: https://www.dynatrace.com/support/help/how-to-use-dynatrace/diagnostics/memory-profiling
    
slide 170:
    Pixie Labs: pod performance flamegraph (2021)
    Source: https://docs.pixielabs.ai/tutorials/pixie-101/profiler/
    
slide 171:
    Apache Flink: flame graphs (2021)
    off-CPU
    Source: https://nightlies.apache.org/flink/flink-docs-master/docs/ops/debugging/flame_graphs/
    
slide 172:
    Embrace: Application-Not-Responding flame graph (2021)
    Source: https://blog.embrace.io/solve-anrs-with-flame-graphs/
    
slide 173:
    Polar Signals: parca Continuous Profiling (2021)
    Source: https://www.polarsignals.com/blog/posts/2022/08/30/optimizing-with-continuous-profiling/
    
slide 174:
    Dockyard: Flame On (for Elixir apps; 2022)
    Source: https://dockyard.com/blog/2022/02/22/profiling-elixir-applications-with-flame-graphs-and-flame-on
    (Mike Binns)
    
slide 175:
    OpenResty: Xray (2022)
    Source: https://openresty.com/en/xray (Yichun Zhang)
    
slide 176:
    Elastic: universal profiling (2022)
    Source: https://www.elastic.co/observability/universal-profiling
    
slide 177:
    … and more
    (Dec 2022)
    Thanks for all the open source contributions!