
Flame Graphs


CPU Flame Graph

Flame graphs are a visualization of hierarchical data, created to visualize stack traces of profiled software so that the most frequent code-paths can be identified quickly and accurately. They can be generated using my open source programs on github.com/brendangregg/FlameGraph, which create interactive SVGs. My colleague on the Netflix performance engineering team, Martin Spier, created an open source d3 version: d3-flame-graph. See the Updates section for other implementations.

The following pages (or posts) introduce different types of flame graphs:

  1. CPU
  2. AI/GPU
  3. Memory
  4. Off-CPU
  5. Hot/Cold
  6. Differential

The example on the right is a portion of a CPU flame graph, showing MySQL codepaths that are consuming CPU cycles, and by how much.

Flame graphs can also be used for any hierarchical data. E.g., file system contents (see instructions; comparisons with treemaps and sunbursts).
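
As a rough illustration of the file system case, the Python sketch below turns a table of file sizes into "folded" lines, treating each path component as a frame and the size as the sample count. The function name paths_to_folded and the sizes are hypothetical, and this is not the method from the linked instructions.

```python
def paths_to_folded(sizes):
    """Turn {path: size} into folded lines, one path component per frame."""
    lines = []
    for path, size in sorted(sizes.items()):
        # Drop empty components caused by the leading "/".
        frames = [part for part in path.split("/") if part]
        lines.append(f"{';'.join(frames)} {size}")
    return lines

# Hypothetical file sizes in kilobytes.
sizes = {"/usr/lib/libc.so": 2048, "/usr/bin/perl": 12, "/etc/hosts": 1}
folded = paths_to_folded(sizes)
for line in folded:
    print(line)
```

Each output line has the form "dir;subdir;file size", so wider frames correspond to larger directories or files.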

On this page: Summary, OSes, Presentation, Variations, Origin, Updates.

Summary

The x-axis shows the stack profile population, sorted alphabetically (it is not the passage of time), and the y-axis shows stack depth, counting from zero at the bottom. Each rectangle represents a stack frame. The wider a frame is, the more often it was present in the stacks. The top edge shows what is on-CPU, and beneath it is its ancestry. Original flame graphs use random colors to help visually differentiate adjacent frames. Variations include inverting the y-axis (an "icicle graph"), changing the hue to indicate code type, and using a color spectrum to convey an additional dimension.
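
The layout described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how flamegraph.pl is implemented: the names build_tree and layout, the synthetic "all" root frame, and the sample counts are all hypothetical, and widths are in sample counts rather than pixels.

```python
def build_tree(folded):
    """Merge folded stacks ({"a;b;c": count}) into a nested frame tree."""
    root = {"name": "all", "value": 0, "children": {}}
    for stack, count in folded.items():
        root["value"] += count
        node = root
        for frame in stack.split(";"):
            child = node["children"].setdefault(
                frame, {"name": frame, "value": 0, "children": {}})
            child["value"] += count
            node = child
    return root

def layout(node, x=0, depth=0, out=None):
    """Return (name, depth, x, width) rectangles, siblings sorted alphabetically."""
    if out is None:
        out = []
    out.append((node["name"], depth, x, node["value"]))
    for name in sorted(node["children"]):
        child = node["children"][name]
        layout(child, x, depth + 1, out)
        x += child["value"]  # the next sibling starts where this one ends
    return out

# Hypothetical sample counts keyed by root-first stack.
samples = {"main;b;c": 2, "main;a": 3, "main;b": 1}
rects = layout(build_tree(samples))
for rect in rects:
    print(rect)
```

Frame "b" ends up 3 samples wide because both "main;b" and "main;b;c" pass through it, which is the frame merging that makes wide frames stand out.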

Flame graphs are both a static and dynamic visualization. As a static visualization, a flame graph can be saved as an image, included in print (books), and will still convey the "big picture" as only the most frequent frames have enough width for labels. A dynamic visualization allows interactive features to aid navigation and comprehension, including:

This visualization is fully explained in my ACMQ article The Flame Graph, also published in Communications of the ACM, Vol. 59 No. 6.

Also see my CPU Flame Graphs page, and the presentation below.

Presentation

I gave an updated talk explaining flame graphs at USENIX ATC 2017 titled Visualizing Performance with Flame Graphs, which is on youtube and slideshare (PDF).

My first talk on flame graphs was at USENIX LISA 2013, which ended up as a plenary talk (youtube, slideshare, PDF):

Operating Systems

Some operating system profilers now have built-in support for flame graphs:

Flame graphs can also be generated from any profile data that contains stack traces, including from the following profiling tools:

Once you have a profiler that can generate meaningful stacks, converting them into a flame graph is usually the easy step.
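
To give a feel for that conversion step, here is a minimal Python sketch that collapses sampled stacks into the folded format that flamegraph.pl reads: one line per unique stack, frames joined by semicolons from root to leaf, followed by a space and a count. The function name fold, the leaf-first input ordering, and the frame names are assumptions for illustration. For real profiler output, the stackcollapse scripts in the FlameGraph repository do this job.

```python
from collections import Counter

def fold(stacks):
    """Collapse sampled stacks into folded lines: 'root;...;leaf count'.

    Each input stack is assumed to be a list of frames ordered leaf-first,
    so it is reversed before joining.
    """
    counts = Counter(";".join(reversed(stack)) for stack in stacks)
    return [f"{stack} {count}" for stack, count in sorted(counts.items())]

# Hypothetical samples from a sampling profiler, leaf frame first.
samples = [
    ["func_c", "func_b", "main"],
    ["func_c", "func_b", "main"],
    ["func_d", "main"],
]
folded = fold(samples)
for line in folded:
    print(line)
```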

There are also numerous profiling products and companies that now support flame graphs. See the Updates section below.

Variations

Icicle charts are flame graphs upside down. Some people prefer it that way. My flamegraph.pl creates them using --inverted. I prefer the standard "flame" layout, where the y-axis is counting stack depth upwards from zero at the bottom. I'm also used to scanning them top-down to look for plateaus. But for very deep stacks, the flame graph layout (with a GUI that starts at the top) often means the initial view may be mostly empty (a few thin interrupt stacks), forcing the developer to scroll down to find the bulk of the profile. For developers who prefer reading root-to-leaf anyway, an icicle layout instead means that the starting point is always on screen without needing to scroll. For that reason, many flame graph implementations use the icicle layout by default. Others use the flame graph layout but begin showing the bottom so that the root frames are on screen. I don't have a strong opinion about this: do whichever you prefer! Preferably include a toggle so that the end user can pick their preferred layout.
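
A layout toggle like this can be small. The Python sketch below computes the vertical position of a frame row for either layout, assuming SVG-style coordinates where y grows downwards. The function name frame_y and the 16-pixel row height are hypothetical.

```python
def frame_y(depth, max_depth, row_height=16, inverted=False):
    """Top pixel coordinate of a frame row (y grows downwards, as in SVG).

    Flame layout: depth 0 (the root) is the bottom row.
    Icicle layout (inverted=True): depth 0 is the top row.
    """
    row = depth if inverted else max_depth - depth
    return row * row_height

# Root frame of a profile whose deepest stack is 3 frames above the root.
print(frame_y(0, 3))                 # flame layout
print(frame_y(0, 3, inverted=True))  # icicle layout
```

Only the row index changes between the two layouts; frame widths and x offsets stay the same.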

Flame charts were first added by Google Chrome's WebKit Web Inspector (bug). While inspired by flame graphs, flame charts put the passage of time on the x-axis instead of the alphabet. This means that time-based patterns can be studied. Flame graphs reorder the x-axis samples alphabetically, which maximizes frame merging and better shows the big picture of the profile. Multi-threaded applications can't be shown sensibly by a single flame chart, whereas they can with a flame graph (a problem flame charts didn't need to deal with, since they were initially used for single-threaded JavaScript analysis). Both visualizations are useful, and tools should make both available if possible (e.g., TraceCompass does). Some analysis tools have implemented flame charts and mistakenly called them flame graphs.
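
The difference in merging can be shown with a short Python sketch. It is a simplification that merges whole stacks rather than individual frames, and the function name merged_runs and the sample data are hypothetical.

```python
from itertools import groupby

def merged_runs(samples):
    """Merge adjacent identical stacks into (stack, count) runs."""
    return [(stack, len(list(group))) for stack, group in groupby(samples)]

# Hypothetical samples in the order they were taken.
timeline = ["main;a", "main;b", "main;a", "main;b", "main;a"]

flame_chart = merged_runs(timeline)          # time on the x-axis
flame_graph = merged_runs(sorted(timeline))  # alphabetical x-axis

print(len(flame_chart), "rectangles in time order")
print(len(flame_graph), "rectangles after sorting")
```

In time order, the alternating samples cannot merge, so the chart keeps five narrow rectangles. After sorting, they collapse into two wide ones, at the cost of losing the time-based pattern.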

Sunburst layout: by using radial coordinates for the x-axis, a flame graph can be turned into a hierarchical pie chart. The Google Web Inspector team prototyped them. I also discussed them vs flame graphs in my comparison post.
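
As a sketch of that coordinate change, the Python function below maps a flame graph rectangle (x offset, width, and depth, in sample counts) to an arc: the x-axis becomes an angle and the depth becomes a ring. The function name to_arc and the 20-unit ring thickness are hypothetical.

```python
import math

def to_arc(x, width, depth, total, ring=20):
    """Map a flame graph rectangle to (start_angle, end_angle, inner_r, outer_r).

    Angles are in radians; `total` is the width of the root frame.
    """
    start = 2 * math.pi * x / total
    end = 2 * math.pi * (x + width) / total
    return (start, end, depth * ring, (depth + 1) * ring)

# A frame covering the second half of a 6-sample profile, two levels above the root.
print(to_arc(3, 3, 2, 6))
```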

Origin

I invented flame graphs when working on a MySQL performance issue and needed to understand CPU usage quickly and in depth. The regular profilers/tracers had produced walls of text, so I was exploring visualizations. I first traced CPU function calls and visualized them using Neelakanth Nadgir's time-ordered visualization for callstacks, which itself was inspired by Roch Bourbonnais's CallStackAnalyzer and Jan Boerhout's vftrace. These look similar to flame graphs, but have the passage of time on the x-axis. But there were two problems: the overhead of function tracing was too high, perturbing the target, and the final visualization was too dense to read when spanning multiple seconds. I switched to timed sampling (profiling) to solve the overhead problem, but since the function flow is no longer known (sampling has gaps) I ditched time on the x-axis and reordered samples to maximize frame merging. It worked: the final visualization was much more readable. Neelakanth and Roch's visualizations used completely random colors to differentiate frames. I thought it looked nicer to narrow the color palette, and picked just warm colors initially as it explained why the CPUs were "hot" (busy). Since the result resembled flames, it quickly became known as a flame graph.

I described the original performance problem that led to flame graphs in more detail in my ACMQ/CACM article (link above). The flame graph visualization is really an adjacency diagram with an inverted icicle layout, which I used to visualize profiled stack traces.

Updates

Flame graphs were released in Dec 2011. Not long afterwards (updated in 2012):

More Flame Graph news (updated Apr 2013):

More Flame Graph news (updated Aug 2013):

More Flame Graph news (updated Jun 2014):

More Flame Graph news (updated Dec 2014):

More Flame Graph news (updated Jun 2015):

More Flame Graph news (updated Dec 2015):

More Flame Graph news (updated Jun 2016):

More Flame Graph news (updated Dec 2016):

More Flame Graph news (updated Jun 2017):

More Flame Graph news (updated Dec 2017):

More Flame Graph news (updated Dec 2018):

More Flame Graph news (updated Oct 2019):

More Flame Graph news (updated Oct 2020):

More Flame Graph news (updated Nov 2021):

Thanks to everyone who has written about flame graphs, developed them further, and shared their results! I'll update this page from time to time with more news.


Last updated: 29-Oct-2024