This is a short selection of my most useful and popular material. See my homepage for the full list.
- perf Examples: Linux perf_events one-liners, examples, and visualizations.
- eBPF Tracing Tools: Linux enhanced BPF tools for performance analysis.
- The USE Method: a performance methodology for identifying resource bottlenecks.
- USE Method: Rosetta Stone: performance checklists for different OSes.
- Off-CPU Analysis: a methodology for analyzing blocked time, complementary to CPU analysis.
- TSA Method: the methodology of thread state analysis.
- Active Benchmarking: a methodology for performing accurate benchmarks.
- Working Set Size Estimation: showing techniques for understanding main memory usage.
- CPU Flame Graphs: a visualization for sampled stack traces.
- Off-CPU Flame Graphs: different techniques for analyzing blocking events.
- Memory Flame Graphs: techniques for efficiently analyzing leaks and growth.
- Latency Heat Maps: a visualization for latency distributions over time.
- Utilization Heat Maps: different visualizations for resource utilization.
- Frequency Trails: a visualization for multiple distributions.
- What is Observability: defines this made-up computer word (2021).
- An Unbelievable Demo: recalls a surprising software demo from 2005 (2021).
- FlameScope Pattern Recognition: shows how to interpret subsecond-offset heat map views of profiled data (2018).
- KPTI/KAISER Meltdown Initial Performance Regressions: analyzing the Linux kernel regressions we'll all see (2018).
- Linux Load Averages: Solving the Mystery, where I explained the inclusion of the uninterruptible sleep state (2017).
- CPU Utilization is Wrong: a post explaining the growing problem of memory stall cycles dominating the %CPU metric (2017).
- gdb Debugging Full Example (Tutorial): a post sharing an entire debugging session, including output and explanations (2016).
- The Flame Graph article for ACMQ and CACM that defines and explains flame graphs, and discusses future developments (2016).
- Linux Performance Analysis in 60,000 Milliseconds (PDF): for the Netflix Tech Blog, by myself and the perf team (2015).
- Java in Flames (PDF): for the Netflix Tech Blog, introducing mixed-mode Java flame graphs (2015).
- eBPF One Small Step: introducing Linux eBPF and explaining the capabilities this feature brings (2015).
- Ftrace: The Hidden Light Switch: an lwn.net article about Linux ftrace (2014).
- The Benchmark Paradox: a short blog post explaining a seeming paradox in benchmark evaluations (2014).
- strace Wow Much Syscall: my warning blog post about strace(1), along with many bad strace-related jokes (2014).
- The Case of the Clumsy Kernel (PDF): a kernel performance analysis article for USENIX ;login: (2013).
- The Greatest Tool that Never Worked: har: about the value of ideas in software screenshots (2013).
- Top 10 DTrace Scripts for Mac OS X: includes an introduction to command-line DTrace usage (2011).
- Visualizing System Latency: an article for ACMQ and CACM about latency heat maps (2010).
- bcc: BPF compiler collection, for which I'm a major contributor, especially for performance tools.
- bpftrace: a high-level BPF tracing language, for which I'm a major contributor.
- FlameGraph: a visualization for sampled stack traces, used for performance analysis.
- HeatMap: a program for generating interactive SVG heat maps from trace data.
- perf-tools: perf analysis tools based on Linux perf_events and ftrace.
- Specials: "special" tools for system administrators.
Computing Performance: On the Horizon, USENIX LISA (online), 2021
Cloud Performance Root Cause Analysis at Netflix, YOW! Conf Australia, 2018
Performance Tuning EC2 Instances, AWS re:Invent, 2017
Linux 4.x Performance: Using BPF Superpowers, Facebook's Performance @Scale, 2016
Visualizing Performance with Flame Graphs, USENIX ATC, Santa Clara, 2017
System Methodology, ACM Applicative, New York, 2016
Performance Checklists for SREs, SREcon Santa Clara, 2016
Linux Performance Tools, O'Reilly Velocity, Santa Clara, 2015
- Give me 15 minutes and I'll change your view of Linux tracing, USENIX LISA, 2016: youtube (18 mins).
- Broken Performance Tools, QCon SF, 2015: slideshare, infoq (slides, video) (50 mins).
- Netflix Instance Analysis Requirements, Monitorama, 2015: blog (slides, video) (34 mins).
- What Linux Can Learn from Solaris Performance, and Vice-Versa, SCaLE, 2015: youtube, slideshare (60 mins).
- Flame Graphs on FreeBSD, FreeBSD Developer and Vendor Summit, 2014: blog (slides, video) (53 mins).
- Performance Analysis of BSD, MeetBSD CA, 2014: blog (slides, video) (53 mins).
- Analyzing OS X Systems Performance with the USE Method, MacIT, 2014: slideshare (no video).
- Benchmarking Gone Wrong, Surge 2013 lightning talk: youtube (5 mins).
- Stop the Guessing, Velocity 2013: youtube, slideshare (46 mins).
- Open Source Systems Performance, OSCON, 2013: slideshare, youtube (32 mins).
- Blazing Performance with Flame Graphs, USENIX LISA, 2013: youtube, slideshare (90 mins).
- Performance Analysis Methodology, USENIX LISA, 2012: slideshare, youtube (90 mins).
- ZFS: Performance Analysis and Tools, zfsday, 2012: slideshare, youtube (43 mins).
- Performance Visualizations, USENIX LISA, 2010: slideshare, youtube (80 mins).
More listed on my homepage.
Systems Performance: Enterprise and the Cloud 2nd Edition (2020)
Brendan Gregg. ISBN 978-0-13-682015-4. Addison-Wesley.
Systems performance is the study of application, operating system, kernel, and hardware performance: everything in the data path. The second edition of this best-selling book adds content on BPF, BCC, bpftrace, perf, and Ftrace, mostly removes Solaris, makes numerous updates to Linux and cloud computing, and includes general improvements and additions. 928 pages.
BPF Performance Tools: Linux System and Application Observability (2019)
Brendan Gregg. ISBN 0-13-655482-2. Addison-Wesley.
BPF originally stood for Berkeley Packet Filter, but has been extended to be an in-kernel execution environment in Linux, allowing a new type of software to be developed. This includes a new era of observability tools.
The book includes over 150 BPF observability tools that you can run to find performance wins and troubleshoot software, and also shows you how to write your own. I developed over 100 new BPF tools for this book. 880 pages.
Systems Performance: Enterprise and the Cloud (2013)
This book covers new developments in systems performance: in particular, dynamic tracing and cloud computing. It also introduces many new methodologies to help a wide audience get started. It leads with Linux examples from Ubuntu, Fedora, and CentOS, and also covers Solaris-based distributions. Covering two different kernels provides additional perspective that enhances the reader's understanding of each. 772 pages.
DTrace: Dynamic Tracing in Oracle Solaris, Mac OS X and FreeBSD (2011)
This shows how to use DTrace by example for performance analysis and troubleshooting. Solaris was used as the primary OS, with additional examples from Mac OS X and FreeBSD. The most difficult challenge in using a dynamic tracing tool (DTrace, SystemTap, etc.) is knowing what to do with it. This book provides over one hundred use cases (scripts), which will remain invaluable even after the example code becomes out of date. 1152 pages.
Solaris Performance and Tools: DTrace and MDB Techniques for Solaris 10 and OpenSolaris (2006)
Richard McDougall, Jim Mauro, Brendan Gregg. ASIN 0131568191. Prentice Hall.
A practical guide to performance analysis on Solaris. This summarizes background for context, and shows how to use the various tools available. This book was written at an interesting time: DTrace was new, filling in many observability gaps, and this book covers the best of the old and new ways of analysis. It was written as a companion volume to Solaris Internals 2nd Edition, which it references. 444 pages.
If you purchase my books through the Amazon or InformIT links, the book's technical editor earns a commission.