Brendan D. Gregg
G'Day. I use this site to share some hobbies and my work with computers. These days I work on large scale computer performance, including large cloud computing environments, and live in Silicon Valley. I have a personal blog, and I'm also on twitter. Here is my bio and anti-bio.
For a short selection of my favourite content, see my portfolio page. For everything, see the sitemap.
Documentation
In approximately reverse chronological order:
- A page to summarize my Linux Performance related material.
- My Linux Profiling at Netflix talk for SCALE 13x (2015), where I covered getting CPU profiling to work, including for Java and Node.js, and a tour of other perf_events features (slideshare, PDF, youtube).
- For LISA2014, my Linux Performance Analysis: New Tools and Old Secrets talk, where I covered ftrace and perf_events tools I've recently developed (slideshare, PDF, youtube, USENIX).
- My talk on Performance Tuning Linux Instances on EC2 from AWS re:Invent 2014, where I covered how Netflix selects, tunes, and then observes the performance of Linux cloud instances (slideshare, youtube).
- A post introducing Differential Flame Graphs, which can be used for performance regression analysis. I also wrote about CPI flame graphs, which use the differential flame graph code, and pmcstat on FreeBSD.
- My Flame Graphs on FreeBSD talk, for the FreeBSD Dev and Vendor Summit 2014. I summarized the different types, with the FreeBSD commands to create them (slideshare, PDF, youtube).
- For my first BSD conference, MeetBSDCA 2014, my Performance Analysis for BSD talk, where I discussed 5 facets: observability, methodologies, benchmarking, profiling, and tuning (slideshare, PDF, youtube).
- Contributions to the DTrace on FreeBSD wiki page: one-liners and a basic tutorial.
- My Linux Performance Tools talk for LinuxCon Europe 2014, where I summarized observability, benchmarking, tuning, static perf tuning tools, and tracing. This was an updated version of my earlier LinuxCon talk on the same topic (slideshare, PDF, youtube).
- For the 2014 Tracing Summit, my From DTrace to Linux talk, where I summarized what Linux can learn from DTrace (slideshare, PDF, youtube).
- My Surge 2014 talk From Clouds to Roots, on how Netflix does root cause performance analysis on a Linux cloud. It's my most comprehensive performance analysis talk to date. Instead of just focusing on low-level tools, I provided context and then showed the full path from clouds to roots. (slideshare, PDF, youtube).
- My post The MSRs of EC2, where I showed how CPU Model Specific Registers can be used to measure the real CPU clock rate and temperature in Xen guests (2014).
- Slides from my LinuxCon North America 2014 talk Linux Performance Tools, which summarizes performance observability, benchmarking, and tuning tools, and illustrates their role on Linux system functional diagrams (PDF).
- Posts summarizing Linux Java CPU Flame Graphs and Node.js CPU Flame Graphs (2014).
- My lwn.net article Ftrace: The Hidden Light Switch (2014).
- Posts describing perf-tools based on Linux perf_events and ftrace (both core Linux kernel tracers): perf Hacktogram, iosnoop, iosnoop Latency Heat Maps, opensnoop, execsnoop, tcpretrans (2014).
- Posts about Linux perf_events: perf CPU Sampling, perf Static Tracepoints, perf Heat Maps, perf Counting, perf Kernel Line Tracing (2014).
- I wrote a warning post titled strace Wow Much Syscall, which discusses strace(1) for production use, includes an interesting example, and many bad strace-related jokes (2014).
- Two posts to explain Xen modes on AWS EC2: What Color is Your Xen and Xen Feature Detection (2014).
- The Benchmark Paradox: a short post explaining a seeming paradox in benchmark evaluations (2014).
- My Analyzing OS X Systems Performance with the USE Method talk at MacIT 2014 (PDF).
- At SCaLE12x (2014) I gave the keynote on What Linux can learn from Solaris perf. and vice-versa (PDF, youtube).
- The Case of the Clumsy Kernel (PDF): a kernel performance analysis article for USENIX ;login (2013).
- My USENIX/LISA 2013 slides for Blazing Performance with Flame Graphs, which was two talks in one: part 1 covered the commonly used CPU flame graphs, and part 2 covered various advanced flame graphs (PDF, youtube).
- The TSA Method, a performance analysis methodology for identifying issues causing poor application performance. This is a thread-oriented methodology, and is complementary to the resource-oriented USE Method. It has solved countless issues.
- A page of my Performance Analysis Methodology summaries, and links.
- Systems Performance: Enterprise and the Cloud, Prentice Hall, 2013 (ISBN 0133390098). This book covers new developments in systems performance: in particular, dynamic tracing and cloud computing. It also introduces many new methodologies to help a wide audience get started. It leads with Linux examples from Ubuntu, Fedora, and CentOS, and also covers illumos distributions. Covering two different kernels provides additional perspective that enhances the reader's understanding of each. The book is 635 pages plus appendices.
- My slides for a brief talk on The New Systems Performance, where I summarized how the topic has changed from the 1990s to today (July 2013, PDF, youtube).
- Active Benchmarking: a methodology for successful benchmarking, and an example of its use for Bonnie++.
- My OSCON 2013 slides for Open Source Systems Performance, where I provided a unique perspective I'm best positioned to give about both open- and close-sourcing software, and what this means for systems performance analysis (PDF, youtube).
- Visualizing distributions using Frequency Trails, explained in the Introduction, then using them for Detecting Outliers, measuring Modes and Modality, and What the Mean Really Means.
- My slides for Stop the Guessing: Performance Methodologies for Production Systems talk at Velocity (2013) (PDF, youtube).
- The very popular slide deck for my Linux Performance Analysis and Tools talk at SCaLE11x (2013), which includes lesser known tools such as perf's dynamic tracing and static trace points. I've been told people want slide 16 on a coffee cup! (slideshare, PDF, youtube).
- A summary of Virtualization Performance: Zones, KVM, Xen, focusing on I/O path overheads (2013) (PDF).
- The Thinking Methodically about Performance article for ACMQ (2012), and CACM, based on my earlier USE Method articles.
- USENIX/LISA 2012 slides on Performance Analysis Methodology, summarizing ten methods and anti-methods (slideshare, PDF, youtube).
- For illumosday and zfsday, my slides for DTracing the Cloud (PDF, youtube) and ZFS Performance Analysis and Tools (PDF, youtube).
- The introduction of a new visualization type: Subsecond Offset Heat Maps, which allow behavior within a second to be seen (PDF).
- The USE Method, which I developed for identifying common system bottlenecks and errors, and have used successfully for many years in enterprise and cloud performance environments. Based on the USE method: the Linux Performance Checklist, the Solaris Performance Checklist, the SmartOS Performance Checklist, the Mac OS X Performance Checklist, the FreeBSD Performance Checklist, and the Unix 7th Edition Performance Checklist. There is also the USE Method Rosetta Stone of Performance Checklists.
- The Flame Graph visualization and using them for CPU, Memory, and Off-CPU analysis. The CPU page summarizes how different tools can be used to collect the profile data, including DTrace, and the different Linux profilers: perf, SystemTap, and ktap.
- Colony Graphs, a visualization of computer life forms, and their use for Visualizing the Cloud, Process Snapshots and Process Execution.
- Demonstrations of different visualizations for Device Utilization, which was described as blog post of the year (2011).
- Narrow topics in operating system performance: Activity of the ZFS ARC.
- A long post about Using SystemTap on the Ubuntu and CentOS Linux distributions, written in late 2011.
- An introduction to the technique of Off-CPU Performance Analysis, which can identify the cause of high latency due to blocking events.
- Top 10 DTrace Scripts for Mac OS X performance analysis and troubleshooting, written to reach the broader Mac OS X community. This includes step by step instructions on how to find and run the Terminal application and sudo (PDF).
- A series of blog posts on File System latency, using MySQL as an example application (1, 2, 3, 4, 5) (2011).
- MySQL Query Latency using DTrace (2011).
- A series of blog posts on the DTrace pid provider, going beyond what was covered in the DTrace book (2011).
- The DTrace book with Jim Mauro (Prentice Hall, 2011; ISBN 0132091518). A sample chapter on File Systems is online. This 1152 page book took over a year to write, including the research, development and testing of dozens of new DTrace scripts and one-liners, and soliciting input from many experts. Solaris was used as the primary OS for examples, with additional examples from Mac OS X and FreeBSD. The most difficult challenge for using a dynamic tracing tool (DTrace, SystemTap, etc.) is knowing what to do with it. This book provides over one hundred use cases (scripts), which will be invaluable even after the example code becomes out of date.
- A page on Heat Maps, and a demonstration of Latency Heat Maps which includes example software to generate them.
- The Visualizations for Performance Analysis slide deck, USENIX/LISA 2010. This describes two different approaches (methodologies) for systems performance: workload analysis and latency analysis, the metrics used, and then introduces a variety of heat map visualizations. This talk ends by describing the challenges of cloud computing, and how heat maps are well suited for the scale of data (PDF).
- An article for ACMQ, also published by CACM, on Visualizing System latency (2010). This includes interesting latency heat maps I had found, including the Rainbow Pterodactyl and the Icy Lake.
- A series of posts on performance testing a line of storage appliances (1, 2, 3). I wrote these in 2009, when I was often saving benchmarking mishaps. They were very successful (and thanks to those who read them) as the calls for help were greatly reduced.
- For the Front Range OpenSolaris User Group (FROSUG), in Denver, Colorado, my slides for my Little Shop of Performance Horrors talk, where I discussed things going wrong instead of right. It was a lot of fun, and people showed up despite a massive snow storm (2009).
- The storage appliance dashboard where I used weather icons to highlight performance issues and convey ambiguity for those types of metric.
- The original ZFS L2ARC post (2008) and later L2ARC Screenshots (2009). Since code changes were public each night, my block comment in usr/src/uts/common/fs/zfs/arc.c (added in Nov 2007) disguised the then-secret intent of this technology by listing "short-stroked disks" as the first intended device, instead of SSDs.
- My Solaris Performance: Introduction slides (PDF) from May 2007, covering Solaris performance features and observability. This includes two of my methodologies for performance analysis: the "By-Layer Strategy" and the "3-Metric Strategy" (back when I spelled utilization with an "s"). The latter strategy is what I later called the USE Method.
- Slide decks (PDFs) from DTrace talks in 2007: DTrace Intro, DTraceToolkit, and DTrace Java.
- The companion to Solaris Internals 2nd Edition: Solaris Performance and Tools, with Richard McDougall and Jim Mauro (Prentice Hall, 2006; ISBN 0131568191). These chapters began during development of Solaris Internals 2nd Edition, and were later split into a separate companion volume. It worked well: a reference book on internals, and a companion book for practitioners on performance.
- My Solaris 10 Zones page from 2005, where I had figured out how to configure Solaris Zones with Resource Controls.
- A page on DTrace (created in 2004), where I shared early scripts I was developing, and the DTraceToolkit (no longer maintained).
- Old or out of date Unix and Sun Solaris material is in the Crypt for historic interest. (circa 2005.)
- I have an old personal blog at bdgregg.blogspot.com where I discussed DTraceToolkit updates.
- I have two prior professional blogs: blogs.oracle.com/brendan (formerly blogs.sun.com/brendan), where I discussed performance, DTrace, and the ZFS storage appliance (2006-2010). In case that vanishes one day, the posts are also available on dtrace.org/blogs/brendan, where I continued blogging about cloud performance and DTrace (2010-2014). Both have been popular, exceeding 300,000 yearly views.
Videos
- At SCALE13x (2015), my Linux Profiling at Netflix talk on perf_events CPU profiling and features (blog, youtube, slideshare) (59 mins).
- For USENIX LISA14, I gave a talk about my Linux perf-tools collection, which is based on ftrace and perf_events: Linux Performance Analysis: New Tools and Old Secrets (youtube, slideshare, USENIX) (43 mins).
- My AWS re:Invent 2014 talk Performance Tuning EC2 Instances: selection, Linux tuning, observability (youtube, slideshare) (45 mins).
- I was interviewed by BSD Now about BSD and benchmarking in Episode 065: 8,000,000 Mogofoo-ops (youtube) (2014) (28 mins).
- My FreeBSD dev summit 2014 talk on Flame Graphs on FreeBSD. Since this talk wasn't videoed, I captured it using screenflow from my laptop. It's better than nothing, and shows my live demos well. (youtube) (53 mins).
- My MeetBSDCA 2014 talk Performance Analysis, summarizing 5 facets of perf analysis on BSD (youtube, slideshare) (53 mins).
- My popular talk on Linux Performance Tools, which summarizes performance observability, benchmarking, tuning, static performance tuning, and tracing tools. This was for LinuxCon Europe 2014 (youtube, slideshare) (49 mins).
- For the Tracing Summit 2014, my talk From DTrace to Linux summarized lessons Linux tracing can learn (youtube, slideshare) (61 mins).
- My Surge 2014 talk From Clouds to Roots, showing how Netflix does perf analysis and the tools involved (youtube, slideshare) (56 mins).
- My SCaLE12x (2014) keynote on What Linux can learn from Solaris performance and vice-versa (youtube, slideshare) (60 mins).
- Deirdré Straughan has a youtube playlist named Brendan Gregg's Best, which has many of my talks (many of which she filmed).
- My plenary session at USENIX/LISA 2013: Blazing Performance with Flame Graphs (youtube, slideshare, usenix) (90 mins).
- A talk for BayLISA October 2013 to describe and launch the Systems Performance book (60 mins).
- A lightning talk for Surge 2013 on Benchmarking Gone Wrong, which includes the craziest line graph I've ever seen (~5 mins).
- The New Systems Performance, a meetup talk I gave in 2013 about modern systems performance (23 mins).
- My OSCON 2013 talk on Open Source Systems Performance, a tale of three parts (youtube, slideshare) (32 mins).
- My Stop the Guessing: Performance Methodologies for Production Systems talk at Velocity 2013 (youtube, slideshare) (46 mins).
- At SCaLE11x (2013) I gave a talk on Linux Performance Analysis and Tools, summarizing basic to advanced analysis tools, and including some methodologies (youtube, slideshare, blog) (60 mins).
- My LISA 2012 talk on Performance Analysis Methodology named and summarized 10 methods (youtube, usenix, slideshare, blog) (86 mins).
- ZFS: Performance Analysis and Tools at zfsday was probably my best talk of 2012 (youtube, slideshare, blog) (43 mins).
- At illumosday I gave a talk on DTracing the Cloud, showing what can be done (youtube, slideshare) (44 mins).
- At FISL'13 (2012) I gave a talk on The USE Method for systems performance analysis, including some other methods for comparison (youtube, slideshare, blog) (56 mins).
- My talk at Surge'12 on Real-time in the real world with Bryan Cantrill is online (youtube, slideshare) (56 mins).
- At dtrace.conf(12) I gave an unconference-style talk on various Visualizations (youtube, blog) (35 mins).
- My SCaLE10x talk (2012) on Performance Analysis: new tools and concepts from the cloud, with examples (youtube, slideshare) (1 hr).
- Short talks on performance tools for Solaris-based operating systems: vmstat, mpstat, and load averages, filmed during 2011.
- My extended Percona Live New York 2011 talk: Breaking Down MySQL/Percona Query Latency With DTrace (youtube, blog) (90 mins).
- My LISA 2010 talk on Visualizations for Performance, which explains the need for heat maps (youtube, usenix, slideshare, blog) (80 mins).
- At Oracle Open World 2010, I gave a talk on How to Build Better Applications with DTrace, which is on youtube: part 1, part 2. (64 mins).
- I've given many technical talks about things going right. This was about things going wrong: Little Shop of Performance Horrors at FROSUG in Colorado, 2009 (youtube, blog) (2.5 hours).
- Shouting in the Datacenter (youtube, blog) was a video that Bryan Cantrill and I made on the spur of the moment on New Year's Eve 2008, which went viral (750,000+ views). I've had many emails about it: it has spawned an industry of sound proofing data centers (2 mins). There is also a making of (youtube) video (5 mins).
Software
The following are my spare time software projects, and are open source with no warranty – use at your own risk. Some are computer security tools, which may be illegal to own or run in your country if they are misidentified as cracking tools.
I've also developed software as a professional kernel engineer, which isn't listed below (eg, the ZFS L2ARC).
- perf-tools is a collection of ftrace- and perf_events-based performance analysis tools for Linux.
- perf Examples for perf_events, the standard Linux profiler. Page including one-liners and flame graphs.
- ktap Examples for the new lua-based Linux dynamic tracing tool. Page including one-liners, tools, and flame graphs.
- msr-cloud-tools model specific register observability tools intended for cloud instances.
- DTrace Tools for FreeBSD.
- DTrace book scripts from the DTrace book, which demonstrates many new uses of dynamic tracing.
- DTraceToolkit a collection of over 200 scripts, with man pages and example files (no longer maintained).
- DTrace Tools original versions of iosnoop, opensnoop, bitesize.d, execsnoop, shellsnoop, tcpsnoop, iotop, ...
- Dump2PNG visualizes file data as a PNG (uses libpng). An experimental tool intended for core dump analysis. screenshot.
- nicstat network interface stats for Solaris (uses Kstat). example. There is also a Perl version, and Tim Cook added Linux support.
- FlameGraph: a visualization for sampled stack traces, used for performance analysis. See the Flame Graphs page for an explanation.
- HeatMap: a program for generating interactive SVG heat maps from trace data. See the page about it.
- Chaosreader: Trace TCP/UDP sessions and fetch application data from snoop or tcpdump logs. This will fetch telnet sessions, FTP files, HTTP transfers, SMTP emails, ... The following example output was created by Chaosreader to link to the extracted HTTP sections, telnet sessions, and FTP files found in a snoop log. This can also create telnet replay programs that play back sessions in realtime: example. A tool for forensics or network troubleshooting. download code (github).
- Perl modules: Net::SnoopLog for snoop packet logs (RFC1761), Net::TcpDumpLog for tcpdump/libpcap logs, Algorithm::Hamming::Perl.
- FreqCount is a simple frequency counter. Useful for processing logs (most common IP addr, port, etc.). example.
- PortPing is a version of ping that connects using ssh (or other ports), not ICMP. Good for checking firewalls. example.
- MTUfinder tests different sized HTTP requests to a web server, highlighting MTU size problems. example.
- Specials is a collection of "special" programs for system administrators. Mostly Perl.
- DtkshDemos a collection of X11 dtksh scripts. They include xvmstat - a GUI version of vmstat, and xplot - a generic data plotter. Written for any OS with dtksh.
- BBaseline is a small script to create a baseline of the system's performance, by logging the output of several tools. By creating logs during normal and peak activity, this can assist performance tuning. Easy to customize, and to grep the baselines. See the example.
- total is a simple awk script to sum a field (example); field prints a field (example). These exist for convenience at the shell.
- Quick Text Toaster v1.0 An editor I wrote many years ago to grab text from corrupted files. Works with executables, documents, etc.
- QBASIC CRO v1.2 I still find this old program amusing. It is a digital (on/off) CRO that samples the parallel port at 1 kHz. screenshot.
- Guessing Game is written in awk, C, C++, csh, Fortran, java, ksh, Pascal, Perl, QBASIC, sh, and more, as a language comparison.
- The Crypt has some of my older Solaris and Unix software, including the K9Toolkit collection of kstat-based performance tools, Psio for disk I/O by-process, and CacheKit for hardware and software cache analysis.
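To give a feel for the small log-processing helpers listed above (total, field, and FreqCount), here is a minimal Python sketch of the same ideas. It is not the code of those tools: the function names, the whitespace field splitting, the 1-based field numbering (borrowed from awk), and the sample log lines are all assumptions made for illustration.

```python
from collections import Counter


def field(lines, n):
    """Yield the n-th whitespace-separated field (1-based, as in awk) of each line."""
    for line in lines:
        parts = line.split()
        if len(parts) >= n:
            yield parts[n - 1]


def total(lines, n):
    """Sum the n-th field across all lines, skipping values that are not numeric."""
    result = 0.0
    for value in field(lines, n):
        try:
            result += float(value)
        except ValueError:
            continue  # ignore headers or other non-numeric text
    return result


def freq_count(items):
    """Return (item, count) pairs, most frequent first."""
    return Counter(items).most_common()


if __name__ == "__main__":
    # Hypothetical log lines: client address, method, bytes sent.
    log = [
        "10.0.0.1 GET 512",
        "10.0.0.2 GET 1024",
        "10.0.0.1 POST 256",
    ]
    print(total(log, 3))              # sum of the third field
    print(freq_count(field(log, 1)))  # most common client address first
```

Composing the functions, as in the last line, mirrors how such tools are usually chained at the shell: one step extracts a field and the next counts or sums it.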
Misc
- Recommended Reading: A list of my favourite technology books.
- Other Sites: Other interesting places on the web.
- Photos: Some photos I've taken.
- Games: My favourite computer games.
- Full Site Map: A list of links to everything here.


