Brendan D. Gregg
G'Day. I use this homepage to share some hobbies and my work with computers. These days I work on performance and live in the San Francisco Bay Area. I have two blogs: personal and professional. I'm also on twitter and linkedin. Here is my bio and anti-bio.
- My USENIX/LISA 2013 slides Blazing Performance with Flame Graphs, which was two talks in one: part 1 covered the commonly used CPU sample flame graphs, and part 2 covered various advanced flame graphs (PDF).
- The TSA Method, a performance analysis methodology for identifying issues causing poor application performance. This is a thread-oriented methodology, and is complementary to the resource-oriented USE Method. It has solved countless issues.
- Systems Performance: Enterprise and the Cloud, Prentice Hall, 2013 (ISBN 0133390098). This is the book I felt needed to be written, covering new developments in systems performance: in particular, dynamic tracing and cloud computing. It also introduces many new methodologies to help a wide audience get started. It leads with Linux examples from Ubuntu, Fedora, and CentOS, and also covers illumos distributions. Covering two different kernels provides additional perspective that enhances the reader's understanding of each. The book is 635 pages plus appendices.
- My slides for a brief talk on The New Systems Performance, where I summarized how the topic has changed from the 1990s to today (July 2013, PDF).
- My OSCON 2013 slides for Open Source Systems Performance, where I provided a unique perspective I'm best positioned to give about both open- and close-sourcing software, and what this means for systems performance analysis (PDF).
- Visualizing distributions using Frequency Trails (PDF), then using them for Detecting Outliers (PDF), measuring Modes and Modality (PDF), and What the Mean Really Means (PDF).
- My slides for the Stop the Guessing: Performance Methodologies for Production Systems talk at Velocity 2013 (PDF).
- The very popular slide deck for my Linux Performance Analysis and Tools talk at SCaLE11x (2013), which includes lesser-known tools such as perf's dynamic tracing and static trace points. I've been told people want slide 16 on a coffee cup! (slideshare, PDF).
- A summary of Virtualization Performance: Zones, KVM, Xen, focusing on I/O path overheads (PDF).
- The Thinking Methodically about Performance article for ACMQ, and CACM, based on my earlier USE Method articles.
- My USENIX/LISA 2012 slides on Performance Analysis Methodology, summarizing ten methodologies and anti-methodologies (slideshare, PDF).
- For illumosday and zfsday, my slide decks for DTracing the Cloud (PDF) and ZFS Performance Analysis and Tools (PDF).
- The introduction of a new visualization type: Subsecond Offset Heat Maps, which allow behavior within a second to be seen (PDF).
- The USE Method, which I developed for identifying common system bottlenecks and errors, and have used successfully for many years in enterprise and cloud performance environments (PDF). Based on the USE Method: the Linux Performance Checklist, the Solaris Performance Checklist, the SmartOS Performance Checklist, the Mac OS X Performance Checklist, the FreeBSD Performance Checklist, and the Unix 7th Edition Performance Checklist.
- The Flame Graph visualization (PDF), and using Flame Graphs for Linux Kernel Performance analysis (PDF).
- Performance visualizations: Device Utilization, and a series of three (so far) on Visualizing the Cloud, Process Snapshots and Process Execution.
- Narrow topics in operating system performance: Activity of the ZFS ARC.
- A long post about Using SystemTap on the Ubuntu and CentOS Linux distributions.
- An article to introduce the technique of Off-CPU Performance Analysis, which can identify the cause of high latency due to blocking events.
- Top 10 DTrace Scripts for Mac OS X performance analysis and troubleshooting, written to reach the broader Mac OS X community. This includes step-by-step instructions on how to find and run the Terminal application and sudo (PDF).
- A series of blog posts on File System latency, using MySQL as an example application (1, 2, 3, 4, 5).
- MySQL Query Latency using DTrace.
- A series of blog posts on the DTrace pid provider, going beyond what was covered in the DTrace book.
- The DTrace book with Jim Mauro, Prentice Hall 2011 (ISBN 0132091518). A sample chapter on File Systems is online. This 1152 page book took over a year to write, including the research, development and testing of dozens of new DTrace scripts and one-liners, and soliciting input from many experts. Solaris was used as the primary OS for examples, with additional examples from Mac OS X and FreeBSD. Someone once asked me who wrote the FreeBSD content - it was me! Why is that a question? Maybe I need to publish more on FreeBSD. :)
- The Visualizations for Performance Analysis slide deck, USENIX/LISA 2010. This describes two different approaches (methodologies) for systems performance: workload analysis and latency analysis, the metrics used to apply these, and then introduces a variety of heat map visualizations. This talk ends by describing the challenges of cloud computing, and how heat maps are well suited for the scale of data (PDF).
- An article for ACMQ, also published by CACM, on Visualizing System latency. This includes interesting latency heat maps I had found, including the Rainbow Pterodactyl and the Icy Lake.
- A series of posts on performance testing a line of storage appliances (1, 2, 3). These were written at a time when I was often pulled in to save benchmarking mishaps, and needed to share tips to avoid common mistakes. They were very successful (and thanks to those who read them) as the calls for help were greatly reduced.
- The storage appliance Dashboard where I used weather icons to highlight performance issues and convey ambiguity for those types of metric.
- The original ZFS L2ARC post and later L2ARC Screenshots. Since code changes were public each night, my block comment in usr/src/uts/common/fs/zfs/arc.c (added in Nov 2007) disguised the then-secret intent of this technology by listing "short-stroked disks" as the first intended device, instead of SSDs.
- My Solaris Performance: Introduction slides (PDF) from May 2007, covering Solaris performance features and observability.
- The companion to Solaris Internals 2nd Edition: Solaris Performance and Tools, with Richard McDougall and Jim Mauro, Prentice Hall 2006 (ISBN 0131568191). These chapters began during development of Solaris Internals 2nd Edition, and were later split into a separate companion volume. It worked well: a reference book on internals, and a companion book for practitioners on performance.
- My Solaris 10 Zones page: the first showing how to configure Solaris Zones with Resource Controls (which I had figured out the hard way).
- A page on DTrace, where I described and shared early scripts I was developing, and a page on the DTraceToolkit.
- Older and out-of-date Unix or Sun Solaris material is in the Sun Crypt for historic interest.
- My plenary session at USENIX/LISA 2013: Blazing Performance with Flame Graphs (90 mins).
- A talk for BayLISA October 2013 to describe and launch the Systems Performance book (60 mins).
- A lightning talk for Surge 2013 on Benchmarking Gone Wrong, which includes the craziest line graph I've ever seen (~5 mins).
- The New Systems Performance, a meetup talk I gave in 2013 about modern systems performance (23 mins).
- My OSCON 2013 talk on Open Source Systems Performance, a tale of three parts (youtube; slideshare) (32 mins).
- My Stop the Guessing: Performance Methodologies for Production Systems talk at Velocity 2013 (youtube; slideshare) (46 mins).
- At SCaLE11x (2013) I gave a talk on Linux Performance Analysis and Tools, summarizing basic to advanced analysis tools, and including some methodologies (youtube; slideshare; blog) (60 mins).
- My LISA 2012 talk on Performance Analysis Methodology named and summarized 10 methodologies (youtube; usenix; slideshare; blog) (86 mins).
- ZFS: Performance Analysis and Tools at zfsday was probably my best talk of 2012 (youtube; slideshare; blog) (43 mins).
- At illumosday I gave a talk on DTracing the Cloud, showing what can be done (youtube; slideshare) (44 mins).
- At FISL'13 (2012) I gave a talk on The USE Method for systems performance analysis, including some other methods for comparison (youtube; slideshare; blog) (56 mins).
- My talk at Surge'12 on Real-time in the real world with Bryan Cantrill is online (youtube; slideshare; blog) (56 mins).
- At dtrace.conf(12) I gave an unconference-style talk on various Visualizations (youtube; blog) (35 mins).
- My SCaLE10x talk (2012) on Performance Analysis: new tools and concepts from the cloud, with example problems (youtube; slideshare; blog) (1 hr).
- Short talks on performance tools for Solaris-based operating systems: vmstat, mpstat, and load averages, filmed during 2011.
- A long version of my Percona Live New York 2011 talk: Breaking Down MySQL/Percona Query Latency With DTrace (youtube; blog) (90 mins).
- My LISA 2010 talk on Visualizations for Performance, which explains the need for heat maps (youtube; usenix; slideshare; blog) (80 mins).
- I've given many technical talks about things going right. This one was about things going wrong: Little Shop of Performance Horrors at FROSUG in Colorado, 2009 (youtube; blog) (2.5 hours).
- Shouting in the Datacenter (youtube; blog) was a video that Bryan and I made on the spur of the moment on New Year's Eve 2008, which went viral (750,000+ views). I've had many emails from companies about it: it has spawned an industry of sound proofing data centers (2 mins). There is also a making of (youtube) video (5 mins).
These are some small software projects I've developed in my spare time. (I've also developed software as a professional kernel engineer, which isn't listed below, including the ZFS L2ARC.) The following are open source with no warranty – use at your own risk. Some are computer security tools, which may be illegal to own or run in your country if they are misidentified as cracking tools.
Unix/Linux - C
- Dump2PNG visualizes file data as a PNG (uses libpng). An experimental tool intended for core dump analysis. screenshot.
- nicstat network interface stats for Solaris (uses Kstat). example. There is also a Perl version, and Tim Cook added Linux support.
- Fastburden is a performance testing tool that can generate a flood of client web traffic (multithreaded), from Apache or Squid access logs. example.
Linux - tracing
- ktap Examples for the new Lua-based Linux dynamic tracing tool. Page including one-liners, tools, and flame graphs.
Solaris/Mac OS X/FreeBSD - DTrace
- DTraceToolkit a collection of over 200 scripts, with man pages and example files.
- DTrace Tools original versions of iosnoop, opensnoop, bitesize.d, execsnoop, shellsnoop, tcpsnoop, iotop, ...
- DTraceTazTool a GUI to plot live disk activity.
Unix/Linux/Windows - Perl
- FlameGraph: a visualization for sampled stack traces, used for performance analysis (see the posts for Flame Graphs via: DTrace, perf/SystemTap).
- Chaosreader: Trace TCP/UDP sessions and fetch application data from snoop or tcpdump logs. This will fetch telnet sessions, FTP files, HTTP transfers, SMTP emails, ... example output was created by Chaosreader to link to the extracted HTTP sections, telnet sessions, and FTP files found in a snoop log. This can also create telnet replay programs that play back sessions in realtime: example. A tool for forensics or network troubleshooting. download code.
- Perl modules: Net::SnoopLog for snoop packet logs (RFC1761), Net::TcpDumpLog for tcpdump/libpcap logs, Algorithm::Hamming::Perl.
- Distillerror summarizes truss(1) or strace(1) output to highlight errors. See the Solaris example, the Red Hat example, or a larger example.
- FreqCount is a simple frequency counter. Useful for processing logs (most common IP addr, port, etc.). example.
- PortPing is a version of ping that connects using ssh (or other ports), not ICMP. Good for checking firewalls. example.
- MTUfinder tests different sized HTTP requests to a web server, highlighting MTU size problems. example.
- Specials is a collection of "special" programs for system administrators. Mostly Perl.
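The frequency counting that FreqCount is described as doing (for example, finding the most common IP address in a log) can be sketched in a few lines of Python. This is an illustration only, not the Perl tool itself: the function name `freq_count` and its `field` and `sep` parameters are hypothetical, and it assumes the value of interest sits in one delimited field of each line.

```python
from collections import Counter


def freq_count(lines, field=0, sep=None):
    """Count how often each value appears in one field of each line.

    field is 0-based; sep=None splits on any run of whitespace.
    Lines that are too short to contain the field are skipped.
    (Hypothetical helper for illustration; not FreqCount's interface.)
    """
    counts = Counter()
    for line in lines:
        parts = line.split(sep)
        if len(parts) > field:
            counts[parts[field]] += 1
    # most_common() sorts by count, highest first
    return counts.most_common()


if __name__ == "__main__":
    sample_log = [
        "10.0.0.1 GET /",
        "10.0.0.2 GET /a",
        "10.0.0.1 GET /b",
    ]
    for value, count in freq_count(sample_log):
        print(count, value)
```

Returning the full sorted list, rather than only the top entry, keeps the sketch useful for spotting the long tail as well as the most frequent value.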
Solaris - Perl/C
- K9Toolkit A collection of Perl programs for Solaris that use Sun::Solaris::KStat. This includes tools to print load averages for CPU, memory, disks and network, to aid finding performance bottlenecks.
- FindBill finds backup super blocks on a Solaris UFS for "fsck -o b=..." (if "newfs -N" doesn't help). example.
- listprusage a C program to print process resource usage statistics such as minor faults and syscalls by PID. Solaris (uses procfs). Example here.
Unix/Linux - Bourne/Korn Shell
- DtkshDemos a collection of X11 dtksh scripts. They include xvmstat - a GUI version of vmstat, and xplot - a generic data plotter. Written for any OS with dtksh.
- BBaseline is a small script to create a baseline of the system's performance, by logging the output of several tools. By creating logs during normal and peak activity, this can assist performance tuning. Easy to customize the baseline content, and to grep the baselines. See the example output.
- total is a simple awk script to sum a field (example), and field prints a field (example). These exist for convenience at the shell.
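For readers without awk at hand, the behavior described for total and field (sum a field, print a field) can be approximated as below. This is a rough Python sketch under stated assumptions, not the original awk scripts: the function names mirror the tool names but their signatures are hypothetical, fields are numbered from 1 as with awk's `$n`, and non-numeric values are skipped when summing.

```python
def field(lines, n, sep=None):
    """Return field n (1-based, like awk's $n) from each line that has one.

    sep=None splits on any run of whitespace.
    (Hypothetical signature for illustration.)
    """
    values = []
    for line in lines:
        parts = line.split(sep)
        if len(parts) >= n:
            values.append(parts[n - 1])
    return values


def total(lines, n, sep=None):
    """Sum field n across all lines, skipping values that are not numeric."""
    result = 0.0
    for value in field(lines, n, sep):
        try:
            result += float(value)
        except ValueError:
            continue  # header text or other non-numeric data
    return result


if __name__ == "__main__":
    rows = ["file_a 120", "file_b 30.5", "name size"]
    print(field(rows, 1))
    print(total(rows, 2))
```

Building total on top of field keeps the field-selection rule in one place, so both helpers agree on which lines count.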
Windows - Delphi
- Quick Text Toaster v1.0 An editor I wrote many years ago to grab text from corrupted files. Works with executables, documents, etc.
MSDOS - QBASIC
- QBASIC CRO v1.2 I still find this old program amusing. It is a digital (on/off) CRO that samples the parallel port at 1 kHz. screenshot.
- Guessing Game is written in awk, C, C++, csh, Fortran, Java, ksh, Pascal, Perl, QBASIC, sh, and more as a language comparison.
- Sun Crypt has some of my older Solaris and Unix software, including the performance analysis tools Psio for disk I/O by-process, and CacheKit for hardware and software cache analysis.