Brendan D. Gregg
G'Day. I use this homepage to share stuff - some hobbies and my work with computers. These days I work on performance and live in the San Francisco Bay Area. I have two blogs: personal and work. I'm also on twitter and linkedin. Here is my bio and anti-bio.
Documentation
- The slide deck for my Linux Performance Analysis and Tools talk at SCaLE11x (2013) (PDF).
- A summary of three virtualization technologies in Virtualization Performance: Zones, KVM, Xen, focusing on I/O path overheads.
- The Thinking Methodically about Performance article for ACMQ, based on my earlier USE Method articles. This was also published in CACM.
- For illumosday and zfsday, my slide decks for DTracing the Cloud (PDF) and ZFS Performance Analysis and Tools (PDF).
- The introduction of a new visualization type: Subsecond Offset Heat Maps, which allow behavior within a second to be seen.
- The USE Method, which I developed for identifying common system bottlenecks and errors, and have used successfully for many years in enterprise and cloud performance environments. Based on the USE method: the Linux Performance Checklist, the Solaris Performance Checklist, and the SmartOS Performance Checklist.
- The Flame Graph visualization, and using Flame Graphs for Linux Kernel Performance analysis.
- Performance visualizations: Device Utilization, and a series of three (so far) on Visualizing the Cloud, Process Snapshots and Process Execution.
- Narrow topics in operating system performance: Activity of the ZFS ARC.
- A long post about Using SystemTap on the Ubuntu and CentOS Linux distributions.
- An article to introduce the technique of Off-CPU Performance Analysis, which can identify the cause of high latency due to blocking events.
- Top 10 DTrace Scripts for Mac OS X performance analysis and troubleshooting, written to reach the broader Mac OS X community. This includes step-by-step instructions on how to find and run the Terminal application and sudo.
- A series of blog posts on File System latency, using MySQL as an example application (1, 2, 3, 4, 5).
- MySQL Query Latency using DTrace.
- A series of blog posts on the DTrace pid provider, going beyond what was covered in the DTrace book.
- The DTrace book with Jim Mauro, Prentice Hall 2011 (ISBN 0132091518). A sample chapter on File Systems is online. This 1152 page book took over a year to write, including the research, development and testing of dozens of new DTrace scripts and one-liners, and soliciting input from many experts.
- The Visualizations for Performance Analysis slide deck, for USENIX/LISA 2010. This illustrates visualizations for performance analysis, showing what is effective and becoming important for cloud computing environments.
- An article for ACMQ, also published by CACM, on Visualizing System latency. This includes interesting latency heat maps I had found, including the Rainbow Pterodactyl and the Icy Lake.
- A series of posts on performance testing storage appliances (1, 2, 3). These were written at a time when I was often pulled in to fix benchmarking mishaps, and needed to share tips to avoid common mistakes. They were very successful (and thanks to those who read them), as the calls for help were greatly reduced.
- The storage appliance Dashboard where I used weather icons to highlight performance issues and convey ambiguity for those types of metric.
- The original ZFS L2ARC post and later L2ARC Screenshots. Since code changes were public each night, my block comment in usr/src/uts/common/fs/zfs/arc.c (added in Nov 2007) disguised the then-secret intent of this technology by listing "short-stroked disks" as the first intended device, instead of SSDs.
- The companion to Solaris Internals 2nd Edition: Solaris Performance and Tools, with Richard McDougall and Jim Mauro, Prentice Hall 2006 (ISBN 0131568191).
- My Solaris 10 Zones page: the first showing how to configure Solaris Zones with Resource Controls (which I had figured out the hard way).
- A page on DTrace, where I described and shared early scripts I was developing, and the DTraceToolkit.
- Older and out of date Unix or Sun Solaris material is in the Sun Crypt for historic interest.
Videos
- At SCaLE11x (2013) I gave a talk on Linux Performance Analysis and Tools, summarizing basic to advanced tools and some methodologies (youtube; slideshare; blog) (60 mins).
- My LISA 2012 talk on Performance Analysis Methodology named and summarized 10 methodologies (usenix; slideshare; blog) (86 mins).
- ZFS: Performance Analysis and Tools at zfsday was probably my best talk of 2012 (youtube; slideshare; blog) (43 mins).
- At illumosday I gave a talk on DTracing the Cloud, showing what can be done (youtube; slideshare) (44 mins).
- At FISL'13 (2012) I gave a talk on The USE Method for systems performance analysis, including some other methods for comparison (youtube; slideshare; blog) (56 mins).
- My talk at Surge'12 on Real-time in the real world with Bryan Cantrill is online (youtube; slideshare; blog) (56 mins).
- At dtrace.conf(12) I gave an unconference-style talk on Visualizations, summarizing heat maps and Flame Graphs (youtube; blog) (35 mins).
- My SCaLE10x talk (2012) on Performance Analysis: new tools and concepts from the cloud, covering some example performance problems (youtube; slideshare; blog) (1 hr).
- Short talks on performance tools for Solaris-based operating systems: vmstat, mpstat, and load averages, filmed during 2011.
- An extended version of my Percona Live New York 2011 talk on Breaking Down MySQL/Percona Query Latency With DTrace was filmed (youtube; blog) (90 mins).
- My LISA 2010 talk on Visualizations for Performance, which explains the need for heat maps (youtube; usenix; slideshare; blog) (80 mins).
- I've given many technical talks of things going right. This was about things going wrong: Little Shop of Performance Horrors at FROSUG in Colorado, 2009 (youtube; blog) (2.5 hours).
- Shouting in the Datacenter (youtube; blog) was a video that Bryan and I made on the spur of the moment on New Year's Eve 2008, which went viral (750,000+ views). I've had many emails from companies about it: it spawned an industry of sound proofing data centers (2 mins). There is also a making of (youtube) video (5 mins).
Software
These are some small software projects I've developed in my spare time. (I've also developed software as a professional kernel engineer, which isn't listed below, including the ZFS L2ARC.) The following are open source with no warranty – use at your own risk. Some are computer security tools, which may be illegal to own or run in your country if they are misidentified as cracking tools.
Unix/Linux - C
- Dump2PNG visualizes file data as a PNG (uses libpng). An experimental tool intended for core dump analysis. screenshot.
- nicstat prints network interface stats including Kbytes/sec for Solaris (uses Kstat). Example here. There is also a Perl version, and Tim Cook added Linux support.
- Fastburden is a performance testing tool that can generate a flood of client web traffic (multithreaded). Apache or Squid access logs can also be replayed. example.
Solaris/Mac OS X/FreeBSD - DTrace
- DTrace Tools such as iosnoop, opensnoop, bitesize.d, execsnoop, shellsnoop, tcpsnoop, ...
- DTraceToolkit: a collection of over 200 scripts, with man pages and example files.
- DTraceTazTool: a GUI to plot live disk activity.
Unix/Linux/Windows - Perl
- FlameGraph: a visualization for sampled stack traces, used for performance analysis (see the posts for Flame Graphs via: DTrace, perf/SystemTap).
- Chaosreader: Trace TCP/UDP sessions and fetch application data from snoop or tcpdump logs. This will fetch telnet sessions, FTP files, HTTP transfers, SMTP emails, ... Example output was created by Chaosreader to link to the extracted HTTP sections, telnet sessions, and FTP files found in a snoop log. This can also create telnet replay programs that play back sessions in realtime: example. A tool for forensics or network troubleshooting. download code.
- Perl modules: Net::SnoopLog for snoop packet logs (RFC1761), Net::TcpDumpLog for tcpdump/libpcap logs, Algorithm::Hamming::Perl.
- Distillerror summarizes truss(1) or strace(1) output to highlight errors. See the Solaris example, the Red Hat example, or a larger example.
- FreqCount is a simple frequency counter. Useful for processing logs (most common IP addr, port, etc.). example.
- PortPing is a version of ping that connects using ssh (or other ports), not ICMP. Good for checking firewalls. example.
- MTUfinder tests different sized HTTP requests to a web server, highlighting MTU size problems. example.
- Specials is a collection of "special" programs for system administrators. Mostly Perl.
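To illustrate the kind of log processing a frequency counter such as FreqCount is useful for, here is a minimal Python sketch. It is not FreqCount's code or interface: the function name `freq_count`, its parameters, and the sample log lines are all hypothetical, chosen only to show the idea of counting the most common value of one field.

```python
from collections import Counter


def freq_count(lines, field=0, sep=None):
    """Count how often each value of one field occurs.

    `field` is a 0-based index into each line after splitting on `sep`
    (None means any run of whitespace). Lines that are too short are skipped.
    Returns (value, count) pairs, most frequent first.
    """
    counts = Counter()
    for line in lines:
        parts = line.split(sep)
        if len(parts) > field:
            counts[parts[field]] += 1
    return counts.most_common()


if __name__ == "__main__":
    # Hypothetical access-log lines: client address, method, path.
    sample = [
        "10.0.0.1 GET /index.html",
        "10.0.0.2 GET /index.html",
        "10.0.0.1 GET /about.html",
    ]
    for value, count in freq_count(sample, field=0):
        print(count, value)
```

Changing `field` selects a different column, for example the request path instead of the client address.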
Solaris - Perl/C
- K9Toolkit: a collection of Perl programs for Solaris that use Sun::Solaris::KStat. This includes tools to print load averages for CPU, memory, disks and network, to aid finding performance bottlenecks.
- FindBill finds backup super blocks on a Solaris UFS for "fsck -o b=..." (if "newfs -N" doesn't help). example.
- listprusage: a C program to print process resource usage statistics such as minor faults and syscalls by PID. Solaris (uses procfs). Example here.
Unix/Linux - Bourne/Korn Shell
- DtkshDemos: a collection of X11 dtksh scripts. They include xvmstat - a GUI version of vmstat, and xplot - a generic data plotter. Written for any OS with dtksh.
- BBaseline is a small script to create a baseline of the system's performance, by logging the output of several tools. By creating logs during normal and peak activity, this can assist performance tuning. Easy to customize the baseline content, and to grep the baselines. See the example output.
- total is a simple awk script to sum a field (example), and field prints a field (example). These exist for convenience at the shell.
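The two convenience helpers described above (summing a field, and printing a field) can be sketched in a few lines. This is not the awk source of total or field: it is a Python approximation under stated assumptions, namely that fields are whitespace-separated and 1-based (as in awk), and that non-numeric values are skipped when summing. The function names simply mirror the tool names.

```python
def field(lines, n):
    """Return the n-th (1-based) whitespace-separated field of each line.

    Lines with fewer than n fields are skipped.
    """
    return [parts[n - 1]
            for parts in (line.split() for line in lines)
            if len(parts) >= n]


def total(lines, n):
    """Sum the n-th field across all lines, skipping non-numeric values."""
    running = 0.0
    for value in field(lines, n):
        try:
            running += float(value)
        except ValueError:
            continue  # assumption: ignore headers and other non-numeric text
    return running


if __name__ == "__main__":
    import sys

    # Usage sketch: some_command | python total.py 2
    column = int(sys.argv[1]) if len(sys.argv) > 1 else 1
    print(total(sys.stdin, column))
```

Piping command output into the script with a column number sums that column, which is the sort of shell-side convenience these small tools are meant for.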
Windows - Delphi
- Quick Text Toaster v1.0: an editor I wrote many years ago to grab text from corrupted files. Works with executables, documents, etc.
MSDOS - QBASIC
- QBASIC CRO v1.2: I still find this old program amusing. It is a digital (on/off) CRO that samples the parallel port at 1 kHz. screenshot.
Other
- Guessing Game is written in awk, C, C++, csh, Fortran, Java, ksh, Pascal, Perl, QBASIC, sh and more, as a language comparison.
- Sun Crypt has some of my older Solaris and Unix software, including the performance analysis tools Psio for disk I/O by-process, and CacheKit for hardware and software cache analysis.
Misc
Last updated: 17-Mar-2013 (docs, videos)
Email address: click here