Brendan D. Gregg
G'Day. I use this site to share and bookmark various things, mostly my work with computers. While I currently work on large scale cloud computing performance at Netflix, this site reflects my own opinions and work from over the years. I have a personal blog, and I'm also on twitter. Here is my bio and anti-bio.
For a short selection of my most popular content, see my portfolio page. For everything, see the sitemap.
Documentation
Documents I've written, in approximately reverse chronological order:
- A page to summarize my Linux Performance related material.
- Slides for my DockerCon 2017 talk on Container Performance Analysis, where I showed how to find bottlenecks in the host vs the container, how to profile container apps, and how to dig deeper into the kernel (slideshare, youtube) (2017).
- Slides for my SCaLE15x talk on Linux 4.x Tracing: Performance Analysis with bcc/BPF, where I also included a ply demo for the first time (slideshare, youtube, PDF) (2017).
- Slides for my BSidesSF 2017 talk with Alex Maestretti on Linux Monitoring at Scale with eBPF (slideshare, youtube).
- I posted Where has my disk space gone? Flame graphs for file systems and Flame Graphs vs Tree Maps vs Sunburst (2017).
- Slides for my Linux.conf.au 2017 talk on BPF: Tracing and More, summarizing other uses for enhanced BPF. (slideshare, youtube, PDF).
- A post on Golang bcc/eBPF Function Tracing, where I figured out how to trace functions and arguments from the different Go compilers (2017).
- A page to summarize eBPF Tools using Linux eBPF and the bcc front end for advanced observability and tracing tools (2016+).
- Slides for my USENIX LISA 2016 talk Linux 4.x Tracing Tools: Using BPF Superpowers, which focused on the bcc tools I've been developing. This included my demo Give me 15 minutes and I'll change your view of Linux tracing. (slideshare, PDF, youtube demo, youtube talk).
- DTrace for Linux 2016, announcing that, as of Linux 4.9, the Linux kernel has raw capabilities similar to DTrace via enhanced BPF. I've been heavily involved in this project, especially as the number one user, and it was great to reach this milestone. A long time coming.
- Slides for a talk at the first sysdig conference on Designing Tracing Tools (slideshare, youtube, PDF) (2016).
- I wrote the original bcc/BPF end user tutorial, Python developer tutorial, and reference guide (2016).
- Several posts introducing new Linux bcc/BPF tracing tools: bcc/BPF Tracing Security Capabilities, bcc/BPF MySQL Slow Query Tracing, bcc/BPF ext4 Latency Tracing, Linux bcc/BPF Run Queue (Scheduler) Latency, Linux bcc/BPF Node.js USDT Tracing, Linux bcc tcptop, Linux 4.9's Efficient BPF-based Profiler, Linux bcc/BPF tcplife: TCP Lifespans (2016).
- My JavaOne 2016 slides for Java Performance Analysis on Linux with Flame Graphs (slides only: slideshare, PDF), and a follow-up post on Java Warmup Analysis with Flame Graphs.
- A post on gdb Debugging Full Example (Tutorial): ncurses where I shared a full debugging session, including all output and dead ends. It includes a little ftrace and BPF (2016).
- My keynote slides for ACM Applicative 2016 on System Methodology: Holistic Performance Analysis on Modern Systems, where I used several different operating systems as examples (slideshare, PDF, youtube).
- A post demonstrating new capabilities by llnode for Node.js Memory Leak Analysis (2016).
- A post on Linux Hist Triggers in Linux 4.7 demonstrating this new tracing feature (2016).
- For PerconaLive 2016, slides for my Linux Systems Performance 2016 overview talk (slideshare, PDF, video).
- The Flame Graph article for ACMQ/CACM that defines flame graphs, describes their origin, explains how to interpret them, and discusses possible future developments (2016).
- For SREcon 2016 Santa Clara, slides for my Performance Checklists for SREs talk, which was also my first talk about my recent SRE (Site Reliability Engineering) work at Netflix (blog, slideshare, PDF, youtube, usenix).
- A post on Working at Netflix 2016, as a follow-on from my 2015 post. This is still worth talking about: freedom and responsibility, outstanding and professional coworkers, etc. It differs from many other Silicon Valley companies.
- For Facebook's Performance @Scale conference, slides for my Linux BPF Superpowers talk, which introduced the tracing capabilities of this new feature in the Linux 4.x series (slideshare, PDF, video) (2016).
- More Linux BPF/bcc posts showing how to analyze off-CPU time and drill deeper: Linux eBPF Off-CPU Flame Graph, Linux Wakeup and Off-Wake Profiling, Who is waking the waker? (Linux chain graph prototype) (2016).
- A post on Unikernel Profiling: Flame Graphs from dom0, where I showed that observability was indeed possible with some engineering work, as some believed otherwise (2016).
- Slides for my Broken Linux Performance Tools SCaLE14x talk, similar to my earlier QCon talk but for Linux only, and including a bit more advice (slideshare, PDF, video) (2016).
- Linux Performance Analysis in 60,000 Milliseconds shows the first ten commands one can use (video, PDF). Written by myself and the performance engineering team at Netflix (2015).
- Slides for my QConSF 2015 talk Broken Performance Tools highlighting common pitfalls with system metrics, tools, and methodologies for generic Linux/Unix systems (slideshare, PDF, video) (2015).
- At JavaOne 2015, I gave a talk on Java Mixed-Mode Flame Graphs, utilizing the new -XX:+PreserveFramePointer feature in JDK8u60 (blog, slideshare, youtube, PDF) (2015).
- Using eBPF via bcc, tcpconnect and tcpaccept (bcc), and eBPF Stack Trace Hack (bcc), which show some new Linux tracing performance tools I've developed that use eBPF via the bcc frontend (2015-6).
- For the Netflix Tech Blog I posted Java in Flames (PDF) with Martin Spier, which shows mixed-mode flame graphs using the new -XX:+PreserveFramePointer JDK option. Great to see all CPU consumers in one visualization (2015).
- A summary, and recommendations, for navigating the different Linux tracers: Choosing a Linux Tracer (2015).
- Some posts on uprobes: Linux uprobe: User-Level Dynamic Tracing, which demonstrates uprobes via my uprobe tool, and Hacking Linux USDT with Ftrace (2015).
- My Netflix Instance Performance Analysis Requirements talk for Monitorama 2015, where I showed desirable and undesirable features of these products. This is intended for the numerous vendors who keep trying to sell me these products, and for customers who can use this talk as a source of feature requests (slideshare, PDF, vimeo).
- My first post on Linux eBPF, which is bringing in-kernel maps to Linux tracing (2015).
- My slides for Linux Performance Tools tutorial at Velocity 2015, which was an expanded version of my earlier talk on the same topic. This is the most detailed version I've done (slideshare, PDF, youtube).
- My Linux Profiling at Netflix talk for SCALE 13x (2015), where I covered getting CPU profiling to work, including for Java and Node.js, and a tour of other perf_events features (slideshare, PDF, youtube).
- A post about Working at Netflix, describing the culture. This is worth writing about, as Netflix is pioneering with company culture as well as technology, and showing that culture can be engineered to be positive (2015).
- For LISA2014, my Linux Performance Analysis: New Tools and Old Secrets talk, where I covered ftrace and perf_events tools I've recently developed (slideshare, PDF, youtube, USENIX).
- My talk on Performance Tuning Linux Instances on EC2 from AWS re:Invent 2014, where I covered how Netflix selects, tunes, and then observes the performance of Linux cloud instances (slideshare, youtube).
- A post introducing Differential Flame Graphs, which can be used for performance regression analysis. I also wrote about CPI flame graphs, which use the differential flame graph code, and pmcstat on FreeBSD (2014).
- My Flame Graphs on FreeBSD talk, for the FreeBSD Dev and Vendor Summit 2014. I summarized the different types, with the FreeBSD commands to create them (slideshare, PDF, youtube).
- For my first BSD conference, MeetBSDCA 2014, my Performance Analysis for BSD talk, where I discussed 5 facets: observability, methodologies, benchmarking, profiling, and tuning (slideshare, PDF, youtube).
- Contributed documentation on the DTrace on FreeBSD wiki page: the initial one-liners list and a 12 part tutorial.
- My Linux Performance Tools talk for LinuxCon Europe 2014, where I summarized observability, benchmarking, tuning, static perf tuning tools, and tracing. This was an updated version of my earlier LinuxCon talk on the same topic (slideshare, PDF, youtube).
- For the 2014 Tracing Summit, my From DTrace to Linux talk, summarizing what Linux can learn from DTrace (slideshare, PDF, youtube).
- My Surge 2014 talk From Clouds to Roots, on how Netflix does root cause performance analysis on a Linux cloud. It's my most comprehensive performance analysis talk to date. Instead of just focusing on low-level tools, I provided context and then showed the full path from clouds to roots. (slideshare, PDF, youtube).
- My post The MSRs of EC2, where I showed how CPU Model Specific Registers can be used to measure the real CPU clock rate and temperature in Xen guests (2014).
- Slides from my LinuxCon North America 2014 talk Linux Performance Tools, which summarizes performance observability, benchmarking, and tuning tools, and illustrates their role on Linux system functional diagrams (PDF).
- Posts summarizing Linux Java CPU Flame Graphs and Node.js CPU Flame Graphs (2014).
- My lwn.net article Ftrace: The Hidden Light Switch (2014).
- Posts describing perf-tools based on Linux perf_events and ftrace (both core Linux kernel tracers): perf Hacktogram, iosnoop, iosnoop Latency Heat Maps, opensnoop, execsnoop, tcpretrans (2014).
- Posts about Linux perf_events: perf CPU Sampling, perf Static Tracepoints, perf Heat Maps, perf Counting, perf Kernel Line Tracing (2014).
- A page for perf Examples with perf_events, the standard Linux profiler. Page includes one-liners and flame graphs.
- I wrote a warning post titled strace Wow Much Syscall, which discusses strace(1) for production use, includes an interesting example, and many bad strace-related jokes (2014).
- Two posts to explain Xen modes on AWS EC2: What Color is Your Xen and Xen Feature Detection (2014).
- The Benchmark Paradox: a short post explaining a seeming paradox in benchmark evaluations (2014).
- My Analyzing OS X Systems Performance with the USE Method talk at MacIT 2014 (PDF).
- At SCaLE12x (2014) I gave the keynote on What Linux can learn from Solaris perf. and vice-versa (PDF, youtube).
- The Case of the Clumsy Kernel (PDF): a kernel performance analysis article for USENIX ;login (2013).
- My USENIX/LISA 2013 slides for Blazing Performance with Flame Graphs, which was two talks in one: part 1 covered the commonly used CPU flame graphs, and part 2 covered various advanced flame graphs (PDF, youtube).
- A page of ktap Examples for the lua-based Linux dynamic tracing tool, including one liners and tools (no longer maintained) (2013).
- The TSA Method, a performance analysis methodology for identifying issues causing poor application performance. This is a thread-oriented methodology, and is complementary to the resource-oriented USE Method. It has solved countless issues.
- A page of my Performance Analysis Methodology summaries, and links.
- Systems Performance: Enterprise and the Cloud, Prentice Hall, 2013 (ISBN 0133390098). This book covers new developments in systems performance: in particular, dynamic tracing and cloud computing. It also introduces many new methodologies to help a wide audience get started. It leads with Linux examples from Ubuntu, Fedora, and CentOS, and also covers illumos distributions. Covering two different kernels provides additional perspective that enhances the reader's understanding of each. The book is 635 pages plus appendices.
- My slides for a brief talk on The New Systems Performance, where I summarized how the topic has changed from the 1990s to today (July 2013, PDF, youtube).
- Active Benchmarking: a methodology for successful benchmarking, and an example of its use for Bonnie++.
- My OSCON 2013 slides for Open Source Systems Performance, where I provided a unique perspective I'm best positioned to give about both open- and close-sourcing software, and what this means for systems performance analysis (PDF, youtube).
- Visualizing distributions using Frequency Trails, explained in the Introduction, then using them for Detecting Outliers, measuring Modes and Modality, and What the Mean Really Means.
- My slides for Stop the Guessing: Performance Methodologies for Production Systems talk at Velocity (2013) (PDF, youtube).
- The very popular slide deck for my Linux Performance Analysis and Tools talk at SCaLE11x (2013), which includes lesser known tools such as perf's dynamic tracing and static trace points. I've been told people want slide 16 on a coffee cup! (slideshare, PDF, youtube).
- A summary of Virtualization Performance: Zones, KVM, Xen, focusing on I/O path overheads (2013) (PDF).
- The Thinking Methodically about Performance article for ACMQ (2012), and CACM, based on my earlier USE Method articles.
- USENIX/LISA 2012 slides on Performance Analysis Methodology, summarizing ten methods and anti-methods (slideshare, PDF, youtube).
- For illumosday and zfsday, my slides for DTracing the Cloud (PDF, youtube) and ZFS Performance Analysis and Tools (PDF, youtube).
- The introduction of a new visualization type: Subsecond Offset Heat Maps, which allow behavior within a second to be seen.
- The USE Method, which I developed for identifying common system bottlenecks and errors, and have used successfully for many years in enterprise and cloud performance environments. Based on the USE method: the Linux Performance Checklist, the Solaris Performance Checklist, the SmartOS Performance Checklist, the Mac OS X Performance Checklist, the FreeBSD Performance Checklist, and the Unix 7th Edition Performance Checklist. There is also the USE Method Rosetta Stone of Performance Checklists.
- The Flame Graph visualization, and its use for CPU, Memory, and Off-CPU analysis. The CPU page summarizes how different tools can be used to collect the profile data, including DTrace, and the different Linux profilers: perf, SystemTap, and ktap.
- Colony Graphs, a visualization of computer life forms, and their use for Visualizing the Cloud, Process Snapshots and Process Execution.
- Demonstrations of different visualizations for Device Utilization, which was described as blog post of the year (2011).
- Narrow topics in operating system performance: Activity of the ZFS ARC.
- A long post about Using SystemTap on the Ubuntu and CentOS Linux distributions, written in late 2011.
- An introduction to the technique of Off-CPU Performance Analysis, which can identify the cause of high latency due to blocking events.
- Top 10 DTrace Scripts for Mac OS X performance analysis and troubleshooting, written to reach the broader Mac OS X community. This includes step by step instructions on how to find and run the Terminal application and sudo (PDF).
- A series of blog posts on File System latency, using MySQL as an example application (1, 2, 3, 4, 5) (2011).
- MySQL Query Latency using DTrace (2011).
- A series of blog posts on the DTrace pid provider, going beyond what was covered in the DTrace book (2011).
- The DTrace book with Jim Mauro (Prentice Hall, 2011; ISBN 0132091518). A sample chapter on File Systems is online. This 1152 page book took over a year to write, including the research, development and testing of dozens of new DTrace scripts and one-liners, and soliciting input from many experts. Solaris was used as the primary OS for examples, with additional examples from Mac OS X and FreeBSD. The most difficult challenge for using a dynamic tracing tool (DTrace, SystemTap, etc.) is knowing what to do with it. This book provides over one hundred use cases (scripts), which will be invaluable even after the example code becomes out of date.
- A page on Heat Maps, and a demonstration of Latency Heat Maps which includes example software to generate them.
- Slides for my Percona Live New York 2011 talk on Breaking Down MySQL/Percona Query Latency With DTrace (PDF).
- The Visualizations for Performance Analysis slide deck, USENIX/LISA 2010. This describes two different approaches (methodologies) for systems performance: workload analysis and latency analysis, the metrics used, and then introduces a variety of heat map visualizations. This talk ends by describing the challenges of cloud computing, and how heat maps are well suited for the scale of data (PDF).
- Slides by Jim Mauro and myself for our How to Build Better Applications With Oracle Solaris DTrace (PDF) talk at Oracle OpenWorld 2010.
- An article for ACMQ, also published by CACM, on Visualizing System Latency (2010). This includes interesting latency heat maps I had found, including the Rainbow Pterodactyl and the Icy Lake.
- My DTrace Cheatsheet, summarizing probes, variables, and actions. Inspired by the mdb cheatsheet by Jonathan Adams. (2009).
- A series of posts on performance testing a line of storage appliances (1, 2, 3). I wrote these in 2009, when I was often saving benchmarking mishaps. They were very successful (and thanks to those who read them) as the calls for help were greatly reduced.
- For the Front Range OpenSolaris User Group (FROSUG), in Denver, Colorado, my slides for my Little Shop of Performance Horrors talk, where I discussed things going wrong instead of right. It was a lot of fun, and people showed up despite a massive snow storm (2009).
- The storage appliance dashboard where I used weather icons to highlight performance issues and convey ambiguity for certain metrics (2008).
- Slides for Fishworks Analytics (PDF) for CEC2008 with Bryan Cantrill, where we launched a storage appliance performance analysis tool that was many years ahead of the industry. A real-time dynamic tracing GUI, latency heat maps, etc.
- Slides for Fishworks Overview (PDF) at CEC2008 with Cindi McGuire, where we introduced the first ZFS-based storage appliance, the Sun Storage 7000 series. Fishworks was the team that developed it. We worked at a private site, set up to mimic a San Francisco startup.
- The original ZFS L2ARC post (2008) and later L2ARC Screenshots (2009). Since code changes were public each night, my block comment in usr/src/uts/common/fs/zfs/arc.c (added in Nov 2007) disguised the then-secret intent of this technology by listing "short-stroked disks" as the first intended device, instead of SSDs.
- My Solaris Performance: Introduction slides (PDF) from May 2007, covering Solaris performance features and observability. This includes two of my methodologies for performance analysis: the "By-Layer Strategy" and the "3-Metric Strategy" (back when I spelled utilization with an "s"). The latter strategy is what I later called the USE Method.
- Slide decks (PDFs) from DTrace talks in 2007: DTrace Intro, DTraceToolkit, and DTrace Java.
- The companion to Solaris Internals 2nd Edition: Solaris Performance and Tools, with Richard McDougall and Jim Mauro (Prentice Hall, 2006; ISBN 0131568191). These chapters began during development of Solaris Internals 2nd Edition, and were later split into a separate companion volume. It worked well: a reference book on internals, and a companion book for practitioners on performance.
- My Solaris 10 Zones page from 2005, where I figured out how to configure Solaris Zones with Resource Controls.
- A page on DTrace (created in 2004), where I shared early scripts I was developing, and the DTraceToolkit.
- My old or out of date Unix and Sun Solaris material is in the Crypt for historical interest. (circa 2005.)
- I have an old personal blog at bdgregg.blogspot.com where I discussed DTraceToolkit updates.
- I have two prior professional blogs: blogs.oracle.com/brendan (formerly blogs.sun.com/brendan), where I discussed performance, DTrace, and the ZFS storage appliance (2006-2010). In case that vanishes one day, the posts are also available on dtrace.org/blogs/brendan, where I continued blogging about cloud performance and DTrace (2010-2014).
Videos
- My DockerCon 2017 talk on Container Performance Analysis, where I showed how to find bottlenecks in the host vs the container, how to profile container apps, and how to dig deeper into the kernel (youtube, slideshare) (42 mins).
- My SCaLE15x talk on Linux 4.x Tracing: Performance Analysis with bcc/BPF, including a ply demo (youtube, slideshare) (2017).
- My BSidesSF talk with Alex Maestretti on Linux Monitoring at Scale with eBPF, including our diagram of events to monitor for intrusion detection (youtube, slideshare) (28 mins).
- My Linux.conf.au 2017 talk on BPF: Tracing and More, where I summarized other uses for enhanced BPF. (youtube, slideshare) (46 mins).
- My USENIX/LISA 2016 full talk Linux 4.x Tracing: Using BPF Superpowers (youtube, slideshare) (44 mins).
- At LISA 2016, my Give me 15 minutes and I'll change your view of Linux tracing demo, showing ftrace, perf, bcc/BPF (youtube) (18 mins).
- My talk at the first sysdig conference on Designing Tracing Tools (youtube, slideshare) (2016) (46 mins).
- My keynote talk for ACM Applicative 2016 on System Methodology: Holistic Performance Analysis on Modern Systems, where I used several different operating systems as examples (youtube, slideshare) (57 mins).
- For PerconaLive 2016, my Linux Systems Performance 2016 summary of this topic in 50 minutes (percona, slideshare) (50 mins).
- I gave the closing address at SREcon16 Santa Clara on Performance Checklists for SREs, which was also my first talk about my SRE (Site Reliability Engineering) work at Netflix (blog, youtube, usenix, slideshare) (61 mins).
- For Facebook's Performance @Scale conference, my Linux BPF Superpowers talk video where I introduced the tracing capabilities of this new feature in the Linux 4.x series (facebook, slideshare) (2016) (34 mins).
- Broken Linux Performance Tools for SCaLE14x, focusing on Linux problems with a bit more advice (youtube, slideshare) (2016) (1 hr).
- For QConSF 2015 my Broken Performance Tools talk highlighting common pitfalls with system metrics, tools, and methodologies for generic Linux/Unix systems; good quality video is synced with the slides on the infoq site (infoq, slideshare) (2015) (50 mins).
- My Monitorama 2015 talk Netflix Instance Performance Analysis Requirements, where I showed the different features that are desirable and undesirable for an instance analysis product, aimed at both vendors and customers (blog, vimeo, slideshare) (34 mins).
- My Velocity 2015 tutorial Linux Performance Tools, which summarizes performance observability, benchmarking, tuning, static performance tuning, and tracing tools. This is an expanded and more complete version of an earlier talk of mine, and I was able to include some live demos of tools and methodology. It should be useful for everyone working on Linux systems. (youtube, slideshare) (100 mins).
- At SCALE13x (2015), my Linux Profiling at Netflix talk on perf_events CPU profiling and features (blog, youtube, slideshare) (59 mins).
- For USENIX LISA14, I gave a talk about my Linux perf-tools collection, which is based on ftrace and perf_events: Linux Performance Analysis: New Tools and Old Secrets (youtube, slideshare, USENIX) (43 mins).
- My AWS re:Invent 2014 talk Performance Tuning EC2 Instances: selection, Linux tuning, observability (youtube, slideshare) (45 mins).
- I was interviewed by BSD Now about BSD and benchmarking in Episode 065: 8,000,000 Mogofoo-ops (youtube) (2014) (28 mins).
- My FreeBSD dev summit 2014 talk on Flame Graphs on FreeBSD. Since this talk wasn't videoed, I captured it using screenflow from my laptop. It's better than nothing, and shows my live demos well. (youtube) (53 mins).
- My MeetBSDCA 2014 talk Performance Analysis, summarizing 5 facets of perf analysis on BSD (youtube, slideshare) (53 mins).
- My popular talk on Linux Performance Tools, which quickly summarizes performance observability, benchmarking, tuning, static tuning, and tracing tools. This was for LinuxCon Europe 2014 (youtube, slideshare) (49 mins).
- For the Tracing Summit 2014, my talk From DTrace to Linux summarized lessons Linux tracing can learn (youtube, slideshare) (61 mins).
- My Surge 2014 talk From Clouds to Roots, showing how Netflix does perf analysis and the tools involved (youtube, slideshare) (56 mins).
- My SCaLE12x (2014) keynote on What Linux can learn from Solaris performance and vice-versa (youtube, slideshare) (60 mins).
- Deirdré Straughan has a youtube playlist named Brendan Gregg's Best, which has many of my talks (many of which she filmed).
- My plenary session at USENIX/LISA 2013: Blazing Performance with Flame Graphs (youtube, slideshare, usenix) (90 mins).
- A talk for BayLISA October 2013 to describe and launch the Systems Performance book (60 mins).
- A lightning talk for Surge 2013 on Benchmarking Gone Wrong, which includes the craziest line graph I've ever seen (~5 mins).
- The New Systems Performance, a meetup talk I gave in 2013 about modern systems performance (23 mins).
- My OSCON 2013 talk on Open Source Systems Performance, a tale of three parts (youtube, slideshare) (32 mins).
- My Stop the Guessing: Performance Methodologies for Production Systems talk at Velocity 2013 (youtube, slideshare) (46 mins).
- At SCaLE11x (2013) I gave a talk on Linux Performance Analysis and Tools, summarizing basic to advanced analysis tools, and including some methodologies (youtube, slideshare, blog) (60 mins).
- My LISA 2012 talk on Performance Analysis Methodology named and summarized 10 methods (youtube, usenix, slideshare, blog) (86 mins).
- ZFS: Performance Analysis and Tools at zfsday was probably my best talk of 2012 (youtube, slideshare, blog) (43 mins).
- At illumosday I gave a talk on DTracing the Cloud, showing what can be done (youtube, slideshare) (44 mins).
- At FISL'13 (2012) I gave a talk on The USE Method for systems performance analysis, including some other methods for comparison (youtube, slideshare, blog) (56 mins).
- My talk at Surge'12 on Real-time in the real world with Bryan Cantrill is online (youtube, slideshare) (56 mins).
- At dtrace.conf(12) I gave an unconference-style talk on various Visualizations (youtube, blog) (35 mins).
- My SCaLE10x talk (2012) on Performance Analysis: new tools and concepts from the cloud, with examples (youtube, slideshare, PDF) (1 hr).
- Short talks on performance tools for Solaris-based operating systems: vmstat, mpstat, and load averages, filmed during 2011.
- My extended Percona Live New York 2011 talk: Breaking Down MySQL/Percona Query Latency With DTrace (youtube, blog) (90 mins).
- My LISA 2010 talk on Visualizations for Performance, which explains the need for heat maps (youtube, usenix, slideshare, blog) (80 mins).
- At Oracle Open World 2010, I gave a talk on How to Build Better Applications with DTrace, which is on youtube: part 1, part 2. (64 mins).
- I've given many technical talks of things going right. This was about things going wrong: Little Shop of Performance Horrors at FROSUG in Colorado, 2009 (youtube, blog) (2.5 hours).
- Shouting in the Datacenter (youtube, blog) was a video that Bryan Cantrill and I made on the spur of the moment on New Year's Eve 2008, which went viral (1M+ views). I've had many emails about it: it has spawned an industry of sound proofing data centers (2 mins). There is also a making of (youtube) video (5 mins).
Software
The following are my spare time software projects, and are open source with no warranty – use at your own risk. Some are computer security tools, which may be illegal to own or run in your country if they are misidentified as cracking tools.
I've also developed software as a professional kernel engineer, which isn't listed below (eg, the ZFS L2ARC).
- perf-tools (github) is a collection of ftrace- and perf_events-based performance analysis tools for Linux.
- perf Examples for perf_events, the standard Linux profiler. The page includes one-liners and flame graphs.
- eBPF Tools using Linux eBPF and the bcc front end for advanced observability and tracing tools.
- bcc tools (github): for bcc, I wrote many tracing tools for advanced performance analysis, implemented efficiently using BPF.
- ktap Examples for the lua-based Linux dynamic tracing tool, including one liners and tools (no longer maintained).
- msr-cloud-tools model specific register observability tools intended for cloud instances.
- DTrace Tools for FreeBSD.
- DTrace book scripts from the DTrace book, which demonstrates many new uses of dynamic tracing.
- DTraceToolkit a collection of over 200 scripts, with man pages and example files (no longer maintained).
- DTrace Tools original versions of iosnoop, opensnoop, bitesize.d, execsnoop, shellsnoop, tcpsnoop, iotop, ...
- Dump2PNG visualizes file data as a PNG (uses libpng). An experimental tool intended for core dump analysis. screenshot.
- nicstat network interface stats for Solaris (uses Kstat). example. There is also a Perl version, and Tim Cook added Linux support.
- FlameGraph: a visualization for sampled stack traces, used for performance analysis. See the Flame Graphs page for an explanation.
- HeatMap: a program for generating interactive SVG heat maps from trace data. See the page about it.
- Chaosreader: Trace TCP/UDP sessions and fetch application data from snoop or tcpdump logs. This will fetch telnet sessions, FTP files, HTTP transfers, SMTP emails, ... The following example output was created by Chaosreader to link to the extracted HTTP sections, telnet sessions, and FTP files found in a snoop log. This can also create telnet replay programs that play back sessions in realtime: example. A tool for forensics or network troubleshooting. download code (github).
- Perl modules: Net::SnoopLog for snoop packet logs (RFC1761), Net::TcpDumpLog for tcpdump/libpcap logs, Algorithm::Hamming::Perl.
- FreqCount is a simple frequency counter. Useful for processing logs (most common IP addr, port, etc.). example.
- PortPing is a version of ping that connects using ssh (or other ports), not ICMP. Good for checking firewalls. example.
- MTUfinder tests different sized HTTP requests to a web server, highlighting MTU size problems. example.
- Specials is a collection of "special" programs for system administrators. Mostly Perl.
- DtkshDemos a collection of X11 dtksh scripts. They include xvmstat - a GUI version of vmstat, and xplot - a generic data plotter. Written for any OS with dtksh.
- BBaseline is a small script to create a baseline of the system's performance, by logging the output of several tools. By creating logs during normal and peak activity, this can assist performance tuning. Easy to customize, and to grep the baselines. See the example.
- total is a simple awk script to sum a field (example); field prints a field (example). These exist for convenience at the shell.
- Quick Text Toaster v1.0 An editor I wrote many years ago to grab text from corrupted files. Works with executables, documents, etc.
- QBASIC CRO v1.2 I still find this old program amusing. It is a digital (on/off) CRO that samples the parallel port at 1 kHz. screenshot.
- Guessing Game is written in awk, C, C++, csh, Fortran, Java, ksh, Pascal, Perl, QBASIC, sh, and more as a language comparison.
- The Crypt has some of my older Solaris and Unix software, including the K9Toolkit collection of kstat-based performance tools, Psio for disk I/O by-process, and CacheKit for hardware and software cache analysis.
Misc
- Recommended Reading: A list of my favourite technology books.
- Other Sites: Other interesting places on the web.
- Photos: Some photos I've taken.
- Games: My favourite computer games.
- Full Site Map: A list of links to everything here.


