USENIX LISA 2012: Performance Analysis Methodology

13 Dec 2012

I originally posted this at

At USENIX LISA 2012, I gave a talk titled Performance Analysis Methodology. This covered ten performance analysis anti-methodologies and methodologies, including the USE Method. I wrote about these in the ACMQ article Thinking Methodically about Performance, which is worth reading for more detail. I've also posted USE Method-derived checklists for Solaris- and Linux-based systems.

The video of the talk is on the LISA site, and the slides are below, also available as a PDF.

I've summarized the methodologies in the talk below.

Methodology Summaries

Blame-Someone-Else Anti-Method:

  1. Find a system or environment component you are not responsible for
  2. Hypothesize that the issue is with that component
  3. Redirect the issue to the responsible team
  4. When proven wrong, go to 1

Streetlight Anti-Method:

  1. Pick observability tools that are
    • familiar
    • found on the Internet
    • found at random
  2. Run tools
  3. Look for obvious issues

Ad Hoc Checklist Method:

  1..N. Run A, if B, do C
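
The "run A, if B, do C" pattern can be sketched as a small table of checks. This is only an illustration: the metric names, thresholds, and actions below are hypothetical and not from the talk.

```python
# Hypothetical checklist: each entry is (name, condition, action).
# Metric names and thresholds are illustrative assumptions.
CHECKLIST = [
    ("load average", lambda m: m["load_1min"] > m["cpu_count"], "investigate CPU demand"),
    ("swap", lambda m: m["swap_used_mb"] > 0, "check memory pressure"),
    ("disk errors", lambda m: m["disk_errors"] > 0, "inspect device logs"),
]


def run_checklist(metrics, checklist):
    """Return the action for every checklist item whose condition is true."""
    actions = []
    for name, condition, action in checklist:
        if condition(metrics):  # "if B"
            actions.append(f"{name}: {action}")  # "do C"
    return actions


# "Run A": in practice these values would come from observability tools.
sample = {"load_1min": 9.5, "cpu_count": 8, "swap_used_mb": 0, "disk_errors": 2}
for line in run_checklist(sample, CHECKLIST):
    print(line)
```

A checklist like this is only as good as its entries, so it needs regular updates as the environment changes.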

Problem Statement Method:

  1. What makes you think there is a performance problem?
  2. Has this system ever performed well?
  3. What has changed recently? (Software? Hardware? Load?)
  4. Can the performance degradation be expressed in terms of latency or run time?
  5. Does the problem affect other people or applications (or is it just you)?
  6. What is the environment? What software and hardware is used? Versions? Configuration?

Scientific Method:

  1. Question
  2. Hypothesis
  3. Prediction
  4. Test
  5. Analysis

Workload Characterization Method:

  1. Who is causing the load? PID, UID, IP addr, ...
  2. Why is the load called? code path
  3. What is the load? IOPS, tput, type
  4. How is the load changing over time?
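
As a rough sketch of the who, what, and over-time questions, the following summarizes a list of I/O events. The event tuple layout `(timestamp, pid, op, bytes)` is an assumption made for illustration. The "why" question needs code paths (stack traces), which this sketch does not cover.

```python
from collections import Counter, defaultdict


def characterize(events, interval=1):
    """Summarize events given as (timestamp_sec, pid, op_type, size_bytes)."""
    ops_by_pid = Counter()            # who is causing the load
    bytes_by_type = Counter()         # what is the load
    ops_by_interval = defaultdict(int)  # how it changes over time
    for ts, pid, op, size in events:
        ops_by_pid[pid] += 1
        bytes_by_type[op] += size
        bucket = int(ts // interval) * interval
        ops_by_interval[bucket] += 1
    return {
        "ops_by_pid": dict(ops_by_pid),
        "bytes_by_type": dict(bytes_by_type),
        "ops_by_interval": dict(sorted(ops_by_interval.items())),
    }


# Hypothetical sample data.
events = [
    (0.1, 101, "read", 4096),
    (0.4, 101, "read", 4096),
    (0.9, 202, "write", 8192),
    (1.2, 101, "read", 4096),
]
summary = characterize(events)
print(summary)
```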

Drill-Down Analysis Method:

  1. Start at highest level
  2. Examine next-level details
  3. Pick most interesting breakdown
  4. If problem unsolved, go to 2
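
The loop above can be sketched over a tree of time breakdowns. This assumes "most interesting" means the child with the largest time, which is a simplification, and the tree contents are hypothetical.

```python
def drill_down(node):
    """Follow the largest contributor at each level and return the path taken."""
    path = [node["name"]]           # start at the highest level
    while node.get("children"):     # examine next-level details
        # Assumption: the most interesting breakdown is the largest time.
        node = max(node["children"], key=lambda c: c["time_ms"])
        path.append(node["name"])
    return path


# Hypothetical breakdown of a 100 ms request.
tree = {
    "name": "request",
    "time_ms": 100,
    "children": [
        {"name": "app", "time_ms": 20},
        {
            "name": "database",
            "time_ms": 80,
            "children": [
                {"name": "query parse", "time_ms": 5},
                {"name": "disk I/O", "time_ms": 70},
            ],
        },
    ],
}
print(" -> ".join(drill_down(tree)))
```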

Latency Analysis Method:

  1. Measure operation time (latency)
  2. Divide into logical synchronous components
  3. Continue division until latency origin is identified
  4. Quantify: estimate speedup if problem fixed
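
For the final step, one simple way to estimate the speedup is to compare total latency before and after shrinking the identified component. This sketch assumes the component is synchronous (its time adds directly to the operation time). The function name and `reduction` parameter are my own for illustration.

```python
def estimated_speedup(total_ms, component_ms, reduction=1.0):
    """Estimate speedup if `reduction` (0..1) of a component's latency is removed.

    Assumes the component is synchronous, so its time adds directly to the total.
    """
    if not 0 <= component_ms <= total_ms:
        raise ValueError("component latency must be between 0 and the total")
    if not 0 <= reduction <= 1:
        raise ValueError("reduction must be between 0 and 1")
    remaining = total_ms - component_ms * reduction
    if remaining == 0:
        return float("inf")  # the whole operation time was eliminated
    return total_ms / remaining


# Hypothetical numbers: a 100 ms operation with 75 ms in one component.
print(estimated_speedup(100, 75))       # component fully eliminated
print(estimated_speedup(100, 75, 0.5))  # component latency halved
```

The same arithmetic shows when a fix is not worth pursuing: removing a component that accounts for 5% of latency can improve the operation by at most about 5%.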

USE Method:

For every resource, check:

  1. Utilization
  2. Saturation
  3. Errors
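
A minimal sketch of iterating over resources and checking all three metrics follows. The metric values, the dictionary layout, and the 90% utilization limit are assumptions for illustration. Real metrics would come from system tools, and the appropriate interpretation differs by resource type.

```python
def use_check(resources, util_limit=0.9):
    """Return (resource, metric, value) findings for each resource."""
    findings = []
    for name, m in resources.items():
        if m["errors"] > 0:
            findings.append((name, "errors", m["errors"]))
        if m["saturation"] > 0:  # any queued or waiting work
            findings.append((name, "saturation", m["saturation"]))
        if m["utilization"] >= util_limit:  # assumed threshold
            findings.append((name, "utilization", m["utilization"]))
    return findings


# Hypothetical measurements for three resources.
resources = {
    "cpu": {"utilization": 0.95, "saturation": 4, "errors": 0},
    "disk": {"utilization": 0.30, "saturation": 0, "errors": 2},
    "network": {"utilization": 0.10, "saturation": 0, "errors": 0},
}
for finding in use_check(resources):
    print(finding)
```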

Stack Profile Method:

  1. Profile thread stack traces (on- and off-CPU)
  2. Coalesce
  3. Study stacks bottom-up
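
The coalesce step can be sketched as counting identical stack traces, so that the most frequent code paths stand out. The sample stacks are hypothetical, and each is listed root frame first.

```python
from collections import Counter


def coalesce(samples):
    """Count identical stack traces; each sample is a list of frames, root first."""
    return Counter(tuple(stack) for stack in samples)


# Hypothetical sampled stacks.
samples = [
    ["main", "handle_request", "parse"],
    ["main", "handle_request", "read_file"],
    ["main", "handle_request", "parse"],
    ["main", "gc"],
]
for stack, count in coalesce(samples).most_common():
    print(count, ";".join(stack))
```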