USE Method: Mac OS X Performance Checklist

This is my example USE Method-based performance checklist for the Apple Mac OS X operating system, for identifying common bottlenecks and errors. It draws upon both command-line and graphical tools for coverage, focusing where possible on those that are provided with the OS by default, or by Apple (eg, Instruments). Further notes about tools are provided after this table.

Some of the metrics are easy to find in various GUIs or from the command line (eg, using Terminal; if you've never used Terminal before, follow my instructions at the top of this post). Many metrics require some math, inference, or quite a bit of digging. This will hopefully get easier in the future, as tools add a USE method wizard or expose the metrics required to follow this easily.

Physical Resources, Standard

CPU utilization: system-wide: iostat 1, "us" + "sy"; per-cpu: DTrace [1]; Activity Monitor → CPU Usage or Floating CPU Window; per-process: top -o cpu, "%CPU"; Activity Monitor → Activity Monitor, "%CPU"; per-kernel-thread: DTrace profile stack()
CPU saturation: system-wide: uptime, "load averages" > CPU count; latency, "SCHEDULER" and "INTERRUPTS"; per-cpu: dispqlen.d (DTT), non-zero "value"; runocc.d (DTT), non-zero "%runocc"; per-process: Instruments → Thread States, "On run queue"; DTrace [2]
CPU errors: dmesg; /var/log/system.log; Instruments → Counters, for PMCs and whatever error counters are supported (eg, thermal throttling)
Memory capacity utilization: system-wide: vm_stat 1, main memory free = "free" + "inactive", in units of pages; Activity Monitor → Activity Monitor → System Memory, "Free" for main memory; per-process: top -o rsize, "RSIZE" is resident main memory size, "VSIZE" is virtual memory size; ps -alx, "RSS" is resident set size, "SZ" is virtual memory size; ps aux is similar (legacy format)
Memory capacity saturation: system-wide: vm_stat 1, "pageout"; per-process: anonpgpid.d (DTT), DTrace vminfo:::anonpgin [3] (frequent anonpgin == pain); Instruments → Memory Monitor, high rate of "Page Ins" and "Page Outs"; sysctl vm.memory_pressure [4]
Memory capacity errors: System Information → Hardware → Memory, "Status" for physical failures; DTrace failed malloc()s
Network interfaces utilization: system-wide: netstat -i 1, assume one very busy interface and use input/output "bytes" / known max (note: includes localhost traffic); per-interface: netstat -I interface 1, input/output "bytes" / known max; Activity Monitor → Activity Monitor → Network, "Data received/sec" "Data sent/sec" / known max (note: includes localhost traffic); atMonitor, interface percent
Network interfaces saturation: system-wide: netstat -s, for saturation-related metrics, eg netstat -s | egrep 'retrans|overflow|full|out of space|no bufs'; per-interface: DTrace
Network interfaces errors: system-wide: netstat -s | grep bad, for various metrics; per-interface: netstat -i, "Ierrs", "Oerrs" (eg, late collisions), "Colls" [5]
Storage device I/O utilization: system-wide: iostat 1, "KB/t" and "tps" are rough usage stats [6]; DTrace could be used to calculate a percent busy, using io provider probes; atMonitor, "disk0" is percent busy; per-process: iosnoop (DTT), shows usage; iotop (DTT), has -P for percent I/O
Storage device I/O saturation: system-wide: iopending (DTT)
Storage device I/O errors: DTrace io:::done probe when /args[0]->b_error != 0/
Storage capacity utilization: file systems: df -h; swap: sysctl vm.swapusage, for swap file usage; Activity Monitor → Activity Monitor → System Memory, "Swap used"
Storage capacity saturation: not sure this one makes sense; once it's full, ENOSPC
Storage capacity errors: DTrace; /var/log/system.log file system full messages
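The memory capacity utilization row computes free main memory as the "free" plus "inactive" page counts from vm_stat, multiplied by the page size. As a minimal sketch of that arithmetic (the sample vm_stat-style output and page counts below are made up for illustration):

```python
import re

def free_memory_bytes(vm_stat_output: str, page_size: int = 4096) -> int:
    """Estimate free main memory as "free" + "inactive" pages,
    converted to bytes using the page size."""
    counts = {}
    for line in vm_stat_output.splitlines():
        # vm_stat lines look like: 'Pages free:    50000.'
        m = re.match(r'Pages (\w[\w ]*):\s+(\d+)\.', line)
        if m:
            counts[m.group(1)] = int(m.group(2))
    return (counts.get("free", 0) + counts.get("inactive", 0)) * page_size

# Hypothetical abbreviated vm_stat output:
sample = """Pages free:                          50000.
Pages active:                       300000.
Pages inactive:                     150000.
Pages wired down:                   100000."""

print(free_memory_bytes(sample))  # (50000 + 150000) * 4096 = 819200000
```

Real vm_stat output has more counters and a header line; the regex above simply ignores lines it doesn't recognize.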

Physical Resources, Advanced

GPU utilization: directly: DTrace [7]; atMonitor, "gpu"; indirectly: Temperature Monitor; atMonitor, "gput"
GPU saturation: DTrace [7]; Instruments → OpenGL Driver, "Client GLWait Time" (maybe)
GPU errors: DTrace [7]
Storage controller utilization: iostat 1, compare to known IOPS/throughput limits per card
Storage controller saturation: DTrace and look for kernel queueing
Storage controller errors: DTrace the driver
Network controller utilization: system-wide: netstat -i 1, assume one busy controller and examine input/output "bytes" / known max (note: includes localhost traffic)
Network controller saturation: see network interface saturation
Network controller errors: see network interface errors
CPU interconnect utilization: for multi-processor systems, try Instruments → Counters, and relevant PMCs for CPU interconnect port I/O, and measure throughput / max
CPU interconnect saturation: Instruments → Counters, and relevant PMCs for stall cycles
CPU interconnect errors: Instruments → Counters, and relevant PMCs for whatever is available
Memory interconnect utilization: Instruments → Counters, and relevant PMCs for memory bus throughput / max; or measure CPI and treat, say, 5+ as high utilization; Shark had a "Processor bandwidth analysis" feature, which either was or included memory bus throughput, but I never used it
Memory interconnect saturation: Instruments → Counters, and relevant PMCs for stall cycles
Memory interconnect errors: Instruments → Counters, and relevant PMCs for whatever is available
I/O interconnect utilization: Instruments → Counters, and relevant PMCs for throughput / max if available; inference via known throughput from iostat/...
I/O interconnect saturation: Instruments → Counters, and relevant PMCs for stall cycles
I/O interconnect errors: Instruments → Counters, and relevant PMCs for whatever is available
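The memory interconnect utilization row suggests using cycles-per-instruction (CPI) as a proxy: divide a cycle count by an instruction count (two PMCs readable via Instruments → Counters) and treat a high value as a sign of memory stalls. A minimal sketch of that heuristic (the counter values below are hypothetical, and the 5.0 threshold is the rough rule of thumb from the row above, not a universal constant):

```python
def cpi(cycles: int, instructions: int) -> float:
    """Cycles per instruction, from two PMC counts."""
    return cycles / instructions

def memory_bus_suspect(cycles: int, instructions: int,
                       threshold: float = 5.0) -> bool:
    """Treat a CPI at or above the threshold as a hint of high memory
    interconnect utilization (many stall cycles per instruction)."""
    return cpi(cycles, instructions) >= threshold

# Hypothetical PMC readings over a sampling interval:
print(cpi(12_000_000, 2_000_000))                 # 6.0
print(memory_bus_suspect(12_000_000, 2_000_000))  # True
print(memory_bus_suspect(2_000_000, 2_000_000))   # False: CPI of 1.0
```

A sensible threshold varies by workload and microarchitecture; CPI is only an indirect signal, which is why the row also suggests measuring bus throughput directly where PMCs allow it.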

Software Resources

Kernel mutex utilization: DTrace and the lockstat provider for held times
Kernel mutex saturation: DTrace and the lockstat provider for contention times [8]
Kernel mutex errors: DTrace and the fbt provider for return probes and error status
User mutex utilization: plockstat -H (held time); DTrace plockstat provider
User mutex saturation: plockstat -C (contention); DTrace plockstat provider
User mutex errors: DTrace plockstat and pid providers, for EDEADLK, EINVAL, ...; see pthread_mutex_lock(3)
Process capacity utilization: current/max using: ps -e | wc -l / sysctl kern.maxproc; top, "Processes:" also shows current
Process capacity saturation: not sure this makes sense
Process capacity errors: "can't fork()" messages
File descriptors utilization: system-wide: sysctl kern.num_files / sysctl kern.maxfiles; per-process: can be figured out using lsof and ulimit -n
File descriptors saturation: I don't think this one makes sense: if the kernel can't allocate or expand the file descriptor array, it returns an error; see fdalloc()
File descriptors errors: dtruss or custom DTrace to look for errno == EMFILE on syscalls returning file descriptors (eg, open(), accept(), ...)
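The process capacity and file descriptor rows both compute utilization the same way: a current count divided by a kernel limit (eg, ps -e | wc -l against sysctl kern.maxproc, or kern.num_files against kern.maxfiles). A minimal sketch of that calculation (the counts below are hypothetical stand-ins for values read from those commands):

```python
def utilization_pct(current: int, maximum: int) -> float:
    """current/max as a percentage, for capacity-style resources
    (process table slots, file descriptors, ...)."""
    if maximum <= 0:
        raise ValueError("maximum must be positive")
    return 100.0 * current / maximum

# Hypothetical readings: 532 processes vs kern.maxproc of 1064,
# 12000 open files vs kern.maxfiles of 12288.
print(round(utilization_pct(532, 1064), 1))     # 50.0
print(round(utilization_pct(12000, 12288), 1))  # 97.7
```

A value approaching 100% means the next allocation is likely to fail, which is what the errors rows (can't fork(), EMFILE) would then catch.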

Other Tools

I didn't include fs_usage, sc_usage, sample, spindump, heap, vmmap, malloc_history, leaks, and other useful Mac OS X performance tools, as here I'm beginning with questions (the methodology) and only including tools that answer them. This is instead of the other way around: listing all the tools and trying to find a use for them. Those other tools are useful for other methodologies, which can be used after this one.

What's Next

See the USE Method for the follow-up methodologies after identifying a possible bottleneck. If you complete this checklist but still have a performance issue, move on to other methodologies: drill-down analysis and latency analysis.

For more performance analysis, also see my earlier post on Top 10 DTrace Scripts for Mac OS X.


Resources used:

Filling in this checklist has required a lot of research, testing and experimentation. Please reference back to this post if it helps you develop related material.

It's quite possible I've missed something or included the wrong metric somewhere (sorry); I'll update the post to fix these as they are understood, and note the update date at the top.

Also see my USE method performance checklists for Solaris, SmartOS, Linux, and FreeBSD.

Last updated: 09-Feb-2014