The Greatest Tool that Never Worked: har

27 May 2013

I originally posted this at http://dtrace.org/blogs/brendan/2013/05/27/the-greatest-tool-that-never-worked-har.

har is the Hardware Activity Reporter. I've never seen it work, but it did help me solve a crucial performance issue.

I saw har's output in an article written in 2001 by Frédéric Parienté, the tool's author:

# har -r 1 3
mips    bus     cpi     dcm     icm     ecm     dsr     isr     fsr     bsr
2       0.0     3.1     33.7    25.1    15.4    15.6    31.9    0.0     3.1
0       0.0     3.6     31.3    22.8    1.0     24.4    25.4    0.1     6.1
0       0.0     3.3     34.2    21.9    0.0     24.6    19.6    0.0     5.3

har produces a rolling output like vmstat. The har metrics include:

bus: percentage utilization of address bus from CPUs
cpi: cycles per instruction
dcm: data-cache misses
ecm: external-cache misses
dsr: data stall rate

While I could access many of these metrics from other tools, har made them much easier to examine. But what really caught my eye was the "bus" metric – showing address bus utilization. I previously didn't know that this was even possible to measure.

In 2009 I was working on the first ZFS-based storage product, and tuning its performance to be the best in the industry (our competitors included NetApp and EMC). To determine performance limiters, I was working through the USE Method. Checking resource types such as CPU and disks was easy; busses and interconnects were harder.

Using the system functional diagram (pictured right) as a checklist, I knew I had checked all physical components except the busses. Based on the known aggregated throughput to the I/O devices, we weren't expecting the I/O busses to be an issue. But what about the memory busses and the HyperTransport-based CPU interconnect?

I measured cycles per instruction (CPI), which was over 11 under load. This is high: the CPUs were spending many cycles on each instruction, most likely stalled waiting on memory, which suggested a memory bus issue.
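As a rough illustration of where a CPI figure comes from, it is the ratio of two CPU performance counter deltas taken over the same interval: cycles and instructions retired. The sketch below is not the tool I used; the function name is hypothetical and the counter values are made up to land in the same range as the number above.

```python
def cycles_per_instruction(cycles_delta, instructions_delta):
    """Return CPI for one sampling interval from two counter deltas."""
    if instructions_delta <= 0:
        raise ValueError("no instructions retired in this interval")
    return cycles_delta / instructions_delta


# Hypothetical counter deltas for a one-second interval (not measured data).
cycles = 2_300_000_000
instructions = 200_000_000

cpi = cycles_per_instruction(cycles, instructions)
print(f"CPI: {cpi:.1f}")
```

A CPI this far above 1 means that most cycles are not retiring instructions, which is why it points at stalls rather than at a lack of CPU capacity.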

That's when I remembered har - which could report address bus (or memory bus) utilization directly.

har didn't work on this platform.

But the memory of har – a tool I'd never run – taught me that I wasn't crazy for wanting to measure this metric. It was possible on another platform. Could it also be measured here? What would it take to port har to this platform? (Assuming I could find the source code - I only had binaries.)

This motivated me to dig through the AMD BIOS and Kernel Developer's Guide and learn more about the CPU performance counters, knowing that my time spent had a chance of paying off. It wasn't easy, and required careful testing, but I did eventually develop tools for measuring the throughput and utilization of all the busses: I/O, memory, and CPU interconnect. These tools included amd64htcpu:

walu# ./amd64htcpu 1
     Socket  HT0 TX MB/s  HT1 TX MB/s  HT2 TX MB/s  HT3 TX MB/s
          0      3170.82       595.28      2504.15         0.00
          1      2738.99      2051.82       562.56         0.00
          2      2218.48         0.00      2588.43         0.00
          3      2193.74      1852.61         0.00         0.00
[...]

Each CPU had four HyperTransport 1 (HT) ports, and the transmit (TX) for each is reported by this tool. It showed that it was the CPU interconnect that was approaching its limit. The decoder for this tool is the following, showing what each number measures:

     Socket  HT0 TX MB/s  HT1 TX MB/s  HT2 TX MB/s  HT3 TX MB/s
          0       CPU0-1        MCP55       CPU0-2         0.00
          1       CPU1-0       CPU1-3         IO55         0.00
          2       CPU2-3       CPU2-3       CPU2-0         0.00
          3       CPU3-2       CPU3-1       CPU3-2         0.00
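To show how the decoder is applied, here is a minimal Python sketch that joins the amd64htcpu sample output with the decoder table and lists the busiest links first. The numbers and labels are copied from the two tables above; the names SAMPLE, DECODER, and label_links are hypothetical and not part of amd64htcpu.

```python
SAMPLE = """\
     Socket  HT0 TX MB/s  HT1 TX MB/s  HT2 TX MB/s  HT3 TX MB/s
          0      3170.82       595.28      2504.15         0.00
          1      2738.99      2051.82       562.56         0.00
          2      2218.48         0.00      2588.43         0.00
          3      2193.74      1852.61         0.00         0.00
"""

# Decoder copied from the table above: socket -> label for ports HT0..HT2.
# HT3 is left out because the decoder table gives no label for it.
DECODER = {
    0: ["CPU0-1", "MCP55", "CPU0-2"],
    1: ["CPU1-0", "CPU1-3", "IO55"],
    2: ["CPU2-3", "CPU2-3", "CPU2-0"],
    3: ["CPU3-2", "CPU3-1", "CPU3-2"],
}


def label_links(text):
    """Return (socket, port, label, tx_mbytes_per_sec) tuples, busiest first."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or not fields[0].isdigit():
            continue  # skip the header and any blank or filler lines
        socket = int(fields[0])
        for port, label in enumerate(DECODER[socket]):
            rows.append((socket, port, label, float(fields[1 + port])))
    return sorted(rows, key=lambda row: row[3], reverse=True)


# Print the three busiest links in the sample.
for socket, port, label, mbps in label_links(SAMPLE)[:3]:
    print(f"socket {socket} HT{port} ({label}): {mbps:.2f} MB/s")
```

Sorting by throughput puts the CPU-to-CPU links at the top for this sample, which matches the conclusion that the interconnect, not the I/O path, was the part approaching its limit.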

Upgrading the system to HyperTransport 3 improved performance by between 25% and 75% (I wrote about this previously, including more on amd64htcpu). While the USE Method identified that I should be examining the memory bus, it was the har screenshot that suggested that it was possible.

If you like my amd64htcpu tool (script here), the bad news is that it probably won't work for you, as it is for a particular platform and OS. But har didn't work for me, either. It's valuable to know that measuring a certain metric is even possible: think of these screenshots as proofs of concept.

To learn about more tools that probably don't work for you (especially if you're using Linux or Windows today), I recommend my book on DTrace, which contains hundreds of tools and screenshots showing what dynamic tracing can do. Many of these won't even work on a given version of Solaris without tweaking. The book's value may not lie in the tools, but in the ideas that they encompass, and the screenshots that convey them. Just like har - the greatest tool that never worked (for me).

UPDATE: Frédéric Parienté commented:

"Hi Brendan, the HAR source code is available at https://kenai.com/projects/har as long as Project Kenai remains up-n-running. AFAIK HAR itself is still distributed as part of dimStat at http://dimitrik.free.fr/. Hope that you will eventually see it work ;-) It needs porting to more recent hardware though. HTH, Frederic."