Systems Performance 2nd Ed.



BPF Performance Tools book

Recent posts:
Blog index
About
RSS

DTracing in Anger

14 Nov 2012

I originally posted this at http://dtrace.org/blogs/brendan/2012/11/14/dtracing-in-anger.

My Macbook has becomeso sluggish that it feels like I'm typing ove a 9600 baud modem aagn. Or 2400. It's alo droping keystokes – which is irritatng as hll – so please forgive theapparent tyos and mistakes. It comes and goes each minute, so thiswhole post isn't too bad.

Usually I cn see what's wrng using my Top 10 DTrace scripts for Mac OS X, which have a focus on disk and file system issues. That's not the casehere.

Since this isinterferig wth localapplications, specifically, my keystokes as I type in vim, my suspects are CPU related. top(1) and DTrace profiling tells me this running:

# dtrace -n 'profile-99 { @[execname] = count(); } tick-1s { trunc(@, 5); printa(@); trunc(@); }'
dtrace: description 'profile-99 ' matched 2 probes
CPU     ID                    FUNCTION:NAME
  4  16339                         :tick-1s 
  configd                                                          39
  firefox                                                          46
  Google Chrome H                                                  55
  SystemUIServer                                                   96  ← who r u??
  kernel_task                                                     444

Ok, so lets start with SystemUIServer, although I have no idea what it is, and there is no man page. A quick Internet search tells me it manages menu bar popups, and device removal (CDROMs, DVDs, USB). Suggestions include:

  • reboot
  • logout and log back in
  • run killall SystemUIServer to kill it, and allow it to restart automatically.
  • use command-R at startup and reinstall Lion OS X. "Everything solved."

That doesn't sound scientific. The only suggested root-cause I saw wasthat it had a "memory leak".

Lets look using DTrace.

First is to profile the on-CPU usage:

# dtrace -n 'profile-97 /execname == "SystemUIServer"/ { @[ustack()] = count(); }'
dtrace: description 'profile-97 ' matched 1 probe
dtrace: error on enabled probe ID 1 (ID 28: profile:::profile-97): invalid address (0x118ca2f00) in action #2
dtrace: error on enabled probe ID 1 (ID 28: profile:::profile-97): invalid address (0x7fff50b4b1f0) in action #2
dtrace: error on enabled probe ID 1 (ID 28: profile:::profile-97): invalid address (0x7fff50b4a130) in action #2
dtrace: error on enabled probe ID 1 (ID 28: profile:::profile-97): invalid address (0x7fff50b4afc0) in action #2
[...]

Ow. This was supposed to sample on-CPU user-level stacks, but I've hit the profiling stack-trace bug currently present on Mac OS X. Once this is fixed, this will be a lot easier to debug, as I can quickly roll Flame Graphs.

Now you get to see how I work around this on Mac OS X.

One level of stack works:

# dtrace -n 'profile-97 /execname == "SystemUIServer"/ { @[ustack(1)] = count(); }'
dtrace: description 'profile-97 ' matched 1 probe
^C
[...]
              CoreFoundation`CFRelease+0x46
                9

              libsystem_kernel.dylib`__workq_kernreturn+0xa
               10

              libobjc.A.dylib`objc::DenseMap, objc::DenseMapInfo >::LookupBucketFor(objc_object* const&, std::__1::pair*&) const+0x3a
               11

              libsystem_c.dylib`OSAtomicCompareAndSwap64Barrier$VARIANT$mp+0x8
               12

              libsystem_kernel.dylib`kevent+0xa
               26

              libsystem_kernel.dylib`mach_msg_trap+0xa
              151

This tells me mach_msg_trap() was on-CPU more than anyone else. Really wish I could see the full sttack trace, but can't with profiling due to that bug.

I can with the pid provider. This will begin to slow the target due to the ovrheads of pid tracing and user-level stack collection, but given how woefully slow this laptop is already, I don't mind.

# dtrace -n 'pid$target::mach_msg_trap:entry { @[ustack()] = count(); }' -p 177
dtrace: description 'pid$target::mach_msg_trap:entry ' matched 2 probes
^C
[...]
              libsystem_kernel.dylib`mach_msg_trap
              libsystem_kernel.dylib`mach_msg+0x46
              IOKit`io_iterator_next+0x4c
              IOKit`IOIteratorNext+0x11
              SystemConfiguration`findMatchingInterfaces+0xa4
              SystemConfiguration`_SCNetworkInterfaceCreateWithEntity+0x261
              SystemConfiguration`SCNetworkServiceGetInterface+0xab
              CoreWLAN`copyActiveWLANNetworkServices+0x115
              CoreWLAN`copyPrimaryActiveWLANNetworkService+0xc
              CoreWLAN`-[CWIPMonitor networkServiceID]+0x64
              CoreWLAN`-[CWIPMonitor ipv4StateConfig]+0x4d
              CoreWLAN`-[CWIPMonitor ipv4Routable]+0x21
              AirPort`0x000000011287e25f+0x30b
              Foundation`__NSThreadPerformPerform+0xe1
              CoreFoundation`__CFRUNLOOP_IS_CALLING_OUT_TO_A_SOURCE0_PERFORM_FUNCTION__+0x11
              CoreFoundation`__CFRunLoopDoSources0+0xf5
              CoreFoundation`__CFRunLoopRun+0x315
              CoreFoundation`CFRunLoopRunSpecific+0x122
              HIToolbox`RunCurrentEventLoopInMode+0xd1
              HIToolbox`ReceiveNextEventCommon+0x164
              676

Many stacks like the above were printed. They look truncated - they should go all the way back to thread creation.

Increasing stack levels:

# dtrace -x ustackframes=100 -n 'pid$target::mach_msg_trap:entry { @[ustack()] = count(); }' -p 177
dtrace: description 'pid$target::mach_msg_trap:entry ' matched 2 probes
[...]
              libsystem_kernel.dylib`mach_msg_trap
              libsystem_kernel.dylib`mach_msg+0x46
              IOKit`io_iterator_next+0x4c
              IOKit`IOIteratorNext+0x11
              SystemConfiguration`findMatchingInterfaces+0xa4
              SystemConfiguration`_SCNetworkInterfaceCreateWithEntity+0x261
              SystemConfiguration`SCNetworkServiceGetInterface+0xab
              CoreWLAN`copyActiveWLANNetworkServices+0x115
              CoreWLAN`copyPrimaryActiveWLANNetworkService+0xc
              CoreWLAN`-[CWIPMonitor networkServiceID]+0x64
              CoreWLAN`-[CWIPMonitor ipv4StateConfig]+0x4d
              CoreWLAN`-[CWIPMonitor ipv4Routable]+0x21
              AirPort`0x000000011287e25f+0x30b
              Foundation`__NSThreadPerformPerform+0xe1
              CoreFoundation`__CFRUNLOOP_IS_CALLING_OUT_TO_A_SOURCE0_PERFORM_FUNCTION__+0x11
              CoreFoundation`__CFRunLoopDoSources0+0xf5
              CoreFoundation`__CFRunLoopRun+0x315
              CoreFoundation`CFRunLoopRunSpecific+0x122
              HIToolbox`RunCurrentEventLoopInMode+0xd1
              HIToolbox`ReceiveNextEventCommon+0xa6
              HIToolbox`BlockUntilNextEventMatchingListInMode+0x3e
              AppKit`_DPSNextEvent+0x2ad
              AppKit`-[NSApplication nextEventMatchingMask:untilDate:inMode:dequeue:]+0x80
              AppKit`-[NSApplication run]+0x205
              SystemUIServer`0x000000010f0c747c+0x76
              libdyld.dylib`start
              SystemUIServer`0x1
             3010

That's better.

Airport? CoreWLAN? ipv4Routable? Is SystemUIServer doing something dumb with my WiFi?

Since that's just one stack out of hundreds, it's hard to know if this is the problem or a problem, without reading all the other stacks.

Fortunately there is a quick way to processall those stacks: Flame Graphs – not for thread samples, but for these function tracing stacks. Should still work.

# dtrace -x ustackframes=100 -n 'pid$target::mach_msg_trap:entry { @[ustack()] = count(); } tick-30s { exit(0); }' -p 177 -o out.SystemUIServer_stacks
dtrace: description 'pid$target::mach_msg_trap:entry ' matched 3 probes
# ~/Git/FlameGraph/stackcollapse.pl out.SystemUIServer_stacks2 | ~/Git/FlameGraph/flamegraph.pl > out.SystemUIServer_stacks.svg

The resulting SVG is:

Click for the interactive version. This confirmed that the mach_msg_trap()s were mostly AirPort.

I can dig around te stack more using dynamic tracing; taknig a quick look for strings to see if it sheds any light:

# dtrace -n 'pid$target::io_service_get_matching_services:entry { @[copyinstr(arg1)] = count(); }' -p 177
dtrace: description 'pid$target::io_service_get_matching_services:entry ' matched 1 probe
^C
  <dict><key>BSD Name</key><string>en1</string></dict>               51
  <dict><key>BSD Name</key><string>en2</string></dict>               51
  <dict><key>IOProviderClass</key><string>IOSerialBSDClient</string><key>IOTTYBaseName</key><string>Bluetooth-Modem</string></dict>               51
  <dict><key>IOProviderClass</key><string>IOSerialBSDClient</string><key>IOTTYDevice</key><string>Bluetooth-Modem</string></dict>               51
  <dict><key>BSD Name</key><string>en0</string></dict>               52
  <dict><key>BSD Name</key><string>fw0</string></dict>               52
  <dict><key>IOProviderClass</key><string>IOBluetoothHCIController</string></dict>              102
  <dict><key>IOProviderClass</key><string>IO80211Interface</string></dict>
       1917

Given that I'm not really doing anything with the WiFi, that it's so busy managing interfaces is starting to look like a bug. Possibly a memory leak too, as SystemUIServer is at 600 Mbytes RSS.

Opening the WiFi menu results in muchspinning beachball. As anexerient, I'l tun off WiFi.

...

Hooray! I can type again!

Turning WiFi back on returned the issue soon after. I did resort to the "killall SystemUIServer", which appears to have returned the laptop back to normal, while on WiFi.

I can keeping digging further in the stack when this issue returns, and browse the parts of this that are open source on Apple's website, and probably put my finger on the root cause. But this has been a distraction - I have real work to get back to doing.

And hey WindowServer – don't make me sic the pony on you, too.