This is an example of performing troubleshooting using DTrace
from Solaris 10.
Note: This document is not written by Sun, and has been based on beta Solaris 10.
A great tool to try early on is prstat with microstate accounting,
# prstat -m PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/NLWP 1969 root 2.8 1.7 0.0 0.0 0.0 52 43 0.0 177 38 .2M 0 java/21 14422 root 0.1 0.4 0.0 0.0 0.0 0.0 99 0.0 39 0 487 0 prstat/1 551 root 0.0 0.0 0.0 0.0 0.0 0.0 100 0.0 19 0 35 0 dtgreet/1 550 root 0.0 0.0 0.0 0.0 0.0 0.0 100 0.0 17 0 30 0 dtgreet/1 461 root 0.0 0.0 0.0 0.0 0.0 0.0 100 0.0 6 0 36 0 Xvnc/1 552 root 0.0 0.0 0.0 0.0 0.0 0.0 100 0.0 16 0 30 0 dtgreet/1 463 root 0.0 0.0 0.0 0.0 0.0 0.0 100 0.0 7 0 42 0 Xvnc/1 455 root 0.0 0.0 0.0 0.0 0.0 0.0 100 0.0 6 0 36 0 Xvnc/1 [...] |
From the above, 0.2 million system calls looks interesting.
Lets take a closer look at that java process by using DTrace to count the number of each system call used. We trace syscalls from processes with the name "java", as those processes appear to be causing all the system calls (there is more than one "java" process). I run DTrace during an entire SMC first boot,
# dtrace -n 'syscall:::entry /execname == "java"/ { @Num[probefunc] = count(); }'
^C
putmsg 1
issetugid 1
getgid 1
rexit 2
chmod 2
kill 2
shutdown 2
getpeername 3
pathconf 3
[...]
open 8640
llseek 8671
fstat64 9906
lwp_cond_wait 16048
resolvepath 21061
lwp_mutex_timedlock 21912
lwp_mutex_wakeup 22084
close 27603
ioctl 40783
stat64 62643
pollsys 141273
write 1102115
read 12934468
|
The above shows that most of the system calls are read()s. Since there is not an equally large number of lseek()s, we can assume that these reads are mostly sequential and not random reads.
Now we take a closer look at those reads, in particular the attempted size of each read. This time DTrace is run for several minutes during boot,
# dtrace -n 'syscall::read:entry /execname == "java"/ { @Size = quantize(arg2); }'
^C
value ------------- Distribution ------------- count
0 | 0
1 |@@@@@@@@@@@@@@@@@@@@@@ 4302108
2 |@@@@@ 1032476
4 |@@@@@@@@ 1516412
8 |@@@ 625137
16 | 40540
32 |@ 164393
64 |@ 104476
128 | 2
256 | 0
[...]
|
This shows that many of those reads are requesting 1 byte!
Often it's also interesting to check the read returns, to verify the successful bytes read. The following DTrace command does this, and is also run for several minutes,
# dtrace -n 'syscall::read:return /execname == "java"/ { @Size = quantize(arg0); }'
^C
value ------------- Distribution ------------- count
-2 | 0
-1 | 1063
0 | 566
1 |@@@@@@@@@@@@@@@@@@@@@@@ 3156897
2 |@@@@@ 693066
4 |@@@@@@@ 1030087
8 |@@@ 458795
16 | 20322
32 |@ 98572
64 |@ 77944
128 | 178
256 | 284
[...]
|
and in this case produces similar results to the requested reads.
There are many ways DTrace can be used to check what these reads are for, we begin by grabbing the file descriptor and doing an aggregate on the pathname. The following standalone dtrace script is run during the entire boot,
# cat filereads.d
/* Trace open filenames */
syscall::open:entry /execname == "java"/ {
self->file = copyinstr(arg0);
}
/* Create lookup table of filedes -> filename */
syscall::open:return /execname == "java" && self->file != ""/ {
path[pid,arg0] = self->file;
self->file=0;
}
/* Count file reads by filename */
syscall::read:entry /execname == "java" && path[pid,arg0] != ""/ {
@num[path[pid,arg0]] = count();
}
#
# dtrace -s filereads.d
dtrace: script 'filereads.d' matched 3 probes
^C
/var/sadm/smc/toolboxes/smc/smc.gif 1
/var/sadm/smc/toolboxes/this_computer/services_16.gif 1
/var/sadm/smc/toolboxes/this_computer/services_32.gif 1
/var/sadm/smc/toolboxes/this_computer/devices_16.gif 1
/var/sadm/smc/toolboxes/this_computer/devices_32.gif 1
[...]
/usr/sadm/lib/volmgr/VVolMgr.jar 102
/usr/sadm/lib/wbem.jar 103
/var/sadm/wbem/logr/Snapshot.1 136
/usr/openwin/lib/X11/fonts/TrueType/Arial.ttf 187
/usr/openwin/lib/X11/fonts/TrueType/Arial-Bold.ttf 188
/usr/openwin/lib/X11/fonts/TrueType/TimesNewRoman.ttf 193
/usr/sadm/lib/smc/lib/dtds/viperbean_1_0.dtd 205
/usr/openwin/lib/X11/fonts/TrueType/Symbol.ttf 326
/usr/j2se/jre/lib/fonts/LucidaSansRegular.ttf 431
/etc/syslog.conf 1001
/usr/sadm/lib/smc/lib/dtds/toolbox.dtd 1116
/usr/lib//liblayout.so 1443
/var/sadm/wbem/logr/store 1611
/usr/sadm/lib/wbem/store 2231
/usr/j2se/jre/lib/sparc/libfontmanager.so 4758
/var/sadm/smc/properties/classlist.txt 61201
/var/sadm/smc/properties/registry.ser 12716152
|
A lot of files were read (a dozen screen fulls), which in itself may be of interest here; however 12.7 million reads of the registry.ser file certainly stands out.
Lets check that file size,
# ls -l /var/sadm/smc/properties/registry.ser -rw-r--r-- 1 root root 72676 Jan 17 17:01 /var/sadm/smc/properties/registry.ser |
So this is a 72 kilobyte file that is being read() at least 12 million times, a majority of those reads are sequential and about 1 byte in size.
I should note that it is likely that there are some very good reasons why SMC needs to call all these read()s, DTrace has merely suggested why the process takes some time - it may in fact be wrong to jump to conclusions and assume that this is a problem.
DTrace can check more carefully for this specific case - the requested read() sizes for the /var/sadm/smc/properties/registry.ser file only. An entire SMC first boot is traced,
# cat regreads.d
/* Trace open filenames */
syscall::open:entry
/execname == "java"/
{
self->file = copyinstr(arg0);
}
/* Create lookup table of filedes -> filename */
syscall::open:return
/execname == "java" && self->file != ""/
{
path[pid,arg0] = self->file;
self->file=0;
}
/* Measure file reads by this particular file */
syscall::read:entry
/execname == "java" && path[pid,arg0] == "/var/sadm/smc/properties/registry.ser"/
{
@Size = quantize(arg2);
}
#
# dtrace -s regreads.d
dtrace: script 'regreads.d' matched 3 probes
^C
value ------------- Distribution ------------- count
0 | 0
1 |@@@@@@@@@@@@@@@@@@@@@@ 7070709
2 |@@@@@ 1646078
4 |@@@@@@@@ 2428123
8 |@@@ 1031849
16 | 58258
32 |@ 248439
64 |@ 173775
128 | 0
256 | 0
512 | 0
1024 | 2
2048 | 2
4096 | 0
|
This confirms that this one file is being read() using mostly 1 byte reads.
DTrace has access to nanosecond timestamps that can be used to measure the time during system calls, user functions, etc.
So far we have found read()s on the registry.ser file interesting, so the following DTrace script has been specifically written to gather more information on these events. It sums the time for each type of system call and inserts three extra catagories - "read:registry.ser" are the reads of the registry.ser file only (the other reads exclude that time), "TOTAL:" is for the total time for all system calls, and "DURATION:" is the time of this DTrace sample (the boot time). This is run for an entire boot,
# cat time.d
dtrace:::BEGIN {
self->begin = timestamp;
}
syscall::open:entry
/execname == "java"/
{
self->file = copyinstr(arg0);
}
syscall::open:return
/execname == "java" && self->file != ""/
{
path[pid,arg0] = self->file;
self->file = 0;
}
syscall::read:entry
/execname == "java" && path[pid,arg0] == "/var/sadm/smc/properties/registry.ser"/
{
self->registry = vtimestamp;
}
syscall:::entry
/execname == "java" && self->registry == 0/
{
self->start = vtimestamp;
}
syscall::read:return
/self->registry/
{
this->time = vtimestamp - self->registry;
@Time["read:registry.ser"] = sum(this->time);
@Time["TOTAL:"] = sum(this->time);
self->registry = 0;
}
syscall:::return
/self->start/
{
this->time = vtimestamp - self->start;
@Time[probefunc] = sum(this->time);
@Time["TOTAL:"] = sum(this->time);
self->start = 0;
}
dtrace:::END {
@Time["DURATION:"] = sum(timestamp - self->begin);
}
#
#
# dtrace -b 128m -s time.d
^C
getgid 800
issetugid 800
gtime 3800
sigpending 6400
c2audit 8800
getuid 12200
fstat 14400
[...]
getdents64 36732600
munmap 49234400
lwp_mutex_wakeup 52178200
fdsync 55799000
fstat64 62206600
unlink 68512000
ioctl 123305400
stat64 143406600
lwp_mutex_timedlock 204307400
rename 257776400
lwp_cond_wait 279028200
open 441507800
close 450650000
fork1 490807800
resolvepath 497607000
exece 508040800
pollsys 2163482200
read 4057674600
write 15972839600
read:registry.ser 93894490000
TOTAL: 120157772400
DURATION: 1827896864000
|
The above report shows that during this 30 minute sample, system calls accounted for a total of 120 seconds on the CPU, and read()s on the registry.ser file accounted for a total of 94 seconds on the CPU - which is 78% of the system call time. There values were in DURATION:, TOTAL: and read:registry.ser respectively.
Now to get a better understanding of the nature of these read()s, we can try printing out the stack backtrace, here we use jstack() - which prints a user stack backtrace with some Java translations. These stack backtraces are also aggregated, so we can see what the most common stack backtrace is,
# cat jstack.d
syscall::open:entry
/execname == "java"/
{
self->file = copyinstr(arg0);
}
syscall::open:return
/execname == "java" && self->file != ""/
{
path[pid,arg0] = self->file;
self->file=0;
}
syscall::read:entry
/execname == "java" && path[pid,arg0] == "/var/sadm/smc/properties/registry.ser"/
{
@Stack[jstack()] = count();
}
#
# dtrace -s jstack.d
^C
[...]
0xf9c05c64
0xf9c00118
libjvm.so`__1cJJavaCallsLcall_helper6FpnJJavaValue_pnMmethodHandle_pnRJavaCallArguments_p\
nGThread__v_+0x274 libjvm.so`__1cRjni_invoke_static6FpnHJNIEnv__pnJJavaValue_pnI_jobject_n\
LJNICallType_pnK_jmethodID_pnSJNI_ArgumentPusher_pnGThread__v_+0x218
libjvm.so`jni_CallStaticVoidMethod+0x13c
java`main+0x158c
java`_start+0x108
347325
libc.so.1`_read+0x8
libjvm.so`JVM_Read+0x68
libjava.so`0xfea71e1c
libjava.so`Java_java_io_FileInputStream_read+0x28
0xf9dca918
0xf9cbc2fc
0xf9cc05a0
0xf9cbd1bc
0xf9cc6cf0
0xf9ccbb40
0xf9cc0998
0xf9ccf1bc
0xf9cc0940
0xf9cbd1bc
0xf9cc6cf0
0xf9ccbb40
0xf9cc0998
0xf9ccf1bc
0xf9cc0940
0xf9ccf1bc
0xf9cc0940
0xf9cd5a74
0xf9c05b10
0xf9c05b10
0xf9e5a730
0xf9e86f70
0xf9c05c64
0xf9c05c64
0xf9c05c64
0xf9c05c64
0xf9c05c64
0xf9c05c64
0xf9c00118
libjvm.so`__1cJJavaCallsLcall_helper6FpnJJavaValue_pnMmethodHandle_pnRJavaCallArguments_p\
nGThread__v_+0x274 libjvm.so`__1cRjni_invoke_static6FpnHJNIEnv__pnJJavaValue_pnI_jobject_n\
LJNICallType_pnK_jmethodID_pnSJNI_ArgumentPusher_pnGThread__v_+0x218
libjvm.so`jni_CallStaticVoidMethod+0x13c
java`main+0x158c
java`_start+0x108
367591
|
The last stack backtrace was the most common, occuring 367591 times. This information would be of particular use to the developers to identify which parts of their code were causing these read()s.
DTrace also allows us to use tools that let us examine new areas of system behaviour. For example, execsnoop is used to capture an entire boot,
# ./execsnoop -v | tee execsnoop.out
STRTIME UID PID PPID CMD
2005 Jan 17 17:32:57 0 1934 1932 /bin/ksh /usr/sbin/smc
2005 Jan 17 17:32:57 0 1936 1935 /usr/bin/sh /usr/bin/dirname /usr/sbin/smc
[...]
2005 Jan 17 17:34:02 0 3751 3730 fgrep -c [ /var/sadm/smc/smcreg
2005 Jan 17 17:34:02 0 3752 1950 /usr/bin/chmod +x /var/sadm/smc/scripts/SUNWdclnt.reg/SUNWdclnt_
2005 Jan 17 17:34:02 0 3753 1950 /bin/sh /usr/sadm/lib/smc/prereg/SUNWcsmc/SUNWcsmc_reg.sh
2005 Jan 17 17:34:02 0 3754 3753 /bin/ksh /usr/sadm/bin/smcregister library -n SysLogServiceBean_
2005 Jan 17 17:34:02 0 3756 3755 /usr/bin/sh /usr/bin/dirname /usr/sadm/bin/smcregister
2005 Jan 17 17:34:02 0 3756 3755 /usr/bin/expr /usr/sadm/bin/smcregister/ : \(/\)/*[^/]*//*$ | /u
2005 Jan 17 17:34:02 0 3757 3754 /usr/bin/sh /usr/bin/basename /usr/sadm/bin/smcregister
2005 Jan 17 17:34:02 0 3757 3754 /usr/bin/expr //usr/sadm/bin/smcregister : \(.*[^/]\)/*$ : .*/\(
2005 Jan 17 17:34:02 0 3754 3753 /bin/ksh /usr/sadm/lib/smc/bin/smcregister library -n SysLogServ
2005 Jan 17 17:34:02 0 3758 3754 /usr/bin/sh /bin/basename /usr/sadm/lib/smc/bin/smcregister
2005 Jan 17 17:34:02 0 3758 3754 /usr/bin/expr //usr/sadm/lib/smc/bin/smcregister : \(.*[^/]\)/*$
2005 Jan 17 17:34:02 0 3759 3754 date -u +%Y%m%d%H%M%S
2005 Jan 17 17:34:02 0 3760 3754 /usr/bin/sh /bin/basename SysLogServiceBean_zh.jar
2005 Jan 17 17:34:02 0 3760 3754 /usr/bin/expr /SysLogServiceBean_zh.jar : \(.*[^/]\)/*$ : .*/\(.
2005 Jan 17 17:34:02 0 3761 3754 sed -e s@^//@/@
2005 Jan 17 17:34:02 0 3763 3754 sed -e s@^//@/@
2005 Jan 17 17:34:02 0 3765 3754 sed -e s@^//@/@
2005 Jan 17 17:34:02 0 3767 3754 sed -e s@^//@/@
2005 Jan 17 17:34:02 0 3769 3754 sed -e s@^//@/@
2005 Jan 17 17:34:02 0 3771 3754 sed -e s@^//@/@
[...]
^C
#
# wc -l execsnoop.out
7742 execsnoop.out
#
|
which shows 7742 processes were executed. Lets try an experiment to get a grip on what effect creating and destroying 7742 processes has on a system. I run a tiny command 7742 times and time it. This is on an UltraSPARC 5 with a 360 MHz CPU,
# perl -e 'print "date\n" x 7742;' | ptime sh > /dev/null real 2:10.397 user 31.131 sys 1:29.769 |
That took over 2 minutes. This suggests that just running 7742 commands is likely to be a considerable factor towards the overall boot time.
Now we will use DTrace again to print a summary of the commands that were executed. This is run during an entire SMC boot,
# dtrace -n 'syscall::exec*:return { @Exec[execname] = count(); }'
^C
stty 1
SUNWhksmc_reg.sh 1
SUNWdlvmg_reg.sh 1
SUNWhdcl_reg.sh 1
SUNWhmga_reg.sh 1
SUNWhsmc_reg.sh 1
SUNWddcl_reg.sh 1
SUNWdmgp_reg.sh 1
[...]
mkdir 30
grep 45
cat 45
chmod 57
cp 79
rm 88
id 186
fgrep 614
smcregister 614
date 615
ksh 616
dirname 801
basename 2029
expr 3127
sed 3200
|
The above report highlights the main type of commands that were executed - shell scripting commands. Ok, if we are already using ksh (some 616 times), then it's suprising that we are using external versions of dirname, basename, expr and sed - the Korn shell has much of this functionality built in. In fact, Perl has all of this functionality built in.
Another DTrace tool to try is iosnoop, which was used to check disk behaviour during the boot,
# ./iosnoop -v > iosnoop.out
#
# head iosnoop.out
STRTIME UID PID D BLOCK SIZE COMM PATHNAME
2005 Jan 17 17:35:00 0 8036 R 15721760 8192 dtrace /lib/sparcv9/libc.so.1
2005 Jan 17 17:35:00 0 8041 R 10267472 8192 sed /lib/libc.so.1
2005 Jan 17 17:35:00 0 8036 R 20896480 8192 dtrace /usr/lib/sparcv9/libdtrace.so.1
2005 Jan 17 17:35:00 0 8036 R 15720624 8192 dtrace /lib/sparcv9/libc.so.1
2005 Jan 17 17:35:00 0 8036 R 15721744 8192 dtrace /lib/sparcv9/libc.so.1
2005 Jan 17 17:35:00 0 1267 R 20628288 8192 snmpd /usr/sfw/lib/sparcv9/libnetsnmp.so.5.0.9
2005 Jan 17 17:35:00 0 3 W 5069290 2048 fsflush /var/sadm/smc/scripts/SUNWcsmc.reg/SUNWs...
2005 Jan 17 17:35:01 0 1943 R 251568 8192 java /usr/j2se/jre/lib/sparc/libawt.so
2005 Jan 17 17:35:01 0 1943 R 251584 8192 java /usr/j2se/jre/lib/sparc/libawt.so
#
# wc -l iosnoop.out
9504 iosnoop.out
#
# grep registry.ser iosnoop.out | wc -l
2228
#
# grep registry.ser iosnoop.out | grep ' W ' | wc -l
2228
#
|
During this sample, 9504 I/O events occured (these are captured at the block device, many will not make it past the file system caches). Of these events, 2228 of them were for that registry.ser file, and all of these were writes (I can assume the reads are absorbed by the page cache).
The following are observations and conclusions from the above DTrace measurements,
These may help explain what work is performed during an SMC boot.
I hope this has been a useful demonstration of using DTrace from the command line, as well as from DTrace scripts.
More DTrace
Back to Brendan Gregg's Homepage