BayLISA2013_SystemsPerformanceBook.pdf

Systems Performance: Enterprise and the Cloud

My talk for BayLISA, Oct 2013, launching the Systems Performance book.

Description: "Operating system performance analysis and tuning leads to a better end-user experience and lower costs, especially for cloud computing environments that pay by the operating system instance. This book covers concepts, strategy, tools and tuning for Unix operating systems, with a focus on Linux- and Solaris-based systems. The book covers the latest tools and techniques, including static and dynamic tracing, to get the most out of your systems."


PDF: BayLISA2013_SystemsPerformanceBook.pdf

Keywords (from pdftotext):

slide 1:

BayLISA, Oct 2013

slide 2:

Systems Performance
• Analysis of apps to metal. Think LAMP not AMP.
• An activity for everyone: from casual to full time.
• The basis is the system
• The target is everything
• All software can cause performance problems
Diagram (software stack) labels: Operating System, Applications, System Libraries, System Call Interface, Kernel, VFS, Sockets, File Systems, TCP/UDP, Volume Managers, Block Device Interface, Ethernet, Device Drivers, Scheduler, Virtual Memory, Resource Controls, Firmware, Metal

slide 3:

Systems Performance: Enterprise and the Cloud
• Brendan Gregg (and many others); Prentice Hall, 2013
• 635 pages of chapters, plus appendices, etc
• Background, methodologies, examples
• Examples from:
• Linux (Ubuntu, Fedora, CentOS)
• illumos (SmartOS, OmniOS)
• Audience:
• Sysadmins, developers, everyone
• Enterprise and cloud environments

slide 4:

The Author: Brendan Gregg
• Currently at Joyent, previously Brendan@Sun, then Oracle
• Lead Performance Engineer: debugs perf on SmartOS/Linux/
Windows daily, small to large cloud environments, any layer of
the software stack, down to firmware and metal. Previously a
kernel engineer, performance consultant, trainer.
• Written hundreds of published perf tools (too many), including
the original iosnoop, iotop, execsnoop, nicstat, psio, etc.
• Created visualizations: heat maps for various uses, flame
graphs, frequency trails, cloud process graphs
• Developed methodologies: USE method, TSA method
• Co-authored books: DTrace, Solaris Performance and Tools

slide 5:

Goals
• Modern systems performance: including cloud computing,
dynamic tracing, visualizations, open source
• Accessible to a wide audience
• Help you maximize system and application performance
• Quickly diagnose performance issues: eg, outliers
• Turn unknown unknowns into known unknowns – actionable
• 10+ year shelf life: document concepts and methodology first,
with tools and tunables of the day as examples of application

slide 6:

Personal Motivation
• The need for a good reference for:
• Internal Joyent staff
• External customers
• IT at large
• As a reference for classes
• I’ve been teaching professional classes in system
administration and performance on and off since 2001
• I’ve learned a lot from teaching students to solve real
performance problems, to see what works, what doesn’t
• I’ve been using this book already for teaching the Joyent
cloud performance class: http://joyent.com/training,
next class Nov 18th 2013

slide 7:

Table of Contents
• 1. Intro
• 2. Methodology
• 3. Operating Systems
• 4. Observability Tools
• 5. Applications
• 6. CPUs
• 7. Memory
• 8. File Systems
• 9. Disks
• 10. Network
• 11. Cloud Computing
• 12. Benchmarking
• 13. Case Study
• Apx.A. USE Linux
• Apx.B. USE Solaris
• Apx.C. sar Summary
• Apx.D. DTrace one-liners
• Apx.E. DTrace to SystemTap
• Apx.F. Solutions to Selected Ex.
• Apx.G. Who's Who
• Glossary
• Index

slide 8:

Highlights:
• Chapter 2 Methodologies:
• Many documented for the first time; some created by me
• Chapter 3 Operating Systems:
• 30 page summary of OS internals
• Chapter 6-10: CPUs, Memory, FS, Disks, Network
• Background, methodology, tools
• Chapter 11: Cloud Computing
• Different technologies and their performance
• Chapter 12: Benchmarking
• For the good of the industry. Please, everyone, read this.

slide 9:

Chapter 2 Methodologies
• Documenting the black art
of systems performance
• Also summarizes concepts,
statistics, visualizations

slide 10:

Chapter 3 Operating Systems
• The OS crash course you missed at University

slide 11:

Chapter 6-10 Structure
• Background
• Just enough OS and HW internals
• Methodologies
• For beginners, casual users, experts
• How to start, and steps to proceed
• Example Application
• Linux, illumos
• Tools, screenshots, case studies
• Some tunables of the day

slide 12:

Chapter 6-10 Structure
Generic:
• Background
• Just enough OS and HW internals
• Methodologies
• For beginners, casual users, experts
• How to start, and steps to proceed
Specific:
• Example Application
• Linux, illumos
• Tools, screenshots, case studies
• Some tunables of the day

slide 13:

Example: Chapter 6 CPUs
Hardware
Software

slide 14:

Chapter 11 Cloud Computing
• OS Virtualization
• HW Virtualization
• Observability
• Performance
• Resource controls

slide 15:

Modern Systems Performance
• Comparing 1990’s to 2010’s

slide 16:

1990’s Systems Performance
* Proprietary Unix, closed source, static tools
$ vmstat 1
 kthr     memory              page              disk         faults        cpu
 r b w    swap   free re  mf pi po fr de sr  cd cd s0 s5   in   sy   cs us sy id
 0 0 0 8475356 565176  2   8  0  0  0  0  1   0  0 -0 13  378  101  142  0  0 99
 1 0 0 7983772 119164  0   0  0  0  0  0  0 224  0  0  0 1175 5654 1196  1 15 84
 0 0 0 8046208 181600  0   0  0  0  0  0  0 322  0  0  0 1473 6931 1360  1  7 92
[...]
* Limited metrics and documentation
* Some perf issues could not be solved
* Analysis methodology constrained by tools
* Perf experts used inference and experimentation
* Literature is still around

slide 17:

2010’s Systems Performance
• Open source (the norm)
• Ultimate documentation
• Dynamic tracing
• Observe everything
• Visualizations
• Comprehend many metrics
• Cloud computing
• Resource controls can be the bottleneck!
• Methodologies
• Where to begin, and steps to root cause

slide 18:

1990’s Performance Visualizations
Text-based and line graphs
$ iostat -x 1
                 extended device statistics
device    r/s    w/s   kr/s   kw/s wait actv  svc_t
sd0                            3.9  0.0  0.0
sd5                            0.0  0.0  0.0
sd12                           1.1  0.0  0.0
sd12                           0.0  0.0  0.0
sd13                           0.0  0.0  0.0
sd14                           0.0  0.0  0.0
sd15                           0.0  0.0  0.0
sd16                           0.0  0.0  0.0
nfs6                           0.0  0.0  0.0
[...]

slide 19:

2010’s Performance Visualizations
• Utilization and latency heat maps, flame graphs
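A latency heat map is essentially a two-dimensional histogram: time on one axis, latency on the other, and the count of events in each cell shown as color depth. The following is a minimal sketch of that bucketing in Python, not code from the book or the talk. The function names, the bucket sizes, and the ASCII shading are all assumptions made for illustration.

```python
from collections import Counter


def latency_heat_map(events, time_step_s=1.0, latency_step_ms=1.0):
    """Bucket (timestamp_s, latency_ms) samples into a 2D histogram.

    Returns a Counter keyed by (time_bucket, latency_bucket).
    Bucket sizes are illustrative defaults, not recommended values.
    """
    counts = Counter()
    for timestamp_s, latency_ms in events:
        col = int(timestamp_s // time_step_s)    # x axis: time
        row = int(latency_ms // latency_step_ms)  # y axis: latency
        counts[(col, row)] += 1
    return counts


def render(counts, shades=" .:*#"):
    """Draw the histogram as text: darker characters mean more events.

    Highest latency bucket is printed first, so slow outliers appear
    at the top, as in a typical heat map.
    """
    if not counts:
        return ""
    max_col = max(col for col, _ in counts)
    max_row = max(row for _, row in counts)
    lines = []
    for row in range(max_row, -1, -1):
        cells = (
            shades[min(counts.get((col, row), 0), len(shades) - 1)]
            for col in range(max_col + 1)
        )
        lines.append("".join(cells))
    return "\n".join(lines)
```

Feeding it per-I/O samples (for example, completion time and latency for each disk I/O) and printing `render(latency_heat_map(samples))` gives a rough text view in which isolated marks far above the main band are the latency outliers that a per-second average would hide.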

slide 20:

Modern Performance Analysis Tools
• Traditional tools
• Plus dynamic tracing to fill in gaps

slide 21:

Performance Analysis Tools: Linux
Diagram: Linux tools placed around the software and hardware stack.
Tools: strace, netstat, perf, pidstat, mpstat, dtrace, stap, lttng, ktap, top, ps, vmstat, slabtop, free, iostat, iotop, blktrace, tcpdump, nicstat, swapon, ping, traceroute
Various: sar, /proc
Operating System: Applications (DBs, all server types, ...), System Libraries, System Call Interface, VFS, Sockets, File Systems, TCP/UDP, Volume Managers, Block Device Interface, Ethernet, Scheduler, Virtual Memory, Device Drivers
Hardware: CPU, CPU Interconnect, Memory Bus, DRAM, I/O Bus, I/O Bridge, Expander Interconnect, I/O Controller, Disk, Swap, Network Controller, Interface Transports, Port

slide 22:

Performance Analysis Tools: illumos
Diagram: illumos tools placed around the software and hardware stack.
Tools: netstat, plockstat, lockstat, mpstat, truss, kstat, dtrace, prstat, vmstat, cpustat, cputrack, iostat, snoop, intrstat, nicstat, swap, ping, traceroute
Various: sar, kstat
Operating System: Applications (DBs, all server types, ...), System Libraries, System Call Interface, VFS, Sockets, File Systems, TCP/UDP, Volume Managers, Block Device Interface, Ethernet, Scheduler, Virtual Memory, Device Drivers
Hardware: CPU, CPU Interconnect, Memory Bus, DRAM, I/O Bus, I/O Bridge, Expander Interconnect, I/O Controller, Disk, Swap, Network Controller, Interface Transports, Port

slide 23:

Dynamic Tracing: DTrace
• Example DTrace scripts from the DTraceToolkit, DTrace book, ...
Services: cifs*.d, iscsi*.d, nfsv3*.d, nfsv4*.d, ssh*.d, httpd*.d
Language Providers: node*.d, erlang*.d, j*.d, js*.d, php*.d, pl*.d, py*.d, rb*.d, sh*.d
Databases: mysql*.d, postgres*.d, redis*.d, riak*.d
Other scripts shown around the stack diagram:
hotuser, umutexmax.d, lib*.d
opensnoop, statsnoop, errinfo, dtruss, rwtop, rwsnoop, mmap.d, kill.d, shellsnoop, zonecalls.d, weblatency.d, fddist
fswho.d, fssnoop.d, sollife.d, solvfssnoop.d, dnlcsnoop.d, zfsslower.d, ziowait.d, ziostacks.d, spasync.d, metaslab_free.d
iosnoop, iotop, disklatency.d, satacmds.d, satalatency.d, scsicmds.d, scsilatency.d, sdretry.d, sdqueue.d, ide*.d, mpt*.d
priclass.d, pridist.d, cv_wakeup_slow.d, displat.d, capslat.d
minfbypid.d, pgpginbypid.d
macops.d, ixgbecheck.d, ngesnoop.d, ngelink.d
soconnect.d, soaccept.d, soclose.d, socketio.d, so1stbyte.d, sotop.d, soerror.d
ipstat.d, ipio.d, ipproto.d, ipfbtsnoop.d, ipdropper.d
tcpstat.d, tcpaccept.d, tcpconnect.d, tcpioshort.d, tcpio.d, tcpbytes.d, tcpsize.d, tcpnmap.d, tcpconnlat.d, tcp1stbyte.d, tcpfbtwatch.d, tcpsnoop.d, tcpconnreqmaxq.d, tcprefused.d, tcpretranshosts.d, tcpretranssnoop.d, tcpsackretrans.d, tcpslowstart.d, tcptimewait.d
udpstat.d, udpio.d, icmpstat.d, icmpsnoop.d
Stack layers in the diagram: Applications (DBs, all server types, ...), System Libraries, System Call Interface, VFS, Sockets, File Systems, TCP/UDP, Volume Managers, Block Device Interface, Ethernet, Device Drivers, Scheduler, Virtual Memory

slide 24:

Too Many Tools
• It’s not really about the tools
• ... those previous diagrams aren’t even in the book
• It’s about what you need to accomplish, and then finding the
tools to answer them
• This is documented as
methodologies
• Tools are then used as
examples

slide 25:

Modern Performance Methodologies
• Workload characterization
• USE Method
• TSA Method
• Drill-down Analysis
• Latency Analysis
• Event Tracing
• Static performance
tuning
• ...
• Covered in Chapter 2
and later chapters
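As an illustration of what a methodology looks like once written down, the USE Method can be summarized as: for every resource, check utilization, saturation, and errors. The sketch below expresses that checklist in Python. The `ResourceMetrics` type, the `use_check` function, the sample field meanings, and the 80% utilization threshold are hypothetical choices for this example and are not taken from the book. Collecting the metrics themselves is left to whatever tools the system provides.

```python
from dataclasses import dataclass


@dataclass
class ResourceMetrics:
    """One resource's metrics, gathered by any tool (hypothetical shape)."""
    name: str
    utilization_pct: float  # how busy the resource is, as a percentage
    saturation: float       # queued or waiting work, e.g. run-queue length
    errors: int             # count of error events


def use_check(resources, util_threshold_pct=80.0):
    """Apply the USE checklist to every resource.

    Returns a list of (resource, metric, value) findings to investigate.
    The utilization threshold is an assumed example value.
    """
    findings = []
    for res in resources:
        if res.errors > 0:
            findings.append((res.name, "errors", res.errors))
        if res.saturation > 0:
            findings.append((res.name, "saturation", res.saturation))
        if res.utilization_pct >= util_threshold_pct:
            findings.append((res.name, "utilization", res.utilization_pct))
    return findings
```

The point of the loop is completeness: every resource gets the same three questions, so nothing is skipped just because no familiar tool happens to report on it.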

slide 26:

Systems Performance
• Really understand how systems work
• New observability, visualizations, methodologies
• Understand the challenges of
cloud computing
• Brendan Gregg:
• http://www.brendangregg.com
• http://dtrace.org/blogs/brendan
• twitter: @brendangregg
Sample Chapter
http://dtrace.org/blogs/brendan/2013/06/21/systems-performance-enterprise-and-the-cloud/