Systems Performance: Enterprise and the Cloud
This page is about the book Systems Performance: Enterprise and the Cloud, published by Prentice Hall (2013). Here I'll describe the book, link to related content, and list errata. Also see the preface, which is included in the sample chapter on the InformIT site, for a detailed description of the book.
Why Systems Performance
Systems performance analysis is an important skill for all computer users, whether you're trying to understand why your laptop is slow, or optimizing the performance of a large-scale production environment. It is the study of both operating system (kernel) and application performance, but can also lead to more specialized performance topics, for specific languages or applications.
There are two general reasons for doing this:
- Improving price/performance: especially for large environments, where even small performance wins can add up to large savings in IT spend.
- Reducing latency outliers: for an environment of any size, occasional high latency I/O can slow application requests, causing unhappy customers.
Other activities of systems performance include benchmarking for the evaluation of systems, capacity planning, bottleneck elimination, and scalability analysis, so that you discover scalability limiters early enough to fix them.
Operating systems based on two different kernels are used as examples in this book: Linux-based (Ubuntu, Fedora, and CentOS) and illumos-based (SmartOS and OmniOS), where illumos is a fork of OpenSolaris. You may be interested in only one of these, but covering both provides additional perspective, helping you better understand the design choices and performance results of each.
This book is primarily for system administrators, support staff, operators, and devops in enterprise and cloud environments. It is also a useful reference for developers, database administrators, and web server administrators who would like to understand operating system and application performance.
Why This Book is Different
While it covers performance tools and the background for understanding them, what makes this book different is the inclusion of many performance methodologies, including those covered quickly in my USENIX 2012 talk. I've been teaching and developing systems performance classes on and off for ten years, and have found methodologies to be crucial for giving students a starting point, and then guiding them through performance activities. The USE Method is one example I developed for this purpose.
Focusing on methodologies is one of the strategies I've used to make this book as timeless as possible, so that you can continue referring to it for the rest of your career. I still refer to The Art of Computer Systems Performance Analysis by Raj Jain (1991), over twenty years after it was written.
Table of Contents
3. Operating Systems
4. Observability Tools
8. File Systems
11. Cloud Computing
13. Case Study
The chapters end on page 635, and then there are appendices and other matter. The preface, included in the sample chapter PDF, has a longer description of the chapters and their structure.
- I launched the book at BayLISA in October 2013: Systems Performance: Author's Introduction.
- Deirdré has reposted numerous Twitter comments in Reviews of "Systems Performance", including the one pictured on the right (over 500 retweets!).
- Deirdré took several Systems Performance Book Videos while I was writing the book.
- Computerworld published an excerpt from the book: 13 Benchmarking Sins.
- Look under Documentation and Videos for my other related content on systems performance and methodologies.
- p89: The term "context switch" should be "mode switch" on this page.
- p106: "processes to run in parallel" → "processes to run concurrently".
- p202: "toward provide these on-chip" → "toward providing these on-chip".
- p215: "check the idle column" → "check the idle columns" (wait I/O and idle).
- p253: Figure 6-16, x-axis scale should be 0-30 s, and delete "5,312 CPUs" from label.
- p390: "Key ZSF" → "Key ZFS".
These corrections have already been sent to the publisher. Also: the first copies that went through the print machinery had an issue with binding and gluing for the first pages, which in the worst case has led to pages falling out; the publisher has been replacing those copies, and checking all future copies. If the book was bought on Amazon, then follow the return procedure. Sorry for the inconvenience.
1st & 2nd Printing:
- p30, 2.3.14: The "hit ratio" formula is correct, but the description above it is not. "cache's hit ratio ... versus the number of times it was not (misses):" should be "... versus the total accesses (hits + misses).".
- p231, 6.6.5: Regarding ps(1), "On Linux, the %CPU column shows the CPU usage during the previous second as the sum across all CPUs." should be "On Linux, the %CPU column shows the average CPU utilization over the lifetime of the process, summed across all CPUs."
- p231, 6.6.6: "... TIME and %CPU columns, which were introduced in the previous section on ps(1)." should now be "... TIME and %CPU columns. For top(1), the %CPU is the average for the update interval."
- p435, 9.6.1: "r/s: read requests issued to the disk device per second" should say "completed" instead of "issued", and the same for "w/s".
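To make the corrected hit ratio and %CPU definitions above concrete, here is a minimal Python sketch. The function names and sample numbers are hypothetical and do not come from the book. It shows only the arithmetic, not how ps(1) or top(1) are implemented: a hit ratio divides hits by total accesses, a lifetime %CPU divides CPU time by the time since the process started, and an interval %CPU divides the CPU time consumed during one update interval by that interval's length.

```python
def hit_ratio(hits: int, misses: int) -> float:
    """Cache hit ratio: hits divided by total accesses (hits + misses)."""
    total = hits + misses
    if total == 0:
        raise ValueError("no cache accesses recorded")
    return hits / total


def lifetime_cpu_percent(cpu_seconds: float, elapsed_seconds: float) -> float:
    """Lifetime average: CPU time consumed divided by time since process start."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return 100.0 * cpu_seconds / elapsed_seconds


def interval_cpu_percent(cpu_start: float, cpu_end: float,
                         interval_seconds: float) -> float:
    """Interval average: CPU time consumed during one update interval."""
    if interval_seconds <= 0:
        raise ValueError("interval must be positive")
    return 100.0 * (cpu_end - cpu_start) / interval_seconds


if __name__ == "__main__":
    # Hypothetical cache: 90 hits and 10 misses out of 100 accesses.
    print(f"hit ratio: {hit_ratio(90, 10):.2f}")

    # Hypothetical process: 100 s of CPU time over a 1000 s lifetime,
    # of which 1 s was consumed during the most recent 2 s interval.
    print(f"lifetime %CPU: {lifetime_cpu_percent(100.0, 1000.0):.1f}")
    print(f"interval %CPU: {interval_cpu_percent(99.0, 100.0, 2.0):.1f}")
```

The same process can report very different values from the two calculations. A mostly idle process that has just become busy shows a low lifetime average but a high interval average, which is why the distinction between ps(1) and top(1) matters.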
Thanks to all the reviewers, and to Deirdré Straughan for editing another one of my books!