Systems Performance: Enterprise and the Cloud
My talk for BayLISA, Oct 2013, launching the Systems Performance book.
Description: "Operating system performance analysis and tuning leads to a better end-user experience and lower costs, especially for cloud computing environments that pay by the operating system instance. This book covers concepts, strategy, tools and tuning for Unix operating systems, with a focus on Linux- and Solaris-based systems. The book covers the latest tools and techniques, including static and dynamic tracing, to get the most out of your systems."
PDF: BayLISA2013_SystemsPerformanceBook.pdf
Keywords (from pdftotext):
slide 1:
BayLISA, Oct 2013
slide 2:
Systems Performance • Analysis of apps to metal. Think LAMP not AMP. • An activity for everyone: from casual to full time. • The basis is the system • The target is everything • All software can cause performance problems
[Diagram: the system stack. Operating System: Applications, System Libraries, System Call Interface, Kernel (VFS, Sockets, File Systems, TCP/UDP, Volume Managers, Block Device Interface, Ethernet, Device Drivers, Scheduler, Virtual Memory, Resource Controls), Firmware, Metal]
slide 3:
Systems Performance: Enterprise and the Cloud • Brendan Gregg (and many others); Prentice Hall, 2013 • 635 pages of chapters, plus appendices, etc • Background, methodologies, examples • Examples from: • Linux (Ubuntu, Fedora, CentOS) • illumos (SmartOS, OmniOS) • Audience: • Sysadmins, developers, everyone • Enterprise and cloud environments
slide 4:
The Author: Brendan Gregg • Currently at Joyent, previously Brendan@Sun, then Oracle • Lead Performance Engineer: debugs perf on SmartOS/Linux/Windows daily, small to large cloud environments, any layer of the software stack, down to firmware and metal. Previously a kernel engineer, performance consultant, trainer. • Written hundreds of published perf tools (too many), including the original iosnoop, iotop, execsnoop, nicstat, psio, etc. • Created visualizations: heat maps for various uses, flame graphs, frequency trails, cloud process graphs • Developed methodologies: USE method, TSA method • Co-authored books: DTrace, Solaris Performance and Tools
slide 5:
Goals • Modern systems performance: including cloud computing, dynamic tracing, visualizations, open source • Accessible to a wide audience • Help you maximize system and application performance • Quickly diagnose performance issues: e.g., outliers • Turn unknown unknowns into known unknowns – actionable • 10+ year shelf life: document concepts and methodology first, with tools and tunables of the day as examples of application
slide 6:
Personal Motivation • The need for a good reference for: • Internal Joyent staff • External customers • IT at large • As a reference for classes • I’ve been teaching professional classes in system administration and performance on and off since 2001 • I’ve learned a lot from teaching students to solve real performance problems, to see what works, what doesn’t • I’ve been using this book already for teaching the Joyent cloud performance class: http://joyent.com/training, next class Nov 18th 2013
slide 7:
Table of Contents • 1. Intro • 2. Methodology • 3. Operating Systems • 4. Observability Tools • 5. Applications • 6. CPUs • 7. Memory • 8. File Systems • 9. Disks • 10. Network • 11. Cloud Computing • 12. Benchmarking • 13. Case Study • Apx.A. USE Linux • Apx.B. USE Solaris • Apx.C. sar Summary • Apx.D. DTrace one-liners • Apx.E. DTrace to SystemTap • Apx.F. Solutions to Selected Ex. • Apx.G. Who's Who • Glossary • Index
slide 8:
Highlights: • Chapter 2 Methodologies: • Many documented for the first time; some created by me • Chapter 3 Operating Systems: • 30 page summary of OS internals • Chapter 6-10: CPUs, Memory, FS, Disks, Network • Background, methodology, tools • Chapter 11: Cloud Computing • Different technologies and their performance • Chapter 12: Benchmarking • For the good of the industry. Please, everyone, read this.
slide 9:
Chapter 2 Methodologies • Documenting the black art of systems performance • Also summarizes concepts, statistics, visualizations
slide 10:
Chapter 3 Operating Systems • The OS crash course you missed at University
slide 11:
Chapter 6-10 Structure • Background • Just enough OS and HW internals • Methodologies • For beginners, casual users, experts • How to start, and steps to proceed • Example Application • Linux, illumos • Tools, screenshots, case studies • Some tunables of the day
slide 12:
Chapter 6-10 Structure • Generic: • Background • Just enough OS and HW internals • Methodologies • For beginners, casual users, experts • How to start, and steps to proceed • Specific: • Example Application • Linux, illumos • Tools, screenshots, case studies • Some tunables of the day
slide 13:
Example: Chapter 6 CPUs • Hardware • Software
slide 14:
Chapter 11 Cloud Computing • OS Virtualization • HW Virtualization • Observability • Performance • Resource controls
slide 15:
Modern Systems Performance • Comparing 1990’s to 2010’s
slide 16:
1990’s Systems Performance • Proprietary Unix, closed source, static tools
$ vmstat 1
 kthr      memory            page            disk          faults      cpu
 r b w    swap   free re mf pi po fr de sr  cd cd s0 s5   in   sy   cs us sy id
 0 0 0 8475356 565176  2  8  0  0  0  0  1   0  0 -0 13  378  101  142  0  0 99
 1 0 0 7983772 119164  0  0  0  0  0  0  0 224  0  0  0 1175 5654 1196  1 15 84
 0 0 0 8046208 181600  0  0  0  0  0  0  0 322  0  0  0 1473 6931 1360  1  7 92
[...]
• Limited metrics and documentation • Some perf issues could not be solved • Analysis methodology constrained by tools • Perf experts used inference and experimentation • Literature is still around
slide 17:
2010’s Systems Performance • Open source (the norm) • Ultimate documentation • Dynamic tracing • Observe everything • Visualizations • Comprehend many metrics • Cloud computing • Resource controls can be the bottleneck! • Methodologies • Where to begin, and steps to root cause
slide 18:
1990’s Performance Visualizations • Text-based and line graphs
$ iostat -x 1
[Output: "extended device statistics" table; columns: device, r/s, w/s, kr/s, kw/s, wait, actv, svc_t; rows: sd0, sd5, sd12, sd12, sd13, sd14, sd15, sd16, nfs6, [...]]
slide 19:
2010’s Performance Visualizations • Utilization and latency heat maps, flame graphs
slide 20:
Modern Performance Analysis Tools • Traditional tools • Plus dynamic tracing to fill in gaps
slide 21:
Performance Analysis Tools: Linux
[Diagram: tools annotated on the system stack. Operating System: Applications (DBs, all server types, ...), System Libraries, System Call Interface, VFS, Sockets, File Systems, TCP/UDP, Volume Managers, Block Device Interface, Ethernet, Scheduler, Virtual Memory, Device Drivers. Hardware: CPU, CPU Interconnect, Memory Bus, DRAM, I/O Bus, I/O Bridge, Expander Interconnect, I/O Controller, Disk, Swap, Network Controller, Interface Transports, Port]
Tools shown: strace, netstat, perf, pidstat, mpstat, dtrace, stap, lttng, ktap, top, ps, vmstat, slabtop, free, iostat, iotop, blktrace, tcpdump, nicstat, swapon, ping, traceroute
Various: sar, /proc
slide 22:
Performance Analysis Tools: illumos
[Diagram: tools annotated on the system stack. Operating System: Applications (DBs, all server types, ...), System Libraries, System Call Interface, VFS, Sockets, File Systems, TCP/UDP, Volume Managers, Block Device Interface, Ethernet, Scheduler, Virtual Memory, Device Drivers. Hardware: CPU, CPU Interconnect, Memory Bus, DRAM, I/O Bus, I/O Bridge, Expander Interconnect, I/O Controller, Disk, Swap, Network Controller, Interface Transports, Port]
Tools shown: netstat, plockstat, lockstat, mpstat, truss, kstat, dtrace, prstat, vmstat, cpustat, cputrack, iostat, snoop, intrstat, nicstat, swap, ping, traceroute
Various: sar, kstat
slide 23:
Dynamic Tracing: DTrace • Example DTrace scripts from the DTraceToolkit, DTrace book, ...
[Diagram: scripts annotated on the system stack: Applications (DBs, all server types, ...), System Libraries, System Call Interface, VFS, Sockets, File Systems, TCP/UDP, Volume Managers, Block Device Interface, Ethernet, Device Drivers, Scheduler, Virtual Memory]
Services: cifs*.d, iscsi*.d, nfsv3*.d, nfsv4*.d, ssh*.d, httpd*.d
Language Providers: node*.d, erlang*.d, j*.d, js*.d, php*.d, pl*.d, py*.d, rb*.d, sh*.d
Databases: mysql*.d, postgres*.d, redis*.d, riak*.d
Scheduler: priclass.d, pridist.d, cv_wakeup_slow.d, displat.d, capslat.d
Virtual Memory: minfbypid.d, pgpginbypid.d
Other scripts shown:
hotuser, umutexmax.d, lib*.d
opensnoop, statsnoop, errinfo, dtruss, rwtop, rwsnoop, mmap.d, kill.d, shellsnoop, zonecalls.d, weblatency.d, fddist
fswho.d, fssnoop.d, sollife.d, solvfssnoop.d, dnlcsnoop.d, zfsslower.d, ziowait.d, ziostacks.d, spasync.d, metaslab_free.d
iosnoop, iotop, disklatency.d, satacmds.d, satalatency.d, scsicmds.d, scsilatency.d, sdretry.d, sdqueue.d, ide*.d, mpt*.d
macops.d, ixgbecheck.d, ngesnoop.d, ngelink.d
soconnect.d, soaccept.d, soclose.d, socketio.d, so1stbyte.d, sotop.d, soerror.d
ipstat.d, ipio.d, ipproto.d, ipfbtsnoop.d, ipdropper.d
tcpstat.d, tcpaccept.d, tcpconnect.d, tcpioshort.d, tcpio.d, tcpbytes.d, tcpsize.d, tcpnmap.d, tcpconnlat.d, tcp1stbyte.d, tcpfbtwatch.d, tcpsnoop.d, tcpconnreqmaxq.d, tcprefused.d, tcpretranshosts.d, tcpretranssnoop.d, tcpsackretrans.d, tcpslowstart.d, tcptimewait.d
udpstat.d, udpio.d, icmpstat.d, icmpsnoop.d
slide 24:
Too Many Tools • It’s not really about the tools • ... those previous diagrams aren’t even in the book • It’s about what you need to accomplish, and then finding the tools to accomplish it • This is documented as methodologies • Tools are then used as examples
slide 25:
Modern Performance Methodologies • Workload characterization • USE Method • TSA Method • Drill-down Analysis • Latency Analysis • Event Tracing • Static performance tuning • ... • Covered in Chapter 2 and later chapters
slide 26:
Systems Performance • Really understand how systems work • New observability, visualizations, methodologies • Understand the challenges of cloud computing • Brendan Gregg: • http://www.brendangregg.com • http://dtrace.org/blogs/brendan • twitter: @brendangregg
Sample Chapter: http://dtrace.org/blogs/brendan/2013/06/21/systems-performance-enterprise-and-the-cloud/
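
Illustration of the USE Method listed on slide 25: for every resource, check utilization, saturation, and errors. The following is only a minimal sketch in Python, not code from the book or the talk. The `ResourceSample` type, the `use_check` function, the 70% utilization threshold, and the sample values are all hypothetical, chosen to show the shape of the checklist.

```python
from dataclasses import dataclass


@dataclass
class ResourceSample:
    """One interval's worth of metrics for a single resource (hypothetical type)."""
    name: str
    utilization_pct: float  # percent of the interval the resource was busy
    saturation: float       # queued or waiting work, e.g. run-queue length
    errors: int             # error events seen during the interval


def use_check(samples, util_threshold=70.0):
    """Apply the USE checklist to each resource.

    Returns a list of (resource name, finding) pairs. The threshold is an
    illustrative assumption, not a value taken from the talk.
    """
    findings = []
    for s in samples:
        if s.errors > 0:
            findings.append((s.name, "errors"))
        if s.saturation > 0:
            findings.append((s.name, "saturation"))
        if s.utilization_pct >= util_threshold:
            findings.append((s.name, "utilization"))
    return findings


# Made-up sample values, for illustration only.
samples = [
    ResourceSample("cpu", utilization_pct=95.0, saturation=3, errors=0),
    ResourceSample("disk", utilization_pct=20.0, saturation=0, errors=2),
    ResourceSample("net", utilization_pct=10.0, saturation=0, errors=0),
]

for name, finding in use_check(samples):
    print(f"{name}: investigate {finding}")
```

In practice the metrics for each resource would come from tools such as those on the tools slides (for example vmstat, iostat, or kstat). The sketch leaves collection out and only shows how the three checks are applied per resource.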