Systems Performance: Enterprise and the Cloud

My talk for BayLISA, Oct 2013, launching the Systems Performance book.

Description: "Operating system performance analysis and tuning leads to a better end-user experience and lower costs, especially for cloud computing environments that pay by the operating system instance. This book covers concepts, strategy, tools and tuning for Unix operating systems, with a focus on Linux- and Solaris-based systems. The book covers the latest tools and techniques, including static and dynamic tracing, to get the most out of your systems."


PDF: BayLISA2013_SystemsPerformanceBook.pdf

Keywords (from pdftotext):

slide 1:
    BayLISA, Oct 2013
    
slide 2:
    Systems Performance
    • Analysis of apps to metal. Think LAMP not AMP.
    • An activity for everyone: from casual to full time.
    • The basis is the system
    • The target is everything
    • All software can cause performance problems
    Diagram labels: Operating System, Applications, System Libraries, System Call Interface, Kernel, VFS, Sockets, File Systems, TCP/UDP, Volume Managers, Block Device Interface, Ethernet, Device Drivers, Resource Controls, Scheduler, Virtual Memory, Firmware, Metal
    
slide 3:
    Systems Performance: Enterprise and the Cloud
    • Brendan Gregg (and many others); Prentice Hall, 2013
    • 635 pages of chapters, plus appendices, etc
    • Background, methodologies, examples
    • Examples from:
    • Linux (Ubuntu, Fedora, CentOS)
    • illumos (SmartOS, OmniOS)
    • Audience:
    • Sysadmins, developers, everyone
    • Enterprise and cloud environments
    
slide 4:
    The Author: Brendan Gregg
    • Currently at Joyent, previously Brendan@Sun, then Oracle
    • Lead Performance Engineer: debugs perf on SmartOS/Linux/
    Windows daily, small to large cloud environments, any layer of
    the software stack, down to firmware and metal. Previously a
    kernel engineer, performance consultant, trainer.
    • Written hundreds of published perf tools (too many), including
    the original iosnoop, iotop, execsnoop, nicstat, psio, etc.
    • Created visualizations: heat maps for various uses, flame
    graphs, frequency trails, cloud process graphs
    • Developed methodologies: USE method, TSA method
    • Co-authored books: DTrace, Solaris Performance and Tools
    
slide 5:
    Goals
    • Modern systems performance: including cloud computing,
    dynamic tracing, visualizations, open source
    • Accessible to a wide audience
    • Help you maximize system and application performance
    • Quickly diagnose performance issues: e.g., outliers
    • Turn unknown unknowns into known unknowns – actionable
    • 10+ year shelf life: document concepts and methodology first,
    with tools and tunables of the day as examples of application
    
slide 6:
    Personal Motivation
    • The need for a good reference for:
    • Internal Joyent staff
    • External customers
    • IT at large
    • As a reference for classes
    • I’ve been teaching professional classes in system
    administration and performance on and off since 2001
    • I’ve learned a lot from teaching students to solve real
    performance problems, to see what works, what doesn’t
    • I’ve been using this book already for teaching the Joyent
    cloud performance class: http://joyent.com/training,
    next class Nov 18th 2013
    
slide 7:
    Table of Contents
    • 1. Intro
    • 2. Methodology
    • 3. Operating Systems
    • 4. Observability Tools
    • 5. Applications
    • 6. CPUs
    • 7. Memory
    • 8. File Systems
    • 9. Disks
    • 10. Network
    • 11. Cloud Computing
    • 12. Benchmarking
    • 13. Case Study
    • Apx.A. USE Linux
    • Apx.B. USE Solaris
    • Apx.C. sar Summary
    • Apx.D. DTrace one-liners
    • Apx.E. DTrace to SystemTap
    • Apx.F. Solutions to Selected Ex.
    • Apx.G. Who's Who
    • Glossary
    • Index
    
slide 8:
    Highlights:
    • Chapter 2 Methodologies:
    • Many documented for the first time; some created by me
    • Chapter 3 Operating Systems:
    • 30 page summary of OS internals
    • Chapter 6-10: CPUs, Memory, FS, Disks, Network
    • Background, methodology, tools
    • Chapter 11: Cloud Computing
    • Different technologies and their performance
    • Chapter 12: Benchmarking
    • For the good of the industry. Please, everyone, read this.
    
slide 9:
    Chapter 2 Methodologies
    • Documenting the black art
    of systems performance
    • Also summarizes concepts,
    statistics, visualizations
    
slide 10:
    Chapter 3 Operating Systems
    • The OS crash course you missed at University
    
slide 11:
    Chapter 6-10 Structure
    • Background
    • Just enough OS and HW internals
    • Methodologies
    • For beginners, casual users, experts
    • How to start, and steps to proceed
    • Example Application
    • Linux, illumos
    • Tools, screenshots, case studies
    • Some tunables of the day
    
slide 12:
    Chapter 6-10 Structure
    • Background
    • Just enough OS and HW internals
    Generic
    • Methodologies
    • For beginners, casual users, experts
    • How to start, and steps to proceed
    • Example Application
    • Linux, illumos
    • Tools, screenshots, case studies
    • Some tunables of the day
    Specific
    
slide 13:
    Example: Chapter 6 CPUs
    Hardware
    Software
    
slide 14:
    Chapter 11 Cloud Computing
    • OS Virtualization
    • HW Virtualization
    • Observability
    • Performance
    • Resource controls
    
slide 15:
    Modern Systems Performance
    • Comparing 1990’s to 2010’s
    
slide 16:
    1990’s Systems Performance
    * Proprietary Unix, closed source, static tools
    $ vmstat 1
    kthr     memory              page             disk         faults       cpu
    r b w    swap   free re mf pi po fr de sr  cd cd s0 s5   in   sy   cs us sy id
    0 0 0 8475356 565176  2  8  0  0  0  0  1   0  0 -0 13  378  101  142  0  0 99
    1 0 0 7983772 119164  0  0  0  0  0  0  0 224  0  0  0 1175 5654 1196  1 15 84
    0 0 0 8046208 181600  0  0  0  0  0  0  0 322  0  0  0 1473 6931 1360  1  7 92
    [...]
    * Limited metrics and documentation
    * Some perf issues could not be solved
    * Analysis methodology constrained by tools
    * Perf experts used inference and experimentation
    * Literature is still around
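
As a small illustration of reading this kind of static tool output: in the vmstat sample above, the last three fields of each data row are us, sy and id (user, system and idle CPU percent), so CPU busy time is us + sy. The Python below is only a sketch under that assumption. The function name `cpu_busy` is hypothetical, and the rows are the two halves of each sample line joined back together.

```python
def cpu_busy(vmstat_row: str) -> int:
    """Return user + system CPU percent from one vmstat data row.

    Assumes the last three whitespace-separated fields are us, sy, id,
    as in the Solaris-style sample shown on the slide.
    """
    fields = vmstat_row.split()
    us, sy, _idle = (int(f) for f in fields[-3:])
    return us + sy


# Data rows from the slide, with the split halves rejoined.
rows = [
    "0 0 0 8475356 565176 2 8 0 0 0 0 1 0 0 -0 13 378 101 142 0 0 99",
    "1 0 0 7983772 119164 0 0 0 0 0 0 0 224 0 0 0 1175 5654 1196 1 15 84",
    "0 0 0 8046208 181600 0 0 0 0 0 0 0 322 0 0 0 1473 6931 1360 1 7 92",
]
busy = [cpu_busy(r) for r in rows]
print(busy)
```

This says how busy the CPUs were in each interval, but not why, which is the gap the slide attributes to static tools.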
    
slide 17:
    2010’s Systems Performance
    • Open source (the norm)
    • Ultimate documentation
    • Dynamic tracing
    • Observe everything
    • Visualizations
    • Comprehend many metrics
    • Cloud computing
    • Resource controls can be the bottleneck!
    • Methodologies
    • Where to begin, and steps to root cause
    
slide 18:
    1990’s Performance Visualizations
    Text-based and line graphs
    $ iostat -x 1
                   extended device statistics
    device    r/s    w/s   kr/s   kw/s wait actv  svc_t
    sd0                             3.9  0.0  0.0
    sd5                             0.0  0.0  0.0
    sd12                            1.1  0.0  0.0
    sd12                            0.0  0.0  0.0
    sd13                            0.0  0.0  0.0
    sd14                            0.0  0.0  0.0
    sd15                            0.0  0.0  0.0
    sd16                            0.0  0.0  0.0
    nfs6                            0.0  0.0  0.0
    [...]
    
slide 19:
    2010’s Performance Visualizations
    • Utilization and latency heat maps, flame graphs
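
A latency heat map places time on the x-axis and latency on the y-axis, and colors each cell by how many events fall into it. The Python below is a minimal sketch of the bucketing step only, not the rendering. The function name `heatmap_counts`, the bucket sizes and the sample events are all made up for illustration.

```python
from collections import Counter


def heatmap_counts(events, time_step, latency_step):
    """Count events per (time bucket, latency bucket) cell.

    events is an iterable of (timestamp, latency) pairs in the same
    units as time_step and latency_step.
    """
    cells = Counter()
    for t, latency in events:
        cell = (int(t // time_step), int(latency // latency_step))
        cells[cell] += 1
    return cells


# Hypothetical (seconds, milliseconds) samples.
events = [(0.1, 0.4), (0.5, 0.6), (0.9, 3.2), (1.2, 0.5), (1.7, 0.7), (1.8, 9.5)]
cells = heatmap_counts(events, time_step=1.0, latency_step=1.0)
for cell, count in sorted(cells.items()):
    print(cell, count)
```

Cells with a low count at high latency are the outliers, which an average over the same interval would hide.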
    
slide 20:
    Modern Performance Analysis Tools
    • Traditional tools
    • Plus dynamic tracing to fill in gaps
    
slide 21:
    Performance Analysis Tools: Linux
    Tools: strace, netstat, perf, pidstat, mpstat, dtrace, stap, lttng, ktap, top, ps, vmstat, slabtop, free, iostat, iotop, blktrace, tcpdump, nicstat, swapon, ping, traceroute
    Various: sar, /proc
    Operating System: Applications (DBs, all server types, ...), System Libraries, System Call Interface, VFS, Sockets, File Systems, TCP/UDP, Volume Managers, Block Device Interface, Ethernet, Scheduler, Virtual Memory, Device Drivers
    Hardware: CPU, CPU Interconnect, Memory Bus, DRAM, I/O Bus, I/O Bridge, Expander Interconnect, I/O Controller, Disk, Swap, Network Controller, Interface Transports, Port
    
slide 22:
    Performance Analysis Tools: illumos
    Tools: netstat, plockstat, lockstat, mpstat, truss, kstat, dtrace, prstat, vmstat, cpustat, cputrack, iostat, snoop, intrstat, nicstat, swap, ping, traceroute
    Various: sar, kstat
    Operating System: Applications (DBs, all server types, ...), System Libraries, System Call Interface, VFS, Sockets, File Systems, TCP/UDP, Volume Managers, Block Device Interface, Ethernet, Scheduler, Virtual Memory, Device Drivers
    Hardware: CPU, CPU Interconnect, Memory Bus, DRAM, I/O Bus, I/O Bridge, Expander Interconnect, I/O Controller, Disk, Swap, Network Controller, Interface Transports, Port
    
slide 23:
    Dynamic Tracing: DTrace
    • Example DTrace scripts from the DTraceToolkit, DTrace book, ...
    Services: cifs*.d, iscsi*.d, nfsv3*.d, nfsv4*.d, ssh*.d, httpd*.d
    Language Providers: node*.d, erlang*.d, j*.d, js*.d, php*.d, pl*.d, py*.d, rb*.d, sh*.d
    Databases: mysql*.d, postgres*.d, redis*.d, riak*.d
    hotuser, umutexmax.d, lib*.d
    opensnoop, statsnoop, errinfo, dtruss, rwtop, rwsnoop, mmap.d, kill.d, shellsnoop, zonecalls.d, weblatency.d, fddist
    fswho.d, fssnoop.d, sollife.d, solvfssnoop.d, dnlcsnoop.d
    zfsslower.d, ziowait.d, ziostacks.d, spasync.d, metaslab_free.d
    iosnoop, iotop, disklatency.d, satacmds.d, satalatency.d, scsicmds.d, scsilatency.d, sdretry.d, sdqueue.d
    ide*.d, mpt*.d
    priclass.d, pridist.d, cv_wakeup_slow.d, displat.d, capslat.d
    minfbypid.d, pgpginbypid.d
    macops.d, ixgbecheck.d, ngesnoop.d, ngelink.d
    soconnect.d, soaccept.d, soclose.d, socketio.d, so1stbyte.d
    sotop.d, soerror.d, ipstat.d, ipio.d, ipproto.d, ipfbtsnoop.d
    ipdropper.d, tcpstat.d, tcpaccept.d, tcpconnect.d, tcpioshort.d
    tcpio.d, tcpbytes.d, tcpsize.d, tcpnmap.d, tcpconnlat.d, tcp1stbyte.d
    tcpfbtwatch.d, tcpsnoop.d, tcpconnreqmaxq.d, tcprefused.d
    tcpretranshosts.d, tcpretranssnoop.d, tcpsackretrans.d, tcpslowstart.d
    tcptimewait.d, udpstat.d, udpio.d, icmpstat.d, icmpsnoop.d
    Diagram labels: Applications (DBs, all server types, ...), System Libraries, System Call Interface, VFS, Sockets, File Systems, TCP/UDP, Volume Managers, Block Device Interface, Ethernet, Device Drivers, Scheduler, Virtual Memory
    
slide 24:
    Too Many Tools
    • It’s not really about the tools
    • ... those previous diagrams aren’t even in the book
    • It’s about what you need to accomplish, and then finding the
    tools to answer them
    • This is documented as
    methodologies
    • Tools are then used as
    examples
    
slide 25:
    Modern Performance Methodologies
    • Workload characterization
    • USE Method
    • TSA Method
    • Drill-down Analysis
    • Latency Analysis
    • Event Tracing
    • Static performance
    tuning
    • ...
    • Covered in Chapter 2
    and later chapters
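
As one concrete example from this list, the USE Method asks, for every resource, about its utilization, saturation and errors. The Python below is a rough sketch of turning that into a checklist. The resource list is an illustrative subset chosen here, not the book's list, and `use_checklist` is a hypothetical helper name.

```python
from itertools import product

# Illustrative subset of resources; a real checklist would cover more.
RESOURCES = ["CPUs", "memory", "disks", "network interfaces"]
# The three checks that give the USE Method its name.
METRICS = ["utilization", "saturation", "errors"]


def use_checklist(resources=RESOURCES):
    """Return one 'resource: metric' item per resource and USE metric."""
    return [f"{resource}: {metric}" for resource, metric in product(resources, METRICS)]


checklist = use_checklist()
for item in checklist:
    print(item)
```

Each item is a question to answer first; the tool that answers it is chosen afterwards, which matches the point made on the "Too Many Tools" slide.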
    
slide 26:
    Systems Performance
    • Really understand how systems work
    • New observability, visualizations, methodologies
    • Understand the challenges of
    cloud computing
    • Brendan Gregg:
    • http://www.brendangregg.com
    • http://dtrace.org/blogs/brendan
    • twitter: @brendangregg
    Sample Chapter
    http://dtrace.org/blogs/brendan/2013/06/21/systems-performance-enterprise-and-the-cloud/