
LinuxCon Europe 2014: Linux Performance Tools 2014

Talk by Brendan Gregg for LinuxCon Europe, 2014.

Video: https://www.youtube.com/watch?v=SN7Z0eCn0VY

Description: "There are many performance tools nowadays for Linux, but how do they all fit together, and when do we use them? This talk summarizes the three types of performance tools: observability, benchmarking, and tuning, providing a tour of what exists and why they exist. Advanced tools including those based on tracepoints, kprobes, and uprobes are also included: perf_events, ktap, SystemTap, LTTng, and sysdig. You'll gain a good understanding of the performance tools landscape, knowing what to reach for to get the most out of your systems."


PDF: LinuxConEU2014_LinuxPerfTools.pdf

Slide text (extracted from the PDF):

slide 1:
    Linux Performance Tools
    Brendan Gregg
    Senior Performance Architect, Performance Engineering Team
    bgregg@netflix.com   @brendangregg
    Oct, 2014
    
slide 2:
    A quick tour of many tools…
    
slide 3:
    • Massive AWS EC2 Linux cloud
      – Tens of thousands of instances
      – Autoscale by ~3k each day
      – CentOS and Ubuntu
    • FreeBSD for content delivery
      – Approx 33% of US Internet traffic at night
    • Performance is critical
      – Customer satisfaction: >50M subscribers
      – $$$ price/performance
      – Develop tools for cloud-wide analysis; use server tools as needed
    • Just launched in Europe!
    
slide 4:
    Brendan Gregg
    • Senior Performance Architect, Netflix
      – Linux and FreeBSD performance
      – Performance Engineering team (@coburnw)
    • Recent work:
      – Linux perf-tools, using ftrace & perf_events
      – Systems Performance, Prentice Hall
    • Previous work includes:
      – USE Method, flame graphs, utilization & latency heat maps, DTrace tools, ZFS L2ARC
    • Twitter @brendangregg (these slides)
    
slide 5:
    Agenda
    • Methodologies & Tools
    • Tool Types:
      – Observability: Basic, Intermediate, Advanced
      – Benchmarking
      – Tuning
      – Static
    • Tracing
    Aim: to show what can be done.
    Knowing that something can be done is more important than knowing how to do it.
    
slide 6:
    Methodologies & Tools
    
slide 7:
    Methodologies & Tools
    • There are dozens of performance tools for Linux
      – Packages: sysstat, procps, coreutils, …
      – Commercial products
    • Methodologies can provide guidance for choosing and using tools effectively
    
slide 8:
    Anti-Methodologies
    • The lack of a deliberate methodology…
    • Street Light Anti-Method:
      – 1. Pick observability tools that are
        • Familiar
        • Found on the Internet, or at random
      – 2. Run tools
      – 3. Look for obvious issues
    • Drunk Man Anti-Method:
      – Tune things at random until the problem goes away
    
slide 9:
    Methodologies
    • For example, the USE Method:
      – For every resource, check:
        • Utilization
        • Saturation
        • Errors
    • 5 Whys:
      – Ask “why?” 5 times
    • Other methods include:
      – Workload characterization, drill-down analysis, event tracing, baseline stats, static performance tuning, …
    • Start with the questions, then find the tools
    
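    As a rough illustration of turning the USE Method into commands, here is a minimal checklist
    sketch using only the basic tools covered next; the resource-to-tool mapping is my own
    example, not from the slides:

    #!/bin/bash
    # use-check.sh: a hypothetical USE Method checklist helper
    echo "== CPU: utilization (mpstat), saturation (run queue: vmstat 'r')"
    mpstat -P ALL 1 1
    vmstat 1 2 | tail -1 | awk '{print "runnable tasks (r):", $1}'

    echo "== Memory: utilization (free), saturation (swapping: vmstat si/so)"
    free -m
    vmstat 1 2 | tail -1 | awk '{print "swap in/out (si/so):", $7, $8}'

    echo "== Disks: utilization/saturation (iostat %util, avgqu-sz, await)"
    iostat -xz 1 2

    echo "== Network: throughput and errors (sar DEV/EDEV groups)"
    sar -n DEV,EDEV 1 1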
slide 10:
    Command Line Tools
    • Useful to study even if you never use them:
      GUIs and commercial products often use the same interfaces
      Kernel → /proc, /sys, …
    $ vmstat 1
    procs -----------memory---------- ---swap-- …
     r  b    swpd      free   buff    cache  …
     9  0       0  29549320  29252  9299060  …
     2  0       0  29547876  29252  9299332  …
     4  0       0  29548124  29252  9299460  …
     5  0       0  29548840  29252  9299592  …
    
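    A quick sketch of reading those same kernel interfaces directly (standard procfs files;
    any stat tool or GUI is ultimately parsing files like these):

    $ cat /proc/loadavg                             # load averages, runnable/total tasks
    $ grep -E '^(cpu|ctxt|processes)' /proc/stat    # CPU ticks, context switches, forks
    $ grep -E 'MemFree|Dirty|Writeback' /proc/meminfo
    $ head -3 /proc/diskstats                       # per-device block I/O counters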
slide 11:
    Tool Types
    Type            Characteristic
    Observability   Watch activity. Safe, usually, depending on resource overhead.
    Benchmarking    Load test. Caution: production tests can cause issues due to contention.
    Tuning          Change. Danger: changes could hurt performance, now or later with load.
    Static          Check configuration. Should be safe.
    
slide 12:
    Observability Tools
    
slide 13:
    How do you measure these?
    
slide 14:
    Observability Tools: Basic
    • uptime
    • top (or htop)
    • ps
    • vmstat
    • iostat
    • mpstat
    • free
    
slide 15:
    uptime
    • One way to print load averages:
    $ uptime
     07:42:06 up  8:16,  1 user,  load average: 2.27, 2.84, 2.91
    • A measure of resource demand: CPUs + disks
      – Other OSes only show CPUs: easier to interpret
    • Exponentially-damped moving averages with time constants of 1, 5, and 15 minutes
      – Historic trend without the line graph
    • Load > # of CPUs may mean CPU saturation
      – Don’t spend more than 5 seconds studying these
    
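    A small sketch of that "load > # of CPUs" check (the CPU count via nproc assumes coreutils;
    remember that Linux load averages also include tasks blocked on disk I/O):

    cpus=$(nproc)
    load1=$(awk '{print $1}' /proc/loadavg)
    echo "1-min load: $load1, CPUs: $cpus"
    awk -v l="$load1" -v c="$cpus" 'BEGIN { if (l > c) print "load exceeds CPU count: possible saturation" }'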
slide 16:
    top (or htop)
    • System and per-process interval summary:
    $ top - 18:50:26 up 7:43, 1 user, load average: 4.11, 4.91, 5.22
    Tasks: 209 total,   1 running, 206 sleeping,   0 stopped,   2 zombie
    Cpu(s): 47.1%us, 4.0%sy, 0.0%ni, 48.4%id, 0.0%wa, 0.0%hi, 0.3%si, 0.2%st
    Mem:  70197156k total, 44831072k used, 25366084k free,    36360k buffers
    Swap:        0k total,        0k used,        0k free, 11873356k cached
      PID USER     PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
     5738 apiprod   …   0 62.6g  29g 352m S  417 44.2  2144:15  java
     1386 apiprod   …   0 17452 1388  964 R    0  0.0  0:00.02  top
        1 root      …   0 24340 2272 1340 S    0  0.0  0:01.51  init
        2 root      …   0     …    …    … S    0  0.0  0:00.00  kthreadd
    […]
    • %CPU is summed across all CPUs
    • Can miss short-lived processes (atop won’t)
    • Can consume noticeable CPU to read /proc
    
slide 17:
    htop	
      
    
slide 18:
    ps
    • Process status listing (eg, “ASCII art forest”):
    $ ps -ef f
    UID        PID  PPID  C STIME TTY    STAT   TIME CMD
    root      4546     …  0 11:08 ?             0:00 /usr/sbin/sshd -D
    root     28261  4546  0 17:24 ?             0:00  \_ sshd: prod [priv]
    prod     28287 28261  0 17:24 ?             0:00      \_ sshd: prod@pts/0
    prod     28288 28287  0 17:24 pts/0         0:00          \_ -bash
    prod      3156 28288  0 19:15 pts/0         0:00              \_ ps -ef f
    root      4965     …  0 11:08 ?             0:00 /bin/sh /usr/bin/svscanboot
    root      4969  4965  0 11:08 ?             0:00  \_ svscan /etc/service
    […]
    • Custom fields:
    $ ps -eo user,sz,rss,minflt,majflt,pcpu,args
    USER       SZ   RSS MINFLT MAJFLT %CPU COMMAND
    root     6085  2272  11928     24  0.0 /sbin/init
    […]
    
slide 19:
    vmstat
    • Virtual memory statistics and more:
    $ vmstat -Sm 1
    procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
     r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
    […]
    • USAGE: vmstat [interval [count]]
    • First output line has some summary since boot values (should be all; partial is confusing)
    • High level CPU summary. “r” is runnable tasks.
    
slide 20:
    iostat
    • Block I/O (disk) stats. 1st output is since boot.
    $ iostat -xmdz 1
    Linux 3.13.0-29 (db001-eb883efa)   08/18/2014   _x86_64_   (16 CPU)
    Device:  rrqm/s  wrqm/s   r/s   w/s  rMB/s  wMB/s ... avgqu-sz  await r_await w_await  svctm  %util
    xvda         …                                    ...                                          0.00
    xvdb      0.00  15299.00    …                     ...                                         86.40
    xvdc      0.00  15271.00    …                     ...                                         86.00
    md0       0.00  31082.00    …                     ...                                          0.00
    […]
    (The slide annotates the left-hand columns as the Workload and the right-hand columns as the Resulting Performance.)
    • Very useful set of stats
    
slide 21:
    mpstat
    • Multi-processor statistics, per-CPU:
    $ mpstat -P ALL 1
    […]
    08:06:43 PM  CPU   %usr  %nice  %sys %iowait  %irq  %soft %steal %guest  %idle
    08:06:44 PM  all  53.45      …     …       …     …      …      …      …  42.26
    08:06:44 PM    0  49.49      …     …       …     …      …      …      …  45.45
    08:06:44 PM    1  51.61      …     …       …     …      …      …      …  41.94
    08:06:44 PM    2  58.16      …     …       …     …      …      …      …  33.67
    08:06:44 PM    3  54.55      …     …       …     …      …      …      …  40.40
    08:06:44 PM    4  47.42      …     …       …     …      …      …      …  49.48
    08:06:44 PM    5  65.66      …     …       …     …      …      …      …  31.31
    08:06:44 PM    6  50.00      …     …       …     …      …      …      …  47.92
    […]
    • Look for unbalanced workloads, hot CPUs.
    
slide 22:
    free
    • Main memory usage:
    $ free -m
                 total       used       free     shared    buffers     cached
    Mem:          […]
    -/+ buffers/cache:  […]
    Swap:         […]
    • buffers: block device I/O cache
    • cached: virtual page cache
    
slide 23:
    Observability Tools: Basic
    
slide 24:
    Observability Tools: Intermediate
    • strace
    • tcpdump
    • netstat
    • nicstat
    • pidstat
    • swapon
    • lsof
    • sar (and collectl, dstat, etc.)
    
slide 25:
    strace
    • System call tracer:
    $ strace -tttT -p 313
    1408393285.779746 getgroups(0, NULL)               = 1 <…>
    1408393285.779873 getgroups(1, [0])                = 1 <…>
    1408393285.780797 close(3)                         = 0 <…>
    1408393285.781338 write(1, "LinuxCon 2014!\n", 15) = 15 <…>
    • Eg, -ttt: time (us) since epoch; -T: syscall time (s)
    • Translates syscall args
      – Very helpful for solving system usage issues
    • Currently has massive overhead (ptrace based)
      – Can slow the target by > 100x. Use extreme caution.
    
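    If only call counts are needed, strace’s summary mode is a lighter way to use it
    (still ptrace-based, so the overhead caveat above stands); a quick sketch:

    $ strace -c -p 313       # -c: count calls, errors, and time per syscall
    ^C                       # Ctrl-C detaches and prints the summary table
    % time     seconds  usecs/call     calls    errors syscall
    […]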
slide 26:
    tcpdump
    • Sniff network packets for post analysis:
    $ tcpdump -i eth0 -w /tmp/out.tcpdump
    tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
    ^C7985 packets captured
    8996 packets received by filter
    1010 packets dropped by kernel
    # tcpdump -nr /tmp/out.tcpdump | head
    reading from file /tmp/out.tcpdump, link-type EN10MB (Ethernet)
    20:41:05.038437 IP 10.44.107.151.22 > 10.53.237.72.46425: Flags [P.], seq 18...
    20:41:05.038533 IP 10.44.107.151.22 > 10.53.237.72.46425: Flags [P.], seq 48...
    20:41:05.038584 IP 10.44.107.151.22 > 10.53.237.72.46425: Flags [P.], seq 96...
    […]
    • Study packet sequences with timestamps (us)
    • CPU overhead optimized (socket ring buffers), but can still be significant. Use caution.
    
slide 27:
    netstat
    • Various network protocol statistics using -s:
    • A multi-tool:
      – -i: interface stats
      – -r: route table
      – default: list conns
    • netstat -p: shows process details!
    • Per-second interval with -c
    $ netstat -s
    […]
    Tcp:
        736455 active connections openings
        176887 passive connection openings
        33 failed connection attempts
        1466 connection resets received
        3311 connections established
        91975192 segments received
        180415763 segments send out
        223685 segments retransmited
        2 bad segments received.
        39481 resets sent
    […]
    TcpExt:
        12377 invalid SYN cookies received
        2982 delayed acks sent
    […]
    
slide 28:
    nicstat
    • Network interface stats, iostat-like output:
    $ ./nicstat 1
        Time      Int   rKB/s   wKB/s   rPk/s   wPk/s    rAvs    wAvs  %Util    Sat
    21:21:43     eth0     […]
    21:21:44     eth0     […]
    21:21:45     eth0     […]
    […]
    • Check network throughput and interface %util
    • I wrote this years ago; Tim Cook ported it to Linux
    
slide 29:
    pidstat
    • Very useful process stats. eg, by-thread, disk I/O:
    $ pidstat -t 1
    Linux 3.2.0-54 (db002-91befe03)   08/18/2014   _x86_64_   (8 CPU)
    08:57:52 PM   TGID   TID    %usr  %system  %guest    %CPU  CPU  Command
    08:57:54 PM      …     -  484.75        …    0.00  524.58    …  java
    08:57:54 PM      …     …       …        …       …       …    …   |__java
    08:57:54 PM      …     …       …        …       …       …    …   |__java
    […]
    $ pidstat -d 1
    […]
    08:58:27 PM    PID   kB_rd/s   kB_wr/s  kB_ccwr/s  Command
    08:58:28 PM      …         …         …       0.00  java
    […]
    • I usually prefer this over top(1)
    
slide 30:
    swapon
    • Show swap device usage:
    $ swapon -s
    Filename      Type        Size   Used   Priority
    /dev/sda3     partition      …      …         -1
    • If you have swap enabled…
    
slide 31:
    lsof
    • More a debug tool, lsof(8) shows file descriptor usage, which for some apps equals current active network connections:
    # lsof -iTCP -sTCP:ESTABLISHED
    COMMAND    PID USER  FD  TYPE   DEVICE SIZE/OFF NODE NAME
    sshd       755 root  3r  IPv4 13576887      0t0  TCP bgregg-test-i-f106:ssh->prod100.netflix.com:15241 (ESTABLISHED)
    platforms 2614 app1  8u  IPv4        …      0t0  TCP localhost:33868->localhost:5433 (ESTABLISHED)
    postgres  2648 app1  7u  IPv4        …      0t0  TCP localhost:5433->localhost:33868 (ESTABLISHED)
    epic_plug 2857 app1  7u  IPv4        …      0t0  TCP localhost:33885->localhost:5433 (ESTABLISHED)
    postgres  2892 app1  7u  IPv4        …      0t0  TCP localhost:5433->localhost:33885 (ESTABLISHED)
    […]
    • I’d prefer to: echo /proc/PID/fd | wc -l
    
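    Expanding that last one-liner into something runnable (a sketch; note ls rather than echo,
    so the fd entries are actually counted, and the PID simply reuses the postgres example above):

    $ pid=2648                                # example PID
    $ ls /proc/$pid/fd | wc -l                # total open file descriptors
    $ ls -l /proc/$pid/fd | grep -c socket    # how many of those are sockets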
slide 32:
    sar
    • System Activity Reporter. Many stats, eg:
    $ sar -n TCP,ETCP,DEV 1
    Linux 3.2.55 (test-e4f1a80b)   08/18/2014   _x86_64_   (8 CPU)
    09:10:43 PM     IFACE  rxpck/s  txpck/s   rxkB/s    txkB/s  rxcmp/s  txcmp/s  rxmcst/s
    09:10:44 PM         …        …        …        …         …        …        …      0.00
    09:10:44 PM      eth0  4114.00  4186.00  4537.46  28513.24        …        …      0.00
    09:10:43 PM  active/s passive/s    iseg/s    oseg/s
    09:10:44 PM         …         …   4107.00  22511.00
    09:10:43 PM  atmptf/s  estres/s retrans/s isegerr/s  orsts/s
    09:10:44 PM         …         …         …         …     1.00
    […]
    • Archive or live mode: (interval [count])
    • Well designed. Header naming convention, logical groups: TCP, ETCP, DEV, EDEV, …
    
slide 33:
    Observability: sar
    
slide 34:
    Other Tools
    • You may also use collectl, atop, dstat, or another measure-all tool
    • The tool isn’t important
    • It’s important to have a way to measure everything you want
    • In cloud environments, you are probably using a monitoring product, developed in-house or commercial. Same method applies…
    
slide 35:
    How does your monitoring tool measure these?
    
slide 36:
    Observability Tools: Intermediate
    
slide 37:
    Advanced Observability Tools
    • Misc:
      – ltrace, ss, iptraf, ethtool, snmpget, lldptool, iotop, blktrace, slabtop, /proc, pcstat
    • CPU Performance Counters:
      – perf_events, tiptop, rdmsr
    • Advanced Tracers:
      – perf_events, ftrace, eBPF, SystemTap, ktap, LTTng, dtrace4linux, sysdig
    • Some selected demos…
    
slide 38:
    ss
    • More socket statistics:
    $ ss -mop
    State      Recv-Q Send-Q    Local Address:Port     Peer Address:Port
    CLOSE-WAIT 1        …           127.0.0.1:42295        127.0.0.1:28527   users:(("apacheLogParser",2702,3))
         mem:(r1280,w0,f2816,t0)
    ESTAB      …        …           127.0.0.1:5433         127.0.0.1:41312   timer:(keepalive,36min,0) users:(("postgres",2333,7))
         mem:(r0,w0,f0,t0)
    […]
    $ ss -i
    State      Recv-Q Send-Q    Local Address:Port     Peer Address:Port
    CLOSE-WAIT 1        …           127.0.0.1:42295        127.0.0.1:28527
         cubic wscale:6,6 rto:208 rtt:9/6 ato:40 cwnd:10 send 145.6Mbps rcv_space:32792
    ESTAB      …        …      10.144.107.101:ssh        10.53.237.72:4532
         cubic wscale:4,6 rto:268 rtt:71.5/3 ato:40 cwnd:10 send 1.5Mbps rcv_rtt:72 rcv_space:14480
    […]
    
slide 39:
    iptraf	
      
    
slide 40:
    iotop
    • Block device I/O (disk) by process:
    $ iotop
    Total DISK READ: 50.47 M/s | Total DISK WRITE: 59.21 M/s
      TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>     COMMAND
      959  be/4  root      0.00 B/s    0.00 B/s  0.00 %  99.99 %  [flush-202:1]
     6641  be/4  root     50.47 M/s   82.60 M/s  0.00 %  32.51 %  java -Dnop -X
        1  be/4  root      0.00 B/s    0.00 B/s  0.00 %   0.00 %  init
        2  be/4  root      0.00 B/s    0.00 B/s  0.00 %   0.00 %  [kthreadd]
        3  be/4  root      0.00 B/s    0.00 B/s  0.00 %   0.00 %  [ksoftirqd/0]
        4  be/4  root      0.00 B/s    0.00 B/s  0.00 %   0.00 %  [kworker/0:0]
        5  be/4  root      0.00 B/s    0.00 B/s  0.00 %   0.00 %  [kworker/u:0]
        6  rt/4  root      0.00 B/s    0.00 B/s  0.00 %   0.00 %  [migration/0]
    […]
    • Needs kernel support enabled
      – CONFIG_TASK_IO_ACCOUNTING
    
slide 41:
    slabtop
    • Kernel slab allocator memory usage:
    $ slabtop
     Active / Total Objects (% used)    : 4692768 / 4751161 (98.8%)
     Active / Total Slabs (% used)      : 129083 / 129083 (100.0%)
     Active / Total Caches (% used)     : 71 / 109 (65.1%)
     Active / Total Size (% used)       : 729966.22K / 738277.47K (98.9%)
     Minimum / Average / Maximum Object : 0.01K / 0.16K / 8.00K
       OBJS   ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
    3565575  3565575 100%    0.10K  91425               365700K buffer_head
     314916   314066  99%    0.19K  14996                59984K dentry
     184192   183751  99%    0.06K      …                11512K kmalloc-64
     138618   138618 100%    0.94K      …               130464K xfs_inode
     138602   138602 100%    0.21K      …                29968K xfs_ili
     102116    99012  96%    0.55K      …                58352K radix_tree_node
      97482    49093  50%    0.09K      …                 9284K kmalloc-96
      22695    20777  91%    0.05K      …                 1068K shared_policy_node
      21312    21312 100%    0.86K      …                18432K ext4_inode_cache
      16288    14601  89%    0.25K      …                 4072K kmalloc-256
    […]
    
slide 42:
    pcstat
    • Show page cache residency by file:
    # ./pcstat data0*
    |----------+----------------+------------+-----------+---------|
    | Name     | Size           | Pages      | Cached    | Percent |
    |----------+----------------+------------+-----------+---------|
    | data00   | 104857600      | 25600      | 25600     | 100.000 |
    | data01   | 104857600      | 25600      | 25600     | 100.000 |
    | data02   | 104857600      | 25600      | 4080      | 015.938 |
    | data03   | 104857600      | 25600      | 25600     | 100.000 |
    | data04   | 104857600      | 25600      | 16010     | 062.539 |
    | data05   | 104857600      | 25600      | 0         | 000.000 |
    |----------+----------------+------------+-----------+---------|
    • Uses the mincore(2) syscall. Useful for database performance analysis.
    
slide 43:
    perf_events (counters)
    • Performance Monitoring Counters (PMCs):
    $ perf list | grep -i hardware
      cpu-cycles OR cycles                               [Hardware event]
      stalled-cycles-frontend OR idle-cycles-frontend    [Hardware event]
      stalled-cycles-backend OR idle-cycles-backend      [Hardware event]
      instructions                                       [Hardware event]
      […]
      branch-misses                                      [Hardware event]
      bus-cycles                                         [Hardware event]
      L1-dcache-loads                                    [Hardware cache event]
      L1-dcache-load-misses                              [Hardware cache event]
      […]
      rNNN (see 'perf list --help' on how to encode it)  [Raw hardware event …]
      mem:<addr>[:access]                                [Hardware breakpoint]
    • Identify CPU cycle breakdowns, esp. stall types
      – PMCs not enabled by-default in clouds (yet)
      – Can be time-consuming to use (CPU manuals)
    • Use flame graphs to visualize sampled stack traces
    
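    A minimal sketch of reading these counters with perf stat (assumes PMC access, which,
    as the slide notes, cloud guests often lack):

    $ perf stat -a -- sleep 10         # system-wide counters for 10 seconds
    $ perf stat -p <PID> -- sleep 10   # or attach to one existing process
    # Key fields in the summary: instructions per cycle (IPC),
    # stalled-cycles-frontend/backend, and branch-misses.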
slide 44:
    perf_events CPU Flame Graph
    (Flame graph figure; labeled regions include the kernel TCP/IP stack, GC, locks, epoll,
    the idle thread, and broken Java stacks caused by a missing frame pointer.)
    
slide 45:
    tiptop
    • IPC by process, %MISS, %BUS
    • Needs some love. perfmon2 library integration?
    • Still can’t use it in clouds yet (needs PMCs enabled)
    
slide 46:
    rdmsr
    • Model Specific Registers (MSRs), unlike PMCs, can be read by default in Xen guests
      – Timestamp clock, temp, power, …
      – Use rdmsr(1) from the msr-tools package to read them
      – Uses include (https://github.com/brendangregg/msr-cloud-tools):
    ec2-guest# ./showboost
    TIME      C0_MCYC   C0_ACYC   UTIL  RATIO   MHz
    06:11:35        …         …    51%   116%  2900
    06:11:40        …         …    50%   115%  2899
    06:11:45        …         …    49%   115%  2899
    [...]                         (the MHz column is the real CPU clock rate)
    ec2-guest# ./cputemp 1
    CPU1 CPU2 CPU3 CPU4
      61   61   60   59          (CPU temperature)
      60   61   60   60
    [...]
    
slide 47:
    More Advanced Tools…
    • Some others worth mentioning:
    Tool        Description
    ltrace      Library call tracer
    ethtool     Mostly interface tuning; some stats
    snmpget     SNMP network host statistics
    lldptool    Can get LLDP broadcast stats
    blktrace    Block I/O event tracer
    /proc       Many raw kernel counters
    pmu-tools   On- and off-core CPU counter tools
    
slide 48:
    Advanced Tracers
    • Many options on Linux:
      – perf_events, ftrace, eBPF, SystemTap, ktap, LTTng, dtrace4linux, sysdig
    • Most can do static and dynamic tracing
      – Static: pre-defined events (tracepoints)
      – Dynamic: instrument any software (kprobes, uprobes). Custom metrics on-demand. Catch all.
    • Many are in-development.
      – I’ll summarize their state later…
    
slide 49:
    Linux Observability Tools
    
slide 50:
    Linux Observability Tools
    
slide 51:
    Benchmarking Tools
    
slide 52:
    Benchmarking Tools
    • Multi:
      – UnixBench, lmbench, sysbench, perf bench
    • FS/disk:
      – dd, hdparm, fio
    • App/lib:
      – ab, wrk, jmeter, openssl
    • Networking:
      – ping, hping3, iperf, ttcp, traceroute, mtr, pchar
    
slide 53:
    Active Benchmarking
    • Most benchmarks are misleading or wrong
      – You benchmark A, but actually measure B, and conclude that you measured C
    • Active Benchmarking:
      1. Run the benchmark for hours
      2. While running, analyze and confirm the performance limiter using observability tools
    • We just covered those tools – use them!
    
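    For instance, a rough active-benchmarking sketch (the fio job is the one shown on the
    next slide; the log file names are arbitrary):

    $ fio --name=seqwrite --rw=write --bs=128k --size=122374m &
    $ iostat -xz 1 > iostat.log &       # disks: %util, await, queue length
    $ mpstat -P ALL 1 > mpstat.log &    # CPUs: is a single CPU the limiter?
    $ pidstat 1 > pidstat.log &         # which processes/threads burn the CPU
    $ wait %1 && kill %2 %3 %4          # benchmark done: stop monitors, read the logs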
slide 54:
    lmbench
    • CPU, memory, and kernel micro-benchmarks
    • Eg, memory latency by stride size:
    $ lat_mem_rd 100m 128 > out.latencies
    some R processing…
    (Latency plot; the steps correspond to the L1, L2, and L3 caches and main memory.)
    
slide 55:
    fio
    • FS or disk I/O micro-benchmarks
    $ fio --name=seqwrite --rw=write --bs=128k --size=122374m
    […]
    seqwrite: (groupid=0, jobs=1): err= 0: pid=22321
      write: io=122374MB, bw=840951KB/s, iops=6569 , runt=149011msec
        clat (usec): min=41 , max=133186 , avg=148.26, stdev=1287.17
         lat (usec): min=44 , max=133188 , avg=151.11, stdev=1287.21
        bw (KB/s) : min=10746, max=1983488, per=100.18%, avg=842503.94, stdev=262774.35
      cpu          : usr=2.67%, sys=43.46%, ctx=14284, majf=1, minf=24
      IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
         submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
         complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
         issued r/w/d: total=0/978992/0, short=0/0/0
         lat (usec): 50=0.02%, 100=98.30%, 250=1.06%, 500=0.01%, 750=0.01%
         lat (usec): 1000=0.01%
         lat (msec): 2=0.01%, 4=0.01%, 10=0.25%, 20=0.29%, 50=0.06%
         lat (msec): 100=0.01%, 250=0.01%
    • Results include basic latency distribution
    
slide 56:
    pchar
    • Traceroute with bandwidth per hop!
    $ pchar 10.71.83.1
    […]
     4: 10.110.80.1 (10.110.80.1)
        Partial loss:     0 / 5 (0%)
        Partial char:     rtt = 9.351109 ms, (b = 0.004961 ms/B), r2 = 0.184105
                          stddev rtt = 4.967992, stddev b = 0.006029
        Partial queueing: avg = 0.000000 ms (0 bytes)
        Hop char:         rtt = --.--- ms, bw = 1268.975773 Kbps
        Hop queueing:     avg = 0.000000 ms (0 bytes)
     5: 10.193.43.181 (10.193.43.181)
        Partial loss:     0 / 5 (0%)
        Partial char:     rtt = 25.461597 ms, (b = 0.011934 ms/B), r2 = 0.228707
                          stddev rtt = 10.426112, stddev b = 0.012653
        Partial queueing: avg = 0.000000 ms (0 bytes)
        Hop char:         rtt = 16.110487 ms, bw = 1147.210397 Kbps
        Hop queueing:     avg = 0.000000 ms (0 bytes)
    […]
    • Needs love. Based on pathchar (Linux 2.0.30).
    
slide 57:
    Benchmarking Tools
    
slide 58:
    Tuning Tools
    
slide 59:
    Tuning Tools
    • Generic interfaces:
      – sysctl, /sys
    • Many areas have custom tuning tools:
      – Applications: their own config
      – CPU/scheduler: nice, renice, taskset, ulimit, chcpu
      – Storage I/O: tune2fs, ionice, hdparm, blockdev, …
      – Network: ethtool, tc, ip, route
      – Dynamic patching: stap, kpatch
    
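    The generic interfaces in practice (a sketch; the parameters, values, and device name
    are only illustrative, not recommendations):

    $ sysctl net.ipv4.tcp_rmem                         # read a current setting
    # sysctl -w net.core.somaxconn=1024                # change it on the running kernel
    # echo deadline > /sys/block/sda/queue/scheduler   # /sys example: pick an I/O scheduler
    # (persist sysctls in /etc/sysctl.conf or /etc/sysctl.d/ so they survive reboot)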
slide 60:
    Tuning Methods
    • Scientific Method:
      – Question
      – Hypothesis
      – Prediction
      – Test
      – Analysis
    • Any observational or benchmarking tests you can try before tuning?
    • Consider risks, and see previous tools
    
slide 61:
    Tuning Tools
    
slide 62:
    Static Tools
    
slide 63:
    Static Tools
    • Static Performance Tuning: check the static state and configuration of the system
      – CPU types
      – Storage devices
      – File system capacity
      – File system and volume configuration
      – Route table
      – State of hardware
    • What can be checked on a system without load
    
slide 64:
    Static Tools
    
slide 65:
    Tracing	
      
    
slide 66:
    Tracing Frameworks: Tracepoints
    • Statically placed at logical places in the kernel
    • Provides key event details as a “format” string
    
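    As a concrete sketch of using a tracepoint directly via the built-in ftrace interface
    (run as root; block:block_rq_issue is just an example event):

    # cd /sys/kernel/debug/tracing
    # cat events/block/block_rq_issue/format        # the event's "format" string
    # echo 1 > events/block/block_rq_issue/enable   # start tracing this tracepoint
    # cat trace_pipe                                # watch live events; Ctrl-C to stop
    # echo 0 > events/block/block_rq_issue/enable   # stop it again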
slide 67:
    Tracing Frameworks: + probes
    • kprobes: dynamic kernel tracing
      – function calls, returns, line numbers
    • uprobes: dynamic user-level tracing
    
slide 68:
    Tracing Tools
    • Options:
      – ftrace
      – perf_events
      – eBPF
      – SystemTap
      – ktap
      – LTTng
      – dtrace4linux
      – Oracle Linux DTrace
      – sysdig
    • Too many choices, and many still in-development
    
slide 69:
    Imagine Linux with Tracing
    • With a programmable tracer, high level tools can be written, such as:
      – iosnoop
      – iolatency
      – opensnoop
      – …
    
slide 70:
    iosnoop
    • Block I/O (disk) events with latency:
    # ./iosnoop -ts
    Tracing block I/O. Ctrl-C to end.
    STARTs          ENDs            COMM       PID  TYPE DEV    BLOCK  BYTES  LATms
    5982800.302061  5982800.302679  supervise    …  …    202,1  …      …       0.62
    5982800.302423  5982800.302842  supervise    …  …    202,1  …      …       0.42
    5982800.304962  5982800.305446  supervise    …  …    202,1  …      …       0.48
    5982800.305250  5982800.305676  supervise    …  …    202,1  …      …       0.43
    […]
    # ./iosnoop -h
    USAGE: iosnoop [-hQst] [-d device] [-i iotype] [-p PID] [-n name] [duration]
       -d device    # device string (eg, "202,1)
       -i iotype    # match type (eg, '*R*' for all reads)
       -n name      # process name to match on I/O issue
       -p PID       # PID to match on I/O issue
       -Q           # include queueing time in LATms
       -s           # include start time of I/O (s)
       -t           # include completion time of I/O (s)
       -h           # this usage message
       duration     # duration seconds, and use buffers
    […]
    
slide 71:
    iolatency
    • Block I/O (disk) latency distributions:
    # ./iolatency
    Tracing block I/O. Output every 1 seconds. Ctrl-C to end.
      >=(ms) .. <(ms)   : I/O      |Distribution                          |
           0 -> 1       : 2104     |######################################|
           1 -> 2       : 280      |######                                |
           2 -> 4       : 2        |                                      |
           4 -> 8       : 0        |                                      |
           8 -> 16      : 202      |####                                  |

      >=(ms) .. <(ms)   : I/O      |Distribution                          |
           0 -> 1       : 1144     |######################################|
           1 -> 2       : 267      |#########                             |
           2 -> 4       : 10       |                                      |
           4 -> 8       : 5        |                                      |
           8 -> 16      : 248      |#########                             |
          16 -> 32      : 601      |####################                  |
          32 -> 64      : 117      |####                                  |
    […]
    
slide 72:
    opensnoop
    • Trace open() syscalls showing filenames:
    # ./opensnoop -t
    Tracing open()s. Ctrl-C to end.
    TIMEs   COMM       PID    FD  FILE
    …       postgres     …   0x8  /proc/self/oom_adj
    …       postgres     …   0x5  global/pg_filenode.map
    …       postgres     …   0x5  global/pg_internal.init
    …       postgres     …   0x5  base/16384/PG_VERSION
    …       postgres     …   0x5  base/16384/pg_filenode.map
    …       postgres     …   0x5  base/16384/pg_internal.init
    …       postgres     …   0x5  base/16384/11725
    …       svstat       …   0x4  supervise/ok
    …       svstat       …   0x4  supervise/status
    …       stat         …   0x3  /etc/ld.so.cache
    …       stat         …   0x3  /lib/x86_64-linux-gnu/libselinux…
    …       stat         …   0x3  /lib/x86_64-linux-gnu/libc.so.6
    …       stat         …   0x3  /lib/x86_64-linux-gnu/libdl.so.2
    …       stat         …   0x3  /proc/filesystems
    …       stat         …   0x3  /etc/nsswitch.conf
    […]
    
slide 73:
    funcgraph
    • Trace a graph of kernel code flow:
    # ./funcgraph -Htp 5363 vfs_read
    Tracing "vfs_read" for PID 5363... Ctrl-C to end.
    # tracer: function_graph
    #     TIME         CPU  DURATION             FUNCTION CALLS
    4346366.073832 |    |               |  vfs_read() {
    4346366.073834 |    |               |    rw_verify_area() {
    4346366.073834 |    |               |      security_file_permission() {
    4346366.073834 |    |               |        apparmor_file_permission() {
    4346366.073835 |    |   0.153 us    |          common_file_perm();
    4346366.073836 |    |   0.947 us    |        }
    4346366.073836 |    |   0.066 us    |        __fsnotify_parent();
    4346366.073836 |    |   0.080 us    |        fsnotify();
    4346366.073837 |    |   2.174 us    |      }
    4346366.073837 |    |   2.656 us    |    }
    4346366.073837 |    |               |    tty_read() {
    4346366.073837 |    |   0.060 us    |      tty_paranoia_check();
    […]
    
slide 74:
    kprobe
    • Dynamically trace a kernel function call or return, with variables, and in-kernel filtering:
    # ./kprobe 'p:open do_sys_open filename=+0(%si):string' 'filename ~ "*stat"'
    Tracing kprobe myopen. Ctrl-C to end.
    postgres-1172  [000] d... 6594028.787166: open: (do_sys_open+0x0/0x220) filename="pg_stat_tmp/pgstat.stat"
    postgres-1172  [001] d... 6594028.797410: open: (do_sys_open+0x0/0x220) filename="pg_stat_tmp/pgstat.stat"
    postgres-1172  [001] d... 6594028.797467: open: (do_sys_open+0x0/0x220) filename="pg_stat_tmp/pgstat.stat"
    ^C
    Ending tracing...
    • Add -s for stack traces; -p for PID filter in-kernel.
    • Quickly confirm kernel behavior; eg: did a tunable take effect?
    
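    Under the hood, this front-end drives ftrace's kprobe_events interface; roughly
    (a sketch of the raw steps, run as root):

    # cd /sys/kernel/debug/tracing
    # echo 'p:myopen do_sys_open filename=+0(%si):string' >> kprobe_events
    # echo 'filename ~ "*stat"' > events/kprobes/myopen/filter   # in-kernel filter
    # echo 1 > events/kprobes/myopen/enable
    # cat trace_pipe                         # watch events; Ctrl-C to stop
    # echo 0 > events/kprobes/myopen/enable
    # echo '-:myopen' >> kprobe_events       # remove the probe when done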
slide 75:
    Imagine Linux with Tracing
    • These tools aren’t using dtrace4linux, SystemTap, ktap, or any other add-on tracer
    • These tools use existing Linux capabilities
      – No extra kernel bits, not even kernel debuginfo
      – Just Linux’s built-in ftrace profiler
      – Demoed on Linux 3.2
    • Solving real issues now
    
slide 76:
    ftrace
    • Added by Steven Rostedt and others since 2.6.27
    • Already enabled on our servers (3.2+)
      – CONFIG_FTRACE, CONFIG_FUNCTION_PROFILER, …
      – Use directly via /sys/kernel/debug/tracing
    • My front-end tools to aid usage
      – https://github.com/brendangregg/perf-tools
      – Unsupported hacks: see WARNINGs
      – Also see the trace-cmd front-end, as well as perf
    • lwn.net: “Ftrace: The Hidden Light Switch”
    
slide 77:
    My perf-tools (so far…)
    
slide 78:
    Tracing Summary
    • ftrace
    • perf_events
    • eBPF
    • SystemTap
    • ktap
    • LTTng
    • dtrace4linux
    • sysdig
    
slide 79:
    perf_events
    • aka “perf” command
    • In Linux. Add from linux-tools-common, …
    • Powerful multi-tool and profiler
      – interval sampling, CPU performance counter events
      – user and kernel dynamic tracing
      – kernel line tracing and local variables (debuginfo)
      – kernel filtering, and in-kernel counts (perf stat)
    • Not very programmable, yet
      – limited kernel summaries. May improve with eBPF.
    
slide 80:
    perf_events Example
    # perf record -e skb:consume_skb -ag
    ^C[ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.065 MB perf.data (~2851 samples) ]
    # perf report
    [...]
    74.42%  swapper  [kernel.kallsyms]  [k] consume_skb
            --- consume_skb
                arp_process
                arp_rcv
                __netif_receive_skb_core
                __netif_receive_skb
                netif_receive_skb
                virtnet_poll
                net_rx_action
                __do_softirq
                irq_exit
                do_IRQ
                ret_from_intr
                default_idle
                cpu_idle
                start_secondary
    […]
    (Slide annotations: summarizing stack traces for a tracepoint; perf_events can do many things – hard to pick just one example.)
    
slide 81:
    eBPF
    • Extended BPF: programs on tracepoints
      – High performance filtering: JIT
      – In-kernel summaries: maps
    • Linux in 3.18? Enhance perf_events/ftrace/…?
    # ./bitesize 1
    writing bpf-5 -> /sys/kernel/debug/tracing/events/block/block_rq_complete/filter
    I/O sizes:
         Kbytes      : Count
         4 -> 7      : 131
         8 -> 15     : 32
        16 -> 31     : 1
        32 -> 63     : 46
        64 -> 127    : 0
       128 -> 255    : 15
    […]
    (the histogram is an in-kernel summary)
    
slide 82:
    SystemTap
    • Fully programmable, fully featured
    • Compiles tracing programs into kernel modules
      – Needs a compiler, and takes time
    • “Works great on Red Hat”
      – I keep trying on other distros and have hit trouble in the past; make sure you are on the latest version.
      – I’m liking it a bit more after finding ways to use it without kernel debuginfo (a difficult requirement in our environment). Work in progress.
    • Ever be mainline?
    
slide 83:
    ktap
    • Sampling, static & dynamic tracing
    • Lightweight, simple. Uses bytecode.
    • Suited for embedded devices
    • Development appears suspended after suggestions to integrate with eBPF (which itself is in development)
    • ktap + eBPF would be awesome: easy, lightweight, fast. Likely?
    
slide 84:
    sysdig
    • sysdig: Innovative new tracer. Simple expressions:
    sysdig fd.type=file and evt.failed=true
    sysdig evt.type=open and fd.name contains /etc
    sysdig -p"%proc.name %fd.name" "evt.type=accept and proc.name!=httpd"
    • Replacement for strace? (or “perf trace” will)
    • Programmable “chisels”. Eg, one of mine:
    # sysdig -c fileslower 1
    TIME                     PROCESS   TYPE   LAT(ms)  FILE
    2014-04-13 20:40:43.973  cksum     read         2  /mnt/partial.0.0
    2014-04-13 20:40:44.187  cksum     read         1  /mnt/partial.0.0
    2014-04-13 20:40:44.689  cksum     read         2  /mnt/partial.0.0
    […]
    • Currently syscalls and user-level processing only. It is optimized, but I’m not sure it can be enough for kernel tracing
    
slide 85:
    Present & Future
    • Present:
      – ftrace can serve many needs today
      – perf_events some more, esp. with debuginfo
      – ad hoc SystemTap, ktap, … as needed
    • Future:
      – ftrace/perf_events/ktap with eBPF, for a fully featured and mainline tracer?
      – One of the other tracers going mainline?
    
slide 86:
    The Tracing Landscape, Oct 2014 (my opinion)
    (Scatter diagram: Ease of use, from “brutal” to “less brutal”, versus Scope & Capability,
    with Stage of Development marked from alpha to mature; plotted tracers: ftrace, perf,
    eBPF, ktap, stap, dtrace4L., sysdig.)
    
slide 87:
    In Summary
    
slide 88:
    In Summary…
    • Plus diagrams for benchmarking, tuning, tracing
    • Try to start with the questions (methodology), to help guide your use of the tools
    • I hopefully turned some unknown unknowns into known unknowns
    
slide 89:
    References & Links
    • Systems Performance: Enterprise and the Cloud, Prentice Hall, 2014
    • http://www.brendangregg.com/linuxperf.html
    • http://www.brendangregg.com/perf.html#FlameGraphs
    • nicstat: http://sourceforge.net/projects/nicstat/
    • tiptop: http://tiptop.gforge.inria.fr/
      – Tiptop: Hardware Performance Counters for the Masses, Erven Rohou, Inria Research Report 7789, Nov 2011.
    • ftrace & perf-tools:
      – https://github.com/brendangregg/perf-tools
      – http://lwn.net/Articles/608497/
    • MSR tools: https://github.com/brendangregg/msr-cloud-tools
    • pcstat: https://github.com/tobert/pcstat
    • eBPF: http://lwn.net/Articles/603983/
    • ktap: http://www.ktap.org/
    • SystemTap: https://sourceware.org/systemtap/
    • sysdig: http://www.sysdig.org/
    • http://www.slideshare.net/brendangregg/linux-performance-analysis-and-tools
    • Tux by Larry Ewing; Linux® is the registered trademark of Linus Torvalds in the U.S. and other countries.
    
slide 90:
    Thanks
    • Questions?
    • http://slideshare.net/brendangregg
    • http://www.brendangregg.com
    • bgregg@netflix.com
    • @brendangregg