Intel InnovatiON 2021: Processor Benchmarking

Case study by Brendan Gregg (Netflix) for Intel InnovatiON 2021

Video: https://www.youtube.com/watch?v=-Y4pDLhqKI4

Description: "A short summary of processor benchmarking for the Netflix cloud by Brendan Gregg: a case study of misleading results, and methodologies to do accurate benchmarking."

PDF: IntelON2021_ProcessorBenchmarking.pdf

Keywords (from pdftotext):

slide 1:
    Processor Benchmarking
    Brendan Gregg
    Senior Performance Engineer
    IntelON, Oct 2021
    
slide 2:
    Case Study (2021)
    New processor
    Popular CPU benchmark: 2.6x faster than Intel
    What would you do?
    
slide 3:
    ~100% of benchmarks are wrong
    
slide 4:
    Active Benchmarking
    Low-level analysis while it is still running
    Not just statistical analysis of the results
    
slide 5:
    Flame Graphs
    Showed CPU time was in a single function
    Flame Graphs are now in Intel VTune!
    
slide 6:
    Instruction-Level Profiling...
    
slide 7:
    linux$ perf top -e cycles:ppp -p 18641
    Samples: 274K of event 'cycles:ppp', 4000 Hz, Event count (approx.): 61489970617
                   for(l = 2; l [...]
                     if (c % l == 0)
      0.15 │20296:   test     $0x1,%bl
      0.15 │20299: ↑ je       20270 [...]
                   for(l = 2; l [...]
           │202a2:   nopw     0x0(%rax,%rax,1)
      3.57 │202a8:   pxor     %xmm0,%xmm0
      0.21 │202ac:   cvtsi2sd %rcx,%xmm0
      0.26 │202b1:   comisd   %xmm0,%xmm1
      3.51 │202b5: ↑ jb       20270 [...]
                     if (c % l == 0)
      0.09 │202b7:   mov      %rbx,%rax
      0.02 │202ba:   xor      %edx,%edx
     85.00 │202bc:   div      %rcx
      0.12 │202bf:   test     %rdx,%rdx
    
slide 8:
    linux$ perf top -e cycles:ppp -p 18641
    Samples: 274K of event 'cycles:ppp', 4000 Hz, Event count (approx.): 61489970617
                   for(l = 2; l [...]
                     if (c % l == 0)
      0.15 │20296:   test     $0x1,%bl
      0.15 │20299: ↑ je       20270 [...]
                   for(l = 2; l [...]
           │202a2:   nopw     0x0(%rax,%rax,1)
      3.57 │202a8:   pxor     %xmm0,%xmm0
      0.21 │202ac:   cvtsi2sd %rcx,%xmm0
      0.26 │202b1:   comisd   %xmm0,%xmm1
      3.51 │202b5: ↑ jb       20270 [...]
                     if (c % l == 0)
      0.09 │202b7:   mov      %rbx,%rax
      0.02 │202ba:   xor      %edx,%edx
     85.00 │202bc:   div      %rcx
      0.12 │202bf:   test     %rdx,%rdx
    85% of cycles in the div instruction
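
The annotated source fragments (`for(l = 2; l ...` and `if (c % l == 0)`) look like a trial-division prime test, where every inner iteration performs one integer modulo, which is what the `div` instruction implements. Below is a minimal Python sketch of such a loop that also counts the modulo operations. The function name `count_primes`, the outer loop starting at 3, and the `l <= sqrt(c)` bound are assumptions: the slide truncates the loop condition, and the bound is only inferred from the `cvtsi2sd`/`comisd` pair in the disassembly.

```python
import math


def count_primes(max_prime):
    """Count primes in [3, max_prime) by trial division.

    Returns (primes_found, modulo_ops) so the number of `c % l`
    operations (the work that compiles to div) is visible.
    The loop shape is an assumption reconstructed from the slide.
    """
    primes = 0
    modulo_ops = 0
    for c in range(3, max_prime):
        t = math.sqrt(c)      # assumed loop bound, held as a double
        l = 2
        is_prime = True
        while l <= t:
            modulo_ops += 1   # one integer modulo per inner iteration
            if c % l == 0:
                is_prime = False
                break
            l += 1
        if is_prime:
            primes += 1
    return primes, modulo_ops
```

For example, `count_primes(20)` returns `(7, 24)`: seven primes found at the cost of 24 modulo operations. Almost all of the remaining work is loop bookkeeping, which is consistent with a profile dominated by a single instruction.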
    
slide 9:
    Instruction-level Analysis
    ● Determined it’s really a div benchmark
    ● Other processor has a faster div
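
To see how a faster `div` alone could account for a headline number, here is a rough Amdahl's-law sketch in Python. It assumes that the 85% cycle share from the profile is entirely `div` time and that everything else runs at the same speed on both processors. Neither assumption is stated in the talk, which gives no `div` latencies, so this is illustrative arithmetic rather than a measurement. The function names are hypothetical.

```python
def overall_speedup(fraction, component_speedup):
    """Amdahl-style overall speedup when only `fraction` of time gets faster."""
    return 1.0 / ((1.0 - fraction) + fraction / component_speedup)


def required_component_speedup(fraction, target):
    """Component speedup needed for `target` overall speedup.

    Returns None when the target is unreachable, because the untouched
    (1 - fraction) share of time caps the achievable speedup.
    """
    remaining = 1.0 / target - (1.0 - fraction)
    if remaining <= 0:
        return None
    return fraction / remaining


# Figures taken from the slides: 85% of cycles in div, 2.6x benchmark result.
k = required_component_speedup(0.85, 2.6)
print(f"div would need to be about {k:.2f}x faster")
```

Under these assumptions a `div` roughly 3.6x faster would be enough to produce the 2.6x result, without the rest of the processor being any faster.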
    
slide 10:
    Netflix Cloud

slide 11:
    Challenges
    ● This benchmark is widely used
    ● Cycle analysis is nearly impossible in the cloud
        Under hypervisors: limited PMCs; no PEBS
    ● Accurate benchmarking needs senior engineers
    
slide 12:
    ~100% of benchmarks are wrong
    
slide 13:
    My Benchmarking Checklist
    Why not double?
    Was it tuned?
    Did it break limits?
    Did it error?
    Does it reproduce?
    Does it matter?
    Did it even happen?
    https://www.brendangregg.com/blog/2018-06-30/benchmarking-checklist.html
    
slide 14:
    An Exciting New Era of Processor Innovation
    Vertical stacking, new capabilities
    More processors & competition
    
slide 15:
    But also a Challenging New Era of Processor Benchmarking
    Increased demand
    Hard to debug in the cloud
    Popular benchmarks can be wrong
    
slide 16:
    Good benchmarking drives innovation
    
slide 17:
    Thank you.
    Brendan Gregg
    @brendangregg