Intel InnovatiON 2021: Processor Benchmarking
Case study by Brendan Gregg (Netflix) for Intel InnovatiON 2021.
Video: https://www.youtube.com/watch?v=-Y4pDLhqKI4
Description: "A short summary of processor benchmarking for the Netflix cloud by Brendan Gregg: a case study of misleading results, and methodologies to do accurate benchmarking."
PDF: IntelON2021_ProcessorBenchmarking.pdf
Keywords (from pdftotext):
slide 1:
Processor Benchmarking
Brendan Gregg, Senior Performance Engineer
IntelON, Oct 2021
slide 2:
Case Study (2021)
New processor
Popular CPU benchmark: 2.6x faster than Intel
What would you do?
slide 3:
~100% of benchmarks are wrong
slide 4:
Active Benchmarking
Low-level analysis while it is still running
Not just statistical analysis of the results
slide 5:
Flame Graphs
Showed CPU time was in a single function
Flame Graphs are now in Intel VTune!
slide 6:
Instruction-Level Profiling...
slide 7:
    linux$ perf top -e cycles:ppp -p 18641
    Samples: 274K of event 'cycles:ppp', 4000 Hz, Event count (approx.): 61489970617
           for(l = 2; l ...
           if (c % l == 0)
      0.15 │20296:   test   $0x1,%bl
      0.15 │20299: ↑ je     20270
           for(l = 2; l ...
           │202a2:   nopw   0x0(%rax,%rax,1)
      3.57 │202a8:   pxor   %xmm0,%xmm0
      0.21 │202ac:   cvtsi2sd %rcx,%xmm0
      0.26 │202b1:   comisd %xmm0,%xmm1
      3.51 │202b5: ↑ jb     20270
           if (c % l == 0)
      0.09 │202b7:   mov    %rbx,%rax
      0.02 │202ba:   xor    %edx,%edx
     85.00 │202bc:   div    %rcx
      0.12 │202bf:   test   %rdx,%rdx
slide 8:
Same perf top listing.
slide 9:
Same perf top listing, annotated: 85% of cycles in the div instruction
Instruction-level Analysis
● Determined it’s really a div benchmark
● Other processor has a faster div
slide 10:
Netflix Cloud
slide 11:
Challenges
● This benchmark is widely used
● Cycle analysis is nearly impossible in the cloud
  Under hypervisors: limited PMCs; no PEBS
● Accurate benchmarking needs senior engineers
slide 12:
~100% of benchmarks are wrong
slide 13:
My Benchmarking Checklist
Why not double?
Was it tuned?
Did it break limits?
Did it error?
Does it reproduce?
Does it matter?
Did it even happen?
https://www.brendangregg.com/blog/2018-06-30/benchmarking-checklist.html
slide 14:
An Exciting New Era of Processor Innovation
Vertical stacking, new capabilities
More processors & competition
slide 15:
But also a Challenging New Era of Processor Benchmarking
Increased demand
Hard to debug in the cloud
Popular benchmarks can be wrong
slide 16:
Good benchmarking drives innovation
slide 17:
Thank you.
Brendan Gregg @brendangregg