IntelON2021_ProcessorBenchmarking.pdf

Intel InnovatiON 2021: Processor Benchmarking

Case study by Brendan Gregg (Netflix) for Intel InnovatiON 2021

Video: https://www.youtube.com/watch?v=-Y4pDLhqKI4

Description: "A short summary of processor benchmarking for the Netflix cloud by Brendan Gregg: a case study of misleading results, and methodologies to do accurate benchmarking."

PDF: IntelON2021_ProcessorBenchmarking.pdf

Keywords (from pdftotext):

slide 1:

Processor Benchmarking
Brendan Gregg
Senior Performance Engineer
IntelON, Oct 2021

slide 2:

Case Study (2021)
New processor
Popular CPU benchmark: 2.6x faster than Intel
What would you do?

slide 3:

~100% of benchmarks are wrong

slide 4:

Active Benchmarking
Low-level analysis while it is still running
Not just statistical analysis of the results
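The idea of analyzing a benchmark while it is still running can be illustrated with a minimal sketch. It is not the tooling used in the talk: it assumes CPython (for `sys._current_frames`), and `busy_divide` and `sample_while_running` are hypothetical names introduced only for this illustration. A sampler periodically records which function the workload thread is executing, so the result says what the benchmark actually spent its time doing, not just how long it took.

```python
import collections
import sys
import threading
import time


def busy_divide(seconds):
    """Hypothetical stand-in workload: repeated modulo operations."""
    deadline = time.monotonic() + seconds
    n = 0
    while time.monotonic() < deadline:
        n += 1_000_003 % (n % 97 + 2)
    return n


def sample_while_running(workload, *args, interval=0.005):
    """Run workload in a thread and sample its current function name.

    Relies on sys._current_frames(), which is CPython-specific.
    Returns a Counter mapping function name -> number of samples.
    """
    counts = collections.Counter()
    worker = threading.Thread(target=workload, args=args)
    worker.start()
    while worker.is_alive():
        frame = sys._current_frames().get(worker.ident)
        if frame is not None:
            counts[frame.f_code.co_name] += 1
        time.sleep(interval)
    worker.join()
    return counts


if __name__ == "__main__":
    for name, hits in sample_while_running(busy_divide, 0.3).most_common():
        print(f"{hits:6d}  {name}")
```

On a real system the same role is played by a system profiler such as `perf`, which samples native code rather than Python frames.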

slide 5:

Flame Graphs
Showed CPU time was in a single function
Flame Graphs are now in Intel VTune!
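A rough sketch of how stack samples reduce to the finding "CPU time was in a single function": samples are collapsed into the semicolon-separated "folded" format that flame graph tools consume, and the share of samples per leaf function is computed. The sample data and the function names (`main`, `run_test`, `check_prime`, `report`) are hypothetical and are not taken from the benchmark in the talk.

```python
from collections import Counter


def fold_stacks(samples):
    """Collapse stack samples (root-first lists of frame names) into
    folded lines of the form 'frame1;frame2;frame3 count'."""
    counts = Counter(";".join(stack) for stack in samples)
    return [f"{stack} {n}" for stack, n in sorted(counts.items())]


def leaf_share(samples):
    """Fraction of samples whose leaf (on-CPU) frame is each function."""
    leaves = Counter(stack[-1] for stack in samples)
    total = sum(leaves.values())
    return {name: n / total for name, n in leaves.items()}


if __name__ == "__main__":
    # Hypothetical samples: 9 of 10 land in one leaf function.
    samples = [["main", "run_test", "check_prime"]] * 9 + [["main", "report"]]
    print("\n".join(fold_stacks(samples)))
    print(leaf_share(samples))
```

When one leaf function holds nearly all samples, the flame graph shows a single wide tower, which is the cue to look inside that function at the instruction level.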

slide 6:

Instruction-Level Profiling...

slide 7:

linux$ perf top -e cycles:ppp -p 18641
Samples: 274K of event 'cycles:ppp', 4000 Hz, Event count (approx.): 61489970617
             for(l = 2; l [...]
               if (c % l == 0)
  0.15 │20296:   test     $0x1,%bl
  0.15 │20299: ↑ je       20270 [...]
             for(l = 2; l [...]
       │202a2:   nopw     0x0(%rax,%rax,1)
  3.57 │202a8:   pxor     %xmm0,%xmm0
  0.21 │202ac:   cvtsi2sd %rcx,%xmm0
  0.26 │202b1:   comisd   %xmm0,%xmm1
  3.51 │202b5: ↑ jb       20270 [...]
               if (c % l == 0)
  0.09 │202b7:   mov      %rbx,%rax
  0.02 │202ba:   xor      %edx,%edx
 85.00 │202bc:   div      %rcx
  0.12 │202bf:   test     %rdx,%rdx

slide 8:

linux$ perf top -e cycles:ppp -p 18641
Samples: 274K of event 'cycles:ppp', 4000 Hz, Event count (approx.): 61489970617
             for(l = 2; l [...]
               if (c % l == 0)
  0.15 │20296:   test     $0x1,%bl
  0.15 │20299: ↑ je       20270 [...]
             for(l = 2; l [...]
       │202a2:   nopw     0x0(%rax,%rax,1)
  3.57 │202a8:   pxor     %xmm0,%xmm0
  0.21 │202ac:   cvtsi2sd %rcx,%xmm0
  0.26 │202b1:   comisd   %xmm0,%xmm1
  3.51 │202b5: ↑ jb       20270 [...]
               if (c % l == 0)
  0.09 │202b7:   mov      %rbx,%rax
  0.02 │202ba:   xor      %edx,%edx
 85.00 │202bc:   div      %rcx
  0.12 │202bf:   test     %rdx,%rdx
85% of cycles in the div instruction

slide 9:

Instruction-level Analysis
● Determined it’s really a div benchmark
● Other processor has a faster div
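To see why a loop built around `if (c % l == 0)` ends up measuring the divider, here is a small sketch of a trial-division prime search that also counts its modulo operations. The loop bound (the integer square root of `c`) and the function name are assumptions made for illustration; the slide's source lines are truncated, so this is not the benchmark's actual code.

```python
import math


def count_primes_and_divisions(max_prime):
    """Trial-division prime search over 3..max_prime.

    Returns (number of primes found, number of modulo operations).
    The sqrt loop bound is an assumption, not taken from the benchmark.
    """
    primes = 0
    divisions = 0
    for c in range(3, max_prime + 1):
        is_prime = True
        for l in range(2, math.isqrt(c) + 1):
            divisions += 1          # one modulo (div) per inner iteration
            if c % l == 0:
                is_prime = False
                break
        if is_prime:
            primes += 1
    return primes, divisions


if __name__ == "__main__":
    primes, divisions = count_primes_and_divisions(10_000)
    print(f"primes={primes} modulo_ops={divisions}")
```

Each inner iteration does almost nothing except one modulo, so on hardware the run time of such a loop tends to track the latency of the division instruction rather than general processor performance.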

slide 10:

Netflix Cloud

slide 11:

Challenges
● This benchmark is widely used
● Cycle analysis is nearly impossible in the cloud
Under hypervisors: Limited PMCs; no PEBS
● Accurate benchmarking needs senior engineers

slide 12:

~100% of benchmarks are wrong

slide 13:

My Benchmarking Checklist
Why not double?
Was it tuned?
Did it break limits?
Did it error?
Does it reproduce?
Does it matter?
Did it even happen?
https://www.brendangregg.com/blog/2018-06-30/benchmarking-checklist.html
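Two of the checklist questions, "Does it reproduce?" and "Why not double?", lend themselves to a small numeric check. The sketch below is one possible way to do it, not the author's method: the function names, the 5% variation threshold, and the sample scores are all hypothetical. It summarizes repeated runs of a throughput-style score (higher is better) and reports a speedup ratio only alongside whether both sets of runs were stable.

```python
import statistics


def summarize_runs(scores):
    """Mean, sample standard deviation, and coefficient of variation."""
    mean = statistics.mean(scores)
    stdev = statistics.stdev(scores) if len(scores) > 1 else 0.0
    return {"mean": mean, "stdev": stdev, "cv": stdev / mean}


def compare(baseline_scores, candidate_scores, max_cv=0.05):
    """Speedup ratio plus a flag for whether both run sets reproduce.

    max_cv is an arbitrary threshold chosen for illustration.
    """
    base = summarize_runs(baseline_scores)
    cand = summarize_runs(candidate_scores)
    return {
        "speedup": cand["mean"] / base["mean"],
        "reproducible": base["cv"] <= max_cv and cand["cv"] <= max_cv,
    }


if __name__ == "__main__":
    # Hypothetical scores from three runs on each processor.
    print(compare([100, 101, 99], [260, 258, 262]))
```

A stable ratio answers only whether the number reproduces; explaining why it is 2.6x and not 2x still requires the kind of analysis shown on the earlier slides.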

slide 14:

An Exciting New Era of Processor Innovation
Vertical stacking, new capabilities
More processors & competition

slide 15:

But also a Challenging New Era of Processor Benchmarking
Increased demand
Hard to debug in the cloud
Popular benchmarks can be wrong

slide 16:

Good benchmarking drives innovation

slide 17:

Thank you.
Brendan Gregg
@brendangregg