AWS re:Invent 2017: How Netflix Tunes EC2

My last talk for 2017 was at AWS re:Invent, on "How Netflix Tunes EC2 Instances for Performance," an updated version of my 2014 talk. There was so much demand for it this year that I had three overflow rooms streaming it, and people still couldn't get in. (I shouldn't let this go to my head, as there were 42,000 attendees at re:Invent looking for something to see!) Fortunately, it was videoed for those who missed it.

A video of the talk is on YouTube:

Here are the slides, also available as a PDF:


I love this talk as I get to share more about what the Performance and Operating Systems team at Netflix does, rather than just my work. Our team looks after the BaseAMI, kernel tuning, OS performance tools and profilers, and self-service tools like Vector. We're not the only people doing performance and performance tuning at Netflix either: all the development teams do performance work. We help where we can.

My talk included a section on Linux kernel tunables, as follows. WARNING: These tunables were developed in late 2017, for Ubuntu Xenial instances on EC2.

CPU

schedtool -B PID
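schedtool -B puts a task into the SCHED_BATCH scheduling policy, which tells the scheduler to treat it as CPU-bound and non-interactive. As a minimal sketch, assuming Python 3.3+ on Linux, the same change can be made with the standard library's os.sched_setscheduler(); the helper names set_batch_policy and is_batch are hypothetical.

```python
import os


def set_batch_policy(pid: int = 0) -> None:
    """Switch `pid` (0 means the calling thread) to SCHED_BATCH.

    Roughly what `schedtool -B PID` does. SCHED_BATCH requires a static
    priority of 0, and the nice value is left unchanged. On Linux this
    affects only the given thread ID, not every thread of a process.
    """
    os.sched_setscheduler(pid, os.SCHED_BATCH, os.sched_param(0))


def is_batch(pid: int = 0) -> bool:
    """Return True if `pid` is currently scheduled with SCHED_BATCH."""
    return os.sched_getscheduler(pid) == os.SCHED_BATCH
```

For example, set_batch_policy(1234) followed by is_batch(1234) would switch and then confirm a (hypothetical) PID 1234.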

Virtual Memory

vm.swappiness = 0       # from 60
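Each sysctl name maps to a file under /proc/sys, with the dots becoming path separators. As a minimal sketch, assuming Python 3.9+, this reads a value and compares it with a recommended one; the function names are hypothetical, and the root parameter exists only so the code can be pointed at a test directory.

```python
from pathlib import Path

PROC_SYS = Path("/proc/sys")


def sysctl_path(name: str, root: Path = PROC_SYS) -> Path:
    """Map a dotted sysctl name such as "vm.swappiness" to its /proc/sys file.

    Assumes no single component contains a dot, which holds for the keys
    in this post but not for every per-interface net.* key.
    """
    return root.joinpath(*name.split("."))


def read_sysctl(name: str, root: Path = PROC_SYS) -> str:
    """Return the current value with runs of whitespace collapsed to one space."""
    return " ".join(sysctl_path(name, root).read_text().split())


def check_sysctl(name: str, wanted: str, root: Path = PROC_SYS) -> bool:
    """True if the live value matches the recommended one."""
    return read_sysctl(name, root) == " ".join(wanted.split())
```

On a live system, check_sysctl("vm.swappiness", "0") reads /proc/sys/vm/swappiness.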

Huge Pages

# echo madvise > /sys/kernel/mm/transparent_hugepage/enabled
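With madvise, only memory regions that an application requests via madvise(MADV_HUGEPAGE) get transparent huge pages. Reading the sysfs file back shows every option with the active one in brackets, for example "always [madvise] never". A minimal sketch for extracting the active mode, assuming Python 3.9+; the function names are hypothetical.

```python
from pathlib import Path

THP_ENABLED = Path("/sys/kernel/mm/transparent_hugepage/enabled")


def active_thp_mode(text: str) -> str:
    """Return the bracketed (active) choice, e.g. "madvise"."""
    for word in text.split():
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    raise ValueError(f"no active mode found in {text!r}")


def current_thp_mode(path: Path = THP_ENABLED) -> str:
    """Read the live setting; only meaningful on Linux with THP support."""
    return active_thp_mode(path.read_text())
```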

NUMA

kernel.numa_balancing = 0
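Automatic NUMA balancing is only relevant on instances with more than one NUMA node. As a minimal sketch, assuming Python 3.9+, this counts online nodes by expanding the kernel's range-list format (such as "0" or "0-1") from /sys/devices/system/node/online; the function names are hypothetical.

```python
from pathlib import Path

NODE_ONLINE = Path("/sys/devices/system/node/online")


def parse_range_list(text: str) -> list[int]:
    """Expand a kernel range list such as "0-1,3" into [0, 1, 3]."""
    ids: list[int] = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            ids.extend(range(int(lo), int(hi) + 1))
        else:
            ids.append(int(part))
    return ids


def numa_node_count(path: Path = NODE_ONLINE) -> int:
    """Number of online NUMA nodes on this instance."""
    return len(parse_range_list(path.read_text()))
```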

File System

vm.dirty_ratio = 80                     # from 40
vm.dirty_background_ratio = 5           # from 10
vm.dirty_expire_centisecs = 12000       # from 3000
mount -o defaults,noatime,discard,nobarrier …
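To make the percentages concrete, here is a small worked calculation, assuming an instance with 60 GiB of available memory (an illustrative figure, not from the talk). vm.dirty_ratio is the point at which a process generating writes must start writing out dirty data itself, vm.dirty_background_ratio is where background writeback begins, and vm.dirty_expire_centisecs is how old dirty data must be before the flusher threads write it out. The function names are hypothetical.

```python
GIB = 1024 ** 3


def dirty_limits(available_bytes: int, dirty_ratio: int,
                 dirty_background_ratio: int) -> tuple[int, int]:
    """Return (throttle, background) dirty-page limits in bytes.

    Both ratios are percentages of available memory (free plus
    reclaimable pages), not of total RAM.
    """
    throttle = available_bytes * dirty_ratio // 100
    background = available_bytes * dirty_background_ratio // 100
    return throttle, background


def expire_seconds(dirty_expire_centisecs: int) -> float:
    """Convert vm.dirty_expire_centisecs (hundredths of a second) to seconds."""
    return dirty_expire_centisecs / 100


if __name__ == "__main__":
    available = 60 * GIB  # illustrative figure, not from the talk
    for label, ratio, bg, expire in (("old", 40, 10, 3000), ("new", 80, 5, 12000)):
        throttle, background = dirty_limits(available, ratio, bg)
        print(f"{label}: throttle at {throttle // GIB} GiB, "
              f"background writeback at {background // GIB} GiB, "
              f"expire after {expire_seconds(expire):.0f} s")
```

Under that assumption, the new settings raise the throttle point from 24 GiB to 48 GiB, start background writeback at 3 GiB instead of 6 GiB, and let dirty data age 120 seconds instead of 30.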

Storage I/O

/sys/block/*/queue/rq_affinity      2
/sys/block/*/queue/scheduler        noop
/sys/block/*/queue/nr_requests      256
/sys/block/*/queue/read_ahead_kb    256
mdadm --chunk=64 ...
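The sysfs settings above are per device and do not survive a reboot. As a minimal sketch, assuming Python 3.9+ and root access on a real system, this writes each value to every /sys/block/*/queue directory that has the attribute. The function name is hypothetical, and treating a rejected write as "skip" is a design choice of this sketch: "noop" is the legacy (non-blk-mq) scheduler name, and blk-mq queues call the equivalent "none".

```python
from pathlib import Path

# Values from the Storage I/O list above.
QUEUE_SETTINGS = {
    "rq_affinity": "2",
    "scheduler": "noop",
    "nr_requests": "256",
    "read_ahead_kb": "256",
}


def apply_queue_settings(settings: dict[str, str] = QUEUE_SETTINGS,
                         sys_block: Path = Path("/sys/block")) -> list[Path]:
    """Write each setting to <sys_block>/*/queue/<name>; return the files written.

    Attributes a device lacks are skipped, as are values the kernel rejects
    (the write raises OSError, which includes permission errors).
    """
    written = []
    for queue in sorted(sys_block.glob("*/queue")):
        for name, value in settings.items():
            target = queue / name
            if not target.exists():
                continue
            try:
                target.write_text(value)
            except OSError:
                continue
            written.append(target)
    return written
```

The mdadm chunk size is different: it is chosen when the RAID array is created, not through sysfs.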

Networking

net.core.somaxconn = 1000
net.core.netdev_max_backlog = 5000
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_wmem = 4096 12582912 16777216
net.ipv4.tcp_rmem = 4096 12582912 16777216
net.ipv4.tcp_max_syn_backlog = 8096
net.ipv4.tcp_slow_start_after_idle = 0
net.ipv4.tcp_tw_reuse = 1
net.ipv4.ip_local_port_range = 10240 65535
net.ipv4.tcp_abort_on_overflow = 1    # maybe
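One way to persist sysctls like these is a file in /etc/sysctl.d/ using the "key = value" syntax of sysctl.conf(5). As a minimal sketch, assuming Python 3.9+, this renders the networking list as such a file; the function names are hypothetical, and tcp_abort_on_overflow is deliberately left out because the talk only marks it "maybe".

```python
from pathlib import Path

# Networking values from the list above (tcp_abort_on_overflow omitted).
NETWORK_SYSCTLS = {
    "net.core.somaxconn": "1000",
    "net.core.netdev_max_backlog": "5000",
    "net.core.rmem_max": "16777216",
    "net.core.wmem_max": "16777216",
    "net.ipv4.tcp_wmem": "4096 12582912 16777216",
    "net.ipv4.tcp_rmem": "4096 12582912 16777216",
    "net.ipv4.tcp_max_syn_backlog": "8096",
    "net.ipv4.tcp_slow_start_after_idle": "0",
    "net.ipv4.tcp_tw_reuse": "1",
    "net.ipv4.ip_local_port_range": "10240 65535",
}


def render_sysctl_conf(settings: dict[str, str]) -> str:
    """Render settings as "key = value" lines, the syntax sysctl.conf(5) reads."""
    return "".join(f"{key} = {value}\n" for key, value in settings.items())


def write_sysctl_conf(settings: dict[str, str], path: Path) -> None:
    """Save the rendered settings, e.g. to a file under /etc/sysctl.d/."""
    path.write_text(render_sysctl_conf(settings))
```

Written as root to a file such as /etc/sysctl.d/60-net-tuning.conf (a hypothetical name), the settings can be loaded immediately with `sysctl -p /etc/sysctl.d/60-net-tuning.conf`, and files in that directory are also applied at boot.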

Hypervisor (Xen)

echo tsc > /sys/devices/system/clocksource/clocksource0/current_clocksource
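The echo above takes effect immediately but lasts only until reboot, and it fails if the kernel does not offer tsc. As a minimal sketch, assuming Python 3.9+, this checks available_clocksource first and only selects tsc when it is listed; the function names are hypothetical, and the base parameter exists so the code can be pointed at a test directory.

```python
from pathlib import Path

CLOCKSOURCE = Path("/sys/devices/system/clocksource/clocksource0")


def clocksource_status(base: Path = CLOCKSOURCE) -> tuple[str, list[str]]:
    """Return (current, available) clocksource names."""
    current = (base / "current_clocksource").read_text().strip()
    available = (base / "available_clocksource").read_text().split()
    return current, available


def switch_to_tsc(base: Path = CLOCKSOURCE) -> bool:
    """Select tsc if the kernel offers it; return True if it is now current."""
    current, available = clocksource_status(base)
    if current == "tsc":
        return True
    if "tsc" not in available:
        return False
    # Same effect as the echo above; writing requires root on a real system.
    (base / "current_clocksource").write_text("tsc\n")
    return clocksource_status(base)[0] == "tsc"
```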

Not a lot has changed with these tunables since my 2014 talk.

What I was most excited about was the launch of a new EC2 hypervisor, which I referred to in the video as the "c5 hypervisor". Later that night its real name was announced, the "Nitro" hypervisor, along with the bare metal instance type. My previous post, Introducing Nitro, explains it and the hypervisor development journey.

Many other Netflix staff spoke at re:Invent. Here are the talks from my immediate colleagues in building F, level 2 at Netflix:

Vadim Filanovsky (perf team) co-presented Auto Scaling Made Easy: How Target Tracking Scaling Policies Hit the Bullseye
Dave Hahn (CORE team) gave an updated A Day in the Life of a Netflix Engineer III
Nora Jones (chaos team) gave a keynote on Why We Need More Chaos - Chaos Engineering, That Is, as well as a talk Performing Chaos at Netflix Scale
Casey Rosenthal (traffic and chaos) presented Models of Availability
John Bennett (networking) co-presented How Netflix Monitors Applications in Near Real-Time with Amazon Kinesis
Donovan Fritz and Joel Kodama (networking) presented A Day in the Life of a Cloud Network Engineer at Netflix
Alex Maestretti (security) co-presented SecOps 2021 Today: Using AWS Services to Deliver SecOps
Will Bengtson (security) co-presented Best Practices for Managing Security Operations on AWS
Patrick Kelley and Travis McPeak (security) presented Using Access Advisor to Strike the Balance Between Security and Usability
Andrew Spyker (Titus) co-presented Elastic Load Balancing Deep Dive and Best Practices
Andrew Park and Sebastien de Larquier presented Tooling Up for Efficiency: DIY Solutions @ Netflix
Rajan Mittal and Andrew Park presented Why Regional Reserved Instances Are a Game Changer for Netflix
Monal Daxini presented Netflix Keystone SPaaS: Real-time Stream Processing as a Service
Our department director, Coburn Watson, presented Walking the tightrope: Balancing Innovation, Reliability, Security, and Efficiency

Check them out. It's awesome to see my coworkers on the big stage doing great!

Brendan Gregg's Blog
