Working at Netflix

20 Jan 2015

I've been at Netflix now for several months, and have found it to be an amazing place to work. What has surprised me most is the culture, how different it is to other companies, and how well it works.

In this post I'll describe my experience at Netflix: starting with recruiting, the culture, my work as a performance architect, and finally our mission. I'm excited by what we are doing as a company, and I hope this can (or can continue to) inspire positive changes across our industry.

I hope to also address the questions I'm frequently asked nowadays, including: "Is the culture deck true?", and "How is performance work at Netflix?".

No one at Netflix asked me to write this, and these are my own opinions.

Recruiting

My first direct experience with Netflix was the hiring process. Netflix recruiting is outstanding. I was scheduled quickly and with the right people: my would-be manager, co-workers, and higher management. The focus was on finding out if we were a good fit for each other.

I had had quite different experiences interviewing with some other companies. One major tech company was not just initially slow to interview, but focused the interviews on topics unrelated to my expected role. I was told that the hiring process had known issues, which couldn't be fixed. Really? If hiring is broken, then what else is broken? And who would my co-workers be, who passed an unrelated interview? This was my first exposure to the company, and it showed a culture of "stupid things happen, we know they’re stupid, and they can't be fixed". (I don't even think the recruiters were to blame; this was a company problem.)

The question of compensation was also handled differently at Netflix. In many hiring negotiations, both parties try to bluff their way around this topic. With Netflix, I was encouraged to find out my market worth and discuss it with them, so that we could both agree on what constituted a good salary. They also did their own research by collecting data from candidates and other companies throughout the hiring process, to ensure the offer was top of market. I think some other companies are terrified of staff discovering what they are really worth! But this openness and honesty was characteristic of the Netflix recruiting process.

As part of the recruiting process, I was encouraged to study the culture deck. I did, and found it attractive.

Culture

Many companies talk about how great their culture is, but this is often more aspiration than reality. After joining such a company, the real culture is learned by word of mouth, or trial and error. However, with Netflix, the culture deck is true. I think its emphasis in the recruitment stage reinforces this, as everyone learns how to act before they walk in the door.

One of the key principles in the culture is "freedom and responsibility": you have the freedom to do the right thing – provided you take responsibility. Management’s role is to provide context and help, not be an obstacle. This is awesome: I have a long list of projects I want to do that will help Netflix, and I'm in an environment that helps me do them.

But how does this work? It's not freedom gone wild. You could introduce a new technology, for example, provided you plan who will service and maintain it, how it's debugged when it fails, how fault tolerance works, etc. Freedom and responsibility.

This works because Netflix hires professionals: people who have good judgement to start with and who use freedom wisely. People with self-discipline. This doesn't mean we're always right: freedom and responsibility includes having the courage and curiosity to try risky projects, when it makes sense to do so, as these can lead to important innovations. The focus is always on making a positive impact for the company.

Apart from professionalism, Netflix also aims to hire high performers. People who are self-driven, highly productive, and who also work well with others. This means no "brilliant jerks", who are explicitly not welcome (this has been described as a polite version of the "no asshole rule" [1]).

There's much more in the culture deck. I may be able to help explain it by describing what I think Netflix is not:

  • It's not the kind of company where known stupid things happen that can't be fixed (like the flawed recruiting process, or the antics parodied by Dilbert).
  • It's not the kind of company where managers or departments hate each other, and use their roles to conduct political warfare.
  • It's not the kind of company where bad mistakes are frequently made, and no one is held accountable.
  • It's not the kind of company where all effort is on fire-fighting, and little on fire-proofing.
  • It's not the kind of company that locks its engineers in the basement, for fear of them being poached.
  • It's not a company where an unhealthy work/life balance is either encouraged or necessary.
  • It's not a company suffering Not Invented Here syndrome, No Bad News syndrome, or groupthink.
  • And it's not a company where waste or insanity run rampant.

It's like... a company run by adults instead of children! (Adultlike behavior has even been described previously as a tenet of hiring: "hire, reward, and tolerate only fully formed adults" [2].)

While the Netflix culture attracted me, it doesn't suit everyone, as described on slide 38. And that's ok. Part of being open (and honest) about our culture is that it helps people self-select.

Performance Engineering

There are a few companies doing important work in performance, and Netflix is one of them. With over 50 million subscribers, the largest cloud environment, and handling over a third of the US Internet traffic at night, there are numerous opportunities for performance engineering. This involves not just applying existing practices, but developing the state of the art.

I'm working on many technologies: AWS, Linux, FreeBSD, Java, Node.js, Perl, Python, Cassandra, Nginx, ftrace, perf_events, and eBPF to name a few. There are many technologies we've developed ourselves, which are typically open sourced, including the rxNetty reactive framework, Atlas for performance monitoring, and (coming up) Vector for instance analysis. There's also hardware performance work (for the FreeBSD appliances) and capacity planning.

My day-to-day work includes resolving immediate issues of poor performance, and short- and long-term projects. We're using Linux on our cloud, where advanced performance analysis tools have historically been lacking. I have the freedom to figure out what best to do, which has included developing ftrace and perf_events performance analysis tools, for short-term wins (perf-tools); Java HotSpot hacking, for short- and long-term wins; and eBPF testing, for long-term wins. I'll be putting some time into the other tracers (SystemTap, ktap, LTTng), to find wins from them, too.

The FreeBSD Open Connect Appliances, our CDN which streams the actual content, are amazing to work on. FreeBSD provides the most advanced performance analysis environment, including many standard tools as well as pmcstat and DTrace (I summarized these in my MeetBSDCA 2014 talk: slides, video). I was delighted to recently develop CPI flame graphs on the OCAs, using pmcstat.

I've been posting other specific work here on my blog. In particular, From Clouds to Roots shows the full performance analysis process for our cloud, from cloud-wide to instance-level tools. I've found analysis of Linux instances to be easier than I’d feared, thanks to ftrace and perf_events being built into the kernel. The biggest challenge has been determining low-level CPU behavior in the cloud, where we currently lack access to CPU performance counters. However, I've made some progress using MSRs instead.

I'm part of a great team, Performance and Reliability Engineering. I'm not just applying my skills, but learning more from other colleagues, and developing those skills in our environment. We're also expanding our team, and hiring performance and systems engineering roles (if interested, contact Coburn, our manager).

Mission

Netflix’s mission is to change how entertainment is consumed worldwide, by building a good product that people choose to buy. It's exciting: we are pioneering the modern age of entertainment, and taking on all the technical and political challenges that this involves.

We aren't building a technical facade, where the real mission is to sell the company. We aren't winning by unsavory sales, legal, or marketing tactics. And our customers are not unwittingly themselves the product (we don't read their private emails). We want to win by making a product so good that people choose to buy it. We're an honest company.

Conclusion

When I joined Netflix, I expected many aspects to be excellent: the technical challenges, my colleagues, the compensation, the mission, and the ambition of the company – and they are. What surprised me most by its excellence was the culture. It's made me think differently about our industry: Netflix is proving that company culture doesn't just have to be accepted; it can be engineered to be positive.

In this post I described my opinions of Netflix after ten months at the company. I'm motivated to write because I think the topic of positive company cultures is worth discussing. For more about the Netflix culture, apart from the culture deck, you can read how Netflix reinvented HR, behind the slides, and the woman behind the Netflix culture doc. Maybe you can experience this culture directly (we are hiring), or perhaps more of our industry can adopt or engineer their own positive cultures.

UPDATE: Working at Netflix 2016.

UPDATE: Working at Netflix 2017.

UPDATE: Many people have been emailing me their resumes. I'm glad they are interested in working at Netflix, but I'm an engineer, not a hiring manager. Please use jobs.netflix.com, which will send your details to the right people.


