A large and unexpected opportunity has come my way outside of Netflix that I've decided to try. Netflix has been the best job of my career so far, and I'll miss my colleagues and the culture.
[Images: desk (2020); office wall]
In 2014 I joined Netflix, a company at the forefront of cloud computing with an attractive work culture. It was the most challenging job among those I interviewed for. On the Netflix Java/Linux/EC2 stack there were no working mixed-mode flame graphs, no production-safe dynamic tracer, and no PMCs: all tools I used extensively for advanced performance analysis. How would I do my job? I realized that this was a challenge I was best suited to take on. I could help not only Netflix but all customers of the cloud.
Since then I've done just that. I developed the original JVM changes to allow mixed-mode flame graphs, I pioneered using eBPF for observability and helped develop the front-ends and tools, and I worked with Amazon to get PMCs enabled and developed tools to use them. Low-level performance analysis is now possible in the cloud, and with it I've helped Netflix save a very large amount of money, mostly from service teams using flame graphs. There is also now a flourishing industry of observability products based on my work.
Apart from developing tools, much of my time has been spent helping teams with performance issues and evaluations. The Netflix stack is more diverse than I was expecting, and is explained in detail in the Netflix tech blog: The production cloud is AWS EC2, Ubuntu Linux, Intel x86, mostly Java with some Node.js (and other languages), microservices, Cassandra (storage), EVCache (caching), Spinnaker (deployment), Titus (containers), Apache Spark (analytics), Atlas (monitoring), FlameCommander (profiling), and at least a dozen more applications and workloads (but no 3rd party agents in the BaseAMI). The Netflix CDN runs FreeBSD and NGINX (not Linux: I published a Netflix-approved footnote in my last book to explain why). This diverse environment has always provided me with interesting things to explore, to understand, analyze, debug, and improve.
I've also used and helped develop many other technologies for debugging, primarily perf, Ftrace, eBPF (bcc and bpftrace), PMCs, MSRs, Intel VTune, and of course, flame graphs and heat maps. Martin Spier and I also created FlameScope while at Netflix, to analyze perturbations and variation in profiles.
I've also had the chance to do other types of work. For 18 months I joined the CORE SRE team rotation, and was the primary contact for Netflix outages. It was difficult and fascinating work. I've also created internal training materials and classes, apart from my books. I've worked with awesome colleagues not just in cloud engineering, but also in open connect, studio, DVD, NTech, talent, immigration, HR, PR/comms, legal, and most recently ANZ content.
Last time I quit a job, I wanted to share publicly the reasons why I left, but I ultimately did not. I've since been asked many times why I resigned from that job (not unlike The Prisoner), along with much speculation (none true). I wouldn't want the same thing happening here, with people wondering if something bad happened at Netflix that caused me to leave: I had a great time and it's a great company!
I'm thankful for the opportunities and support I've had, especially from my former managers Coburn and Ed. I'm also grateful for the support for my work by other companies, technical communities, social communities (Twitter, Hacker News), conference organizers, and all who have liked my work, developed it further, and shared it with others. Thank you. I hope my last two books, Systems Performance 2nd Ed and BPF Performance Tools, serve Netflix well in my absence, along with everyone else who reads them.
I'll still be posting here in my next job. More on that soon...