Site Reliability Engineer


Our Site Reliability Engineers are the primary interface between Yelp's developers and production infrastructure. No matter how many times we get searched, scraped, scanned, spammed, pinged, paged, or queried, we gotta keep our cool and keep the site running smoothly.

We work in both the dev and systems worlds, implementing key parts of the core architecture and supporting devs as they try to do the same. We get to tackle interesting challenges that you can only find at the kind of scale that serves over 100 million users per month.

With us, you'll build monitoring and alerting systems to support site stability and performance. You'll proactively scale our AWS infrastructure to meet ever-increasing demand. You'll make sure that when something goes bump in the night, someone hears it. In short, you'll play a key role in keeping Yelp fast, available, and growing, connecting users to great local businesses.

What You Will Do:

- Work closely with developers in supporting new features and services

- Build tools to monitor site stability and performance

- Help scale our AWS-based infrastructure (no racking servers or swapping hard drives here!)

- Work with our open source platform as a service, PaaSTA

- Troubleshoot site issues using industry-leading tools like Splunk and SignalFx

- Automate everything with Puppet, Git, Jenkins, and Terraform

- Develop custom tools when off-the-shelf solutions don't work at our scale, and contribute upstream to open source projects

- Design new systems, tests, and procedures

- Participate in light on-call rotations: we have geographically distributed SRE teams for follow-the-sun support, which means no 2:00 AM pages!

What We Are Looking For:

- Mastery of Linux (we use Ubuntu)

- Command of your favourite modern programming language: Python, Ruby, Go, Rust, Java, C++, etc.

- Solid understanding of fundamental technologies like TCP/IP, HTTP, and DNS

- Knowledge of best practices related to security, performance, and disaster recovery

- Experience with web server configuration (Apache/Nginx/HAProxy), monitoring, trending, network design, and high availability

- Strong scripting and automation skills

- Expertise in configuration management (Puppet/Ansible/Chef/etc.)

- Experience with public cloud platforms (we use AWS, but Azure/GCP are fine) and related tooling (Terraform, etc.)

- Experience with Docker or other container technologies

- Excellent communication and documentation skills

What We Offer:

- Private health insurance

- Monthly fitness/gym subsidy

- Income protection, disability, and life insurance

- Flexible working hours and meeting-free Thursdays

- Fully adjustable sit/stand desks

- Regular two-day hackathons and weekly learning groups on a range of interesting topics

- Central London location, close to public transport and restaurants

- Quarterly team offsites
