Senior Site Reliability Engineer

Tech, Full time

Posted: Thursday, 28th April 2022
Job Reference: R006481
Squad: OGM Core
Team: Tech

It's a great time to join our ambitious team and get involved with a business that is focused, fast-paced and a really great place to work.

You'll be encouraged to share your ideas, try new things (and not worry if they don't go right the first time!) and take ownership of your own development. We have a training fund for you to spend each year, as well as the opportunity to earn up to 25% in performance-related bonus. We also have a beer fridge, pizza Fridays, sport always on in the office and regular social events!

Job Title: Site Reliability Engineer (Senior)

Contract type: Full time/Permanent

Reports to: Head of Platform

Department: Platform

Location: Hybrid (Remote and/or Hammersmith)

What you’ll do as a Site Reliability Engineer

  • As a Senior Site Reliability Engineer (SRE), you will ensure our customers get the best quality of service and uptime we can give them.
  • Act as a role model for the more junior members of the team, offering support as and when needed.
  • Identify where we can expect failures, and how we can tolerate them, both in our own systems and in those we depend upon.
  • Work closely with our developers and architects to build and run services and systems that respond consistently to failures by gracefully degrading our services.
  • Be responsible for ensuring the systems and applications we launch remain available, reliable, and efficient at accomplishing their duties even as their duties scale and evolve. 
  • Be involved in every part of our site, from the conception of products and their development to deployment, troubleshooting, and analysis.
  • Design, build and automate tools and processes to ensure and improve scalability, reliability, availability, and performance across areas of technology. 
  • Build, integrate and run tools to inject, predict and identify infrastructure and service failures on an ongoing basis to help optimise our sites.
  • As a team member, you will contribute to, oversee and implement technical projects and have a say in what technology Oddschecker uses.

We need someone who…

  • Has a desire to learn new technologies and apply them where appropriate to improve the quality of our software and processes.
  • Has experience as an engineer at a senior or strong mid level
  • Experienced working with Cloud-based providers (Google Cloud Platform/AWS/Azure)
  • Good experience working with infrastructure as code (Terraform, Google Deployment Manager, AWS CloudFormation)
  • Strong understanding of containerization technologies and orchestration (Kubernetes, Docker Swarm, Nomad)
  • UNIX/Linux systems administration background (CentOS/Ubuntu)
  • 3-4+ years’ experience troubleshooting in Unix/Linux
  • Good understanding of distributed systems, microservices architecture, and principles
  • Experience with HTTP web technologies (Tomcat/Apache/Nginx/HAProxy) and highly available, scalable web architecture
  • Experience with at least one message queue system (Kafka, Google Pub/Sub, RabbitMQ)
  • Good understanding of database administration (MySQL, Elasticsearch)
  • Experience with at least one configuration management solution (Ansible/Puppet/Chef/SaltStack/CFEngine)
  • Experience in using monitoring/observability tools (Prometheus, Grafana, Thanos, Jaeger)
  • Passionate about SRE principles (SLAs, SLOs, SLIs)
  • Programming skills (Java, Go, Python, Bash)
  • Knowledge of service mesh technologies (Istio, Linkerd) desirable
