Site Reliability Engineer

Product, Full time

Hammersmith
Posted: Wednesday, 20th July 2022
Job Reference: R006566
Squad: UK
Team: Product

It's a great time to join our ambitious team and get involved with a business that is focused, fast-paced and a really great place to work.

You'll be encouraged to share your ideas, try new things (and not worry if they don't go right the first time!) and take ownership of your own development. We have a training fund for you to spend each year, as well as the opportunity to earn up to 25% in performance-related bonus. We also have a beer fridge, pizza Fridays, sport always on in the office and regular social events!

What you’ll do as a Site Reliability Engineer

  • As a Site Reliability Engineer (SRE), you will ensure our customers get the best quality of service and uptime we can give them.
  • Act as a role model for the more junior members of the team, offering support as and when needed.
  • Identify where we can expect IT failures, and how we can tolerate them, both in our own systems and in those we depend upon.
  • Work closely with our developers and architects to build and run services and systems that respond consistently to failures by gracefully degrading our services.
  • Be responsible for ensuring the systems and applications we launch remain available, reliable and efficient at accomplishing their duties even as their duties scale and evolve.
  • Be involved in every part of our site, from conception of products and their development to deployment, troubleshooting and analysis.
  • Design, build and automate tools and processes to ensure and improve scalability, availability and performance across areas of technology.
  • Build, integrate and run tools to inject, predict and identify infrastructure and service failures on an ongoing basis to help optimise our sites.
  • As a team member, you will contribute to, oversee and implement technical projects and have a say in what technology oddschecker uses.

We need someone who…

  • Has a desire to learn new technologies and apply them where appropriate to improve the quality of our software and processes.
  • Has experience as an engineer at a senior or strong mid level
  • Has experience working with cloud-based providers (Google Cloud Platform/AWS/Azure)
  • Has good experience working with infrastructure as code (Terraform, Google Deployment Manager, AWS CloudFormation)
  • Has a UNIX/Linux systems administration background (CentOS/Ubuntu)
  • Has 2 to 3+ years’ experience troubleshooting in Unix/Linux
  • Understands TCP/IP network stacks
  • Experience with HTTP web technologies (Tomcat/Apache/Nginx/HAProxy) and highly available, scalable web architecture
  • Good understanding of database administration (MySQL, Elasticsearch)
  • Experience in at least one configuration management solution (Ansible/Puppet/Chef/SaltStack/CFEngine)
  • Experience in using monitoring tools (Prometheus, Grafana, InfluxDB, Telegraf, Cacti, Icinga)
  • Programming skills (Java, Go, Python, Bash)
