Lead Site Reliability Engineer, Full time
It's a great time to join our ambitious team and get involved with a business that is focused, fast-paced and a really great place to work.
You'll be encouraged to share your ideas, try new things (and not worry if they don't go right the first time!) and take ownership of your own development. We have a training fund for you to spend each year, as well as the opportunity to earn up to 25% in performance-related bonus. We also have a beer fridge, pizza Fridays, sport always on in the office and regular social events!
About Oddschecker Global Media
OGM is a global sports betting publishing group with private-equity backing from Bruin Capital. We are the trusted connection between audiences and their favourite sports betting experiences. iGaming is one of the fastest-growing and most technologically innovative sectors, and we're on top of our game, powered by market-leading tech and driven by brilliant people. OGM currently comprises two brands (Oddschecker, WhoScored) and a digital media agency (VIME), with Oddschecker being the leading name in sports betting and odds comparison globally. We champion diversity and operate an open and inclusive culture, as well as being focused, fast-paced and always making sure to have fun along the way. So why not join us at OGM and be part of something bigger…
We are currently looking for a Lead Site Reliability Engineer to join our team in London (Hammersmith). This is a permanent role with a hybrid or remote working model, reporting to the Head of Platform.
What you’ll do as a Lead Site Reliability Engineer
• As a Lead Site Reliability Engineer (SRE), you will ensure our customers get the best quality of service and uptime we can give them.
• Be responsible for building capability and maturing operational ways of working across multiple cross-functional teams, with a focus on technical excellence and a high-performance culture.
• As a Lead SRE, you will contribute to, oversee and implement technical projects and have a say in what technology Oddschecker uses.
• Act as a role model for the members of the team, offering support as and when needed.
• Identify where we can expect IT failures, and how we can tolerate them, both in our own systems and in those we depend upon.
• Work closely with our developers and architects to build and run services and systems that respond consistently to failures by gracefully degrading our services.
• Be responsible for ensuring the systems and applications we launch remain available, reliable and efficient at accomplishing their duties even as their duties scale and evolve.
• Be involved in every part of our site, from conception of products and their development to deployment, troubleshooting and analysis.
• Design, build and automate tools and processes to ensure and improve scalability, reliability, availability and performance across areas of technology.
• Build, integrate and run tools to inject, predict and identify infrastructure and service failures on an ongoing basis to help optimise our sites.
We need someone who
• Has significant experience in DevOps implementation and in evolving practices and ways of working through multi-disciplinary teams, business frameworks and culture.
• Has experience working with cloud-based providers (Google Cloud Platform/AWS/Azure)
• Has excellent experience working with infrastructure as code (Terraform, Google Deployment Manager, AWS CloudFormation)
• Strong understanding of containerization technologies and orchestration (Kubernetes, Docker Swarm, Nomad)
• Experience in using monitoring/observability tools (Prometheus, Grafana, Thanos, Jaeger)
• Passionate about SRE principles (SLAs, SLOs, SLIs)
• UNIX/Linux systems administration background (CentOS/Ubuntu)
• Has 3 to 4+ years’ experience in a senior SRE or DevOps role
• Very good understanding of distributed systems, microservices architecture and principles
• Experience with HTTP web technologies (Tomcat/Apache/Nginx/HAProxy) and highly available, scalable web architecture
• Experience with at least one message queue system (Kafka, Google Pub/Sub, RabbitMQ)
• Good understanding of database administration (MySQL, Elasticsearch)
• Experience with at least one configuration management solution (Ansible/Puppet/Chef/SaltStack/CFEngine)
• Programming skills (Java, Go, Python, Bash)
• Knowledge of a service mesh (Istio, Linkerd) is desirable
What you’ll get back from us
Alongside a daily challenge and a real interest in your development, you will also receive:
• Subsidised Sky HD package, broadband and discounted Sky Talk
• Free PureGym membership
• Free healthcare with Bupa, life assurance and income protection
• Pension scheme with up to 9% contribution from the company
• £1,000 training fund each financial year, to spend on your professional development
• 25 days holiday which increases by 1 day for each year of service (up to 30 days)
Did you know?
Every team member gets £1,000 a year to spend on their development.
One in ten online bets in the UK is touched by Oddschecker.
We process on average 450 price changes per second.