Job Description
Lead Engineer, SRE
Advert Reference Number: 1036
Job Location: Milton Keynes, Remote/Hybrid
Department: Engineering
Salary: £59,966 to £67,468, plus Market Related Pay of £6,746 per year (initially until 31 December 2027)
Closing Date: 11 January 2026
Weekly Working Hours: 37
Contract Type: Permanent
Fixed Term Contract End Date: Not Applicable
Welsh Language: Not Applicable

Change your career, change lives

The Open University is the UK’s largest university and a world leader in flexible part-time education. It combines a mission to widen access to higher education with research excellence, transforming lives through education. Find out more about us and our mission by watching this short video.

About the Role

As a Lead Engineer, SRE at The Open University, you will be a key driver of reliability, scalability, and operational excellence across our platforms and services. This role goes beyond traditional operations—you will help shape how we design, build, deploy, and run systems in production, applying both a software engineering mindset and deep cloud expertise.

Your primary focus will be on Microsoft Azure, where you will architect and manage resilient cloud infrastructure, implement automation at scale, and integrate observability into all our deliverables. By partnering with architects, software engineers, and product teams, you will ensure our systems meet the highest standards of performance, security, and cost efficiency while remaining easy to operate.

In this senior role, you will:

  • Drive the adoption of SRE practices such as SLIs, SLOs, error budgets, and blameless postmortems to improve system reliability.
  • Embed automation and self-healing mechanisms to reduce manual toil and accelerate recovery from failures.
  • Champion infrastructure as code (IaC) using Bicep, ensuring consistent, repeatable, and compliant environments.
  • Build out end-to-end observability, enabling proactive issue detection and actionable insights into system health.
  • Partner with engineering leadership to shape the technical roadmap, guiding investments in scalability, resilience, and DevOps culture.

This role also carries a strong mentorship and leadership component. You will coach engineers across teams, advocate for best practices, and foster a build-run-own mindset that elevates operational maturity. As part of the engineering profession, you will influence architectural decisions, guide platform evolution, and ensure that our technical direction aligns with both short-term delivery goals and long-term strategic vision.

We are seeking an individual who excels in complex, hybrid environments—encompassing on-premises, cloud-native, and multi-cloud (Azure, AWS) platforms—and can effectively balance tactical problem-solving with strategic foresight. The ideal candidate will be passionate about automation, resilience engineering, and cloud-scale operations, with the ambition to make a lasting impact on how services are delivered and operated.

Key Responsibilities

  • Reliability & Performance: Ensure critical systems and applications are highly available, fault-tolerant, and performant. Implement SLIs, SLOs, and SLAs to measure and drive service reliability. Conduct capacity planning, performance tuning, and chaos engineering exercises to validate system resilience.
  • Cloud Platform Ownership: Design, build, and manage scalable infrastructure on Azure, leveraging services such as App Service, Functions, Service Bus, Front Door, Azure SQL, and Event Hubs. Use infrastructure as code (IaC) with Bicep and Terraform to standardise deployments. Optimise cloud cost efficiency (FinOps) while ensuring stability and performance.
  • Automation & Operations: Automate operational tasks using PowerShell, Bash, or Python. Enhance CI/CD pipelines to accelerate deployments and reduce production risks. Lead efforts to minimise toil by building self-healing and auto-scaling systems.
  • Observability & Incident Management: Implement robust monitoring, logging, and tracing solutions (e.g., Azure Monitor, Application Insights, Splunk). Lead incident response and postmortem reviews, identifying root causes and driving long-term fixes. Establish operational runbooks and playbooks to facilitate the rapid resolution of incidents.
  • Security & Compliance: Embed security-by-design practices in infrastructure and pipelines. Ensure compliance with relevant standards through proactive monitoring and automation. Collaborate with security teams to manage vulnerability assessments and remediation efforts.
  • Collaboration & Leadership: Act as a trusted advisor to engineering teams on scalability, reliability, and operational excellence. Mentor engineers in SRE practices, cloud engineering, and automation tooling. Contribute to the engineering community by sharing best practices and driving the adoption of SRE principles across teams.

You will help lead the transformation towards a DevOps culture by promoting automation, integrating practical AI, and fostering a continuous improvement mindset to enhance workflow efficiency, accelerate feedback loops, and support ongoing learning.

You will support this work by developing and advocating for the Open University’s Engineering manifesto and promoting best practices.

About You

Essential:

  • Strong technical leadership in software and systems development, with a proven track record of delivering high-quality solutions on time.
  • Excellent people management and mentoring skills, with the ability to inspire and grow engineering teams.
  • Cloud Expertise: Proven experience with Azure services (Compute, Networking, Storage, AKS, Azure DevOps).
  • Expert in systems architecture and in designing scalable, reliable software solutions, with deep expertise in cloud services, microservices, and distributed systems, and in integrating them seamlessly across complex environments.
  • Infrastructure as Code: Strong knowledge of “infrastructure as code” using Bicep.
  • Automation: Proficiency in scripting and programming (C#, PowerShell, Bash, Python) for operational efficiency.
  • DevOps & CI/CD: Experience integrating reliability practices into pipelines (Azure DevOps).
  • Observability Tools: Strong knowledge of monitoring, logging, and distributed tracing (Azure Monitor, Prometheus, Grafana, Application Insights).
  • Resilience Engineering: Hands-on experience with disaster recovery, failover design, chaos testing, and performance optimisation.
  • Networking & Security: Solid understanding of networking, identity, access management, and security compliance in cloud environments.
  • Software Engineering Background: Ability to read, review, and contribute to codebases (C#, Python, or other modern languages).
  • SRE Mindset: Familiarity with error budgets, SLOs, SLIs, toil reduction, and blameless postmortems.
  • Leadership: Demonstrated ability to mentor teams, influence stakeholders, and drive cultural change toward SRE practices.
  • Stakeholder relationship management, with a strong ability to communicate technical concepts to non-technical stakeholders and manage expectations.

We are seeking an experienced SRE with proven expertise in modern cloud-native operations and engineering practices, including Azure, DevOps, automation, test-driven reliability, and IaC. The role requires strong experience in building and running resilient platforms based on MACH principles (Microservices, API-first, Cloud-native, and Headless). Expertise with observability and monitoring tools is essential, along with a track record of improving system reliability through data-driven insights. Success in this role also depends on the ability to lead and collaborate effectively across hybrid teams, including permanent staff and offshore/nearshore partners, while fostering a culture of operational excellence and continuous improvement.

Behaviours:

  • Leadership and Ownership: You take full ownership of your work and lead by example. You are not only focused on the “what” but also the “how” and inspire those around you to strive for excellence.
  • Collaboration: You are a team player, a proactive communicator who works effectively across functions and departments, actively sharing your team’s progress and seeking wider feedback from those outside of the team. You foster a collaborative environment that encourages knowledge-sharing and ensures alignment with the broader organisational goals.
  • Problem-Solving: You have a knack for diagnosing complex problems and offering practical, innovative solutions, while always keeping the bigger picture in mind.
  • Adaptability: In a rapidly changing environment, you embrace change and thrive on the challenge of learning new technologies and methodologies.
  • Strategic Thinking: You have a forward-thinking mindset, always looking for ways to improve processes, systems, and technology to achieve long-term success.

Support with your application

If you have any questions, or need support or adjustments relating to your application, the recruitment process, or the role, please contact us on 01908 541111 or email careers@open.ac.uk quoting the advert reference number.

What's in it for you?

At The Open University, we offer a range of benefits to recognise and reward great work, alongside policies and flexible working that contribute towards a great work-life balance. Get all the details of the benefits we offer by visiting our Staff Benefits page.

Flexible working

We are open to discussions about flexible working, whether that’s a job share, part-time hours, compressed hours or another working arrangement. Please reach out to us to discuss what works best for you.

It is anticipated that a hybrid working pattern can be adopted for this role, where the successful candidate can work from home and the office. However, as this role is contractually aligned to our Milton Keynes office it is expected that some attendance in the office will be required when necessary and in response to business needs. We’d expect this to be approximately twice per month.

Next steps in the Recruitment process

We anticipate that interviews for this role will take place online via Microsoft Teams during the week commencing 19 January 2026.

Early closing date notification

We may close this job advert earlier than the published closing date where a satisfactory number of applications are received. We would therefore encourage early applications.

How to apply

To apply for this role please submit the following documents:

  • CV
  • A personal statement (up to 1,000 words) that summarises why you’re interested in the role and how your skills and experience make you a good fit.

You can view your progress and application communications when you are logged into our recruitment system. Please check your spam/junk folders if you do not receive associated email updates.

Contact us

If you have any queries or questions about the recruitment process, or regarding your application, please contact: Careers@open.ac.uk.

The Open University is committed to equality, diversity and inclusion, which is reflected in our mission to be open to people, places, methods and ideas. We aim to foster a diverse and inclusive environment so that all in our OU community can reach their potential. We recognise that different people bring different perspectives, ideas, knowledge, and culture, and that this difference brings great strength. We strive to recruit, retain and develop the careers of a diverse pool of students and staff, and particularly encourage applications from all underrepresented groups. We also aspire to make The Open University a supportive workplace for all through our policies, services and staff networks.