Lead Site Reliability Engineer

Website Civil Service

Job summary

The Government Digital Service (GDS) is the digital centre of government. We are responsible for setting, leading and delivering the vision for a modern digital government.

Our priorities are to drive a modern digital government, by:
• joining up public sector services
• harnessing the power of AI for the public good
• strengthening and extending our digital and data public infrastructure
• elevating leadership and investing in talent
• funding for outcomes and procuring for growth and innovation
• committing to transparency and driving accountability

We are home to the Incubator for Artificial Intelligence (I.AI) and the world-leading GOV.UK, and we are at the forefront of coordinating the UK’s geospatial strategy and activity. We lead the Government Digital and Data function and champion the work of digital teams across government.

We’re part of the Department for Science, Innovation and Technology (DSIT) and employ more than 1,000 people all over the UK, with hubs in Manchester, London and Bristol.

The Government Digital Service is where talent translates into impact. From your first day, you’ll be working with some of the world’s most highly skilled digital professionals, all contributing their knowledge to make change on a national scale.

Join us for rewarding work that makes a difference across the UK. You’ll solve some of the nation’s highest-priority digital challenges, helping millions of people access services they need.

Job description

The GOV.UK One Login for Government Programme represents a once-in-a-generation opportunity to simplify and widen access to all digital government services. Sitting at the heart of government, we are building one simple, safe and secure way for users to log in and prove who they are, one that will work across all government services.

GOV.UK One Login is being designed and built for the many, not the few. It will unite services across government, revolutionising the way government departments digitally interact with users. One Login will deliver an accessible and essential function that will change lives and help millions. We are an ambitious and visionary team, so if you want to be at the heart of this truly ground-breaking programme, keep reading.

The GOV.UK One Login programme is full of talented and passionate people who are consistently delivering high-quality products for services and individuals. We’re halfway through our build phase and features are being shipped almost weekly as we work to mature our product set so that we can expand the range of services and departments benefitting from our work.

As a Lead Site Reliability Engineer, you’ll share the responsibility for the delivery of our new way for people to prove their identity online.

You’ll work closely with the programme Head of SRE, Head of Architecture, Head of Engineering, Head of Test & Quality Engineering, other Lead SREs and other senior leaders. You’ll work to enable our multidisciplinary teams to build and run services as quickly and safely as possible. You’ll be a leader across the engineering team with a focus on how we build, scale and improve the infrastructure platform and tooling. You will run these as service offerings to service teams across the programme.

As a Lead Site Reliability Engineer, you’ll:
• identify and promote best practice in reliability engineering
• work across disciplines throughout the programme, with SREs, developers, technical architects, product managers and others, to provide robust, resilient and scalable platforms
• ensure the programme has the right processes in place, including identifying and measuring important metrics to drive continual improvement
• work with colleagues to identify technical risks relating to the infrastructure, and to plan how to resolve or mitigate them
• communicate concerns, risks and issues with the broader team and senior management
• prioritise and deliver recommendations and improvements in response to incident reviews
• set an example for and encourage open, positive, and constructive communication both within the team and when communicating with other GDS teams
• cultivate and maintain relationships with other teams within GDS, Cabinet Office, and the rest of government
• mentor and manage site reliability engineers and developers
• work with teams, Cyber Security and Information Assurance to ensure the ongoing integrity and security of our service and infrastructure
• participate in an out-of-hours rota

Person specification

We’re interested in people who can demonstrate evidence of:
• operating effectively at a senior level across a large programme of work with experience of collaborating with and managing stakeholders and senior management
• working with large-scale, fully cloud-based serverless or container-based architectures
• technical leadership of people across multiple teams, feeding into strategy and long-term roadmaps, as well as delegating design decisions as necessary
• working across a mix of product stages: greenfield, maturing greenfield into operational services, and established live services
• bringing an operational mindset to all stages of development
• troubleshooting a complex, multi-application service in a high-availability environment
• enabling large-scale delivery through managing fully automated and staged CI & CD pipelines that deliver to production
• an infrastructure-as-code approach and experience of Terraform or AWS SAM or similar
• managing the lifecycle of cloud-based data stores
• leading on observability (logging, monitoring and alerting) and defining service level indicators and objectives
• detailed knowledge of application, infrastructure and network security practices with experience of embedding these into service teams
• managing teams or projects, helping colleagues with their career development and coaching more junior staff members

To apply for this job please visit restless.co.uk.