Site Reliability Engineer M/F (Paris or full remote in France)
Alma
Software Engineering
Paris, France
Posted on Saturday, July 29, 2023
Job Description
About the job
As a Site Reliability Engineer, you will join the Platform tribe within our Engineering team and be responsible for:
- Ensuring that the infrastructure meets the needs of internal and external customers and the requirements of our SLAs/SLOs
- Working with the Engineering teams to design and implement scalable and resilient solutions
- Promoting automation and SRE best practices to optimise operational efficiency
- Developing and maintaining backup and disaster recovery strategies to protect data and ensure business continuity
- Designing, implementing and maintaining monitoring tools to track key system metrics and health indicators
- Providing technical support and expertise to engineering teams for the resolution of application and infrastructure incidents
- Carrying out in-depth incident analyses to identify root causes and put corrective measures in place
- Keeping the platform operational by applying updates, security patches and continuous improvements
- Helping to optimise the platform's operating costs
As a bonus, here is our technical stack:
- Cloud providers: GCP, Cloudflare, AWS
- Backend: Python + FastAPI and Flask
- Frontend: React / TypeScript
- Database technologies: PostgreSQL, Redis, BigQuery
- Log and error management: Datadog, Sentry
- CI/CD: GitHub Actions, Docker
- Monitoring: Datadog
- Infrastructure as Code: Terraform
About you
We are looking for a candidate with the following experience and skills:
- At least 5 years of experience managing cloud infrastructure
- Deep knowledge of Google Cloud Platform or other cloud providers
- Solid networking knowledge
- Strong interest in security topics
- Experience setting up and maintaining monitoring tools, and analysing metrics and malfunctions
- Hands-on experience with Infrastructure as Code
- Ability to solve problems methodically and work effectively under pressure during critical incidents
- Strong communication skills to collaborate with different teams and communicate problems and solutions effectively
- Good command of English
Recruitment Process
- Recruiter Interview
- Hiring Manager Interview
- Technical Test
- Final Interview