Site Reliability Engineer
7 days ago
What you will be doing:
Join our dynamic and collaborative technology team as a Site Reliability Engineer. You'll be at the heart of our operations, playing a pivotal role in ensuring the reliability, scalability, and performance of the critical services our customers depend on.
As part of the DevOps team within our Infrastructure tribe, you'll collaborate with fellow SREs and other engineering teams to support the entire Technology organization and the wider company. The Infrastructure tribe is dedicated to building and maintaining the foundational systems, tooling, and services that empower our developers to bring exceptional products to life and keep them running smoothly and securely in production. We're focused on standardizing key areas like cloud infrastructure, deployment pipelines, and observability, allowing product teams to concentrate on their core applications.
The DevOps team is crucial in architecting, building, and operating the tools that underpin our production environment. Recent initiatives include evolving our Internal Developer Portal, architecting a High Availability solution for Nexus, optimizing our observability costs, championing the adoption of Service Level Objectives (SLOs), and migrating to GitLab SaaS.
As a Site Reliability Engineer (DevOps), you will:
- Design and Build: Architect, implement, and maintain highly available and reliable foundational services for CI/CD pipelines, observability platforms, and our Internal Developer Platform, which are essential for our engineering teams to deliver scalable services daily.
- Ensure Reliability: Participate in an on-call rotation to respond to and resolve production incidents swiftly. Lead thorough post-incident reviews to identify root causes and implement preventative measures.
- Automate Infrastructure: Manage and automate our cloud infrastructure using Terraform and Helm, adhering to GitOps best practices.
- Collaborate Effectively: Partner closely with development and data engineering teams to give customers a seamless experience of our services and to provide robust operational support.
Our Tech Stack:
- Cloud-Based Infrastructure: Fully cloud-based with a Kubernetes-focused tech stack. Compute workloads run in Kubernetes clusters across multiple regions on AWS and GCP.
- CloudOps uses Golang and Python as its backend languages and TypeScript to build and maintain our Cloudflare Workers and related edge services. A basic knowledge of Bash and shell scripting will be useful.
- Our products are built using Kotlin and Python at the backend, with TypeScript and React forming the frontend. All workloads are containerised.
- Our products make substantial use of relational database technologies, notably Postgres and Yugabyte.
- We use an event-sourced model powered by Kafka for our communication bus and gRPC for our intra-service communication protocol.
- We use modern observability solutions from Grafana Cloud, build with GitLab tooling, and deploy our code using ArgoCD.
We have a strong emphasis on engineering excellence and strive to ship the best possible code and the best possible solutions to our customers.
About you:
- Deep expertise in cloud services (AWS and/or GCP), particularly IAM.
- Significant experience managing and troubleshooting services within Kubernetes environments, and an understanding of Kubernetes as an ecosystem.
- Strong proficiency in observability platforms, particularly Prometheus and Grafana, including monitoring, alerting, and production operations.
- Hands-on experience codifying infrastructure with Terraform and Helm charts.
- Excellent incident response and troubleshooting abilities.
- Proficiency in scripting and automation using Python.
- Experience working with containerized workloads.
- Experience collaborating with software engineers to support production cloud-native applications.
Nice to have:
- Familiarity with ArgoCD, GitLab CI, Backstage and the Grafana, Mimir, Loki & Prometheus stack.
Education:
- BSc/BA degree in computer science, engineering or related discipline OR relevant years of experience in required skills.
What's in it for you?
- Equity, as we want you to have a part of what we are building.
- Private medical insurance, ensuring peace of mind while you excel in your career.
- Unlimited Time Off Policy: work-life balance and a focus on well-being are critical to keeping us performing at our best.
- We embrace a hybrid approach that requires employees to be in the office two days a week. We strongly believe that this approach fosters collaboration and enables the building of meaningful relationships.
- You will also get a new starter budget to kit out your home office
- Opportunity to work on innovative projects with smart-minded people keen to share their knowledge and continuously improve
- Annual learning budget (prorated based on start date) to drive your performance and career development.
About us:
Our mission is to empower every business to eliminate financial crime.
By harnessing AI, a unified platform, and an extensive partner ecosystem, we help customers turn compliance into a catalyst for growth, operational resilience, and enduring regulatory trust.
More than 3,000 enterprises across 75 countries rely on our end-to-end platform and the world's most comprehensive financial crime risk intelligence. With full-stack agentic automation, we help organizations automate up to 95% of KYC, AML, and sanctions reviews, cut onboarding times by 50%, reduce false positives by 70%, and handle 7x more work with the same staff.
ComplyAdvantage is headquartered in London and has global hubs in New York, Lisbon, Singapore, and Cluj-Napoca. It is backed by Balderton Capital, Index Ventures, Ontario Teachers' Pension Plan, Goldman Sachs, and Andreessen Horowitz.