Site Reliability Engineer

2 weeks ago


Lisbon, Portugal Air Apps Full-time

About Air Apps

At Air Apps, we believe in thinking bigger—and moving faster. We’re a family‑founded company on a mission to create the world’s first AI‑powered Personal & Entrepreneurial Resource Planner (PRP), and we need your passion and ambition to help us change how people plan, work, and live. Born in Lisbon, Portugal, in 2018 and now with offices in both Lisbon and San Francisco, we’ve remained self‑funded while reaching over 100 million downloads worldwide. Our long‑term focus drives us to challenge the status quo every day, pushing the boundaries of AI‑driven solutions that truly make a difference.

The Role

As a Site Reliability Engineer (SRE) at Air Apps, you will be responsible for ensuring the reliability, availability, and scalability of our systems. You will work at the intersection of software development and operations, implementing automation, monitoring, and performance optimization strategies to minimize downtime and improve system resilience.

Responsibilities

- Design and implement scalable, reliable, and fault‑tolerant systems across cloud environments.

- Develop and maintain observability tools, including monitoring, logging, and alerting (e.g., Prometheus, Grafana, Datadog, ELK).

- Automate infrastructure provisioning, deployment, and incident response using Infrastructure as Code (IaC) tools like Terraform or CloudFormation.

- Optimize system performance, scalability, and incident response workflows to improve uptime.

- Work closely with development and DevOps teams to improve system design for reliability.

- Conduct root cause analysis (RCA) and implement preventative measures to minimize failures.

- Ensure high availability by designing and maintaining load balancing, failover, and disaster recovery strategies.

- Improve CI/CD pipelines to enhance deployment speed while maintaining stability.

- Optimize cloud cost and resource utilization for AWS, Azure, or Google Cloud Platform (GCP).

- Participate in on‑call rotations to quickly address system failures and minimize downtime.

Requirements

- 4+ years of experience in Site Reliability Engineering (SRE), DevOps, or Systems Engineering.

- Strong knowledge of cloud platforms (AWS, Azure, or GCP) and cloud‑native architectures.

- Experience with observability and monitoring tools (Prometheus, Grafana, ELK, Datadog, New Relic).

- Proficiency in IaC tools such as Terraform, CloudFormation, or Pulumi.

- Hands‑on experience with containerization and orchestration (Docker, Kubernetes, Helm).

- Strong Linux system administration and networking fundamentals.

- Experience with incident management, debugging, and root cause analysis.

- Proficiency in scripting (Bash, Python, or Go) for automation and system monitoring.

- Knowledge of load balancing, failover strategies, and distributed systems.

- Understanding of security best practices, access control, and compliance requirements.

- Strong communication skills and the ability to collaborate with cross‑functional teams.

Benefits

- Apple hardware ecosystem for work.

- Flexible Paid Time Off (PTO) to support work‑life balance.

- Annual Bonus.

- Top‑tier Health and Life Insurance.

- Transportation Budget.

- Coverflex benefits package for meal allowances, well‑being, and more.

- Childcare support.

- Air Conference – an opportunity to meet the team, collaborate, and grow together.

- Pension Fund.

- Urban Sports Club membership.

- Free meals at the hub.

Diversity & Inclusion

At Air Apps, we are committed to fostering a diverse, inclusive, and equitable workplace. We enthusiastically welcome applicants from all backgrounds, experiences, and perspectives. We celebrate diversity in all its forms and believe that varied voices and experiences make us stronger.

Application Disclaimer

At Air Apps, we value transparency and integrity in our hiring process. Applicants must submit their own work without any AI‑generated assistance. Any use of AI in application materials, assessments, or interviews will result in disqualification.



  • Site Reliability Engineer

    3 weeks ago


    Lisbon, Portugal Randstad Full-time

    Randstad Digital is recruiting a Site Reliability Engineer for direct placement with a client in Lisbon. Hybrid work arrangement.


  • Lisbon, Portugal Tata Consultancy Services Full-time

    Are you a Site Reliability Engineer seeking an interesting new challenge? If your answer is yes, it's your lucky day, so keep reading: this could be just what you're looking for!


  • Site Reliability Engineer

    1 week ago


    Lisbon, Portugal Sperton Global AS Full-time

    Job Title: Site Reliability Engineer (SRE). Location: Lisbon, Portugal (Hybrid). Job Type: Contract (6 months). Role Overview: We are looking for an experienced Site Reliability Engineer (SRE) to support business-critical systems in the banking and financial services domain. The role has a strong focus on production support, monitoring, automation, CI/CD...


  • Lisbon, Portugal QiBit Portugal Full-time

    We are looking for a Senior Site Reliability Engineer (SRE) to join the IT team of our client - a company specialized in the financial technology sector. What will be your main tasks and responsibilities? - Act as the primary contact and leader for platform incidents, ensuring swift resolution through collaboration with engineering teams and effective...


  • Lisbon, Portugal Paymentology Full-time

    Paymentology is the first truly global issuer‑processor, giving banks and fintechs the technology, team and experience to rapidly issue and process Mastercard, Visa and UnionPay cards across more than 60 countries at scale. Our advanced multi‑cloud...

  • Site Reliability Engineer

    2 weeks ago


    Lisbon, Portugal Claire Joster Full-time

    Claire Joster is currently recruiting for a reference client in car rental services that aims to strengthen its internal structure by hiring a Site Reliability Engineer (m/f). Functions: Define Reliability: design, implement, and monitor Service Level Indicators (SLIs) and Service Level Objectives (SLOs) for our production services; Automation:...

  • Site Reliability Engineer

    1 week ago


    Lisbon, Portugal Ubique Systems Full-time

    This will be a B2B or freelance contract role. Location: Lisbon, Portugal. Responsibilities: strong working knowledge of DevOps tools (CI/CD pipelines in Jenkins), Git, Bitbucket, XLR; good Linux skills (proper hands-on use of commands and scripting) as...


  • Lisbon, Portugal act digital Full-time

    We are looking for an Azure Site Reliability Engineer to join a Cloud Operations team focused on digital transformation and cloud optimization. The team works closely with development and infrastructure teams to deliver secure, scalable and highly available cloud platforms. Role Overview As an Azure SRE, you will be responsible for ensuring the operational...