Senior / Lead Machine Learning Engineer (AI & LLM Systems)

5 days ago


Lisboa, Portugal | HumanIT Digital Consulting | Full-time

ABOUT THE OPPORTUNITY

Join a leading global technology platform as a Senior or Lead Machine Learning Engineer and drive the design, development, and production deployment of advanced AI and Large Language Model (LLM) systems that power intelligent product features and workflows at scale. Reporting to the Head of AI or VP Engineering, you'll shape the architecture of scalable machine learning solutions, guide technical direction across AI initiatives, and collaborate with world-class engineering, product, and research teams to deliver high-impact AI products.

This role offers the perfect blend of technical leadership and hands-on engineering, where you'll architect and implement robust LLM-based systems, lead training and fine-tuning of large-scale models, and develop sophisticated workflows including retrieval-augmented generation (RAG), vector databases, and agentic AI components. You'll have the opportunity to translate cutting-edge research into production-ready systems while mentoring engineers and driving best practices across the organization. Working at the intersection of research and engineering, you'll transform advanced AI concepts into scalable, reliable production systems that deliver measurable business value to millions of users worldwide.

Critical Requirements: This is a senior to lead-level position requiring 5+ years of experience in machine learning and AI with proven expertise in Python, ML frameworks (PyTorch or TensorFlow), hands-on production experience training and deploying LLMs and generative models, deep knowledge of modern ML tooling including RAG pipelines and vector search, and comprehensive understanding of ML system design, optimization, and evaluation. Advanced English (C1 level) is essential for cross-functional collaboration.

PROJECT & CONTEXT

You'll architect and build production-grade machine learning and LLM systems that power intelligent features, automation, and AI-driven workflows for a global technology platform serving millions of users. Your core responsibilities center on designing robust, scalable ML infrastructure, making critical architectural decisions about system design, model selection, deployment strategies, and technical approaches that balance innovation with production reliability, scalability, and performance requirements.

A significant portion of your work involves leading training, fine-tuning, and optimization of large-scale language models for real production use cases. This includes selecting appropriate base models, implementing fine-tuning strategies using domain-specific data, optimizing model performance for inference efficiency, and ensuring models meet quality, safety, and compliance requirements for production deployment. You'll develop and maintain sophisticated AI workflows including RAG systems that ground LLM outputs in accurate information, information retrieval systems, vector databases for semantic search, and agentic AI components that can reason, plan, and execute multi-step tasks autonomously.

Experimentation and validation are critical: you'll drive comprehensive experimentation frameworks, design and execute benchmarks to evaluate model performance, and implement rigorous evaluation methodologies measuring both technical metrics (accuracy, latency, throughput) and business outcomes (user satisfaction, task completion). Cross-functional collaboration defines your daily work as you partner with engineering teams to integrate ML systems, work with product managers to translate business requirements into technical solutions, and collaborate with research teams to apply cutting-edge approaches to production problems.

Technical leadership and mentorship are key expectations at the Senior/Lead level. You'll mentor engineers through code reviews, architecture reviews, and knowledge sharing, establish best practices for ML engineering including testing strategies and deployment patterns, and guide the team in model development, system design, and production ML practices. You'll translate research insights into production systems by evaluating emerging research, prototyping promising approaches, and implementing production-ready versions with clear success metrics.

The role demands expertise in modern ML tooling including vector databases (Pinecone, Weaviate, Chroma), embedding models, semantic search, model serving infrastructure, and ML observability tools. You'll make critical architectural decisions about fine-tuning versus few-shot prompting, efficient RAG retrieval, multi-step agentic workflow orchestration, and ensuring systems scale efficiently. Working in a hybrid environment with 2 days per week in the Lisbon office enables both focused individual work and collaborative team activities.

  • Core Tech Stack: Python (primary), PyTorch (preferred) or TensorFlow, LLM frameworks (LangChain, LlamaIndex), vector databases, embedding models
  • ML Infrastructure: Model training frameworks, experiment tracking, model serving platforms, feature stores, ML monitoring tools
  • AI Focus: Large language models (LLMs), generative AI, RAG, vector search, agentic AI, multi-agent systems
  • Engineering Practices: Production ML system design, model evaluation, A/B testing, monitoring and observability, scalable deployment
  • Scale: Global platform serving millions of users with high-volume, low-latency AI inference requirements

WHAT WE'RE LOOKING FOR (Required)

  • Machine Learning Experience: 5+ years of hands-on experience in machine learning, AI, or closely related fields with a proven track record of delivering production ML systems. This is the core requirement.
  • Python Proficiency: Strong proficiency in Python for machine learning engineering with deep understanding of Python best practices, libraries, and frameworks relevant to ML development
  • ML Frameworks Expertise: Production experience with machine learning frameworks such as PyTorch (strongly preferred) or TensorFlow, including model development, training, fine-tuning, and deployment
  • LLM Production Experience: Hands-on experience training and deploying Large Language Models (LLMs) and generative models in production environments, with an understanding of LLM architectures, training techniques, inference optimization, and production deployment challenges
  • Modern ML Tooling: Strong knowledge of modern ML tooling including RAG (Retrieval-Augmented Generation) pipelines, vector search and embeddings, vector databases, model serving infrastructure, and related technologies for building production LLM applications
  • ML System Design: Deep understanding of ML system design principles including data processing pipelines, feature engineering, model architecture selection, evaluation metrics design, optimization strategies, and production deployment patterns
  • Data Processing: Experience with large-scale data processing for ML training and inference, understanding data quality requirements, and implementing efficient data pipelines
  • Model Evaluation: Expertise in designing and implementing comprehensive model evaluation frameworks, selecting appropriate metrics, conducting benchmarks, and validating production readiness
  • Optimization Skills: Strong skills in model optimization including hyperparameter tuning, training efficiency, inference performance, cost optimization, and resource utilization
  • Production ML Deployment: Demonstrated experience deploying ML models to production including model serving, versioning, monitoring, A/B testing, and continuous improvement cycles
  • Technical Leadership: Ability to guide technical direction, make architectural decisions, drive technical strategy, and influence engineering practices across teams
  • Collaboration Skills: Excellent collaboration abilities working effectively with cross-functional teams including engineering, product, research, and business stakeholders
  • Communication Excellence: Outstanding communication skills capable of articulating complex technical concepts to both technical and non-technical audiences, writing clear technical documentation, and presenting to stakeholders
  • Mentorship Capability: Experience and willingness to mentor other engineers through code reviews, knowledge sharing, best practices establishment, and technical guidance
  • Problem-Solving: Strong analytical and problem-solving skills for debugging complex ML systems, identifying performance bottlenecks, and resolving production issues
  • English Proficiency: C1 level (Advanced) or higher in English for technical communication, documentation, collaboration with international teams, and stakeholder engagement. This is mandatory.
  • Work Authorization: Eligibility to work in Portugal with availability for hybrid work model (2 days per week in Lisbon office)

NICE TO HAVE (Preferred)

  • Agentic AI Experience: Hands-on experience with agentic AI systems, multi-agent workflows, autonomous agents, planning and reasoning systems, and tool-using LLMs
  • Developer Tooling: Experience building developer-facing tools, APIs, SDKs, or platforms that enable other engineers to leverage ML capabilities
  • Applied Research Background: Background in applied research with publications in relevant areas including NLP, LLMs, generative AI, information retrieval, or machine learning conferences/journals
  • Research to Production: Track record of successfully translating academic research into production systems with clear business impact
  • Cloud Platform Expertise: Hands-on experience with cloud platforms including AWS (SageMaker, Bedrock, EC2, S3), GCP (Vertex AI, Cloud ML), or Azure (Azure ML, OpenAI Service)
  • Scalable Deployment Frameworks: Experience with scalable ML deployment frameworks including Kubernetes for ML workloads, model serving platforms (TensorFlow Serving, TorchServe, Triton), and container orchestration
  • Vector Database Deep Expertise: Advanced knowledge of vector databases like Pinecone, Weaviate, Chroma, Milvus, FAISS including optimization strategies and production deployment
  • Additional ML Frameworks: Experience with complementary ML frameworks like JAX, scikit-learn, Hugging Face Transformers, or specialized libraries for NLP and generative AI
  • LLM Fine-Tuning Advanced: Deep expertise in advanced fine-tuning techniques including LoRA, QLoRA, PEFT methods, instruction tuning, and RLHF (Reinforcement Learning from Human Feedback)
  • Prompt Engineering: Advanced skills in prompt engineering, few-shot learning, chain-of-thought prompting, and prompt optimization techniques
  • Embeddings Expertise: Deep understanding of embedding models, similarity search, semantic search optimization, and embedding space analysis
  • Multi-Modal AI: Experience with multi-modal models handling text, images, audio, or other modalities
  • MLOps Practices: Strong MLOps experience including ML pipeline automation, CI/CD for ML, model monitoring and observability, feature store implementation, and experiment tracking
  • Distributed Training: Experience with distributed training of large models across multiple GPUs or machines using frameworks like DeepSpeed, Megatron, or FSDP
  • Model Compression: Knowledge of model compression techniques including quantization, pruning, distillation, and knowledge



  • Lisboa, Lisboa, Portugal | HumanIT Digital Consulting | Full-time

    ABOUT THE OPPORTUNITY: Join a leading global technology platform as a Senior or Lead Machine Learning Engineer and drive the design, development, and production deployment of advanced AI and Large Language Model (LLM) systems that power intelligent product features and workflows at scale. Reporting to the Head of AI or VP Engineering, you'll shape the...


  • Lisboa, Portugal | Upwork | Full-time

    A leading talent platform is seeking a Lead Machine Learning Engineer/Scientist to design and build memory management systems for LLM-powered experiences. The role involves developing memory architectures, leading cross-functional teams, and enhancing personalization in AI systems. Candidates should have a strong background in LLM technologies and...


  • Lisboa, Portugal | Upwork | Full-time

    Senior Lead Machine Learning Engineer, Agentic AI. Lisbon, Portugal. Upwork Inc.'s (Nasdaq: UPWK) family of companies connects businesses with global, AI-enabled talent across every contingent work type including freelance, fractional, and payrolled. This portfolio includes the Upwork Marketplace, which connects businesses with on-demand access to highly...


  • Lisboa, Portugal | Tripadvisor | Full-time

    We believe that we are better together, and at Tripadvisor we welcome you for who you are. Our workplace is for everyone, as is our people powered platform. At Tripadvisor, we want you to bring your unique identities, abilities, and experiences, so we can collectively revolutionize travel and together find the good out there. Tripadvisor is the web's...


  • Lisboa, Portugal | Tripadvisor | Full-time

    We believe that we are better together, and at Tripadvisor we welcome you for who you are. Our workplace is for everyone, as is our people powered platform. At Tripadvisor, we want you to bring your unique identities, abilities, and experiences, so we can collectively revolutionize travel and together find the good out there. Tripadvisor is the web's...


  • Lisboa, Portugal | Upwork | Full-time

    Upwork Inc.'s (Nasdaq: UPWK) family of companies connects businesses with global, AI-enabled talent across every contingent work type including freelance, fractional, and payrolled. This portfolio includes the Upwork Marketplace, which connects businesses with on-demand access to highly skilled talent across the globe, and Lifted, which provides a...


  • Lisboa, Lisboa, Portugal | Upwork | Full-time

    Upwork Inc.'s (Nasdaq: UPWK) family of companies connects businesses with global, AI-enabled talent across every contingent work type including freelance, fractional, and payrolled. This portfolio includes the Upwork Marketplace, which connects businesses with on-demand access to highly skilled talent across the globe, and Lifted, which provides a...


  • Lisboa, Portugal | Upwork | Full-time

    A leading remote work platform is hiring a Senior Lead Machine Learning Engineer to architect and scale agentic intelligence. You will design multi-agent systems and lead the data strategy for LLMs, shaping the future of AI development. Ideal candidates have extensive ML experience and strong software fundamentals. The role offers an opportunity to optimize...


  • Lisboa, Portugal | Upwork | Full-time

    Upwork Inc.'s (Nasdaq: UPWK) family of companies connects businesses with global, AI-enabled talent across every contingent work type including freelance, fractional, and payrolled. This portfolio includes the Upwork Marketplace, which connects businesses with on-demand access to highly skilled talent across the globe, and Lifted, which provides a...


  • Lisboa, Portugal | Upwork | Full-time

    A leading technology company in Lisbon is seeking a Senior Lead Machine Learning Engineer to design, build, and scale agentic intelligence. You will lead the development of AI agents and infrastructures while ensuring reliability and performance. Ideal candidates should have senior-level experience in applied machine learning and a strong background in...