[Remote] Staff Machine Learning Engineer
Note: This is a remote position open to candidates in the USA. Cresta is a company focused on transforming customer experiences through AI technology. The Staff Machine Learning Engineer will lead high-impact AI initiatives, focusing on building next-generation agentic AI systems and improving the reliability and performance of LLM-powered agents.
Responsibilities
- Define and lead the technical vision for Cresta’s next-generation Agentic AI systems, including Agentic Assist and enterprise AI Agents
- Architect scalable, production-grade LLM systems that integrate reasoning, retrieval, planning, tool use, and real-time decision-making into cohesive, intelligent workflows
- Design and evolve multi-agent orchestration frameworks that combine RAG, structured knowledge, domain-adapted models, and automated actions
- Establish best practices for building robust, reliable, and cost-efficient LLM-powered systems in high-scale production environments
- Own evaluation strategy for complex, non-deterministic AI systems, including offline benchmarking, online experimentation, LLM-as-a-judge methodologies, and systematic failure analysis
- Proactively identify and mitigate agent failure modes such as hallucinations, tool misuse, retrieval errors, prompt brittleness, context drift, and multi-step reasoning breakdowns
- Define measurable quality standards (accuracy, faithfulness, task completion, latency, cost efficiency, robustness) and drive continuous system improvement
- Influence cross-team architecture decisions across ML, backend, and product engineering to ensure seamless integration of AI capabilities
- Mentor senior engineers, raise the technical bar, and contribute to long-term AI strategy and roadmap planning
- Translate cutting-edge research advances into practical, high-impact production systems
Skills
- Bachelor's degree in Computer Science, Mathematics, or a related field; Master's or Ph.D. strongly preferred
- 7+ years of experience building and deploying machine learning systems in production, including deep hands-on experience with LLMs at scale
- Demonstrated leadership in architecting complex AI systems, particularly agentic or multi-step LLM workflows
- Deep expertise in transformer-based models, embeddings, retrieval systems, and Retrieval-Augmented Generation (RAG) pipelines
- Experience designing evaluation frameworks for LLM systems beyond single-turn prompts, including robustness testing and production monitoring
- Strong systems thinking: ability to design for scalability, latency constraints, cost efficiency, security, and long-term maintainability
- Extensive experience with modern ML frameworks (e.g., PyTorch, TensorFlow, Hugging Face) and distributed/cloud-based infrastructure
- Proven ability to influence technical direction across teams as a senior individual contributor
- A strong bias toward action: able to prototype rapidly while maintaining production rigor
Benefits
- Comprehensive medical, dental, and vision coverage with plans to fit you and your family
- Flexible PTO to take the time you need, when you need it
- Paid parental leave for all parents welcoming a new child
- Retirement savings plan to help you plan for the future
- Remote work setup budget to help you create a productive home office
- Monthly wellness and communication stipend to keep you connected and balanced
- In-office meal program and commuter benefits provided for onsite employees
- Equity