[Remote] Sr. Engineering Manager, MLOps

Remote, Full-time

Note: This is a remote position open to candidates in the USA. Quince is a tech company disrupting the retail industry by leveraging AI, analytics, and automation. The company is seeking a Senior Engineering Manager, MLOps, to build and scale the infrastructure that supports production-grade machine learning and keeps operations seamless for its Data Scientists and AI Researchers.

Responsibilities

Define the MLOps Vision & Strategy: Architect a long-term roadmap that transitions ML workflows from manual scripts to a fully automated, self-service platform for all Quince Data Scientists and AI Researchers
Own the "Paved Road" for Production: Build and maintain the end-to-end infrastructure for model training, deployment, and serving, ensuring researchers can move from "idea to production" with zero friction
Drive Strategic Prioritization: Partner with business leaders to align infrastructure investments with core e-commerce drivers like real-time personalization, dynamic pricing, and inventory forecasting
Lead "Build vs. Buy" Evaluations: Make high-judgment decisions on when to leverage cloud-native services (e.g., SageMaker, Vertex AI) versus building custom internal tools to optimize for cost, speed, and flexibility
Guarantee System Scalability & Reliability: Oversee the uptime and performance of production ML services, ensuring the stack can handle massive traffic surges and seasonal spikes without degradation
Manage Compute Governance & Costs: Direct the optimization of high-cost computational resources, such as GPU clusters and cloud instances, balancing high-performance training needs with fiscal responsibility
Recruit and Mentor Top Talent: Build and lead a high-performing team of ML Infra and DevOps engineers, providing technical coaching, career pathing, and performance management
Establish MLOps Standards: Drive the adoption of best practices in CI/CD for ML, Infrastructure as Code (IaC), and automated testing to ensure a modular and maintainable system
Bridge the Research-Engineering Gap: Act as the primary cross-functional lead, translating the complex needs of AI Researchers into actionable engineering requirements for the infrastructure team
Define and Track Velocity Metrics: Establish KPIs for the infrastructure team, such as model deployment frequency, mean time to recovery (MTTR), and infrastructure cost per inference
Champion Operational Excellence: Lead root-cause analyses (RCAs) for production failures and foster a culture of accountability where systemic fixes are prioritized over "quick patches"
Stay Ahead of the AI Curve: Monitor emerging trends in LLM-ops, vector databases, and real-time feature engineering to ensure Quince’s infrastructure remains competitive and future-proof

Skills

10+ years of industry experience, with at least 3-5 years in a leadership or management role specifically focused on ML Infrastructure, MLOps, or large-scale Data Platform engineering
Proven track record of building and scaling MLOps platforms that support the full model lifecycle—from data ingestion and distributed training to real-time inference and monitoring
Deep technical expertise in cloud-native infrastructure (preferably AWS) and orchestration tools like Kubernetes (EKS), Docker, and Infrastructure as Code (Terraform/Pulumi)
Hands-on experience with ML frameworks and tooling, such as PyTorch, TensorFlow, Kubeflow, or SageMaker, and a strong opinion on how to integrate them into a cohesive developer experience
Expertise in building and managing Feature Stores and high-throughput data pipelines (using tools like Spark, Flink, or Kafka) to ensure data consistency across training and serving
Experience partnering with AI Research and Data Science teams to understand their unique workflows and translate research needs into robust, scalable engineering solutions
Strong understanding of CI/CD for ML, including automated testing for models, model versioning, and 'blue-green' or 'canary' deployment strategies
Demonstrated ability to manage high-cost compute resources, with experience optimizing GPU utilization and cloud spend in a hyper-growth environment
Excellence in operational leadership, with a history of driving service availability, performance, and stability through rigorous on-call rotations and root-cause analysis
A product-oriented mindset, with the ability to treat infrastructure as a platform and prioritize the roadmap based on researcher velocity and business ROI
Exceptional communication and influence skills, capable of navigating ambiguity and building consensus across engineering, product, and data science leadership
Kindness and high standards: You move fast and push for excellence, but you do so as a supportive team player who fosters a culture of psychological safety and extreme candor

Benefits

Bonus and equity may be provided for eligible roles

Company Overview

Quince is an e-commerce company that offers apparel, accessories, home goods, and personal care products through an online platform. It was founded in 2018 and is headquartered in San Francisco, California, USA, with a workforce of 1,001-5,000 employees. Its website is https://www.quince.com.

Company H1B Sponsorship

Quince has offered H1B sponsorship in the past, with 1 sponsorship recorded in 2023. Please note that this does not guarantee sponsorship for this specific role.
