
Engineering Manager - ML Platform and Infrastructure

Remote Full-time Live

About Applied Intuition

Applied Intuition, Inc. is powering the future of physical AI. Founded in 2017 and now valued at $15 billion, the Silicon Valley company is creating the digital infrastructure needed to bring intelligence to every moving machine on the planet. Applied Intuition services the automotive, defense, trucking, construction, mining and agriculture industries in three core areas: tools and infrastructure, operating systems, and autonomy. Eighteen of the top 20 global automakers, as well as the United States military and its allies, trust the company’s solutions to deliver physical intelligence. Applied Intuition is headquartered in Sunnyvale, California, with offices in Washington, D.C.; San Diego; Ft. Walton Beach, Florida; Ann Arbor, Michigan; London; Stuttgart; Munich; Stockholm; Bangalore; Seoul; and Tokyo. Learn more at applied.co.

We are an in-office company, and our expectation is that employees primarily work from their Applied Intuition office 5 days a week. However, we also recognize the importance of flexibility and trust our employees to manage their schedules responsibly. This may include occasional remote work, starting the day with morning meetings from home before heading to the office, or leaving earlier when needed to accommodate family commitments.

About the role

As an Engineering Manager on the ML Platform team, you'll lead a world-class group of engineers focused on building the infrastructure that powers Physical AI at scale. Your team will own three critical areas: Training & Inference Orchestration, where we build frameworks to efficiently schedule and run massive jobs across thousands of GPUs; GPU Cluster Architecture, where we design and scale what will be the largest GPU cluster for Physical AI in the industry; and Performance Optimization, where we push the limits of hardware utilization, throughput, and cost efficiency for large-scale training and inference workloads. You'll work at the intersection of systems engineering and ML, partnering directly with stack development and research teams to remove bottlenecks and accelerate the path from experimentation to production.

At Applied Intuition, you will:

  • Grow and manage a team of world-class infrastructure and systems engineers with the goal of delivering a best-in-class ML platform for Physical AI
  • Own the design and evolution of frameworks for orchestrating distributed training and inference jobs across thousands of GPUs
  • Drive the buildout and scaling of our GPU cluster infrastructure, making critical decisions on architecture, scheduling, networking, and resource management
  • Lead efforts to optimize training and inference performance — including throughput, fault tolerance, GPU utilization, and cost efficiency at scale
  • Set team goals and roadmap in alignment with research milestones, model development timelines, and production deployment requirements
  • Partner closely with research, stack development, and infrastructure teams to understand their workflows and accelerate their iteration speed
  • Drive hiring, mentoring, and growth for a high-performing, mission-driven team

We’re looking for someone who has:

  • 3+ years of engineering management experience, ideally leading infrastructure or platform teams
  • Passion for building and leading high-performing teams that operate at the frontier of scale
  • Deep experience with distributed systems, GPU computing, or large-scale ML infrastructure
  • Direct experience building or operating large GPU clusters (1,000+ GPUs)
  • Strong understanding of distributed training frameworks (e.g., PyTorch Distributed, Megatron-LM, DeepSpeed, FSDP) and job orchestration at scale
  • Familiarity with GPU cluster management, high-performance networking (InfiniBand, RDMA), and resource scheduling (Slurm, Kubernetes)
  • Track record of building and operating systems that run reliably at massive scale

Nice to have:

  • Background in training optimization techniques such as mixed-precision training, pipeline/tensor/data parallelism, or checkpointing strategies
  • Experience with inference optimization (batching, model serving, quantization, compiler-level optimizations)
  • Familiarity with Physical AI domains such as autonomous driving, robotics, or simulation
  • Contributions to open-source ML infrastructure projects

Compensation at Applied Intuition for eligible roles includes base salary, equity, and benefits. Base salary is a single component of the total compensation package, which may also include equity in the form of options and/or restricted stock units, comprehensive health, dental, vision, life and disability insurance coverage, 401k retirement benefits with employer match, learning and wellness stipends, and paid time off. Note that benefits are subject to change and may vary based on jurisdiction of employment. Applied Intuition pay ranges reflect the minimum and maximum intended target base salary for new hire salaries for the position. The actual base salary offered to a successf

