[Remote] Machine Learning Infrastructure Engineer

Remote, Full-time

Note: This is a remote role open to candidates in the USA. TRM Labs is a company dedicated to building a safer world through AI-powered intelligence solutions. The Senior Software Engineer, ML Infrastructure will design and operate scalable GPU-backed infrastructure that supports TRM's AI systems, collaborating with infrastructure, ML, and product teams to ensure effective model deployment and optimization.

Responsibilities

  • Design and operate GPU cluster infrastructure: build and manage GPU-backed environments in cloud settings, including orchestration, autoscaling, resource isolation, and workload management across multiple concurrent models and users
  • Optimize high-throughput inference: implement and tune serving systems that maximize token throughput, batching efficiency, GPU occupancy, and cost effectiveness across interactive and batch workloads
  • Enable distributed inference strategies: support and operationalize model parallelism, tensor parallelism, and other distributed serving patterns for large-scale models
  • Implement model optimization and compilation workflows: integrate and optimize acceleration stacks such as TensorRT, ONNX Runtime, vLLM, FlashAttention, and related tooling to improve performance and reduce inference cost
  • Schedule heterogeneous workloads: design systems that manage multiple models, multiple users, and mixed workload types across heterogeneous accelerators (e.g., NVIDIA GPUs, Inferentia), ensuring predictable performance under varying demand
  • Build observability into ML infrastructure: instrument systems to measure GPU load, memory utilization, batching efficiency, queue depth, and token throughput, and use the data to continuously improve performance and reliability
  • Partner across engineering teams: work closely with infrastructure, ML, and product teams to ensure models transition smoothly from experimentation to production-grade, highly available services

Skills

  • Bachelor's degree (or equivalent) in Computer Science or related field
  • 5+ years of experience building and operating distributed systems or infrastructure in production environments
  • Experience deploying and operating ML/LLM inference workloads on GPU clusters in cloud environments (AWS and/or GCP)
  • Deep understanding of high-throughput inference systems, including batching strategies, token throughput optimization, and the trade-offs between latency, throughput, and cost
  • Experience with one or more ML serving frameworks such as Triton Inference Server, vLLM, Ray Serve, ONNX Runtime, or HuggingFace Optimum
  • Experience optimizing GPU load, memory efficiency, and performance bottlenecks in production systems
  • Familiarity with distributed inference strategies including model parallelism and tensor parallelism
  • Experience working with Kubernetes or equivalent orchestration systems in cloud environments
  • Adaptable. Goals can change fast. You anticipate and react quickly.
  • Autonomous. You own what you work on. You move fast and get things done.
  • Excellent communication. You communicate complex ideas effectively to both technical and non-technical audiences, verbally and in writing.
  • Collaborative. You work effectively in a cross-functional team and with people at all levels in an organization.
  • Familiarity with heterogeneous accelerators (e.g., Inferentia) is a plus
  • CUDA familiarity and experience debugging GPU-related issues is a plus

Company Overview

  • TRM Labs is a software company that offers blockchain, transaction monitoring, and analytics software for financial institutions and governments. It was founded in 2018 and is headquartered in San Francisco, California, USA, with a workforce of 201-500 employees. Its website is https://trmlabs.com.

Company H1B Sponsorship

  • TRM Labs has a track record of offering H1B sponsorships, with 2 in 2026, 1 in 2025, 4 in 2024, 3 in 2023, 3 in 2022, and 1 in 2021. Please note that this does not guarantee sponsorship for this specific role.