[Remote] Machine Learning Infrastructure Engineer

Remote Full-time Live

Note: This is a remote position open to candidates in the USA. TRM Labs is a company dedicated to building a safer world through AI-powered intelligence solutions. The Senior Software Engineer, ML Infrastructure will design and operate scalable GPU-backed infrastructure that supports TRM's AI systems, collaborating with teams across the company to ensure effective model deployment and optimization.

Responsibilities

Design and operate GPU cluster infrastructure: Build and manage GPU-backed environments in cloud settings, including orchestration, autoscaling, resource isolation, and workload management across multiple concurrent models and users
Optimize high-throughput inference: Implement and tune serving systems that maximize token throughput, batching efficiency, GPU occupancy, and cost effectiveness across interactive and batch workloads
Enable distributed inference strategies: Support and operationalize model parallelism, tensor parallelism, and other distributed serving patterns for large-scale models
Implement model optimization and compilation workflows: Integrate and optimize acceleration stacks such as TensorRT, ONNX Runtime, vLLM, FlashAttention, and related tooling to improve performance and reduce inference cost
Schedule heterogeneous workloads: Design systems that manage multiple models, multiple users, and mixed workload types across heterogeneous accelerators (e.g., NVIDIA GPUs, Inferentia), ensuring predictable performance under varying demand
Build observability into ML infrastructure: Instrument systems to measure GPU load, memory utilization, batching efficiency, queue depth, and token throughput, and use that data to continuously improve performance and reliability
Partner across engineering teams: Work closely with infrastructure, ML, and product teams to ensure models transition smoothly from experimentation to production-grade, highly available services

Skills

Bachelor's degree (or equivalent) in Computer Science or related field
5+ years of experience building and operating distributed systems or infrastructure in production environments
Experience deploying and operating ML/LLM inference workloads on GPU clusters in cloud environments (AWS and/or GCP)
Deep understanding of high-throughput inference systems, including batching strategies, token throughput optimization, and the trade-offs between latency, throughput, and cost
Experience with one or more ML serving frameworks such as Triton Inference Server, vLLM, Ray Serve, ONNX Runtime, or HuggingFace Optimum
Experience optimizing GPU load, memory efficiency, and performance bottlenecks in production systems
Familiarity with distributed inference strategies including model parallelism and tensor parallelism
Experience working with Kubernetes or equivalent orchestration systems in cloud environments
Adaptable. Goals can change fast. You anticipate and react quickly
Autonomous. You own what you work on. You move fast and get things done
Excellent communication. You communicate complex ideas effectively to both technical and non-technical audiences, verbally and in writing
Collaborative. You work effectively in a cross-functional team and with people at all levels in an organization
Familiarity with heterogeneous accelerators (e.g., Inferentia) is a plus
CUDA familiarity and experience debugging GPU-related issues are a plus

Company Overview

TRM Labs is a software company that offers blockchain analytics and transaction monitoring to financial institutions and governments. It was founded in 2018 and is headquartered in San Francisco, California, USA, with a workforce of 201-500 employees. Its website is https://trmlabs.com.

Company H1B Sponsorship

TRM Labs has a track record of offering H1B sponsorships: 2 in 2026, 1 in 2025, 4 in 2024, 3 in 2023, 3 in 2022, and 1 in 2021. Please note that this does not guarantee sponsorship for this specific role.
