Research Scientist, LLM Evaluation – Post-Training

Remote, Full-time

Job Description:

Define and execute a rigorous research agenda focused on LLM evaluation and post-training, with emphasis on evaluation-driven model improvement
Design experiments to study how evaluation methodologies impact fine-tuning and post-training outcomes
Develop and validate comprehensive evaluation frameworks for LLM and multimodal systems
Lead research on frontier evaluation domains, including long-context, cross-modal, and dynamic multi-turn evaluations
Analyze model behavior and failure patterns; generate actionable recommendations for model improvement
Partner with Language Data Scientists to integrate human-in-the-loop and synthetic approaches into data and evaluation strategies

Requirements:

MS or PhD in Computer Science, Machine Learning, Statistics, Applied Mathematics, AI, or a related quantitative field (PhD strongly preferred)
5+ years of relevant experience in applied ML research or research science, with substantial work on LLMs or foundation models (graduate research experience counts)
Demonstrated experience with LLM evaluation, benchmarking, alignment, post-training, or model quality research
Strong foundation in experimental design, statistical analysis, and scientific reasoning for ML systems
Strong Python coding skills for research experimentation, data processing, evaluation pipelines, statistical analysis, and visualization
Hands-on experience with modern ML frameworks (PyTorch, Hugging Face, JAX/TensorFlow)
