
Research Engineer, Post-training & Reasoning

Cobalt
San Francisco Bay Area
Full-time
AI tools: PyTorch, Hugging Face
Applications go directly to the hiring team

Full Description

Company Description

Cobalt builds expert reasoning data infrastructure for AI. We work with credentialed domain experts (physicians, nurses, surgeons, and payer Medical Directors) to capture how they actually reason through high-stakes decisions, and we turn those reasoning traces into training data, benchmarks, and evals for frontier AI labs and applied AI companies.

Role Description

This is a Research Engineer role focused on post-training and reasoning. It can be full-time or part-time, and we're open to candidates currently pursuing a PhD or Master's. You will conduct research on post-training optimization and reasoning techniques, develop new algorithms, and work with cross-functional teams to apply the findings to advanced AI systems. The role also involves analyzing complex datasets, improving AI models, and contributing to R&D projects aimed at improving model performance and interpretability.

What we're looking for:

* Strong ML engineering fundamentals. Comfortable training and fine-tuning LLMs end-to-end (PyTorch, Hugging Face, vLLM, DeepSpeed/FSDP, or similar)

* Real exposure to post-training methods (SFT, preference optimization, RL fine-tuning), not just having read the papers

* A track record of shipping research or research-grade engineering: publications, strong open-source contributions, or production ML systems at a lab/frontier company

* Comfortable working with a part-time research lead. You can take a direction and run, surface tradeoffs early, and don't need someone in the room every day

* Excited by applied work in a domain with real-world consequences (you don't have to come from healthcare; you do have to care about it)

The work spans the full post-training stack as applied to expert reasoning:

* Designing and running SFT, DPO, and RL (GRPO/PPO and successors) experiments on reasoning traces from our expert network

* Building benchmarks and evals that meaningfully measure clinical and adjudication reasoning: the reasoning path, not just final-answer accuracy

* Turning raw expert outputs into high-quality training datasets: schema design, quality controls, scaling pipelines

* Working directly with customers (frontier labs, healthcare AI companies) on bespoke data and eval engagements

* Publishing where it makes sense

What we offer

* Founding-team equity

* Competitive salary (band depends on level; let's talk)

* Flexible work environment (hybrid in-person SF/NY + WFH options)

Apply directly on LinkedIn.
