Company Description

Cobalt builds expert reasoning data infrastructure for AI. We work with credentialed domain experts (physicians, nurses, surgeons, payer Medical Directors) to capture how they actually reason through high-stakes decisions, and we turn those traces into training data, benchmarks, and evals for frontier AI labs and applied AI companies.

Role Description

This is a Research Engineer role focused on post-training and reasoning. It can be full-time or part-time, and we're open to candidates currently pursuing a PhD or Master's. You will conduct research in post-training optimization and reasoning techniques, develop new algorithms, and work with cross-functional teams to apply the findings to advanced AI systems. You will also analyze complex datasets, improve AI models, and contribute to R&D projects aimed at improving AI performance and interpretability.

What we're looking for:

* Strong ML engineering fundamentals. Comfortable training and fine-tuning LLMs end-to-end (PyTorch, Hugging Face, vLLM, DeepSpeed/FSDP, or similar)

* Real exposure to post-training methods (SFT, preference optimization, RL fine-tuning), not just having read the papers

* A track record of shipping research or research-grade engineering: publications, strong open-source contributions, or production ML systems at a lab/frontier company

* Comfortable working with a part-time research lead. You can take a direction and run, surface tradeoffs early, and don't need someone in the room every day

* Excited by applied work in a domain with real-world consequences (you don't have to come from healthcare; you do have to care about it)

The work spans the full post-training stack as applied to expert reasoning:

* Designing and running SFT, DPO, and RL (GRPO/PPO and successors) experiments on reasoning traces from our expert network

* Building benchmarks and evals that meaningfully measure clinical and adjudication reasoning: not just final-answer accuracy, but the reasoning path

* Turning raw expert outputs into high-quality training datasets: schema design, quality controls, scaling pipelines

* Working directly with customers (frontier labs, healthcare AI companies) on bespoke data and eval engagements

* Publishing where it makes sense

What we offer:

* Founding-team equity

* Competitive salary (band depends on level; let's talk)

* Flexible work environment (hybrid in-person SF/NY + WFH options)

Apply directly on LinkedIn.

Research Engineer, Post-training & Reasoning
