Technical Research Engineer (AI Safety & Evaluation)
AtellaFull Description
Most public safety benchmarks for frontier models test single-turn refusals. We focus on pushing frontier models in multi-turn, high-pressure conversations. A model that cleanly refuses to write an insecure script on Turn 1 will often fail by Turn 8 under sustained pressure from a "frustrated senior developer" persona. We call this alignment drift, and as the industry shifts from stateless chatbots to long-horizon autonomous agents, it's one of the most consequential open problems in the field.
What we're building
Atella builds the empirical infrastructure to test AI character and stability under pressure. The company was co-founded by Dr. Roy Perlis (Chair of Psychiatry at Harvard/MGH, Editor of JAMA AI) alongside a team of ML researchers. We build multi-turn, persona-driven adversarial simulation harnesses.
Rather than just prompting models for bad output, we use clinical behavioral science to construct adversarial agents that apply specific psychological pressure over 20+ turns. We then mathematically map the point where a model's safety guardrails collapse, tracking signals like response-length decay, persona sensitivity, and failure-cascade rates.
We run the industry's leading dynamic leaderboards for AI Safety and Code Security, and our data is actively used by safety teams at the frontier labs.
The role
We're hiring a Technical Research Engineer to help scale STELLA, our multi-turn evaluation engine. The work sits at the intersection of ML research, automated red teaming, and serious software engineering.
What you'll do:
* Scale the harness. Build and optimize the infrastructure that runs LLM-driven adversarial personas against frontier models for thousands of turns concurrently.
* Design adaptive attacks. Implement novel automated red-teaming strategies from recent literature — tree-based search, multi-agent debate, dynamic prompt generation — to surface failure modes more efficiently.
* Extract signal from noise. Build analysis pipelines over thousands of raw transcripts: failure-cascade probabilities, behavioral-drift metrics, persona sensitivity scores.
* Publish and open-source. Co-author methodology papers (in the lineage of our recent medRxiv preprints) and ship open-source tooling for the broader AI safety community.
Who you are:
* A strong software engineer. You write clean, scalable Python and are fluent with LLM APIs, async programming, and data pipelines.
* You can read a paper on Constitutional AI or persona modeling, extract the core math or architecture, and have a working implementation shortly after.
* You're intellectually aggressive about breaking things. You care deeply about AI safety but prefer empirical, transcript-level evidence over abstract alignment debates.
* Bonus: experience with RLHF, automated red teaming, or evaluation of long-horizon agentic workflows.
Why join
* You get an unusually direct look at the failure modes of the world's most advanced AI systems.
* You'll work alongside top clinical scientists from Harvard/MGH and collaborate closely with the safety and red teams at the frontier labs.
Compensation: $250,000–$300,000 base + 0.5%–1% equity