
Senior Coding Annotator / LLM Evaluation Engineer (Contract)

Braintrust
Paris, Île-de-France, France
Contract
AI tools:
Mistral

Job Description

This is a contract engagement, initially 6 months, with the potential for a long-term extension.

Location: Paris-based preferred; remote within Europe considered for strong candidates

We are building and evaluating state-of-the-art large language models (LLMs) and are looking for experienced software engineers to join our evaluation and annotation team. This role sits at the intersection of real-world software engineering, model evaluation, and applied AI, and is critical to improving model reliability, reasoning, and code quality.

You will design challenging coding tasks, evaluate model outputs against rigorous benchmarks, identify failure modes, and contribute to reinforcement learning and model improvement workflows.

This is not a junior annotation role. We are looking for practitioners with deep hands-on coding experience who can think like both an engineer and an evaluator.

What You’ll Do

* Create high-quality coding prompts and reference answers (benchmark-style, e.g. SWE-Bench-like problems).

* Evaluate LLM outputs for code generation, refactoring, debugging, and implementation tasks.

* Identify and document model failures, edge cases, and reasoning gaps.

* Perform head-to-head evaluations between private LLMs (Mistral-based) and leading external models.

* Build or configure coding environments to support evaluation and reinforcement learning (RL).

* Follow detailed annotation and evaluation guidelines with high consistency.

What We’re Looking For

* 5+ years of professional software development experience.

* Strong Python skills (required).

* Knowledge of at least one additional programming language (bonus).

* 1+ year of coding annotation and/or LLM evaluation experience (part-time OK) for a major frontier AI lab or AI infrastructure company.

* Prior experience as a code reviewer is a plus.

* Proven ability to apply structured evaluation criteria and write clear technical feedback.

* Fluent in English (written and spoken).

* Team lead or mentoring experience is a strong plus.

Why This Role

* Work hands-on with cutting-edge LLMs.

* Apply real-world engineering judgment to model evaluation and improvement.

* High-impact, technical work with a focused, senior team.

Applications go directly to the hiring team.