Lead LLM Engineer

Leonar
Paris, Île-de-France, France
Full-time
AI tools: LangChain, Azure OpenAI
Applications go directly to the hiring team

Full Description

Licorne Society has been engaged by a fast-growing AI startup to help find its Lead LLM Engineer.

What You Will Own

You will be responsible for one thing:

Make our AI outputs reliable, fast, and indispensable in real workflows.

Concretely

* Design and evolve our LLM / agent architecture

* Own output quality across key use cases (emails, document analysis, etc.)

* Build evaluation systems (datasets, metrics, regression detection)

* Drive fast iteration loops from production data

* Improve retrieval, reasoning, and tool usage

* Ensure production reliability (latency, failure modes, fallback)

* Work directly with product + founders on what to build and why

What This Role Is Really About

Most teams fail because:

* they don’t know what “good output” means

* they don’t have evals

* they iterate randomly

* they overuse agents

Your job is to fix that.

You Will Turn

* vague user problems

* → into structured AI systems

* → with measurable performance

* → that improve every week

What You Need To Be Excellent At

* Shipping real LLM systems

  * You’ve built systems used in production (not demos)

  * You understand RAG, tools, agents, structured outputs

  * You can design full pipelines, not just prompts

* Evaluation-driven development

  * You know how to define quality metrics

  * You build datasets from real usage

  * You run continuous evals to prevent regressions

* Debugging complex failures

  * You can trace issues across:

    * retrieval

    * prompts

    * model behavior

  * You don’t guess; you isolate and fix

* Speed of iteration

  * You move from problem → improvement in hours or days, not weeks

  * You use logs, traces, and data, not intuition alone

* Strong judgment

  * You know when to:

    * use an agent vs a pipeline

    * add complexity vs simplify

  * You optimize for reliability and user value, not novelty

What We Don’t Care About

* Number of years of experience

* Whether you’ve used a specific framework

* Fancy research credentials

If you can build, debug, and improve real systems, you’re a fit.

What Success Looks Like (First 90 Days)

* Clear eval framework for core use cases

* Measurable improvement in output quality

* Faster iteration cycles across the team

* Reduced hallucinations / failures

* Stronger system architecture decisions

Stack (Context, Not Requirements)

* Python (FastAPI)

* Postgres

* Google Cloud

* LangGraph / LangChain (evolving)

* PostHog (product analytics)

* Langfuse (LLM traces)

* LLM APIs (Azure OpenAI)
