Lead LLM Engineer
Leonar
Full Description
Licorne Society has been engaged by a fast-growing AI startup to help them find their Lead LLM Engineer.
What You Will Own
You will be responsible for one thing:
Make our AI outputs reliable, fast, and indispensable in real workflows.
Concretely
* Design and evolve our LLM / agent architecture
* Own output quality across key use cases (emails, document analysis, etc.)
* Build evaluation systems (datasets, metrics, regression detection)
* Drive fast iteration loops from production data
* Improve retrieval, reasoning, and tool usage
* Ensure production reliability (latency, failure modes, fallback)
* Work directly with product + founders on what to build and why
What This Role Is Really About
Most teams fail because:
* they don’t know what “good output” means
* they don’t have evals
* they iterate randomly
* they overuse agents
Your job is to fix that.
You Will Turn
* vague user problems
* → into structured AI systems
* → with measurable performance
* → that improve every week
What You Need To Be Excellent At
* Shipping real LLM systems
  * You’ve built systems used in production (not demos)
  * You understand RAG, tools, agents, structured outputs
  * You can design full pipelines, not just prompts
* Evaluation-driven development
  * You know how to define quality metrics
  * You build datasets from real usage
  * You run continuous evals to prevent regressions
* Debugging complex failures
  * You can trace issues across:
    * retrieval
    * prompts
    * model behavior
  * You don’t guess; you isolate and fix
* Speed of iteration
  * You move from problem → improvement in hours or days, not weeks
  * You use logs, traces, and data, not intuition alone
* Strong judgment
  * You know when to:
    * use an agent vs a pipeline
    * add complexity vs simplify
  * You optimize for reliability and user value, not novelty
What We Don’t Care About
* Number of years of experience
* Whether you’ve used a specific framework
* Fancy research credentials
If you can build, debug, and improve real systems, you’re a fit.
What Success Looks Like (First 90 Days)
* Clear eval framework for core use cases
* Measurable improvement in output quality
* Faster iteration cycles across the team
* Reduced hallucinations / failures
* Stronger system architecture decisions
Stack (Context, Not Requirements)
* Python (FastAPI)
* Postgres
* Google Cloud
* LangGraph / LangChain (evolving)
* PostHog (product analytics)
* Langfuse (LLM traces)
* LLM APIs (Azure OpenAI)