Operations Team Lead (Production & Reliability)

Complexio
Amsterdam, North Holland, Netherlands
Full-time
AI tools: LLMs, LangChain, LlamaIndex, OpenAI API, PyTorch, Hugging Face, Gemini

Complexio's Foundational AI works to automate business activities by ingesting whole-company data, both structured and unstructured, and making sense of it. Using proprietary models and algorithms, Complexio forms a deep understanding of how humans interact with and use that data. Automation can then replicate and improve these actions independently.

Complexio is a joint venture between Hafnia and Símbolo, in partnership with Marfin Management, C Transport Maritime, Trans Sea Transport and BW Epic Kosan.

We’re looking for an Operations Team Lead to own production.

Not just keep it running, but build a system that scales.

You’ll lead operational excellence across all live customer-facing systems. Your mission: make production reliable, observable, predictable, and continuously improving.

This is a hands-on role. You’ll shape process, lead incidents, build the team, and move us from reactive firefighting to proactive reliability engineering.

What You’ll Own

Production

* Stability and availability of all live systems

* Operational readiness for new releases

* Safe production access and change coordination

Production is a high-discipline environment. You make sure it stays that way.

Incident Management

You own the full lifecycle:

* High-signal alerting and fast detection

* Structured incident response

* Clear internal and customer communication

* Blameless postmortems

* Systemic fixes that prevent repeats

Goal: Fast recovery. Fewer recurring incidents.

On-Call

* Design sustainable rotations

* Clear escalation paths

* Defined severity levels

* Strong runbooks

* No-burnout culture

Someone accountable is always reachable. Escalations are fast and predictable.

Monitoring & Reliability

* Define SLIs/SLOs for critical systems

* Improve visibility across availability, latency, errors, and saturation

* Track MTTR, incident frequency, and escalation trends

* Drive reliability roadmap initiatives

We measure reliability and improve it continuously.

Team Leadership

* Lead and grow the Operations team

* Set clear standards and KPIs

* Build a culture of ownership and accountability

* Raise the bar on operational discipline

You’re responsible for both system performance and team performance.

Requirements

What We’re Looking For

* Strong experience in SRE, DevOps, Infrastructure, or Production Engineering

* Prior experience leading technical teams

* Deep hands-on incident management experience

* Strong observability and reliability mindset

* Calm under pressure, clear in communication

* Systems thinker who fixes root causes, not symptoms

How We Think

* Production is sacred

* Clear ownership beats ambiguity

* Blameless culture, high accountability

* Fix systems, not people

* Reliability is a product feature

Applications go to the hiring team directly