AI Trace Generation Engineer

Turbalance
Germany
Full-time
AI tools: LLM frameworks, TensorRT-LLM, DeepSpeed, Megatron-LM

Turbalance is an innovative, emerging startup that transforms AI laws. We are a team of passionate problem-solvers who believe in what we’re building. We constantly push boundaries and embrace our inner nerds as we find new ways to tackle complex challenges. You will find a dynamic work environment here, with flat or even non-existent hierarchies and the chance to take on responsibility from day one.

Your mission

* Design and implement a trace collection system for distributed LLM workloads, capturing compute operations, communication primitives, memory usage, and cluster topology across multi-GPU and multi-node setups

* Validate that collected traces accurately reflect real workload behavior - verifying operation completeness, timing consistency, and data integrity across inference and training pipelines

* Integrate with and instrument major LLM frameworks (vLLM, TensorRT-LLM, DeepSpeed, Megatron-LM and others) to extract meaningful execution data without disrupting performance

* Use collected traces as input to discrete event simulations that model and replay distributed AI workload behavior at scale

* Analyze trace data to surface bottlenecks and inefficiencies across the stack, from individual kernel execution to cluster-wide communication patterns

Your profile

* 3+ years of experience in AI systems, ML infrastructure, or a closely related area

* Hands-on experience with at least one major LLM serving or training framework

* Strong proficiency in Python and C++, with a solid understanding of GPU architecture, memory bandwidth, and the difference between compute-bound and memory-bound operations

* Solid understanding of distributed communication

* Familiarity with parallelism strategies and how they shape execution behavior across large clusters

* Open source contributions or published research in relevant areas will definitely be appreciated!

* Previous startup experience is a plus - we move fast and value people who are comfortable with that

Why this role is exciting

* Competitive pay & perks: because great work deserves great rewards

* Work on your terms: flexible hours and remote-friendly culture

* Fast lanes, no red tape: flat hierarchies and rapid decision-making mean your ideas go live

* Make it happen: your ideas aren’t just heard - they’re shipped

* Right place, right time: be part of our growth story and build a career-defining legacy

* Global by design: work with a diverse, international team across Germany and the US

* Work with the best: work alongside exceptional engineers and raise the bar together

Turbalance is committed to providing a respectful, safe, and inclusive workplace. Diversity at Turbalance means fostering an environment where individual differences are recognized, valued, and respected, enabling everyone to fully contribute their talents and strengths.

This commitment begins with our recruitment process. If you require any accommodations to support your application or interview, please let us know; we are happy to assist.

Applications go to the hiring team directly