Robotics Data Engine Lead
Ubundi
About Ubundi
Ubundi is a South African venture studio shaping human-centred AI. Our name reflects our core belief in Ubuntu principles. Our tech prioritises the collective while empowering the individual.
We’re building a Physical AI data engine: the ground-truth infrastructure layer that turns real-world, multimodal capture into clean, replayable, train-ready datasets. Our first vertical is likely to be healthcare, but the architecture needs to generalise across domains and capture channels.
Our Culture
We live the Ubuntu idea: “I am because we are.” That means high trust, radical candour, and respect for life outside the terminal. Build great things without becoming the product. We prize craft, curiosity, and constructive provocation; we’d rather experiment and learn in daylight than polish in secrecy. Bring your whole self, pull up a chair, and let’s make technology that makes AI more human.
About the role
We’re hiring a Robotics Data Engine Lead — a robotics-experienced generalist who will lead the technical build of our data engine. You don’t need a PhD or a publication list. You do need enough robotics intuition to know what “good data” looks like for real robot learning and real robot debugging, and enough systems engineering strength to build the pipeline that produces it repeatably.
You will work closely with product, data/platform engineering, capture operations, and domain experts. You’ll set the standards for what we capture, how we validate it, and how we package it into usable dataset artefacts.
This role is not just about data ingestion. It spans capture, processing, verification, and delivery — from raw multimodal sessions through to benchmark-ready, train-ready outputs.
This is a high-agency role. You’ll be expected to make decisions, build prototypes, harden them, and teach the team how to run them.
What you’ll do
You will own the capture → ground-truth-ready part of the stack:
* Design the data engine architecture for multimodal Physical AI capture: what we log, how we timestamp it, how we store it, and how we make it replayable.
* Build and ship the first production-grade capture pipeline for our studio and on-site operations, including sensor/robot stream ingest, session manifests/metadata, and reliable upload/storage flows — with an architecture that can extend across field capture, teleoperation, institutional partners, and enterprise workflows.
* Define data contracts and schemas: episode structure, metadata standards, versioning conventions, and acceptance criteria (see the manifest sketch after this list).
* Implement quality gates that automatically flag bad sessions early: timing drift, dropped frames, missing modalities, calibration failures, protocol violations, and completeness issues (a sketch of one such gate follows this list).
* Make sessions replayable: build the workflows and tooling so engineers can reproduce issues from logs and compare runs over time (see the replay comparison sketch below).
* Partner with operations to turn standards into SOPs: simple checklists, calibration flows, operator-ready playbooks, and capture protocols that can be run consistently by non-engineers.
* Create a dataset release discipline: internal release logs, provenance, and a reliable mechanism for customers/partners to consume dataset versions.
* Help define train-ready outputs such as RLDS-compatible datasets, benchmark suites, failure libraries, and task ontologies.
* Work with domain experts where needed to support review, adjudication, and higher-confidence verification of captured sessions.
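To make the data-contract work concrete, here is a minimal sketch of what an episode manifest could look like, assuming a Python dataclass approach. The names (StreamSpec, EpisodeManifest) and the field set are illustrative assumptions, not an existing Ubundi schema.

```python
# Hypothetical episode-manifest contract. All names and fields here are
# illustrative, not Ubundi's actual schema.
from dataclasses import dataclass, field


@dataclass
class StreamSpec:
    """One captured modality within a session (e.g. RGB video, IMU)."""
    name: str            # e.g. "wrist_cam_rgb"
    encoding: str        # e.g. "h264", "protobuf", "jsonl"
    expected_hz: float   # nominal sample rate, consumed by quality gates
    uri: str             # where the raw stream artefact lives


@dataclass
class EpisodeManifest:
    """Versioned contract describing one capture session."""
    schema_version: str  # bump on any breaking change to this contract
    session_id: str
    task: str            # task-ontology label, e.g. "pick_place_v1"
    operator: str
    started_at_ns: int   # epoch nanoseconds, single shared clock
    ended_at_ns: int
    streams: list[StreamSpec] = field(default_factory=list)
    calibration_ref: str = ""  # pointer to the calibration artefact used
```

A contract like this is what acceptance criteria hang off: a session is only “done” once its manifest validates and every declared stream is present.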
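And here is a minimal sketch of one quality gate, checking a single stream’s timestamps for dropped frames and timing drift. The function name and thresholds are illustrative assumptions.

```python
# Hypothetical quality gate: flag dropped frames and timing drift in one
# stream's timestamps. Thresholds are illustrative defaults.
import numpy as np


def check_stream_timing(timestamps_ns: np.ndarray, expected_hz: float,
                        max_gap_factor: float = 1.5,
                        max_drift_ppm: float = 200.0) -> list[str]:
    """Return human-readable quality flags (an empty list means pass)."""
    flags = []
    nominal_dt = 1e9 / expected_hz  # expected gap between samples, in ns
    gaps = np.diff(timestamps_ns.astype(np.int64))

    # Dropped frames: any gap much larger than the nominal sample period.
    dropped = int(np.sum(gaps > max_gap_factor * nominal_dt))
    if dropped:
        flags.append(f"{dropped} gaps exceed {max_gap_factor}x nominal period")

    # Timing drift: compare the achieved mean rate against the nominal rate.
    drift_ppm = (gaps.mean() - nominal_dt) / nominal_dt * 1e6
    if abs(drift_ppm) > max_drift_ppm:
        flags.append(f"mean rate drifts {drift_ppm:.0f} ppm from {expected_hz} Hz")

    return flags
```

In practice a gate like this runs at ingest time, so a bad session is flagged before it ever lands in a dataset release.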
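For replayability, one useful building block is diffing two recorded runs of the same task. Below is a sketch using rosbag2_py (ROS 2’s bag-reading API), assuming sqlite3-backed bags; the comparison criterion here (per-topic message counts) is deliberately simple and illustrative.

```python
# Sketch: summarise per-topic message counts in rosbag2 recordings so two
# runs of the same task can be diffed. Criteria are illustrative.
from collections import Counter

import rosbag2_py


def topic_counts(bag_uri: str) -> Counter:
    """Count messages per topic in a rosbag2 recording."""
    reader = rosbag2_py.SequentialReader()
    reader.open(
        rosbag2_py.StorageOptions(uri=bag_uri, storage_id="sqlite3"),
        rosbag2_py.ConverterOptions(
            input_serialization_format="cdr",
            output_serialization_format="cdr",
        ),
    )
    counts = Counter()
    while reader.has_next():
        topic, _data, _t_ns = reader.read_next()
        counts[topic] += 1
    return counts


def diff_runs(bag_a: str, bag_b: str) -> dict[str, tuple[int, int]]:
    """Topics whose message counts differ between two runs."""
    a, b = topic_counts(bag_a), topic_counts(bag_b)
    return {t: (a[t], b[t]) for t in set(a) | set(b) if a[t] != b[t]}
```

Richer comparisons (per-topic timing, payload diffs, calibration deltas) build on the same read-and-summarise loop.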
You might be a fit if you
* Have 2–6+ years of engineering experience with robotics-adjacent systems: robotics software, autonomy stacks, sensor systems, teleoperation, robot testing, or robotics data pipelines. Exceptional candidates with less experience can still be considered if they show unusually steep growth, strong systems thinking, and hands-on evidence of building.
* Can work comfortably in Python, are capable in at least one systems language (typically C++), and are fluent in Linux development workflows.
* Have experience with ROS / ROS 2 logging and replay workflows, or equivalent robotics logging systems.
* Have experience with multimodal sensor data: video, depth, IMU, force/torque, tactile, proprioceptive, or telemetry streams.
* Have experience with dataset packaging or evaluation workflows for robotics or embodied-AI systems.
* Think in systems and failure modes, not just feature checklists.
* Enjoy turning messy reality into clean interfaces: contracts, schemas, SOPs, and tools people actually use.
* Communicate clearly and enjoy leading cross-functional execution without heavy hierarchy.
Bonus points
* Experience with MCAP / Protobuf-style capture and schema contracts, or similar artefact/container approaches (a minimal MCAP writer sketch follows this list).
* Experience with RLDS, TFDS, or similar embodied-AI / robotics dataset formats.
* Experience building QA pipelines for large-scale sensor or time-series data.
* Experience with pose estimation, depth alignment, 3D reconstruction, or action segmentation workflows.
* Experience collaborating with data ops, annotation workflows, or ground-truth generation.
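As a flavour of the MCAP work, here is a minimal capture sketch using the open-source mcap Python package with JSON-encoded messages, following the writer pattern from its documentation. The topic, schema, and payload are illustrative assumptions, not an Ubundi convention.

```python
# Minimal MCAP capture with a schema contract, using the `mcap` Python
# package's JSON encoding. Topic, schema, and payload are illustrative.
import json
import time

from mcap.writer import Writer

IMU_SCHEMA = {
    "type": "object",
    "properties": {
        "ax": {"type": "number"},
        "ay": {"type": "number"},
        "az": {"type": "number"},
    },
}

with open("session.mcap", "wb") as f:
    writer = Writer(f)
    writer.start()
    # Register the message schema once; channels reference it by id.
    schema_id = writer.register_schema(
        name="ImuSample",
        encoding="jsonschema",
        data=json.dumps(IMU_SCHEMA).encode(),
    )
    channel_id = writer.register_channel(
        topic="/imu", message_encoding="json", schema_id=schema_id,
    )
    # Log one sample; real capture would stream these at the sensor rate.
    t_ns = time.time_ns()
    sample = {"ax": 0.0, "ay": 0.0, "az": 9.81}
    writer.add_message(
        channel_id=channel_id,
        log_time=t_ns,
        publish_time=t_ns,
        data=json.dumps(sample).encode(),
    )
    writer.finish()
```

The appeal of this shape is that the schema travels with the recording, so downstream tooling can validate and replay sessions without out-of-band knowledge.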