Founding Machine Learning Engineer
Velvet
About Us
Velvet is a data research company building the datasets that power the next generation of multimodal AI. Founded by Lucas Mantovani (ex-Meta FAIR) and Lucas Tucker (ex-Adobe Infrastructure), our mission is to make AI more human by producing high-quality audiovisual training data for frontier labs.
We're hiring a Founding Machine Learning Engineer to build the pipelines that turn raw footage into clean, structured training data. This is a hands-on, execution-heavy role at the intersection of ML engineering and research. You'll own the full lifecycle — from writing and testing processing scripts to deploying them at scale across thousands of hours of video.
What You'll Do
* Build and enhance post-processing pipelines that clean, validate, and package large volumes of video and audio data for multimodal model training. These pipelines must handle wide variation in speech, visual quality, and format, which makes robustness a core engineering challenge.
* Deploy and fine-tune open-source models for speech recognition, speaker diarization, video segmentation, and related tasks.
* Design infrastructure for large-scale distributed processing — parallelizing thousands of compute jobs across cloud platforms and optimizing for throughput and cost.
What We're Looking For
* Strong experience in ML infrastructure, speech/audio processing, or large-scale data pipelines.
* Proficiency in PyTorch. Familiarity with distributed job orchestration.
* Claude Code pilled.
* A bias toward shipping. You default to building, not theorizing.
* Ability to work effectively in an early-stage environment where scope is broad and priorities shift fast.
Even Better
* Prior work at a data company or frontier AI lab.
* Track record building pipelines that process tens of thousands of hours of audio or video.
* Experience with infrastructure cost optimization or model fine-tuning for production use.
You'll Thrive Here If
* You're energized by operational work with immediate, visible impact.
* You treat broken processes as engineering problems worth solving properly.
* You hold yourself to a high bar for data quality — because you understand it directly determines model performance.