Machine Learning Engineer

Voio
Berkeley, CA
Full-time
AI tools: PyTorch

About Voio

At Voio, we’re redefining how radiologists work. Today, medical imaging is slowed by fragmented tools — one system to view scans, another to dictate, and another to search patient context. We’re building a unified system that connects it all: fast, intelligent, and deeply intuitive.

Our AI models originated from years of research at UC Berkeley and UCSF, but our mission goes far beyond the lab — we’re now building real-world systems that push the frontier of applied medical AI. Every line of code here helps doctors move faster, see clearer, and focus on care, not clicks.

The Role

We’re looking for a Machine Learning Engineer, with a focus on MLOps, to own the infrastructure and systems that move machine learning models from research into reliable, observable, production-grade clinical workflows.

This role sits at the intersection of deep learning systems, infrastructure, and production engineering. You will partner closely with research, backend, and product teams to ensure models are deployable, scalable, measurable, and correct in real-world environments.

This is a hands-on role with ownership across training pipelines, inference systems, monitoring, and iteration loops.

What You’ll Do

Production ML Systems

* Deploy, operate, and optimize GPU-based inference systems for low-latency, high-throughput workloads.

* Own model serving infrastructure, including batching, caching, and runtime optimization.

* Implement and maintain APIs for real-time model inference.

Training & Deployment Infrastructure

* Design and maintain CI/CD pipelines for model training, testing, validation, and rollout.

* Build reproducible experimentation frameworks for training, tuning, and deployment cycles.

* Manage distributed training and inference infrastructure, including GPU scheduling and scaling.

Performance, Monitoring & Reliability

* Profile and benchmark models in production, identifying bottlenecks in latency, memory, and throughput.

* Design observability systems to track model performance, drift, failures, and uptime.

* Use production signals to drive iteration decisions and system-level improvements.

Cross-Functional Execution

* Partner with research teams to transition models from research to production systems.

* Collaborate with product engineers and clinicians to meet real-world workflow constraints.

* Make clear, defensible tradeoffs between model quality, system cost, and operational reliability.

What We’re Looking For

Core Qualifications

* 4+ years of experience in ML Ops, infrastructure, or distributed systems.

* Strong hands-on experience deploying and operating GPU-based inference systems.

* Deep familiarity with PyTorch, including performance tuning and debugging.

* Proven ability to own systems end-to-end and operate independently in ambiguous environments.

Strong Signals

* Experience optimizing LLM or deep learning inference (batching, caching, memory efficiency).

* Comfort reasoning about distributed systems tradeoffs (compute, communication, scaling).

* Clear ownership of production systems—not just research exposure.

Nice to Have

* Familiarity with DICOM, HL7, or healthcare data standards.

* Experience working in regulated or safety-critical ML environments.

* Experience with Docker, Kubernetes, and cloud environments (AWS or GCP).

What We Value

We hire for clarity, ownership, and judgment.

The ideal engineer:

* Thinks in systems. Sees beyond individual tasks to how everything connects.

* Executes with precision. Moves quickly without sacrificing long-term quality.

* Owns outcomes. Takes responsibility across design, build, and delivery.

* Builds with purpose. Writes code that improves lives, not just benchmarks.

Why Join Us

You’ll work directly with leading engineers, clinicians, and researchers from UC Berkeley and UCSF — building products that didn’t exist before. If you want to shape how AI enters the clinic, and you care about craft as much as impact, this is your team.

Applications go directly to the hiring team.