Senior MLOps Engineer

DeepRec.ai
San Mateo County, CA
Full-time
20,000,000 – 25,000,000 / year
AI tools:
MLflow

Senior MLOps / ML Infrastructure Engineer

About the Company

Our client is a Series B, venture-backed deep-tech company building a Physics AI platform that helps engineering teams bring products to market faster, reduce development risk, and explore better designs with greater confidence. The platform combines large-scale simulation data with modern machine learning to generate high-fidelity predictions of physical behavior in near real time. Customers include leading organizations across aerospace, automotive, and advanced manufacturing, working on some of the most demanding real-world engineering problems.

The Role

This role focuses on building and operating the infrastructure that powers physics-based AI systems at scale. The position enables ML engineers and scientists to train, track, deploy, and monitor models reliably without managing low-level infrastructure. The work sits at the intersection of ML systems, cloud infrastructure, and large-scale simulation data, with a strong emphasis on performance, reliability, and developer productivity. It is a hands-on engineering role in a fast-moving, in-office environment, working closely with ML researchers, platform engineers, and product teams.

What You’ll Do

* Design, build, and maintain robust MLOps infrastructure supporting the full ML lifecycle, from experimentation and training through to production deployment and monitoring

* Implement automated training pipelines, experiment tracking, and model lifecycle management using tools such as Kubeflow, MLflow, and Argo Workflows

* Develop scalable data pipelines capable of handling large volumes of unstructured data, particularly 3D geometric data and physics simulation outputs

* Deploy machine learning models into production inference systems with strong standards for performance, reliability, and observability

* Manage model registries and integrate them with CI/CD workflows to support consistent and reliable model releases

* Implement monitoring systems that continuously track model health and performance in production

* Collaborate closely with ML researchers, platform engineers, and product teams to evolve the infrastructure platform for physics-based AI applications

* Write production-grade code and optimize cloud infrastructure built on Docker and Kubernetes, primarily on Google Cloud Platform, while making thoughtful trade-offs among scalability, cost, and operational simplicity

What We’re Looking For

* Bachelor’s degree or higher in Computer Science, Data Science, Applied Mathematics, or a closely related field

* 5+ years of industry experience building MLOps platforms or ML systems in production environments

* Strong proficiency in Python, with working knowledge of Bash and SQL

* Hands-on experience with cloud infrastructure such as GCP, AWS, or Azure

* Experience with containerization and orchestration tools including Docker and Kubernetes

* Familiarity with modern MLOps frameworks such as Kubeflow, MLflow, and Argo Workflows

* Experience building and maintaining scalable data pipelines, ideally working with unstructured or high-dimensional data

* Ability to independently deploy models and implement monitored inference systems in production

* Comfort with troubleshooting complex distributed systems and building reliable infrastructure that other teams depend on

Nice to Have

* Interest in physics simulation, scientific computing, or HPC environments

* Experience building production MLOps platforms in deep-tech or simulation-heavy environments

* Familiarity with additional programming languages such as Go or C++

Working Style and Culture

This role suits someone who enjoys startup environments, learns quickly, and communicates clearly across disciplines. The team works on-site five days a week and values close collaboration, fast feedback loops, and hands-on problem solving. There is a strong belief that great infrastructure should be largely invisible, enabling engineers and scientists to move faster without friction.

Applications go to the hiring team directly