Staff Machine Learning Engineer - Platform

kadence
United States
Full-time
AI tools: TensorFlow, PyTorch, AWS SageMaker
Applications go directly to the hiring team

Full Description

Title: Senior / Staff MLOps / ML Platform Engineer

Location: USA Remote

Team: Data Science (reports to the Head of Data Science)

Role Overview

We are seeking a Senior/Staff MLOps / ML Platform Engineer to build and scale the ML infrastructure that powers our client's real-time decision models. This role sits in the Data Science org and focuses on platform, tooling, and operations rather than on day-to-day model training.

You will own the systems that enable data scientists to move from notebook to production safely and repeatably: feature pipelines and stores, experiment and model tracking, and model monitoring. The foundation exists, but you will be expected to raise its quality significantly and introduce industry best practices the team doesn’t yet have.

Responsibilities

* Design, build, and maintain core ML platform components used by the Data Science team.

* Implement and own feature pipelines and feature management (e.g., feature store or equivalent), including batch and/or streaming ingestion, transformation, and serving.

* Build or integrate experimentation and model tracking tools to manage datasets, configurations, model versions, and metrics.

* Implement robust model monitoring in production (performance, drift, data quality, alerting) and feed findings back into the modeling lifecycle.

* Partner with data scientists to understand their workflows and translate them into reliable services, libraries, and automation.

* Define and enforce best practices for ML operations: testing, deployment, observability, rollback, reproducibility.

* Evaluate and integrate third‑party or open‑source MLOps tools where they make sense; build bespoke components when needed.

* Identify and lead initiatives that materially improve reliability, scalability, and velocity of ML development and deployment.

Requirements

* Significant experience as an MLOps / ML Platform Engineer, Machine Learning Engineer, or Software Engineer building ML‑adjacent infrastructure.

* Demonstrated experience building ML platforms or major MLOps components from scratch or near‑scratch, not just maintaining existing systems.

* Strong programming skills in a production language (e.g., Python, Go, or similar), with solid software engineering fundamentals (testing, code review, CI/CD).

* Experience designing and operating feature pipelines and feature management solutions (custom or tools like Feast, Tecton, etc.).

* Hands‑on experience setting up model monitoring in production (e.g., tracking performance, drift, and data quality; alerting and remediation workflows).

* Experience operating services in a modern cloud environment (AWS/GCP/Azure), including containerization (Docker) and orchestration (Kubernetes or similar).

Preferred

* Background in a data‑first product company where ML is core to the business.

* Experience collaborating closely with data science teams; enough ML understanding to speak their language, while being primarily an infrastructure/ops engineer.

* Experience with model deployment systems (batch and/or online); this is a nice‑to‑have, not the main focus.

* Prior experience in a startup or “scrappy” environment where you owned ambiguous, greenfield platform problems.

What Success Looks Like

* Data scientists ship models faster with fewer production issues.

* Features are consistently defined, discoverable, and reused across models.

* Experiments and models are easily traceable and reproducible.

* Production models are actively monitored, with clear signals and alerts when performance degrades.
