Machine Learning Infrastructure Engineer

Harnham
United States
Full-time
AI tools: AWS SageMaker

Sr ML Ops Engineer

Location: United States (remote)

Compensation: Around 250,000, with some flexibility for the right person

We are seeking a Machine Learning Engineer to drive infrastructure scalability across state-of-the-art GenAI products. This role is with an early-stage, cutting-edge organization looking for someone to take real ownership of, and responsibility for, its ML infrastructure and scalability.

The role:

* Build and scale ML infrastructure capable of serving high-volume, low-latency model inference

* Optimize models and pipelines for performance, cost, and reliability in production environments

* Productionize research and experimental models into scalable, maintainable ML systems

* Architect infrastructure and deployment strategies that support continuous growth and evolving model complexity

* Drive infrastructure development to support a wide variety of models

The core skills:

* Experience in ML platform infrastructure and deployment, including scaling training and inference, concurrency, queuing, backpressure, and orchestration

* Design and operate high-performance model-serving systems, with proven ownership of system stability beyond initial deployment

* Engineer solutions that efficiently manage parallel inference workloads at scale

* Tune end-to-end serving pipelines to maximize responsiveness and overall system capacity

* Python

* AWS-native stack

* Docker, containers, SageMaker, Kubernetes

Applications go directly to the hiring team.