Machine Learning Infrastructure Engineer
Harnham
Sr ML Ops Engineer
Location: United States (remote)
Compensation: Around 250,000, with some flexibility for the right person
We are seeking a Machine Learning Engineer to drive infrastructure scalability across state-of-the-art GenAI products. This is a role with an early-stage, cutting-edge organization looking for someone to take real ownership of, and responsibility for, its ML infrastructure and scalability.
The role:
* Build and scale ML infrastructure capable of serving high-volume, low-latency model inference
* Optimize models and pipelines for performance, cost, and reliability in production environments
* Productionize research and experimental models into scalable, maintainable ML systems
* Architect infrastructure and deployment strategies that support continuous growth and evolving model complexity
* Drive infrastructure development to handle a variety of models
The core skills:
* Experience in ML platform infrastructure and deployment, including scaling training and inference, concurrency, queuing, backpressure, and orchestration
* Experience designing and operating high-performance model serving systems, with proven ownership of system stability in production, not just deployment
* Ability to engineer solutions that efficiently manage parallel inference workloads at scale
* Ability to tune end-to-end serving pipelines to maximize responsiveness and overall system capacity
* Python
* AWS-native stack
* Docker, containers, SageMaker, Kubernetes