Lead AI Infrastructure Engineer (Python/ML)
Sphere
Type: Long-term contract
Location: Remote (overlap with PST)
At Sphere, we partner with a global logistics company leveraging AI, Machine Learning, and Data Engineering to optimize warehouse operations, predictive maintenance, and route planning.
Role: Build and maintain scalable AI infrastructure, enabling teams to run ML experiments, deploy machine learning models, and implement MLOps pipelines for production-grade AI.
Responsibilities
* Design distributed training pipelines for large-scale machine learning and deep learning models.
* Optimize compute and storage resources for cloud-based AI/ML workloads on AWS, GCP, or Azure.
* Collaborate with data scientists and ML engineers to deploy models to production efficiently.
* Implement monitoring, logging, and alerting for model performance and AI workflows.
* Keep AI infrastructure scalable, maintainable, and reliable in support of real-time and batch ML applications.
Requirements
* 5+ years of experience with Python and ML infrastructure.
* Experience with cloud AI platforms (AWS SageMaker, GCP AI Platform, Azure ML).
* Experience with containerization (Docker), orchestration (Kubernetes), and CI/CD for ML.
* Experience with distributed systems, data pipelines, and high-performance computing for AI.
* Hands-on experience with deep learning frameworks such as TensorFlow or PyTorch.