Software/Production ML Engineer (Python, AWS)
Sphere
We are looking for a Software/Production ML Engineer to own and evolve real-world, production-grade AI systems within a fast-paced insurance technology company.
This is a hands-on engineering role focused on building, deploying, and operating customer-facing and internal AI services in production. Our team owns multiple live systems, including real-time decisioning pipelines, AI-driven operational automations, chatbots, and the ML infrastructure that powers them.
This role is not focused on offline modeling or research-only machine learning. We are looking for engineers who take end-to-end ownership of ML systems - from data and features, to inference services, deployment, monitoring, and on-call support in production environments. Candidates whose experience is primarily limited to offline modeling, experimentation, or handoff-based deployment workflows will not be a good fit for this role.
Responsibilities
* Design and build APIs and pub/sub event streams to support real-time machine learning inference and automated agentic processes.
* Contribute to the development and maintenance of both online and offline feature stores for machine learning.
* Gain familiarity with the property and casualty (P&C) insurance sector, including key policyholder and product attributes, to help enhance model effectiveness.
* Implement industry-standard MLOps and LLMOps techniques to monitor ML models, feature sets, and agentic systems for performance degradation and data drift.
* Support the ongoing development of our core MLOps platform, as well as the codebase and infrastructure for serverless AI applications.
* Validate the performance of machine learning models through rigorous training and testing methodologies.
* Collaborate with Data Science teams to engineer new features, construct transformation pipelines, integrate custom loss functions, and experiment with novel inference strategies such as chaining and shadow deployments.
* Create and scale new agentic AI automations, guiding them from initial proof-of-concept through to full production deployment.
* Construct evaluation frameworks designed to rigorously test AI applications, covering not only standard workflows but also the complex, real-world scenarios common to the car insurance domain.
* Use the Python data ecosystem (e.g., scikit-learn, pandas, NumPy) to deliver machine learning projects and initiatives.
* Take part in the team's weekly on-call rotation, addressing alerts promptly to maintain high service availability for both customers and internal stakeholders.
Requirements
* Experience writing production-quality Python code.
* Experience with Python data science and machine learning libraries, including scikit-learn, pandas, NumPy, and related tools.
* Experience deploying, operating, and supporting ML or AI services in production, including monitoring, incident response, and iterative improvement.
* Hands-on experience with AWS (e.g., Lambda, Step Functions, DynamoDB, IAM, containerized services).
* Experience with Kafka or other event-driven / pub-sub systems.
* Experience with Git and CI/CD pipelines in production environments.
Nice to have
* Experience building or operating MLOps platforms or ML infrastructure.
* Experience with real-time data pipelines and streaming architectures.
* Experience with AI chatbots, LLM-based systems, or retrieval-augmented generation (RAG).
* Familiarity with feature stores, model monitoring, and deployment strategies such as A/B or shadow deployments.