
MLOps Architect

Quantiphi
United States
Full-time
11,000,000 – 15,000,000 / year
AI tools:
Vertex AI
Google Kubernetes Engine (GKE)
Applications go directly to the hiring team

Full Description

About Quantiphi:

Quantiphi is an award-winning, AI-First global digital engineering company that helps the world’s leading Fortune 1000 organizations transform bold ideas into measurable business impact. We go beyond building innovative AI technologies—we solve the problems that matter most to our clients.

Since our founding in 2013, Quantiphi has built a proven track record of turning complex challenges into meaningful outcomes across industries.

Headquartered in Boston, with more than 4,000 professionals worldwide, we partner with global enterprises to deliver large-scale digital, cloud, and AI-driven transformation. #SolvingWhatMatters

We are an Elite and Premier partner to Google Cloud, AWS, NVIDIA, Snowflake, and other leading technology platforms, and our work has been recognized across the industry, including:

* 21 Google Cloud Partner of the Year awards in the past 10 years

* 3 AWS AI/ML Partner of the Year awards

* 3 NVIDIA Partner of the Year awards

* 3 Snowflake Partner of the Year awards

* Rated a Leader by Gartner, Forrester, IDC, ISG, Everest Group, and other leading analyst firms

Quantiphi delivers first-in-class AI solutions across Life Sciences, Healthcare, Banking, Financial Services, CPG, Manufacturing, Energy, High-Tech, Telecommunications, and other industries, powered by cutting-edge Generative AI and Agentic AI accelerators.

We are also proud to be certified as a Great Place to Work—reflecting our commitment to our people and our culture.


Job Description:

The Partner Consultant for AI Infrastructure & MLOps is a specialized technical expert who designs and builds the scalable, automated, and resilient platforms required for large-scale machine learning. This role supports customers moving beyond experimentation to productionize AI/ML, with a focus on the underlying infrastructure for distributed training and low-latency inference. The consultant serves as the bridge between Data Science teams and Cloud Platform teams, using Google Kubernetes Engine (GKE), Vertex AI, and specialized hardware (GPUs and TPUs) to create robust MLOps "factories." The role requires a "platform engineering" mindset in which all infrastructure is provisioned as code.

Key Responsibilities

* Platform Architecture: Design the foundational infrastructure for AI workloads, including secure and scalable Google Kubernetes Engine (GKE) clusters, network configurations, and IAM policies.

* Infrastructure Automation (IaC): Lead the automation of all AI infrastructure provisioning using Terraform to ensure repeatable, scalable, and secure environments.

* MLOps Pipeline Design: Architect end-to-end MLOps automation using Vertex AI Pipelines (or Kubeflow Pipelines) to cover the full lifecycle: data ingestion, validation, model training, registration, and automated deployment.

* Training & Inference Optimization: Design solutions for large-scale distributed training and scalable, low-latency serving (e.g., Vertex AI Endpoints, GKE autoscaling).

* Production Monitoring & Governance: Implement robust monitoring for model performance, data drift, and system health. Ensure all solutions adhere to security and governance standards.

* Hardware Advisory: Advise customers on optimal hardware selection (cost vs. performance), including the provisioning and utilization of Google Cloud GPUs (A2 and G2 machine families) and TPUs (v4, v5e).

* Technical Advisory & Collaboration: Act as the subject matter expert for customers and internal teams, providing guidance and hands-on support to streamline the entire ML lifecycle.

Required Credentials & Skills (Mandatory)

Google Cloud Certifications:

Google Cloud Certified - Professional Cloud Architect

Google Cloud Certified - Professional Machine Learning Engineer

HashiCorp Certification:

HashiCorp Certified: Terraform Associate

Cloud & AI Skills:

* Deep expertise with Google Kubernetes Engine (GKE), including cluster design, node pools, and security (Workload Identity).

* Hands-on, production-level experience with Terraform for automating GCP infrastructure.

* Demonstrable expertise across the Vertex AI Platform (Training, Pipelines, Endpoints).

* Strong Python programming and scripting skills.

* Concepts: Strong understanding of the complete MLOps lifecycle, CI/CD principles, and container-based workflows (Docker).

* Consulting Skills: 3-5+ years in a customer-facing technical role (DevOps, MLOps, or Cloud Engineering).

Preferred Credentials & Skills (Nice-to-Have)

Google Cloud Certification:

Google Cloud Certified - Professional Cloud DevOps Engineer

Industry Certifications (CNCF):

Certified Kubernetes Administrator (CKA)

HashiCorp Certification:

HashiCorp Certified: Terraform Authoring and Operations Professional

Technical Skills:

* Direct hands-on experience provisioning and managing Cloud TPUs.

* Data Engineering integration experience (BigQuery, Dataflow, Pub/Sub).

* Familiarity with monitoring tools like Prometheus and Grafana.

What’s in it for YOU at Quantiphi?

* Join one of the world’s fastest-growing AI-first digital engineering companies and make a real impact at scale.

* Lead and collaborate with a high-energy team of talented, driven individuals solving complex, meaningful challenges.

* Work with Fortune 500 companies and disruptive innovators in a research-driven environment with 60+ patents.

* Stay ahead of the curve by gaining hands-on experience with cutting-edge AI, ML, data, and cloud technologies while continuously upskilling.
