Senior Platform Engineer (AI Platform)
CDAI
Full Description
As a Senior Platform Engineer, you will help design, build, and evolve Compass Digital's AI platform for LLM-powered applications and agentic systems. You will create secure, scalable, production-grade capabilities for orchestration, retrieval, tool integration, evaluation, observability, and governance. Working closely with platform, data, product, security, and engineering teams, you will enable AI copilots, operational automation, and intelligent customer experiences across our digital ecosystem.
Key Responsibilities
* Design and operate the core platform capabilities that power LLM applications, copilots, and agentic workflows across multiple environments.
* Architect single-agent and multi-agent execution patterns, including tool calling, workflow routing, state management, and human-in-the-loop checkpoints.
* Build and maintain a secure integration layer that connects models to internal APIs, data products, and enterprise systems using patterns such as Model Context Protocol (MCP), OpenAPI-defined tools, and event-driven services.
* Develop retrieval and knowledge capabilities that support grounded responses, including document ingestion, chunking, embeddings, vector search, metadata filtering, reranking, and source attribution.
* Establish evaluation frameworks and regression tests for response quality, task success, reliability, and safety; use offline and online evals to continuously improve production performance.
* Implement guardrails and governance controls for identity-aware access, PII handling, content safety, prompt and tool security, auditability, and compliance.
* Create end-to-end observability for prompts, tool invocations, agent traces, latency, failures, and token usage and cost to support debugging and production operations.
* Automate platform provisioning and deployment using Terraform, containers, CI/CD, and cloud-native services.
* Optimize model selection, throughput, latency, resilience, and cost efficiency across AI workloads.
* Collaborate with data and ML teams to expose governed structured and unstructured data to AI applications in a safe, reusable way.
* Help define reusable standards, platform patterns, and engineering best practices for building reliable AI and agent-based systems at scale.
Qualifications
* Proven experience building or operating production AI/LLM platforms, developer platforms, or complex distributed systems.
* Strong hands-on experience with Python and API or service development; experience with TypeScript, Go, or Java is a plus.
* Experience designing agentic systems or advanced LLM applications that use tool calling, workflow orchestration, retrieval-augmented generation (RAG), and state management.
* Familiarity with modern agent frameworks and platforms such as OpenAI Agents SDK, Amazon Bedrock Agents, LangGraph, or similar tooling.
* Strong understanding of vector search, embeddings, knowledge base design, ranking/reranking, and grounded generation.
* Experience with AWS and modern platform infrastructure, including containers, serverless services, Kubernetes, networking, and IAM.
* Experience with Terraform or similar Infrastructure-as-Code tools and strong CI/CD automation practices.
* Understanding of evaluation, prompt testing, offline benchmarks, and release guardrails for AI systems.
* Hands-on experience with observability tooling for logs, metrics, tracing, and incident response.
* Strong grasp of security, privacy, and governance for AI systems, including secrets management, RBAC, data protection, and responsible AI controls.
* Ability to work cross-functionally with product, data, ML, and platform teams and translate emerging AI capabilities into reliable platform services.
* Bachelor's degree or equivalent in Computer Science, Engineering, or a related field.
Nice to Have
* Experience building internal developer platforms or self-service tooling for AI teams.
* Experience with real-time inference, streaming workflows, or event-driven architectures.
* Familiarity with data platform concepts such as dbt, Spark, Apache Iceberg, or data product design.
* Background in hospitality, retail, or large-scale enterprise environments.