Senior Cloud Engineer – ML/AI Platform
KTek Resourcing
Job Title: Senior Cloud Engineer – ML/AI Platform
Location: Toronto, ON
About the Role
We are seeking a Senior Cloud Engineer with deep expertise in AWS and Azure AI/ML services to drive our enterprise ML/AI platform capabilities. You will evaluate and enable cloud AI/ML services, build reusable architectural patterns, and develop automated MLOps solutions in a highly regulated banking environment. This role requires hands-on experience with modern AI/ML platforms and the ability to design secure, compliant solutions that accelerate AI adoption across the organization.
What You Will Do
* Evaluate and enable AWS and Azure AI/ML services (SageMaker, Bedrock, Azure OpenAI, Azure AI Foundry) through proofs of concept and comprehensive assessments
* Design and implement reusable architectural patterns for secure AI/ML integrations including private endpoints, customer-managed keys, and service-to-service authentication
* Build end-to-end MLOps platforms and automated ML pipelines for model training, evaluation, deployment, and monitoring
* Produce technical reports on security, networking, compliance, guardrails, and cost analysis for AI/ML service enablement
* Develop frameworks, infrastructure-as-code, and automation to accelerate AI/ML adoption
* Implement observability solutions with model monitoring, metrics, and drift detection
* Partner with Enterprise Architecture and senior stakeholders to align platform capabilities with strategic roadmaps
* Provide technical leadership and mentorship on AI/ML cloud best practices
What You Need to Succeed
Must Have
* 5–7 years of cloud engineering experience with 3+ years focused on AI/ML platforms
* Deep hands-on expertise with AWS AI/ML services: SageMaker (training, pipelines, inference, JumpStart), Bedrock
* Deep hands-on expertise with Azure AI/ML services: Azure Machine Learning, Azure OpenAI, Azure AI Foundry
* Experience building MLOps platforms and automated ML pipelines
* Strong knowledge of LLMOps, LLM lifecycle management, agentic AI, RAG (retrieval-augmented generation), and prompt engineering
* Experience implementing guardrails and governance for LLM services
* Proficiency in Python and infrastructure-as-code (Terraform, CloudFormation, ARM/Bicep)
* Experience with MLflow (or a similar tool), experiment tracking, and model registries
* Expertise in cloud security patterns including private endpoints, customer-managed keys, and network isolation for AI/ML services
* Strong understanding of cloud networking architecture in regulated environments
* Experience working in highly regulated industries with compliance requirements
* Agile delivery experience
Nice to Have
* AWS or Azure AI/ML certifications
* Experience with vector databases and embedding models
* Knowledge of model optimization and inference acceleration
* Background in financial services or banking