Machine Learning Technical Lead | Toronto, CA (Hybrid)
TestingXperts
We are seeking an experienced Machine Learning Technical Lead (10+ years) to design, build, and scale production-grade ML solutions and the supporting MLOps ecosystem across the enterprise. This role combines hands-on technical leadership with strong architectural judgment, turning ambiguous business needs into robust, measurable ML systems that can be deployed, monitored, and continuously improved.
Key Objectives
• Lead delivery of high-impact ML use cases from discovery through production deployment
• Establish scalable, repeatable MLOps practices for training, deployment, monitoring, and retraining
• Improve model reliability, performance, and business outcomes through rigorous experimentation and evaluation
• Provide clear technical direction and pragmatic trade-offs to stakeholders and senior leadership
Primary Responsibilities
• Own end-to-end technical execution of ML initiatives: problem framing, feature engineering, model selection, training, evaluation, deployment, and post-production iteration
• Architect and implement ML pipelines (batch and real-time) including data validation, feature pipelines, training workflows, and model serving patterns
• Establish MLOps foundations: model registry, experiment tracking, CI/CD for ML, automated testing, and environment reproducibility
• Define model evaluation standards and acceptance criteria (offline metrics, online KPIs, A/B testing) aligned with business goals
• Build model monitoring and observability: performance dashboards, drift detection, data quality checks, alerting, and retraining triggers
• Drive scalable deployment patterns using containerization and orchestration (Docker, Kubernetes) and cloud services (Azure/AWS/GCP), aligned to enterprise security requirements
• Partner with data engineering to ensure the data platform supports ML needs (quality, lineage, access controls, governance, feature availability)
• Mentor engineers/data scientists, enforce engineering discipline, and raise the bar on code quality, documentation, and design reviews
• Communicate trade-offs and recommendations clearly, balancing accuracy, latency, cost, maintainability, and risk
Required Skills & Experience
• 10+ years of experience delivering ML/AI solutions, with proven ownership of production deployments
• Strong foundation in ML concepts (supervised/unsupervised learning, evaluation, bias/variance, feature engineering) and hands-on experience with common model families (tree-based models, linear models, deep learning as needed)
• Proficiency in Python and ML libraries (scikit-learn, PyTorch and/or TensorFlow) and strong SQL skills
• Hands-on experience with MLOps tooling and patterns (e.g., MLflow, model registries, feature stores, workflow orchestration such as Airflow/Dagster)
• Experience deploying and operating ML services (REST/gRPC), including latency/throughput optimization and model versioning strategies
• Solid understanding of data engineering fundamentals (batch/streaming, data quality, schema evolution) and how they impact ML
• Experience with cloud platforms (Azure/AWS/GCP) and container platforms (Docker, Kubernetes)
• Strong stakeholder communication skills and ability to drive clarity in ambiguous problem spaces
Preferred / Nice to Have
• Experience with Databricks and/or Spark-based ML pipelines
• Experience with feature stores and online/offline feature consistency patterns
• Experience with responsible AI practices (explainability, fairness testing, auditability) and regulated environments
• Familiarity with LLM/GenAI integration patterns (RAG, embeddings) where relevant to ML solutions
• Experience building reusable ML platform components to accelerate multiple teams