
Agentic AI Developer - Toronto (Hybrid)

TestingXperts
Toronto, Ontario, Canada
Contract
AI tools:
LangChain
Applications go directly to the hiring team

Full Description

Role Overview

We are seeking an Agentic AI Developer to design, build, and operate agent-based AI solutions that combine large language models (LLMs) with tools, workflows, and enterprise data to deliver measurable business outcomes. This role is hands-on and engineering-driven: you will prototype quickly, productionize reliably, and continuously improve agent performance through evaluation, observability, and iteration. You will work primarily in Python and leverage an open-source AI stack while integrating with modern data platforms (including Databricks) and enterprise security standards.

Key Objectives

* Deliver production-grade agentic AI applications (assistants, copilots, autonomous workflows) from discovery through deployment.

* Establish repeatable engineering patterns for tool use, retrieval-augmented generation (RAG), memory, planning, and orchestration.

* Implement robust evaluation, monitoring, and safety guardrails to improve reliability, accuracy, and user trust.

* Integrate agents with enterprise systems and data platforms (APIs, databases, event streams, Databricks/Spark) while meeting performance, cost, and security requirements.

Primary Responsibilities

* Design and build agentic workflows that use LLMs plus tools (function calling), retrieval, and structured reasoning to accomplish business tasks end-to-end.

* Develop and maintain Python services and libraries for agent orchestration, tool routing, prompt/version management, and policy/guardrail enforcement.

* Build RAG pipelines: document ingestion, chunking, embedding generation, indexing, and retrieval; ensure relevance, freshness, and access control alignment.

* Integrate open-source AI frameworks and components (e.g., LangChain/LangGraph, LlamaIndex, vLLM, Transformers, FastAPI, Pydantic) into a coherent production architecture.

* Implement evaluation and testing for agentic systems: offline benchmarks, golden datasets, regression tests, LLM-as-judge patterns (where appropriate), and online metrics tied to product KPIs.

* Operationalize observability: trace agent/tool calls, track latency and cost, monitor quality signals, detect retrieval/model drift, and set up alerting and feedback loops.

* Build secure integrations with enterprise tools and data sources (REST/gRPC services, SQL databases, data warehouses/lakes, vector databases), including secrets management and auditing.

* Collaborate with data engineering and platform teams to leverage Databricks/Spark for large-scale data preparation, embedding jobs, batch/stream processing, and feature pipelines where relevant.

* Deploy and operate services using containers and CI/CD; ensure reproducibility, environment management, and reliable rollbacks across versions.

* Partner with product, UX, and stakeholders to translate ambiguous needs into agent behaviors, tool contracts, and measurable acceptance criteria.

* Document designs and contribute to engineering standards; mentor peers through code reviews, design reviews, and knowledge sharing.
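To give a flavor of the orchestration and guardrail work described above, here is a minimal, illustrative sketch of tool routing for an agent: a registry that maps tool names to functions and refuses to execute anything unregistered. All names here (`Tool`, `ToolRouter`, `lookup_order`) are hypothetical, not an internal API.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Tool:
    """A callable the agent may invoke via LLM function calling."""
    name: str
    description: str
    fn: Callable[..., str]

class ToolRouter:
    """Routes an LLM's requested tool call to registered code."""

    def __init__(self) -> None:
        self._tools: dict[str, Tool] = {}

    def register(self, tool: Tool) -> None:
        self._tools[tool.name] = tool

    def dispatch(self, name: str, **kwargs) -> str:
        # Guardrail: only explicitly registered tools may run.
        if name not in self._tools:
            raise ValueError(f"Unknown tool: {name}")
        return self._tools[name].fn(**kwargs)

router = ToolRouter()
router.register(Tool("lookup_order", "Fetch order status by id",
                     lambda order_id: f"Order {order_id}: shipped"))

print(router.dispatch("lookup_order", order_id="A123"))  # Order A123: shipped
```

In production this dispatch layer is also where policy enforcement, auditing, and cost/latency tracing typically hook in.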

Required Skills & Experience

* Strong software engineering experience building production services in Python (API design, testing, packaging, performance, and maintainability).

* Hands-on experience building LLM applications, including prompt engineering, tool/function calling, RAG/embeddings, and multi-step workflows.

* Comfort with the open-source AI stack and ecosystem (e.g., Hugging Face Transformers, sentence-transformers, LangChain/LangGraph or LlamaIndex, vector databases such as FAISS or Chroma or a Pinecone equivalent, and experiment tracking with MLflow or similar).

* Strong SQL skills and understanding of data engineering fundamentals (batch vs. streaming, data quality, schema evolution, governance) and how they impact AI systems.

* Experience with evaluation approaches for LLM systems (quality metrics, test harnesses, human-in-the-loop review, and reliability techniques).

* Experience deploying and operating services (Docker, Kubernetes or equivalent), CI/CD, and observability/monitoring practices.

* Ability to communicate tradeoffs clearly, balancing quality, latency, cost, reliability, and risk.
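The evaluation experience listed above often starts with a simple offline harness. The sketch below is illustrative only: a "golden" dataset of question/expectation pairs and a pass-rate check, with `fake_agent` standing in for a real LLM call.

```python
# Golden dataset: known-good questions with substrings the answer must contain.
GOLDEN = [
    {"question": "What is our refund window?", "must_contain": "30 days"},
    {"question": "Which plan includes SSO?", "must_contain": "Enterprise"},
]

def fake_agent(question: str) -> str:
    # Stand-in for a real agent call; returns canned answers.
    canned = {
        "What is our refund window?": "Refunds are accepted within 30 days.",
        "Which plan includes SSO?": "SSO is available on the Enterprise plan.",
    }
    return canned.get(question, "I don't know.")

def run_eval(agent) -> float:
    """Return the fraction of golden cases the agent passes."""
    passed = sum(case["must_contain"] in agent(case["question"])
                 for case in GOLDEN)
    return passed / len(GOLDEN)

print(run_eval(fake_agent))
```

Running such a harness in CI on every prompt or model change is a common way to turn "quality" into a regression-testable number.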

Preferred / Nice to Have

* Awareness of Databricks platform concepts (workspaces, notebooks, jobs, clusters), and experience using Spark for large-scale ETL or embedding generation.

* Experience with Databricks MLflow Model Registry and/or Unity Catalog (or similar governance) for managing models, features, and data access.

* Experience serving open-source LLMs (e.g., vLLM, TGI, llama.cpp) and optimizing inference (quantization, batching, caching).

* Experience with advanced agent patterns: planning, reflection, memory, tool selection, multi-agent collaboration, and workflow graphs/state machines.

* Experience with security and responsible AI practices (PII handling, prompt injection defenses, access control, auditability, and safe tool execution).

* Experience building reusable platform components (SDKs, templates, reference architectures) to enable multiple teams.
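The workflow graph / state machine pattern mentioned among the advanced agent patterns can be sketched in a few lines. This is a toy illustration of the idea that LangGraph-style frameworks formalize; the node names (`plan`, `act`) and state shape are invented for the example.

```python
# Each node mutates shared state and returns the name of the next node.
def plan(state: dict) -> str:
    state["steps"] = ["retrieve", "answer"]
    return "act"

def act(state: dict) -> str:
    step = state["steps"].pop(0)
    state.setdefault("done", []).append(step)
    # Loop on "act" until the plan is exhausted.
    return "act" if state["steps"] else "finish"

def run(state: dict) -> dict:
    nodes = {"plan": plan, "act": act}
    node = "plan"
    while node != "finish":
        node = nodes[node](state)
    return state

print(run({})["done"])  # ['retrieve', 'answer']
```

Expressing agent control flow as an explicit graph like this makes each transition traceable and testable, which is what enables the observability and regression testing called for elsewhere in this role.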
