Agentic AI Developer - Toronto (Hybrid)
TestingXperts
Role Overview
We are seeking an Agentic AI Developer to design, build, and operate agent-based AI solutions that combine large language models (LLMs) with tools, workflows, and enterprise data to deliver measurable business outcomes. This role is hands-on and engineering-driven: you will prototype quickly, productionize reliably, and continuously improve agent performance through evaluation, observability, and iteration. You will work primarily in Python and leverage an open-source AI stack while integrating with modern data platforms (including Databricks) and enterprise security standards.
Key Objectives
* Deliver production-grade agentic AI applications (assistants, copilots, autonomous workflows) from discovery through deployment.
* Establish repeatable engineering patterns for tool use, retrieval-augmented generation (RAG), memory, planning, and orchestration.
* Implement robust evaluation, monitoring, and safety guardrails to improve reliability, accuracy, and user trust.
* Integrate agents with enterprise systems and data platforms (APIs, databases, event streams, Databricks/Spark) while meeting performance, cost, and security requirements.
Primary Responsibilities
* Design and build agentic workflows that use LLMs plus tools (function calling), retrieval, and structured reasoning to accomplish business tasks end-to-end.
* Develop and maintain Python services and libraries for agent orchestration, tool routing, prompt/version management, and policy/guardrail enforcement.
* Build RAG pipelines: document ingestion, chunking, embedding generation, indexing, and retrieval; ensure relevance, freshness, and access control alignment.
* Integrate open-source AI frameworks and components (e.g., LangChain/LangGraph, LlamaIndex, vLLM, Transformers, FastAPI, Pydantic) into a coherent production architecture.
* Implement evaluation and testing for agentic systems: offline benchmarks, golden datasets, regression tests, LLM-as-judge patterns (where appropriate), and online metrics tied to product KPIs.
* Operationalize observability: trace agent/tool calls, track latency and cost, monitor quality signals, detect retrieval/model drift, and set up alerting and feedback loops.
* Build secure integrations with enterprise tools and data sources (REST/gRPC services, SQL databases, data warehouses/lakes, vector databases), including secrets management and auditing.
* Collaborate with data engineering and platform teams to leverage Databricks/Spark for large-scale data preparation, embedding jobs, batch/stream processing, and feature pipelines where relevant.
* Deploy and operate services using containers and CI/CD; ensure reproducibility, environment management, and reliable rollbacks across versions.
* Partner with product, UX, and stakeholders to translate ambiguous needs into agent behaviors, tool contracts, and measurable acceptance criteria.
* Document designs and contribute to engineering standards; mentor peers through code reviews, design reviews, and knowledge sharing.
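The tool-routing and guardrail-enforcement responsibilities above can be sketched, in highly simplified form, as a plain-Python dispatch loop. The tool names, the `ToolCall` structure, and the stubbed two-step plan are illustrative assumptions, not a specific framework's API; in practice the LLM would emit the tool calls via function calling.

```python
from dataclasses import dataclass
from typing import Callable

# Illustrative tool registry: each tool is a named callable with a string contract.
TOOLS: dict[str, Callable[[str], str]] = {
    "lookup_order": lambda order_id: f"Order {order_id}: shipped",
    "word_count": lambda text: str(len(text.split())),
}

@dataclass
class ToolCall:
    name: str
    argument: str

def route_tool_call(call: ToolCall) -> str:
    """Guardrail: only registered tools may execute; unknown names are rejected."""
    if call.name not in TOOLS:
        return f"error: unknown tool '{call.name}'"
    return TOOLS[call.name](call.argument)

# Stub plan standing in for model-emitted tool calls.
plan = [ToolCall("lookup_order", "A-123"), ToolCall("word_count", "ship it today")]
results = [route_tool_call(c) for c in plan]
print(results)  # ['Order A-123: shipped', '3']
```

Real orchestrators add schema validation on arguments, per-tool authorization, and tracing around each call, but the core pattern is this registry-plus-dispatch shape.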
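The RAG responsibilities (ingestion, chunking, embedding, indexing, retrieval) can likewise be sketched end-to-end. This toy uses a bag-of-words counter in place of a learned embedding model purely so the example is self-contained; a production pipeline would substitute a real model (e.g., sentence-transformers) and a vector database. The sample documents and chunk size are illustrative assumptions.

```python
import math
from collections import Counter

def chunk(text: str, size: int = 40) -> list[str]:
    """Naive fixed-size chunking by word count; real pipelines respect document structure."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text: str) -> Counter:
    """Toy bag-of-words stand-in for a learned embedding."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, index: list[tuple[str, Counter]], k: int = 2) -> list[str]:
    """Rank indexed chunks by similarity to the query and return the top k."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

docs = [
    "Invoices are processed nightly by the billing batch job.",
    "Refunds require manager approval within 30 days.",
]
index = [(c, embed(c)) for d in docs for c in chunk(d)]
print(retrieve("how are invoices processed", index, k=1))
```

Freshness and access control, called out in the responsibilities above, enter at the indexing step: re-embed on document change, and filter `index` by the caller's entitlements before ranking.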
Required Skills & Experience
* Strong software engineering experience building production services in Python (API design, testing, packaging, performance, and maintainability).
* Hands-on experience building LLM applications, including prompt engineering, tool/function calling, RAG/embeddings, and multi-step workflows.
* Comfort with the open-source AI stack and ecosystem (e.g., Hugging Face Transformers, sentence-transformers, LangChain/LangGraph or LlamaIndex, vector databases such as FAISS, Chroma, or a Pinecone equivalent, and MLflow or similar experiment tracking).
* Strong SQL skills and understanding of data engineering fundamentals (batch vs. streaming, data quality, schema evolution, governance) and how they impact AI systems.
* Experience with evaluation approaches for LLM systems (quality metrics, test harnesses, human-in-the-loop review, and reliability techniques).
* Experience deploying and operating services (Docker, Kubernetes or equivalent), CI/CD, and observability/monitoring practices.
* Ability to communicate tradeoffs clearly—balancing quality, latency, cost, reliability, and risk.
Preferred / Nice to Have
* Awareness of Databricks platform concepts (workspaces, notebooks, jobs, clusters), and experience using Spark for large-scale ETL or embedding generation.
* Experience with Databricks MLflow Model Registry and/or Unity Catalog (or similar governance) for managing models, features, and data access.
* Experience serving open-source LLMs (e.g., vLLM, TGI, llama.cpp) and optimizing inference (quantization, batching, caching).
* Experience with advanced agent patterns: planning, reflection, memory, tool selection, multi-agent collaboration, and workflow graphs/state machines.
* Experience with security and responsible AI practices (PII handling, prompt injection defenses, access control, auditability, and safe tool execution).
* Experience building reusable platform components (SDKs, templates, reference architectures) to enable multiple teams.