Back to jobs

AI/ML Evaluation and Alignment Engineer

hackajob
United States
Full-time
13,500,000 – 16,500,000 / year
AI tools:
LangChain
HuggingFace
PyTorch
Applications go directly to the hiring team

As an AI/ML Evaluation and Alignment Engineer at hackajob, you'll play a pivotal role in shaping the evaluation frameworks for cutting-edge AI technologies in public safety and intelligence sectors. You'll collaborate with a diverse team of AI engineers and data scientists while promoting responsible AI practices and having a significant impact on ethical AI deployment.

Full-time
3-5+ years
Bachelor's or Master's in Computer Science, Artificial Intelligence, Data Science, or related field

Skills & Expertise

Python
ML/AI engineering
LLM evaluation
bias detection
generative AI
DevOps/MLOps
cloud AI platforms
Kubernetes

Key Responsibilities

Build and maintain evaluation frameworks for LLMs and generative AI systems.

Design guardrails to minimize bias and other ethical risks in workflows.

Implement continuous evaluation pipelines for AI models integrated into production systems.

Full Description

hackajob is collaborating with Leo Technologies to connect them with exceptional professionals for this role.

Core Responsibilities

* Build and maintain evaluation frameworks for LLMs and generative AI systems tailored to public safety and intelligence use cases.

* Design guardrails and alignment strategies to minimize bias, toxicity, hallucinations, and other ethical risks in production workflows.

* Partner with AI engineers and data scientists to define online and offline evaluation metrics (e.g., model drifts, data drifts, factual accuracy, consistency, safety, interpretability).

* Implement continuous evaluation pipelines for AI models, integrated into CI/CD and production monitoring systems.

* Collaborate with stakeholders to stress test models against edge cases, adversarial prompts, and sensitive data scenarios.

* Research and integrate third-party evaluation frameworks and solutions; adapt them to our regulated, high-stakes environment.

* Work with product and customer-facing teams to ensure explainability, transparency, and auditability of AI outputs.

* Provide technical leadership in responsible AI practices, influencing standards across the organization.

* Contribute to DevOps/MLOps workflows for deployment, monitoring, and scaling of AI evaluation and guardrail systems (experience with Kubernetes is a plus).

* Document best practices and findings, and share knowledge across teams to foster a culture of responsible AI innovation.

What We Value

* Bachelor's or Master's in Computer Science, Artificial Intelligence, Data Science, or related field.

* 3-5+ years of hands-on experience in ML/AI engineering, with at least 2 years working directly on LLM evaluation, QA, or safety.

* Strong familiarity with evaluation techniques for generative AI: human-in-the-loop evaluation, automated metrics, adversarial testing, red-teaming.

* Experience with bias detection, fairness approaches, and responsible AI design.

* Knowledge of LLM observability, monitoring, and guardrail frameworks e.g Langfuse, Langsmith

* Proficiency with Python and modern AI/ML/LLM/Agentic AI libraries (LangGraph, Strands Agents, Pydantic AI, LangChain, HuggingFace, PyTorch, LlamaIndex).

* Experience integrating evaluations into DevOps/MLOps pipelines, preferably with Kubernetes, Terraform, ArgoCD, or GitHub Actions.

* Understanding of cloud AI platforms (AWS, Azure) and deployment best practices.

* Strong problem-solving skills, with the ability to design practical evaluation systems for real-world, high-stakes scenarios.

* Excellent communication skills to translate technical risks and evaluation results into insights for both technical and non-technical stakeholders.

Applications go to the hiring team directly