Data Scientist (AI Quality & Evaluation)

Bioscope AI
Boston, MA
Full-time
AI tools: PyTorch, LLMs
Applications go directly to the hiring team

Full Description

About the Role

We're looking for a Data Scientist to own the quality, reliability, and trustworthiness of our clinical AI outputs. You'll build the systems that ensure our AI "knows what it doesn't know" — developing evaluation frameworks, calibrated confidence scoring, and automated quality assurance that physicians can actually trust.

What You'll Do

* Design and implement automated evaluation pipelines that assess AI output quality, accuracy, and safety at scale

* Develop uncertainty quantification systems where confidence scores meaningfully correlate with accuracy

* Build comprehensive evaluation frameworks combining automated assessment with clinician-validated test cases

* Implement feedback loops that continuously improve model outputs based on validation signals

* Establish scalable quality gates that catch errors before they reach end users

* Contribute to model alignment and fine-tuning efforts

Qualifications

Required

* Strong foundation in deep learning frameworks (PyTorch) and LLM architectures

* Experience with model evaluation, benchmarking, and quality metrics

* Proficiency in Python and modern ML development tools

* Strong statistical foundations

* Ability to read, implement, and extend research papers

* Excellent communication skills

Preferred

* Master's degree in Computer Science, Machine Learning, Statistics, or a related quantitative field (PhD preferred)

* Publications in top ML/AI venues (NeurIPS, ICML, ICLR, ACL)

* Experience with RLHF, DPO, or preference optimization techniques

* Background in healthcare AI or regulated industries

* Experience building evaluation systems for production LLM applications
