Position: PhD Rater

Type: Part-Time

Compensation: $70–$120/hour

Location: Remote

Commitment: 30+ hours/week (primarily weekdays)

Role Responsibilities

* Design challenging, real-world STEM benchmark problems in domains such as data science, machine learning, finance, and software engineering.

* Implement tasks within an agentic development environment using Python.

* Create reproducible problem setups with clear specifications and executable tests.

* Evaluate and analyze AI model behavior, including reasoning traces and agent workflows.

* Diagnose reasoning failures, logic gaps, and problem-solving limitations in AI systems.

* Contribute to improving benchmark quality and evaluation frameworks for frontier AI models.

Requirements

* Currently enrolled in, or recently graduated from, a PhD program.

* Deep expertise in data science, machine learning, finance, and/or Python-based software development.

* Strong research background in advanced STEM topics.

* Ability to commit reliably for 30+ hours per week.

* Demonstrated technical output, such as high-quality open-source contributions or research work.

* Ability to analyze agent behavior traces and diagnose failures beyond surface-level errors.

Application Process

* Upload resume

* Interview

* Submit form
