Founding ML Engineer | Evaluating Frontier Medical AI | $150k–$200k | SF

CoffeeSpace
San Francisco Bay Area
Full-time
Applications go directly to the hiring team

About the job

This role is being recruited by CoffeeSpace on behalf of Tessel, an SF-based startup working at the intersection of ML evaluation, clinical validation, and FDA regulatory strategy.

We’re identifying a small number of exceptional ML researchers from our network.

If there’s a strong fit, we’ll introduce you directly to the founding team.

Founding ML Engineer 

Location: San Francisco (on-site)

Compensation: $150k–$200k base + 1–3% equity

Start timeline: ASAP

Employment type: Full-time

About Tessel

The next generation of diagnostic AI will detect cancer earlier, catch disease before symptoms appear, and change outcomes for millions of patients. But only if the AI actually works.

Tessel builds the evidence infrastructure that proves it does.

They partner with leading diagnostic AI companies and hospitals to rigorously measure, explain, and continuously monitor model performance.

At Tessel, evaluation isn’t a compliance checkbox – it’s how you build AI worth trusting.

Tessel is backed by leading investors and is part of StartX (Stanford’s accelerator).

The Founder

Tessel was founded by Lucas Tao (Stanford MS CS, former Stanford ML Group researcher at SAIL, ex-AWS engineer), who has deep experience across ML systems, interpretability, and large-scale infrastructure.

It is a VC-backed, founder-led AI company already trusted by hospitals and diagnostic AI companies navigating regulatory approval.

The Role

You’ll work directly with medical imaging companies ahead of FDA 510(k) or De Novo submissions, owning engagements end-to-end – from defining evaluation questions to delivering evidence that drives go / no-go decisions.

This is not about building models. It’s about understanding them.

Where do they generalize? Where do they break? What trade-offs are being made? What uncertainty remains?

Your output is defensible, decision-grade evidence – clear enough to inform internal decisions, build customer confidence, and withstand regulatory scrutiny.

You combine strong ML instincts with customer-facing judgment and consistently deliver under time pressure.

Required Qualifications

* Demonstrated history of non-trivial machine learning or analytical work: meaningful projects, publications, systems built, or difficult problems solved

* Strong empirical ML instincts: comfortable designing experiments, analyzing failure cases, and debugging model behavior using statistical or representation-level analysis

* Able to design investigations, detect spurious patterns, reason about distribution shift and uncertainty, and distinguish signal from artifact

* Comfortable working with messy real-world data, imperfect ground truth, and ambiguity

* End-to-end analytical ownership in Python, from raw data through analysis to defensible conclusions

* Clear and confident communicator of technical findings to customers and non-technical stakeholders

Preferred Qualifications

* 3 to 5 years of experience or a strong research track record, such as published work on model evaluation, experience building medical imaging models, or work of equivalent depth

* Experience evaluating, validating, or debugging real-world ML systems

* Familiarity with robustness, interpretability, or safety-critical evaluation

* Exposure to medical imaging, healthcare ML, or other safety-critical domains

* Experience working directly with customers or cross-functional stakeholders

This Role Is NOT For You If

* You would rather optimize a metric than investigate why a model breaks on a specific patient subpopulation

* You need clearly defined tasks and stable scope to be effective

* You are uncomfortable presenting findings that still contain uncertainty

* You want pure technical work without customer relationship ownership

Why This Role

This is a high-impact, high-ownership role. Your evidence directly affects whether a model is submitted to the FDA, whether a hospital adopts or walks away, and whether patients get the outcome they deserve.

Dozens of startups are building another model. The company that proves rigorous, continuous evaluation works in medical AI will not just set the standard here. It will define how we build and govern high-stakes AI across every sector.

If you are motivated by ownership, accountability, and real-world impact, not incremental optimization or hype, this role is for you.

Next steps

1. Apply via this LinkedIn job post

2. We’ll review and reach out if there’s a strong match

3. If aligned, we’ll introduce you directly to the Tessel team

If this role isn’t the right fit, we may suggest other high-signal startup roles we’re recruiting for and introduce you to those teams – always with your permission.

A quick note on authenticity

This is a real, active role that CoffeeSpace is recruiting for on behalf of Tessel. We don’t post speculative roles, and we work directly with hiring teams.
