Founding Data Scientist

Runner — San Francisco (onsite for the first three months; remote-friendly thereafter)

About Runner

Runner is an autonomous-AI product that operates online stores end-to-end on behalf of independent merchants. The agent manages the storefront, sets prices, allocates advertising budget, curates assortment, and handles replenishment. Merchants choose Runner because the business outcomes our system produces exceed what they would produce on their own.

The agent already makes thousands of decisions per merchant per day. What it does not yet have is a rigorous answer to "was that decision correct?" — and a closed loop that turns the answer into a better policy tomorrow.

The role

You will join as a founding member of our Data Science function and as a senior individual contributor, building the decision and experimentation foundations from the ground up.

Your charter is to raise the quality of the decisions our system makes on customers' behalf — distinguishing what is confident from what is correct — and to build the tools that surface the difference. You will partner closely with the AI team, the product team, and our customers.

What we are looking for

Required

* Three or more years of applied causal-inference and experimentation work, including end-to-end ownership of projects from scoping through identification and estimation to shipped recommendations that influenced real decisions.

* Deep fluency in the causal toolkit: difference-in-differences, synthetic control, instrumental variables, regression discontinuity, propensity scoring, and at least one ML-augmented method (double machine learning, doubly-robust estimation, or meta-learners).

* Demonstrated experience designing experimentation when randomization is impossible or insufficient — settings with interference, spillovers, network effects, or small samples per unit.

* Strong Bayesian and decision-theoretic intuition. You frame recommendations in terms of expected value under uncertainty rather than p-values. When a problem involves many small units, you reach for hierarchical or partial-pooling models without being prompted.

* Production-grade Python and SQL. You ship code that passes review, design schemas, and produce work that extends beyond notebook artifacts.

* Comfort building from zero with no platform underneath you. You scope rigorously and produce a defensible directional answer in three days when three weeks is not available.

* The ability to translate statistical results into business recommendations, and to recognize when "we do not yet know" is the correct answer.

Strongly preferred

* Domain experience in settings where decisions matter and randomization is difficult: e-commerce, marketplace, DTC, advertising platforms, fintech, ride-share, or comparable high-stakes-decision environments.

* Experience with bandits and adaptive experimentation, including a clear view of when each is appropriate.

* Prior founding or first-DS experience at a scaling startup.

What sets this role apart

* You think in decisions, not dashboards.

* You can fluently articulate the distinction between "the model was accurate" and "the decision was correct."

* You prefer a defensible directional answer in seventy-two hours to a perfect one in three months.

* You want to set the standards rather than operate within someone else's framework.
