AI Prompt & Agent Developer

OpenCall.ai (YC W24)
San Francisco, CA
Full-time
AI tools: OpenAI API
Applications go directly to the hiring team

Join OpenCall.ai as an AI Prompt & Agent Developer in San Francisco and shape voice AI for medical groups. You'll be part of a fast-paced team building technology that makes automated customer service feel human, and your work will directly influence millions of calls annually.

Full-time
On-site
2+ years
Bachelor's degree and/or extensive experience in relevant fields

Skills & Expertise

AI/ML
NLP
prompt engineering
Python
TypeScript
analytical skills
problem-solving
communication

Key Responsibilities

Write and maintain the prompts that run in production for customer calls.

Ship iterative fixes by reviewing failed calls daily and deploying solutions.

Build evaluation harnesses and innovative metrics for agent performance.

Full Description

Location: On-site (San Francisco)

Compensation: $75,000 – $135,000 + Equity

About OpenCall

OpenCall's voice AI handles calls for multi-location medical groups.

We’re solving a unique challenge: pushing the limits of AI at millisecond latency to deliver the most human-like customer service experience at enterprise scale. Our AI is faster, cheaper, more powerful, and more reliable than anything else on the market.

We’re looking for versatile developers to help scale our proprietary system from millions of calls to billions of calls annually.

We're hiring an AI Prompt & Agent Developer to own one or more behavioral slices of our voice agents. Agent behavior falls into two categories: behavior shared across every deployment, and behavior specific to a subset of deployments. You're someone who actually enjoys looking at the data, because the data informs everything else. You'll write prompts, design subagent architectures, build evals, and push automation rates up one small, measurable win at a time.

Responsibilities

* Write and maintain the prompts that run in production. These cover intent classification, information extraction, availability negotiation, closing phrases, insurance verification flows, objection handling, and edge-case recovery. You own behavior that touches every customer call.

* Ship iteratively against real call data. Every morning, you'll listen to the previous day's failed calls. Every afternoon, you'll deploy a fix. You'll use and help develop dashboards, call review tooling, and automated agents to accelerate the work.

* Build evaluation harnesses. You'll develop offline eval sets, run automated prompt optimization (we use GEPA-style approaches), and establish the test suites that let us ship changes without breaking live deployments.

* Human-in-the-loop onboarding. New customers come online constantly. You'll work with and iterate on our internal AI agents that translate a practice's intake form, their scheduling rules, and their quirks into an agent configuration. Every week, you'll be designing new evaluation metrics for these customers and helping to improve existing ones.

* QA and continuous improvement. You'll simulate real-world customer scenarios, measure outcomes, and monitor production agent performance so you can catch drift early and fix it fast.

What we're looking for

* You've shipped prompts that broke production. Doesn't matter if it was at OpenAI, a chatbot startup, a research lab, or your own project. What matters is that you've felt the specific pain of a prompt that worked beautifully in dev and broke the second it hit real users.

* You're meticulous and careful. Looking at data for long stretches energizes you, as long as there's a signal. You stay organized when five things are in flight. We deploy multiple times a day, and we also run healthcare workflows where a bad change costs real money for real practices. You know the difference between moving fast and breaking things.

* Writing sensibility. The best prompt engineers are good writers. You notice register, rhythm, and word choice. You can tell why "Hello, Cornerside Dental? This is Ava, how can I help you out today?" sounds warmer than "Hello, Cornerside Dental, this is Ava. How can I help you out today?" when spoken by a TTS system.

* Analytical and empirical. You are relentlessly data-driven. Before you make changes, you proactively run experiments and measure. You don't ship because "I think this is better." You justify a change with "this moved booking rate from 78.2% to 81.4% on n=412 calls."

* Comfort with code. You don't need to be a senior engineer, but you should read Python fluently and TypeScript comfortably, and be able to get almost any coding task done by pairing with modern AI coding tools.

Requirements

* 2+ years of experience with AI/ML, NLP, or prompt engineering in production

* Strong analytical and problem-solving mindset; comfort with ambiguity

* Excellent written and verbal communication skills

* Bachelor's degree and/or extensive experience in one or more of: Computer Science, Engineering, Math, Philosophy, Linguistics, Cognitive Science, English, Medicine, or a related field

Preferred Qualifications

* Python chops beyond reading: APIs, data pipelines, testing frameworks

* Prior work with voice AI, TTS, ASR, or telephony platforms (Twilio, etc.)

* Contact center, SaaS, or customer-facing tech background

* Healthcare or medical operations experience — you know what an NPI is, you've worked a front desk, you understand the weird chaos of dental scheduling

* Automated prompt optimization experience (DSPy, GEPA, MIPROv2)

* Fine-tuning experience
