Data Engineer
Pam
Data Engineering Intern — PAM AI
Location: Tysons, Virginia
Type: Internship (Full-time or Part-time)
About the Company:
Pam is the AI workforce for automotive dealerships.
Pam answers calls, books appointments, follows up with customers, and proactively reaches out to keep conversations moving 24/7. It is the fastest-growing Voice AI solution, reaching 700+ dealerships across North America.
Learn more: pam.ai
Pam.ai is not able to provide employment visa sponsorship at this time. Candidates must be authorized to work in the United States without current or future sponsorship.
The Problem
Car dealerships lose billions in revenue every year—not because demand isn’t there, but because customer communication is broken. Missed calls, slow responses, poor follow-ups, and fragmented systems lead to lost sales. PAM AI fixes this by acting as the AI operating layer for dealership communication—handling calls, texts, chat, and workflows 24/7. But the quality of that intelligence depends entirely on one thing: Data.
The Role
We’re looking for a Data Engineering Intern who wants to work on the systems that directly determine how intelligent our AI becomes. This is not a "learning" internship in the traditional sense. You will be expected to build, ship, and own real infrastructure that sits in the critical path of our product.
You’ll work on:
* Dataset infrastructure for conversational AI.
* Labeling systems that define ground truth for complex dealership intents.
* Pipelines that turn messy real-world interactions and garbled transcripts into usable intelligence.
If you’re the type of person who enjoys turning chaos into structured systems, you’ll thrive here.
What You’ll Do
Build the Dataset Layer
* Design and implement systems to ingest and index large volumes of dealership interaction data, including calls, transcripts, SMS, and chat.
* Create tools to search, filter, and explore datasets across the company, ensuring high-quality matches for dealership staff and services.
* Build internal infrastructure for dataset curation and metadata management.
Own the Labeling Stack
* Design workflows that allow distributed labelers to produce high-quality annotations quickly.
* Improve labeling systems across speed, accuracy, and usability.
* Help define what "good data" looks like for conversational AI, addressing common issues like duplicate appointments, mis-transcriptions, and improper categorization.
Ship Data Pipelines
* Build pipelines that transform raw, unstructured interaction data into structured training datasets.
* Integrate with dealership systems such as CRMs, schedulers like Xtime, and communication platforms.
* Improve the reliability and observability of data flows so that successful lookups can be distinguished from system fallbacks.
Apply Data Insights to Pam Agents
* Contribute to Internal Pam Agent Builder Tools, where agents are built and improved using prompt composer workers and automated evaluations.
* Connect datasets directly to agent behavior, evaluation, and iteration loops.
* Help close the loop between data → labeling → model → real-world performance.
What We’re Looking For
You might be a fit if:
* You’ve built systems before—not just taken classes.
* You’re comfortable working in Python or TypeScript.
* You understand (or are eager to learn) how data pipelines and backend systems work.
* You care about data quality and correctness, not just getting something to run.
Strong signals:
* You’ve worked with messy, real-world datasets, such as garbled transcription IDs or inconsistent dealership records.
* You’ve built internal tools or automation systems.
* You’ve explored data labeling, evaluation, or ML workflows.
* You move fast and don’t wait for perfect specs.
What Makes This Role Different
* You will work on core product systems, not side projects.
* Your work will directly impact revenue for real businesses by improving appointment booking and lead capture.
* You’ll operate in a high-ownership, low-process environment.
* You’ll be expected to figure things out, not wait for instructions.
Example Problems You Might Tackle
* Build a search engine over millions of dealership conversations to improve intent detection.
* Design a labeling system for customer intent, sentiment, and outcomes to resolve complex routing issues.
* Create pipelines that turn raw call transcripts into structured training data for multilingual support.
* Use better validation logic to increase labeling throughput by 3x without sacrificing quality.
* Build tools that let non-technical operators contribute to AI training via the Agent Workbench.
Why This Matters
Better data → better AI conversations
Better AI conversations → more appointments booked
More appointments → real revenue for dealerships
You won’t just be building infrastructure—you’ll be directly influencing how businesses capture demand and operate.
What You’ll Get
* Experience building production-grade data systems for AI.
* Exposure to real-world conversational AI at scale.
* High ownership and direct impact from day one.
* Close collaboration with a small, highly technical team.