Summary

One of our early-stage AI investments, founded by a successful repeat entrepreneur, is looking to hire a Senior Machine Learning Engineer or Applied Research Scientist focused on efficient on-device and edge-deployed language models.

This will be one of the company's earliest ML hires and a foundational technical role. The person will help architect and build core ML systems for efficient inference, model optimization, deployment reliability, and production-scale edge AI infrastructure.

The role sits at the intersection of applied ML research, inference systems, and production engineering.

What You’ll Work On

* Architect and build core ML infrastructure for efficient language model deployment

* Optimize small language models for constrained compute and memory environments

* Improve inference latency, throughput, memory footprint, and deployment reliability

* Develop production-ready pipelines for model evaluation, benchmarking, deployment, and monitoring

* Translate experimental research code into scalable, maintainable production systems

* Work closely across research and engineering to productionize new model capabilities

* Help define long-term technical direction across edge AI and inference systems

Key Qualifications

* Strong experience deploying ML systems or language models in constrained runtime environments

* Deep understanding of model optimization techniques including quantization, distillation, pruning, and efficient inference

* Experience with modern inference runtimes, deployment frameworks, or accelerated ML systems

* Strong systems intuition around latency, memory efficiency, and real-time inference behavior

* Strong PyTorch experience and comfort operating across both research and production environments

* Experience building scalable ML infrastructure and evaluation pipelines

* Ability to operate independently in ambiguous, zero-to-one startup environments

* Prior experience leading small, high-impact technical initiatives or teams preferred

Strong Plus Signals

* Experience working with small language models (SLMs) or edge-deployed LLM systems

* Background in embedded AI, systems optimization, runtime engineering, or inference infrastructure

* Familiarity with low-level performance optimization or hardware-aware ML deployment

* Experience productionizing transformer-based systems in resource-constrained environments

Background

* Advanced degree in Computer Science, Electrical Engineering, Applied Mathematics, or a related field preferred

* 4+ years of relevant industry or applied research experience

Please note:

There are no fees associated with any of the support we provide to our investments. Greylock Talent provides free candidate referrals and introductions to all of our active investments (one of the many services we provide).

Due to the volume of applicants we typically receive, a follow-up email will not be sent unless a match is identified.

Founding MLE (On-device AI)
