Machine Learning Engineer
Neuphonic
Company Description
Neuphonic is building the future of on-device voice AI.
We develop ultra-low-latency neural text-to-speech systems that enable super-realistic, human-like speech directly on devices. Our focus is on building efficient generative audio models that can run on CPU-constrained hardware, enabling real-time voice interaction without relying on large cloud infrastructure.
By dramatically reducing latency and compute requirements, we are making natural conversational AI possible on phones, embedded devices, browsers, and edge systems. This opens the door to a new generation of voice-enabled applications where interacting with AI feels as natural and responsive as speaking with another person.
Neuphonic was founded in April 2024 and is backed by leading venture capital firms in Europe. Our customers include OEM handset manufacturers, chip manufacturers, and consumer AI companies building the next generation of voice-enabled products.
Our vision is a world where voice becomes the most natural interface for AI, enabling seamless, intuitive interactions that are accessible to everyone.
To understand the technology you would be working on, please review our Hugging Face and GitHub repositories, as they will be part of the interview discussion:
* https://huggingface.co/neuphonic
* https://github.com/neuphonic
Role
We are looking for a Machine Learning Engineer to help advance the state of the art in speech synthesis.
You will work on research and development across the full speech pipeline — from model architecture and training to dataset design and production deployment. The role combines applied research with real-world engineering, working closely with a small team pushing the boundaries of real-time speech systems.
We are particularly interested in candidates with experience in text-to-speech systems, or multimodal machine learning involving speech and audio.
Your work will include:
* Researching and developing state-of-the-art speech synthesis models
* Training and optimising models for high-quality, low-latency speech generation
* Building and curating high-quality proprietary speech datasets
* Improving model quality, expressiveness, and latency
* Working closely with engineers to bring research models into production systems
* Exploring multimodal approaches to speech and conversational AI
This role is best suited to candidates who have worked on research-grade machine learning models, rather than purely application-level ML systems.
You have
* An MSc or PhD in machine learning, speech processing, computer science, or a closely related field
* Strong experience training and evaluating deep learning models using frameworks such as PyTorch, JAX, or TensorFlow
* Several years of research or industry experience developing machine learning models (this is not a graduate or entry-level role)
In addition, you should have experience in one or more of the following areas:
* Text-to-speech (TTS) or speech synthesis, including model architecture, training, or evaluation
* Multimodal machine learning involving audio, such as models combining speech, text, or audio modalities
* Experience working in research-oriented ML environments, such as academic labs, advanced research teams, or deep-tech startups
* Familiarity with state-of-the-art approaches in speech generation, audio modelling, or multimodal systems
* Experience reading and implementing recent ML research papers
* Background from top universities, leading research groups, or equivalent research experience
* A strong interest in speech technology and conversational AI
Benefits
* Help shape the company from the ground up – you’ll be joining as part of the founding team and will help define the culture and technical direction.
* Competitive salary and equity – we want everyone on the team to share meaningfully in the company’s success.
* Private health insurance – your health and wellbeing are important to us.
* Conference, travel, and development budget – we want to support your continued growth and ensure you have access to the best resources and research community.