Neuphonic secured €3.5M for its ultra-low-latency text-to-speech technology
London-based startup Neuphonic is building an ultra-low-latency text-to-speech (TTS) engine to unlock the potential of responsive, natural-sounding conversational AI. Giving text-based language models a voice has proven challenging because of constraints on size, speed, cost, and naturalness. Typically, voice modes like Gemini Live or Meta AI's recently launched voice mode transcribe the speech they capture and pass it to an LLM, which generates a response. The system then passes that response through a text-to-speech engine to generate synthetic speech.
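The cascade described above can be sketched in a few lines of Python. This is only an illustration of the general transcribe, generate, synthesize flow, not any vendor's implementation: `VoicePipeline` and the three `fake_*` stages are hypothetical placeholders standing in for real speech-recognition, LLM, and TTS components.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoicePipeline:
    """Hypothetical three-stage voice assistant: speech -> text -> reply -> speech."""
    transcribe: Callable[[bytes], str]   # speech recognition stage
    generate: Callable[[str], str]       # LLM stage
    synthesize: Callable[[str], bytes]   # text-to-speech stage

    def respond(self, audio_in: bytes) -> bytes:
        text = self.transcribe(audio_in)   # 1. transcribe the captured speech
        reply = self.generate(text)        # 2. let the LLM produce a response
        return self.synthesize(reply)      # 3. turn the full response into audio


# Placeholder stages (assumptions for illustration only): they just shuffle
# text around so the flow can be run without any real models.
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(prompt: str) -> str:
    return f"You said: {prompt}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = VoicePipeline(fake_transcribe, fake_llm, fake_tts)
reply_audio = pipeline.respond(b"hello")
```

Because each stage waits for the previous one to finish, the delays of all three add up before the user hears anything, which is where the pauses in such systems tend to come from.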
Although the technology has come a long way, issues remain, such as awkward pauses while processing takes place and unnatural interactions, like Gemini Live's tendency to keep speaking at a lower volume when interrupted. Neuphonic is working to address these issues, starting with its TTS algorithm for real-time, incremental speech generation with 25-millisecond latency. Neuphonic's incremental approach also means it can work with any LLM to generate more natural-sounding, language-agnostic speech. Fresh off a recently concluded closed beta, Neuphonic's API is now available to anyone looking to leverage "the world's fastest TTS system".
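To make the idea of incremental generation concrete, here is a minimal Python sketch. It is not Neuphonic's algorithm, whose details are not public in this article. It only shows the general pattern of synthesizing audio phrase by phrase as an LLM streams tokens, instead of waiting for the complete response. The names `incremental_speech`, `placeholder_tts`, and `placeholder_llm_stream`, and the punctuation-based chunking rule, are all assumptions made for illustration.

```python
from typing import Callable, Iterable, Iterator

# Assumed chunking rule: flush the buffer whenever a phrase boundary appears.
BOUNDARIES = (".", "!", "?", ",", ";", ":")


def incremental_speech(
    tokens: Iterable[str],
    synthesize: Callable[[str], bytes],
) -> Iterator[bytes]:
    """Yield an audio chunk for each phrase as soon as it is complete."""
    buffer = ""
    for token in tokens:
        buffer += token
        if buffer.rstrip().endswith(BOUNDARIES):
            yield synthesize(buffer.strip())
            buffer = ""
    if buffer.strip():
        # Flush whatever text is left once the stream ends.
        yield synthesize(buffer.strip())


# Placeholders (hypothetical): a TTS stub and a streamed LLM reply.
def placeholder_tts(text: str) -> bytes:
    return text.encode("utf-8")


def placeholder_llm_stream() -> Iterator[str]:
    yield from ["Hello", ",", " how", " can", " I", " help", "?", " Sure"]


chunks = list(incremental_speech(placeholder_llm_stream(), placeholder_tts))
```

In this sketch the first chunk is ready after only two tokens, so playback could begin while the rest of the reply is still being generated. Since the function only consumes a stream of text, it does not depend on which LLM produces the tokens.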
In this initial launch, Neuphonic is releasing two models to give users more flexibility in choosing the best fit for a specific use case: one optimized for low latency and another for higher quality. The company also plans to keep developing new models, expand its regional and language support, and offer on-device solutions. These efforts will be boosted by Neuphonic's recently closed pre-seed funding round, in which the company raised €3.5 million from Moonfire VC, with participation from Tiny VC, Salica's Oryx Fund, and Cur8 Capital.