Miso Labs has released the open weights for Miso One, an 8-billion-parameter text-to-speech model the company pitches as "the most emotive voice model in the world." Co-founder Aoden Teo announced the model on X, citing 110 milliseconds of latency and saying API access is "coming soon."

Miso Labs has not published a license type, training-data disclosure, supported-language list, or voice count alongside the announcement. The tweet links to a sample thread that VP Land has not independently evaluated. The open weights are available now; the hosted API is not.

An 8B-parameter open-weights model claiming 110ms latency

The eye-catching specification in Miso Labs' announcement is the 110-millisecond latency figure. Paired with an 8-billion-parameter backbone and open model weights at launch, the package is unusual in a TTS market where the fastest, most expressive models tend to ship as closed APIs.

Per Teo, Miso One "emotes like a human and responds faster than a human." That framing belongs to Miso Labs; the company has not published comparative benchmarks against other speech models, and no third-party evaluations are available at announcement time.

The "most emotive voice model in the world" claim is similarly Miso Labs' own. Expressiveness in modern TTS has become a moving target as competing teams push prosody, breath, and emotional range, and any ranking depends on which test set and which listeners are doing the rating.

Open weights ship before the API

Most production-grade speech models have launched API-first, with self-hostable weights either never appearing or arriving long after the hosted product. ElevenLabs, for example, runs entirely through a managed API; we covered the company's Voice Design tool for generating custom voices from text prompts, a feature available only through the hosted service.

Miso Labs is reversing that order. By publishing weights before opening the API, the company gives researchers and self-hosting teams a usable artifact on day one, with the hosted product trailing. The trade-off: anyone running Miso One today has to stand up their own inference stack and reach the 110ms latency number on their own hardware.
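For teams that want to check a latency figure on their own hardware, a minimal timing harness might look like the sketch below. It rests on several assumptions: Miso Labs has not said how the 110ms number was measured, so the sketch treats latency as time to first audio chunk; `synthesize` is a hypothetical callable wrapping whatever inference stack a team builds, not a Miso One API; and `fake_synthesize` is a stand-in that only sleeps, included so the harness runs without a model.

```python
import math
import statistics
import time
from typing import Callable, Dict, Iterable, Iterator


def time_to_first_chunk_ms(synthesize: Callable[[str], Iterable[bytes]], text: str) -> float:
    """Milliseconds from calling synthesize() until its first audio chunk arrives."""
    start = time.perf_counter()
    chunks = iter(synthesize(text))
    first = next(chunks, None)  # blocks until the first chunk is produced
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    if first is None:
        raise RuntimeError("synthesize() produced no audio")
    return elapsed_ms


def summarize_latency(
    synthesize: Callable[[str], Iterable[bytes]],
    text: str,
    runs: int = 20,
    warmup: int = 3,
) -> Dict[str, float]:
    """Run several timed calls and report median, 95th-percentile, and max latency."""
    # Warm-up calls are discarded so one-time costs (model load, caches) do not skew results.
    for _ in range(warmup):
        time_to_first_chunk_ms(synthesize, text)
    samples = sorted(time_to_first_chunk_ms(synthesize, text) for _ in range(runs))
    p95_index = max(0, math.ceil(0.95 * len(samples)) - 1)  # nearest-rank percentile
    return {
        "median_ms": statistics.median(samples),
        "p95_ms": samples[p95_index],
        "max_ms": samples[-1],
    }


def fake_synthesize(text: str) -> Iterator[bytes]:
    """Hypothetical stand-in for a real model: waits about 20 ms, then yields silent chunks."""
    time.sleep(0.02)
    for _ in range(3):
        yield b"\x00" * 960


if __name__ == "__main__":
    print(summarize_latency(fake_synthesize, "Hello from a test sentence."))
```

Reporting the 95th percentile alongside the median matters for conversational use, where occasional slow responses are more noticeable than a slightly higher average. Swapping `fake_synthesize` for a real streaming wrapper is the only change needed to measure an actual deployment.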

The release fits a broader pattern of low-latency, developer-facing voice infrastructure. Deepgram's Saga voice operating system similarly leans on self-hostable, low-latency design as its core sell, though Saga targets voice-controlled developer workflows rather than expressive synthesis.

What Miso Labs has not disclosed

The announcement leaves several specifications open, and Miso Labs has not published a model card or technical report alongside the tweet. Items not addressed:

  • License terms for the open weights, including any commercial-use restrictions

  • Supported languages and the number of distinct voices shipped with the model

  • Audio output format, sample rate, and streaming behavior

  • Hardware requirements to hit the cited 110ms latency

  • Training data sources and any consent or licensing disclosures

  • Fine-tuning support for cloning or custom voice creation

  • Pricing for the forthcoming API tier

Teo is listed as a Miso Labs co-founder on his X profile, under the handle @MisoLabsAI. The company has not named additional team members in the launch tweet, and the announcement points readers to an audio sample thread for listening tests.
