Eleven v3 (alpha): ElevenLabs Introduces Emotional Control via Text Tags

Quick Take: Eleven v3 (alpha) just dropped: ElevenLabs’ most expressive text-to-speech model yet. It gives developers unprecedented control over AI-generated speech through simple text-based “audio tags” like `[laughs]` and `[whispers]`. With support for over 70 languages and multi-speaker dialogue, it is a major leap forward for creating realistic audio for videos, audiobooks, and games.


🚀 The Crunch

🎯 Why This Matters: Eleven v3 (alpha) is a massive leap beyond robotic TTS. Developers can now programmatically generate highly expressive, emotionally nuanced audio with simple text tags like `[laughs]` or `[whispers]`, unlocking a new level of realism for audiobooks, game characters, and video narration without wrestling with complex SSML or separate audio editing.

🎭 Control Emotion with Tags: Direct the AI’s performance with simple text tags. Use `[laughs]`, `[whispers]`, `[sarcastic]`, or even `[strong French accent]` to control the delivery.

🗣️ Multi-Speaker Dialogue: Generate realistic conversations between multiple speakers in a single prompt, complete with interruptions, tone shifts, and emotional cues.

🌍 70+ Languages Supported: Build applications with global reach. The new model supports a massive range of languages and accents right out of the box.

🎚️ Fine-Tune with Stability: Use the “Stability” slider to balance performance. Crank it to “Creative” for maximum expression or to “Robust” for v2-like consistency.
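Audio tags are just plain text embedded in the script itself, so prompts can be composed mechanically. A minimal sketch in Python — the helper functions here are hypothetical illustrations, not part of any ElevenLabs SDK:

```python
# Minimal sketch: composing a v3 prompt with inline audio tags.
# Tag names like "laughs" and "whispers" come from the article;
# the helpers below are hypothetical, not an official API.

def tag(name: str) -> str:
    """Render an audio tag, e.g. tag('laughs') -> '[laughs]'."""
    return f"[{name}]"

def tagged(text: str, *tags_: str) -> str:
    """Prefix a line of script with one or more audio tags."""
    prefix = " ".join(tag(t) for t in tags_)
    return f"{prefix} {text}" if prefix else text

prompt = " ".join([
    tagged("This is a secret...", "whispers"),
    tagged("just kidding! I am SO excited to try this.", "laughs"),
])
print(prompt)
# [whispers] This is a secret... [laughs] just kidding! I am SO excited to try this.
```

Because the tags are inline text, the same script string works in the UI today and should carry over unchanged once API access opens up.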

⚡ Developer Tip: Jump into the UI and start experimenting with audio tags immediately. For best results, use a longer prompt (>250 characters) and set the Stability slider to “Creative” or “Natural”. A great first test: `[whispers] This is a secret... [laughs] just kidding! I am SO excited to try this.`

Critical Caveats & Requirements

  • Alpha Research Preview: This is not a final product. Expect inconsistencies and be prepared for changes.
  • Not for Real-Time (Yet): For conversational use cases needing low latency, stick with v2.5 Turbo or Flash for now. A real-time v3 is in development.
  • UI First, API Coming Soon: v3 is available in the ElevenLabs UI now. Public API access requires contacting sales for early access.
  • Prompt Engineering Required: This model is more powerful but requires more guidance. Use longer prompts and select voices that match your desired output for best results.
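The prompt-length guidance above can be folded into a small pre-flight check before sending a script for generation. A hedged sketch — the ~250-character threshold comes from the developer tip in this article, and the function name and return shape are illustrative, not part of any SDK:

```python
# Pre-flight check for a v3 prompt, based on the article's guidance
# that prompts over ~250 characters perform best. The function name
# and return shape are illustrative assumptions, not an official API.

MIN_PROMPT_CHARS = 250  # threshold suggested in the developer tip

def check_prompt(prompt: str) -> list[str]:
    """Return a list of warnings; an empty list means the prompt looks ready."""
    warnings = []
    if len(prompt) < MIN_PROMPT_CHARS:
        warnings.append(
            f"Prompt is {len(prompt)} chars; v3 tends to perform "
            f"better above {MIN_PROMPT_CHARS}."
        )
    if "[" not in prompt:
        warnings.append("No audio tags found; delivery will be undirected.")
    return warnings

print(check_prompt("[laughs] Too short."))
```

A check like this is cheap to run client-side and avoids burning credits on prompts the model is likely to deliver flatly.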

✅ Availability: Eleven v3 is live in the ElevenLabs UI today. ElevenLabs is offering an 80% discount on v3 usage in June to encourage experimentation.


🔬 The Dive

The Big Picture: From Speech Synthesis to Speech Performance. The release of Eleven v3 marks a significant shift in the world of text-to-speech. The focus is no longer just on synthesizing intelligible words but on generating a believable human *performance*. By understanding text at a deeper level and giving developers direct, intuitive controls via audio tags, ElevenLabs is aiming to bridge the gap between synthetic voices and genuine emotional expression.

How It Works: Directing the AI Actor

  • Audio Tags: These are the primary tool for performance direction. You can specify emotions ([sad], [excited]), delivery styles ([whispers], [shouts]), and even non-verbal sounds ([laughs], [sighs]).
  • Punctuation as a Tool: The model is highly sensitive to punctuation. Ellipses (...) create dramatic pauses, while ALL CAPS adds emphasis, giving you another layer of control over the rhythm and cadence of the speech.
  • Multi-Speaker Dialogue: By assigning different pre-existing voices from your library to different speakers within a single prompt, v3 can generate entire conversations, including interruptions and overlapping speech.
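The three levers above — tags, punctuation, and speaker assignment — can be combined when assembling a dialogue prompt. A sketch under one loud assumption: the article does not specify the exact dialogue format, so the `Speaker N:` labeling and the helper function are purely illustrative:

```python
# Sketch: assembling a multi-speaker v3 prompt. The "Speaker N:" labeling
# is an assumption for illustration; the article does not document the
# exact dialogue syntax. Ellipses and ALL CAPS are used deliberately,
# since the model reads punctuation as pacing and emphasis cues.

def dialogue(*turns: tuple[str, str]) -> str:
    """Join (speaker, line) pairs into a single multi-speaker prompt."""
    return "\n".join(f"{speaker}: {line}" for speaker, line in turns)

prompt = dialogue(
    ("Speaker 1", "[whispers] Did you hear that..."),
    ("Speaker 2", "[nervous] Hear WHAT? [sighs] You're scaring me."),
    ("Speaker 1", "[laughs] Relax... it was just the wind."),
)
print(prompt)
```

In the UI, each speaker label would then be mapped to a pre-existing voice from your library, as described above.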

TLDR: ElevenLabs v3 is here to make AI voices feel human. Use simple text tags like `[laughs]` to control emotion, create multi-speaker dialogue, and generate hyper-realistic speech. It’s in the UI now (alpha), so go make some voices that actually have a soul.

Tom Furlanis
Researcher. Narrative designer. Wannabe Developer.
Twenty years ago, Tom was coding his first web applications in PHP. Then he left it all to pursue studies in the humanities. Now, two decades later, empowered by his coding assistants, a degree in AI ethics, and a plethora of unrealized dreams, Tom is determined to develop his apps. Developer heaven or bust? Stay tuned to find out!