
TTS Providers

Configure text-to-speech providers for different voices and languages.

Overview

The SDK supports both Azure Cognitive Services and ElevenLabs for high-quality voice synthesis. You can configure TTS providers per action or use default settings.

Azure Cognitive Services

Azure provides a wide range of neural voices across many languages and regions.

typescript
interface TtsProviderConfigAzure {
  provider: "azure";
  language?: string; // BCP-47 format (e.g., "en-US", "de-DE")
  voice?: string;    // Voice name (e.g., "en-US-JennyNeural")
}

Examples:

typescript
// English (US) - Female
tts: {
  provider: "azure",
  language: "en-US",
  voice: "en-US-JennyNeural"
}

// English (GB) - Female
tts: {
  provider: "azure",
  language: "en-GB",
  voice: "en-GB-SoniaNeural"
}

// German - Male
tts: {
  provider: "azure",
  language: "de-DE",
  voice: "de-DE-ConradNeural"
}

// Spanish - Female
tts: {
  provider: "azure",
  language: "es-ES",
  voice: "es-ES-ElviraNeural"
}

| Language | Voice Name | Gender | Description |
| en-US | en-US-JennyNeural | Female | Friendly, professional |
| en-US | en-US-GuyNeural | Male | Clear, neutral |
| en-GB | en-GB-SoniaNeural | Female | British, professional |
| en-GB | en-GB-RyanNeural | Male | British, friendly |
| de-DE | de-DE-KatjaNeural | Female | Professional, clear |
| de-DE | de-DE-ConradNeural | Male | Deep, authoritative |

Full Voice List: See the Azure TTS documentation for the complete list of 400+ voices in 140+ languages.
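Every Azure voice name shown above begins with its locale (for example, "en-US-JennyNeural" belongs to "en-US"). A minimal sketch of a consistency check built on that pattern follows. The helper `voiceMatchesLanguage` is hypothetical and not part of the SDK, and whether the SDK performs any such validation itself is not stated here.

```typescript
// Hypothetical helper (not an SDK function): returns true when the voice
// name starts with the given BCP-47 language tag followed by a hyphen.
function voiceMatchesLanguage(language: string, voice: string): boolean {
  return voice.toLowerCase().startsWith(language.toLowerCase() + "-");
}

// Usage: catch a mismatched language/voice pair before sending a config.
const ok = voiceMatchesLanguage("en-US", "en-US-JennyNeural");
const mismatch = voiceMatchesLanguage("de-DE", "en-GB-SoniaNeural");
console.log(ok, mismatch);
```

A check like this can catch copy-paste mistakes, such as pairing `language: "de-DE"` with an English voice, before the request reaches the provider.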

ElevenLabs

ElevenLabs provides ultra-realistic AI voices optimized for conversational use cases.

typescript
interface TtsProviderConfigElevenLabs {
  provider: "eleven_labs";
  voice?: string; // Voice ID (e.g., "21m00Tcm4TlvDq8ikWAM") - optional, uses default if omitted
}

Example:

typescript
// With specific voice
tts: {
  provider: "eleven_labs",
  voice: "21m00Tcm4TlvDq8ikWAM"  // Rachel
}

// With default voice
tts: {
  provider: "eleven_labs"
}
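Because both config shapes carry a literal `provider` field, they can be combined into a discriminated union and narrowed with a `switch`. The sketch below assumes a union named `TtsProviderConfig` and a helper `describeTts`. Both names are chosen for illustration, and the SDK may export something different.

```typescript
// Interfaces repeated from the definitions above so the sketch is self-contained.
interface TtsProviderConfigAzure {
  provider: "azure";
  language?: string;
  voice?: string;
}

interface TtsProviderConfigElevenLabs {
  provider: "eleven_labs";
  voice?: string;
}

// Assumption: this union name is illustrative, not confirmed SDK API.
type TtsProviderConfig = TtsProviderConfigAzure | TtsProviderConfigElevenLabs;

// Hypothetical helper: build a short label for logging.
// TypeScript narrows `tts` in each case, so `language` is only
// accessible in the "azure" branch.
function describeTts(tts: TtsProviderConfig): string {
  switch (tts.provider) {
    case "azure":
      return `azure:${tts.language ?? "default"}:${tts.voice ?? "default"}`;
    case "eleven_labs":
      return `eleven_labs:${tts.voice ?? "default"}`;
  }
}

console.log(describeTts({ provider: "eleven_labs" }));
```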

Available ElevenLabs Voices

| Voice Name | ID | Description | Verified Locales |
| Rachel | 21m00Tcm4TlvDq8ikWAM | Matter-of-fact, personable woman. Great for conversational use cases. | en-US |
| Sarah | EXAVITQu4vr4xnSDxMaL | Young adult woman with a confident and warm, mature quality. | en-US, fr-FR, cmn-CN, hi-IN |
| Laura | FGY2WhTYpPnrIDTdsKH5 | Young adult female delivers sunny enthusiasm with quirky attitude. | en-US, fr-FR, cmn-CN, de-DE |
| George | JBFqnCBsd6RMkjVDRZzb | Warm resonance that instantly captivates listeners. | en-GB, fr-FR, ja-JP, cs-CZ |
| Thomas | GBv7mTt0atIp3Br8iCZE | Soft and subdued male, optimal for narrations or meditations. | en-US |
| Roger | CwhRBWXzGAHq8TQ4Fs17 | Easy going and perfect for casual conversations. | en-US, fr-FR, de-DE, nl-NL |
| Eric | cjVigY5qzO86Huf0OWal | Smooth tenor pitch from a man in his 40s - perfect for agentic use cases. | en-US, fr-FR, de-DE, sk-SK |
| Brian | nPczCjzI2devNBz1zQrb | Middle-aged man with resonant and comforting tone. | en-US, cmn-CN, de-DE, nl-NL |
| Jessica | cgSgspJ2msm6clMCkdW9 | Young and playful American female, perfect for trendy content. | en-US, fr-FR, ja-JP, cmn-CN, de-DE |
| Liam | TX3LPaxmHKxFdv7VOQHJ | Young adult with energy and warmth - suitable for reels and shorts. | en-US, de-DE, cs-CZ, pl-PL, tr-TR |
| Alice | Xb7hH8MSUJpSbSDYk0k2 | Clear and engaging, friendly British woman suitable for e-learning. | en-GB, it-IT, fr-FR, ja-JP, pl-PL |
| Daniel | onwK4e9ZLuTAKqWW03F9 | Strong voice perfect for professional broadcast or news. | en-GB, de-DE, tr-TR |
| Lily | pFZP5JQG7iQjIQuC4Bku | Velvety British female delivers news with warmth and clarity. | it-IT, de-DE, cmn-CN, cs-CZ, nl-NL |
| River | SAz9YHcvj6GT2YYXdXww | Relaxed, neutral voice ready for narrations or conversational projects. | en-US, it-IT, fr-FR, cmn-CN |
| Charlie | IKne3meq5aSn9XLyUdCD | Young Australian male with confident and energetic voice. | en-AU, cmn-CN, fil-PH |
| Aria | 9BWtsMINqrJLrRacOk9x | Middle-aged female with African-American accent. Calm with hint of rasp. | en-US, fr-FR, cmn-CN, tr-TR |
| Matilda | XrExE9yKIg1WjnnlVkGX | Professional woman with pleasing alto pitch. Suitable for many use cases. | en-US, it-IT, fr-FR, de-DE |
| Will | bIHbv24MWmeRgasZH58o | Conversational and laid back. | en-US, fr-FR, de-DE, cmn-CN, cs-CZ |
| Chris | iP95p4xoKVk53GoZ742B | Natural and real, down-to-earth voice great across many use-cases. | en-US, fr-FR, sv-SE, hi-IN |
| Bill | pqHfZKP75CvOlQylNhV4 | Friendly and comforting voice ready to narrate stories. | en-US, fr-FR, cmn-CN, de-DE, cs-CZ |

Note: 50+ voices are available in total. The SDK includes full TypeScript type definitions for all voice IDs and names.
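Raw voice IDs are hard to read in application code, so a small name-to-ID map can help. The sketch below is a local stand-in that uses four entries from the table above. `ELEVEN_LABS_VOICES`, `ElevenLabsVoiceName`, and `elevenLabsTts` are hypothetical names and do not reflect the SDK's own type definitions, which may look different.

```typescript
// Assumption: a hand-written subset of the voice table, not the SDK's types.
const ELEVEN_LABS_VOICES = {
  Rachel: "21m00Tcm4TlvDq8ikWAM",
  Sarah: "EXAVITQu4vr4xnSDxMaL",
  George: "JBFqnCBsd6RMkjVDRZzb",
  Alice: "Xb7hH8MSUJpSbSDYk0k2",
} as const;

// Union of the keys above: "Rachel" | "Sarah" | "George" | "Alice".
type ElevenLabsVoiceName = keyof typeof ELEVEN_LABS_VOICES;

// Hypothetical helper: build a tts config from a readable voice name.
// A misspelled name fails at compile time instead of at runtime.
function elevenLabsTts(name: ElevenLabsVoiceName) {
  return { provider: "eleven_labs" as const, voice: ELEVEN_LABS_VOICES[name] };
}

const tts = elevenLabsTts("George");
console.log(tts.provider, tts.voice);
```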

Choosing a TTS Provider

Use Azure when:

  • You need support for many languages (140+ languages available)
  • You want consistent quality across all locales
  • You need specific regional accents or dialects
  • Budget is a primary concern

Use ElevenLabs when:

  • You need the most natural, human-like voices
  • Conversational quality is critical (phone calls, virtual assistants)
  • You're primarily working with English or common European languages
  • You want voices with distinct personalities
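The guidance above can be turned into a simple selection rule. This is only one possible heuristic, sketched under assumptions: `chooseProvider` is a hypothetical helper, and `ELEVEN_LABS_VERIFIED` is a hand-picked subset of the locales in the "Verified Locales" column, not an exhaustive or official list.

```typescript
type ProviderName = "azure" | "eleven_labs";

// Assumption: a few locales taken from the ElevenLabs voice table;
// extend this set to match the voices you actually use.
const ELEVEN_LABS_VERIFIED = new Set(["en-US", "en-GB", "fr-FR", "de-DE"]);

// Hypothetical heuristic: prefer ElevenLabs for conversational use in a
// verified locale, and fall back to Azure for everything else.
function chooseProvider(locale: string, conversational: boolean): ProviderName {
  if (conversational && ELEVEN_LABS_VERIFIED.has(locale)) {
    return "eleven_labs";
  }
  return "azure";
}

console.log(chooseProvider("en-US", true));
console.log(chooseProvider("fi-FI", true));
```

Budget and accent requirements are not modeled here; a real decision would likely weigh those as well.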

Usage Examples

Per-Action Configuration

typescript
onUserSpeak: async (event) => {
  return {
    type: "speak",
    session_id: event.session.id,
    text: "Hello in a different voice",
    tts: {
      provider: "azure",
      language: "en-US",
      voice: "en-US-JennyNeural",
    },
  };
}

Using ElevenLabs

typescript
onUserSpeak: async (event) => {
  return {
    type: "speak",
    session_id: event.session.id,
    text: "Hello from ElevenLabs!",
    tts: {
      provider: "eleven_labs",
      voice: "21m00Tcm4TlvDq8ikWAM", // Rachel
    },
  };
}
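To avoid repeating the same `tts` object in every handler, an application-level default can be applied and overridden per action. The sketch below is written under assumptions: `SpeakAction` mirrors the object shape returned in the examples above, while `DEFAULT_TTS` and `speak` are hypothetical names. How the SDK applies its own default settings is not described here, so this handles defaults purely in application code.

```typescript
type TtsConfig =
  | { provider: "azure"; language?: string; voice?: string }
  | { provider: "eleven_labs"; voice?: string };

// Assumption: mirrors the action objects returned in the examples above.
interface SpeakAction {
  type: "speak";
  session_id: string;
  text: string;
  tts?: TtsConfig;
}

// Hypothetical application-wide default voice.
const DEFAULT_TTS: TtsConfig = {
  provider: "azure",
  language: "en-US",
  voice: "en-US-JennyNeural",
};

// Hypothetical helper: build a speak action, using the default voice
// unless a per-action config is passed.
function speak(sessionId: string, text: string, tts: TtsConfig = DEFAULT_TTS): SpeakAction {
  return { type: "speak", session_id: sessionId, text, tts };
}

// Default voice for most replies, ElevenLabs for one specific reply.
const greeting = speak("session-1", "Hello!");
const special = speak("session-1", "Hello from ElevenLabs!", {
  provider: "eleven_labs",
  voice: "21m00Tcm4TlvDq8ikWAM", // Rachel
});
console.log(greeting.tts?.provider, special.tts?.provider);
```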

Next Steps