TTS Providers
Configure text-to-speech providers for different voices and languages.
Overview
The SDK supports both Azure Cognitive Services and ElevenLabs for high-quality voice synthesis. You can configure TTS providers per action or use default settings.
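The choice between a per-action config and a default can be sketched as a small fallback helper. This page does not describe how the SDK stores or merges defaults, so `defaultTts` and `resolveTts` below are hypothetical application-level names, not SDK APIs.

```typescript
// Types mirror the provider configs documented on this page.
type TtsConfig =
  | { provider: "azure"; language?: string; voice?: string }
  | { provider: "eleven_labs"; voice?: string };

// Hypothetical application-level default; not an SDK setting.
const defaultTts: TtsConfig = {
  provider: "azure",
  language: "en-US",
  voice: "en-US-JennyNeural",
};

// Return the per-action config when one is given, otherwise the default.
// The override replaces the default wholesale, so fields from one
// provider never leak into another provider's config.
function resolveTts(override?: TtsConfig): TtsConfig {
  return override ?? defaultTts;
}
```

Replacing the whole object, rather than merging field by field, avoids pairing an Azure `language` with an ElevenLabs voice ID.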
Azure Cognitive Services
Azure provides a wide range of neural voices across many languages and regions.
```typescript
interface TtsProviderConfigAzure {
  provider: "azure";
  language?: string; // BCP-47 format (e.g., "en-US", "de-DE")
  voice?: string; // Voice name (e.g., "en-US-JennyNeural")
}
```

Examples:
```typescript
// English (US) - Female
tts: {
  provider: "azure",
  language: "en-US",
  voice: "en-US-JennyNeural"
}

// English (GB) - Female
tts: {
  provider: "azure",
  language: "en-GB",
  voice: "en-GB-SoniaNeural"
}

// German - Male
tts: {
  provider: "azure",
  language: "de-DE",
  voice: "de-DE-ConradNeural"
}

// Spanish - Female
tts: {
  provider: "azure",
  language: "es-ES",
  voice: "es-ES-ElviraNeural"
}
```

Popular Azure Voices
| Language | Voice Name | Gender | Description |
|---|---|---|---|
| en-US | en-US-JennyNeural | Female | Friendly, professional |
| en-US | en-US-GuyNeural | Male | Clear, neutral |
| en-GB | en-GB-SoniaNeural | Female | British, professional |
| en-GB | en-GB-RyanNeural | Male | British, friendly |
| de-DE | de-DE-KatjaNeural | Female | Professional, clear |
| de-DE | de-DE-ConradNeural | Male | Deep, authoritative |
Full Voice List: See the Azure TTS documentation for the complete list of 400+ voices in 140+ languages.
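Every voice name in the table above begins with its language tag (for example, `de-DE-KatjaNeural` for `de-DE`). A minimal sketch of a consistency check built on that naming pattern follows; `azureVoiceMatchesLanguage` is a hypothetical helper, not part of the SDK, and a mismatch is best treated as a warning because some voices may support more than one language.

```typescript
// Mirrors the Azure config interface documented on this page.
interface TtsProviderConfigAzure {
  provider: "azure";
  language?: string;
  voice?: string;
}

// Hypothetical helper: true when the voice name starts with the
// configured language tag, or when either field is omitted.
function azureVoiceMatchesLanguage(config: TtsProviderConfigAzure): boolean {
  if (!config.language || !config.voice) {
    return true; // nothing to compare
  }
  const prefix = config.language.toLowerCase() + "-";
  return config.voice.toLowerCase().indexOf(prefix) === 0;
}
```

For instance, pairing `language: "de-DE"` with `voice: "en-US-JennyNeural"` would be flagged, while a config that omits `voice` passes.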
ElevenLabs
ElevenLabs provides ultra-realistic AI voices optimized for conversational use cases.
```typescript
interface TtsProviderConfigElevenLabs {
  provider: "eleven_labs";
  voice?: string; // Voice ID (e.g., "21m00Tcm4TlvDq8ikWAM") - optional, uses default if omitted
}
```

Example:
```typescript
// With specific voice
tts: {
  provider: "eleven_labs",
  voice: "21m00Tcm4TlvDq8ikWAM" // Rachel
}

// With default voice
tts: {
  provider: "eleven_labs"
}
```

Available ElevenLabs Voices
| Voice Name | ID | Description | Verified Locales |
|---|---|---|---|
| Rachel | 21m00Tcm4TlvDq8ikWAM | Matter-of-fact, personable woman. Great for conversational use cases. | en-US |
| Sarah | EXAVITQu4vr4xnSDxMaL | Young adult woman with a confident and warm, mature quality. | en-US, fr-FR, cmn-CN, hi-IN |
| Laura | FGY2WhTYpPnrIDTdsKH5 | Young adult female delivers sunny enthusiasm with quirky attitude. | en-US, fr-FR, cmn-CN, de-DE |
| George | JBFqnCBsd6RMkjVDRZzb | Warm resonance that instantly captivates listeners. | en-GB, fr-FR, ja-JP, cs-CZ |
| Thomas | GBv7mTt0atIp3Br8iCZE | Soft and subdued male, optimal for narrations or meditations. | en-US |
| Roger | CwhRBWXzGAHq8TQ4Fs17 | Easy going and perfect for casual conversations. | en-US, fr-FR, de-DE, nl-NL |
| Eric | cjVigY5qzO86Huf0OWal | Smooth tenor pitch from a man in his 40s - perfect for agentic use cases. | en-US, fr-FR, de-DE, sk-SK |
| Brian | nPczCjzI2devNBz1zQrb | Middle-aged man with resonant and comforting tone. | en-US, cmn-CN, de-DE, nl-NL |
| Jessica | cgSgspJ2msm6clMCkdW9 | Young and playful American female, perfect for trendy content. | en-US, fr-FR, ja-JP, cmn-CN, de-DE |
| Liam | TX3LPaxmHKxFdv7VOQHJ | Young adult with energy and warmth - suitable for reels and shorts. | en-US, de-DE, cs-CZ, pl-PL, tr-TR |
| Alice | Xb7hH8MSUJpSbSDYk0k2 | Clear and engaging, friendly British woman suitable for e-learning. | en-GB, it-IT, fr-FR, ja-JP, pl-PL |
| Daniel | onwK4e9ZLuTAKqWW03F9 | Strong voice perfect for professional broadcast or news. | en-GB, de-DE, tr-TR |
| Lily | pFZP5JQG7iQjIQuC4Bku | Velvety British female delivers news with warmth and clarity. | it-IT, de-DE, cmn-CN, cs-CZ, nl-NL |
| River | SAz9YHcvj6GT2YYXdXww | Relaxed, neutral voice ready for narrations or conversational projects. | en-US, it-IT, fr-FR, cmn-CN |
| Charlie | IKne3meq5aSn9XLyUdCD | Young Australian male with confident and energetic voice. | en-AU, cmn-CN, fil-PH |
| Aria | 9BWtsMINqrJLrRacOk9x | Middle-aged female with African-American accent. Calm with hint of rasp. | en-US, fr-FR, cmn-CN, tr-TR |
| Matilda | XrExE9yKIg1WjnnlVkGX | Professional woman with pleasing alto pitch. Suitable for many use cases. | en-US, it-IT, fr-FR, de-DE |
| Will | bIHbv24MWmeRgasZH58o | Conversational and laid back. | en-US, fr-FR, de-DE, cmn-CN, cs-CZ |
| Chris | iP95p4xoKVk53GoZ742B | Natural and real, down-to-earth voice great across many use-cases. | en-US, fr-FR, sv-SE, hi-IN |
| Bill | pqHfZKP75CvOlQylNhV4 | Friendly and comforting voice ready to narrate stories. | en-US, fr-FR, cmn-CN, de-DE, cs-CZ |
Note: 50+ voices available in total. The SDK includes full TypeScript type definitions for all voice IDs and names.
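Voice IDs are opaque, so a name-to-ID map can keep configs readable. The sketch below copies a few rows from the table above; `ELEVEN_LABS_VOICE_IDS` and `elevenLabsTts` are hypothetical local names and do not represent the type definitions the SDK ships.

```typescript
// A few name-to-ID pairs copied from the table above.
const ELEVEN_LABS_VOICE_IDS = {
  Rachel: "21m00Tcm4TlvDq8ikWAM",
  Sarah: "EXAVITQu4vr4xnSDxMaL",
  George: "JBFqnCBsd6RMkjVDRZzb",
  Brian: "nPczCjzI2devNBz1zQrb",
} as const;

// Only names present in the map are accepted at compile time.
type KnownVoiceName = keyof typeof ELEVEN_LABS_VOICE_IDS;

// Hypothetical helper: build an ElevenLabs tts config from a voice name.
function elevenLabsTts(name: KnownVoiceName): { provider: "eleven_labs"; voice: string } {
  return { provider: "eleven_labs", voice: ELEVEN_LABS_VOICE_IDS[name] };
}
```

With this in place, `tts: elevenLabsTts("Rachel")` replaces a hard-coded ID, and a misspelled name fails type checking.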
Choosing a TTS Provider
Use Azure when:
- You need support for many languages (140+ languages available)
- You want consistent quality across all locales
- You need specific regional accents or dialects
- Budget is a primary concern
Use ElevenLabs when:
- You need the most natural, human-like voices
- Conversational quality is critical (phone calls, virtual assistants)
- You're primarily working with English or common European languages
- You want voices with distinct personalities
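One way to encode the guidance above is a locale-based fallback: use ElevenLabs when a voice is verified for the caller's locale, and Azure otherwise. This is an illustrative policy under that assumption, not SDK behavior; `VERIFIED_VOICES` and `chooseTts` are hypothetical names, and the locale data comes from two rows of the ElevenLabs table on this page.

```typescript
type TtsChoice =
  | { provider: "azure"; language: string }
  | { provider: "eleven_labs"; voice: string };

// Verified locales for two voices, copied from the ElevenLabs table.
const VERIFIED_VOICES: { id: string; locales: string[] }[] = [
  { id: "21m00Tcm4TlvDq8ikWAM", locales: ["en-US"] }, // Rachel
  { id: "JBFqnCBsd6RMkjVDRZzb", locales: ["en-GB", "fr-FR", "ja-JP", "cs-CZ"] }, // George
];

// Hypothetical policy: prefer the first ElevenLabs voice verified for
// the locale; otherwise fall back to Azure with that language.
function chooseTts(locale: string): TtsChoice {
  for (const voice of VERIFIED_VOICES) {
    if (voice.locales.indexOf(locale) !== -1) {
      return { provider: "eleven_labs", voice: voice.id };
    }
  }
  return { provider: "azure", language: locale };
}
```

The Azure fallback sets only `language` and leaves `voice` unset, so the voice used depends on whatever default applies.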
Usage Examples
Per-Action Configuration
```typescript
onUserSpeak: async (event) => {
  return {
    type: "speak",
    session_id: event.session.id,
    text: "Hello in a different voice",
    tts: {
      provider: "azure",
      language: "en-US",
      voice: "en-US-JennyNeural",
    },
  };
}
```

Using ElevenLabs
```typescript
onUserSpeak: async (event) => {
  return {
    type: "speak",
    session_id: event.session.id,
    text: "Hello from ElevenLabs!",
    tts: {
      provider: "eleven_labs",
      voice: "21m00Tcm4TlvDq8ikWAM", // Rachel
    },
  };
}
```

Next Steps
- Barge-In Configuration - Control interruption behavior
- Action Types - Complete action reference