TTS Providers

Configure text-to-speech providers for different voices and languages.

Overview

The SDK supports three TTS providers for voice synthesis: Kugelaudio, Azure Cognitive Services, and ElevenLabs. You can set a provider on each action or rely on the default settings.

Kugelaudio

Kugelaudio is a TTS provider that delivers German and multilingual voice synthesis via WebSocket streaming. It runs entirely on sipgate infrastructure, with no third-party cloud service involved.

typescript

interface TtsProviderConfigKugelaudio {
  provider: "kugelaudio";
  voice?: string;     // Voice ID (numeric string). Omit to use the default sipgate voice.
  language?: string;  // ISO 639-1 code, e.g. "de" or "en"
  speed?: number;     // Playback speed multiplier 0.8–1.2 (default: 1.0)
}

Examples:

typescript

// Default sipgate voice
tts: {
  provider: "kugelaudio"
}

// Specific voice
tts: {
  provider: "kugelaudio",
  voice: "963",
  language: "de"
}
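
The documented `speed` range of 0.8 to 1.2 can be enforced on the client before a config is used. The following is a minimal sketch: `clampKugelaudioSpeed` is a hypothetical helper, not part of the SDK, and the interface is a local copy of the shape shown above. How the service itself reacts to out-of-range values is not stated here.

```typescript
// Local copy of the config shape documented above.
interface TtsProviderConfigKugelaudio {
  provider: "kugelaudio";
  voice?: string;
  language?: string;
  speed?: number;
}

// Documented bounds for the speed multiplier.
const MIN_SPEED = 0.8;
const MAX_SPEED = 1.2;

// Hypothetical helper (not part of the SDK): returns a copy of the config
// with `speed` clamped into the documented range. If `speed` was omitted,
// it stays omitted so the documented default (1.0) still applies.
function clampKugelaudioSpeed(
  config: TtsProviderConfigKugelaudio
): TtsProviderConfigKugelaudio {
  if (config.speed === undefined) {
    return { ...config };
  }
  const speed = Math.min(MAX_SPEED, Math.max(MIN_SPEED, config.speed));
  return { ...config, speed };
}

const clamped = clampKugelaudioSpeed({
  provider: "kugelaudio",
  voice: "972",
  speed: 1.5,
});
console.log(clamped.speed); // 1.2
```

Clamping is only one option; rejecting out-of-range values with an error may be preferable if silent correction would hide configuration mistakes.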

Available Kugelaudio Voices

Voice Name	ID	Gender	Description
sipgate	963	Male	sipgate house voice (default)
Tim Schröder	972	Male	30-year-old native German professional
Anna Martin	973	Female	31-year-old German native, customer support
Lea Huber	978	Female	25-year-old native German, customer support
Sara Wagner	979	Female	29-year-old native German speaker
Marius Behnke	980	Male	35-year-old German, customer success manager
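
Because Kugelaudio voices are addressed by numeric ID, a small name-to-ID map can make configs easier to read. This is a sketch only: `KUGELAUDIO_VOICES` and `kugelaudioVoice` are hypothetical names, not SDK exports, and the entries are copied from the table above.

```typescript
// Voice names and IDs copied from the table above.
const KUGELAUDIO_VOICES = {
  "sipgate": "963",
  "Tim Schröder": "972",
  "Anna Martin": "973",
  "Lea Huber": "978",
  "Sara Wagner": "979",
  "Marius Behnke": "980",
} as const;

type KugelaudioVoiceName = keyof typeof KUGELAUDIO_VOICES;

// Hypothetical helper (not part of the SDK): builds a Kugelaudio TTS config
// from a voice name. The default language "de" is an assumption of this sketch.
function kugelaudioVoice(name: KugelaudioVoiceName, language: string = "de") {
  return {
    provider: "kugelaudio" as const,
    voice: KUGELAUDIO_VOICES[name],
    language,
  };
}

const annaTts = kugelaudioVoice("Anna Martin");
console.log(annaTts); // { provider: "kugelaudio", voice: "973", language: "de" }
```

Typing the name as `KugelaudioVoiceName` means a misspelled voice name fails at compile time rather than at call time.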

Azure Cognitive Services

Azure provides a wide range of neural voices across many languages and regions.

typescript

interface TtsProviderConfigAzure {
  provider: "azure";
  language?: string; // BCP-47 format (e.g., "en-US", "de-DE")
  voice?: string;    // Voice name (e.g., "en-US-JennyNeural")
}

Examples:

typescript

// English (US) - Female
tts: {
  provider: "azure",
  language: "en-US",
  voice: "en-US-JennyNeural"
}

// English (GB) - Female
tts: {
  provider: "azure",
  language: "en-GB",
  voice: "en-GB-SoniaNeural"
}

// German - Male
tts: {
  provider: "azure",
  language: "de-DE",
  voice: "de-DE-ConradNeural"
}

// Spanish - Female
tts: {
  provider: "azure",
  language: "es-ES",
  voice: "es-ES-ElviraNeural"
}

Popular Azure Voices

Language	Voice Name	Gender	Description
en-US	en-US-JennyNeural	Female	Friendly, professional
en-US	en-US-GuyNeural	Male	Clear, neutral
en-GB	en-GB-SoniaNeural	Female	British, professional
en-GB	en-GB-RyanNeural	Male	British, friendly
de-DE	de-DE-KatjaNeural	Female	Professional, clear
de-DE	de-DE-ConradNeural	Male	Deep, authoritative

Full voice list: see the Azure TTS documentation for the complete list of 400+ voices in 140+ languages.
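
The Azure voice names on this page all start with their BCP-47 language tag (for example `de-DE-ConradNeural`), so `language` can be derived from `voice` to keep the two fields in sync. The sketch below assumes that naming pattern holds for the voice you pass in and rejects anything else; `azureConfigFromVoice` is a hypothetical helper, not an SDK function.

```typescript
// Local copy of the config shape documented above.
interface TtsProviderConfigAzure {
  provider: "azure";
  language?: string;
  voice?: string;
}

// Hypothetical helper (not part of the SDK): derives the language tag from a
// voice name of the form "<language>-<REGION>-<Name>", as in the examples on
// this page. Voice names with a different shape are rejected.
function azureConfigFromVoice(voice: string): TtsProviderConfigAzure {
  const match = /^([a-z]{2,3}-[A-Z]{2})-/.exec(voice);
  const language = match ? match[1] : undefined;
  if (language === undefined) {
    throw new Error(`Unexpected Azure voice name: ${voice}`);
  }
  return { provider: "azure", language, voice };
}

console.log(azureConfigFromVoice("de-DE-ConradNeural").language); // "de-DE"
```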

ElevenLabs

ElevenLabs provides ultra-realistic AI voices optimized for conversational use cases.

typescript

interface TtsProviderConfigElevenLabs {
  provider: "eleven_labs";
  voice?: string; // Voice ID (e.g., "21m00Tcm4TlvDq8ikWAM") - optional, uses default if omitted
}

Examples:

typescript

// With specific voice
tts: {
  provider: "eleven_labs",
  voice: "21m00Tcm4TlvDq8ikWAM"  // Rachel
}

// With default voice
tts: {
  provider: "eleven_labs"
}

Available ElevenLabs Voices

Voice Name	ID	Description	Verified Locales
sipgate	dSu12TX3MEDQXAarG4s6	Clean male voice used by sipgate for system announcements (default).	de-DE
Rachel	21m00Tcm4TlvDq8ikWAM	Matter-of-fact, personable woman. Great for conversational use cases.	en-US
Sarah	EXAVITQu4vr4xnSDxMaL	Young adult woman with a confident and warm, mature quality.	en-US, fr-FR, cmn-CN, hi-IN
Laura	FGY2WhTYpPnrIDTdsKH5	Young adult female delivers sunny enthusiasm with quirky attitude.	en-US, fr-FR, cmn-CN, de-DE
George	JBFqnCBsd6RMkjVDRZzb	Warm resonance that instantly captivates listeners.	en-GB, fr-FR, ja-JP, cs-CZ
Thomas	GBv7mTt0atIp3Br8iCZE	Soft and subdued male, optimal for narrations or meditations.	en-US
Roger	CwhRBWXzGAHq8TQ4Fs17	Easy going and perfect for casual conversations.	en-US, fr-FR, de-DE, nl-NL
Eric	cjVigY5qzO86Huf0OWal	Smooth tenor pitch from a man in his 40s - perfect for agentic use cases.	en-US, fr-FR, de-DE, sk-SK
Brian	nPczCjzI2devNBz1zQrb	Middle-aged man with resonant and comforting tone.	en-US, cmn-CN, de-DE, nl-NL
Jessica	cgSgspJ2msm6clMCkdW9	Young and playful American female, perfect for trendy content.	en-US, fr-FR, ja-JP, cmn-CN, de-DE
Liam	TX3LPaxmHKxFdv7VOQHJ	Young adult with energy and warmth - suitable for reels and shorts.	en-US, de-DE, cs-CZ, pl-PL, tr-TR
Alice	Xb7hH8MSUJpSbSDYk0k2	Clear and engaging, friendly British woman suitable for e-learning.	en-GB, it-IT, fr-FR, ja-JP, pl-PL
Daniel	onwK4e9ZLuTAKqWW03F9	Strong voice perfect for professional broadcast or news.	en-GB, de-DE, tr-TR
Lily	pFZP5JQG7iQjIQuC4Bku	Velvety British female delivers news with warmth and clarity.	it-IT, de-DE, cmn-CN, cs-CZ, nl-NL
River	SAz9YHcvj6GT2YYXdXww	Relaxed, neutral voice ready for narrations or conversational projects.	en-US, it-IT, fr-FR, cmn-CN
Charlie	IKne3meq5aSn9XLyUdCD	Young Australian male with confident and energetic voice.	en-AU, cmn-CN, fil-PH
Aria	9BWtsMINqrJLrRacOk9x	Middle-aged female with African-American accent. Calm with hint of rasp.	en-US, fr-FR, cmn-CN, tr-TR
Matilda	XrExE9yKIg1WjnnlVkGX	Professional woman with pleasing alto pitch. Suitable for many use cases.	en-US, it-IT, fr-FR, de-DE
Will	bIHbv24MWmeRgasZH58o	Conversational and laid back.	en-US, fr-FR, de-DE, cmn-CN, cs-CZ
Chris	iP95p4xoKVk53GoZ742B	Natural and real, down-to-earth voice great across many use-cases.	en-US, fr-FR, sv-SE, hi-IN
Bill	pqHfZKP75CvOlQylNhV4	Friendly and comforting voice ready to narrate stories.	en-US, fr-FR, cmn-CN, de-DE, cs-CZ

Note: More than 50 voices are available in total. The full list with samples is in the API reference. The SDK includes full TypeScript type definitions for all voice IDs and names.
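
The "Verified Locales" column can be used to narrow down candidate voices for a given locale. The following sketch copies a few rows from the table above into a local array. `ELEVEN_LABS_VOICES` and `voicesForLocale` are hypothetical names for illustration and are not the SDK's own type definitions.

```typescript
interface ElevenLabsVoice {
  name: string;
  id: string;
  locales: string[];
}

// A subset of the table above, copied by hand for illustration.
const ELEVEN_LABS_VOICES: ElevenLabsVoice[] = [
  { name: "sipgate", id: "dSu12TX3MEDQXAarG4s6", locales: ["de-DE"] },
  { name: "Rachel", id: "21m00Tcm4TlvDq8ikWAM", locales: ["en-US"] },
  { name: "Roger", id: "CwhRBWXzGAHq8TQ4Fs17", locales: ["en-US", "fr-FR", "de-DE", "nl-NL"] },
  { name: "Alice", id: "Xb7hH8MSUJpSbSDYk0k2", locales: ["en-GB", "it-IT", "fr-FR", "ja-JP", "pl-PL"] },
  { name: "Daniel", id: "onwK4e9ZLuTAKqWW03F9", locales: ["en-GB", "de-DE", "tr-TR"] },
];

// Hypothetical helper: returns the voices verified for the given locale,
// in the order they appear in the array.
function voicesForLocale(locale: string): ElevenLabsVoice[] {
  return ELEVEN_LABS_VOICES.filter((voice) => voice.locales.indexOf(locale) !== -1);
}

console.log(voicesForLocale("de-DE").map((voice) => voice.name));
// [ "sipgate", "Roger", "Daniel" ]
```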

Choosing a TTS Provider

Use Azure when:

You need support for many languages (140+ languages available)
You want consistent quality across all locales
You need specific regional accents or dialects

Use ElevenLabs when:

You need the most natural, human-like voices
Conversational quality is critical (phone calls, virtual assistants)
You're primarily working with English or common European languages
You want voices with distinct personalities

Usage Examples

Per-Action Configuration

typescript

onUserSpeak: async (event) => {
  return {
    type: "speak",
    session_id: event.session.id,
    text: "Hello in a different voice",
    tts: {
      provider: "azure",
      language: "en-US",
      voice: "en-US-JennyNeural",
    },
  };
}

Using ElevenLabs

typescript

onUserSpeak: async (event) => {
  return {
    type: "speak",
    session_id: event.session.id,
    text: "Hello from ElevenLabs!",
    tts: {
      provider: "eleven_labs",
      voice: "21m00Tcm4TlvDq8ikWAM", // Rachel
    },
  };
}
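
The three provider configs share a `provider` field, so they can be modeled as a discriminated union and passed through a single helper. This is a minimal sketch under stated assumptions: the `TtsProviderConfig` union, the `SpeakAction` shape (limited to the fields used in the examples above), and the `speak` helper are all hypothetical and may differ from the SDK's actual exports. It also assumes that omitting `tts` makes the action fall back to the default settings.

```typescript
// Union of the three config shapes documented on this page.
type TtsProviderConfig =
  | { provider: "kugelaudio"; voice?: string; language?: string; speed?: number }
  | { provider: "azure"; language?: string; voice?: string }
  | { provider: "eleven_labs"; voice?: string };

// Only the fields used in the examples above; the real action type may have more.
interface SpeakAction {
  type: "speak";
  session_id: string;
  text: string;
  tts?: TtsProviderConfig;
}

// Hypothetical helper: builds a speak action and attaches `tts` only when
// one is given, so that default settings apply otherwise (an assumption).
function speak(sessionId: string, text: string, tts?: TtsProviderConfig): SpeakAction {
  const action: SpeakAction = { type: "speak", session_id: sessionId, text };
  if (tts !== undefined) {
    action.tts = tts;
  }
  return action;
}

const withDefaults = speak("session-1", "Hello!");
const withAzure = speak("session-1", "Guten Tag!", {
  provider: "azure",
  language: "de-DE",
  voice: "de-DE-KatjaNeural",
});
console.log("tts" in withDefaults); // false
console.log(withAzure.tts?.provider); // "azure"
```

With the union in place, the compiler rejects fields that a provider does not document, such as `speed` on an `azure` config.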

Next Steps

Barge-In Configuration - Control interruption behavior
Action Types - Complete action reference
