Action Types
Complete reference for all actions you can return from event handlers.
Overview
Actions are responses that tell the AI Flow service what to do next. Every action requires a `session_id` and a `type` field.
Base Action Structure
```typescript
interface BaseAction {
  session_id: string; // UUID from the event's session.id
  type: string; // Action type identifier
}
```

Action Summary
| Action Type | Description | Primary Use Case |
|---|---|---|
| `speak` | Speak text or SSML | Respond to the user with synthesized speech |
| `audio` | Play pre-recorded audio | Play hold music or pre-recorded messages |
| `hangup` | End the call | Terminate the conversation |
| `transfer` | Transfer to another number | Route to a human agent or department |
| `barge_in` | Manually interrupt playback | Stop current audio immediately |
| `configure_transcription` | Change the STT provider and/or language(s) mid-call | Switch recognition settings without hanging up |
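Because every action carries a `type` string, the set of actions in the table behaves like a discriminated union. The sketch below illustrates this with a simplified local type, `ActionSketch`, whose fields are copied from this page (optional `tts` and `barge_in` settings omitted), plus a hypothetical `describeAction` helper that is not part of the SDK. In real code, import `AiFlowAction` from `@sipgate/ai-flow-sdk` instead of redeclaring it.

```typescript
// Simplified local stand-in for the SDK's action union (tts / barge_in omitted).
type ActionSketch =
  | { type: "speak"; session_id: string; text?: string; ssml?: string }
  | { type: "audio"; session_id: string; audio: string }
  | { type: "hangup"; session_id: string }
  | {
      type: "transfer";
      session_id: string;
      target_phone_number: string;
      caller_id_name: string;
      caller_id_number: string;
    }
  | { type: "barge_in"; session_id: string }
  | {
      type: "configure_transcription";
      session_id: string;
      provider?: "AZURE" | "DEEPGRAM" | "ELEVEN_LABS";
      languages?: string[];
    };

// Hypothetical helper: one-line summary of what an action asks the service to do.
// Switching on `type` narrows `action` to the matching member of the union.
function describeAction(action: ActionSketch): string {
  switch (action.type) {
    case "speak":
      return `speak: ${action.text ?? "(SSML)"}`;
    case "audio":
      return `audio: ${action.audio.length} Base64 characters`;
    case "hangup":
      return "hangup";
    case "transfer":
      return `transfer to ${action.target_phone_number}`;
    case "barge_in":
      return "barge_in";
    case "configure_transcription":
      return `configure_transcription: provider=${action.provider ?? "-"}, languages=${action.languages?.join(",") ?? "-"}`;
    default: {
      // Exhaustiveness check: this line stops compiling if a case above is missing.
      const unreachable: never = action;
      return unreachable;
    }
  }
}

// Usage with placeholder values.
const transferExample: ActionSketch = {
  type: "transfer",
  session_id: "00000000-0000-0000-0000-000000000000",
  target_phone_number: "+4921112345678",
  caller_id_name: "Sales Department",
  caller_id_number: "+4921112345678",
};
console.log(describeAction(transferExample)); // prints "transfer to +4921112345678"
```

The `never` assignment in the `default` branch means that adding a new action type to the union without handling it becomes a compile-time error rather than a silent fall-through.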
Speak Action
Speaks text or SSML to the user.
```typescript
interface AiFlowActionSpeak {
  type: "speak";
  session_id: string;
  // Either text OR ssml (not both)
  text?: string; // Plain text to speak
  ssml?: string; // SSML markup for advanced control
  // Optional configurations
  tts?: TtsConfig; // TTS provider settings
  barge_in?: BargeInConfig; // Barge-in behavior
}
```

Examples:
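One way to enforce the "either `text` or `ssml`, not both" rule is to build speak actions through a small helper. This is a minimal sketch: `SpeakAction` is a local stand-in for `AiFlowActionSpeak` (without `tts` and `barge_in`), and `buildSpeak` is a hypothetical helper, not an SDK function.

```typescript
// Local stand-in for AiFlowActionSpeak (tts and barge_in omitted).
interface SpeakAction {
  type: "speak";
  session_id: string;
  text?: string;
  ssml?: string;
}

// Hypothetical helper: accepts exactly one of text or ssml and rejects
// inputs that set both or neither.
function buildSpeak(
  sessionId: string,
  content: { text?: string; ssml?: string },
): SpeakAction {
  const { text, ssml } = content;
  if (text !== undefined && ssml === undefined) {
    return { type: "speak", session_id: sessionId, text };
  }
  if (ssml !== undefined && text === undefined) {
    return { type: "speak", session_id: sessionId, ssml };
  }
  throw new Error("speak requires exactly one of `text` or `ssml`");
}

// Usage inside a handler would pass event.session.id; a placeholder is used here.
const greeting = buildSpeak("00000000-0000-0000-0000-000000000000", {
  text: "Hello, how can I help you?",
});
```

Failing fast in your own code makes a malformed action visible at the point where it is built, instead of relying on how the service handles an ambiguous payload.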
```typescript
// Simple text
return {
  type: "speak",
  session_id: event.session.id,
  text: "Hello, how can I help you?",
};

// With SSML
return {
  type: "speak",
  session_id: event.session.id,
  ssml: `
    <speak version="1.0" xml:lang="en-US">
      <voice name="en-US-JennyNeural">
        <prosody rate="slow">Please listen carefully.</prosody>
        <break time="500ms"/>
        Your account balance is <say-as interpret-as="currency">$42.50</say-as>
      </voice>
    </speak>
  `,
};

// With custom TTS provider
return {
  type: "speak",
  session_id: event.session.id,
  text: "Hello in a different voice",
  tts: {
    provider: "azure",
    language: "en-US",
    voice: "en-US-JennyNeural",
  },
};
```

Audio Action
Plays pre-recorded audio to the user.
```typescript
interface AiFlowActionAudio {
  type: "audio";
  session_id: string;
  audio: string; // Base64 encoded WAV (16kHz, mono, 16-bit)
  barge_in?: BargeInConfig;
}
```

Example:
```typescript
// Play hold music or a pre-recorded message
return {
  type: "audio",
  session_id: event.session.id,
  audio: base64EncodedWavData, // Base64-encoded WAV meeting the format requirements below
  barge_in: {
    strategy: "minimum_characters",
    minimum_characters: 3,
  },
};
```

Audio Format Requirements:
- Format: WAV
- Sample Rate: 16kHz
- Channels: Mono
- Bit Depth: 16-bit PCM
- Encoding: Base64
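If your audio starts out as raw 16-bit PCM samples (for example, generated or decoded in memory), the requirements above can be met by prepending a standard 44-byte WAV header and Base64-encoding the result. The sketch below assumes Node.js (it uses the global `Buffer` class); `AudioAction` is a local stand-in for `AiFlowActionAudio`, and `pcm16MonoToBase64Wav` is a hypothetical helper, not part of the SDK. It does not resample, so the samples must already be 16 kHz mono.

```typescript
// Local stand-in for AiFlowActionAudio (barge_in omitted).
interface AudioAction {
  type: "audio";
  session_id: string;
  audio: string;
}

// Hypothetical helper: wraps mono 16-bit PCM samples in a canonical 44-byte
// RIFF/WAV header and returns the whole file Base64-encoded.
function pcm16MonoToBase64Wav(samples: Int16Array, sampleRate = 16000): string {
  const channels = 1;
  const bitsPerSample = 16;
  const blockAlign = channels * (bitsPerSample / 8); // bytes per sample frame
  const dataSize = samples.length * blockAlign;
  const wav = Buffer.alloc(44 + dataSize);

  wav.write("RIFF", 0, "ascii");
  wav.writeUInt32LE(36 + dataSize, 4); // RIFF chunk size = file size - 8
  wav.write("WAVE", 8, "ascii");
  wav.write("fmt ", 12, "ascii");
  wav.writeUInt32LE(16, 16); // fmt chunk size for PCM
  wav.writeUInt16LE(1, 20); // audio format 1 = uncompressed PCM
  wav.writeUInt16LE(channels, 22);
  wav.writeUInt32LE(sampleRate, 24);
  wav.writeUInt32LE(sampleRate * blockAlign, 28); // byte rate
  wav.writeUInt16LE(blockAlign, 32);
  wav.writeUInt16LE(bitsPerSample, 34);
  wav.write("data", 36, "ascii");
  wav.writeUInt32LE(dataSize, 40);

  // Sample data: little-endian signed 16-bit integers after the header.
  samples.forEach((sample, i) => wav.writeInt16LE(sample, 44 + i * 2));
  return wav.toString("base64");
}

// Usage: half a second of a 440 Hz tone at 16 kHz (8000 samples).
const SAMPLE_RATE = 16000;
const tone = new Int16Array(SAMPLE_RATE / 2);
for (let i = 0; i < tone.length; i++) {
  tone[i] = Math.round(Math.sin((2 * Math.PI * 440 * i) / SAMPLE_RATE) * 0.3 * 32767);
}

const audioAction: AudioAction = {
  type: "audio",
  session_id: "00000000-0000-0000-0000-000000000000", // placeholder; use event.session.id
  audio: pcm16MonoToBase64Wav(tone, SAMPLE_RATE),
};
```

Pre-recorded files in other formats or sample rates need to be converted to 16 kHz mono 16-bit PCM WAV before encoding; that conversion step is outside this sketch.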
Hangup Action
Ends the call.
```typescript
interface AiFlowActionHangup {
  type: "hangup";
  session_id: string;
}
```

Example:
```typescript
onUserSpeak: async (event) => {
  // End the call when the user says goodbye
  if (event.text.toLowerCase().includes("goodbye")) {
    return {
      type: "hangup",
      session_id: event.session.id,
    };
  }
},
```

Transfer Action
Transfers the call to another phone number.
```typescript
interface AiFlowActionTransfer {
  type: "transfer";
  session_id: string;
  target_phone_number: string; // E.164 format recommended
  caller_id_name: string;
  caller_id_number: string;
}
```

Example:
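Since E.164 is the recommended format for `target_phone_number`, it can be worth normalizing and checking numbers before building the action. In this minimal sketch, `TransferAction` is a local stand-in for `AiFlowActionTransfer`, and `buildTransfer` is a hypothetical helper, not an SDK function. The check uses the general E.164 shape (a `+`, then a country code that does not start with 0, and at most 15 digits in total); it does not verify that the number actually exists.

```typescript
// Local stand-in for AiFlowActionTransfer.
interface TransferAction {
  type: "transfer";
  session_id: string;
  target_phone_number: string;
  caller_id_name: string;
  caller_id_number: string;
}

// "+" followed by 2-15 digits, the first of which is not 0.
const E164 = /^\+[1-9]\d{1,14}$/;

// Hypothetical helper: strips spaces, dashes, and parentheses from the target,
// then rejects anything that is not E.164-shaped.
function buildTransfer(
  sessionId: string,
  target: string,
  callerIdName: string,
  callerIdNumber: string,
): TransferAction {
  const normalized = target.replace(/[\s()-]/g, "");
  if (!E164.test(normalized)) {
    throw new Error(`target_phone_number is not in E.164 format: ${target}`);
  }
  return {
    type: "transfer",
    session_id: sessionId,
    target_phone_number: normalized,
    caller_id_name: callerIdName,
    caller_id_number: callerIdNumber,
  };
}

// Usage with placeholder values.
const toSales = buildTransfer(
  "00000000-0000-0000-0000-000000000000",
  "+49 211 123-456",
  "Sales Department",
  "+49211123456",
);
```

A national-format number such as `0211 123456` is rejected rather than guessed at, because adding a country code automatically would require knowing where the caller is.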
```typescript
// Transfer to sales department
return {
  type: "transfer",
  session_id: event.session.id,
  target_phone_number: "+1234567890",
  caller_id_name: "Sales Department",
  caller_id_number: "+1234567890",
};
```

Barge-In Action
Manually triggers barge-in (interrupts current playback).
```typescript
interface AiFlowActionBargeIn {
  type: "barge_in";
  session_id: string;
}
```

Example:
```typescript
// Manually interrupt current playback
return {
  type: "barge_in",
  session_id: event.session.id,
};
```

Configure Transcription Action
Changes the STT (speech-to-text) provider and/or recognition language(s) during an active call session, without hanging up.
```typescript
import { TranscriptionProvider } from "@sipgate/ai-flow-sdk";

interface AiFlowActionConfigureTranscription {
  type: "configure_transcription";
  session_id: string;
  provider?: TranscriptionProvider; // "AZURE" | "DEEPGRAM" | "ELEVEN_LABS"; omit to keep the current provider
  languages?: string[]; // BCP-47 codes, 1-4 entries; omit to reset to the provider default
}
```

Omitting `provider` keeps the current provider, and omitting `languages` resets recognition to the provider default. A payload with neither field therefore resets the language settings (see the last example below).
Both fields use full-replace semantics: a new value never merges with the existing settings.
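Because `languages` replaces the current list, adding a language means sending the complete list you want, not just the new entry. The sketch below assumes your application keeps track of the languages it last configured (the page does not describe a way to read them back); `ConfigureTranscriptionAction` is a local stand-in for `AiFlowActionConfigureTranscription`, and `addLanguage` is a hypothetical helper, not an SDK function.

```typescript
// Local stand-in for AiFlowActionConfigureTranscription.
interface ConfigureTranscriptionAction {
  type: "configure_transcription";
  session_id: string;
  provider?: "AZURE" | "DEEPGRAM" | "ELEVEN_LABS";
  languages?: string[];
}

const MAX_LANGUAGES = 4; // documented upper bound for `languages`

// Hypothetical helper: returns an action whose `languages` is the full desired
// list (current + new), since the service replaces rather than merges it.
function addLanguage(
  sessionId: string,
  currentLanguages: readonly string[], // tracked by your application (assumption)
  language: string,
): ConfigureTranscriptionAction {
  const languages = currentLanguages.includes(language)
    ? [...currentLanguages]
    : [...currentLanguages, language];
  if (languages.length > MAX_LANGUAGES) {
    throw new Error(`configure_transcription accepts at most ${MAX_LANGUAGES} languages`);
  }
  return { type: "configure_transcription", session_id: sessionId, languages };
}

// Usage: German is active; add English for multi-language detection.
// Sending only ["en-US"] instead would drop German.
const germanPlusEnglish = addLanguage(
  "00000000-0000-0000-0000-000000000000",
  ["de-DE"],
  "en-US",
);
```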
Examples:
```typescript
// Switch to German
return {
  type: "configure_transcription",
  session_id: event.session.id,
  languages: ["de-DE"],
};

// Multi-language detection (German + English)
return {
  type: "configure_transcription",
  session_id: event.session.id,
  languages: ["de-DE", "en-US"],
};

// Switch STT provider to Deepgram
return {
  type: "configure_transcription",
  session_id: event.session.id,
  provider: "DEEPGRAM",
};

// Switch provider AND language in one step
return {
  type: "configure_transcription",
  session_id: event.session.id,
  provider: "DEEPGRAM",
  languages: ["en-US"],
};

// Reset to provider default (automatic detection)
return {
  type: "configure_transcription",
  session_id: event.session.id,
};
```

Audio gap during restart: Any change requires the transcription engine to restart. Audio received during the restart (~100–500 ms for a language-only change, ~200–800 ms for a provider switch) is dropped.
Multi-language support depends on the active STT provider:
- Azure: up to 4 languages, all used for simultaneous Language Identification (LID)
- Deepgram / ElevenLabs: single language only; only the first entry is used, and additional entries are silently ignored
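Since extra entries are dropped silently on single-language providers, it can help to compute up front which languages will actually be recognized and log the rest. This hypothetical helper, `splitLanguagesForProvider`, simply encodes the rules listed above; it is not an SDK function, and the provider names are taken from the `TranscriptionProvider` comment earlier on this page.

```typescript
type SttProvider = "AZURE" | "DEEPGRAM" | "ELEVEN_LABS";

// Hypothetical helper: Azure uses up to 4 languages for LID; Deepgram and
// ElevenLabs use only the first entry. More than 4 entries is outside the
// documented 1-4 range for any provider, so it is rejected.
function splitLanguagesForProvider(
  provider: SttProvider,
  languages: readonly string[],
): { used: string[]; ignored: string[] } {
  if (languages.length > 4) {
    throw new Error("languages accepts at most 4 entries");
  }
  const limit = provider === "AZURE" ? 4 : 1;
  return { used: languages.slice(0, limit), ignored: languages.slice(limit) };
}

// Usage: surface a warning instead of letting entries be dropped silently.
const { used, ignored } = splitLanguagesForProvider("DEEPGRAM", ["de-DE", "en-US"]);
if (ignored.length > 0) {
  console.warn(`Only ${used.join(", ")} will be recognized; ignored: ${ignored.join(", ")}`);
}
```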
Barge-in latency after a provider switch (with the immediate strategy):
- Azure: ~20–80 ms
- Deepgram: ~20–100 ms
- ElevenLabs: ~30–120 ms
Type Safety
All actions are fully typed. Import types from the SDK:
```typescript
import type {
  AiFlowAction,
  AiFlowActionSpeak,
  AiFlowActionAudio,
  AiFlowActionHangup,
  AiFlowActionTransfer,
  AiFlowActionBargeIn,
  AiFlowActionConfigureTranscription,
} from "@sipgate/ai-flow-sdk";
import { TranscriptionProvider } from "@sipgate/ai-flow-sdk";

onUserSpeak: async (event) => {
  // Annotating the object makes the compiler check it against the speak action type
  const action: AiFlowActionSpeak = {
    type: "speak",
    session_id: event.session.id,
    text: "Hello!",
  };
  return action;
},
```

Next Steps
- TTS Providers - Configure text-to-speech voices
- Barge-In Configuration - Control interruption behavior
- API Reference - Complete API documentation