Action Types
Complete reference for all actions you can return from event handlers.
Overview
Actions are the responses your event handlers return to tell the AI Flow service what to do next. Every action requires a session_id and a type field.
Base Action Structure
typescript
interface BaseAction {
  session_id: string; // UUID from the event's session.id
  type: string; // Action type identifier
}
Action Summary
| Action Type | Description | Primary Use Case |
|---|---|---|
| speak | Speak text or SSML | Respond to user with synthesized speech |
| audio | Play pre-recorded audio | Play hold music, pre-recorded messages |
| hangup | End the call | Terminate conversation |
| transfer | Transfer to another number | Route to human agent or department |
| barge_in | Manually interrupt playback | Stop current audio immediately |
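The table above maps naturally onto a discriminated union keyed on the type field. The sketch below is illustrative only: it declares simplified local stand-ins for the five action shapes instead of importing the SDK's types, and describeAction is a hypothetical helper, not part of the SDK.

```typescript
// Local stand-ins for the five action shapes (assumption: simplified;
// the SDK's own types carry additional optional fields).
type ActionSketch =
  | { type: "speak"; session_id: string; text?: string; ssml?: string }
  | { type: "audio"; session_id: string; audio: string }
  | { type: "hangup"; session_id: string }
  | {
      type: "transfer";
      session_id: string;
      target_phone_number: string;
      caller_id_name: string;
      caller_id_number: string;
    }
  | { type: "barge_in"; session_id: string };

// Hypothetical helper: the switch narrows on `type`, and the `never`
// assignment makes the compiler flag any action type left unhandled.
function describeAction(action: ActionSketch): string {
  switch (action.type) {
    case "speak":
      return "speak text or SSML";
    case "audio":
      return "play pre-recorded audio";
    case "hangup":
      return "end the call";
    case "transfer":
      return `transfer to ${action.target_phone_number}`;
    case "barge_in":
      return "interrupt playback";
    default: {
      const unhandled: never = action;
      return unhandled;
    }
  }
}
```

Keying the union on type means that adding a sixth action shape later produces a compile error in every switch that has not been updated.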
Speak Action
Speaks text or SSML to the user.
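As the interface below notes, a speak action carries either text or ssml, never both. One way to have the compiler enforce that rule is a union with never-typed counterparts. The sketch assumes nothing about the SDK beyond the field names documented in this section; SpeakPayload and makeSpeak are hypothetical names.

```typescript
// Exactly one of `text` or `ssml` may be present (hypothetical local type,
// not exported by the SDK).
type SpeakPayload =
  | { text: string; ssml?: never }
  | { ssml: string; text?: never };

// Hypothetical helper that builds a speak action for a given session.
function makeSpeak(sessionId: string, payload: SpeakPayload) {
  return { type: "speak" as const, session_id: sessionId, ...payload };
}

const plain = makeSpeak("session-1", { text: "Hello, how can I help you?" });
const marked = makeSpeak("session-1", { ssml: "<speak>Hello</speak>" });

// The next line would not compile, because both fields are set:
// makeSpeak("session-1", { text: "Hi", ssml: "<speak>Hi</speak>" });
```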
typescript
interface AiFlowActionSpeak {
  type: "speak";
  session_id: string;
  // Either text OR ssml (not both)
  text?: string; // Plain text to speak
  ssml?: string; // SSML markup for advanced control
  // Optional configurations
  tts?: TtsConfig; // TTS provider settings
  barge_in?: BargeInConfig; // Barge-in behavior
}
Examples:
typescript
// Simple text
return {
  type: "speak",
  session_id: event.session.id,
  text: "Hello, how can I help you?",
};
// With SSML
return {
  type: "speak",
  session_id: event.session.id,
  ssml: `
    <speak version="1.0" xml:lang="en-US">
      <voice name="en-US-JennyNeural">
        <prosody rate="slow">Please listen carefully.</prosody>
        <break time="500ms"/>
        Your account balance is <say-as interpret-as="currency">$42.50</say-as>
      </voice>
    </speak>
  `,
};
// With custom TTS provider
return {
  type: "speak",
  session_id: event.session.id,
  text: "Hello in a different voice",
  tts: {
    provider: "azure",
    language: "en-US",
    voice: "en-US-JennyNeural",
  },
};
Audio Action
Plays pre-recorded audio to the user.
typescript
interface AiFlowActionAudio {
  type: "audio";
  session_id: string;
  audio: string; // Base64 encoded WAV (16kHz, mono, 16-bit)
  barge_in?: BargeInConfig;
}
Example:
typescript
// Play hold music or pre-recorded message
return {
  type: "audio",
  session_id: event.session.id,
  audio: base64EncodedWavData,
  barge_in: {
    strategy: "minimum_characters",
    minimum_characters: 3,
  },
};
Audio Format Requirements:
- Format: WAV
- Sample Rate: 16kHz
- Channels: Mono
- Bit Depth: 16-bit PCM
- Encoding: Base64
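Before Base64-encoding a file, it can help to confirm that it actually meets these requirements. The sketch below reads the canonical 44-byte RIFF/WAVE header. It assumes the fmt chunk directly follows the RIFF header, which holds for most simple PCM files but is not guaranteed; checkWavFormat is a hypothetical helper, not an SDK function.

```typescript
// Hypothetical pre-flight check for the format listed above.
// Assumes a canonical header: "RIFF" <size> "WAVE" "fmt " <16> <fields...>.
function checkWavFormat(bytes: Uint8Array): string[] {
  if (bytes.length < 44) return ["file is shorter than a 44-byte WAV header"];

  const problems: string[] = [];
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  // Reads a four-character ASCII chunk tag at the given offset.
  const tag = (offset: number): string =>
    String.fromCharCode(
      view.getUint8(offset),
      view.getUint8(offset + 1),
      view.getUint8(offset + 2),
      view.getUint8(offset + 3),
    );

  if (tag(0) !== "RIFF" || tag(8) !== "WAVE") problems.push("not a RIFF/WAVE file");
  if (tag(12) !== "fmt ") problems.push("fmt chunk is not at offset 12");

  // All multi-byte header fields are little-endian.
  if (view.getUint16(20, true) !== 1) problems.push("audio format is not PCM");
  if (view.getUint16(22, true) !== 1) problems.push("not mono");
  if (view.getUint32(24, true) !== 16000) problems.push("sample rate is not 16 kHz");
  if (view.getUint16(34, true) !== 16) problems.push("bit depth is not 16-bit");
  return problems;
}
```

An empty result means the header matches the listed format; any returned strings name the fields that do not.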
Hangup Action
Ends the call.
typescript
interface AiFlowActionHangup {
  type: "hangup";
  session_id: string;
}
Example:
typescript
onUserSpeak: async (event) => {
  if (event.text.toLowerCase().includes("goodbye")) {
    return {
      type: "hangup",
      session_id: event.session.id,
    };
  }
};
Transfer Action
Transfers the call to another phone number.
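The target_phone_number field documented below is best supplied in E.164 form: a leading + followed by up to 15 digits, the first of which is non-zero. A handler might normalize and check a number before building the action. This is a sketch under that assumption only; toE164 is a hypothetical helper, and the service may accept other formats.

```typescript
// E.164: "+" then 2 to 15 digits, with no leading zero in the country code.
const E164_PATTERN = /^\+[1-9]\d{1,14}$/;

// Hypothetical helper: strips common separators, then validates.
// Returns null rather than guessing a country code for local numbers.
function toE164(input: string): string | null {
  const compact = input.replace(/[\s().-]/g, "");
  return E164_PATTERN.test(compact) ? compact : null;
}
```

Returning null for numbers without a country code keeps the decision about a default region in the calling code, where the deployment's locale is known.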
typescript
interface AiFlowActionTransfer {
  type: "transfer";
  session_id: string;
  target_phone_number: string; // E.164 format recommended
  caller_id_name: string;
  caller_id_number: string;
}
Example:
typescript
// Transfer to sales department
return {
  type: "transfer",
  session_id: event.session.id,
  target_phone_number: "+1234567890",
  caller_id_name: "Sales Department",
  caller_id_number: "+1234567890",
};
Barge-In Action
Manually triggers barge-in (interrupts current playback).
typescript
interface AiFlowActionBargeIn {
  type: "barge_in";
  session_id: string;
}
Example:
typescript
// Manually interrupt current playback
return {
  type: "barge_in",
  session_id: event.session.id,
};
Type Safety
All actions are fully typed. Import types from the SDK:
typescript
import type {
  AiFlowAction,
  AiFlowActionSpeak,
  AiFlowActionAudio,
  AiFlowActionHangup,
  AiFlowActionTransfer,
  AiFlowActionBargeIn,
} from "@sipgate/ai-flow-sdk";
onUserSpeak: async (event) => {
  const action: AiFlowActionSpeak = {
    type: "speak",
    session_id: event.session.id,
    text: "Hello!",
  };
  return action;
};
Next Steps
- TTS Providers - Configure text-to-speech voices
- Barge-In Configuration - Control interruption behavior
- API Reference - Complete API documentation