Action Types
Complete reference for all actions you can send to the AI Flow service.
Overview
Actions are JSON objects you send back to the AI Flow service in response to events. Every action requires both a session_id field and a type field.
Base Action Structure
```json
{
  "session_id": "550e8400-e29b-41d4-a716-446655440000",
  "type": "speak"
}
```
Action Summary
| Action Type | Description | Primary Use Case |
|---|---|---|
| speak | Speak text or SSML | Respond to user with synthesized speech |
| audio | Play pre-recorded audio | Play hold music, pre-recorded messages |
| mix_audio | Loop a background sound mixed into speech | Add ambient noise (café, office, train station) under the agent |
| hangup | End the call | Terminate conversation |
| transfer | Transfer to another number | Route to human agent or department |
| barge_in | Manually interrupt playback | Stop current audio immediately |
| configure_transcription | Change STT language(s) mid-call | Switch recognition language without hanging up |
| configure_voice_to_voice | Switch the session into end-to-end voice-to-voice mode | Hand the conversation to a speech-to-speech model that owns audio I/O |
| send_sms | Send an SMS from the account | Deliver confirmation codes, summaries, links |
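As a minimal sketch, the base structure and the action types in the table above can be checked locally before an action is sent. The helper name `validate_action` and the `KNOWN_ACTION_TYPES` set are illustrative assumptions, not part of the AI Flow API. The check covers only the two required fields, not the per-action fields such as text, and the service's own validation may differ.

```python
# Action types copied from the summary table above.
KNOWN_ACTION_TYPES = {
    "speak", "audio", "mix_audio", "hangup", "transfer", "barge_in",
    "configure_transcription", "configure_voice_to_voice", "send_sms",
}


def validate_action(action):
    """Return a list of problems found in an action dict (empty if none).

    Hypothetical client-side helper: it only checks the required
    session_id and type fields and that the type is a known one.
    """
    problems = []
    for field in ("session_id", "type"):
        if not action.get(field):
            problems.append(f"missing required field: {field}")
    action_type = action.get("type")
    if action_type and action_type not in KNOWN_ACTION_TYPES:
        problems.append(f"unknown action type: {action_type}")
    return problems
```

Running a check like this before responding can surface a misspelled type or a missing session_id early, rather than waiting for the service to reject the action.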
Quick Reference
- Speak Action - Text-to-speech
- Audio Action - Play audio file
- Mix Audio Action - Loop a background sound mixed into outbound speech
- Hangup Action - End call
- Transfer Action - Transfer call
- Barge-In Action - Manually interrupt current playback
- Configure Transcription Action - Change STT language mid-call
- Configure Voice-to-Voice Action - End-to-end speech-to-speech mode (preview)
- Send SMS Action - Send an SMS from your account
Response Format
HTTP Webhook
Return a single action or an array of actions as JSON with 200 OK:
```http
HTTP/1.1 200 OK
Content-Type: application/json

{
  "type": "speak",
  "session_id": "550e8400-e29b-41d4-a716-446655440000",
  "text": "Hello!"
}
```
To execute multiple actions in sequence, return an array:
```http
HTTP/1.1 200 OK
Content-Type: application/json

[
  {
    "type": "barge_in",
    "session_id": "550e8400-e29b-41d4-a716-446655440000"
  },
  {
    "type": "speak",
    "session_id": "550e8400-e29b-41d4-a716-446655440000",
    "text": "Sorry, let me correct that."
  }
]
```
Or return 204 No Content if no action is needed:
```http
HTTP/1.1 204 No Content
```
WebSocket
Send a single action or an array of actions as JSON strings:
```json
{
  "type": "speak",
  "session_id": "550e8400-e29b-41d4-a716-446655440000",
  "text": "Hello!"
}
```
```json
[
  { "type": "barge_in", "session_id": "..." },
  { "type": "speak", "session_id": "...", "text": "Sorry, let me correct that." }
]
```
Action Flow
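As a rough sketch of how a handler's result could map onto the two response formats above: None becomes a 204 with no body, while a single action or a list of actions is serialized as JSON. The function names `to_http_response` and `to_websocket_message` are hypothetical and not tied to any web framework or to an AI Flow SDK.

```python
import json


def to_http_response(result):
    """Map a handler result to (status_code, headers, body) for a webhook reply.

    Illustrative helper: `result` is None, one action dict, or a list of them.
    """
    if result is None:
        # No action needed: 204 No Content with an empty body.
        return 204, {}, ""
    # A single action or an array of actions is returned as JSON with 200 OK.
    return 200, {"Content-Type": "application/json"}, json.dumps(result)


def to_websocket_message(result):
    """Serialize a single action or a list of actions as one JSON string."""
    return json.dumps(result)
```

Keeping the serialization in one place means the same handler can serve either transport; only the final wrapping step differs.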
Common Patterns
Simple Response
```json
{
  "type": "speak",
  "session_id": "session-123",
  "text": "Hello! How can I help you?"
}
```
Conditional Response
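```python
# handle_event is an illustrative name; adapt it to your own handler.
def handle_event(event):
    if "goodbye" in event['text'].lower():
        # The caller said goodbye: end the call.
        return {
            "type": "hangup",
            "session_id": event['session']['id']
        }
    else:
        # Otherwise acknowledge and keep the conversation going.
        return {
            "type": "speak",
            "session_id": event['session']['id'],
            "text": "I understand."
        }
```
Multiple Actions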
You can return an array of actions to execute them in sequence:
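```python
# handle_event is an illustrative name; adapt it to your own handler.
def handle_event(event):
    if event['type'] == 'user_speak':
        # Interrupt current playback, then speak the correction.
        return [
            {
                "type": "barge_in",
                "session_id": event['session']['id']
            },
            {
                "type": "speak",
                "session_id": event['session']['id'],
                "text": "Sorry, let me correct that."
            }
        ]
```
Actions in the array are executed one after another in order.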
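Because every action in the array repeats the same session_id, a small helper can stamp it onto each step while preserving order. This is only a sketch; `action_sequence` is a hypothetical name, not part of the AI Flow API.

```python
def action_sequence(session_id, *steps):
    """Attach the same session_id to each step and return them in order."""
    return [{**step, "session_id": session_id} for step in steps]


# Example: the barge_in + speak pair from above, built from partial dicts.
actions = action_sequence(
    "550e8400-e29b-41d4-a716-446655440000",
    {"type": "barge_in"},
    {"type": "speak", "text": "Sorry, let me correct that."},
)
```

The resulting list can be returned from a handler as-is, since list order is the execution order.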
Alternatively, you can chain actions across events using the assistant_speak event:
```python
# handle_event is an illustrative name; adapt it to your own handler.
def handle_event(event):
    # First response
    if event['type'] == 'user_speak':
        return {
            "type": "speak",
            "session_id": event['session']['id'],
            "text": "Please listen to this message."
        }
    # Follow-up after assistant speaks
    if event['type'] == 'assistant_speak':
        return {
            "type": "audio",
            "session_id": event['session']['id'],
            "audio": "base64-audio-data"
        }
```
Next Steps
- Speak Action - Detailed reference
- Event Types - What triggers actions
- Event Flow - Understand the complete flow