Audio Action
Play pre-recorded audio to the user.
Action Structure
json
{
  "type": "audio",
  "session_id": "550e8400-e29b-41d4-a716-446655440000",
  "audio": "UklGRiQAAABXQVZFZm10IBAAAAABAAEAQB8AAEAfAAABAAgAZGF0YQAAAAA=",
  "barge_in": {
    "strategy": "minimum_characters",
    "minimum_characters": 3
  }
}
Fields
| Field | Type | Required | Description |
|---|---|---|---|
| type | string | Yes | Always "audio" |
| session_id | string (UUID) | Yes | Session identifier from the event |
| audio | string | Yes | Base64-encoded WAV audio data |
| barge_in | object | No | Barge-in behavior configuration |
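The field table can also be read as a small builder function. The sketch below is illustrative only: `build_audio_action` is a hypothetical helper name, not part of any official SDK. It checks that `session_id` parses as a UUID and leaves out `barge_in` when none is supplied, since that field is optional.

```python
import base64
import uuid


def build_audio_action(session_id, wav_bytes, barge_in=None):
    """Assemble an audio action dict from the fields listed above.

    Hypothetical helper for illustration; not part of an official SDK.
    """
    # session_id must be a UUID string; uuid.UUID raises ValueError otherwise.
    uuid.UUID(session_id)

    action = {
        "type": "audio",  # always "audio" for this action
        "session_id": session_id,
        "audio": base64.b64encode(wav_bytes).decode("ascii"),
    }
    # barge_in is optional, so only include it when the caller provides one.
    if barge_in is not None:
        action["barge_in"] = barge_in
    return action
```

Calling `build_audio_action(session_id, wav_bytes)` with no third argument yields a dict with only the three required keys.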
Audio Format Requirements
The audio must be in the following format:
- Format: WAV
- Sample Rate: 16kHz
- Channels: Mono (single channel)
- Bit Depth: 16-bit PCM
- Encoding: Base64
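These requirements can be checked before a file is encoded. The following is a minimal sketch using Python's standard `wave` module; `check_wav_format` is a hypothetical helper name, and it only inspects the WAV header, not the audio content.

```python
import wave


def check_wav_format(source):
    """Return a list of mismatches against 16 kHz / mono / 16-bit PCM.

    `source` may be a file path or a binary file-like object.
    """
    problems = []
    # wave.open raises wave.Error for input it cannot parse as PCM WAV.
    with wave.open(source, "rb") as wav:
        if wav.getframerate() != 16000:
            problems.append(f"sample rate is {wav.getframerate()} Hz, expected 16000")
        if wav.getnchannels() != 1:
            problems.append(f"{wav.getnchannels()} channels, expected 1 (mono)")
        if wav.getsampwidth() != 2:  # sample width is reported in bytes
            problems.append(f"{wav.getsampwidth() * 8}-bit samples, expected 16-bit")
    return problems
```

An empty list means the header matches all three requirements; otherwise each entry names one mismatch.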
Simple Example
json
{
  "type": "audio",
  "session_id": "550e8400-e29b-41d4-a716-446655440000",
  "audio": "UklGRiQAAABXQVZFZm10IBAAAAABAAEAQB8AAEAfAAABAAgAZGF0YQAAAAA="
}
With Barge-In Configuration
json
{
  "type": "audio",
  "session_id": "550e8400-e29b-41d4-a716-446655440000",
  "audio": "UklGRiQAAABXQVZFZm10IBAAAAABAAEAQB8AAEAfAAABAAgAZGF0YQAAAAA=",
  "barge_in": {
    "strategy": "minimum_characters",
    "minimum_characters": 3,
    "allow_after_ms": 1000
  }
}
Examples
Python
python
import base64

from flask import Flask, jsonify, request

app = Flask(__name__)


@app.route('/webhook', methods=['POST'])
def webhook():
    event = request.json
    if event['type'] == 'user_speak':
        # Read audio file and encode to base64
        with open('hold-music.wav', 'rb') as audio_file:
            audio_data = audio_file.read()
        base64_audio = base64.b64encode(audio_data).decode('utf-8')
        return jsonify({
            'type': 'audio',
            'session_id': event['session']['id'],
            'audio': base64_audio,
            'barge_in': {
                'strategy': 'minimum_characters',
                'minimum_characters': 3
            }
        })
    # Assumption: other event types need no action, so reply with an empty 204
    return '', 204
Node.js
javascript
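const fs = require('fs');
const express = require('express');

const app = express();
app.use(express.json()); // parse JSON request bodies into req.body

app.post('/webhook', (req, res) => {
  const event = req.body;
  if (event.type === 'user_speak') {
    // Read audio file and encode to base64
    const audioData = fs.readFileSync('hold-music.wav');
    const base64Audio = audioData.toString('base64');
    return res.json({
      type: 'audio',
      session_id: event.session.id,
      audio: base64Audio,
      barge_in: {
        strategy: 'minimum_characters',
        minimum_characters: 3
      }
    });
  }
  // Assumption: other event types need no action, so reply with an empty 204
  res.sendStatus(204);
});
Go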
go
import (
	"encoding/base64"
	"encoding/json"
	"net/http"
	"os"
)

func webhook(w http.ResponseWriter, r *http.Request) {
	var event map[string]interface{}
	if err := json.NewDecoder(r.Body).Decode(&event); err != nil {
		http.Error(w, "invalid JSON", http.StatusBadRequest)
		return
	}
	if event["type"] == "user_speak" {
		// Read audio file and encode to base64
		audioData, err := os.ReadFile("hold-music.wav")
		if err != nil {
			http.Error(w, "audio file unavailable", http.StatusInternalServerError)
			return
		}
		base64Audio := base64.StdEncoding.EncodeToString(audioData)
		// Checked type assertion avoids a panic if "session" is missing
		session, ok := event["session"].(map[string]interface{})
		if !ok {
			http.Error(w, "missing session", http.StatusBadRequest)
			return
		}
		action := map[string]interface{}{
			"type":       "audio",
			"session_id": session["id"],
			"audio":      base64Audio,
			"barge_in": map[string]interface{}{
				"strategy":           "minimum_characters",
				"minimum_characters": 3,
			},
		}
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(action)
		return
	}
}
Converting Audio Files
Using FFmpeg
Convert any audio file to the required format:
bash
ffmpeg -i input.mp3 -ar 16000 -ac 1 -sample_fmt s16 -f wav output.wav
Parameters:
- -ar 16000 - Set sample rate to 16kHz
- -ac 1 - Set to mono (1 channel)
- -sample_fmt s16 - Set to 16-bit PCM
- -f wav - Output WAV format
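If the conversion needs to run from application code rather than a shell, the same FFmpeg command can be invoked through `subprocess`. This is a sketch that assumes `ffmpeg` is installed and on the `PATH`; `ffmpeg_command` and `convert_to_wav` are hypothetical helper names.

```python
import subprocess


def ffmpeg_command(src, dst):
    """Build the FFmpeg argument list shown above for the given paths."""
    return [
        "ffmpeg", "-i", src,
        "-ar", "16000",        # 16 kHz sample rate
        "-ac", "1",            # mono
        "-sample_fmt", "s16",  # 16-bit PCM
        "-f", "wav", dst,
    ]


def convert_to_wav(src, dst):
    """Run FFmpeg; raises CalledProcessError if the conversion fails."""
    subprocess.run(ffmpeg_command(src, dst), check=True)
```

Passing the arguments as a list, rather than a single shell string, avoids quoting problems with file names that contain spaces.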
Python Script
python
import base64


def convert_audio_to_base64(audio_file_path):
    with open(audio_file_path, 'rb') as f:
        audio_data = f.read()
    return base64.b64encode(audio_data).decode('utf-8')


# Usage
base64_audio = convert_audio_to_base64('hold-music.wav')
Barge-In Configuration
Control how users can interrupt audio playback:
json
{
  "barge_in": {
    "strategy": "none"
  }
}
See Barge-In Configuration for details.
Use Cases
- Hold music - Play music while user waits
- Pre-recorded messages - Play announcements or greetings
- Sound effects - Play notification sounds
- Background audio - Ambient sounds during conversation
Best Practices
- Keep files small - Large audio files increase latency
- Use appropriate format - Ensure WAV, 16kHz, mono, 16-bit
- Test playback - Verify audio quality before production
- Configure barge-in - Allow natural interruptions when appropriate
- Cache base64 - Encode once, reuse the base64 string
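One way to apply the "encode once, reuse" advice is to memoize the encoding step. The sketch below uses `functools.lru_cache` from the standard library; `load_audio_base64` is a hypothetical helper name, and the cache size of 32 is an arbitrary choice.

```python
import base64
from functools import lru_cache


@lru_cache(maxsize=32)
def load_audio_base64(path):
    """Read a WAV file and return its base64 string, cached per path."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("ascii")
```

The cache is keyed on the path only, so it will not notice a file that changes on disk; call `load_audio_base64.cache_clear()` after replacing an audio file. For scale, 16kHz mono 16-bit audio is 32,000 bytes per second before the header, and base64 adds roughly one third, so each second of audio contributes about 43 KB to the JSON payload.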
Troubleshooting
Audio Not Playing
- Verify audio format matches requirements exactly
- Check base64 encoding is correct
- Ensure audio file is not corrupted
- Test with a known-good audio file
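To check the encoding and file integrity items above, it can help to decode the exact string being sent and read the WAV header back out of it. This is a minimal sketch using only the standard library; `inspect_audio_payload` is a hypothetical helper name.

```python
import base64
import binascii
import io
import wave


def inspect_audio_payload(b64_audio):
    """Decode a base64 `audio` value and report its WAV parameters.

    Raises ValueError if the string is not valid base64 or not a readable WAV.
    """
    try:
        # validate=True rejects characters outside the base64 alphabet.
        raw = base64.b64decode(b64_audio, validate=True)
    except binascii.Error as exc:
        raise ValueError(f"invalid base64: {exc}") from exc
    try:
        with wave.open(io.BytesIO(raw), "rb") as wav:
            return {
                "sample_rate": wav.getframerate(),
                "channels": wav.getnchannels(),
                "bits": wav.getsampwidth() * 8,
                "frames": wav.getnframes(),
            }
    except (wave.Error, EOFError) as exc:
        raise ValueError(f"not a readable WAV file: {exc}") from exc
```

A payload that meets the requirements reports `sample_rate` 16000, `channels` 1, and `bits` 16. Note that the short placeholder string used in the JSON examples on this page decodes to a header-only 8kHz, 8-bit WAV with no audio frames, so it is illustrative rather than a playable known-good file.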
Audio Quality Issues
- Ensure sample rate is exactly 16kHz
- Verify mono channel (not stereo)
- Check bit depth is 16-bit PCM
- Re-encode source audio if needed
Next Steps
- Barge-In Configuration - Control interruption behavior
- Speak Action - Text-to-speech alternative
- Action Types - Complete action reference