Audio Action

Play pre-recorded audio to the user.

Action Structure

json
{
  "type": "audio",
  "session_id": "550e8400-e29b-41d4-a716-446655440000",
  "audio": "UklGRiQAAABXQVZFZm10IBAAAAABAAEAQB8AAEAfAAABAAgAZGF0YQAAAAA=",
  "barge_in": {
    "strategy": "minimum_characters",
    "minimum_characters": 3
  }
}

Fields

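| Field | Type | Required | Description |
|-------|------|----------|-------------|
| type | string | Yes | Always "audio" |
| session_id | string (UUID) | Yes | Session identifier from event |
| audio | string | Yes | Base64 encoded WAV audio data |
| barge_in | object | No | Barge-in behavior configuration |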
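
The fields above map directly onto a small builder function. The following is a minimal sketch, not part of the API: `build_audio_action` is a hypothetical helper name, and the UUID check is an added safeguard based only on the "string (UUID)" type listed for session_id.

```python
import base64
import json
import uuid


def build_audio_action(session_id, wav_bytes, barge_in=None):
    """Build an audio action dict from raw WAV bytes (hypothetical helper)."""
    # Fail early on a malformed session ID; raises ValueError if it is not a UUID.
    uuid.UUID(session_id)
    action = {
        "type": "audio",  # always "audio" for this action
        "session_id": session_id,
        "audio": base64.b64encode(wav_bytes).decode("ascii"),
    }
    # barge_in is optional, so it is only included when the caller provides it.
    if barge_in is not None:
        action["barge_in"] = barge_in
    return action


# Placeholder bytes stand in for the contents of a real WAV file.
action = build_audio_action(
    "550e8400-e29b-41d4-a716-446655440000",
    b"placeholder WAV bytes",
    barge_in={"strategy": "minimum_characters", "minimum_characters": 3},
)
print(json.dumps(action, indent=2))
```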

Audio Format Requirements

The audio must be in the following format:

  • Format: WAV
  • Sample Rate: 16kHz
  • Channels: Mono (single channel)
  • Bit Depth: 16-bit PCM
  • Encoding: Base64
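
These requirements can be checked before an action is sent. The sketch below uses Python's standard `wave` module to read the header of the WAV data (before base64 encoding) and compare it against the list above. `check_wav` and the constant names are hypothetical, chosen for illustration.

```python
import io
import wave

# Values taken from the requirements list above.
REQUIRED_RATE_HZ = 16000
REQUIRED_CHANNELS = 1
REQUIRED_SAMPLE_WIDTH_BYTES = 2  # 16-bit samples


def check_wav(wav_bytes):
    """Return a list of problems; an empty list means the audio conforms."""
    try:
        with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
            rate = wav.getframerate()
            channels = wav.getnchannels()
            width = wav.getsampwidth()
    except (wave.Error, EOFError) as exc:
        # wave only reads uncompressed PCM WAV, so other formats end up here.
        return [f"not a readable PCM WAV file: {exc}"]

    problems = []
    if rate != REQUIRED_RATE_HZ:
        problems.append(f"sample rate is {rate} Hz, expected {REQUIRED_RATE_HZ} Hz")
    if channels != REQUIRED_CHANNELS:
        problems.append(f"{channels} channels, expected mono")
    if width != REQUIRED_SAMPLE_WIDTH_BYTES:
        problems.append(f"bit depth is {8 * width}, expected 16")
    return problems
```

To check a file on disk, read it in binary mode and pass the bytes to `check_wav`; anything other than an empty list means the file should be re-encoded.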

Simple Example

json
{
  "type": "audio",
  "session_id": "550e8400-e29b-41d4-a716-446655440000",
  "audio": "UklGRiQAAABXQVZFZm10IBAAAAABAAEAQB8AAEAfAAABAAgAZGF0YQAAAAA="
}

With Barge-In Configuration

json
{
  "type": "audio",
  "session_id": "550e8400-e29b-41d4-a716-446655440000",
  "audio": "UklGRiQAAABXQVZFZm10IBAAAAABAAEAQB8AAEAfAAABAAgAZGF0YQAAAAA=",
  "barge_in": {
    "strategy": "minimum_characters",
    "minimum_characters": 3,
    "allow_after_ms": 1000
  }
}

Examples

Python

python
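import base64

from flask import Flask, jsonify, request

app = Flask(__name__)


@app.route('/webhook', methods=['POST'])
def webhook():
    event = request.json

    if event['type'] == 'user_speak':
        # Read the WAV file and base64-encode it for the "audio" field
        with open('hold-music.wav', 'rb') as audio_file:
            audio_data = audio_file.read()
            base64_audio = base64.b64encode(audio_data).decode('utf-8')

        return jsonify({
            'type': 'audio',
            'session_id': event['session']['id'],
            'audio': base64_audio,
            'barge_in': {
                'strategy': 'minimum_characters',
                'minimum_characters': 3
            }
        })

    # Other event types are not handled in this example. A Flask view must
    # still return a response; an empty 204 is an assumption, not a
    # documented requirement of the platform.
    return '', 204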

Node.js

javascript
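const express = require('express');
const fs = require('fs');

const app = express();
app.use(express.json()); // parse JSON request bodies into req.body

app.post('/webhook', (req, res) => {
  const event = req.body;

  if (event.type === 'user_speak') {
    // Read the WAV file and base64-encode it for the "audio" field
    const audioData = fs.readFileSync('hold-music.wav');
    const base64Audio = audioData.toString('base64');

    return res.json({
      type: 'audio',
      session_id: event.session.id,
      audio: base64Audio,
      barge_in: {
        strategy: 'minimum_characters',
        minimum_characters: 3
      }
    });
  }

  // Other event types are not handled in this example. Responding with an
  // empty 204 (an assumption) keeps the request from hanging.
  res.sendStatus(204);
});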

Go

go
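import (
    "encoding/base64"
    "encoding/json"
    "net/http"
    "os"
)

func webhook(w http.ResponseWriter, r *http.Request) {
    var event map[string]interface{}
    if err := json.NewDecoder(r.Body).Decode(&event); err != nil {
        http.Error(w, "invalid JSON", http.StatusBadRequest)
        return
    }

    if event["type"] == "user_speak" {
        // Read audio file and encode to base64
        // (os.ReadFile replaces the deprecated ioutil.ReadFile, Go 1.16+)
        audioData, err := os.ReadFile("hold-music.wav")
        if err != nil {
            http.Error(w, "audio file unavailable", http.StatusInternalServerError)
            return
        }
        base64Audio := base64.StdEncoding.EncodeToString(audioData)

        // Checked type assertion avoids a panic if "session" is missing
        session, ok := event["session"].(map[string]interface{})
        if !ok {
            http.Error(w, "missing session", http.StatusBadRequest)
            return
        }
        action := map[string]interface{}{
            "type":       "audio",
            "session_id": session["id"],
            "audio":      base64Audio,
            "barge_in": map[string]interface{}{
                "strategy":           "minimum_characters",
                "minimum_characters": 3,
            },
        }
        w.Header().Set("Content-Type", "application/json")
        json.NewEncoder(w).Encode(action)
        return
    }
}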

Converting Audio Files

Using FFmpeg

Convert any audio file to the required format:

bash
ffmpeg -i input.mp3 -ar 16000 -ac 1 -sample_fmt s16 -f wav output.wav

Parameters:

  • -ar 16000 - Set sample rate to 16kHz
  • -ac 1 - Set to mono (1 channel)
  • -sample_fmt s16 - Set to 16-bit PCM
  • -f wav - Output WAV format
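
The same conversion can be run from Python with `subprocess`. This is a sketch that assumes `ffmpeg` is installed and on the PATH; `ffmpeg_command` and `convert_for_audio_action` are hypothetical helper names, and the `-y` flag (overwrite the output without prompting) is an addition to the command above so the call cannot stall on an existing file.

```python
import shutil
import subprocess


def ffmpeg_command(src, dst):
    """Build the argument list for the conversion command shown above."""
    return [
        "ffmpeg",
        "-y",                  # addition: overwrite dst without prompting
        "-i", src,
        "-ar", "16000",        # sample rate: 16 kHz
        "-ac", "1",            # mono
        "-sample_fmt", "s16",  # 16-bit PCM
        "-f", "wav",           # WAV container
        dst,
    ]


def convert_for_audio_action(src, dst):
    """Convert src to the required WAV format; raises if ffmpeg is missing or fails."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    # check=True raises CalledProcessError on a non-zero exit code.
    subprocess.run(ffmpeg_command(src, dst), check=True, capture_output=True)
    return dst
```

Passing the arguments as a list rather than a shell string avoids quoting problems with file names that contain spaces.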

Python Script

python
import base64

def convert_audio_to_base64(audio_file_path):
    with open(audio_file_path, 'rb') as f:
        audio_data = f.read()
        return base64.b64encode(audio_data).decode('utf-8')

# Usage
base64_audio = convert_audio_to_base64('hold-music.wav')

Barge-In Configuration

Control how users can interrupt audio playback:

json
{
  "barge_in": {
    "strategy": "none"
  }
}

See Barge-In Configuration for details.

Use Cases

  • Hold music - Play music while user waits
  • Pre-recorded messages - Play announcements or greetings
  • Sound effects - Play notification sounds
  • Background audio - Ambient sounds during conversation

Best Practices

  1. Keep files small - Audio is sent inline in the JSON response, and base64 adds about a third to the file size, so large files increase latency
  2. Use appropriate format - Ensure WAV, 16 kHz, mono, 16-bit PCM
  3. Test playback - Verify audio quality before production
  4. Configure barge-in - Allow natural interruptions when appropriate
  5. Cache base64 - Encode once, reuse the base64 string
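
Practices 1 and 5 can be sketched in a few lines. The example below caches the encoded string per file path with `functools.lru_cache` and estimates how large a payload will be for a given duration. Both function names are hypothetical, and the size estimate assumes a minimal 44-byte WAV header; files written by some tools carry extra metadata and will be slightly larger.

```python
import base64
from functools import lru_cache


@lru_cache(maxsize=32)
def load_audio_base64(path):
    """Read and base64-encode a file once per path; later calls hit the cache."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("ascii")


def estimated_payload_bytes(duration_seconds):
    """Approximate base64 length for 16 kHz, mono, 16-bit audio."""
    # 16000 samples per second * 2 bytes per sample, plus the assumed header.
    raw = int(duration_seconds * 16000 * 2) + 44
    # base64 emits 4 characters for every 3 input bytes, rounded up.
    return 4 * ((raw + 2) // 3)
```

The cache does not notice when a file changes on disk; call `load_audio_base64.cache_clear()` after replacing an audio file. By this estimate, ten seconds of audio is roughly 427 KB of base64 text.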

Troubleshooting

Audio Not Playing

  • Verify audio format matches requirements exactly
  • Check base64 encoding is correct
  • Ensure audio file is not corrupted
  • Test with a known-good audio file
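
For the last two checks, a known-good file can be generated rather than sourced. The sketch below writes a short sine tone in the required format using only the standard library, and verifies that a base64 string decodes to data beginning with a RIFF/WAVE header. `make_test_tone` and `decodes_to_wav` are hypothetical helper names.

```python
import base64
import io
import math
import struct
import wave


def make_test_tone(seconds=1.0, frequency_hz=440.0, rate_hz=16000):
    """Return WAV bytes containing a sine tone: mono, 16-bit PCM."""
    frame_count = int(seconds * rate_hz)
    # 0.3 of full scale keeps the tone at a comfortable volume.
    pcm = b"".join(
        struct.pack(
            "<h",
            int(0.3 * 32767 * math.sin(2 * math.pi * frequency_hz * i / rate_hz)),
        )
        for i in range(frame_count)
    )
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 2 bytes = 16-bit
        wav.setframerate(rate_hz)
        wav.writeframes(pcm)
    return buffer.getvalue()


def decodes_to_wav(base64_audio):
    """True if the string is strict base64 and decodes to RIFF/WAVE data."""
    try:
        data = base64.b64decode(base64_audio, validate=True)
    except ValueError:  # binascii.Error is a subclass of ValueError
        return False
    return data[:4] == b"RIFF" and data[8:12] == b"WAVE"
```

If the generated tone plays but the original file does not, the problem is most likely in the source file's format rather than in the encoding or transport.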

Audio Quality Issues

  • Ensure sample rate is exactly 16kHz
  • Verify mono channel (not stereo)
  • Check bit depth is 16-bit PCM
  • Re-encode source audio if needed
