Assistant Speak Event

Triggered after the assistant starts speaking. The event may be omitted for some text-to-speech models, so handlers should not rely on receiving it.

Event Structure

json

{
  "type": "assistant_speak",
  "text": "Hello! How can I help you?",
  "ssml": "<speak version=\"1.0\" xml:lang=\"en-US\"><voice name=\"en-US-JennyNeural\">Hello!</voice></speak>",
  "duration_ms": 2000,
  "speech_started_at": 1234567890,
  "session": {
    "id": "550e8400-e29b-41d4-a716-446655440000",
    "account_id": "account-123",
    "phone_number": "+1234567890"
  }
}

Fields

Field	Type	Required	Description
`type`	string	Yes	Always `"assistant_speak"`
`text`	string	No	Text that was spoken
`ssml`	string	No	SSML that was used (if applicable)
`duration_ms`	number	Yes	Duration of speech in milliseconds
`speech_started_at`	number	Yes	Unix timestamp (ms) when speech started
`session.id`	string (UUID)	Yes	Session identifier
`session.account_id`	string	Yes	Account identifier
`session.phone_number`	string	Yes	Phone number for this flow session
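
The table above can be turned into a small typed parser that rejects payloads with missing required fields. This is a minimal sketch, not part of any official SDK: the `AssistantSpeakEvent` class and the `parse_assistant_speak` function are hypothetical names chosen for illustration, and the sketch assumes the payload has already been decoded from JSON into a dict.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class AssistantSpeakEvent:
    """Typed view of an assistant_speak payload (hypothetical helper)."""
    session_id: str
    account_id: str
    phone_number: str
    duration_ms: float
    speech_started_at: float
    text: Optional[str] = None   # optional per the field table
    ssml: Optional[str] = None   # optional per the field table


def parse_assistant_speak(payload: dict) -> AssistantSpeakEvent:
    """Validate a decoded JSON payload and convert it to a typed object."""
    if payload.get("type") != "assistant_speak":
        raise ValueError("not an assistant_speak event")

    session: Any = payload.get("session")
    if not isinstance(session, dict):
        raise ValueError("missing required field: session")

    # Collect every missing required field so the error names all of them.
    missing = [k for k in ("duration_ms", "speech_started_at") if k not in payload]
    missing += [
        f"session.{k}"
        for k in ("id", "account_id", "phone_number")
        if k not in session
    ]
    if missing:
        raise ValueError("missing required fields: " + ", ".join(missing))

    return AssistantSpeakEvent(
        session_id=session["id"],
        account_id=session["account_id"],
        phone_number=session["phone_number"],
        duration_ms=payload["duration_ms"],
        speech_started_at=payload["speech_started_at"],
        text=payload.get("text"),
        ssml=payload.get("ssml"),
    )
```

Reading `text` and `ssml` with `.get()` keeps the parser tolerant of events that omit them, while the required fields fail loudly instead of surfacing later as a `KeyError`.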

Response

You can return any action or 204 No Content. Common uses:

Track metrics - Log conversation analytics
Chain actions - Trigger follow-up actions
No response - Just track the event

Examples

Track Metrics

python

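from flask import Flask, request

app = Flask(__name__)

def track_metrics(metrics):
    # Placeholder: replace with your own analytics sink
    print(metrics)

@app.route('/webhook', methods=['POST'])
def webhook():
    event = request.json

    if event['type'] == 'assistant_speak':
        # 'text' is optional, so fall back to an empty string
        track_metrics({
            'session_id': event['session']['id'],
            'duration_ms': event['duration_ms'],
            'text': event.get('text', '')
        })

    # Acknowledge every event, including types not handled here
    return '', 204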

Chain Actions

python

from flask import Flask, jsonify, request

app = Flask(__name__)

# Pending follow-up action per session (in-memory; lost on restart)
session_state = {}

@app.route('/webhook', methods=['POST'])
def webhook():
    event = request.json
    session_id = event['session']['id']

    if event['type'] == 'user_speak':
        # Remember what to do once the assistant has spoken
        session_state[session_id] = 'play_audio'
        return jsonify({
            'type': 'speak',
            'session_id': session_id,
            'text': 'Please listen to this message.'
        })

    if event['type'] == 'assistant_speak':
        # Run the pending action once, then clear it so it cannot repeat
        if session_state.get(session_id) == 'play_audio':
            del session_state[session_id]
            return jsonify({
                'type': 'audio',
                'session_id': session_id,
                'audio': 'base64-audio-data'
            })

    # Nothing to chain: acknowledge the event
    return '', 204

Use Cases

Analytics - Track conversation metrics
Action chaining - Trigger follow-up actions
Logging - Record what was said
Timing - Measure response times
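
For the timing use case, the two required numeric fields are enough to derive when speech finished and how long the caller waited. This is a rough sketch under assumptions: `speech_ended_at_ms` and `response_latency_ms` are hypothetical helpers, and `user_turn_ended_at_ms` is a timestamp your own code would have to record (for example when handling the preceding user event), since this event does not carry it.

```python
def speech_ended_at_ms(event: dict) -> float:
    """Unix time (ms) at which the assistant's speech is expected to end."""
    return event["speech_started_at"] + event["duration_ms"]


def response_latency_ms(user_turn_ended_at_ms: float, event: dict) -> float:
    """Gap (ms) between the end of the user's turn and the start of speech.

    user_turn_ended_at_ms is supplied by the caller; it is not part of
    the assistant_speak payload.
    """
    return event["speech_started_at"] - user_turn_ended_at_ms
```

Both helpers work in milliseconds throughout, matching the units of `speech_started_at` and `duration_ms`, so no conversion is needed before storing the results.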

Best Practices

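Don't block - Return a response quickly and defer slow work
Track metrics - Use the event as a source for analytics
Chain carefully - Make sure chained actions cannot trigger each other indefinitely
Log interactions - Record the spoken text for debugging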

Next Steps

User Speak Event - Handle user input
Action Types - All available actions
Event Flow - Understand the complete flow
