Assistant Speak Event
Triggered after the assistant starts speaking. This event may be omitted for some text-to-speech models.
Event Structure
json
{
  "type": "assistant_speak",
  "text": "Hello! How can I help you?",
  "ssml": "<speak version=\"1.0\" xml:lang=\"en-US\"><voice name=\"en-US-JennyNeural\">Hello!</voice></speak>",
  "duration_ms": 2000,
  "speech_started_at": 1234567890,
  "session": {
    "id": "550e8400-e29b-41d4-a716-446655440000",
    "account_id": "account-123",
    "phone_number": "+1234567890"
  }
}

Fields
| Field | Type | Required | Description |
|---|---|---|---|
| `type` | string | Yes | Always `"assistant_speak"` |
| `text` | string | No | Text that was spoken |
| `ssml` | string | No | SSML that was used (if applicable) |
| `duration_ms` | number | Yes | Duration of speech in milliseconds |
| `speech_started_at` | number | Yes | Unix timestamp (ms) when speech started |
| `session.id` | string (UUID) | Yes | Session identifier |
| `session.account_id` | string | Yes | Account identifier |
| `session.phone_number` | string | Yes | Phone number for this flow session |
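The table above can be turned into a small typed parser so that a malformed payload fails early. This is a minimal sketch using only the standard library; the `AssistantSpeakEvent` class and `parse_assistant_speak` function are illustrative names, not part of any official SDK.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class AssistantSpeakEvent:
    """Hypothetical typed view of the assistant_speak payload."""
    session_id: str
    account_id: str
    phone_number: str
    duration_ms: float
    speech_started_at: float      # Unix timestamp in milliseconds
    text: Optional[str] = None    # optional per the table
    ssml: Optional[str] = None    # optional per the table


def parse_assistant_speak(payload: Mapping[str, Any]) -> AssistantSpeakEvent:
    """Build an AssistantSpeakEvent, raising ValueError on a malformed payload."""
    if payload.get("type") != "assistant_speak":
        raise ValueError(f"unexpected event type: {payload.get('type')!r}")
    try:
        session = payload["session"]
        return AssistantSpeakEvent(
            session_id=session["id"],
            account_id=session["account_id"],
            phone_number=session["phone_number"],
            duration_ms=payload["duration_ms"],
            speech_started_at=payload["speech_started_at"],
            text=payload.get("text"),
            ssml=payload.get("ssml"),
        )
    except KeyError as exc:
        raise ValueError(f"missing required field: {exc.args[0]}") from exc
```

Required fields are read with indexing so that a missing key surfaces as an error, while `text` and `ssml` use `.get()` because the event may omit them.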
Response
You can return any action or 204 No Content. Common uses:
- Track metrics - Log conversation analytics
- Chain actions - Trigger follow-up actions
- No response - Just track the event
Examples
Track Metrics
python
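from flask import Flask, request

app = Flask(__name__)

def track_metrics(metrics):
    # Placeholder: forward the metrics to your analytics backend.
    print(metrics)

@app.route('/webhook', methods=['POST'])
def webhook():
    event = request.get_json()
    if event['type'] == 'assistant_speak':
        # Track metrics
        track_metrics({
            'session_id': event['session']['id'],
            'duration_ms': event['duration_ms'],
            'text': event.get('text', '')  # text is optional
        })
    # Acknowledge without returning an action
    return '', 204

Chain Actions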
python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Store what to do next, keyed by session ID (in memory only)
session_state = {}

@app.route('/webhook', methods=['POST'])
def webhook():
    event = request.get_json()
    session_id = event['session']['id']

    if event['type'] == 'user_speak':
        # Set next action
        session_state[session_id] = 'play_audio'
        return jsonify({
            'type': 'speak',
            'session_id': session_id,
            'text': 'Please listen to this message.'
        })

    if event['type'] == 'assistant_speak':
        # Execute next action once, then clear it so it does not repeat
        if session_state.get(session_id) == 'play_audio':
            del session_state[session_id]
            return jsonify({
                'type': 'audio',
                'session_id': session_id,
                'audio': 'base64-audio-data'
            })

    return '', 204

Use Cases
- Analytics - Track conversation metrics
- Action chaining - Trigger follow-up actions
- Logging - Record what was said
- Timing - Measure response times
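For the timing use case, one possible approach is to record when your webhook receives a `user_speak` event and subtract that from `speech_started_at` on the next `assistant_speak` event. The sketch below assumes your server clock and the platform's timestamps are reasonably in sync; the `ResponseTimer` class is a hypothetical helper, and only `speech_started_at` and `session.id` come from the event structure documented here.

```python
import time
from typing import Any, Dict, Mapping, Optional


class ResponseTimer:
    """Hypothetical helper: measures the gap between user input and assistant speech."""

    def __init__(self) -> None:
        # session ID -> local receipt time of the last user_speak event (ms)
        self._user_spoke_at_ms: Dict[str, float] = {}

    def on_user_speak(self, session_id: str, now_ms: Optional[float] = None) -> None:
        """Record when the user_speak webhook arrived (local clock, in ms)."""
        if now_ms is None:
            now_ms = time.time() * 1000
        self._user_spoke_at_ms[session_id] = now_ms

    def on_assistant_speak(self, event: Mapping[str, Any]) -> Optional[float]:
        """Return the response time in ms, or None if no user turn was recorded."""
        session_id = event["session"]["id"]
        received_ms = self._user_spoke_at_ms.pop(session_id, None)
        if received_ms is None:
            return None
        return event["speech_started_at"] - received_ms
```

Call `on_user_speak` from the branch that handles `user_speak` and `on_assistant_speak` from the branch that handles `assistant_speak`. The recorded time is removed after one measurement, so each user turn is counted once.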
Best Practices
- Don't block - Process quickly
- Track metrics - Use for analytics
- Chain carefully - Avoid infinite loops
- Log interactions - For debugging
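Two of the practices above, not blocking and avoiding infinite loops, can be sketched with the standard library alone. Slow work is handed to a background thread through a queue, and a per-session counter caps how many follow-up actions are chained before the handler falls back to 204. The cap `MAX_CHAINED_ACTIONS`, the function names, and the in-memory `logged_sessions` list are all assumptions for illustration. A multi-threaded or multi-process server would also need locking or shared storage around the counter.

```python
import queue
import threading
from collections import defaultdict

MAX_CHAINED_ACTIONS = 3  # assumed cap; pick a value that suits your flow

pending_events = queue.Queue()   # events waiting for slow processing
chain_depth = defaultdict(int)   # follow-up actions sent per session
logged_sessions = []             # stand-in for a real log or metrics sink


def _drain_events():
    """Background worker: do the slow work off the request path."""
    while True:
        event = pending_events.get()
        try:
            logged_sessions.append(event["session"]["id"])
        finally:
            pending_events.task_done()


threading.Thread(target=_drain_events, daemon=True).start()


def on_user_speak(session_id):
    """A new user turn starts a fresh chain."""
    chain_depth.pop(session_id, None)


def on_assistant_speak(event):
    """Queue the event and report whether another follow-up action is allowed."""
    pending_events.put(event)  # returns immediately; the worker handles it later
    session_id = event["session"]["id"]
    if chain_depth[session_id] >= MAX_CHAINED_ACTIONS:
        return False  # respond with 204 instead of chaining again
    chain_depth[session_id] += 1
    return True
```

In a webhook handler, return the follow-up action only when `on_assistant_speak` returns `True`, and return `'', 204` otherwise.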
Next Steps
- User Speak Event - Handle user input
- Action Types - All available actions
- Event Flow - Understand the complete flow