
Assistant Speak Event

Triggered after the assistant starts speaking. This event may be omitted for some text-to-speech models, so your webhook should not depend on receiving it for every utterance.

Event Structure

```json
{
  "type": "assistant_speak",
  "text": "Hello! How can I help you?",
  "ssml": "<speak version=\"1.0\" xml:lang=\"en-US\"><voice name=\"en-US-JennyNeural\">Hello! How can I help you?</voice></speak>",
  "duration_ms": 2000,
  "speech_started_at": 1234567890000,
  "session": {
    "id": "550e8400-e29b-41d4-a716-446655440000",
    "account_id": "account-123",
    "phone_number": "+1234567890"
  }
}
```

Fields

| Field | Type | Required | Description |
|---|---|---|---|
| `type` | string | Yes | Always `"assistant_speak"` |
| `text` | string | No | Text that was spoken |
| `ssml` | string | No | SSML that was used, if any |
| `duration_ms` | number | Yes | Duration of speech, in milliseconds |
| `speech_started_at` | number | Yes | Unix timestamp, in milliseconds, when speech started |
| `session.id` | string (UUID) | Yes | Session identifier |
| `session.account_id` | string | Yes | Account identifier |
| `session.phone_number` | string | Yes | Phone number for this flow session |
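
The field table translates directly into a validation step. The sketch below checks only the fields listed as required and builds a typed object; `AssistantSpeakEvent` and `parse_assistant_speak` are hypothetical names for illustration, not part of the platform's API.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class AssistantSpeakEvent:
    """Typed view of an assistant_speak payload (hypothetical helper type)."""
    session_id: str
    account_id: str
    phone_number: str
    duration_ms: int
    speech_started_at: int  # Unix timestamp in milliseconds
    text: Optional[str] = None  # optional per the field table
    ssml: Optional[str] = None  # optional per the field table


def parse_assistant_speak(payload: Mapping[str, Any]) -> AssistantSpeakEvent:
    """Check the required fields and build a typed event, or raise ValueError."""
    if payload.get("type") != "assistant_speak":
        raise ValueError("not an assistant_speak event")
    session = payload.get("session")
    if not isinstance(session, dict):
        raise ValueError("missing required field: session")
    for key in ("duration_ms", "speech_started_at"):
        if key not in payload:
            raise ValueError(f"missing required field: {key}")
    for key in ("id", "account_id", "phone_number"):
        if key not in session:
            raise ValueError(f"missing required field: session.{key}")
    return AssistantSpeakEvent(
        session_id=session["id"],
        account_id=session["account_id"],
        phone_number=session["phone_number"],
        duration_ms=int(payload["duration_ms"]),
        speech_started_at=int(payload["speech_started_at"]),
        text=payload.get("text"),
        ssml=payload.get("ssml"),
    )
```

Failing fast with `ValueError` keeps malformed payloads out of downstream analytics; a handler could catch it and still return 204 so the call is not affected.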

Response

Your webhook can respond with any action, or with 204 No Content when there is nothing to do. Common uses:

  • Track metrics - Log conversation analytics
  • Chain actions - Trigger follow-up actions
  • No response - Record the event and return 204 No Content

Examples

Track Metrics

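```python
from flask import Flask, request

app = Flask(__name__)

@app.route('/webhook', methods=['POST'])
def webhook():
    event = request.get_json()

    if event['type'] == 'assistant_speak':
        # Record metrics; track_metrics is your own function, not part of the platform
        track_metrics({
            'session_id': event['session']['id'],
            'duration_ms': event['duration_ms'],
            # text is optional, so fall back to an empty string
            'text': event.get('text', '')
        })

    # Acknowledge every event; no action is needed in response
    return '', 204
```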
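
The example calls `track_metrics` without defining it. One possible shape for it is sketched below, assuming an in-memory store is acceptable (single process, totals lost on restart). The body of `track_metrics` and the `session_summary` helper are hypothetical.

```python
import threading
from collections import defaultdict
from typing import Any, Dict

# Hypothetical in-memory store: session_id -> running totals for that session.
# Totals are lost on restart; a real deployment would use a metrics backend.
_totals: Dict[str, Dict[str, int]] = defaultdict(
    lambda: {"utterances": 0, "speech_ms": 0, "characters": 0}
)
_lock = threading.Lock()  # Flask may handle requests on several threads


def track_metrics(record: Dict[str, Any]) -> None:
    """Add one assistant_speak record to its session's totals."""
    with _lock:
        totals = _totals[record["session_id"]]
        totals["utterances"] += 1
        totals["speech_ms"] += int(record["duration_ms"])
        # text is optional in the event, so the record may carry "".
        totals["characters"] += len(record.get("text", ""))


def session_summary(session_id: str) -> Dict[str, int]:
    """Return a copy of one session's totals; zeros if the session is unknown."""
    with _lock:
        if session_id not in _totals:
            return {"utterances": 0, "speech_ms": 0, "characters": 0}
        return dict(_totals[session_id])
```

Returning a copy from `session_summary` keeps callers from mutating the shared totals outside the lock.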

Chain Actions

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

# Pending follow-up action per session (in-memory: single process only)
session_state = {}

@app.route('/webhook', methods=['POST'])
def webhook():
    event = request.get_json()
    session_id = event['session']['id']

    if event['type'] == 'user_speak':
        # Reply now, and remember to play audio once the assistant has spoken
        session_state[session_id] = 'play_audio'
        return jsonify({
            'type': 'speak',
            'session_id': session_id,
            'text': 'Please listen to this message.'
        })

    if event['type'] == 'assistant_speak':
        # Run the pending action, clearing it so it fires only once
        if session_state.pop(session_id, None) == 'play_audio':
            return jsonify({
                'type': 'audio',
                'session_id': session_id,
                'audio': 'base64-audio-data'  # placeholder for base64-encoded audio
            })

    # No follow-up action: acknowledge the event
    return '', 204
```
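
If a chained action makes the assistant speak again, it can lead to another `assistant_speak` event, and a handler that chains on every such event can loop. A minimal guard, assuming a fixed per-session cap is acceptable, counts chained actions per session. `ChainGuard` and its `max_chained` default are hypothetical choices for illustration.

```python
import threading
from collections import Counter


class ChainGuard:
    """Caps how many follow-up actions one session may chain (hypothetical helper)."""

    def __init__(self, max_chained: int = 3) -> None:
        self.max_chained = max_chained
        self._counts: Counter = Counter()
        # Flask may serve requests on several threads, so guard the counter.
        self._lock = threading.Lock()

    def allow(self, session_id: str) -> bool:
        """Record one chained action; return False once the cap is reached."""
        with self._lock:
            if self._counts[session_id] >= self.max_chained:
                return False
            self._counts[session_id] += 1
            return True

    def reset(self, session_id: str) -> None:
        """Clear the count, e.g. when the user speaks again."""
        with self._lock:
            self._counts.pop(session_id, None)
```

In the Chain Actions handler, you would check `guard.allow(session_id)` before returning the `audio` action (responding with 204 when it returns False) and call `guard.reset(session_id)` when a `user_speak` event arrives.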

Use Cases

  • Analytics - Track conversation metrics
  • Action chaining - Trigger follow-up actions
  • Logging - Record what was said
  • Timing - Measure response times
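
For the timing use case, `speech_started_at` (Unix milliseconds) plus `duration_ms` gives an estimated end of speech, and `text` with `duration_ms` gives a rough speaking rate. The helpers below are hypothetical; the end time assumes `duration_ms` covers the whole utterance, and counting words by whitespace is only an approximation.

```python
from datetime import datetime, timezone
from typing import Any, Mapping, Optional, Tuple


def _ms_to_utc(ms: int) -> datetime:
    """Convert a Unix timestamp in milliseconds to an aware UTC datetime."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


def speech_window(event: Mapping[str, Any]) -> Tuple[datetime, datetime]:
    """Return (start, estimated end) of the utterance as UTC datetimes."""
    start_ms = event["speech_started_at"]
    # Assumes duration_ms spans the whole utterance.
    end_ms = start_ms + event["duration_ms"]
    return _ms_to_utc(start_ms), _ms_to_utc(end_ms)


def words_per_minute(event: Mapping[str, Any]) -> Optional[float]:
    """Rough speaking rate from whitespace-separated words; None if not computable."""
    text = event.get("text")
    duration_ms = event["duration_ms"]
    if not text or duration_ms <= 0:
        return None
    return len(text.split()) * 60_000 / duration_ms
```

With the example payload (6 words over 2000 ms), `words_per_minute` returns 180.0.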

Best Practices

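  1. Don't block - Respond quickly and move slow work out of the request
  2. Track metrics - Use the event for analytics
  3. Chain carefully - Avoid infinite loops: an action that makes the assistant speak can trigger another assistant_speak event
  4. Log interactions - Keep a record of what was said for debugging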

Next Steps