
VAD (Voice Activity Detection) Configuration

An advanced setting that controls how long the system waits in silence before treating the caller's turn as finished. It is useful for call flows where the caller is expected to pause (thinking aloud, listing items, spelling things out), or where you want snappier turn-taking.

Optional advanced setting

The default behaviour is tuned for typical conversations. Only set vad when you have a concrete use case where the default end-of-turn timing is too eager (cutting callers off) or too patient (leaving long gaps). If vad has never been set in the session, the system default applies.

Where to set it

VAD config is accepted in two places:

  • Per speak action: applies starting with the caller's reply to that speak, and persists for later turns until overridden by another speak.vad or by configure_transcription.vad.
  • On configure_transcription: sets the value for the rest of the session, until overridden by a later speak.vad or configure_transcription.vad.
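The persistence rule can be modelled on the client side. The minimal sketch below assumes that speak.vad and configure_transcription.vad both write to a single session-wide setting and that the most recent write wins. `VadTracker` and `record` are hypothetical names used only for illustration and are not part of the API. `None` stands for "system default", and the sketch assumes values have already been validated.

```python
from typing import Any, Optional


class VadTracker:
    """Hypothetical client-side mirror of the effective end-of-turn silence.

    Assumes speak.vad and configure_transcription.vad write to the same
    session-wide setting and that the most recent one wins.
    """

    def __init__(self) -> None:
        # None means no override has been sent yet: the system default applies.
        self.end_of_turn_silence_ms: Optional[int] = None

    def record(self, action: dict[str, Any]) -> None:
        """Update the tracked value from an outgoing action, if it carries vad."""
        if action.get("type") not in ("speak", "configure_transcription"):
            return
        vad = action.get("vad")
        # Actions without a vad object leave the current value in place.
        if isinstance(vad, dict) and "end_of_turn_silence_ms" in vad:
            self.end_of_turn_silence_ms = vad["end_of_turn_silence_ms"]
```

For example, after a configure_transcription with 1000 followed by a speak with 1500, a later speak without vad still runs at 1500, not 1000.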

Schema

```json
{
  "vad": {
    "end_of_turn_silence_ms": 1200
  }
}
```

| Field | Type | Recommended range | Description |
|---|---|---|---|
| end_of_turn_silence_ms | number (integer, ms) | 150–2000 | Milliseconds of silence after the caller stops speaking before their turn is considered finished. |

Lower values yield faster turn-taking; higher values tolerate longer pauses.
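A small helper can build a speak action and attach the vad object only when a value is supplied. This keeps vad out of the payload unless you have a concrete reason to tune it. This is a sketch: `build_speak_action` is a hypothetical helper, and the transport used to send the resulting JSON is outside the scope of this page.

```python
import json
from typing import Any, Optional


def build_speak_action(
    session_id: str,
    text: str,
    end_of_turn_silence_ms: Optional[int] = None,
) -> dict[str, Any]:
    """Build a speak action; include vad only when a value is supplied."""
    action: dict[str, Any] = {
        "type": "speak",
        "session_id": session_id,
        "text": text,
    }
    if end_of_turn_silence_ms is not None:
        action["vad"] = {"end_of_turn_silence_ms": end_of_turn_silence_ms}
    return action


# Serialise for sending over whatever transport your integration uses.
payload = json.dumps(
    build_speak_action(
        "550e8400-e29b-41d4-a716-446655440000",
        "Please spell your last name, letter by letter.",
        end_of_turn_silence_ms=1500,
    )
)
```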

Lenient validation

If you send an out-of-range, non-integer, or otherwise invalid value, the value is silently ignored — the system default takes over and the rest of your action is processed normally. This avoids breaking call flows over a typo.
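Invalid values are dropped without an error, so a typo can leave you on the default without any sign of it. A client-side pre-check can catch this before sending. The sketch below assumes the accepted values are integers from 150 to 2000 inclusive, taken from the recommended range in the schema table. The bounds the server actually enforces may differ. `is_valid_end_of_turn_silence_ms` is a hypothetical name.

```python
from typing import Any

# Assumed bounds, copied from the recommended range in the schema table.
MIN_SILENCE_MS = 150
MAX_SILENCE_MS = 2000


def is_valid_end_of_turn_silence_ms(value: Any) -> bool:
    """Return True if value looks acceptable under the assumed rules."""
    # bool is a subclass of int in Python, so reject it explicitly.
    # Floats (even 1200.0) are rejected here to stay conservative.
    if isinstance(value, bool) or not isinstance(value, int):
        return False
    return MIN_SILENCE_MS <= value <= MAX_SILENCE_MS
```

Raising or logging when this returns False usually serves you better than relying on the silent fallback, which hides the mistake.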

Example: tolerate long pauses (e.g. spelling)

```json
{
  "type": "speak",
  "session_id": "550e8400-e29b-41d4-a716-446655440000",
  "text": "Please spell your last name, letter by letter.",
  "vad": {
    "end_of_turn_silence_ms": 1500
  }
}
```

Example: snappy back-and-forth (e.g. yes/no questions)

```json
{
  "type": "speak",
  "session_id": "550e8400-e29b-41d4-a716-446655440000",
  "text": "Did you mean account number 1234?",
  "vad": {
    "end_of_turn_silence_ms": 250
  }
}
```

Example: set once for the whole session

```json
{
  "type": "configure_transcription",
  "session_id": "550e8400-e29b-41d4-a716-446655440000",
  "vad": {
    "end_of_turn_silence_ms": 1000
  }
}
```
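The three examples above can be combined into one pattern: set a session-wide baseline once, then override it on prompts that expect an unusually short or long reply. A speak.vad override persists, so ordinary prompts that follow an override need to resend the baseline, or they inherit the previous value. The sketch below does this by always stating a value. The preset names and the `configure_session` and `speak` helpers are illustrative assumptions, not part of the API, and the numbers simply mirror the examples.

```python
from typing import Any

SESSION_ID = "550e8400-e29b-41d4-a716-446655440000"

# Illustrative presets mirroring the example values above; tune for your flow.
SILENCE_PRESETS_MS = {
    "default": 1000,  # session-wide baseline
    "spelling": 1500,  # tolerate pauses between letters
    "yes_no": 250,  # snappy confirmation
}


def configure_session(session_id: str) -> dict[str, Any]:
    """Session-wide baseline, sent once via configure_transcription."""
    return {
        "type": "configure_transcription",
        "session_id": session_id,
        "vad": {"end_of_turn_silence_ms": SILENCE_PRESETS_MS["default"]},
    }


def speak(session_id: str, text: str, kind: str = "default") -> dict[str, Any]:
    """Speak action that always states its vad value.

    Because a speak.vad override persists, ordinary prompts resend the
    baseline to undo any earlier spelling or yes/no override.
    """
    return {
        "type": "speak",
        "session_id": session_id,
        "text": text,
        "vad": {"end_of_turn_silence_ms": SILENCE_PRESETS_MS[kind]},
    }


flow = [
    configure_session(SESSION_ID),
    speak(SESSION_ID, "Please spell your last name, letter by letter.", "spelling"),
    speak(SESSION_ID, "Did you mean account number 1234?", "yes_no"),
    speak(SESSION_ID, "How can I help you today?"),
]
```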

Notes

  • The setting takes effect immediately. Because the assistant's speech plays before the caller can reply, any internal reconfiguration completes before the system starts listening for that reply.
  • VAD tuning and barge-in are related but distinct: vad governs when the caller's turn is considered finished, while barge_in governs whether and how the caller may interrupt the assistant while it is speaking. Both can be set on the same speak action.