VAD (Voice Activity Detection) Configuration
Advanced setting that controls how long the system waits in silence before treating the caller's turn as finished. It is useful for call flows where the caller is expected to pause (thinking aloud, listing items, spelling something out) or where you want snappier turn-taking.
Optional advanced setting
The default behaviour is tuned for typical conversations. Only set `vad` when you have a concrete use case where the system's default end-of-turn timing is too eager or too patient. When `vad` is omitted, the system default applies.
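One way a client might follow this advice is to attach `vad` only when a turn has a concrete reason to override the default, and to leave the field out otherwise. The Python sketch below assumes you assemble actions as plain dictionaries before serializing them; `build_speak_action` is a hypothetical helper, not part of this API, and the field names are taken from the `speak` examples on this page.

```python
import json
from typing import Optional


def build_speak_action(session_id: str, text: str,
                       end_of_turn_silence_ms: Optional[int] = None) -> dict:
    """Hypothetical helper: build a `speak` action, omitting `vad` by default."""
    action = {"type": "speak", "session_id": session_id, "text": text}
    if end_of_turn_silence_ms is not None:
        # Only override the system default when this turn genuinely needs it.
        action["vad"] = {"end_of_turn_silence_ms": end_of_turn_silence_ms}
    return action


session = "550e8400-e29b-41d4-a716-446655440000"
default_turn = build_speak_action(session, "How can I help you today?")
spelling_turn = build_speak_action(
    session, "Please spell your last name, letter by letter.",
    end_of_turn_silence_ms=1500)
print(json.dumps(default_turn))   # no "vad" key: the system default applies
print(json.dumps(spelling_turn))  # carries the 1500 ms override
```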
Where to set it
VAD config is accepted in two places:
- Per `speak` action: applies to the caller's reply that follows. The setting persists until overridden by another `speak.vad` or by `configure_transcription.vad`.
- On `configure_transcription`: sets the value for the rest of the session (until overridden again).
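To make the persistence rule concrete, here is a toy Python model of which value applies to the caller's next reply. `VadState` is hypothetical client-side bookkeeping, not something the system exposes. It assumes, as the rule implies, that a `speak` without `vad` leaves the current value unchanged, and for brevity it does not model validation.

```python
from typing import Optional


class VadState:
    """Hypothetical tracker for the end-of-turn silence currently in force."""

    def __init__(self) -> None:
        # None stands for "no override sent yet": the system default applies.
        self.current_ms: Optional[int] = None

    def on_action(self, action: dict) -> None:
        """Update the tracked value after sending `action`."""
        if action.get("type") not in ("speak", "configure_transcription"):
            return
        vad = action.get("vad") or {}
        ms = vad.get("end_of_turn_silence_ms")
        if ms is not None:
            # A value from either source replaces the previous one, whatever its source.
            self.current_ms = ms

    def effective(self) -> str:
        return "system default" if self.current_ms is None else f"{self.current_ms} ms"


state = VadState()
print(state.effective())  # system default
state.on_action({"type": "configure_transcription",
                 "vad": {"end_of_turn_silence_ms": 1000}})
state.on_action({"type": "speak", "text": "What is your postcode?"})
print(state.effective())  # 1000 ms (a speak without vad changes nothing)
state.on_action({"type": "speak", "text": "Please spell your last name.",
                 "vad": {"end_of_turn_silence_ms": 1500}})
print(state.effective())  # 1500 ms
```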
Schema
```json
{
  "vad": {
    "end_of_turn_silence_ms": 1200
  }
}
```
| Field | Type | Recommended range | Description |
|---|---|---|---|
| `end_of_turn_silence_ms` | integer | 150–2000 ms | Milliseconds of silence after the caller stops speaking before their turn is considered finished. |

Lower values yield faster turn-taking; higher values tolerate longer pauses.
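The trade-off can be illustrated with a simplified Python model of what the parameter controls. This is not the system's actual detector: it assumes audio arrives as 20 ms frames already labelled speech or silence (both assumptions for illustration) and ends the turn once accumulated silence reaches `end_of_turn_silence_ms`. Replaying the same spelled-out answer with 250 ms and 1500 ms shows why spelling prompts need the higher value.

```python
from typing import List, Optional

FRAME_MS = 20  # assumed frame duration for this illustration


def end_of_turn_index(frames: List[bool], end_of_turn_silence_ms: int,
                      frame_ms: int = FRAME_MS) -> Optional[int]:
    """Return the frame index at which the turn is considered finished.

    `frames` holds one speech flag per frame (True = speech detected).
    The turn ends once the caller has spoken and then stayed silent for
    at least `end_of_turn_silence_ms`. Returns None if that never happens.
    """
    heard_speech = False
    silence_ms = 0
    for i, is_speech in enumerate(frames):
        if is_speech:
            heard_speech = True
            silence_ms = 0  # any speech resets the silence timer
        elif heard_speech:
            silence_ms += frame_ms
            if silence_ms >= end_of_turn_silence_ms:
                return i
    return None


# A caller says one letter (200 ms), pauses 600 ms, says another, then falls silent.
frames = [True] * 10 + [False] * 30 + [True] * 10 + [False] * 100
print(end_of_turn_index(frames, 250))   # 22: turn ends during the first pause
print(end_of_turn_index(frames, 1500))  # 124: turn ends after the second letter
```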
Lenient validation
If you send an out-of-range, non-integer, or otherwise invalid value, the value is silently ignored: the system default takes over and the rest of your action is processed normally. This keeps a typo from breaking a call flow.
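Because invalid values are dropped silently, a client may want to catch them before sending. The Python sketch below mirrors the rule locally and warns instead of failing. It rests on two assumptions: that "out of range" means outside the 150–2000 ms range listed in the schema table, and that floats such as `1200.0` count as non-integers. `checked_vad` is a hypothetical helper.

```python
import warnings
from typing import Any, Optional

# Assumption: the server's bounds match the recommended range in the schema table.
MIN_SILENCE_MS = 150
MAX_SILENCE_MS = 2000


def checked_vad(end_of_turn_silence_ms: Any) -> Optional[dict]:
    """Return a `vad` object, or warn and return None if the value looks invalid.

    Returning None means "omit vad", which yields the same result the
    server's lenient validation would: the system default applies.
    """
    value = end_of_turn_silence_ms
    # bool is a subclass of int in Python, so exclude it explicitly;
    # floats (even 1200.0) are conservatively treated as non-integers.
    is_int = isinstance(value, int) and not isinstance(value, bool)
    if not is_int or not MIN_SILENCE_MS <= value <= MAX_SILENCE_MS:
        warnings.warn(
            f"end_of_turn_silence_ms={value!r} would likely be ignored; "
            "the system default will apply",
            stacklevel=2,
        )
        return None
    return {"end_of_turn_silence_ms": value}


action = {
    "type": "speak",
    "session_id": "550e8400-e29b-41d4-a716-446655440000",
    "text": "Did you mean account number 1234?",
}
vad = checked_vad(250)
if vad is not None:
    action["vad"] = vad
```

Omitting `vad` on a bad value matches what the server would do anyway; the only difference is that the mistake surfaces locally as a warning.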
Example: tolerate long pauses (e.g. spelling)
```json
{
  "type": "speak",
  "session_id": "550e8400-e29b-41d4-a716-446655440000",
  "text": "Please spell your last name, letter by letter.",
  "vad": {
    "end_of_turn_silence_ms": 1500
  }
}
```
Example: snappy back-and-forth (e.g. yes/no questions)
```json
{
  "type": "speak",
  "session_id": "550e8400-e29b-41d4-a716-446655440000",
  "text": "Did you mean account number 1234?",
  "vad": {
    "end_of_turn_silence_ms": 250
  }
}
```
Example: set once for the whole session
```json
{
  "type": "configure_transcription",
  "session_id": "550e8400-e29b-41d4-a716-446655440000",
  "vad": {
    "end_of_turn_silence_ms": 1000
  }
}
```
Notes
- The setting takes effect immediately: the assistant's speech plays before the caller can reply, so any internal reconfiguration completes before the system needs to listen again.
- VAD tuning and barge-in are related but distinct: `vad` governs when the caller's turn is considered finished, while `barge_in` governs whether and how the caller may interrupt the assistant while it is speaking. Both can be set on the same `speak` action.