
Barge-In Configuration

Control how users can interrupt the assistant while it is speaking.

Overview

Barge-in allows users to interrupt the assistant's speech. Configure it per action using the barge_in field.

Configuration

json
{
  "type": "speak",
  "session_id": "session-123",
  "text": "Hello!",
  "barge_in": {
    "strategy": "minimum_characters",
    "minimum_characters": 3,
    "allow_after_ms": 500
  }
}
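In client code, an action like the one above could be assembled and serialized as follows. This is only a sketch: the helper name build_speak_action is hypothetical, and how the resulting JSON is sent to the service is not covered here.

```python
import json


def build_speak_action(session_id: str, text: str, barge_in: dict) -> str:
    """Serialize a speak action with a per-action barge_in block (hypothetical helper)."""
    action = {
        "type": "speak",
        "session_id": session_id,
        "text": text,
        "barge_in": barge_in,
    }
    return json.dumps(action, indent=2)


if __name__ == "__main__":
    print(build_speak_action(
        "session-123",
        "Hello!",
        {"strategy": "minimum_characters", "minimum_characters": 3, "allow_after_ms": 500},
    ))
```

Because barge_in is set per action, each call can pass a different strategy without touching any session-wide state.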

Strategies

none

Disables barge-in completely. Audio plays fully without interruption.

json
{
  "barge_in": {
    "strategy": "none"
  }
}

Use cases:

  • Critical information
  • Legal disclaimers
  • Emergency instructions

manual

Allows manual barge-in via API only (no automatic detection).

json
{
  "barge_in": {
    "strategy": "manual"
  }
}

Use cases:

  • Custom interruption logic
  • Button-triggered interruption
  • External event-based interruption

minimum_characters

Automatically triggers barge-in once the user's recognized speech reaches the configured character threshold.

json
{
  "barge_in": {
    "strategy": "minimum_characters",
    "minimum_characters": 5,
    "allow_after_ms": 500
  }
}

Use cases:

  • Natural conversation flow
  • Customer service scenarios
  • Interactive voice menus
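To make the minimum_characters threshold concrete, here is a minimal Python sketch of how such a decision could work. It is an illustration under stated assumptions, not the service's implementation: the function name should_barge_in is hypothetical, and both counting only non-whitespace characters of the partial transcript and treating the threshold as "at least N characters" are assumptions.

```python
def should_barge_in(partial_transcript: str,
                    elapsed_ms: int,
                    minimum_characters: int = 3,
                    allow_after_ms: int = 0) -> bool:
    """Hypothetical check: has the user said enough to interrupt playback?"""
    # Still inside the protection period: never interrupt.
    if elapsed_ms < allow_after_ms:
        return False
    # Assumption: whitespace is ignored when counting recognized characters.
    recognized = sum(1 for ch in partial_transcript if not ch.isspace())
    return recognized >= minimum_characters


if __name__ == "__main__":
    # Too short, long enough, and long enough but still protected.
    print(should_barge_in("hi", elapsed_ms=800, minimum_characters=5, allow_after_ms=500))
    print(should_barge_in("wait a sec", elapsed_ms=800, minimum_characters=5, allow_after_ms=500))
    print(should_barge_in("wait a sec", elapsed_ms=200, minimum_characters=5, allow_after_ms=500))
```

A higher minimum_characters makes short fillers such as "hi" or "ok" less likely to cut the assistant off, at the cost of a later interruption.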

immediate ⚡ NEW

The most responsive option: interrupts as soon as the user starts speaking, using Voice Activity Detection (VAD) where the speech provider supports it.

json
{
  "barge_in": {
    "strategy": "immediate",
    "allow_after_ms": 500
  }
}

How it works:

  • Azure/Deepgram: Uses VAD (Voice Activity Detection), which triggers before any text is recognized
  • ElevenLabs: Uses the first partial transcript
  • Latency: 20-100ms (2-4x faster than minimum_characters)
  • No text required (Azure/Deepgram): Interrupts on voice detection, not transcription
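The provider-specific behavior listed above can be sketched as a small lookup. The event names vad_speech_start and partial_transcript, the lowercase provider keys, and the function should_interrupt are all hypothetical labels chosen for illustration; the service may name and deliver these signals differently.

```python
# Hypothetical event names; the real service may label these differently.
VAD_SPEECH_START = "vad_speech_start"
PARTIAL_TRANSCRIPT = "partial_transcript"

# Trigger source per speech provider, following the list above.
IMMEDIATE_TRIGGER = {
    "azure": VAD_SPEECH_START,
    "deepgram": VAD_SPEECH_START,
    "elevenlabs": PARTIAL_TRANSCRIPT,
}


def should_interrupt(provider: str, event: str, elapsed_ms: int,
                     allow_after_ms: int = 0) -> bool:
    """Sketch of the immediate strategy: fire on the provider's first speech signal."""
    if elapsed_ms < allow_after_ms:
        return False  # protection period still active
    # Unknown providers never match, so they never interrupt in this sketch.
    return IMMEDIATE_TRIGGER.get(provider.lower()) == event
```

In this sketch the only gate besides the trigger event is allow_after_ms, which is why a protection period matters more for immediate than for text-based strategies.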

Use cases:

  • High-priority conversations requiring instant responsiveness
  • Natural dialogue where interruptions should feel seamless
  • Customer service where quick response matters
  • Urgent or time-sensitive interactions

Best practices:

  • Use allow_after_ms: 500-1000 to prevent accidental interruptions at the start of playback
  • Test with real users to find the optimal allow_after_ms value
  • Consider network latency in production environments

Comparison with minimum_characters:

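| Feature         | immediate            | minimum_characters               |
|-----------------|----------------------|----------------------------------|
| Trigger         | Voice Activity (VAD) | Text recognition (3+ characters) |
| Latency         | 20-100ms             | 50-200ms                         |
| User Experience | Instant interruption | Slight delay                     |
| Accuracy        | May trigger on noise | More reliable (text-based)       |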

Configuration Options

minimum_characters

Minimum number of recognized characters of user speech required before barge-in triggers. Applies to the minimum_characters strategy.

  • Default: 3
  • Range: 1 to 100
  • Higher values: Require more speech before interruption

allow_after_ms

Delay in milliseconds before barge-in is allowed (protection period).

  • Default: 0 (no protection period; barge-in is allowed as soon as playback starts)
  • Range: 0 to 10000 (10 seconds)
  • Use: Prevent interruption during critical information
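The defaults and ranges above lend themselves to a client-side check before an action is sent. The following is a sketch only: normalize_barge_in is a hypothetical helper, and whether the service validates the same way, or treats allow_after_ms as meaningful for none and manual, is not stated in this document.

```python
STRATEGIES = {"none", "manual", "minimum_characters", "immediate"}


def normalize_barge_in(config: dict) -> dict:
    """Fill in documented defaults and reject out-of-range values (client-side sketch)."""
    strategy = config.get("strategy")
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy!r}")

    allow_after_ms = config.get("allow_after_ms", 0)  # documented default
    if not 0 <= allow_after_ms <= 10000:
        raise ValueError("allow_after_ms must be between 0 and 10000")

    result = {"strategy": strategy, "allow_after_ms": allow_after_ms}
    if strategy == "minimum_characters":
        minimum = config.get("minimum_characters", 3)  # documented default
        if not 1 <= minimum <= 100:
            raise ValueError("minimum_characters must be between 1 and 100")
        result["minimum_characters"] = minimum
    return result
```

Making the defaults explicit in the outgoing payload also makes logs easier to read when tuning allow_after_ms.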

Examples

Natural Conversation

json
{
  "type": "speak",
  "session_id": "session-123",
  "text": "I can help you with billing, support, or sales.",
  "barge_in": {
    "strategy": "minimum_characters",
    "minimum_characters": 3
  }
}

Critical Information

json
{
  "type": "speak",
  "session_id": "session-123",
  "text": "Your verification code is 1-2-3-4-5-6.",
  "barge_in": {
    "strategy": "none"
  }
}

Protected Announcement

json
{
  "type": "speak",
  "session_id": "session-123",
  "text": "Your account number is 1234567890.",
  "barge_in": {
    "strategy": "minimum_characters",
    "minimum_characters": 10,
    "allow_after_ms": 2000
  }
}

Instant Response (Immediate) ⚡

json
{
  "type": "speak",
  "session_id": "session-123",
  "text": "I can help you with your order, account, or technical support. What would you like to know?",
  "barge_in": {
    "strategy": "immediate",
    "allow_after_ms": 500
  }
}

Result: Once the 500 ms protection period has elapsed, the assistant stops speaking within roughly 20-100ms of the user starting to talk, which gives the most natural conversation experience.

Best Practices

  1. Use none sparingly - Only for truly critical information
  2. Choose the right strategy:
    • immediate - For most natural, responsive conversations
    • minimum_characters - For balance between responsiveness and reliability
    • manual - For custom logic
    • none - For critical announcements only
  3. Set protection periods - Use allow_after_ms: 500-1000 to avoid cutting off an important introduction
  4. Test with users - Find the right balance for your use case
  5. Consider noise - immediate may trigger on background noise; use allow_after_ms as a buffer at the start of playback, or prefer minimum_characters in noisy environments
