Barge-In Configuration

Control how users can interrupt the assistant while it is speaking.

Overview

Barge-in allows users to interrupt the assistant's speech. You can configure barge-in behavior for each speak or audio action.

Configuration

typescript
interface BargeInConfig {
  strategy: "none" | "manual" | "minimum_characters" | "immediate";
  minimum_characters?: number; // Default: 3 (only for minimum_characters)
  allow_after_ms?: number;     // Delay before allowing interruption
}

Strategies

none

Disables barge-in completely. Audio plays fully without interruption.

typescript
barge_in: {
  strategy: "none"
}

Use cases:

  • Critical information that must be heard
  • Legal disclaimers
  • Emergency instructions

Example:

typescript
return {
  type: "speak",
  session_id: event.session.id,
  text: "This is important information. Please listen carefully.",
  barge_in: {
    strategy: "none",
  },
};

manual

Allows barge-in only when it is triggered through the API. Speech is never detected automatically.

typescript
barge_in: {
  strategy: "manual"
}

Use cases:

  • Custom interruption logic
  • Button-triggered interruption
  • External event-based interruption

Example:

typescript
return {
  type: "speak",
  session_id: event.session.id,
  text: "Press a button to interrupt.",
  barge_in: {
    strategy: "manual",
  },
};

minimum_characters

Automatically triggers barge-in when the user's recognized speech exceeds the configured character threshold.

typescript
barge_in: {
  strategy: "minimum_characters",
  minimum_characters: 5,      // Trigger after 5 characters
  allow_after_ms: 500          // Wait 500ms before allowing interruption
}

Use cases:

  • Natural conversation flow
  • Customer service scenarios
  • Interactive voice menus

Example:

typescript
return {
  type: "speak",
  session_id: event.session.id,
  text: "How can I help you today?",
  barge_in: {
    strategy: "minimum_characters",
    minimum_characters: 3,
  },
};
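
The threshold check described above can be sketched as follows. This is an illustration only: the platform's actual detection logic is not shown in this document, and the names `MinimumCharactersConfig` and `reachesCharacterThreshold` are hypothetical. The sketch assumes the count is taken over the partial transcript, that surrounding whitespace is ignored, and that reaching the threshold exactly is enough to trigger.

```typescript
// Hypothetical sketch: not the platform's real implementation.
interface MinimumCharactersConfig {
  strategy: "minimum_characters";
  minimum_characters?: number; // Documented default: 3
}

// Returns true once the partial transcript holds enough characters.
// Assumptions: surrounding whitespace is not counted, and the
// threshold is inclusive (length >= threshold triggers).
function reachesCharacterThreshold(
  config: MinimumCharactersConfig,
  partialTranscript: string,
): boolean {
  const threshold = config.minimum_characters ?? 3;
  return partialTranscript.trim().length >= threshold;
}

const fiveCharConfig: MinimumCharactersConfig = {
  strategy: "minimum_characters",
  minimum_characters: 5,
};

console.log(reachesCharacterThreshold(fiveCharConfig, "um"));         // false
console.log(reachesCharacterThreshold(fiveCharConfig, "wait, stop")); // true
```

A higher threshold filters out short fillers such as "um" or "ok" at the cost of a slower reaction to real interruptions.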

immediate ⚡ NEW

The most responsive option. Interrupts as soon as the user starts speaking, using Voice Activity Detection (VAD) where the speech provider supports it.

typescript
barge_in: {
  strategy: "immediate",
  allow_after_ms: 500  // Optional: protect first 500ms
}

How it works:

  • Azure/Deepgram: Uses Voice Activity Detection (VAD) - triggers before any text is recognized
  • ElevenLabs: Uses first partial transcript
  • Latency: 20-100ms (2-4x faster than minimum_characters)
  • No text required (Azure/Deepgram): Interrupts on voice detection, not transcription
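
The provider-dependent trigger in the list above can be pictured with a small sketch. Only the provider behavior comes from this page; the types `SttProvider` and `SpeechSignal` and the function `immediateTriggerFired` are hypothetical names introduced for illustration, not part of the platform API.

```typescript
// Hypothetical sketch of which signal fires the "immediate" strategy.
type SttProvider = "azure" | "deepgram" | "elevenlabs";

interface SpeechSignal {
  voiceActivity: boolean;     // VAD reports that the user is speaking
  partialTranscript?: string; // Latest partial transcript, if any
}

function immediateTriggerFired(
  provider: SttProvider,
  signal: SpeechSignal,
): boolean {
  if (provider === "elevenlabs") {
    // ElevenLabs: the first partial transcript is the trigger.
    return (signal.partialTranscript ?? "").length > 0;
  }
  // Azure/Deepgram: voice activity alone is enough, no text needed.
  return signal.voiceActivity;
}

const voiceOnly: SpeechSignal = { voiceActivity: true };
console.log(immediateTriggerFired("deepgram", voiceOnly));   // true
console.log(immediateTriggerFired("elevenlabs", voiceOnly)); // false
```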

Use cases:

  • High-priority conversations requiring instant responsiveness
  • Natural dialogue where interruptions should feel seamless
  • Customer service where quick response matters
  • Urgent or time-sensitive interactions

Example:

typescript
onUserSpeak: async (event) => {
  return {
    type: "speak",
    session_id: event.session.id,
    text: "I can help you with billing, support, or sales. What would you like?",
    barge_in: {
      strategy: "immediate",
      allow_after_ms: 500, // Protect first 500ms from accidental noise
    },
  };
}

Comparison:

| Strategy | Trigger | Latency | Use Case |
| --- | --- | --- | --- |
| immediate | Voice Activity (VAD) | 20-100ms | Most natural, instant response |
| minimum_characters | Text recognition | 50-200ms | Balanced reliability |
| manual | API call | N/A | Custom logic |
| none | Never | N/A | Critical info only |

Best practices:

  • Use allow_after_ms: 500-1000 to prevent accidental interruptions
  • Test with real users to find optimal settings
  • Consider background noise in your environment

Protection Period

Set allow_after_ms to add a protection period that blocks interruption during the first part of the speech:

typescript
return {
  type: "speak",
  session_id: event.session.id,
  text: "Your account number is 1234567890. Please write this down.",
  barge_in: {
    strategy: "minimum_characters",
    minimum_characters: 10, // Require substantial speech
    allow_after_ms: 2000,    // Protect first 2 seconds
  },
};
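
As a rough illustration of how a protection period behaves over time, the sketch below checks whether an interruption would be allowed at a given moment. The function `isInterruptionAllowed` is hypothetical, and the sketch assumes the clock starts when playback begins and that the boundary itself counts as allowed. The platform may measure this differently.

```typescript
// Hypothetical sketch: elapsedMs is time since playback started (assumed).
function isInterruptionAllowed(
  elapsedMs: number,
  allowAfterMs: number = 0, // Documented default: 0
): boolean {
  return elapsedMs >= allowAfterMs;
}

// Timeline for allow_after_ms: 2000
for (const elapsedMs of [0, 1000, 2000, 2500]) {
  console.log(elapsedMs, isInterruptionAllowed(elapsedMs, 2000));
}
// 0 false
// 1000 false
// 2000 true
// 2500 true
```

Note that the period covers only the start of the audio. If the critical detail comes late in a long sentence, consider placing it in its own speak action with strategy "none".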

Configuration Options

minimum_characters

The minimum number of characters of recognized user speech required before barge-in is triggered.

  • Default: 3
  • Range: 1 to 100
  • Use: Higher values require more speech before interruption

allow_after_ms

Delay in milliseconds before barge-in is allowed. This creates a "protection period" at the start of speech.

  • Default: 0 (no protection period)
  • Range: 0 to 10000 (10 seconds)
  • Use: Prevent interruption during critical information
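
The defaults and ranges above can be checked before an action is returned. The helper below is a minimal sketch: `normalizeBargeIn` is a hypothetical name, and whether the platform itself rejects or clamps out-of-range values is not stated here, so throwing an error is an assumption.

```typescript
type BargeInStrategy = "none" | "manual" | "minimum_characters" | "immediate";

interface BargeInConfig {
  strategy: BargeInStrategy;
  minimum_characters?: number;
  allow_after_ms?: number;
}

// Hypothetical helper: fills in documented defaults and checks documented ranges.
function normalizeBargeIn(config: BargeInConfig): BargeInConfig {
  const allowAfterMs = config.allow_after_ms ?? 0; // Default: 0
  if (!(allowAfterMs >= 0 && allowAfterMs <= 10000)) {
    throw new RangeError("allow_after_ms must be between 0 and 10000");
  }
  const normalized: BargeInConfig = {
    strategy: config.strategy,
    allow_after_ms: allowAfterMs,
  };
  // minimum_characters only applies to the minimum_characters strategy.
  if (config.strategy === "minimum_characters") {
    const minChars = config.minimum_characters ?? 3; // Default: 3
    if (!(minChars >= 1 && minChars <= 100)) {
      throw new RangeError("minimum_characters must be between 1 and 100");
    }
    normalized.minimum_characters = minChars;
  }
  return normalized;
}

console.log(normalizeBargeIn({ strategy: "minimum_characters" }));
// { strategy: 'minimum_characters', allow_after_ms: 0, minimum_characters: 3 }
```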

Examples

Natural Conversation

typescript
onUserSpeak: async (event) => {
  return {
    type: "speak",
    session_id: event.session.id,
    text: "I can help you with billing, support, or sales. What would you like?",
    barge_in: {
      strategy: "minimum_characters",
      minimum_characters: 3,
    },
  };
}

Critical Information

typescript
onUserSpeak: async (event) => {
  return {
    type: "speak",
    session_id: event.session.id,
    text: "Your verification code is 1-2-3-4-5-6. Please write this down.",
    barge_in: {
      strategy: "none", // Don't allow interruption
    },
  };
}

Protected Announcement

typescript
onSessionStart: async (event) => {
  return {
    type: "speak",
    session_id: event.session.id,
    text: "Welcome! Your call may be recorded for quality assurance.",
    barge_in: {
      strategy: "minimum_characters",
      minimum_characters: 5,
      allow_after_ms: 3000, // Protect first 3 seconds
    },
  };
}

Best Practices

  1. Use none sparingly - Only for truly critical information
  2. Choose the right strategy:
    • immediate - For most natural, responsive conversations
    • minimum_characters - For balance between responsiveness and reliability
    • manual - For custom logic
    • none - For critical announcements only
  3. Set protection periods - Use allow_after_ms: 500-1000 to avoid cutting off an important introduction
  4. Test with users - Find the right balance for your use case
  5. Consider noise - immediate may trigger on background noise; use allow_after_ms as buffer
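
One way to apply these practices consistently is to centralize the choice in a small helper. The sketch below is only a suggestion: `MessageKind` and `bargeInFor` are hypothetical names, and the specific values are picked from the ranges recommended above rather than prescribed by the platform.

```typescript
interface BargeInConfig {
  strategy: "none" | "manual" | "minimum_characters" | "immediate";
  minimum_characters?: number;
  allow_after_ms?: number;
}

// Hypothetical categories for the kinds of messages an assistant speaks.
type MessageKind = "critical" | "announcement" | "conversation" | "custom";

function bargeInFor(kind: MessageKind): BargeInConfig {
  switch (kind) {
    case "critical":
      // Must be heard in full.
      return { strategy: "none" };
    case "announcement":
      // Protect the opening, then require a few characters of speech.
      return { strategy: "minimum_characters", minimum_characters: 5, allow_after_ms: 1000 };
    case "conversation":
      // Most responsive; the short buffer guards against background noise.
      return { strategy: "immediate", allow_after_ms: 500 };
    case "custom":
      // Interruption is driven by your own logic through the API.
      return { strategy: "manual" };
  }
}

console.log(bargeInFor("conversation"));
// { strategy: 'immediate', allow_after_ms: 500 }
```

A handler could then spread the result into its action, for example `barge_in: bargeInFor("critical")`.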

Next Steps