Barge-In Configuration

Control how users can interrupt the assistant while it is speaking.

Overview

Barge-in allows users to interrupt the assistant's speech. You can configure barge-in behavior for each speak or audio action.

Configuration

typescript
interface BargeInConfig {
  strategy: "none" | "manual" | "minimum_characters" | "immediate";
  minimum_characters?: number; // Default: 3 (used only by the minimum_characters strategy)
  allow_after_ms?: number;     // Delay in ms before interruption is allowed (default: 0)
}

Strategies

none

Disables barge-in completely. Audio plays fully without interruption.

typescript
barge_in: {
  strategy: "none"
}

Use cases:

  • Critical information that must be heard
  • Legal disclaimers
  • Emergency instructions

Example:

typescript
return {
  type: "speak",
  session_id: event.session.id,
  text: "This is important information. Please listen carefully.",
  barge_in: {
    strategy: "none",
  },
};

manual

Allows manual barge-in via API only (no automatic detection).

typescript
barge_in: {
  strategy: "manual"
}

Use cases:

  • Custom interruption logic
  • Button-triggered interruption
  • External event-based interruption

Example:

typescript
return {
  type: "speak",
  session_id: event.session.id,
  text: "Press a button to interrupt.",
  barge_in: {
    strategy: "manual",
  },
};

minimum_characters

Triggers barge-in automatically once the user's recognized speech reaches the minimum_characters threshold.

typescript
barge_in: {
  strategy: "minimum_characters",
  minimum_characters: 5,      // Trigger after 5 characters
  allow_after_ms: 500          // Wait 500ms before allowing interruption
}

Use cases:

  • Natural conversation flow
  • Customer service scenarios
  • Interactive voice menus

Example:

typescript
return {
  type: "speak",
  session_id: event.session.id,
  text: "How can I help you today?",
  barge_in: {
    strategy: "minimum_characters",
    minimum_characters: 3,
  },
};

immediate (new)

The most responsive option. It interrupts as soon as the user starts speaking, using Voice Activity Detection (VAD) where the speech provider supports it.

typescript
barge_in: {
  strategy: "immediate",
  allow_after_ms: 500  // Optional: protect first 500ms
}

How it works:

  • Azure/Deepgram: Uses Voice Activity Detection (VAD), so barge-in triggers before any text is recognized
  • ElevenLabs: Uses the first partial transcript
  • Latency: 20-100ms (2-4x faster than minimum_characters)
  • No text required (Azure/Deepgram): Interrupts on voice detection, not transcription

Use cases:

  • High-priority conversations requiring instant responsiveness
  • Natural dialogue where interruptions should feel seamless
  • Customer service where quick response matters
  • Urgent or time-sensitive interactions

Example:

typescript
onUserSpeak: async (event) => {
  return {
    type: "speak",
    session_id: event.session.id,
    text: "I can help you with billing, support, or sales. What would you like?",
    barge_in: {
      strategy: "immediate",
      allow_after_ms: 500, // Protect first 500ms from accidental noise
    },
  };
}

Comparison:

| Strategy | Trigger | Latency | Use case |
| --- | --- | --- | --- |
| immediate | Voice Activity (VAD) | 20-100ms | Most natural, instant response |
| minimum_characters | Text recognition | 50-200ms | Balanced reliability |
| manual | API call | N/A | Custom logic |
| none | Never | N/A | Critical info only |
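
To make the comparison concrete, the following is a minimal sketch of how the four strategies could be modeled as a single decision function. It is not the platform's implementation: `PlaybackSignal` and `shouldInterrupt` are hypothetical names, and the sketch assumes that a manual request bypasses the protection period and that the character threshold is met when the transcript length is greater than or equal to `minimum_characters`.

```typescript
// Hypothetical model of the barge-in decision. PlaybackSignal and
// shouldInterrupt are illustrative names, not part of the documented API.
type BargeInStrategy = "none" | "manual" | "minimum_characters" | "immediate";

interface BargeInConfig {
  strategy: BargeInStrategy;
  minimum_characters?: number; // documented default: 3
  allow_after_ms?: number;     // documented default: 0
}

// Assumed snapshot of what a runtime might know while the assistant speaks.
interface PlaybackSignal {
  elapsedMs: number;         // time since the assistant started speaking
  voiceDetected: boolean;    // VAD reports user speech
  partialTranscript: string; // text recognized so far (may be empty)
  manualRequest: boolean;    // an explicit interrupt request was received
}

function shouldInterrupt(config: BargeInConfig, signal: PlaybackSignal): boolean {
  if (config.strategy === "none") return false;
  // Assumption: a manual request is honored regardless of allow_after_ms.
  if (config.strategy === "manual") return signal.manualRequest;

  // Automatic strategies wait until the protection period has elapsed.
  if (signal.elapsedMs < (config.allow_after_ms ?? 0)) return false;

  if (config.strategy === "immediate") {
    // VAD providers fire on voice alone; others fall back to the first partial transcript.
    return signal.voiceDetected || signal.partialTranscript.length > 0;
  }

  // minimum_characters: compare recognized text against the threshold.
  const threshold = config.minimum_characters ?? 3;
  return signal.partialTranscript.trim().length >= threshold;
}

// Usage: the same 13-character utterance before and after a 2000 ms protection period.
const protectedConfig: BargeInConfig = {
  strategy: "minimum_characters",
  minimum_characters: 10,
  allow_after_ms: 2000,
};
const utterance = { voiceDetected: true, partialTranscript: "wait a moment", manualRequest: false };
const duringProtection = shouldInterrupt(protectedConfig, { ...utterance, elapsedMs: 1500 }); // false
const afterProtection = shouldInterrupt(protectedConfig, { ...utterance, elapsedMs: 2500 });  // true
```

The ordering of the checks reflects the table: none and manual never depend on audio, while the two automatic strategies differ only in whether they wait for recognized text.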

Best practices:

  • Set allow_after_ms to 500-1000 (milliseconds) to prevent accidental interruptions
  • Test with real users to find optimal settings
  • Consider background noise in your environment

Protection Period

You can add a protection period to prevent interruption during critical parts of speech:

typescript
return {
  type: "speak",
  session_id: event.session.id,
  text: "Your account number is 1234567890. Please write this down.",
  barge_in: {
    strategy: "minimum_characters",
    minimum_characters: 10, // Require substantial speech
    allow_after_ms: 2000,    // Protect first 2 seconds
  },
};

Configuration Options

minimum_characters

The minimum number of characters the user must speak before barge-in is triggered.

  • Default: 3
  • Range: 1 to 100
  • Use: Higher values require more speech before interruption

allow_after_ms

Delay in milliseconds before barge-in is allowed. This creates a "protection period" at the start of speech.

  • Default: 0 (no protection period; barge-in is allowed from the start of speech)
  • Range: 0 to 10000 (10 seconds)
  • Use: Prevent interruption during critical information
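
The documented defaults and ranges can be applied before a config is sent. The sketch below is one way to do that, under the assumption that out-of-range values should be clamped; the source does not say whether the platform clamps or rejects them, and `normalizeBargeIn` is a hypothetical helper, not part of the API.

```typescript
interface BargeInConfig {
  strategy: "none" | "manual" | "minimum_characters" | "immediate";
  minimum_characters?: number;
  allow_after_ms?: number;
}

// Round to an integer and force the value into [lo, hi].
function clamp(value: number, lo: number, hi: number): number {
  return Math.min(hi, Math.max(lo, Math.round(value)));
}

// Hypothetical helper: fills in documented defaults and clamps to documented ranges.
function normalizeBargeIn(config: BargeInConfig): BargeInConfig {
  const normalized: BargeInConfig = {
    strategy: config.strategy,
    // Default 0, range 0 to 10000 ms.
    allow_after_ms: clamp(config.allow_after_ms ?? 0, 0, 10000),
  };
  if (config.strategy === "minimum_characters") {
    // Default 3, range 1 to 100; only meaningful for this strategy.
    normalized.minimum_characters = clamp(config.minimum_characters ?? 3, 1, 100);
  }
  return normalized;
}

// Usage: an over-long protection period is reduced to the 10-second maximum.
const capped = normalizeBargeIn({ strategy: "immediate", allow_after_ms: 20000 });
```

Clamping keeps the helper forgiving; a stricter variant could throw on out-of-range input instead.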

Examples

Natural Conversation

typescript
onUserSpeak: async (event) => {
  return {
    type: "speak",
    session_id: event.session.id,
    text: "I can help you with billing, support, or sales. What would you like?",
    barge_in: {
      strategy: "minimum_characters",
      minimum_characters: 3,
    },
  };
}

Critical Information

typescript
onUserSpeak: async (event) => {
  return {
    type: "speak",
    session_id: event.session.id,
    text: "Your verification code is 1-2-3-4-5-6. Please write this down.",
    barge_in: {
      strategy: "none", // Don't allow interruption
    },
  };
}

Protected Announcement

typescript
onSessionStart: async (event) => {
  return {
    type: "speak",
    session_id: event.session.id,
    text: "Welcome! Your call may be recorded for quality assurance.",
    barge_in: {
      strategy: "minimum_characters",
      minimum_characters: 5,
      allow_after_ms: 3000, // Protect first 3 seconds
    },
  };
}

Best Practices

  1. Use none sparingly - Only for truly critical information
  2. Choose the right strategy:
    • immediate - For most natural, responsive conversations
    • minimum_characters - For balance between responsiveness and reliability
    • manual - For custom logic
    • none - For critical announcements only
  3. Set protection periods - Use allow_after_ms: 500-1000 to avoid cutting off an important intro
  4. Test with users - Find the right balance for your use case
  5. Consider noise - immediate may trigger on background noise; use allow_after_ms as a buffer
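
As an illustration of these guidelines, the sketch below maps a kind of message to a barge-in config. The `ContentKind` categories, the `noisy` flag, and the `chooseBargeIn` helper are hypothetical, and falling back to minimum_characters in noisy environments is one possible policy rather than a documented recommendation; the numeric values are taken from the examples on this page.

```typescript
interface BargeInConfig {
  strategy: "none" | "manual" | "minimum_characters" | "immediate";
  minimum_characters?: number;
  allow_after_ms?: number;
}

// Hypothetical categories of what the assistant is about to say.
type ContentKind = "critical" | "announcement" | "conversation" | "custom";

// One possible policy for picking a strategy; tune the values with real users.
function chooseBargeIn(kind: ContentKind, noisy: boolean = false): BargeInConfig {
  switch (kind) {
    case "critical":
      // Must be heard in full.
      return { strategy: "none" };
    case "custom":
      // Interruption is driven by your own logic via the API.
      return { strategy: "manual" };
    case "announcement":
      // Protect the opening, then require a few characters of speech.
      return { strategy: "minimum_characters", minimum_characters: 5, allow_after_ms: 3000 };
    case "conversation":
      // Assumption: prefer text-based detection when background noise is likely.
      return noisy
        ? { strategy: "minimum_characters", minimum_characters: 3, allow_after_ms: 500 }
        : { strategy: "immediate", allow_after_ms: 500 };
  }
}

// Usage inside a handler (the handler shape follows the examples on this page):
// barge_in: chooseBargeIn("conversation", /* noisy */ true)
```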

Barge-in controls whether the caller may interrupt the assistant while it is speaking. The related VAD Configuration controls how long the caller may pause before their turn is considered finished. Both can be set on the same speak action.

Next Steps