Barge-In Best Practices: Handling User Interruptions Gracefully

When users interrupt your voice AI assistant mid-sentence, how you respond makes the difference between a frustrating experience and a natural conversation. This guide covers best practices for handling barge-in interruptions using the barged_in flag in user_speak events.

Why Users Interrupt

In natural human conversations, interruptions happen constantly:

  • "Got it!" - They understood and don't need the rest
  • "Wait, actually..." - They want to change direction
  • "No, that's not what I meant" - Correcting a misunderstanding
  • "Yes yes, I know" - Impatient, want to move on
  • "Hold on" - Something came up

A voice assistant that ignores interruptions or handles them poorly feels robotic. Done well, barge-in handling makes your assistant feel responsive and human-like.

How Barge-In Detection Works

When a user speaks while the assistant is talking, sipgate AI Flow:

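  1. Stops the assistant's speech immediately (provided the interruption satisfies the barge_in settings of the speak action being played; see Configure Barge-In Sensitivity below)
  2. Sends a user_speak event with barged_in: true and the user's words in the text field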
  3. Waits for your response (action or 204 No Content)
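The three steps above amount to a simple request/response contract. Below is a minimal TypeScript sketch of that contract. It assumes only the event and action fields used elsewhere on this page. The names UserSpeakEvent, SpeakAction, toResponse and decide are hypothetical helpers, not part of an official sipgate SDK.

```typescript
// Fields of the user_speak event as used on this page (not an exhaustive schema)
interface UserSpeakEvent {
  type: 'user_speak'
  text: string          // what the user said
  barged_in?: boolean   // true when the user interrupted the assistant
  session: { id: string }
}

// A speak action as used on this page; other action types are omitted
interface SpeakAction {
  type: 'speak'
  session_id: string
  text: string
  tts?: { provider: string; language: string; voice: string }
}

// Hypothetical helper: turn "an action or nothing" into the HTTP reply.
// null means "no action", which is answered with 204 No Content.
function toResponse(action: SpeakAction | null): Response {
  if (action === null) {
    return new Response(null, { status: 204 })
  }
  return new Response(JSON.stringify(action), {
    status: 200,
    headers: { 'Content-Type': 'application/json' },
  })
}

// Hypothetical decision step: stay silent on very short barge-ins, otherwise acknowledge
function decide(event: UserSpeakEvent): SpeakAction | null {
  if (event.barged_in && event.text.trim().length < 3) {
    return null // probably noise, keep listening
  }
  return { type: 'speak', session_id: event.session.id, text: "I'm listening." }
}
```

Keeping the decision (decide) separate from the HTTP reply (toResponse) lets you unit-test the barge-in logic without a running server.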

Basic Handling

Check the barged_in flag to detect interruptions and respond appropriately:

typescript
async function handleUserSpeak(event: {
  type: 'user_speak'
  text: string
  barged_in?: boolean
  session: { id: string }
}) {
  if (event.barged_in) {
    // User interrupted - acknowledge quickly
    return {
      type: 'speak',
      session_id: event.session.id,
      text: "Of course. What would you like to know?",
      tts: { provider: 'azure', language: 'en-US', voice: 'en-US-JennyNeural' },
    }
  }

  // Normal speech processing
  return processUserInput(event)
}

Respond to What They Said

The text field contains what the user said when interrupting. Use it to respond appropriately:

typescript
function handleUserSpeak(event: {
  type: 'user_speak'
  text: string
  barged_in?: boolean
  session: { id: string }
}) {
  if (!event.barged_in) {
    // Normal processing for non-interruptions
    return processNormalSpeech(event)
  }

  const sessionId = event.session.id
  const text = event.text.toLowerCase()

  // User is correcting something. Check this first: "No, that's not what I meant"
  // also contains "no" and would otherwise be handled as a redirect.
  if (text.includes('not what i') || text.includes('i meant')) {
    return speak(sessionId, "I apologize for the confusion. Please tell me more.")
  }

  // User understood - move on
  if (text.includes('got it') || text.includes('understood') || text.includes('okay')) {
    return speak(sessionId, "Great! What else can I help you with?")
  }

  // User wants to change direction. Match whole words so "know" or "now" isn't read as "no".
  if (/\b(actually|wait|no)\b/.test(text)) {
    return speak(sessionId, "Of course. What would you like instead?")
  }

  // User has a new question - process it directly
  if (text.length > 25 || text.includes('?')) {
    return processAsNewQuestion(sessionId, event.text)
  }

  // Default acknowledgment
  return speak(sessionId, "I'm listening.")
}
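Keyword rules like these are easy to get subtly wrong. It can therefore help to move them into a pure function and test that function with real phrases. The sketch below uses the same keywords as above, checks corrections first, and matches whole words only. classifyInterruption, hasPhrase and the InterruptIntent labels are illustrative names, and the phrase lists are examples rather than a complete vocabulary.

```typescript
type InterruptIntent = 'correction' | 'understood' | 'redirect' | 'new_question' | 'listening'

// True if any phrase occurs as whole words, so "no" does not match "know" or "now".
// The phrases are plain words without regex special characters.
function hasPhrase(text: string, phrases: string[]): boolean {
  return phrases.some((p) => new RegExp(`\\b${p}\\b`).test(text))
}

function classifyInterruption(raw: string): InterruptIntent {
  const text = raw.trim().toLowerCase()
  // Corrections first: "No, that's not what I meant" also contains "no"
  if (hasPhrase(text, ['not what i', 'i meant'])) return 'correction'
  if (hasPhrase(text, ['got it', 'understood', 'okay'])) return 'understood'
  if (hasPhrase(text, ['actually', 'wait', 'no'])) return 'redirect'
  // Longer utterances or questions are treated as new input
  if (text.length > 25 || text.includes('?')) return 'new_question'
  return 'listening'
}
```

In the handler, switch on the returned label. Pass the matching phrase to speak, or call your question-processing logic for 'new_question'.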

Natural Acknowledgment Phrases

Vary your responses to avoid sounding robotic:

typescript
const ACKNOWLEDGMENTS = {
  understood: [
    "Great! What else can I help with?",
    "Perfect. Anything else?",
    "Alright! What's next?",
  ],
  redirect: [
    "Of course. What would you like instead?",
    "Sure thing. Go ahead.",
    "No problem. What did you have in mind?",
  ],
  listening: [
    "I'm listening.",
    "Go ahead.",
    "Yes?",
  ],
}

// German equivalents
const ACKNOWLEDGMENTS_DE = {
  understood: [
    "Sehr gut! Kann ich sonst noch helfen?",
    "Alles klar. Was noch?",
    "Prima! Was möchten Sie noch wissen?",
  ],
  redirect: [
    "Natürlich. Was kann ich für Sie tun?",
    "Kein Problem. Was hätten Sie gerne?",
    "Selbstverständlich. Bitte?",
  ],
  listening: [
    "Ich höre.",
    "Ja bitte?",
    "Ja?",
  ],
}
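One way to rotate through these lists is to remember which phrase was used last, per session and category, and move to the next one each time. This is a sketch under that assumption. pickAcknowledgment and lastAckIndex are hypothetical names, and the function works with either ACKNOWLEDGMENTS or ACKNOWLEDGMENTS_DE.

```typescript
type AckCategory = 'understood' | 'redirect' | 'listening'

// Last phrase index used, per session and per category
const lastAckIndex = new Map<string, Map<AckCategory, number>>()

function pickAcknowledgment(
  sessionId: string,
  category: AckCategory,
  phrases: Record<AckCategory, string[]>,
): string {
  const options = phrases[category]
  let perSession = lastAckIndex.get(sessionId)
  if (!perSession) {
    perSession = new Map<AckCategory, number>()
    lastAckIndex.set(sessionId, perSession)
  }
  // Advance to the next phrase and wrap around at the end of the list
  const next = ((perSession.get(category) ?? -1) + 1) % options.length
  perSession.set(category, next)
  return options[next]
}
```

Call it as pickAcknowledgment(event.session.id, 'understood', ACKNOWLEDGMENTS). With round-robin instead of random selection, two consecutive replies in the same category always differ whenever the list has at least two entries. Call lastAckIndex.delete(sessionId) when the session ends.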

When to Process vs. Acknowledge

If the user said something substantial during a barge-in, treat it as a new question rather than just acknowledging:

typescript
function handleUserSpeak(event: {
  type: 'user_speak'
  text: string
  barged_in?: boolean
  session: { id: string }
}) {
  if (!event.barged_in) {
    return processNormalSpeech(event)
  }

  const interruptText = event.text.trim()

  // Substantial interruption = likely a complete thought or question
  if (interruptText.length > 25 || interruptText.includes('?')) {
    // Process as a complete question
    return processUserQuestion(event.session.id, interruptText)
  }

  // Short interruption = just acknowledge
  return speak(event.session.id, "I'm listening.")
}

This provides a smoother experience - users don't have to repeat themselves.

Silent Acknowledgment

Sometimes the best response is no response. Return 204 No Content to simply listen:

typescript
function handleUserSpeak(event: {
  type: 'user_speak'
  text: string
  barged_in?: boolean
  session: { id: string }
}) {
  if (!event.barged_in) {
    return processNormalSpeech(event)
  }

  // Normalize so that "Stop." or " stop " match as well
  const text = event.text.trim().toLowerCase().replace(/[.!?,]+$/, '')

  // User just said "um", "uh", background noise, etc.
  if (text.length < 3) {
    return new Response(null, { status: 204 })
  }

  // User said "stop" or similar - they probably want silence
  if (text === 'stop' || text === 'quiet') {
    return new Response(null, { status: 204 })
  }

  return speak(event.session.id, "I'm listening.")
}
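A plain length check lets fillers such as "hmm" or "umm" through. Exact string comparison also depends on how the transcript happens to be punctuated. A slightly more tolerant sketch normalizes the text first. The FILLERS and STOP_WORDS lists are illustrative assumptions (including a few German entries), not vocabularies provided by sipgate AI Flow.

```typescript
// Illustrative word lists; extend them for your languages and callers
const FILLERS = new Set(['um', 'umm', 'uh', 'uhm', 'hmm', 'mhm', 'äh', 'ähm'])
const STOP_WORDS = new Set(['stop', 'quiet', 'stopp'])

// Lowercase, strip common punctuation, collapse whitespace: "Stop." -> "stop"
function normalize(text: string): string {
  return text
    .toLowerCase()
    .replace(/[.,!?;:"'()…-]/g, '')
    .replace(/\s+/g, ' ')
    .trim()
}

// True when a barge-in should be answered with 204 No Content
function shouldStaySilent(text: string): boolean {
  const normalized = normalize(text)
  if (normalized.length < 3) return true                                  // noise or a stray syllable
  if (normalized.split(' ').every((w) => FILLERS.has(w))) return true     // "hmm", "um uh", ...
  return STOP_WORDS.has(normalized)                                       // "Stop." / "Quiet!"
}
```

In the handler, `if (shouldStaySilent(event.text)) return new Response(null, { status: 204 })` would take the place of the two separate checks.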

Configure Barge-In Sensitivity

Use the barge_in configuration on a speak action to control when an interruption is allowed to stop playback:

typescript
// sessionId is the ID of the current session (event.session.id).
// Each object below is a speak action; return the one that fits from your handler.

// Allow easy interruption for conversational responses
const menuPrompt = {
  type: 'speak',
  session_id: sessionId,
  text: "I can help you with billing, support, or sales...",
  barge_in: {
    strategy: 'minimum_characters',
    minimum_characters: 3,  // Trigger quickly
  },
}

// Protect important information from interruption
const confirmationCode = {
  type: 'speak',
  session_id: sessionId,
  text: "Your confirmation code is 7-4-2-9. Please write this down.",
  barge_in: {
    strategy: 'minimum_characters',
    minimum_characters: 10,  // Require more speech before the interruption counts
    allow_after_ms: 3000,    // Protect the first 3 seconds
  },
}

// Never allow interruption for critical info
const recordingNotice = {
  type: 'speak',
  session_id: sessionId,
  text: "This call may be recorded for quality assurance.",
  barge_in: {
    strategy: 'none',
  },
}
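If your assistant speaks many kinds of prompts, you may prefer to pick one of a few named presets instead of repeating barge_in objects. The sketch below encodes the three settings from the examples above. BargeInConfig lists only the fields and strategy values that appear on this page, and the platform may support more. BARGE_IN_PRESETS and speakWithBargeIn are hypothetical names.

```typescript
// barge_in fields as used on this page; only these strategy values are assumed
interface BargeInConfig {
  strategy: 'minimum_characters' | 'none'
  minimum_characters?: number
  allow_after_ms?: number
}

type Importance = 'conversational' | 'important' | 'critical'

// Hypothetical mapping from how important an utterance is to its barge_in settings
const BARGE_IN_PRESETS: Record<Importance, BargeInConfig> = {
  conversational: { strategy: 'minimum_characters', minimum_characters: 3 },
  important: { strategy: 'minimum_characters', minimum_characters: 10, allow_after_ms: 3000 },
  critical: { strategy: 'none' },
}

// Build a speak action with the preset for the given importance
function speakWithBargeIn(sessionId: string, text: string, importance: Importance) {
  return {
    type: 'speak' as const,
    session_id: sessionId,
    text,
    barge_in: BARGE_IN_PRESETS[importance],
  }
}
```

Centralizing the presets keeps the protection level a single decision per prompt, which makes it easier to audit which messages callers can and cannot cut off.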

Handling Impatient Users

Some users interrupt frequently. Keep acknowledgments brief:

typescript
// Track interruption count per session (delete the entry on session_end so the map doesn't grow forever)
const interruptCounts = new Map<string, number>()

function handleUserSpeak(event: {
  type: 'user_speak'
  text: string
  barged_in?: boolean
  session: { id: string }
}) {
  if (!event.barged_in) {
    return processNormalSpeech(event)
  }

  const sessionId = event.session.id
  const count = (interruptCounts.get(sessionId) ?? 0) + 1
  interruptCounts.set(sessionId, count)

  // User interrupts a lot - be extra brief
  if (count > 3) {
    return speak(sessionId, "Yes?")
  }

  return speak(sessionId, "Of course. What would you like?")
}
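A plain counter never decreases, so a caller who interrupted a few times early in a long call gets terse replies for the rest of it. One alternative, assuming only recent behaviour matters, is to count interruptions inside a sliding time window. WINDOW_MS, THRESHOLD, interruptTimes and recordInterruption are illustrative names. The 60-second window is an arbitrary starting value to tune against real calls.

```typescript
const WINDOW_MS = 60_000 // only consider the last minute (illustrative value)
const THRESHOLD = 3      // more interruptions than this inside the window = be extra brief

// Timestamps (ms) of recent barge-ins per session
const interruptTimes = new Map<string, number[]>()

// Record a barge-in and return how many happened inside the window, including this one
function recordInterruption(sessionId: string, now: number = Date.now()): number {
  const recent = (interruptTimes.get(sessionId) ?? []).filter((t) => now - t < WINDOW_MS)
  recent.push(now)
  interruptTimes.set(sessionId, recent)
  return recent.length
}
```

Inside the barge-in branch, `if (recordInterruption(sessionId) > THRESHOLD) return speak(sessionId, "Yes?")` takes the place of the plain counter. Clear interruptTimes for the session on session_end, as with the other maps.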

Complete Example

typescript
// Per-session state; both maps are cleared when the session ends
const sessionStates = new Map<string, Record<string, unknown>>()
const interruptCounts = new Map<string, number>()

export async function POST(req: Request): Promise<Response> {
  const event = await req.json()
  const sessionId = event.session.id

  switch (event.type) {
    case 'user_speak':
      return handleUserSpeak(event)

    case 'session_end':
      // Clean up session state
      sessionStates.delete(sessionId)
      interruptCounts.delete(sessionId)
      return new Response(null, { status: 204 })

    default:
      return new Response(null, { status: 204 })
  }
}

function handleUserSpeak(event: {
  type: 'user_speak'
  text: string
  barged_in?: boolean
  session: { id: string }
}): Response {
  const sessionId = event.session.id

  // Handle normal speech
  if (!event.barged_in) {
    return processNormalUserSpeech(event)
  }

  // Barge-in handling
  const text = event.text.trim().toLowerCase()

  // Very short - probably noise, stay silent
  if (text.length < 3) {
    return new Response(null, { status: 204 })
  }

  // User understood / confirmed
  if (text.includes('got it') || text.includes('thanks') || text.includes('okay')) {
    return speak(sessionId, "Great! What else can I help with?")
  }

  // User wants to redirect (whole words only, so "button" does not match "but")
  if (/\b(actually|wait|but)\b/.test(text)) {
    return speak(sessionId, "Of course. What would you like?")
  }

  // Substantial text - treat as new input
  if (text.length > 25 || text.includes('?')) {
    return processNormalUserSpeech(event)
  }

  // Default
  return speak(sessionId, "I'm listening.")
}

// Placeholder for your regular conversation logic (LLM call, intent routing, ...)
function processNormalUserSpeech(event: { text: string; session: { id: string } }): Response {
  return speak(event.session.id, `You said: ${event.text}`)
}

function speak(sessionId: string, text: string): Response {
  return Response.json({
    type: 'speak',
    session_id: sessionId,
    text,
    tts: {
      provider: 'azure',
      language: 'en-US',
      voice: 'en-US-JennyNeural',
    },
  })
}

Best Practices Summary

  1. Respond to intent - Use the text field to understand why they interrupted

  2. Be brief - Short acknowledgments sound natural ("Got it!" not "I understand that you have indicated...")

  3. Vary your phrases - Rotate through different acknowledgments

  4. Process substantial interruptions - If they said a lot, treat it as a new question

  5. Sometimes stay silent - Return 204 for noise or "stop" commands

  6. Configure sensitivity - Use barge_in config to protect important information

  7. Keep impatient users happy - Shorter responses for frequent interrupters

  8. Clean up state - If you're tracking conversation state, consider resetting flags like "expecting confirmation" when the user interrupts
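To illustrate point 8, here is a sketch of per-session state with an "expecting confirmation" flag that is cleared on barge-in. The SessionState shape, sessionStates, getState and onBargeIn are hypothetical. The rationale assumed here is that the user may not have heard the full question, so a following "yes" should not count as confirming it.

```typescript
// Hypothetical per-session conversation state
interface SessionState {
  expectingConfirmation: boolean // e.g. the assistant just asked "Shall I book this?"
  pendingQuestion?: string
}

const sessionStates = new Map<string, SessionState>()

// Fetch the state for a session, creating a default one on first use
function getState(sessionId: string): SessionState {
  let state = sessionStates.get(sessionId)
  if (!state) {
    state = { expectingConfirmation: false }
    sessionStates.set(sessionId, state)
  }
  return state
}

// On barge-in, drop any pending yes/no question so the next answer isn't misread
function onBargeIn(sessionId: string): void {
  const state = getState(sessionId)
  state.expectingConfirmation = false
  delete state.pendingQuestion
}
```

Call onBargeIn(event.session.id) at the top of the barge-in branch, before classifying what the user said.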

Async Operations

If you're using the Async Hold Pattern for slow operations, remember to cancel pending work when the user interrupts - they've moved on and don't want the old answer.
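The Async Hold Pattern has its own guide. The sketch below only shows one way to make pending work cancellable. It assumes your slow operation accepts a standard AbortSignal (fetch does, for example). pendingWork, startWork and cancelWork are hypothetical names.

```typescript
// One AbortController per session for the currently pending slow operation
const pendingWork = new Map<string, AbortController>()

// Start slow work for a session, cancelling anything still running for it
function startWork(sessionId: string): AbortSignal {
  pendingWork.get(sessionId)?.abort()
  const controller = new AbortController()
  pendingWork.set(sessionId, controller)
  return controller.signal
}

// Call this when a user_speak event arrives with barged_in: true
function cancelWork(sessionId: string): void {
  pendingWork.get(sessionId)?.abort()
  pendingWork.delete(sessionId)
}
```

For example, `const signal = startWork(sessionId)` followed by `fetch(url, { signal })` rejects with an AbortError once cancelWork(sessionId) runs. If the work finishes anyway, check signal.aborted before delivering the old answer.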