Barge-In Action

Immediately stops whatever audio the service is currently playing to the caller, whether synthesized speech from a speak action or pre-recorded audio from an audio action. This is the manual, application-triggered counterpart to the automatic interruption that happens when the caller starts speaking.

Action vs. configuration — don't confuse these

Two things are called "barge-in" and they do different things:

  • barge_in action (this page): a top-level action you send, { "type": "barge_in", "session_id": "..." }. Your application interrupts the playback immediately.
  • barge_in config on speak / audio actions: an optional object describing how and when the caller is allowed to interrupt. See Barge-In Configuration.

The action stops current playback. The configuration controls whether the caller is allowed to do the same thing by speaking.

Action Structure

json
{
  "type": "barge_in",
  "session_id": "550e8400-e29b-41d4-a716-446655440000"
}

Fields

| Field | Type | Required | Description |
|---|---|---|---|
| type | string | Yes | Always "barge_in" |
| session_id | string (UUID) | Yes | Session identifier from an event |

The action has no other fields. It always targets whatever is currently being played on this session.
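
As a minimal sketch of the structure above, a small Python helper could build the action and reject a malformed session ID before it is sent. The name make_barge_in is hypothetical (it is not part of any SDK), and checking the UUID on the client side is an assumption for illustration, not something this page says the service requires.

```python
import json
import uuid


def make_barge_in(session_id: str) -> dict:
    """Build a barge_in action (hypothetical helper, not part of any SDK)."""
    # Raises ValueError if session_id cannot be parsed as a UUID.
    uuid.UUID(session_id)
    # The action carries only these two fields.
    return {"type": "barge_in", "session_id": session_id}


if __name__ == "__main__":
    action = make_barge_in("550e8400-e29b-41d4-a716-446655440000")
    print(json.dumps(action))
```

Failing early on a bad session ID keeps the error in your own code instead of relying on how the service reports it.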

Typical Pattern — Interrupt Then Speak

The most useful form is an array of actions: first barge_in to cut off the current playback, then speak (or audio) with the new content. The service executes array entries in order, so the caller hears the playback stop and the new message begin without any manual coordination on your side.

json
[
  {
    "type": "barge_in",
    "session_id": "550e8400-e29b-41d4-a716-446655440000"
  },
  {
    "type": "speak",
    "session_id": "550e8400-e29b-41d4-a716-446655440000",
    "text": "Sorry, let me correct that — your order ships tomorrow, not today."
  }
]

This works anywhere an action response is accepted: HTTP webhook response body, WebSocket message, or external API POST.
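
The interrupt-then-speak pattern can be wrapped in a helper so the ordering is never written by hand. This is a sketch only: interrupt_then_speak is a hypothetical name, and the code builds and serializes the array without sending it, since the transport (webhook response, WebSocket, or API POST) depends on your integration.

```python
import json


def interrupt_then_speak(session_id: str, text: str) -> list:
    """Return [barge_in, speak] in the order the service executes them."""
    return [
        # First entry: stop whatever is currently playing on this session.
        {"type": "barge_in", "session_id": session_id},
        # Second entry: start the replacement utterance.
        {"type": "speak", "session_id": session_id, "text": text},
    ]


if __name__ == "__main__":
    actions = interrupt_then_speak(
        "550e8400-e29b-41d4-a716-446655440000",
        "Sorry, let me correct that. Your order ships tomorrow, not today.",
    )
    # The serialized array is what you would return or send as the payload.
    print(json.dumps(actions, indent=2))
```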

Replace in-progress audio with new audio

json
[
  { "type": "barge_in", "session_id": "550e8400-e29b-41d4-a716-446655440000" },
  {
    "type": "audio",
    "session_id": "550e8400-e29b-41d4-a716-446655440000",
    "audio": "UklGRiQAAABXQVZFZm10IBAAAAABAAEA..."
  }
]
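
The sample payload above begins with UklGR, which is the base64 encoding of a RIFF/WAV header, so the sketch below assumes the audio field carries base64-encoded WAV data. The helper names, the 8 kHz mono 16-bit format, and the silent placeholder clip are all assumptions for illustration; the accepted formats are defined by the audio action, not by this page.

```python
import base64
import io
import wave


def silent_wav_base64(duration_ms: int = 100, sample_rate: int = 8000) -> str:
    """Encode a short mono 16-bit silent WAV as base64 (placeholder audio)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(sample_rate)  # assumed rate, not confirmed by the docs
        wav.writeframes(b"\x00\x00" * (sample_rate * duration_ms // 1000))
    return base64.b64encode(buf.getvalue()).decode("ascii")


def interrupt_then_audio(session_id: str, audio_b64: str) -> list:
    """Return [barge_in, audio]: stop current playback, then play new audio."""
    return [
        {"type": "barge_in", "session_id": session_id},
        {"type": "audio", "session_id": session_id, "audio": audio_b64},
    ]


if __name__ == "__main__":
    clip = silent_wav_base64()
    actions = interrupt_then_audio("550e8400-e29b-41d4-a716-446655440000", clip)
    print(actions[1]["audio"][:16] + "...")
```

In a real integration the placeholder clip would be replaced by the bytes of your own recording, read from disk and base64-encoded the same way.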

Stop playback without saying anything

Send barge_in on its own if you only want silence (for example, to cut off a long response because an external system just produced a final answer you're about to deliver separately):

json
{
  "type": "barge_in",
  "session_id": "550e8400-e29b-41d4-a716-446655440000"
}

When to Use It

  • Agent self-correction. Your LLM streamed a tentative answer via speak, then a tool call returned a better one. Send [barge_in, speak] to replace the in-flight utterance.
  • External event trumps current playback. A human operator joins, a priority notification arrives, or a fresh webhook result invalidates what's being said right now.
  • Cutting off a long pre-recorded audio clip. The caller gave new intent mid-playback and you've decided to stop the clip early, regardless of the barge_in configuration set on that audio action.

If all you want is for the caller to be able to interrupt by speaking, you don't need this action — use the barge_in configuration on the speak or audio action instead. See Barge-In Configuration for the available strategies.

Examples

Node.js

javascript
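const express = require('express');
const app = express();
app.use(express.json()); // parse JSON event bodies into req.body

app.post('/webhook', (req, res) => {
  const event = req.body;

  // correctionNeeded is your own application logic (not shown here).
  if (event.type === 'user_speak' && correctionNeeded(event.text)) {
    // Stop the current playback, then speak the correction.
    return res.json([
      { type: 'barge_in', session_id: event.session.id },
      {
        type: 'speak',
        session_id: event.session.id,
        text: 'Sorry, let me correct that.',
      },
    ]);
  }

  // Always send a response so the request does not hang. An empty action
  // array for "no action" is an assumption; check the webhook docs.
  return res.json([]);
});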

Python

python
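from flask import Flask, jsonify, request

app = Flask(__name__)


@app.route('/webhook', methods=['POST'])
def webhook():
    event = request.json

    # correction_needed is your own application logic (not shown here).
    if event['type'] == 'user_speak' and correction_needed(event['text']):
        # Stop the current playback, then speak the correction.
        return jsonify([
            { 'type': 'barge_in', 'session_id': event['session']['id'] },
            {
                'type': 'speak',
                'session_id': event['session']['id'],
                'text': 'Sorry, let me correct that.',
            },
        ])

    # A Flask view must return a response. An empty action array for
    # "no action" is an assumption; check the webhook docs.
    return jsonify([])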

Go

go
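// Inside an http.HandlerFunc: w is the http.ResponseWriter and sessionID
// comes from the incoming event. Requires "encoding/json" and "log".
actions := []map[string]interface{}{
    {"type": "barge_in", "session_id": sessionID},
    {"type": "speak", "session_id": sessionID, "text": "Sorry, let me correct that."},
}
w.Header().Set("Content-Type", "application/json")
if err := json.NewEncoder(w).Encode(actions); err != nil {
    log.Printf("encode actions: %v", err)
}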

Behavior Notes

  • barge_in is a no-op if nothing is currently being played. It does not produce an error.
  • The service emits an assistant_speech_ended event for the interrupted speak/audio, followed by the events for the next action in the array.
  • Array entries are processed strictly in order. Putting barge_in after a speak in the same array does not "cancel" that speak before it starts — the speak is dispatched first, then barge_in stops it mid-playback.
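
The notes above can be illustrated with a toy model of in-order processing. This is not the service's implementation: simulate is a hypothetical function, the log strings are invented for readability, and the model deliberately does not cover what happens when a speak arrives while other audio is still playing.

```python
def simulate(actions, playing=None):
    """Toy model of in-order processing; returns (log, what is still playing)."""
    log = []
    for action in actions:
        if action["type"] == "barge_in":
            if playing is None:
                # Nothing to stop: barge_in is a no-op, not an error.
                log.append("barge_in: nothing playing, no-op")
            else:
                log.append(f"barge_in: stopped {playing!r}")
                log.append("event: assistant_speech_ended")
                playing = None
        elif action["type"] in ("speak", "audio"):
            playing = action.get("text", "<audio>")
            log.append(f"{action['type']}: started {playing!r}")
    return log, playing


if __name__ == "__main__":
    sid = "550e8400-e29b-41d4-a716-446655440000"
    speak = {"type": "speak", "session_id": sid, "text": "Hello"}
    barge = {"type": "barge_in", "session_id": sid}

    # barge_in first: the new speak is left playing.
    print(simulate([barge, speak]))
    # barge_in last: the speak is dispatched, then cut off.
    print(simulate([speak, barge]))
```

In this model, placing barge_in last leaves nothing playing, which matches the note that it stops the speak mid-playback instead of cancelling it up front.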

Next Steps