Barge-In Action
Immediately stop whatever audio the service is currently playing to the caller (synthesized speech from a speak action or pre-recorded audio from an audio action). This is the manual, application-triggered counterpart to the automatic user-driven interruption.
Action vs. configuration — don't confuse these
Two things are called "barge-in" and they do different things:
- barge_in action (this page): a top-level action you send, { "type": "barge_in", "session_id": "..." }. You interrupt the playback, right now, from your application.
- barge_in config on speak / audio actions: an optional object describing how and when the caller is allowed to interrupt. See Barge-In Configuration.
The action stops current playback. The configuration controls whether the caller is allowed to do the same thing by speaking.
Action Structure
json
{
  "type": "barge_in",
  "session_id": "550e8400-e29b-41d4-a716-446655440000"
}
Fields
| Field | Type | Required | Description |
|---|---|---|---|
| type | string | Yes | Always "barge_in" |
| session_id | string (UUID) | Yes | Session identifier from an event |
The action has no other fields. It always targets whatever is currently being played on this session.
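As a minimal sketch of the structure above, the following Python helper builds a barge_in action and checks that the session ID parses as a UUID before sending. The function name `make_barge_in` is hypothetical and not part of the API; only the `type` and `session_id` fields come from this page.

```python
import json
import uuid


def make_barge_in(session_id: str) -> dict:
    """Build a barge_in action (hypothetical helper, not part of the API)."""
    # Raises ValueError if session_id is not a well-formed UUID string.
    uuid.UUID(session_id)
    # The action carries exactly two fields: type and session_id.
    return {"type": "barge_in", "session_id": session_id}


if __name__ == "__main__":
    action = make_barge_in("550e8400-e29b-41d4-a716-446655440000")
    print(json.dumps(action))
```

Validating locally is optional; it only catches a malformed ID before the request leaves your application.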
Typical Pattern — Interrupt Then Speak
The most useful form is an array of actions: first barge_in to cut off the current playback, then speak (or audio) with the new content. The service executes array entries in order, so the caller hears the playback stop and the new message begin without any manual coordination on your side.
json
[
  {
    "type": "barge_in",
    "session_id": "550e8400-e29b-41d4-a716-446655440000"
  },
  {
    "type": "speak",
    "session_id": "550e8400-e29b-41d4-a716-446655440000",
    "text": "Sorry, let me correct that — your order ships tomorrow, not today."
  }
]
This works anywhere an action response is accepted: HTTP webhook response body, WebSocket message, or external API POST.
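The interrupt-then-speak pattern can be wrapped in a small builder so the ordering is never accidentally reversed. This is only a sketch: `interrupt_then_speak` is a hypothetical name, and the code assumes the `speak` action needs no fields beyond the `type`, `session_id`, and `text` shown on this page.

```python
import json


def interrupt_then_speak(session_id: str, text: str) -> list[dict]:
    """Return [barge_in, speak] for one session (hypothetical helper)."""
    return [
        # First entry stops whatever is currently playing.
        {"type": "barge_in", "session_id": session_id},
        # Second entry starts the replacement utterance.
        {"type": "speak", "session_id": session_id, "text": text},
    ]


if __name__ == "__main__":
    actions = interrupt_then_speak(
        "550e8400-e29b-41d4-a716-446655440000",
        "Sorry, let me correct that.",
    )
    # Serialize for a webhook response body, WebSocket message, or POST.
    print(json.dumps(actions, indent=2))
```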
Replace in-progress audio with new audio
json
[
  { "type": "barge_in", "session_id": "550e8400-e29b-41d4-a716-446655440000" },
  {
    "type": "audio",
    "session_id": "550e8400-e29b-41d4-a716-446655440000",
    "audio": "UklGRiQAAABXQVZFZm10IBAAAAABAAEA..."
  }
]
Stop playback without saying anything
Send barge_in on its own if you only want silence (for example, to cut off a long response because an external system just produced a final answer you're about to deliver separately):
json
{
  "type": "barge_in",
  "session_id": "550e8400-e29b-41d4-a716-446655440000"
}
When to Use It
- Agent self-correction. Your LLM streamed a tentative answer via speak, then a tool call returned a better one. Send [barge_in, speak] to replace the in-flight utterance.
- External event trumps current playback. A human operator joins, a priority notification arrives, or a fresh webhook result invalidates what's being said right now.
- Cutting off a long pre-recorded audio clip. The caller gave new intent mid-playback and you've decided to stop the clip early, regardless of their barge_in configuration.
If all you want is for the caller to be able to interrupt by speaking, you don't need this action — use the barge_in configuration on the speak or audio action instead. See Barge-In Configuration for the available strategies.
Examples
Node.js
javascript
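app.post('/webhook', (req, res) => {
  const event = req.body;
  // correctionNeeded is an application-defined helper, not part of the API.
  if (event.type === 'user_speak' && correctionNeeded(event.text)) {
    return res.json([
      { type: 'barge_in', session_id: event.session.id },
      {
        type: 'speak',
        session_id: event.session.id,
        text: 'Sorry, let me correct that.',
      },
    ]);
  }
  // Always answer the request; an empty 204 is assumed here to mean "no action".
  return res.status(204).end();
});
Python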
python
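@app.route('/webhook', methods=['POST'])
def webhook():
    event = request.json
    # correction_needed is an application-defined helper, not part of the API.
    if event['type'] == 'user_speak' and correction_needed(event['text']):
        return jsonify([
            {'type': 'barge_in', 'session_id': event['session']['id']},
            {
                'type': 'speak',
                'session_id': event['session']['id'],
                'text': 'Sorry, let me correct that.',
            },
        ])
    # Always return a response; an empty 204 is assumed here to mean "no action".
    return '', 204
Go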
go
actions := []map[string]interface{}{
	{"type": "barge_in", "session_id": sessionID},
	{"type": "speak", "session_id": sessionID, "text": "Sorry, let me correct that."},
}
json.NewEncoder(w).Encode(actions)
Behavior Notes
- barge_in is a no-op if nothing is currently being played. It does not produce an error.
- The service emits an assistant_speech_ended event for the interrupted speak / audio, followed by the events for the next action in the array.
- Array entries are processed strictly in order. Putting barge_in after a speak in the same array does not "cancel" that speak before it starts: the speak is dispatched first, then barge_in stops it mid-playback.
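Because ordering matters, it may help to check an action array before sending it. The sketch below flags any barge_in that comes after a speak or audio entry in the same array, which per the note above would cut that playback off mid-way rather than prevent it. The helper `late_barge_in_indexes` is hypothetical and runs purely in your application; it does not call the service.

```python
PLAYBACK_TYPES = {"speak", "audio"}


def late_barge_in_indexes(actions: list[dict]) -> list[int]:
    """Return indexes of barge_in entries that follow a speak/audio entry.

    Hypothetical lint helper, not part of the API.
    """
    seen_playback = False
    late = []
    for i, action in enumerate(actions):
        kind = action.get("type")
        if kind in PLAYBACK_TYPES:
            # A playback action earlier in the array is dispatched first.
            seen_playback = True
        elif kind == "barge_in" and seen_playback:
            # This barge_in would interrupt the playback queued before it.
            late.append(i)
    return late


if __name__ == "__main__":
    sid = "550e8400-e29b-41d4-a716-446655440000"
    good = [{"type": "barge_in", "session_id": sid},
            {"type": "speak", "session_id": sid, "text": "Hi"}]
    bad = list(reversed(good))
    print(late_barge_in_indexes(good), late_barge_in_indexes(bad))
```

An empty result means every barge_in precedes the playback actions in the array, matching the interrupt-then-speak pattern.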
Next Steps
- Barge-In Configuration — Let the caller interrupt by speaking (strategies, timing)
- Speak Action — Synthesize and play text
- Audio Action — Play pre-recorded audio
- Action Types — Complete action reference