Appearance
Handling Long-Running Requests in Voice AI: The Async Hold Pattern
When building voice AI assistants that integrate with external tools (like MCP servers, RAG systems, or slow APIs), you'll inevitably face a challenge: sipgate AI Flow has webhook timeout limits, but your backend operations might take much longer.
This guide explains how to implement an elegant solution we call the "Async Hold Pattern" - keeping callers engaged while your system processes their request in the background.
The Problem
sipgate AI Flow enforces webhook timeout limits of approximately 5 seconds. If your server doesn't respond in time, the platform may drop the connection or return an error to the caller.
But what if your assistant needs to:
- Query a slow external API (20+ seconds)
- Search through a large knowledge base
- Call an MCP (Model Context Protocol) server
- Perform complex RAG operations
- Fetch real-time data from third-party services
You can't make the caller wait in silence, and you can't speed up the external service. So what do you do?
The Solution: Async Hold Pattern
Instead of blocking on the slow operation, we:
- Start the operation in the background (don't await it)
- Wait briefly for a quick response (e.g., 4 seconds)
- If completed quickly → return the result directly
- If still pending → tell the caller to wait, then check again when they're done listening
This leverages a key insight: sipgate AI Flow sends an assistant_speech_ended event when the assistant finishes speaking. We can use this to create a polling loop that keeps the caller informed.
Implementation
Step 1: Create a Pending State Manager
First, we need a way to store the background promise and track state across webhook calls. Since each webhook call is a separate HTTP request, we need shared state:
typescript
// pending-state.ts
interface PendingState {
promise: Promise<{ response: string; error?: string }>
startedAt: number
holdMessageCount: number
userMessage: string
}
// In-memory store (use Redis for multi-instance deployments)
const pendingStates = new Map<string, PendingState>()
// Hold messages - rotate through these while waiting
const HOLD_MESSAGES = [
'One moment, let me check...',
'Still searching...',
'Just a moment longer...',
'Almost there...',
]
// How long to wait before responding (stay under sipgate's timeout!)
const WAIT_BEFORE_RESPONSE_MS = 4000
export function startPending(
sessionId: string,
promise: Promise<{ response: string; error?: string }>,
userMessage: string
): void {
pendingStates.set(sessionId, {
promise,
startedAt: Date.now(),
holdMessageCount: 0,
userMessage,
})
}
export function hasPending(sessionId: string): boolean {
return pendingStates.has(sessionId)
}
export function cancelPending(sessionId: string): void {
pendingStates.delete(sessionId)
}
export function getNextHoldMessage(sessionId: string): string {
const state = pendingStates.get(sessionId)
if (!state) return HOLD_MESSAGES[0]
const index = Math.min(state.holdMessageCount, HOLD_MESSAGES.length - 1)
state.holdMessageCount++
return HOLD_MESSAGES[index]
}
export async function waitForCompletion(
sessionId: string
): Promise<{ response: string; error?: string } | null> {
const state = pendingStates.get(sessionId)
if (!state) return null
// Race between the promise and a timeout
const timeoutPromise = new Promise<null>((resolve) => {
setTimeout(() => resolve(null), WAIT_BEFORE_RESPONSE_MS)
})
const result = await Promise.race([state.promise, timeoutPromise])
if (result !== null) {
// Completed! Clean up and return
pendingStates.delete(sessionId)
return result
}
return null // Still pending
}Step 2: Handle the Initial user_speak Event
When a user speaks, start the background operation and wait briefly:
typescript
// webhook-handler.ts
async function handleUserSpeak(event: {
type: 'user_speak'
session: { id: string }
text: string
}) {
const sessionId = event.session.id
// Cancel any existing pending operation (user asked a new question)
cancelPending(sessionId)
// Start the slow operation in background (DON'T await the full operation!)
const operationPromise = performSlowOperation(event.text)
.then(result => ({ response: result }))
.catch(error => ({ response: '', error: String(error) }))
// Wait up to 4 seconds for completion
const INITIAL_WAIT_MS = 4000
const timeoutPromise = new Promise<null>((resolve) => {
setTimeout(() => resolve(null), INITIAL_WAIT_MS)
})
const quickResult = await Promise.race([operationPromise, timeoutPromise])
if (quickResult !== null) {
// Completed quickly! Return result directly - no hold message needed
console.log('Operation completed within 4s - returning direct response')
if (quickResult.error) {
return createSpeakResponse(sessionId, 'I\'m sorry, there was an error.')
}
return createSpeakResponse(sessionId, quickResult.response)
}
// Taking too long - switch to hold pattern
console.log('Operation taking >4s - using hold pattern')
// Store the promise for the assistant_speech_ended handler
startPending(sessionId, operationPromise, event.text)
// Return hold message
return createSpeakResponse(sessionId, getNextHoldMessage(sessionId))
}
function createSpeakResponse(sessionId: string, text: string) {
return {
type: 'speak',
session_id: sessionId,
text: text,
tts: {
provider: 'azure',
language: 'en-US',
voice: 'en-US-JennyNeural',
},
}
}Step 3: Handle the assistant_speech_ended Event
This is the key insight: sipgate AI Flow sends an assistant_speech_ended event when the assistant finishes speaking. You can return a new action from this event!
typescript
async function handleAssistantSpeechEnded(event: {
type: 'assistant_speech_ended'
session: { id: string }
}) {
const sessionId = event.session.id
// No pending operation? Nothing to do - return 204 No Content
if (!hasPending(sessionId)) {
return new Response(null, { status: 204 })
}
// Wait for completion (up to 4 seconds to maximize processing time)
const result = await waitForCompletion(sessionId)
if (result !== null) {
// Done! Return the actual response
console.log('Operation completed - returning result')
if (result.error) {
return createSpeakResponse(
sessionId,
'I\'m sorry, I couldn\'t find that information.'
)
}
return createSpeakResponse(sessionId, result.response)
}
// Still pending - say another hold message and wait for next speech_ended
console.log('Still waiting - returning another hold message')
return createSpeakResponse(sessionId, getNextHoldMessage(sessionId))
}Step 4: Wire Up the Webhook Router
typescript
export async function POST(request: Request) {
const event = await request.json()
switch (event.type) {
case 'session_start':
return handleSessionStart(event)
case 'user_speak':
return handleUserSpeak(event)
case 'assistant_speech_ended':
return handleAssistantSpeechEnded(event)
case 'session_end':
// Clean up any pending state
cancelPending(event.session.id)
return handleSessionEnd(event)
default:
// Return 204 for events we don't handle
return new Response(null, { status: 204 })
}
}Key Insights
Why 4 Seconds?
sipgate AI Flow has approximately a 5 second timeout. We use 4 seconds to:
- Leave buffer for network latency
- Allow time for the response to be transmitted
- Stay safely under the limit
You can adjust this value, but always leave at least 500ms-1s of buffer.
The assistant_speech_ended Event is Powerful
Many developers overlook this event, but it's the key to the async pattern. When the assistant finishes speaking, sipgate sends this event and waits for your response. You can:
- Return a new
speakaction to continue talking - Return
204 No Contentto stay silent and wait for user input - Check if your background operation completed
This creates a natural polling mechanism without awkward silences.
Memory vs. Redis
The example uses an in-memory Map for simplicity. This works for single-instance deployments, but for production with multiple server instances behind a load balancer, use Redis:
typescript
import Redis from 'ioredis'
const redis = new Redis(process.env.REDIS_URL)
// Store metadata in Redis (promises can't be serialized)
export async function startPending(sessionId: string, ...) {
await redis.setex(
`pending:${sessionId}`,
120, // 2 minute TTL
JSON.stringify({
startedAt: Date.now(),
holdMessageCount: 0,
userMessage
})
)
// Keep promise reference in memory (same instance will handle it)
pendingPromises.set(sessionId, promise)
}Always Cancel Previous Operations
When a user asks a new question while you're still processing the old one, cancel the old operation:
typescript
// In user_speak handler - always cancel previous pending operation
cancelPending(event.session.id)This prevents confusion and wasted resources. The user doesn't care about the old answer anymore.
Clean Up on Session End
Always clean up when the call ends:
typescript
case 'session_end':
cancelPending(event.session.id)
return handleSessionEnd(event)Advanced: Caching Slow Initializations
If your slow operation has a one-time initialization step (like discovering available tools from an MCP server), cache it separately:
typescript
// BAD: Fetching tool definitions on every request
async function handleUserSpeak(event) {
const tools = await mcpServer.listTools() // SLOW - 5+ seconds!
const response = await llm.generate({ tools, message: event.text })
return createSpeakResponse(sessionId, response)
}
// GOOD: Cache tool definitions, only fetch once
async function handleUserSpeak(event) {
// Tools were cached when server was configured
const tools = await database.get('mcp_server_tools', serverId) // FAST - <100ms
const response = await llm.generate({ tools, message: event.text })
return createSpeakResponse(sessionId, response)
}
// Cache tools when MCP server is configured (admin action)
async function configureMcpServer(serverUrl: string) {
const tools = await mcpServer.listTools() // Slow, but only happens once
await database.set('mcp_server_tools', serverId, tools)
}This separates "one-time setup" (caching tool definitions) from "per-request work" (calling tools), dramatically improving response times.
Complete Minimal Example
Here's a self-contained example you can adapt:
typescript
// ============================================
// pending-state.ts
// ============================================
const pendingStates = new Map<string, {
promise: Promise<{ response: string; error?: string }>
holdCount: number
}>()
const HOLD_MESSAGES = [
'One moment please...',
'Still searching...',
'Almost there...',
]
const WAIT_MS = 4000
export const pending = {
start(id: string, promise: Promise<{ response: string; error?: string }>) {
pendingStates.set(id, { promise, holdCount: 0 })
},
has(id: string): boolean {
return pendingStates.has(id)
},
cancel(id: string): void {
pendingStates.delete(id)
},
getHoldMessage(id: string): string {
const state = pendingStates.get(id)
if (!state) return HOLD_MESSAGES[0]
return HOLD_MESSAGES[Math.min(state.holdCount++, HOLD_MESSAGES.length - 1)]
},
async wait(id: string): Promise<{ response: string; error?: string } | null> {
const state = pendingStates.get(id)
if (!state) return null
const timeout = new Promise<null>(r => setTimeout(() => r(null), WAIT_MS))
const result = await Promise.race([state.promise, timeout])
if (result !== null) {
pendingStates.delete(id)
}
return result
},
}
// ============================================
// webhook.ts
// ============================================
import { pending } from './pending-state'
export async function POST(req: Request): Promise<Response> {
const event = await req.json()
const sessionId = event.session.id
// Handle user_speak - start background operation
if (event.type === 'user_speak') {
pending.cancel(sessionId) // Cancel any previous operation
// Start slow operation (don't await fully!)
const promise = slowExternalApiCall(event.text)
.then(result => ({ response: result }))
.catch(err => ({ response: '', error: String(err) }))
// Wait up to 4 seconds
const timeout = new Promise<null>(r => setTimeout(() => r(null), 4000))
const quick = await Promise.race([promise, timeout])
// If completed quickly, return result directly
if (quick !== null) {
if (quick.error) {
return speak(sessionId, 'Sorry, there was an error.')
}
return speak(sessionId, quick.response)
}
// Taking too long - use hold pattern
pending.start(sessionId, promise)
return speak(sessionId, pending.getHoldMessage(sessionId))
}
// Handle assistant_speech_ended - check if operation completed
if (event.type === 'assistant_speech_ended') {
if (!pending.has(sessionId)) {
return new Response(null, { status: 204 })
}
const result = await pending.wait(sessionId)
if (result !== null) {
if (result.error) {
return speak(sessionId, 'The request could not be processed.')
}
return speak(sessionId, result.response)
}
// Still waiting - another hold message
return speak(sessionId, pending.getHoldMessage(sessionId))
}
// Handle session_end - clean up
if (event.type === 'session_end') {
pending.cancel(sessionId)
}
return new Response(null, { status: 204 })
}
function speak(sessionId: string, text: string): Response {
return Response.json({
type: 'speak',
session_id: sessionId,
text,
tts: {
provider: 'azure',
language: 'en-US',
voice: 'en-US-GuyNeural',
},
})
}
// Your slow operation (replace with actual implementation)
async function slowExternalApiCall(query: string): Promise<string> {
// Simulating a slow API call
await new Promise(r => setTimeout(r, 15000))
return `Here's what I found about "${query}"...`
}Conclusion
The Async Hold Pattern transforms a technical limitation into a natural conversation flow. Instead of timing out or making users wait in awkward silence, your assistant says "One moment please..." - just like a human would.
Key takeaways:
- Start slow operations without awaiting - let them run in the background
- Wait briefly (4 seconds) before deciding to use hold messages
- Use the
assistant_speech_endedevent to poll for completion - Keep messages varied - rotate through different hold phrases
- Always clean up - cancel pending operations when no longer needed
- Cache when possible - separate one-time setup from per-request work
This pattern works with any slow backend operation: MCP servers, RAG pipelines, external APIs, database queries, or anything else that might exceed the webhook timeout.
For more information about sipgate AI Flow events and actions, see the sipgate AI Flow API documentation.