Testing Voice Assistants Without Making Phone Calls
Testing voice assistants is challenging - you can't just write unit tests and call it a day. Real phone calls are slow, awkward to automate, and expensive at scale. This guide covers practical strategies for testing your sipgate AI Flow integration at every level.
The Testing Challenge
Voice assistants have unique testing challenges:
- Real calls are slow - Each test takes 30+ seconds of actual talking
- Hard to automate - You can't easily script "say this, wait for response"
- Expensive at scale - Phone minutes add up during development
- Environment-dependent - Need a publicly accessible webhook URL
- Non-deterministic - Speech recognition varies, LLM responses vary
The solution: test at multiple levels, saving real phone calls for final validation.
Testing Pyramid for Voice AI
Level 1: Unit Tests
Test your business logic in isolation - no sipgate, no LLM calls.
typescript
// utils/intent-detection.ts
export function detectIntent(text: string): 'greeting' | 'question' | 'goodbye' | 'unknown' {
  const lower = text.toLowerCase()
  // \b keeps short keywords from matching inside longer words ("hi" in "history")
  if (/^(hi|hello|hey|good morning)\b/.test(lower)) return 'greeting'
  if (/\b(bye|goodbye|see you|thanks)\b/.test(lower)) return 'goodbye'
  if (lower.includes('?')) return 'question'
  return 'unknown'
}

// utils/intent-detection.test.ts
import { describe, it, expect } from 'vitest'
import { detectIntent } from './intent-detection'

describe('detectIntent', () => {
  it('detects greetings', () => {
    expect(detectIntent('Hello there')).toBe('greeting')
    expect(detectIntent('Hi!')).toBe('greeting')
    expect(detectIntent('Good morning')).toBe('greeting')
  })

  it('detects questions', () => {
    expect(detectIntent('What are your hours?')).toBe('question')
    expect(detectIntent('Can you help me?')).toBe('question')
  })

  it('detects goodbyes', () => {
    expect(detectIntent('Goodbye')).toBe('goodbye')
    expect(detectIntent('Thanks, bye!')).toBe('goodbye')
  })
})
What to unit test:
- Intent detection logic
- Response formatting
- State machine transitions
- Phone number normalization
- TTS configuration building
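As an illustration of one item from this list, phone number normalization is a pure function and therefore cheap to unit test. The sketch below is an assumption, not part of sipgate AI Flow: `normalizePhoneNumber` and its `defaultCountryCode` parameter are hypothetical names, and the rules (strip formatting, turn a leading `00` into `+`, treat a single leading `0` as a national prefix) cover only the common cases.

```typescript
// Hypothetical helper: converts user-entered numbers to a "+<digits>" form.
// Returns null when the input contains no digits at all.
export function normalizePhoneNumber(
  input: string,
  defaultCountryCode = '+49', // assumption: German numbers by default
): string | null {
  const trimmed = input.trim()
  const hasPlus = trimmed.startsWith('+')
  const digits = trimmed.replace(/\D/g, '') // drop spaces, dashes, parentheses

  if (digits.length === 0) return null
  if (hasPlus) return `+${digits}`
  // International prefix written as "00"
  if (digits.startsWith('00')) return `+${digits.slice(2)}`
  // National format with a single leading zero
  if (digits.startsWith('0')) return `${defaultCountryCode}${digits.slice(1)}`
  return `${defaultCountryCode}${digits}`
}
```

Each branch maps to one assertion in a unit test, for example `expect(normalizePhoneNumber('0211 123456')).toBe('+49211123456')`. Notations such as `+49 (0)211 ...` are not handled by this sketch and would need an extra rule.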
Level 2: Chat Simulator
Build a text-based interface that uses the same LLM logic as your voice assistant. This lets you rapidly iterate on prompts and conversation flow without any phone infrastructure.
typescript
// The key insight: extract your LLM logic into a shared service
// lib/conversation-service.ts
export async function generateResponse(params: {
  systemPrompt: string
  conversationHistory: { role: 'user' | 'assistant'; content: string }[]
  userMessage: string
}): Promise<string> {
  // Your LLM call logic here
  // This is used by BOTH the webhook AND the chat simulator
  throw new Error('generateResponse is a placeholder: add your LLM call here')
}
typescript
// Webhook uses it
async function handleUserSpeak(event: UserSpeakEvent) {
  const response = await generateResponse({
    systemPrompt: assistant.system_prompt,
    conversationHistory: history,
    userMessage: event.text,
  })
  return speak(response)
}

// Chat simulator uses the SAME function
async function handleChatMessage(message: string, sessionId: string) {
  const response = await generateResponse({
    systemPrompt: assistant.system_prompt,
    conversationHistory: history,
    userMessage: message,
  })
  return { response }
}
Benefits:
- Test conversation flow in seconds, not minutes
- Iterate on system prompts quickly
- Debug LLM issues without phone overhead
- Share sessions with teammates for review
Limitations:
- Doesn't test speech recognition accuracy
- Doesn't test TTS pronunciation
- Doesn't test real-time timing
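A chat simulator also needs somewhere to keep the conversation history for each session. The following is a minimal sketch of that piece, assuming an in-memory `Map` is enough for local testing. `createChatSimulator` and the `Generate` type are hypothetical names. The generator is injected, so the real `generateResponse` or a fake can be passed in.

```typescript
type Message = { role: 'user' | 'assistant'; content: string }

// Same parameter shape as generateResponse in the shared service
type Generate = (params: {
  systemPrompt: string
  conversationHistory: Message[]
  userMessage: string
}) => Promise<string>

// Hypothetical helper: keeps one history per session ID, in memory only
export function createChatSimulator(generate: Generate, systemPrompt: string) {
  const histories = new Map<string, Message[]>()

  return {
    async send(sessionId: string, message: string): Promise<string> {
      const history = histories.get(sessionId) ?? []
      const response = await generate({
        systemPrompt,
        conversationHistory: [...history], // copy, so the callee cannot mutate stored state
        userMessage: message,
      })
      // Record both turns only after the call succeeded
      histories.set(sessionId, [
        ...history,
        { role: 'user', content: message },
        { role: 'assistant', content: response },
      ])
      return response
    },
    history(sessionId: string): readonly Message[] {
      return histories.get(sessionId) ?? []
    },
    reset(sessionId: string): void {
      histories.delete(sessionId)
    },
  }
}
```

In a chat endpoint this could be used as `const sim = createChatSimulator(generateResponse, assistant.system_prompt)` followed by `await sim.send(sessionId, message)`. Because the history lives in process memory, it disappears on restart, which is usually acceptable for a development tool.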
Level 3: Event Simulation
Send fake sipgate events directly to your webhook. This tests your actual webhook handler without needing a phone call.
Manual Testing with curl
bash
# Simulate session_start
curl -X POST http://localhost:3000/api/webhook \
  -H "Content-Type: application/json" \
  -d '{
    "type": "session_start",
    "session": {
      "id": "test-session-123",
      "account_id": "test-account",
      "phone_number": "+1234567890",
      "direction": "inbound",
      "from_phone_number": "+0987654321",
      "to_phone_number": "+1234567890"
    }
  }'

# Simulate user_speak
curl -X POST http://localhost:3000/api/webhook \
  -H "Content-Type: application/json" \
  -d '{
    "type": "user_speak",
    "session": {
      "id": "test-session-123",
      "account_id": "test-account",
      "phone_number": "+1234567890"
    },
    "text": "What are your business hours?"
  }'

# Simulate user_speak with interruption (barge_in)
curl -X POST http://localhost:3000/api/webhook \
  -H "Content-Type: application/json" \
  -d '{
    "type": "user_speak",
    "session": {
      "id": "test-session-123",
      "account_id": "test-account",
      "phone_number": "+1234567890"
    },
    "text": "Actually, never mind",
    "barged_in": true
  }'

# Simulate session_end
curl -X POST http://localhost:3000/api/webhook \
  -H "Content-Type: application/json" \
  -d '{
    "type": "session_end",
    "session": {
      "id": "test-session-123",
      "account_id": "test-account",
      "phone_number": "+1234567890"
    },
    "reason": "caller_hangup"
  }'
Automated Integration Tests
typescript
// tests/webhook.test.ts
import { describe, it, expect } from 'vitest'

const WEBHOOK_URL = 'http://localhost:3000/api/webhook'

describe('Webhook Integration', () => {
  const sessionId = `test-${Date.now()}`

  it('handles session_start and returns greeting', async () => {
    const response = await fetch(WEBHOOK_URL, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        type: 'session_start',
        session: {
          id: sessionId,
          account_id: 'test',
          phone_number: '+1234567890',
          direction: 'inbound',
          from_phone_number: '+0987654321',
          to_phone_number: '+1234567890',
        },
      }),
    })
    expect(response.ok).toBe(true)
    const data = await response.json()
    expect(data.type).toBe('speak')
    expect(data.text).toBeTruthy()
    expect(data.session_id).toBe(sessionId)
  })

  it('handles user_speak and returns response', async () => {
    const response = await fetch(WEBHOOK_URL, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        type: 'user_speak',
        session: { id: sessionId, account_id: 'test', phone_number: '+1234567890' },
        text: 'What are your hours?',
      }),
    })
    expect(response.ok).toBe(true)
    const data = await response.json()
    expect(data.type).toBe('speak')
    expect(data.text).toBeTruthy()
  })

  it('handles barge-in gracefully', async () => {
    const response = await fetch(WEBHOOK_URL, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        type: 'user_speak',
        barged_in: true,
        session: { id: sessionId, account_id: 'test', phone_number: '+1234567890' },
        text: 'Wait',
      }),
    })
    expect(response.ok).toBe(true)
    // Could be 204 or a speak action
    if (response.status !== 204) {
      const data = await response.json()
      expect(data.type).toBe('speak')
    }
  })

  it('handles session_end and cleans up', async () => {
    const response = await fetch(WEBHOOK_URL, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        type: 'session_end',
        session: { id: sessionId, account_id: 'test', phone_number: '+1234567890' },
        reason: 'caller_hangup',
      }),
    })
    expect(response.ok).toBe(true)
  })
})
Conversation Flow Tests
Test complete conversation scenarios:
typescript
// tests/flows/booking-flow.test.ts
import { describe, it, expect } from 'vitest'

const WEBHOOK_URL = 'http://localhost:3000/api/webhook'

// Minimal helper assumed by this example: POST one event, return the parsed action
async function sendEvent(event: Record<string, unknown>) {
  const response = await fetch(WEBHOOK_URL, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(event),
  })
  // A 204 response carries no action and no body to parse
  if (response.status === 204) return {}
  return response.json()
}

async function simulateConversation(messages: string[]): Promise<string[]> {
  const sessionId = `test-${Date.now()}`
  const responses: string[] = []

  // Start session
  await sendEvent({ type: 'session_start', session: { id: sessionId, /* ... */ } })

  // Simulate each user message
  for (const message of messages) {
    const response = await sendEvent({
      type: 'user_speak',
      session: { id: sessionId, /* ... */ },
      text: message,
    })
    responses.push(response.text)
  }

  // End session
  await sendEvent({ type: 'session_end', session: { id: sessionId, /* ... */ } })
  return responses
}

describe('Booking Flow', () => {
  it('completes a booking conversation', async () => {
    const responses = await simulateConversation([
      'I want to book an appointment',
      'Tomorrow at 2pm',
      'John Smith',
      'Yes, that is correct',
    ])
    expect(responses[0]).toMatch(/when|date|time/i)
    expect(responses[1]).toMatch(/name/i)
    expect(responses[2]).toMatch(/confirm/i)
    expect(responses[3]).toMatch(/booked|confirmed|scheduled/i)
  })

  it('handles corrections mid-flow', async () => {
    const responses = await simulateConversation([
      'I want to book an appointment',
      'Tomorrow at 2pm',
      'Actually, make it 3pm instead',
    ])
    // Match the corrected time only; a bare "pm" would also match the original "2pm"
    expect(responses[2]).toMatch(/3\s*pm|three/i)
  })
})
Level 4: Local Development with ngrok
For testing with real sipgate infrastructure (but simulated calls), expose your local server:
bash
# Start your dev server
npm run dev
# In another terminal, expose it
ngrok http 3000
Configure the ngrok URL as your webhook endpoint in sipgate. Now sipgate can reach your local development server.
Use cases:
- Test webhook authentication
- Test with sipgate's actual event format
- Debug production issues locally
Level 5: Real Phone Calls
Save these for final validation. Create a testing checklist:
markdown
## Pre-Release Phone Test Checklist
### Basic Flow
- [ ] Call connects and greeting plays
- [ ] Assistant responds to simple question
- [ ] Assistant handles "I don't understand" gracefully
- [ ] Call ends cleanly when user says goodbye
### Barge-In
- [ ] Interrupting mid-sentence works
- [ ] Assistant acknowledges interruption
- [ ] No "stale" responses after interruption
### Edge Cases
- [ ] Long silence from user (10+ seconds)
- [ ] Very long user input (30+ seconds of speaking)
- [ ] Background noise doesn't trigger false responses
- [ ] Accent/dialect recognition (if applicable)
### Error Handling
- [ ] Network timeout during LLM call
- [ ] Invalid user input
- [ ] Session state recovery after errors
Testing Utilities
Event Factory
Create a helper for generating test events:
typescript
// tests/utils/event-factory.ts
export function createSessionStartEvent(overrides = {}) {
  return {
    type: 'session_start',
    session: {
      id: `test-${Date.now()}`,
      account_id: 'test-account',
      phone_number: '+1234567890',
      direction: 'inbound',
      from_phone_number: '+0987654321',
      to_phone_number: '+1234567890',
    },
    ...overrides,
  }
}

export function createUserSpeakEvent(sessionId: string, text: string, overrides = {}) {
  return {
    type: 'user_speak',
    session: {
      id: sessionId,
      account_id: 'test-account',
      phone_number: '+1234567890',
    },
    text,
    ...overrides,
  }
}

export function createBargeInEvent(sessionId: string, text: string, overrides = {}) {
  return {
    type: 'user_speak',
    barged_in: true,
    session: {
      id: sessionId,
      account_id: 'test-account',
      phone_number: '+1234567890',
    },
    text,
    ...overrides,
  }
}
Response Assertions
typescript
// tests/utils/assertions.ts
import { expect } from 'vitest'

export function assertSpeakAction(response: any, options: {
  containsText?: string
  sessionId?: string
} = {}) {
  expect(response.type).toBe('speak')
  expect(response.text).toBeTruthy()
  expect(response.tts).toBeDefined()
  if (options.containsText) {
    expect(response.text.toLowerCase()).toContain(options.containsText.toLowerCase())
  }
  if (options.sessionId) {
    expect(response.session_id).toBe(options.sessionId)
  }
}

export function assertTransferAction(response: any, targetNumber?: string) {
  expect(response.type).toBe('transfer')
  expect(response.target).toBeTruthy()
  if (targetNumber) {
    expect(response.target).toBe(targetNumber)
  }
}
CI/CD Integration
Run event simulation tests in your pipeline:
yaml
# .github/workflows/test.yml
name: Test Voice Assistant
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Setup Node
        uses: actions/setup-node@v4
        with:
          node-version: '20'
      - name: Install dependencies
        run: npm ci
      - name: Run unit tests
        run: npm test
      - name: Start server
        run: npm run dev &
        env:
          # Use test/mock API keys
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY_TEST }}
      - name: Wait for server
        run: npx wait-on http://localhost:3000/api/health
      - name: Run integration tests
        run: npm run test:integration
Best Practices Summary
- Extract shared logic - Same LLM service for chat and voice
- Test the pyramid - Most tests at unit level, fewest at phone level
- Automate event simulation - Integration tests catch regressions
- Use deterministic test data - Fixed session IDs, predictable inputs
- Test conversation flows - Not just individual events
- Create test utilities - Event factories, response assertions
- Run in CI - Catch issues before deployment
- Save phone tests for validation - Manual checklist for final sign-off
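One way to get the predictable inputs mentioned in this list is to swap the LLM for a scripted stub during automated tests, so that assertions on response text do not depend on model output. The sketch below is an assumption about how that could look. `createScriptedLlm` is a hypothetical helper whose `generateResponse` accepts the same parameters as the shared service function in this guide.

```typescript
type GenerateParams = {
  systemPrompt: string
  conversationHistory: { role: 'user' | 'assistant'; content: string }[]
  userMessage: string
}

// Hypothetical test double: returns canned responses in order and records every call
export function createScriptedLlm(script: string[]) {
  const calls: GenerateParams[] = []
  let index = 0

  async function generateResponse(params: GenerateParams): Promise<string> {
    calls.push(params)
    const next = script[index]
    if (next === undefined) {
      // Fail loudly instead of returning undefined when the test sends too many messages
      throw new Error(`Scripted LLM exhausted after ${script.length} responses`)
    }
    index += 1
    return next
  }

  return { generateResponse, calls }
}
```

A test can then pass `createScriptedLlm([...]).generateResponse` wherever the real function would be injected and inspect `calls` to verify which history and user message reached the LLM layer. Tests that use the stub check your own logic only; prompt quality still needs the chat simulator or real calls.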
Related Documentation
- HTTP Webhooks - Webhook endpoint reference
- Event Types - All event structures
- Action Types - Response format reference