Testing Voice Assistants Without Making Phone Calls

Testing voice assistants is challenging - you can't just write unit tests and call it a day. Real phone calls are slow, awkward to automate, and expensive at scale. This guide covers practical strategies for testing your sipgate AI Flow integration at every level.

The Testing Challenge

Voice assistants have unique testing challenges:

  • Real calls are slow - Each test takes 30+ seconds of actual talking
  • Hard to automate - You can't easily script "say this, wait for response"
  • Expensive at scale - Phone minutes add up during development
  • Environment-dependent - Need a publicly accessible webhook URL
  • Non-deterministic - Speech recognition varies, LLM responses vary

The solution: test at multiple levels, saving real phone calls for final validation.

Testing Pyramid for Voice AI

Level 1: Unit Tests

Test your business logic in isolation - no sipgate, no LLM calls.

typescript
// utils/intent-detection.ts
export function detectIntent(text: string): 'greeting' | 'question' | 'goodbye' | 'unknown' {
  const lower = text.toLowerCase()
  if (lower.match(/^(hi|hello|hey|good morning)/)) return 'greeting'
  if (lower.match(/(bye|goodbye|see you|thanks)/)) return 'goodbye'
  if (lower.includes('?')) return 'question'
  return 'unknown'
}

// utils/intent-detection.test.ts
import { describe, it, expect } from 'vitest'
import { detectIntent } from './intent-detection'

describe('detectIntent', () => {
  it('detects greetings', () => {
    expect(detectIntent('Hello there')).toBe('greeting')
    expect(detectIntent('Hi!')).toBe('greeting')
    expect(detectIntent('Good morning')).toBe('greeting')
  })

  it('detects questions', () => {
    expect(detectIntent('What are your hours?')).toBe('question')
    expect(detectIntent('Can you help me?')).toBe('question')
  })

  it('detects goodbyes', () => {
    expect(detectIntent('Goodbye')).toBe('goodbye')
    expect(detectIntent('Thanks, bye!')).toBe('goodbye')
  })
})

What to unit test:

  • Intent detection logic
  • Response formatting
  • State machine transitions
  • Phone number normalization
  • TTS configuration building
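Phone number normalization is a good example of logic worth pinning down with unit tests before any call ever happens. A minimal sketch - the `normalizePhoneNumber` helper is hypothetical, and the local-format branch assumes German numbers:

```typescript
// utils/phone.ts (hypothetical helper, shown for illustration)
// Normalizes phone numbers toward E.164: strips formatting, converts
// international 00-prefixes, and assumes +49 for bare local numbers.
export function normalizePhoneNumber(raw: string): string {
  const digits = raw.replace(/[^\d+]/g, '')
  if (digits.startsWith('00')) return `+${digits.slice(2)}`
  if (digits.startsWith('0')) return `+49${digits.slice(1)}` // assumption: German numbers
  return digits
}
```

Because the function is pure, the tests are trivial to write and run in milliseconds.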

Level 2: Chat Simulator

Build a text-based interface that uses the same LLM logic as your voice assistant. This lets you rapidly iterate on prompts and conversation flow without any phone infrastructure.

typescript
// The key insight: extract your LLM logic into a shared service

// lib/conversation-service.ts
export async function generateResponse(params: {
  systemPrompt: string
  conversationHistory: { role: 'user' | 'assistant'; content: string }[]
  userMessage: string
}): Promise<string> {
  // Your LLM call logic here
  // This is used by BOTH the webhook AND the chat simulator
}
typescript
// Webhook uses it
async function handleUserSpeak(event: UserSpeakEvent) {
  const response = await generateResponse({
    systemPrompt: assistant.system_prompt,
    conversationHistory: history,
    userMessage: event.text,
  })
  return speak(response)
}

// Chat simulator uses the SAME function
async function handleChatMessage(message: string, sessionId: string) {
  const response = await generateResponse({
    systemPrompt: assistant.system_prompt,
    conversationHistory: history,
    userMessage: message,
  })
  return { response }
}

Benefits:

  • Test conversation flow in seconds, not minutes
  • Iterate on system prompts quickly
  • Debug LLM issues without phone overhead
  • Share sessions with teammates for review

Limitations:

  • Doesn't test speech recognition accuracy
  • Doesn't test TTS pronunciation
  • Doesn't test real-time timing
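One way to make the simulator itself testable is to keep the turn logic pure and inject `generateResponse`. This injection seam is my addition, not part of the service above - but it lets the same loop run against the real LLM in the simulator and against a stub in tests:

```typescript
type Message = { role: 'user' | 'assistant'; content: string }
type Generate = (params: {
  systemPrompt: string
  conversationHistory: Message[]
  userMessage: string
}) => Promise<string>

// One chat turn: call the shared LLM service, then append both sides to
// history. `generate` is injected so tests can substitute a stub.
export async function chatTurn(
  generate: Generate,
  systemPrompt: string,
  history: Message[],
  userMessage: string
): Promise<{ response: string; history: Message[] }> {
  const response = await generate({ systemPrompt, conversationHistory: history, userMessage })
  return {
    response,
    history: [
      ...history,
      { role: 'user', content: userMessage },
      { role: 'assistant', content: response },
    ],
  }
}
```

Returning a new history array (instead of mutating) keeps turns easy to replay and compare in tests.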

Level 3: Event Simulation

Send fake sipgate events directly to your webhook. This tests your actual webhook handler without needing a phone call.

Manual Testing with curl

bash
# Simulate session_start
curl -X POST http://localhost:3000/api/webhook \
  -H "Content-Type: application/json" \
  -d '{
    "type": "session_start",
    "session": {
      "id": "test-session-123",
      "account_id": "test-account",
      "phone_number": "+1234567890",
      "direction": "inbound",
      "from_phone_number": "+0987654321",
      "to_phone_number": "+1234567890"
    }
  }'

# Simulate user_speak
curl -X POST http://localhost:3000/api/webhook \
  -H "Content-Type: application/json" \
  -d '{
    "type": "user_speak",
    "session": {
      "id": "test-session-123",
      "account_id": "test-account",
      "phone_number": "+1234567890"
    },
    "text": "What are your business hours?"
  }'

# Simulate user_speak with interruption (barge_in)
curl -X POST http://localhost:3000/api/webhook \
  -H "Content-Type: application/json" \
  -d '{
    "type": "user_speak",
    "session": {
      "id": "test-session-123",
      "account_id": "test-account",
      "phone_number": "+1234567890"
    },
    "text": "Actually, never mind",
    "barged_in": true
  }'

# Simulate session_end
curl -X POST http://localhost:3000/api/webhook \
  -H "Content-Type: application/json" \
  -d '{
    "type": "session_end",
    "session": {
      "id": "test-session-123",
      "account_id": "test-account",
      "phone_number": "+1234567890"
    },
    "reason": "caller_hangup"
  }'
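The same events can be replayed from a script. A small `sendEvent` helper sketch - the URL is the local default from the curl examples, and `fetchFn` is injectable so tests can stub the network:

```typescript
// Posts one event to the webhook and returns the parsed action,
// or null when the webhook answers 204 No Content.
export async function sendEvent(
  event: Record<string, unknown>,
  url = 'http://localhost:3000/api/webhook',
  fetchFn: typeof fetch = fetch
): Promise<{ type: string; [key: string]: unknown } | null> {
  const res = await fetchFn(url, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(event),
  })
  if (!res.ok) throw new Error(`Webhook returned ${res.status}`)
  return res.status === 204 ? null : res.json()
}
```
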

Automated Integration Tests

typescript
// tests/webhook.test.ts
import { describe, it, expect } from 'vitest'

const WEBHOOK_URL = 'http://localhost:3000/api/webhook'

describe('Webhook Integration', () => {
  const sessionId = `test-${Date.now()}`

  it('handles session_start and returns greeting', async () => {
    const response = await fetch(WEBHOOK_URL, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        type: 'session_start',
        session: {
          id: sessionId,
          account_id: 'test',
          phone_number: '+1234567890',
          direction: 'inbound',
          from_phone_number: '+0987654321',
          to_phone_number: '+1234567890',
        },
      }),
    })

    expect(response.ok).toBe(true)
    const data = await response.json()
    expect(data.type).toBe('speak')
    expect(data.text).toBeTruthy()
    expect(data.session_id).toBe(sessionId)
  })

  it('handles user_speak and returns response', async () => {
    const response = await fetch(WEBHOOK_URL, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        type: 'user_speak',
        session: { id: sessionId, account_id: 'test', phone_number: '+1234567890' },
        text: 'What are your hours?',
      }),
    })

    expect(response.ok).toBe(true)
    const data = await response.json()
    expect(data.type).toBe('speak')
    expect(data.text).toBeTruthy()
  })

  it('handles barge-in gracefully', async () => {
    const response = await fetch(WEBHOOK_URL, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        type: 'user_speak',
        barged_in: true,
        session: { id: sessionId, account_id: 'test', phone_number: '+1234567890' },
        text: 'Wait',
      }),
    })

    expect(response.ok).toBe(true)
    // Could be 204 or a speak action
    if (response.status !== 204) {
      const data = await response.json()
      expect(data.type).toBe('speak')
    }
  })

  it('handles session_end and cleans up', async () => {
    const response = await fetch(WEBHOOK_URL, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        type: 'session_end',
        session: { id: sessionId, account_id: 'test', phone_number: '+1234567890' },
        reason: 'caller_hangup',
      }),
    })

    expect(response.ok).toBe(true)
  })
})

Conversation Flow Tests

Test complete conversation scenarios:

typescript
// tests/flows/booking-flow.test.ts

// sendEvent POSTs one event to the local webhook and returns the parsed response
async function simulateConversation(messages: string[]): Promise<string[]> {
  const session = {
    id: `test-${Date.now()}`,
    account_id: 'test-account',
    phone_number: '+1234567890',
  }
  const responses: string[] = []

  // Start session
  await sendEvent({ type: 'session_start', session })

  // Simulate each user message
  for (const message of messages) {
    const response = await sendEvent({ type: 'user_speak', session, text: message })
    responses.push(response.text)
  }

  // End session
  await sendEvent({ type: 'session_end', session })

  return responses
}

describe('Booking Flow', () => {
  it('completes a booking conversation', async () => {
    const responses = await simulateConversation([
      'I want to book an appointment',
      'Tomorrow at 2pm',
      'John Smith',
      'Yes, that is correct',
    ])

    expect(responses[0]).toMatch(/when|date|time/i)
    expect(responses[1]).toMatch(/name/i)
    expect(responses[2]).toMatch(/confirm/i)
    expect(responses[3]).toMatch(/booked|confirmed|scheduled/i)
  })

  it('handles corrections mid-flow', async () => {
    const responses = await simulateConversation([
      'I want to book an appointment',
      'Tomorrow at 2pm',
      'Actually, make it 3pm instead',
    ])

    expect(responses[2]).toMatch(/3|three|pm/i)
  })
})

Level 4: Local Development with ngrok

To test against sipgate's real infrastructure before committing to full phone tests, expose your local server:

bash
# Start your dev server
npm run dev

# In another terminal, expose it
ngrok http 3000

Configure the ngrok URL as your webhook endpoint in sipgate. Now sipgate can reach your local development server.

Use cases:

  • Test webhook authentication
  • Test with sipgate's actual event format
  • Debug production issues locally
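On the authentication point: if your endpoint validates a shared-secret signature, that check is unit-testable without ngrok at all. A sketch assuming an HMAC-SHA256 hex signature over the raw request body - the scheme here is an assumption, so check sipgate's webhook documentation for the actual mechanism:

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto'

// Hypothetical signature check (scheme assumed, not sipgate's documented one):
// compare an HMAC-SHA256 hex digest of the raw body in constant time.
export function verifySignature(rawBody: string, signature: string, secret: string): boolean {
  const expected = createHmac('sha256', secret).update(rawBody).digest('hex')
  const a = Buffer.from(expected)
  const b = Buffer.from(signature)
  return a.length === b.length && timingSafeEqual(a, b)
}
```

`timingSafeEqual` avoids leaking the digest through comparison timing; a plain `===` would be subtly weaker.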

Level 5: Real Phone Calls

Save these for final validation. Create a testing checklist:

markdown
## Pre-Release Phone Test Checklist

### Basic Flow
- [ ] Call connects and greeting plays
- [ ] Assistant responds to simple question
- [ ] Assistant handles "I don't understand" gracefully
- [ ] Call ends cleanly when user says goodbye

### Barge-In
- [ ] Interrupting mid-sentence works
- [ ] Assistant acknowledges interruption
- [ ] No "stale" responses after interruption

### Edge Cases
- [ ] Long silence from user (10+ seconds)
- [ ] Very long user input (30+ seconds of speaking)
- [ ] Background noise doesn't trigger false responses
- [ ] Accent/dialect recognition (if applicable)

### Error Handling
- [ ] Network timeout during LLM call
- [ ] Invalid user input
- [ ] Session state recovery after errors

Testing Utilities

Event Factory

Create a helper for generating test events:

typescript
// tests/utils/event-factory.ts

export function createSessionStartEvent(overrides = {}) {
  return {
    type: 'session_start',
    session: {
      id: `test-${Date.now()}`,
      account_id: 'test-account',
      phone_number: '+1234567890',
      direction: 'inbound',
      from_phone_number: '+0987654321',
      to_phone_number: '+1234567890',
    },
    ...overrides,
  }
}

export function createUserSpeakEvent(sessionId: string, text: string, overrides = {}) {
  return {
    type: 'user_speak',
    session: {
      id: sessionId,
      account_id: 'test-account',
      phone_number: '+1234567890',
    },
    text,
    ...overrides,
  }
}

export function createBargeInEvent(sessionId: string, text: string, overrides = {}) {
  return {
    type: 'user_speak',
    barged_in: true,
    session: {
      id: sessionId,
      account_id: 'test-account',
      phone_number: '+1234567890',
    },
    text,
    ...overrides,
  }
}
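`Date.now()` IDs keep parallel runs isolated, but deterministic IDs make failures reproducible and logs easy to correlate. One sketch using a per-run counter (a hypothetical addition to the factory, not from the original):

```typescript
// Deterministic session IDs: stable across runs, unique within a run.
let counter = 0

export function nextSessionId(prefix = 'test'): string {
  counter += 1
  return `${prefix}-${String(counter).padStart(4, '0')}`
}

export function resetSessionIds(): void {
  counter = 0 // call from beforeEach so each test file starts at -0001
}
```
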

Response Assertions

typescript
// tests/utils/assertions.ts

export function assertSpeakAction(response: any, options: {
  containsText?: string
  sessionId?: string
} = {}) {
  expect(response.type).toBe('speak')
  expect(response.text).toBeTruthy()
  expect(response.tts).toBeDefined()

  if (options.containsText) {
    expect(response.text.toLowerCase()).toContain(options.containsText.toLowerCase())
  }
  if (options.sessionId) {
    expect(response.session_id).toBe(options.sessionId)
  }
}

export function assertTransferAction(response: any, targetNumber?: string) {
  expect(response.type).toBe('transfer')
  expect(response.target).toBeTruthy()
  if (targetNumber) {
    expect(response.target).toBe(targetNumber)
  }
}

CI/CD Integration

Run event simulation tests in your pipeline:

yaml
# .github/workflows/test.yml
name: Test Voice Assistant

on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v4

      - name: Setup Node
        uses: actions/setup-node@v4
        with:
          node-version: '20'

      - name: Install dependencies
        run: npm ci

      - name: Run unit tests
        run: npm test

      - name: Start server
        run: npm run dev &
        env:
          # Use test/mock API keys
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY_TEST }}

      - name: Wait for server
        run: npx wait-on http://localhost:3000/api/health

      - name: Run integration tests
        run: npm run test:integration
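The `wait-on` step assumes a health endpoint exists. If you don't have one yet, a minimal handler is enough - this sketch assumes the Next.js App Router, with the path matching the URL polled above:

```typescript
// app/api/health/route.ts - answers GET /api/health with a 200
export async function GET(): Promise<Response> {
  return Response.json({ status: 'ok' })
}
```
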

Best Practices Summary

  1. Extract shared logic - Same LLM service for chat and voice
  2. Test the pyramid - Most tests at unit level, fewest at phone level
  3. Automate event simulation - Integration tests catch regressions
  4. Use deterministic test data - Fixed session IDs, predictable inputs
  5. Test conversation flows - Not just individual events
  6. Create test utilities - Event factories, response assertions
  7. Run in CI - Catch issues before deployment
  8. Save phone tests for validation - Manual checklist for final sign-off