Testing Voice Assistants Without Making Phone Calls

Testing voice assistants is challenging - you can't just write unit tests and call it a day. Real phone calls are slow, awkward to automate, and expensive at scale. This guide covers practical strategies for testing your sipgate AI Flow integration at every level.

The Testing Challenge

Voice assistants have unique testing challenges:

  • Real calls are slow - Each test takes 30+ seconds of actual talking
  • Hard to automate - You can't easily script "say this, wait for response"
  • Expensive at scale - Phone minutes add up during development
  • Environment-dependent - Need a publicly accessible webhook URL
  • Non-deterministic - Speech recognition varies, LLM responses vary

The solution: test at multiple levels, saving real phone calls for final validation.

Testing Pyramid for Voice AI

Level 1: Unit Tests

Test your business logic in isolation - no sipgate, no LLM calls.

typescript
// utils/intent-detection.ts
export function detectIntent(text: string): 'greeting' | 'question' | 'goodbye' | 'unknown' {
  const lower = text.toLowerCase()
  if (lower.match(/^(hi|hello|hey|good morning)/)) return 'greeting'
  if (lower.match(/(bye|goodbye|see you|thanks)/)) return 'goodbye'
  if (lower.includes('?')) return 'question'
  return 'unknown'
}

// utils/intent-detection.test.ts
import { describe, it, expect } from 'vitest'
import { detectIntent } from './intent-detection'

describe('detectIntent', () => {
  it('detects greetings', () => {
    expect(detectIntent('Hello there')).toBe('greeting')
    expect(detectIntent('Hi!')).toBe('greeting')
    expect(detectIntent('Good morning')).toBe('greeting')
  })

  it('detects questions', () => {
    expect(detectIntent('What are your hours?')).toBe('question')
    expect(detectIntent('Can you help me?')).toBe('question')
  })

  it('detects goodbyes', () => {
    expect(detectIntent('Goodbye')).toBe('goodbye')
    expect(detectIntent('Thanks, bye!')).toBe('goodbye')
  })
})

What to unit test:

  • Intent detection logic
  • Response formatting
  • State machine transitions
  • Phone number normalization
  • TTS configuration building
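Phone number normalization is a good example of logic worth pinning down with unit tests before any call ever happens. A minimal sketch - the `normalizePhoneNumber` helper is hypothetical, and the local-format branch assumes German numbers:

```typescript
// utils/phone.ts (hypothetical helper, shown for illustration)
// Normalizes phone numbers toward E.164: strips formatting, converts
// international 00-prefixes, and assumes +49 for bare local numbers.
export function normalizePhoneNumber(raw: string): string {
  const digits = raw.replace(/[^\d+]/g, '')
  if (digits.startsWith('00')) return `+${digits.slice(2)}`
  if (digits.startsWith('0')) return `+49${digits.slice(1)}` // assumption: German numbers
  return digits
}
```

Because the function is pure, the tests are trivial to write and run in milliseconds.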

Level 2: Chat Simulator

Build a text-based interface that uses the same LLM logic as your voice assistant. This lets you rapidly iterate on prompts and conversation flow without any phone infrastructure.

typescript
// The key insight: extract your LLM logic into a shared service

// lib/conversation-service.ts
export async function generateResponse(params: {
  systemPrompt: string
  conversationHistory: { role: 'user' | 'assistant'; content: string }[]
  userMessage: string
}): Promise<string> {
  // Your LLM call logic here
  // This is used by BOTH the webhook AND the chat simulator
}
typescript
// Webhook uses it
async function handleUserSpeak(event: UserSpeakEvent) {
  const response = await generateResponse({
    systemPrompt: assistant.system_prompt,
    conversationHistory: history,
    userMessage: event.text,
  })
  return speak(response)
}

// Chat simulator uses the SAME function
async function handleChatMessage(message: string, sessionId: string) {
  const response = await generateResponse({
    systemPrompt: assistant.system_prompt,
    conversationHistory: history,
    userMessage: message,
  })
  return { response }
}

Benefits:

  • Test conversation flow in seconds, not minutes
  • Iterate on system prompts quickly
  • Debug LLM issues without phone overhead
  • Share sessions with teammates for review

Limitations:

  • Doesn't test speech recognition accuracy
  • Doesn't test TTS pronunciation
  • Doesn't test real-time timing
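One way to make the simulator itself testable is to keep the turn logic pure and inject `generateResponse`. This injection seam is my addition, not part of the service above - but it lets the same loop run against the real LLM in the simulator and against a stub in tests:

```typescript
type Message = { role: 'user' | 'assistant'; content: string }
type Generate = (params: {
  systemPrompt: string
  conversationHistory: Message[]
  userMessage: string
}) => Promise<string>

// One chat turn: call the shared LLM service, then append both sides to
// history. `generate` is injected so tests can substitute a stub.
export async function chatTurn(
  generate: Generate,
  systemPrompt: string,
  history: Message[],
  userMessage: string
): Promise<{ response: string; history: Message[] }> {
  const response = await generate({ systemPrompt, conversationHistory: history, userMessage })
  return {
    response,
    history: [
      ...history,
      { role: 'user', content: userMessage },
      { role: 'assistant', content: response },
    ],
  }
}
```

Returning a new history array (instead of mutating) keeps turns easy to replay and compare in tests.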

Level 3: Event Simulation

Send fake sipgate events directly to your webhook. This tests your actual webhook handler without needing a phone call.

Manual Testing with curl

bash
# Simulate session_start
curl -X POST http://localhost:3000/api/webhook \
  -H "Content-Type: application/json" \
  -d '{
    "type": "session_start",
    "session": {
      "id": "test-session-123",
      "account_id": "test-account",
      "phone_number": "+1234567890",
      "direction": "inbound",
      "from_phone_number": "+0987654321",
      "to_phone_number": "+1234567890"
    }
  }'

# Simulate user_speak
curl -X POST http://localhost:3000/api/webhook \
  -H "Content-Type: application/json" \
  -d '{
    "type": "user_speak",
    "session": {
      "id": "test-session-123",
      "account_id": "test-account",
      "phone_number": "+1234567890"
    },
    "text": "What are your business hours?"
  }'

# Simulate user_speak with interruption (barge_in)
curl -X POST http://localhost:3000/api/webhook \
  -H "Content-Type: application/json" \
  -d '{
    "type": "user_speak",
    "session": {
      "id": "test-session-123",
      "account_id": "test-account",
      "phone_number": "+1234567890"
    },
    "text": "Actually, never mind",
    "barged_in": true
  }'

# Simulate session_end
curl -X POST http://localhost:3000/api/webhook \
  -H "Content-Type: application/json" \
  -d '{
    "type": "session_end",
    "session": {
      "id": "test-session-123",
      "account_id": "test-account",
      "phone_number": "+1234567890"
    },
    "reason": "caller_hangup"
  }'
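The same events can be replayed from a script. A small `sendEvent` helper sketch - the URL is the local default from the curl examples, and `fetchFn` is injectable so tests can stub the network:

```typescript
// Posts one event to the webhook and returns the parsed action,
// or null when the webhook answers 204 No Content.
export async function sendEvent(
  event: Record<string, unknown>,
  url = 'http://localhost:3000/api/webhook',
  fetchFn: typeof fetch = fetch
): Promise<{ type: string; [key: string]: unknown } | null> {
  const res = await fetchFn(url, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(event),
  })
  if (!res.ok) throw new Error(`Webhook returned ${res.status}`)
  return res.status === 204 ? null : res.json()
}
```
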

Automated Integration Tests

typescript
// tests/webhook.test.ts
import { describe, it, expect } from 'vitest'

const WEBHOOK_URL = 'http://localhost:3000/api/webhook'

describe('Webhook Integration', () => {
  const sessionId = `test-${Date.now()}`

  it('handles session_start and returns greeting', async () => {
    const response = await fetch(WEBHOOK_URL, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        type: 'session_start',
        session: {
          id: sessionId,
          account_id: 'test',
          phone_number: '+1234567890',
          direction: 'inbound',
          from_phone_number: '+0987654321',
          to_phone_number: '+1234567890',
        },
      }),
    })

    expect(response.ok).toBe(true)
    const data = await response.json()
    expect(data.type).toBe('speak')
    expect(data.text).toBeTruthy()
    expect(data.session_id).toBe(sessionId)
  })

  it('handles user_speak and returns response', async () => {
    const response = await fetch(WEBHOOK_URL, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        type: 'user_speak',
        session: { id: sessionId, account_id: 'test', phone_number: '+1234567890' },
        text: 'What are your hours?',
      }),
    })

    expect(response.ok).toBe(true)
    const data = await response.json()
    expect(data.type).toBe('speak')
    expect(data.text).toBeTruthy()
  })

  it('handles barge-in gracefully', async () => {
    const response = await fetch(WEBHOOK_URL, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        type: 'user_speak',
        barged_in: true,
        session: { id: sessionId, account_id: 'test', phone_number: '+1234567890' },
        text: 'Wait',
      }),
    })

    expect(response.ok).toBe(true)
    // Could be 204 or a speak action
    if (response.status !== 204) {
      const data = await response.json()
      expect(data.type).toBe('speak')
    }
  })

  it('handles session_end and cleans up', async () => {
    const response = await fetch(WEBHOOK_URL, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        type: 'session_end',
        session: { id: sessionId, account_id: 'test', phone_number: '+1234567890' },
        reason: 'caller_hangup',
      }),
    })

    expect(response.ok).toBe(true)
  })
})

Conversation Flow Tests

Test complete conversation scenarios:

typescript
// tests/flows/booking-flow.test.ts

// sendEvent POSTs one event to the local webhook and returns the parsed response
async function simulateConversation(messages: string[]): Promise<string[]> {
  const session = {
    id: `test-${Date.now()}`,
    account_id: 'test-account',
    phone_number: '+1234567890',
  }
  const responses: string[] = []

  // Start session
  await sendEvent({ type: 'session_start', session })

  // Simulate each user message
  for (const message of messages) {
    const response = await sendEvent({ type: 'user_speak', session, text: message })
    responses.push(response.text)
  }

  // End session
  await sendEvent({ type: 'session_end', session })

  return responses
}

describe('Booking Flow', () => {
  it('completes a booking conversation', async () => {
    const responses = await simulateConversation([
      'I want to book an appointment',
      'Tomorrow at 2pm',
      'John Smith',
      'Yes, that is correct',
    ])

    expect(responses[0]).toMatch(/when|date|time/i)
    expect(responses[1]).toMatch(/name/i)
    expect(responses[2]).toMatch(/confirm/i)
    expect(responses[3]).toMatch(/booked|confirmed|scheduled/i)
  })

  it('handles corrections mid-flow', async () => {
    const responses = await simulateConversation([
      'I want to book an appointment',
      'Tomorrow at 2pm',
      'Actually, make it 3pm instead',
    ])

    expect(responses[2]).toMatch(/3|three|pm/i)
  })
})

Level 4: Local Development with ngrok

To test against sipgate's real infrastructure before committing to full phone tests, expose your local server:

bash
# Start your dev server
npm run dev

# In another terminal, expose it
ngrok http 3000

Configure the ngrok URL as your webhook endpoint in sipgate. Now sipgate can reach your local development server.

Use cases:

  • Test webhook authentication
  • Test with sipgate's actual event format
  • Debug production issues locally
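On the authentication point: if your endpoint validates a shared-secret signature, that check is unit-testable without ngrok at all. A sketch assuming an HMAC-SHA256 hex signature over the raw request body - the scheme here is an assumption, so check sipgate's webhook documentation for the actual mechanism:

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto'

// Hypothetical signature check (scheme assumed, not sipgate's documented one):
// compare an HMAC-SHA256 hex digest of the raw body in constant time.
export function verifySignature(rawBody: string, signature: string, secret: string): boolean {
  const expected = createHmac('sha256', secret).update(rawBody).digest('hex')
  const a = Buffer.from(expected)
  const b = Buffer.from(signature)
  return a.length === b.length && timingSafeEqual(a, b)
}
```

`timingSafeEqual` avoids leaking the digest through comparison timing; a plain `===` would be subtly weaker.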

Level 5: Real Phone Calls

Save these for final validation. Create a testing checklist:

markdown
## Pre-Release Phone Test Checklist

### Basic Flow
- [ ] Call connects and greeting plays
- [ ] Assistant responds to simple question
- [ ] Assistant handles "I don't understand" gracefully
- [ ] Call ends cleanly when user says goodbye

### Barge-In
- [ ] Interrupting mid-sentence works
- [ ] Assistant acknowledges interruption
- [ ] No "stale" responses after interruption

### Edge Cases
- [ ] Long silence from user (10+ seconds)
- [ ] Very long user input (30+ seconds of speaking)
- [ ] Background noise doesn't trigger false responses
- [ ] Accent/dialect recognition (if applicable)

### Error Handling
- [ ] Network timeout during LLM call
- [ ] Invalid user input
- [ ] Session state recovery after errors

Testing Utilities

Event Factory

Create a helper for generating test events:

typescript
// tests/utils/event-factory.ts

export function createSessionStartEvent(overrides = {}) {
  return {
    type: 'session_start',
    session: {
      id: `test-${Date.now()}`,
      account_id: 'test-account',
      phone_number: '+1234567890',
      direction: 'inbound',
      from_phone_number: '+0987654321',
      to_phone_number: '+1234567890',
    },
    ...overrides,
  }
}

export function createUserSpeakEvent(sessionId: string, text: string, overrides = {}) {
  return {
    type: 'user_speak',
    session: {
      id: sessionId,
      account_id: 'test-account',
      phone_number: '+1234567890',
    },
    text,
    ...overrides,
  }
}

export function createBargeInEvent(sessionId: string, text: string, overrides = {}) {
  return {
    type: 'user_speak',
    barged_in: true,
    session: {
      id: sessionId,
      account_id: 'test-account',
      phone_number: '+1234567890',
    },
    text,
    ...overrides,
  }
}
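`Date.now()` IDs keep parallel runs isolated, but deterministic IDs make failures reproducible and logs easy to correlate. One sketch using a per-run counter (a hypothetical addition to the factory, not from the original):

```typescript
// Deterministic session IDs: stable across runs, unique within a run.
let counter = 0

export function nextSessionId(prefix = 'test'): string {
  counter += 1
  return `${prefix}-${String(counter).padStart(4, '0')}`
}

export function resetSessionIds(): void {
  counter = 0 // call from beforeEach so each test file starts at -0001
}
```
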

Response Assertions

typescript
// tests/utils/assertions.ts

export function assertSpeakAction(response: any, options: {
  containsText?: string
  sessionId?: string
} = {}) {
  expect(response.type).toBe('speak')
  expect(response.text).toBeTruthy()
  expect(response.tts).toBeDefined()

  if (options.containsText) {
    expect(response.text.toLowerCase()).toContain(options.containsText.toLowerCase())
  }
  if (options.sessionId) {
    expect(response.session_id).toBe(options.sessionId)
  }
}

export function assertTransferAction(response: any, targetNumber?: string) {
  expect(response.type).toBe('transfer')
  expect(response.target).toBeTruthy()
  if (targetNumber) {
    expect(response.target).toBe(targetNumber)
  }
}

CI/CD Integration

Run event simulation tests in your pipeline:

yaml
# .github/workflows/test.yml
name: Test Voice Assistant

on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v4

      - name: Setup Node
        uses: actions/setup-node@v4
        with:
          node-version: '20'

      - name: Install dependencies
        run: npm ci

      - name: Run unit tests
        run: npm test

      - name: Start server
        run: npm run dev &
        env:
          # Use test/mock API keys
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY_TEST }}

      - name: Wait for server
        run: npx wait-on http://localhost:3000/api/health

      - name: Run integration tests
        run: npm run test:integration
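The `wait-on` step assumes a health endpoint exists. If you don't have one yet, a minimal handler is enough - this sketch assumes the Next.js App Router, with the path matching the URL polled above:

```typescript
// app/api/health/route.ts - answers GET /api/health with a 200
export async function GET(): Promise<Response> {
  return Response.json({ status: 'ok' })
}
```
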

Best Practices Summary

  1. Extract shared logic - Same LLM service for chat and voice
  2. Test the pyramid - Most tests at unit level, fewest at phone level
  3. Automate event simulation - Integration tests catch regressions
  4. Use deterministic test data - Fixed session IDs, predictable inputs
  5. Test conversation flows - Not just individual events
  6. Create test utilities - Event factories, response assertions
  7. Run in CI - Catch issues before deployment
  8. Save phone tests for validation - Manual checklist for final sign-off