
Core Concepts

Understanding the event-driven architecture and response model.

Event-Driven Architecture

The SDK uses an event-driven model where your assistant responds to events from the AI Flow service:

  1. Session Start - Called when a new call session begins
  2. User Speak - Called when the user says something (after speech-to-text)
  3. User Barge In - Called when the user interrupts the assistant
  4. Assistant Speak - Called after your assistant starts speaking (this event may be omitted)
  5. Assistant Speech Ended - Called when the assistant's speech playback ends
  6. Session End - Called when the call ends

Event Flow

┌─────────────────┐
│  session_start  │──> Respond with speak/audio or do nothing
└─────────────────┘

┌─────────────────┐
│   user_speak    │──> Respond with speak/audio/transfer/hangup
│  (barged_in?)   │    Check barged_in flag for interruptions
└─────────────────┘

┌─────────────────┐
│ assistant_speak │──> Optional: track metrics, trigger next action
└─────────────────┘

┌─────────────────┐
│   session_end   │──> Cleanup only, no actions accepted
└─────────────────┘
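
If the service sets the barged_in flag shown above, the user's utterance arrived while the assistant was still speaking. Below is a minimal sketch of branching on it; the exact field name on the event payload is taken from the diagram and may differ, so check the SDK types:

typescript
onUserSpeak: async (event) => {
  if (event.barged_in) {
    // The user interrupted the assistant; keep the recovery short.
    return "Sorry, go ahead.";
  }
  // Normal turn: hand off to your conversation logic.
  return processUserInput(event.text);
}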

Response Types

Event handlers can return three types of responses:

1. Simple String

The simplest way to respond is to just return a string:

typescript
onUserSpeak: async (event) => {
  return "Hello, how can I help?";
}

This is automatically converted to a speak action.

2. Action Object

For advanced control, return an action object:

typescript
onUserSpeak: async (event) => {
  return {
    type: "speak",
    session_id: event.session.id,
    text: "Hello!",
    barge_in: {
      strategy: "minimum_characters",
      minimum_characters: 3
    },
  };
}

Available action types:

  • speak - Text-to-speech response
  • audio - Play pre-recorded audio
  • hangup - End the call
  • transfer - Transfer to another number
  • barge_in - Manually interrupt playback
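
The speak example above shows the common type and session_id fields. As a hedged sketch of the hangup and transfer variants (the property name for the transfer target is an assumption, and shouldEscalate is a hypothetical helper; consult the SDK's AiFlowAction types for the exact shapes):

typescript
onUserSpeak: async (event) => {
  if (await shouldEscalate(event.text)) {
    return {
      type: "transfer",
      session_id: event.session.id,
      // Target number field name is assumed, not confirmed by the SDK docs.
      phone_number: "+4921112345678",
    };
  }
  return {
    type: "hangup",
    session_id: event.session.id,
  };
}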

3. No Response

Return null or undefined when no response is needed:

typescript
onAssistantSpeak: async (event) => {
  // Track metrics, no response needed
  trackMetrics(event);
  return null;
}

Session Information

All events include session information:

typescript
interface SessionInfo {
  id: string;              // UUID of the session
  account_id: string;      // Account identifier
  phone_number: string;    // Phone number for this flow session
  direction?: "inbound" | "outbound"; // Call direction, if known
  from_phone_number: string;          // Number the call originates from
  to_phone_number: string;            // Number the call is directed to
}
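
For example, a handler can destructure event.session to log call metadata (a minimal sketch using only the fields defined above):

typescript
onSessionStart: async (event) => {
  const { id, direction, from_phone_number } = event.session;
  console.log(`Session ${id}: ${direction ?? "unknown"} call from ${from_phone_number}`);
  return "Welcome!";
}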

Best Practices

1. Handle All Events

Even if you don't need to respond, it's good practice to handle all events:

typescript
const assistant = AiFlowAssistant.create({
  onSessionStart: async (event) => {
    // Initialize session state
    initializeSession(event.session.id);
    return "Welcome!";
  },

  onUserSpeak: async (event) => {
    // Main conversation logic
    return processUserInput(event.text);
  },

  onSessionEnd: async (event) => {
    // Cleanup
    cleanupSession(event.session.id);
  },
});

2. Use Type Safety

The SDK provides full TypeScript types:

typescript
import type {
  AiFlowEventUserSpeak,
  AiFlowAction
} from "@sipgate/ai-flow-sdk";

onUserSpeak: async (event: AiFlowEventUserSpeak) => {
  // event is fully typed
  const text: string = event.text;
  const sessionId: string = event.session.id;

  return {
    type: "speak",
    session_id: sessionId,
    text: `You said: ${text}`,
  } as AiFlowAction;
}

3. Error Handling

Always handle errors gracefully:

typescript
onUserSpeak: async (event) => {
  try {
    return await processUserInput(event.text);
  } catch (error) {
    console.error("Error processing user input:", error);
    return "I'm sorry, I encountered an error. Please try again.";
  }
}

Next Steps