Action Types

Complete reference for all actions you can return from event handlers.

Overview

Actions are responses that tell the AI Flow service what to do next. Every action requires a session_id and a type field.

Base Action Structure

typescript
interface BaseAction {
  session_id: string; // UUID from the event's session.id
  type: string;       // Action type identifier
}
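Because both fields are required on every action, a runtime check can catch a malformed action before it is returned from a handler. The following is a minimal sketch; the function name isBaseAction is hypothetical and not part of the SDK.

```typescript
// Mirrors the BaseAction interface shown above.
interface BaseAction {
  session_id: string;
  type: string;
}

// Hypothetical helper (not an SDK export): returns true only when the
// value is an object carrying non-empty string session_id and type fields.
function isBaseAction(value: unknown): value is BaseAction {
  if (typeof value !== "object" || value === null) return false;
  const candidate = value as Record<string, unknown>;
  const sessionId = candidate["session_id"];
  const type = candidate["type"];
  return (
    typeof sessionId === "string" &&
    sessionId.length > 0 &&
    typeof type === "string" &&
    type.length > 0
  );
}
```

A check like this is mainly useful when actions are assembled from untyped data, for example parsed JSON.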

Action Summary

Action Type | Description                 | Primary Use Case
------------|-----------------------------|----------------------------------------
speak       | Speak text or SSML          | Respond to user with synthesized speech
audio       | Play pre-recorded audio     | Play hold music, pre-recorded messages
hangup      | End the call                | Terminate conversation
transfer    | Transfer to another number  | Route to human agent or department
barge_in    | Manually interrupt playback | Stop current audio immediately

Speak Action

Speaks text or SSML to the user.

typescript
interface AiFlowActionSpeak {
  type: "speak";
  session_id: string;

  // Either text OR ssml (not both)
  text?: string; // Plain text to speak
  ssml?: string; // SSML markup for advanced control

  // Optional configurations
  tts?: TtsConfig;      // TTS provider settings
  barge_in?: BargeInConfig; // Barge-in behavior
}

Examples:

typescript
// Simple text
return {
  type: "speak",
  session_id: event.session.id,
  text: "Hello, how can I help you?",
};

// With SSML
return {
  type: "speak",
  session_id: event.session.id,
  ssml: `
    <speak version="1.0" xml:lang="en-US">
      <voice name="en-US-JennyNeural">
        <prosody rate="slow">Please listen carefully.</prosody>
        <break time="500ms"/>
        Your account balance is <say-as interpret-as="currency">$42.50</say-as>
      </voice>
    </speak>
  `,
};

// With custom TTS provider
return {
  type: "speak",
  session_id: event.session.id,
  text: "Hello in a different voice",
  tts: {
    provider: "azure",
    language: "en-US",
    voice: "en-US-JennyNeural",
  },
};
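The "either text OR ssml (not both)" rule in the interface above is only a comment, so the compiler will not enforce it. One possible way to enforce it is sketched below, using local stand-in types rather than the SDK's own. The names SpeakContent, SpeakAction, and makeSpeak are hypothetical, and the tts and barge_in options are left out for brevity.

```typescript
// Local stand-in types (assumptions, not SDK exports): exactly one of
// text or ssml may be present on a speak action.
type SpeakContent =
  | { text: string; ssml?: never }
  | { ssml: string; text?: never };

type SpeakAction = { type: "speak"; session_id: string } & SpeakContent;

// Hypothetical builder: throws unless exactly one of text or ssml is given.
function makeSpeak(
  sessionId: string,
  content: { text?: string; ssml?: string },
): SpeakAction {
  const { text, ssml } = content;
  if (text !== undefined && ssml === undefined) {
    return { type: "speak", session_id: sessionId, text };
  }
  if (ssml !== undefined && text === undefined) {
    return { type: "speak", session_id: sessionId, ssml };
  }
  throw new Error("Provide exactly one of text or ssml");
}
```

Inside a handler this could be used as makeSpeak(event.session.id, { text: "Hello!" }).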

Audio Action

Plays pre-recorded audio to the user.

typescript
interface AiFlowActionAudio {
  type: "audio";
  session_id: string;
  audio: string; // Base64 encoded WAV (16kHz, mono, 16-bit)
  barge_in?: BargeInConfig;
}

Example:

typescript
// Play hold music or pre-recorded message
return {
  type: "audio",
  session_id: event.session.id,
  audio: base64EncodedWavData,
  barge_in: {
    strategy: "minimum_characters",
    minimum_characters: 3,
  },
};

Audio Format Requirements:

  • Format: WAV
  • Sample Rate: 16kHz
  • Channels: Mono
  • Bit Depth: 16-bit PCM
  • Encoding: Base64
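The format requirements above can be checked before encoding a file. The sketch below inspects a WAV header and assumes the canonical 44-byte layout with the fmt chunk directly after the RIFF header; files with extra chunks before fmt would need a real chunk parser. The function name checkWavHeader is hypothetical.

```typescript
// Hypothetical helper: returns a list of problems found in the WAV header,
// or an empty list when the header matches 16 kHz, mono, 16-bit PCM.
// Assumes the canonical layout where the "fmt " chunk starts at byte 12.
function checkWavHeader(bytes: Uint8Array): string[] {
  if (bytes.length < 44) return ["file is shorter than a 44-byte WAV header"];

  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  // Reads a four-character ASCII tag at the given offset.
  const tag = (offset: number): string =>
    String.fromCharCode(
      view.getUint8(offset),
      view.getUint8(offset + 1),
      view.getUint8(offset + 2),
      view.getUint8(offset + 3),
    );

  if (tag(0) !== "RIFF" || tag(8) !== "WAVE") return ["not a RIFF/WAVE file"];
  if (tag(12) !== "fmt ") return ["fmt chunk is not at offset 12"];

  const problems: string[] = [];
  // All multi-byte WAV header fields are little-endian.
  if (view.getUint16(20, true) !== 1) problems.push("audio format is not PCM");
  if (view.getUint16(22, true) !== 1) problems.push("audio is not mono");
  if (view.getUint32(24, true) !== 16000) problems.push("sample rate is not 16 kHz");
  if (view.getUint16(34, true) !== 16) problems.push("bit depth is not 16-bit");
  return problems;
}
```

Once the header passes, the bytes still need to be Base64 encoded; in Node.js, Buffer.from(bytes).toString("base64") is one way to do that.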

Hangup Action

Ends the call.

typescript
interface AiFlowActionHangup {
  type: "hangup";
  session_id: string;
}

Example:

typescript
onUserSpeak: async (event) => {
  if (event.text.toLowerCase().includes("goodbye")) {
    return {
      type: "hangup",
      session_id: event.session.id,
    };
  }
};

Transfer Action

Transfers the call to another phone number.

typescript
interface AiFlowActionTransfer {
  type: "transfer";
  session_id: string;
  target_phone_number: string; // E.164 format recommended
  caller_id_name: string;
  caller_id_number: string;
}

Example:

typescript
// Transfer to sales department
return {
  type: "transfer",
  session_id: event.session.id,
  target_phone_number: "+1234567890",
  caller_id_name: "Sales Department",
  caller_id_number: "+1234567890",
};
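Since E.164 is the recommended format for target_phone_number, it may be worth checking numbers before building a transfer action. The sketch below uses a common E.164 pattern: a plus sign, a non-zero leading digit, and at most 15 digits in total. The names isE164 and makeTransfer are hypothetical, and rejecting non-E.164 numbers outright is stricter than the "recommended" wording above requires.

```typescript
// Common E.164 shape: "+", a leading digit 1-9, then up to 14 more digits.
const E164_PATTERN = /^\+[1-9]\d{1,14}$/;

// Hypothetical helper: true when the string looks like an E.164 number.
function isE164(phoneNumber: string): boolean {
  return E164_PATTERN.test(phoneNumber);
}

// Local stand-in for the transfer action documented above.
interface TransferAction {
  type: "transfer";
  session_id: string;
  target_phone_number: string;
  caller_id_name: string;
  caller_id_number: string;
}

// Hypothetical builder: refuses targets that are not E.164 formatted.
function makeTransfer(
  sessionId: string,
  target: string,
  callerIdName: string,
  callerIdNumber: string,
): TransferAction {
  if (!isE164(target)) {
    throw new Error(`Not an E.164 number: ${target}`);
  }
  return {
    type: "transfer",
    session_id: sessionId,
    target_phone_number: target,
    caller_id_name: callerIdName,
    caller_id_number: callerIdNumber,
  };
}
```

The pattern only checks the shape of the number; it does not confirm that the number is assigned or reachable.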

Barge-In Action

Manually triggers barge-in (interrupts current playback).

typescript
interface AiFlowActionBargeIn {
  type: "barge_in";
  session_id: string;
}

Example:

typescript
// Manually interrupt current playback
return {
  type: "barge_in",
  session_id: event.session.id,
};

Type Safety

All actions are fully typed. Import types from the SDK:

typescript
import type {
  AiFlowAction,
  AiFlowActionSpeak,
  AiFlowActionAudio,
  AiFlowActionHangup,
  AiFlowActionTransfer,
  AiFlowActionBargeIn,
} from "@sipgate/ai-flow-sdk";

onUserSpeak: async (event) => {
  const action: AiFlowActionSpeak = {
    type: "speak",
    session_id: event.session.id,
    text: "Hello!",
  };
  return action;
};
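Typed actions also make exhaustive handling possible, for example when logging what a handler returned. The sketch below assumes the action types form a union discriminated by the type field; it defines a simplified local Action type instead of importing AiFlowAction, so it stays self-contained. The name describeAction is hypothetical.

```typescript
// Simplified local stand-in for the action union (optional tts and
// barge_in settings omitted). Field names follow the interfaces above.
type Action =
  | { type: "speak"; session_id: string; text?: string; ssml?: string }
  | { type: "audio"; session_id: string; audio: string }
  | { type: "hangup"; session_id: string }
  | {
      type: "transfer";
      session_id: string;
      target_phone_number: string;
      caller_id_name: string;
      caller_id_number: string;
    }
  | { type: "barge_in"; session_id: string };

// Hypothetical helper: produces a short log line for each action type.
// The never assignment makes the compiler flag any unhandled action type.
function describeAction(action: Action): string {
  switch (action.type) {
    case "speak":
      return `speak: ${action.text ?? action.ssml ?? ""}`;
    case "audio":
      return `audio: ${action.audio.length} base64 characters`;
    case "hangup":
      return "hangup";
    case "transfer":
      return `transfer to ${action.target_phone_number}`;
    case "barge_in":
      return "barge_in";
    default: {
      const unreachable: never = action;
      return unreachable;
    }
  }
}
```

If a new action type were added to the union, the default branch would stop compiling until a matching case is written.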
