Changelog

Release notes for the sipgate AI Flow API and SDK. Only customer-visible changes are listed here.


Preview — May 2026

End-to-End Voice-to-Voice Mode (Preview)

You can now connect your assistant to an end-to-end speech-to-speech model. With the new configure_voice_to_voice action the assistant bypasses the standard STT → text → TTS pipeline: caller audio flows directly into the model and the model's spoken response is sent straight back to the caller. Conversations feel snappier and more natural, with first-byte response latencies typically in the 200–600 ms range.

User turns are still surfaced as user_speak events so call traces and logs keep working — you only need to send a single configure_voice_to_voice action on session_start. To revert to the standard pipeline mid-call, send a configure_transcription action.
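As a minimal sketch of the setup described above: a session handler returns the action once on session_start. Only the action name comes from this changelog; the changelog does not document any further configuration fields, so none are shown.

```typescript
// Minimal sketch: enable voice-to-voice mode at session start.
// Only the action name is documented here; any extra fields are omitted.
type ConfigureVoiceToVoiceAction = { action: "configure_voice_to_voice" };

function onSessionStart(): ConfigureVoiceToVoiceAction[] {
  // A single action is enough — user turns keep arriving as user_speak
  // events, so existing call traces and logs continue to work.
  return [{ action: "configure_voice_to_voice" }];
}

console.log(JSON.stringify(onSessionStart()));
```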

This is a preview feature, available on request after a review by sipgate support.


v1.9.0 — May 2026

Per-Action VAD Configuration

You can now configure Voice Activity Detection (VAD) individually for each speak action. This lets you fine-tune how sensitive barge-in detection is depending on the context — for example, using stricter VAD during important announcements and more permissive settings during open-ended questions.
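A sketch of what per-action tuning could look like. The `vad` object and its `sensitivity` values are illustrative assumptions, not the documented schema — only the idea of stricter VAD for announcements and more permissive VAD for open questions comes from the text above.

```typescript
// Hypothetical per-action VAD configuration. Field names are assumed.
type SpeakAction = {
  action: "speak";
  text: string;
  vad?: { sensitivity: "strict" | "default" | "permissive" };
};

// Strict barge-in detection while delivering important information.
const announcement: SpeakAction = {
  action: "speak",
  text: "Your contract ends on the 31st of May.",
  vad: { sensitivity: "strict" },
};

// Permissive detection during an open-ended question.
const openQuestion: SpeakAction = {
  action: "speak",
  text: "How can I help you today?",
  vad: { sensitivity: "permissive" },
};

console.log(announcement.vad?.sensitivity, openQuestion.vad?.sensitivity);
```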


Improvements — April 2026

Faster, More Natural Conversation Turns

Upgraded to a next-generation transcription backend with significantly improved end-of-utterance detection. The assistant responds faster at natural sentence endings and is less likely to cut in while the caller is still speaking.

Background Audio Looping (mix_audio)

The mix_audio action now supports looping — play hold music or ambient sound continuously in the background while the assistant speaks, without gaps or manual re-triggering.
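Sketched as an action payload, under the assumption that the audio source and loop flag are plain fields (`audioUrl` and `loop` are illustrative names, not the documented schema):

```typescript
// Hypothetical mix_audio payload with looping enabled. Field names are
// assumptions; only the action name and the looping behaviour come from
// the changelog entry.
type MixAudioAction = {
  action: "mix_audio";
  audioUrl: string;
  loop?: boolean;
};

const holdMusic: MixAudioAction = {
  action: "mix_audio",
  audioUrl: "https://example.com/hold-music.mp3",
  loop: true, // plays continuously in the background, no re-triggering
};

console.log(JSON.stringify(holdMusic));
```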

Transfer with Timeout Fallback

The transfer action accepts an optional timeout. If the transfer destination does not answer within the configured time, the call returns to your assistant automatically, allowing you to handle the fallback gracefully.
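A sketch of a transfer with a fallback window. The field names `destination` and `timeoutSeconds` are assumptions; the changelog only states that a timeout exists and that the call returns to the assistant when it expires.

```typescript
// Hypothetical transfer payload with a timeout. Field names are assumed.
type TransferAction = {
  action: "transfer";
  destination: string;     // target number in E.164 format (assumed field)
  timeoutSeconds?: number; // assumed field name for the timeout
};

const transferWithFallback: TransferAction = {
  action: "transfer",
  destination: "+4921112345678",
  // If unanswered after 20 s, the call returns to the assistant, which
  // can then apologise and offer an alternative.
  timeoutSeconds: 20,
};

console.log(JSON.stringify(transferWithFallback));
```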

Send SMS During a Call (send_sms)

A new send_sms action lets your assistant send an SMS to the caller while the call is still active — useful for sending confirmation links, reference numbers, or follow-up information in real time.
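As a sketch, the assistant could emit the action mid-call after confirming a booking. The `message` field name is an assumption; only the action name and the use cases come from the entry above.

```typescript
// Hypothetical send_sms payload. The `message` field name is assumed.
type SendSmsAction = {
  action: "send_sms";
  message: string;
};

// Send a confirmation with a reference number while the call continues.
function confirmBooking(referenceNumber: string): SendSmsAction {
  return {
    action: "send_sms",
    message: `Your booking is confirmed. Reference: ${referenceNumber}`,
  };
}

console.log(JSON.stringify(confirmBooking("ABC-123")));
```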

Keypad (DTMF) Input Support

Your assistant can now react to keypad presses during a call. DTMF digits are delivered as events, enabling menu navigation, PIN entry, and other touch-tone interactions. User input timeouts also reset correctly when the caller presses a key.
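A minimal menu dispatcher built on the delivered digits. The event shape (`event: "dtmf"` with a `digit` field) is an illustrative assumption — check the event reference for the exact payload.

```typescript
// Hypothetical DTMF event shape; field names are assumed.
type DtmfEvent = { event: "dtmf"; digit: string };

// Route the caller based on the key they pressed.
function handleDtmf(ev: DtmfEvent): string {
  switch (ev.digit) {
    case "1":
      return "sales";
    case "2":
      return "support";
    default:
      return "main_menu"; // unrecognised key — repeat the menu
  }
}

console.log(handleDtmf({ event: "dtmf", digit: "2" })); // → "support"
```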

Consistent E.164 Phone Numbers in All Events

Caller and callee phone numbers in all events are now consistently formatted as E.164 (e.g. +4921112345678). If you were normalising numbers on your side, this step is no longer necessary.


v1.5.1 — March 2026

Outbound Calls

Initiate AI-powered calls programmatically via POST /ai-flows/:aiFlowId/call. Your assistant handles the call as soon as the recipient picks up — the same event-driven flow as inbound calls. Available on request after a review by sipgate support.
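A sketch of triggering an outbound call. Only the endpoint path comes from the entry above; the base URL, the bearer-token auth, and the request body field (`to`) are assumptions for illustration.

```typescript
// Build the documented endpoint path for a given flow ID.
function buildCallUrl(baseUrl: string, aiFlowId: string): string {
  return `${baseUrl}/ai-flows/${encodeURIComponent(aiFlowId)}/call`;
}

// Hypothetical request: base URL, auth scheme, and body fields are
// assumptions, not taken from the official reference.
async function startOutboundCall(
  aiFlowId: string,
  to: string,
  token: string
): Promise<boolean> {
  const res = await fetch(buildCallUrl("https://api.sipgate.com", aiFlowId), {
    method: "POST",
    headers: {
      Authorization: `Bearer ${token}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ to }),
  });
  return res.ok;
}

console.log(buildCallUrl("https://api.sipgate.com", "flow-42"));
```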

Real-Time Speech Start Event (user_speech_started)

A new user_speech_started event is sent the moment the caller begins speaking — before transcription completes. Use it to interrupt the assistant or trigger visual feedback without waiting for the full transcript.
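As a sketch of the interrupt pattern: react to the event before any transcript exists. The `stop_speak` action name below is purely an assumption for illustration — consult the action reference for the actual way to interrupt playback.

```typescript
// The event name matches the changelog; the payload shape beyond that
// is assumed.
type SpeechStartedEvent = { event: "user_speech_started" };

function onSpeechStarted(_ev: SpeechStartedEvent): Array<{ action: string }> {
  // The caller began talking — stop assistant audio right away instead
  // of waiting for the full user_speak transcript.
  return [{ action: "stop_speak" }]; // action name is an assumption
}

console.log(JSON.stringify(onSpeechStarted({ event: "user_speech_started" })));
```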

Faster ElevenLabs Voices

ElevenLabs voices now use the latest eleven_flash_v2_5 model by default, delivering noticeably lower latency for generated speech.

ElevenLabs EU Data Residency

ElevenLabs voices now route through the EU endpoint by default, keeping audio data within the European Union.


Improvements — February 2026

Immediate Barge-In Strategy

A new immediate barge-in strategy detects speech using Voice Activity Detection (VAD) the moment a caller starts talking — typically 20–100 ms before the first word is transcribed. Conversations feel as natural as talking to a real person.

Mid-Call Language and Provider Switching (configure_transcription)

A new configure_transcription action lets your assistant switch the transcription language or provider in the middle of a call — for example, after detecting that the caller speaks a different language, or to adapt recognition parameters dynamically. Supported languages follow BCP-47 tags and work across Azure, Deepgram, and ElevenLabs.
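A sketch of a mid-call switch after language detection. The field names `language` and `provider` are assumptions; the BCP-47 tag format and the three provider names come from the text above.

```typescript
// Hypothetical configure_transcription payload; field names are assumed.
type ConfigureTranscriptionAction = {
  action: "configure_transcription";
  language?: string; // BCP-47 tag, e.g. "tr-TR"
  provider?: "azure" | "deepgram" | "elevenlabs";
};

// Switch to Turkish after detecting the caller's language mid-call.
const switchLanguage: ConfigureTranscriptionAction = {
  action: "configure_transcription",
  language: "tr-TR",
  provider: "deepgram",
};

console.log(JSON.stringify(switchLanguage));
```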


Improvements — January 2026

SSML Support in Speak Actions

The speak action now accepts SSML (Speech Synthesis Markup Language) in addition to plain text. Use SSML to control pronunciation, pauses, emphasis, and speaking rate for fine-tuned voice output.
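The markup itself is standard W3C SSML; whether the speak action needs an explicit flag to mark the text as SSML (rather than detecting it) is not stated here, so this sketch simply places the markup in the text field.

```typescript
// Standard SSML: emphasis, a pause, and a slower speaking rate.
const ssml = `<speak>
  Your appointment is on <emphasis level="strong">Friday</emphasis>.
  <break time="500ms"/>
  <prosody rate="slow">Please have your reference number ready.</prosody>
</speak>`;

// Assumption: SSML goes into the same text field as plain text.
const speakAction = { action: "speak", text: ssml };

console.log(speakAction.text.startsWith("<speak>"));
```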


Early Access — November–December 2025

Multi-Provider Transcription

Deepgram and ElevenLabs are now available as speech-to-text providers alongside Azure. Select the provider that best fits your use case — each offers different strengths in accuracy, latency, and supported languages.

Phone Number Routing

AI flows can now be associated with specific phone numbers directly through the API, making it easier to build multi-flow routing logic without external IVR configuration.

SDK Launch

The @sipgate/ai-flow-sdk TypeScript SDK is now publicly available on npm. It provides fully typed event handlers and action builders, removing the need to manage raw WebSocket or HTTP webhook payloads manually.
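To illustrate what typed handlers buy you over raw payloads, here is a hand-rolled dispatcher sketch — deliberately not the SDK's actual API, just the parse-and-route plumbing the SDK takes off your hands:

```typescript
// What the SDK abstracts away: parsing raw payloads and routing them to
// the right handler. Event names here are examples, not an exhaustive list.
type FlowEvent = { event: string; [key: string]: unknown };

function dispatch(
  raw: string,
  handlers: Record<string, (ev: FlowEvent) => void>
): void {
  const ev = JSON.parse(raw) as FlowEvent;
  handlers[ev.event]?.(ev); // unknown events are silently ignored
}

let answeredCallId = "";
dispatch(JSON.stringify({ event: "session_start", callId: "c1" }), {
  session_start: (ev) => {
    answeredCallId = String(ev.callId);
  },
});

console.log(answeredCallId); // → "c1"
```

With the SDK, handlers and action builders like this come fully typed, so payload shapes are checked at compile time instead of at runtime.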


Note: The AI Flow API follows continuous delivery — not all improvements correspond to an SDK version bump. Check this page regularly for the latest changes.