An AI customer-support voice agent answers inbound support calls, resolves the routine ones, and escalates the rest. It handles FAQs, looks up account or order context from your backend, walks the caller through simple fixes, and connects a human agent on the same call when it hits something it can't solve. The goal is deflection: most callers get an answer without ever waiting in a queue.
With Sautikit you drive the agent in real time. Your number's routing_url returns a <Stream> voice action; Sautikit forks live caller audio to your WebSocket server; your bridge relays that audio to the LLM of your choice and streams synthesized speech back into the call. Because your server owns the conversation, the same LLM turn that decides "I can't resolve this" can end the stream and return a <Dial> to a human.
The real-time loop:
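1. A caller dials your number, whose routing_url points at your voice webhook.
2. Your webhook replies with a <Response> containing a <Stream> action with your wss:// URL.
3. Sautikit connects to your WebSocket server. Your server must accept the audio.drachtio.org subprotocol on the handshake, or Sautikit rejects the connection.
4. Caller audio arrives as 16-bit little-endian PCM at the outputSamplingRate you requested (use 16000 for AI models).
5. When the model can't resolve the issue, your server ends the stream and returns a <Dial> to a human agent, connecting them on the same call.

Your server owns the conversation. Key the LLM session and any tool results by the call SID from the stream handshake, so a mid-call escalation to <Dial> can pass everything the human agent needs (verified identity, account ID, what the caller already tried).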
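Keying state by call SID can be sketched as a small in-memory store. This is a minimal illustration, not part of Sautikit's API: the function names (`openSession`, `recordToolResult`, `recordAttempt`, `escalationContext`) and the session fields are hypothetical, and how you read the call SID from the handshake depends on your stream configuration.

```javascript
// Hypothetical in-memory store: one entry per call SID.
const sessions = new Map();

function openSession(callSid) {
  const session = {
    callSid,
    verified: false,   // set to true once the caller's identity is confirmed
    accountId: null,
    attempts: [],      // fixes the caller has already tried
    toolResults: {},   // latest result per tool name
  };
  sessions.set(callSid, session);
  return session;
}

function requireSession(callSid) {
  const session = sessions.get(callSid);
  if (!session) throw new Error(`unknown call SID: ${callSid}`);
  return session;
}

function recordToolResult(callSid, toolName, result) {
  const session = requireSession(callSid);
  session.toolResults[toolName] = result;
  // Assumption: the account lookup returns an object with an `id` field.
  if (toolName === "getAccount" && result && result.id) {
    session.accountId = result.id;
  }
  return session;
}

function recordAttempt(callSid, description) {
  requireSession(callSid).attempts.push(description);
}

// Everything a human agent needs at handoff, or null if the call is unknown.
function escalationContext(callSid, reason) {
  const session = sessions.get(callSid);
  if (!session) return null;
  return {
    callSid,
    reason,
    verified: session.verified,
    accountId: session.accountId,
    alreadyTried: [...session.attempts],
  };
}

function closeSession(callSid) {
  return sessions.delete(callSid);
}

// Usage: build the handoff payload when the model gives up.
openSession("CA123");
recordToolResult("CA123", "getAccount", { id: "acct_42", plan: "prepaid" });
recordAttempt("CA123", "restarted the router");
console.log(escalationContext("CA123", "outage not resolved"));
```

A process-local Map is enough for a single bridge instance. If you run several instances, a shared store would be needed so the webhook that returns the <Dial> can read the same context.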
Endpoints you call:
- POST /v1/numbers: claim a phone number for your support line.
- PATCH /v1/numbers/{number_id}: set or update the routing_url (your voice webhook).
- GET /v1/calls/{call_sid}: fetch the call detail record after the call ends.

Voice actions used:

- Stream: fork live call audio to your WebSocket for real-time AI.
- Dial: connect the caller to a human agent for escalation.
- Say: optional TTS for a greeting or a fallback message before streaming starts.

<Stream> attributes:

| Attribute | Required | Notes |
|---|---|---|
| url | yes | wss:// endpoint Sautikit connects to. |
| track | yes | inbound_track, outbound_track, or both_tracks. |
| outputSamplingRate | yes | 8000 or 16000. Use 16000 for AI models. |
| name | no | Friendly identifier echoed in stream status events. |
| headerMetadata | no | JSON headers sent on the WebSocket handshake. |
| openMetadata | no | Opaque UTF-8 payload sent as the first text frame. |
| statusCallback | no | URL Sautikit POSTs stream status events to. |
| statusEvents | no | Space-separated subset of stream-started, stream-stopped, stream-error. |
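The three status event names can drive a small state tracker on your side. The sketch below relies only on the event names listed in the table. The payload Sautikit POSTs to statusCallback is not described here, so the helper takes a stream name and an event name that you would extract yourself, and `onStreamStatus` and `statusEventsAttr` are hypothetical names.

```javascript
// Map each documented event name to a local status label (labels are our own).
const STATUS_BY_EVENT = new Map([
  ["stream-started", "streaming"],
  ["stream-stopped", "stopped"],
  ["stream-error", "failed"],
]);

// Build the space-separated statusEvents attribute, rejecting unknown names.
function statusEventsAttr(events) {
  for (const name of events) {
    if (!STATUS_BY_EVENT.has(name)) {
      throw new Error(`unsupported status event: ${name}`);
    }
  }
  return events.join(" ");
}

// Record the latest status per stream and flag when a fallback is worth
// considering (for example a <Say> message or a direct <Dial>).
function onStreamStatus(states, streamName, eventName) {
  const status = STATUS_BY_EVENT.get(eventName);
  if (status === undefined) {
    // Unknown event: leave the recorded state untouched.
    return { status: states.get(streamName) ?? "unknown", needsFallback: false };
  }
  states.set(streamName, status);
  return { status, needsFallback: status === "failed" };
}

// Usage
const streamStates = new Map();
console.log(statusEventsAttr(["stream-started", "stream-error"]));
console.log(onStreamStatus(streamStates, "support-agent", "stream-started"));
console.log(onStreamStatus(streamStates, "support-agent", "stream-error"));
```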
Audio on the socket is 16-bit little-endian PCM. Your server must accept the audio.drachtio.org subprotocol.
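Because each sample is two bytes, frame duration and loudness follow directly from the buffer length. The helpers below are a sketch under two assumptions: the audio is a single channel (the layout for both_tracks is not described here), and frames arrive as Node Buffers. The names `pcmDurationMs` and `pcmRms` are illustrative.

```javascript
const BYTES_PER_SAMPLE = 2; // 16-bit PCM

// Duration of a PCM frame in milliseconds, assuming one channel.
function pcmDurationMs(buf, sampleRate = 16000) {
  const samples = Math.floor(buf.length / BYTES_PER_SAMPLE);
  return (samples * 1000) / sampleRate;
}

// Root-mean-square level in the range 0..1, useful as a crude silence check.
function pcmRms(buf) {
  const samples = Math.floor(buf.length / BYTES_PER_SAMPLE);
  if (samples === 0) return 0;
  let sumSquares = 0;
  for (let i = 0; i < samples; i++) {
    // readInt16LE decodes one signed little-endian sample.
    const s = buf.readInt16LE(i * BYTES_PER_SAMPLE) / 32768;
    sumSquares += s * s;
  }
  return Math.sqrt(sumSquares / samples);
}

// Usage: 640 bytes at 16000 Hz is 320 samples.
const frame = Buffer.alloc(640);
console.log(pcmDurationMs(frame), pcmRms(frame));
```

A level check like this can help you avoid forwarding long stretches of silence to the model, though most speech-capable LLM APIs do their own voice activity detection.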
When the number is dialled, Sautikit POSTs to your routing_url. Reply with application/xml:
```xml
<Response>
  <Stream
    name="support-agent"
    url="wss://your-app.example.com/audio"
    track="both_tracks"
    outputSamplingRate="16000"
    statusCallback="https://your-app.example.com/stream-status"
    statusEvents="stream-started stream-stopped stream-error" />
</Response>
```

This sketch shows where the LLM plugs in, where you'd invoke an internal lookup, and how you'd signal an escalation. Wire the LLM client and PCM plumbing to your provider.
```js
import { WebSocketServer } from "ws";

// Sautikit negotiates the `audio.drachtio.org` subprotocol on connect.
const wss = new WebSocketServer({
  port: 8080,
  handleProtocols: (protocols) =>
    protocols.has("audio.drachtio.org") ? "audio.drachtio.org" : false,
});

wss.on("connection", (ws) => {
  // startLLMSession is a placeholder for your LLM provider's client.
  const llm = startLLMSession({
    // Your internal tools the model can call mid-conversation.
    tools: {
      async getAccount({ msisdn }) {
        // Encode the number: a raw "+" in a query string decodes as a space.
        const res = await fetch(
          `https://internal.example.com/accounts?phone=${encodeURIComponent(msisdn)}`,
        );
        return res.json(); // balance, plan, open tickets, outage status...
      },
    },
    // The model calls this when it can't resolve the issue.
    onEscalate: (reason) => escalateToHuman(ws, reason),
  });

  ws.on("message", (data, isBinary) => {
    if (isBinary) {
      // Live caller audio: 16-bit LE PCM at 16000 Hz. Feed it to the model.
      llm.pushAudio(data);
    }
    // Text frames carry metadata (openMetadata arrives as the first text frame).
  });

  // Model output: PCM back on the same socket. Sautikit plays it into the call.
  llm.on("audio", (pcm) => {
    // Skip frames produced after the socket closed, e.g. during escalation.
    if (ws.readyState === ws.OPEN) ws.send(pcm, { binary: true });
  });
});

function escalateToHuman(ws, reason) {
  // Close the stream, then return a <Dial> from your routing flow so the
  // caller is connected to a human agent on the same call.
  console.log(`Escalating to a human agent: ${reason}`);
  ws.close();
  // e.g. redirect the call to a webhook that responds with:
  // <Response><Dial><Number>+254720000010</Number></Dial></Response>
}
```

When <Dial> connects a human agent, the per-second rate continues across the connected legs.

See also: the <Stream> reference, status events, and the media handshake.
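The escalation comment in the sketch above refers to a webhook that responds with a <Dial>. One way to structure that webhook is to have it choose between the <Stream> and <Dial> responses based on a per-call flag. This is a sketch under assumptions: the `escalated` set, `routingResponse`, and the other helper names are hypothetical, and the mechanism for redirecting a live call back to your webhook is not covered here.

```javascript
// Escape values before placing them in XML attributes or text nodes.
function escapeXml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Render a <Stream> action from a plain object of attributes.
function streamResponse(attrs) {
  const rendered = Object.entries(attrs)
    .map(([key, value]) => `${key}="${escapeXml(value)}"`)
    .join(" ");
  return `<Response><Stream ${rendered} /></Response>`;
}

// Render a <Dial> to a single human agent number.
function dialResponse(number) {
  return `<Response><Dial><Number>${escapeXml(number)}</Number></Dial></Response>`;
}

// Call SIDs the bridge has flagged for human handoff (hypothetical state).
const escalated = new Set();

// Decide which XML body the voice webhook should return for this call.
function routingResponse(callSid, streamAttrs, agentNumber) {
  return escalated.has(callSid)
    ? dialResponse(agentNumber)
    : streamResponse(streamAttrs);
}

// Usage: first request streams to the AI, later request dials a human.
const attrs = {
  url: "wss://your-app.example.com/audio",
  track: "both_tracks",
  outputSamplingRate: "16000",
};
console.log(routingResponse("CA123", attrs, "+254720000010"));
escalated.add("CA123");
console.log(routingResponse("CA123", attrs, "+254720000010"));
```

Serve the returned string with an application/xml content type, as with the initial <Stream> reply.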