"Where's my order?" is the most repetitive call an e-commerce shop or last-mile courier fields. An AI voice agent handles it end to end: the caller speaks naturally, the agent looks up the order in your backend, and it reads back the current status and delivery ETA. No "press 1 for tracking" menu, no queue for a human.
You build this on the Stream voice action. Sautikit forks the live call audio to your WebSocket server, you relay it to an LLM (Gemini Live, OpenAI, or a self-hosted model), and you send the spoken reply back on the same socket. Your LLM owns the conversation and calls your own orders API to fetch real data.
The loop is real-time and full-duplex:
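1. A caller dials your number, whose routing_url points to your voice webhook.
2. Your webhook responds with a <Response> containing a <Stream> element.
3. Sautikit opens a WebSocket to your server. Your server must accept the audio.drachtio.org subprotocol during the handshake, or the connection is rejected.
4. Your server uses the caller's number (From) to find recent orders.

Because the socket is bidirectional, the caller can interrupt (barge-in): when they start talking, stop your outbound audio and let the LLM respond to the new input.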
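One way to approximate barge-in on your side is a simple energy check on inbound audio. The sketch below is a minimal illustration, not Sautikit's mechanism: rmsLevel, BargeInGate, the threshold of 1000, and the three-frame trigger are all hypothetical, and it assumes inbound frames are mono 16-bit little-endian PCM that contain only the caller's voice.

```javascript
// Root-mean-square level of a frame of 16-bit little-endian PCM samples.
function rmsLevel(frame) {
  const samples = Math.floor(frame.length / 2);
  if (samples === 0) return 0;
  let sumSquares = 0;
  for (let i = 0; i < samples; i++) {
    const s = frame.readInt16LE(i * 2);
    sumSquares += s * s;
  }
  return Math.sqrt(sumSquares / samples);
}

// Holds agent audio waiting to be sent and drops it when the caller speaks.
// threshold and framesToTrigger are illustrative values, not tuned defaults.
class BargeInGate {
  constructor({ threshold = 1000, framesToTrigger = 3 } = {}) {
    this.threshold = threshold;
    this.framesToTrigger = framesToTrigger;
    this.loudFrames = 0;
    this.outbound = [];
  }

  // Queue a chunk of agent audio for sending.
  enqueue(chunk) {
    this.outbound.push(chunk);
  }

  // Feed each inbound caller frame; returns true when queued audio was dropped.
  onCallerFrame(frame) {
    const isLoud = rmsLevel(frame) >= this.threshold;
    this.loudFrames = isLoud ? this.loudFrames + 1 : 0;
    if (this.loudFrames >= this.framesToTrigger && this.outbound.length > 0) {
      this.outbound.length = 0; // discard the rest of the agent's reply
      this.loudFrames = 0;
      return true;
    }
    return false;
  }

  // Next chunk to write to the socket, or undefined when nothing is queued.
  next() {
    return this.outbound.shift();
  }
}
```

Requiring several consecutive loud frames avoids cutting the agent off on a single click or cough. Many realtime LLM APIs emit their own interruption signal, which is usually a better trigger than a local threshold. If the forked audio also carries your own playback, a plain energy check would fire on the agent's voice, so treat this as a starting point only.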
Endpoints you call:
- POST /v1/numbers: claim a phone number for your order-status line.
- PATCH /v1/numbers/{number_id}: set or update the routing_url (your voice webhook).
- GET /v1/calls/{call_sid}: fetch the call detail record after the call ends.

Voice actions used:

- Stream: fork live call audio to your WebSocket for the real-time AI loop.
- Say: optional text-to-speech greeting before the stream opens.
- Dial: optional escalation to a human agent when the caller asks or the LLM has no answer.

You identify the caller from the From number in the webhook POST body. Use it to look up that phone number's recent orders in your backend, so a returning customer can just ask "is my order out for delivery?" without reciting an order number.
<Response>
  <Stream
    name="order-status"
    url="wss://your-app.example.com/audio"
    track="both_tracks"
    outputSamplingRate="16000"
    statusCallback="https://your-app.example.com/stream-status"
    statusEvents="stream-started stream-stopped stream-error" />
</Response>

track="both_tracks" forks both the caller's audio and the audio you play back. statusCallback receives fire-and-forget stream-started / stream-stopped / stream-error notifications so you can log session lifecycle.
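A webhook handler has to read From from the POST body and return the XML above. The sketch below shows one possible shape for those two steps as plain functions. It rests on assumptions the guide does not confirm: that the body is form-encoded or JSON, and that a query string on the Stream url survives to your WebSocket server. The helper names (escapeXmlAttr, extractCaller, buildStreamResponse) and the from query parameter are hypothetical.

```javascript
// Escape a value for use inside a double-quoted XML attribute.
function escapeXmlAttr(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Pull the caller's number out of the raw webhook body.
// Assumption: the body is either JSON or application/x-www-form-urlencoded.
function extractCaller(contentType, rawBody) {
  if ((contentType || "").includes("application/json")) {
    try {
      return JSON.parse(rawBody).From ?? null;
    } catch {
      return null;
    }
  }
  return new URLSearchParams(rawBody).get("From");
}

// Build the <Response> document, carrying the caller's number to the
// WebSocket server as a query parameter (hypothetical convention).
function buildStreamResponse({ audioUrl, statusUrl, callerNumber }) {
  const url = new URL(audioUrl);
  if (callerNumber) url.searchParams.set("from", callerNumber);
  return [
    "<Response>",
    "  <Stream",
    '    name="order-status"',
    `    url="${escapeXmlAttr(url.toString())}"`,
    '    track="both_tracks"',
    '    outputSamplingRate="16000"',
    `    statusCallback="${escapeXmlAttr(statusUrl)}"`,
    '    statusEvents="stream-started stream-stopped stream-error" />',
    "</Response>",
  ].join("\n");
}
```

Escaping matters once the URL carries more than one query parameter, because a bare & is not valid inside an XML attribute. If the query string does reach your server, the ws connection handler can read it from the second argument of the connection event (the HTTP upgrade request) and hand the number to the order lookup.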
import { WebSocketServer } from "ws";

// Sautikit requires the audio.drachtio.org subprotocol on the handshake.
const wss = new WebSocketServer({
  port: 8080,
  handleProtocols: () => "audio.drachtio.org",
});

// Your LLM calls this tool to fetch real order data.
async function lookupOrder({ orderNumber, callerNumber }) {
  // Prefer the order number the caller read out; otherwise search by phone.
  const params = orderNumber
    ? `order_number=${encodeURIComponent(orderNumber)}`
    : `phone=${encodeURIComponent(callerNumber)}`;
  const res = await fetch(`https://your-app.example.com/api/orders?${params}`);
  // Return a plain error object so the LLM can apologize instead of crashing.
  if (!res.ok) return { error: `order lookup failed with HTTP ${res.status}` };
  const order = await res.json();
  // e.g. { status: "out_for_delivery", eta: "today between 4 and 6 PM", courier: "PikiPiki" }
  return order;
}

wss.on("connection", (socket) => {
  // Connect to your realtime LLM (Gemini Live / OpenAI Realtime / self-hosted).
  // connectRealtimeLLM is a placeholder for your own wrapper around the provider SDK.
  const llm = connectRealtimeLLM({
    tools: [
      {
        name: "lookup_order",
        description: "Fetch delivery status and ETA for an order.",
        run: lookupOrder,
      },
    ],
    onAudio: (pcm) => socket.send(pcm), // spoken reply back into the call
  });

  socket.on("message", (data, isBinary) => {
    if (isBinary) llm.pushAudio(data); // 16-bit LE PCM from the caller
  });

  socket.on("close", () => llm.close());
});

The LLM decides when to call lookup_order, passing the order number the caller read out or falling back to their callerNumber. It then narrates the returned status and ETA in natural speech.
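It can help to hand the LLM a compact, already-phrased summary instead of raw JSON, so the spoken answer stays close to your data. The sketch below is one possible shape for that step. The status, eta, and courier fields follow the sample order in the code above. The summarizeOrder helper and every status value other than out_for_delivery are hypothetical.

```javascript
// Spoken phrases for known statuses. Only out_for_delivery appears in the
// sample order; the other keys are illustrative assumptions.
const STATUS_PHRASES = {
  processing: "is being prepared",
  shipped: "has shipped",
  out_for_delivery: "is out for delivery",
  delivered: "has been delivered",
};

// Turn an order record into a tool result the LLM can read aloud.
function summarizeOrder(order) {
  if (!order || !order.status) {
    return {
      found: false,
      summary: "No matching order was found. Ask the caller for their order number.",
    };
  }
  // Fall back to the raw status with underscores spoken as spaces.
  const phrase =
    STATUS_PHRASES[order.status] ?? `has status ${order.status.replace(/_/g, " ")}`;
  let summary = `The order ${phrase}`;
  if (order.courier) summary += ` with ${order.courier}`;
  // An ETA is only meaningful while the order is still on its way.
  if (order.eta && order.status !== "delivered") summary += `, expected ${order.eta}`;
  return { found: true, summary: `${summary}.` };
}
```

Returning found: false with an instruction gives the model a clear next step when a phone-number lookup comes back empty, rather than leaving it to improvise.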
curl -X PATCH "https://api.sautikit.com/v1/numbers/{number_id}" \
-H "Authorization: Bearer $SAUTIKIT_API_KEY" \
-H "Content-Type: application/json" \
-d '{"routing_url": "https://your-app.example.com/voice"}'
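The same request can be made from Node when you provision numbers in code. This is a minimal sketch that mirrors the curl call above. The buildRoutingUpdate and setRoutingUrl helpers are hypothetical names, and the shape of the response body is not assumed, so the raw Response is returned.

```javascript
const API_BASE = "https://api.sautikit.com";

// Build the URL and fetch options for PATCH /v1/numbers/{number_id}.
function buildRoutingUpdate(numberId, routingUrl, apiKey) {
  return {
    url: `${API_BASE}/v1/numbers/${encodeURIComponent(numberId)}`,
    options: {
      method: "PATCH",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ routing_url: routingUrl }),
    },
  };
}

// Send the update; throws on a non-2xx status so misconfiguration is visible.
async function setRoutingUrl(numberId, routingUrl, apiKey = process.env.SAUTIKIT_API_KEY) {
  const { url, options } = buildRoutingUpdate(numberId, routingUrl, apiKey);
  const res = await fetch(url, options);
  if (!res.ok) throw new Error(`PATCH ${url} failed with HTTP ${res.status}`);
  return res;
}
```

Keeping request construction separate from the network call makes the payload easy to check in a unit test without touching the live API.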