Order status by voice: an AI agent that answers "where's my order?"

"Where's my order?" is the most repetitive call an e-commerce shop or last-mile courier fields. An AI voice agent handles it end to end: the caller speaks naturally, the agent looks up the order in your backend, and it reads back the current status and delivery ETA. No "press 1 for tracking" menu, no queue for a human.

You build this on the Stream voice action. Sautikit forks the live call audio to your WebSocket server, you relay it to an LLM (Gemini Live, OpenAI, or a self-hosted model), and you send the spoken reply back on the same socket. Your LLM owns the conversation and calls your own orders API to fetch real data.

Who it's for:

E-commerce shops in Nairobi and across Africa fielding daily "where is my parcel" calls.
Last-mile logistics and courier operations that want callers to self-serve delivery ETAs.
Backend developers who already have an orders/tracking API and want a voice front end for it.
Product managers looking to deflect status calls from human agents without a rigid phone tree.

The loop is real-time and full-duplex:

A caller dials a Sautikit number whose routing_url points to your voice webhook.
Your webhook returns an XML <Response> containing a <Stream> element.
Sautikit opens a WebSocket to your server. Your server must advertise the audio.drachtio.org subprotocol during the handshake, or the connection is rejected.
Sautikit forks the live caller audio to that socket as binary 16-bit little-endian PCM frames.
Your bridge relays the audio to your LLM (Gemini Live / OpenAI Realtime / self-hosted STT + LLM + TTS).
When the caller asks about an order, the LLM calls your orders API, either with the order number they read out or with their caller ID (From) to find their recent orders.
You send the spoken reply back as PCM on the same socket, and Sautikit plays it into the call.
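Because Sautikit delivers caller audio as binary 16-bit little-endian PCM, most of the bridge's work is moving samples around. Below is a minimal sketch of the helpers you will probably need. It assumes mono audio and covers four jobs: decoding a frame into samples, encoding samples back, measuring a frame's duration, and a simple linear resampler. You need the resampler when your model's output rate differs from the rate Sautikit expects. Some realtime model APIs emit 24 kHz audio, for example, so check your provider's documentation and the Stream reference. The function names are illustrative and not part of any SDK.

```javascript
// PCM helpers for the Stream socket (sketch).
// Assumes mono, signed 16-bit little-endian samples; sample rates are
// parameters you confirm against your provider and the Stream reference.

// Decode a binary frame (Node Buffer) into signed 16-bit samples.
function pcmToSamples(buf) {
  const samples = new Int16Array(Math.floor(buf.length / 2));
  for (let i = 0; i < samples.length; i++) {
    samples[i] = buf.readInt16LE(i * 2);
  }
  return samples;
}

// Encode signed 16-bit samples into a little-endian Buffer for socket.send().
function samplesToPcm(samples) {
  const buf = Buffer.alloc(samples.length * 2);
  for (let i = 0; i < samples.length; i++) {
    buf.writeInt16LE(samples[i], i * 2);
  }
  return buf;
}

// Duration of a mono 16-bit frame in milliseconds.
function frameDurationMs(buf, sampleRate) {
  return ((buf.length / 2) * 1000) / sampleRate;
}

// Linear-interpolation resampler, e.g. 24 kHz model output -> 16 kHz.
// Crude but usually intelligible for speech; there is no anti-aliasing filter.
function resample(samples, fromRate, toRate) {
  if (fromRate === toRate || samples.length === 0) return samples;
  const outLength = Math.floor((samples.length * toRate) / fromRate);
  const out = new Int16Array(outLength);
  const step = fromRate / toRate;
  for (let i = 0; i < outLength; i++) {
    const pos = i * step;
    const left = Math.floor(pos);
    const right = Math.min(left + 1, samples.length - 1);
    const value = samples[left] + (pos - left) * (samples[right] - samples[left]);
    out[i] = Math.max(-32768, Math.min(32767, Math.round(value)));
  }
  return out;
}
```

For example, you would call samplesToPcm(resample(pcmToSamples(modelAudio), 24000, 16000)) before socket.send(...). A resampler with a proper low-pass filter will sound cleaner than linear interpolation.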

Because the socket is bidirectional, the caller can interrupt (barge-in): when they start talking, stop your outbound audio and let the LLM respond to the new input.
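Below is a minimal sketch of client-side barge-in detection. It assumes the caller's frames are mono 16-bit LE PCM. It computes each frame's RMS energy and fires once when several consecutive frames are loud while the agent is talking. The threshold and frame count are illustrative starting points, not tuned values, and createBargeInDetector is a hypothetical helper. Many realtime model APIs already run voice-activity detection and emit their own interruption event. If yours does, prefer that signal.

```javascript
// Energy-based barge-in detection (sketch). Threshold values are illustrative.

// Root-mean-square energy of a 16-bit little-endian PCM frame.
function frameRms(buf) {
  const count = Math.floor(buf.length / 2);
  if (count === 0) return 0;
  let sumSquares = 0;
  for (let i = 0; i < count; i++) {
    const sample = buf.readInt16LE(i * 2);
    sumSquares += sample * sample;
  }
  return Math.sqrt(sumSquares / count);
}

// Calls onBargeIn once when minLoudFrames consecutive frames exceed the
// threshold while the agent is speaking. Call setSpeaking(true) each time
// the agent starts a new reply.
function createBargeInDetector({ threshold = 1000, minLoudFrames = 3, onBargeIn }) {
  let agentSpeaking = false;
  let loudFrames = 0;
  return {
    setSpeaking(value) {
      agentSpeaking = value;
      loudFrames = 0;
    },
    push(buf) {
      if (!agentSpeaking) return; // caller speech only matters as an interruption
      loudFrames = frameRms(buf) > threshold ? loudFrames + 1 : 0;
      if (loudFrames >= minLoudFrames) {
        agentSpeaking = false; // fire once per agent utterance
        loudFrames = 0;
        onBargeIn();
      }
    },
  };
}
```

To wire it in, call setSpeaking(true) when reply audio starts and push(data) for each binary caller frame. In onBargeIn, drop any reply audio you still have queued and cancel the model's current response through your provider's API. Check the Stream reference for how track="both_tracks" delivers audio. If the frames you analyse include the agent's own playback, energy alone will also trigger on the agent's voice, so feed the detector the caller's audio only.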

Endpoints you call:

POST /v1/numbers: claim a phone number for your order-status line.
PATCH /v1/numbers/{number_id}: set or update the routing_url (your voice webhook).
GET /v1/calls/{call_sid}: fetch the call detail record after the call ends.
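Below is a minimal fetch-based sketch of pulling a call detail record once the call has ended. It assumes Node 18+ (which has a global fetch) and the same Bearer-token auth as the curl example in this guide. It returns the parsed JSON unchanged, because the record's fields are defined in the API reference, not here. The names getCallRecord and fetchImpl are illustrative.

```javascript
// Fetch a call detail record after the call ends (sketch).
// Base URL and Bearer auth match the curl example in this guide.
const SAUTIKIT_API = "https://api.sautikit.com/v1";

async function getCallRecord(callSid, { apiKey, fetchImpl = globalThis.fetch } = {}) {
  if (!apiKey) throw new Error("apiKey is required");
  const res = await fetchImpl(`${SAUTIKIT_API}/calls/${encodeURIComponent(callSid)}`, {
    headers: { Authorization: `Bearer ${apiKey}` },
  });
  if (!res.ok) {
    throw new Error(`GET /v1/calls/${callSid} failed with HTTP ${res.status}`);
  }
  // Field names are whatever the API reference documents; returned as-is.
  return res.json();
}
```

Usage: const record = await getCallRecord(callSid, { apiKey: process.env.SAUTIKIT_API_KEY }), where callSid is the call_sid of the finished call. Because fetchImpl can be injected, you can test the function without network access.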

Voice actions used:

Stream: fork live call audio to your WebSocket for the real-time AI loop.
Say: optional text-to-speech greeting before the stream opens.
Dial: optional escalation to a human agent when the caller asks or the LLM has no answer.

You identify the caller from the From number in the webhook POST body. Use it to look up that phone number's recent orders in your backend, so a returning customer can just ask "is my order out for delivery?" without reciting an order number.
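When a caller gives no order number, a lookup by phone may return several orders. The sketch below shows one way to choose which one to talk about: prefer the newest order that is still in transit, and otherwise take the newest overall. The order fields (orderNumber, status, placedAt) and the status values are hypothetical, so map them to your own orders API.

```javascript
// Pick the order to narrate when the caller gives no order number (sketch).
// Order fields and status values are hypothetical; map them to your API.
const IN_TRANSIT = new Set(["processing", "dispatched", "out_for_delivery"]);

function pickRelevantOrder(orders) {
  if (!Array.isArray(orders) || orders.length === 0) return null;
  // Sort a copy newest first by placement time (ISO 8601 strings).
  const newestFirst = [...orders].sort(
    (a, b) => Date.parse(b.placedAt) - Date.parse(a.placedAt),
  );
  // Prefer the newest order still in transit; otherwise the newest overall.
  return newestFirst.find((order) => IN_TRANSIT.has(order.status)) ?? newestFirst[0];
}
```

If more than one order is still open, it may serve callers better to return all of them and let the agent ask which one they mean.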

A minimal webhook response:

<Response>
  <Stream
    name="order-status"
    url="wss://your-app.example.com/audio"
    track="both_tracks"
    outputSamplingRate="16000"
    statusCallback="https://your-app.example.com/stream-status"
    statusEvents="stream-started stream-stopped stream-error" />
</Response>

track="both_tracks" forks both the caller's audio and the audio you play back. statusCallback receives fire-and-forget stream-started / stream-stopped / stream-error notifications so you can log session lifecycle.
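The bridge needs the caller's number for phone lookups, so that number has to reach the socket somehow. Unless the Stream reference documents a way to pass call metadata, one option is for the webhook to append From to the Stream url as a query parameter. The sketch below builds the XML above that way. The "caller" parameter name is this sketch's own convention, not a Sautikit feature. Confirm that query strings on url reach your server before relying on it.

```javascript
// Build the <Stream> response, carrying the caller's number to the socket
// as a "caller" query parameter (a convention of this sketch only).
function escapeXmlAttr(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

function buildStreamResponse({ caller, wsUrl = "wss://your-app.example.com/audio" }) {
  const url = new URL(wsUrl);
  if (caller) url.searchParams.set("caller", caller); // "+" is encoded as %2B
  const attrs = {
    name: "order-status",
    url: url.toString(),
    track: "both_tracks",
    outputSamplingRate: "16000",
    statusCallback: "https://your-app.example.com/stream-status",
    statusEvents: "stream-started stream-stopped stream-error",
  };
  const attrText = Object.entries(attrs)
    .map(([key, value]) => `    ${key}="${escapeXmlAttr(value)}"`)
    .join("\n");
  return `<Response>\n  <Stream\n${attrText} />\n</Response>`;
}
```

In the webhook handler, read From from the POST body and return buildStreamResponse({ caller }) with an XML content type. For a form-encoded body that is new URLSearchParams(rawBody).get("From"), though the body encoding is an assumption here. On the socket side, new URL(req.url, "http://localhost").searchParams.get("caller") recovers the number.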

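A minimal Node.js bridge using the ws package:

import { WebSocketServer, WebSocket } from "ws";

// Sautikit requires the audio.drachtio.org subprotocol on the handshake.
// Select it when the client offers it (ws 8 passes the offered protocols as a Set).
const SUBPROTOCOL = "audio.drachtio.org";
const wss = new WebSocketServer({
  port: 8080,
  handleProtocols: (protocols) => (protocols.has(SUBPROTOCOL) ? SUBPROTOCOL : false),
});

// Your LLM calls this tool to fetch real order data from your own backend.
async function lookupOrder({ orderNumber, callerNumber }) {
  if (!orderNumber && !callerNumber) {
    return { error: "no order number or caller number to look up" };
  }
  const params = orderNumber
    ? `order_number=${encodeURIComponent(orderNumber)}`
    : `phone=${encodeURIComponent(callerNumber)}`;
  const res = await fetch(`https://your-app.example.com/api/orders?${params}`);
  if (!res.ok) {
    // Return the failure as data so the model can say so instead of going silent.
    return { error: `order lookup failed with HTTP ${res.status}` };
  }
  // e.g. { status: "out_for_delivery", eta: "today between 4 and 6 PM", courier: "PikiPiki" }
  return res.json();
}

wss.on("connection", (socket, req) => {
  // Assumption: your webhook passed the caller's From number as a "caller"
  // query parameter on the Stream url. Adapt to however you carry call context.
  const callerNumber = new URL(req.url, "http://localhost").searchParams.get("caller");

  // connectRealtimeLLM is a placeholder for your own wrapper around
  // Gemini Live / OpenAI Realtime / a self-hosted pipeline, not a library call.
  const llm = connectRealtimeLLM({
    tools: [
      {
        name: "lookup_order",
        description: "Fetch delivery status and ETA for an order.",
        // The model supplies the order number; the phone number comes from the call.
        run: ({ orderNumber }) => lookupOrder({ orderNumber, callerNumber }),
      },
    ],
    onAudio: (pcm) => {
      // Spoken reply back into the call, only while the socket is still open.
      if (socket.readyState === WebSocket.OPEN) socket.send(pcm);
    },
  });

  socket.on("message", (data, isBinary) => {
    if (isBinary) llm.pushAudio(data); // 16-bit LE PCM from the caller
  });

  socket.on("close", () => llm.close());
  socket.on("error", (err) => console.error("stream socket error:", err));
});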

The LLM decides when to call lookup_order, passing the order number the caller read out or falling back to their callerNumber. It then narrates the returned status and ETA in natural speech.
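The tool entry in the bridge leaves its parameter schema implicit. Below is a sketch of a fuller declaration plus a small dispatcher. It assumes your provider accepts a JSON Schema parameters object. OpenAI and Gemini function calling both use a close variant, but check the exact field names. The normalized { name, args } tool-call shape and the handlers object are hypothetical conveniences of this sketch.

```javascript
// Tool declaration with a JSON Schema parameter block. Field names follow the
// common function-calling shape; check your provider's exact format.
const lookupOrderDeclaration = {
  name: "lookup_order",
  description:
    "Fetch delivery status and ETA for an order. Omit order_number to look up the caller's recent orders by phone.",
  parameters: {
    type: "object",
    properties: {
      order_number: {
        type: "string",
        description: "The order number exactly as the caller read it out.",
      },
    },
  },
};

// Route a tool call to its handler. toolCall is a hypothetical normalized
// { name, args } object parsed from your provider's tool-call event; the
// caller number comes from the call context, never from the model.
async function dispatchToolCall(toolCall, context, handlers) {
  if (toolCall.name !== lookupOrderDeclaration.name) {
    return { error: `unknown tool: ${toolCall.name}` };
  }
  const raw = toolCall.args?.order_number;
  const orderNumber = raw == null ? undefined : String(raw).trim() || undefined;
  try {
    return await handlers.lookupOrder({ orderNumber, callerNumber: context.callerNumber });
  } catch (err) {
    // Return failures as data so the model can tell the caller it couldn't find the order.
    return { error: err instanceof Error ? err.message : String(err) };
  }
}
```

Keeping the phone number out of the schema means the model cannot be talked into querying a different phone number. It can only fall back to the number that is actually calling.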

Point your number at the voice webhook by setting its routing_url:

curl -X PATCH "https://api.sautikit.com/v1/numbers/{number_id}" \
  -H "Authorization: Bearer $SAUTIKIT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"routing_url": "https://your-app.example.com/voice"}'

What it costs:

Inbound call: billed per second in KES for the live call leg while it is active on the Sautikit platform. Short status calls stay cheap.
LLM cost: the model, speech-to-text, and text-to-speech run on your own provider (Gemini, OpenAI, or self-hosted) and are billed by that provider; Sautikit only moves the audio.
Deflection: every self-served "where's my order" call is one your human agents never pick up. For a shop or courier taking hundreds of status calls a day, that is the whole return on the build.
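Below is a back-of-envelope estimator that puts numbers on the trade-off. Every input is a placeholder that you fill from your own call volumes and the current Sautikit price list. No rates are assumed, and your LLM, STT, and TTS provider costs are left out. The function and parameter names are illustrative.

```javascript
// Back-of-envelope monthly estimate (sketch). All inputs are your own figures;
// kesPerSecond is the per-second inbound rate from your Sautikit pricing.
function estimateMonthly({
  callsPerDay,
  avgCallSeconds, // average length of an AI-handled status call
  kesPerSecond,
  avgAgentSeconds, // average handle time when a human takes the same call
  deflectionRate = 1, // share of calls fully self-served, 0..1
  days = 30,
}) {
  const totalCalls = callsPerDay * days;
  // Per-second billing: the call-leg cost scales with total connected seconds.
  const callLegKes = totalCalls * avgCallSeconds * kesPerSecond;
  const deflectedCalls = totalCalls * deflectionRate;
  const agentHoursSaved = (deflectedCalls * avgAgentSeconds) / 3600;
  return { totalCalls, callLegKes, deflectedCalls, agentHoursSaved };
}
```

Add your model provider's costs to callLegKes, then weigh the total against agentHoursSaved at your staffing cost.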

Related guides:

Stream voice action reference: full attribute list, subprotocol, and PCM format.
Build an AI voice engine with Gemini and Sautikit: wiring the realtime loop end to end.
How to build an AI voice agent: the STT → LLM → TTS pipeline and barge-in.
Voice actions concept: how the webhook action loop works.
Appointment booking agent: another AI voice agent pattern built on the same Stream loop.