Order status by voice: an AI agent that answers "where's my order?"

Build an AI voice agent that looks up order and delivery status in your backend and reads back the ETA in natural language, no DTMF menu.

Tags: use-case, ai-voice-agent, stream, order-status, llm

Sautikit provides voice API services for application developers. Numbers provisioned on this platform are not configured for emergency calling (e.g. 999 / 112). Do not use Sautikit numbers as a replacement for a primary phone line.

Summary

"Where's my order?" is the most repetitive call an e-commerce shop or last-mile courier fields. An AI voice agent handles it end to end: the caller speaks naturally, the agent looks up the order in your backend, and it reads back the current status and delivery ETA. No "press 1 for tracking" menu, no queue for a human.

You build this on the Stream voice action. Sautikit forks the live call audio to your WebSocket server, you relay it to an LLM (Gemini Live, OpenAI, or a self-hosted model), and you send the spoken reply back on the same socket. Your LLM owns the conversation and calls your own orders API to fetch real data.

Who this is for

  • E-commerce shops in Nairobi and across Africa fielding daily "where is my parcel" calls.
  • Last-mile logistics and courier operations that want callers to self-serve delivery ETAs.
  • Backend developers who already have an orders/tracking API and want a voice front end for it.
  • Product managers looking to deflect status calls from human agents without a rigid phone tree.

How it works

The loop is real-time and full-duplex:

  1. A caller dials a Sautikit number whose routing_url points to your voice webhook.
  2. Your webhook returns an XML <Response> containing a <Stream> element.
  3. Sautikit opens a WebSocket to your server. Your server must advertise the audio.drachtio.org subprotocol during the handshake, or the connection is rejected.
  4. Sautikit forks the live caller audio to that socket as binary 16-bit little-endian PCM frames.
  5. Your bridge relays the audio to your LLM (Gemini Live / OpenAI Realtime / self-hosted STT + LLM + TTS).
  6. When the caller asks about an order, the LLM calls your orders API. It looks the order up by the order number the caller reads out, or uses their caller ID (From) to find recent orders.
  7. You send the spoken reply back as PCM on the same socket, and Sautikit plays it into the call.

Because the socket is bidirectional, the caller can interrupt (barge-in): when they start talking, stop your outbound audio and let the LLM respond to the new input.
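One way to sketch barge-in on the bridge side is a simple energy check on each inbound frame. This is an illustration under assumptions, not Sautikit's own mechanism: the OutboundAudio queue, the handleCallerFrame helper, and the 0.02 threshold are hypothetical, and the check assumes the frame carries only the caller's audio. Many realtime LLM APIs do their own voice activity detection, in which case you would react to their interruption signal instead.

```javascript
// Root-mean-square level of a 16-bit little-endian PCM frame, scaled to 0..1.
function frameRms(frame) {
  const samples = Math.floor(frame.length / 2);
  if (samples === 0) return 0;
  let sumSquares = 0;
  for (let i = 0; i < samples; i++) {
    const s = frame.readInt16LE(i * 2) / 32768;
    sumSquares += s * s;
  }
  return Math.sqrt(sumSquares / samples);
}

// Hypothetical holder for agent audio that has not been written to the socket yet.
class OutboundAudio {
  constructor() {
    this.queue = [];
  }
  enqueue(pcm) {
    this.queue.push(pcm);
  }
  next() {
    return this.queue.shift();
  }
  get speaking() {
    return this.queue.length > 0;
  }
  flush() {
    this.queue.length = 0;
  }
}

// Assumed threshold; tune it against real call audio.
const SPEECH_THRESHOLD = 0.02;

// Returns true when the caller talked over the agent and pending audio was dropped.
function handleCallerFrame(frame, outbound) {
  if (outbound.speaking && frameRms(frame) > SPEECH_THRESHOLD) {
    outbound.flush();
    return true;
  }
  return false;
}
```

In a bridge, you would call handleCallerFrame for every binary message before forwarding it to the LLM, and drain OutboundAudio on a timer when writing audio back to the call.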

Note: Set outputSamplingRate="16000" for AI agents. The wider band improves speech recognition accuracy over the 8000 Hz telephony default.
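The sampling rate also determines how many bytes each slice of audio occupies, which matters when you buffer or pace frames. The helper below is a rough sketch of that arithmetic for 16-bit PCM. The 20 ms frame length and the mono default are assumptions for illustration; the source does not state the frame size Sautikit uses.

```javascript
// Bytes needed for a frame of 16-bit PCM audio.
function pcmFrameBytes(sampleRateHz, frameMs, channels = 1) {
  const bytesPerSample = 2; // 16-bit samples
  return (sampleRateHz * frameMs / 1000) * channels * bytesPerSample;
}

// Split a PCM buffer into fixed-size frames; the last frame may be shorter.
function splitIntoFrames(pcm, frameBytes) {
  const frames = [];
  for (let offset = 0; offset < pcm.length; offset += frameBytes) {
    frames.push(pcm.subarray(offset, offset + frameBytes));
  }
  return frames;
}

console.log(pcmFrameBytes(16000, 20)); // 640 bytes per 20 ms at 16 kHz mono
console.log(pcmFrameBytes(8000, 20)); // 320 bytes per 20 ms at 8 kHz mono
```

Doubling the rate from 8000 Hz to 16000 Hz doubles the bytes on the wire, so size any buffers accordingly.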

API surface

Endpoints you call:

  • POST /v1/numbers: claim a phone number for your order-status line.
  • PATCH /v1/numbers/{number_id}: set or update the routing_url (your voice webhook).
  • GET /v1/calls/{call_sid}: fetch the call detail record after the call ends.

Voice actions used:

  • Stream: fork live call audio to your WebSocket for the real-time AI loop.
  • Say: optional text-to-speech greeting before the stream opens.
  • Dial: optional escalation to a human agent when the caller asks or the LLM has no answer.

You identify the caller from the From number in the webhook POST body. Use it to look up that phone number's recent orders in your backend, so a returning customer can just ask "is my order out for delivery?" without reciting an order number.
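As a rough sketch of that lookup key, the helpers below pull From out of the webhook body and normalize it before you query your backend. Several things here are assumptions: the source does not say whether the body is JSON or form-encoded (so both are handled), the exact format of From is not documented here, and the default country code of 254 (Kenya) is only an example.

```javascript
// Read the From field from a webhook body that is either JSON or form-encoded.
function extractCaller(contentType, rawBody) {
  const type = (contentType || "").toLowerCase();
  if (type.includes("application/json")) {
    const parsed = JSON.parse(rawBody);
    return parsed.From ?? null;
  }
  return new URLSearchParams(rawBody).get("From");
}

// Normalize a phone number to a +<country><subscriber> form for backend lookups.
// defaultCountryCode is an assumption; pick the one that matches your customers.
function normalizeNumber(raw, defaultCountryCode = "254") {
  const cleaned = String(raw).replace(/[^\d+]/g, "");
  if (cleaned.startsWith("+")) return cleaned;
  if (cleaned.startsWith("00")) return `+${cleaned.slice(2)}`;
  if (cleaned.startsWith("0")) return `+${defaultCountryCode}${cleaned.slice(1)}`;
  return `+${cleaned}`;
}

const from = extractCaller("application/json", '{"From": "0712 345 678"}');
console.log(normalizeNumber(from)); // +254712345678
```

Storing customer phone numbers in the same normalized form keeps the match between caller ID and orders a plain equality check.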

Warning: Stream is currently available only in raw XML form. Return the <Stream> element in an application/xml response, and do not wrap it in the JSON actions array. Native JSON support for Stream is on the roadmap.

Example

1. The XML your webhook returns

<Response>
  <Stream
    name="order-status"
    url="wss://your-app.example.com/audio"
    track="both_tracks"
    outputSamplingRate="16000"
    statusCallback="https://your-app.example.com/stream-status"
    statusEvents="stream-started stream-stopped stream-error" />
</Response>

track="both_tracks" forks both the caller's audio and the audio you play back. statusCallback receives fire-and-forget stream-started / stream-stopped / stream-error notifications so you can log session lifecycle.
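A webhook that serves this XML could look roughly like the sketch below. It is written as a plain Node http-style handler so it needs no framework. The helper names (escapeXmlAttr, buildStreamResponse, handleVoiceWebhook) are hypothetical, and the URLs are the same placeholders used in the XML above.

```javascript
// Escape a value for use inside a double-quoted XML attribute.
function escapeXmlAttr(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Build the <Response> document with a single <Stream> element.
function buildStreamResponse({ wsUrl, statusUrl }) {
  return [
    "<Response>",
    "  <Stream",
    '    name="order-status"',
    `    url="${escapeXmlAttr(wsUrl)}"`,
    '    track="both_tracks"',
    '    outputSamplingRate="16000"',
    `    statusCallback="${escapeXmlAttr(statusUrl)}"`,
    '    statusEvents="stream-started stream-stopped stream-error" />',
    "</Response>",
  ].join("\n");
}

// Node http-style handler: reply with application/xml, never JSON.
function handleVoiceWebhook(req, res) {
  const xml = buildStreamResponse({
    wsUrl: "wss://your-app.example.com/audio",
    statusUrl: "https://your-app.example.com/stream-status",
  });
  res.writeHead(200, { "Content-Type": "application/xml" });
  res.end(xml);
}
```

To serve it, pass handleVoiceWebhook to createServer from node:http and listen on the port behind your /voice URL.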

2. A Node.js WebSocket bridge with an order-lookup tool

import { WebSocketServer } from "ws";

// Sautikit requires the audio.drachtio.org subprotocol on the handshake.
const wss = new WebSocketServer({
  port: 8080,
  handleProtocols: () => "audio.drachtio.org",
});

// Your LLM calls this tool to fetch real order data.
// callerNumber has to come from your own plumbing, for example the From
// field your voice webhook received for this call.
async function lookupOrder({ orderNumber, callerNumber }) {
  if (!orderNumber && !callerNumber) {
    return { error: "No order number or caller number supplied" };
  }
  const params = orderNumber
    ? `order_number=${encodeURIComponent(orderNumber)}`
    : `phone=${encodeURIComponent(callerNumber)}`;
  const res = await fetch(`https://your-app.example.com/api/orders?${params}`);
  // Return an error object instead of throwing so the LLM can apologize or escalate.
  if (!res.ok) return { error: `Order lookup failed (HTTP ${res.status})` };
  // e.g. { status: "out_for_delivery", eta: "today between 4 and 6 PM", courier: "PikiPiki" }
  return res.json();
}

wss.on("connection", (socket) => {
  // connectRealtimeLLM is a placeholder for your own adapter around
  // Gemini Live, OpenAI Realtime, or a self-hosted STT + LLM + TTS stack.
  const llm = connectRealtimeLLM({
    tools: [
      {
        name: "lookup_order",
        description: "Fetch delivery status and ETA for an order.",
        run: lookupOrder,
      },
    ],
    // Spoken reply back into the call; skip frames once the call has ended.
    onAudio: (pcm) => {
      if (socket.readyState === socket.OPEN) socket.send(pcm);
    },
  });

  socket.on("message", (data, isBinary) => {
    if (isBinary) llm.pushAudio(data); // 16-bit LE PCM from the caller
  });

  socket.on("close", () => llm.close());
});

The LLM decides when to call lookup_order, passing the order number the caller read out or falling back to callerNumber, which your bridge has to supply from the webhook's From field. It then narrates the returned status and ETA in natural speech.
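If you would rather hand the model a ready-made sentence than raw JSON, or want a fallback for plain text-to-speech, a small formatter can turn the order object into speech-friendly text. This is a sketch: describeOrder is a hypothetical helper, and only out_for_delivery comes from the example above. The other status codes are assumptions about what your orders API might return.

```javascript
// Spoken phrases per status code. Only out_for_delivery appears in the
// example order; the remaining codes are assumed for illustration.
const STATUS_PHRASES = {
  processing: "is still being prepared",
  shipped: "has left the warehouse",
  out_for_delivery: "is out for delivery",
  delivered: "has been delivered",
};

// Turn an order object into one sentence suitable for reading aloud.
function describeOrder(order) {
  if (!order || !order.status) {
    return "I could not find that order. Could you read the order number again?";
  }
  const phrase =
    STATUS_PHRASES[order.status] ?? `has the status ${order.status.replace(/_/g, " ")}`;
  const parts = [`Your order ${phrase}`];
  if (order.courier) parts.push(`with ${order.courier}`);
  if (order.eta && order.status !== "delivered") parts.push(`and should arrive ${order.eta}`);
  return parts.join(" ") + ".";
}

console.log(
  describeOrder({ status: "out_for_delivery", eta: "today between 4 and 6 PM", courier: "PikiPiki" })
);
// Your order is out for delivery with PikiPiki and should arrive today between 4 and 6 PM.
```

Keeping the phrasing in your own code also gives you one place to adjust wording per language or brand voice.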

3. Attach your webhook to the number

curl -X PATCH "https://api.sautikit.com/v1/numbers/{number_id}" \
  -H "Authorization: Bearer $SAUTIKIT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"routing_url": "https://your-app.example.com/voice"}'
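The same request can be issued from Node when you provision numbers in a deploy script. The sketch below mirrors the curl call above using the built-in fetch. The function names are hypothetical, and treating the response body as JSON is an assumption, since the response shape is not shown here.

```javascript
// Build the PATCH request that points a number at your voice webhook.
function buildRoutingUpdate(numberId, apiKey, routingUrl) {
  return {
    url: `https://api.sautikit.com/v1/numbers/${encodeURIComponent(numberId)}`,
    options: {
      method: "PATCH",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ routing_url: routingUrl }),
    },
  };
}

// Send the request; assumes the API answers with a JSON body on success.
async function setRoutingUrl(numberId, apiKey, routingUrl) {
  const { url, options } = buildRoutingUpdate(numberId, apiKey, routingUrl);
  const res = await fetch(url, options);
  if (!res.ok) throw new Error(`Routing update failed: HTTP ${res.status}`);
  return res.json();
}
```

Call setRoutingUrl with your number ID, the key from SAUTIKIT_API_KEY, and the public URL of your voice webhook.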

Pricing notes

  • Inbound call: billed per second in KES for the live call leg while it is active on the Sautikit platform. Short status calls stay cheap.
  • LLM cost: the model, speech-to-text, and text-to-speech run on your own provider (Gemini, OpenAI, or self-hosted) and are billed by that provider. Sautikit only moves the audio.
  • Deflection: every self-served "where's my order" call is one your human agents never pick up. For a shop or courier taking hundreds of status calls a day, that is the whole return on the build.
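To get a feel for the telephony side of the bill, per-second pricing reduces to a single multiplication. The sketch below is only arithmetic: every input, including the 0.05 KES per second rate, is a made-up placeholder and not a Sautikit price. Substitute the rate from the pricing page and your own call volumes.

```javascript
// Estimate monthly telephony spend for per-second billing, in KES.
// All inputs are supplied by you; nothing here reflects real Sautikit rates.
function estimateMonthlyCallCostKes({ callsPerDay, avgSeconds, ratePerSecondKes, daysPerMonth = 30 }) {
  const total = callsPerDay * avgSeconds * ratePerSecondKes * daysPerMonth;
  return Math.round(total * 100) / 100; // round to cents
}

// Hypothetical example: 300 status calls a day, 45 seconds each, 0.05 KES per second.
const monthly = estimateMonthlyCallCostKes({
  callsPerDay: 300,
  avgSeconds: 45,
  ratePerSecondKes: 0.05,
});
console.log(monthly);
```

Add your LLM provider's charges on top, then compare the total with the agent time those calls would otherwise consume.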

Next steps

  • Stream voice action reference: full attribute list, subprotocol, and PCM format.
  • Build an AI voice engine with Gemini and Sautikit: wiring the realtime loop end to end.
  • How to build an AI voice agent: the STT → LLM → TTS pipeline and barge-in.
  • Voice actions concept: how the webhook action loop works.
  • Appointment booking agent: another AI voice agent pattern built on the same Stream loop.