SautiKit

Outbound AI voice calls that qualify leads and run phone surveys

Place outbound calls via the API, let an AI agent ask qualifying questions in natural language, and warm-transfer hot leads to a live rep.

Tags: use-case, ai-voice-agent, stream, outbound, lead-qualification, llm

Programmable voice infrastructure for Africa. Buy numbers, place calls, and bill per second, all in local currency, via API.

© 2026 Sautikit. All rights reserved • Powered by Helloduty


Sautikit provides voice API services for application developers. Numbers provisioned on this platform are not configured for emergency calling (e.g. 999 / 112). Do not use Sautikit numbers as a replacement for a primary phone line.

Summary

Outbound lead qualification means your system places the call instead of waiting for the customer to call in. You place a call through the Sautikit API for each contact on your list, Sautikit dials the lead, and an AI voice agent asks a short set of qualifying questions in natural language and records the answers. When a lead is hot, the same call is warm-transferred to a live rep, so the human picks up a conversation already in progress.

The whole thing runs over a real-time audio Stream. Sautikit forks the live call audio to your WebSocket, your bridge relays it to an LLM (Google Gemini Live, OpenAI, or a self-hosted model) that runs your script, and the model's spoken replies flow back down the same socket. You own the prompt, the scoring logic, and the transfer rules.

Who this is for

  • Sales teams across Africa qualifying inbound web leads by phone (SaaS, insurance, real estate, fintech) before a rep spends time on them.
  • Growth teams running high-volume callbacks where a human-only funnel cannot keep up with lead flow.
  • NGOs and research firms running phone surveys or follow-up questionnaires at scale.
  • Product teams that want to plug their own LLM and qualification script into a managed voice layer instead of building telephony from scratch.

How it works

The outbound real-time loop:

  1. Place the call. Your backend calls POST /v1/calls with to and from. Sautikit dials the lead.
  2. Answer. When the callee picks up, Sautikit fetches your voice webhook (the routing_url on the from number, or a call-level answer URL), which returns an XML <Response> containing a <Stream>.
  3. Open the socket. Sautikit opens a WebSocket to the url in your <Stream>. Your server must advertise the audio.drachtio.org subprotocol during the handshake or the connection is rejected.
  4. Fork audio both ways. Sautikit streams live call audio as binary 16-bit little-endian PCM frames to your socket, and plays back any PCM frames you send. This is full-duplex — the agent can listen and speak at the same time.
  5. Run the script. Your bridge relays audio to your LLM, which asks the qualifying questions, transcribes the answers, and scores the lead.
  6. Route hot leads. When an answer marks the lead as hot, your flow returns a <Dial> that transfers the call to a live rep. Cold leads get a polite close, and the call detail record (CDR) is logged for later retrieval.
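Step 4 means the bridge handles raw bytes rather than an audio library's objects. The helpers below are a sketch that assumes single-channel (mono) frames and uses only Node.js built-ins; the function names are illustrative and not part of a Sautikit SDK. The resampler covers the case where your model speaks at a different rate than the stream (Gemini Live, for instance, outputs 24 kHz audio), so check your provider's documentation for its actual rates.

```javascript
// Sketch: PCM helpers for the Stream socket. Assumes mono 16-bit
// little-endian PCM frames delivered as Node.js Buffers.

/** Decode a binary frame of 16-bit LE PCM into signed samples. */
function pcm16leToSamples(buf) {
  if (buf.length % 2 !== 0) throw new Error("odd byte length: not 16-bit PCM");
  const samples = new Int16Array(buf.length / 2);
  for (let i = 0; i < samples.length; i++) samples[i] = buf.readInt16LE(i * 2);
  return samples;
}

/** Encode signed samples back into a 16-bit LE PCM Buffer for socket.send(). */
function samplesToPcm16le(samples) {
  const buf = Buffer.alloc(samples.length * 2);
  for (let i = 0; i < samples.length; i++) buf.writeInt16LE(samples[i], i * 2);
  return buf;
}

/** Frame duration in ms: bytes / (2 bytes per sample * samples per second). */
function frameDurationMs(byteLength, sampleRate) {
  return (byteLength * 1000) / (2 * sampleRate);
}

/**
 * Naive linear-interpolation resampler, e.g. 24 kHz model output down to a
 * 16 kHz stream. Fine for a prototype; production audio should be low-pass
 * filtered before downsampling to avoid aliasing.
 */
function resampleLinear(samples, fromRate, toRate) {
  if (fromRate === toRate) return samples;
  const outLength = Math.round((samples.length * toRate) / fromRate);
  const out = new Int16Array(outLength);
  const step = fromRate / toRate; // input samples advanced per output sample
  for (let i = 0; i < outLength; i++) {
    const pos = i * step;
    const left = Math.floor(pos);
    const right = Math.min(left + 1, samples.length - 1);
    const frac = pos - left;
    out[i] = Math.round(samples[left] * (1 - frac) + samples[right] * frac);
  }
  return out;
}
```

For example, 640 bytes at 16 kHz is 20 ms of audio (320 samples of 2 bytes each).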
⚠ Outbound calling carries consent and compliance obligations. Only call contacts who opted in, honour do-not-call lists, disclose that the call is recorded and AI-assisted where required, and respect calling-hour rules in each market you dial into.
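These rules can be enforced in code before a lead is ever dialled. This is a minimal sketch assuming each lead record carries an opt-in flag and an IANA time zone; the lead shape is hypothetical, and the 08:00 to 20:00 window is a placeholder rather than a legal rule for any market, so substitute the hours that apply where you call.

```javascript
// Hypothetical pre-dial guard. Lead shape { phone, optedIn, timeZone } and
// the default calling window are illustrative placeholders.

/** Local hour (0-23) at `date` in an IANA zone such as "Africa/Nairobi". */
function localHour(date, timeZone) {
  const parts = new Intl.DateTimeFormat("en-GB", {
    timeZone,
    hour: "numeric",
    hourCycle: "h23", // 0-23, so midnight is 0 rather than 24
  }).formatToParts(date);
  return Number(parts.find((p) => p.type === "hour").value);
}

/**
 * Returns { ok: true } or { ok: false, reason }.
 * `dncList` is a Set of E.164 numbers that must never be dialled.
 */
function canDial(lead, dncList, now = new Date(), hours = { startHour: 8, endHour: 20 }) {
  if (!lead.optedIn) return { ok: false, reason: "no opt-in on record" };
  if (dncList.has(lead.phone)) return { ok: false, reason: "on do-not-call list" };
  const hour = localHour(now, lead.timeZone);
  if (hour < hours.startHour || hour >= hours.endHour) {
    return { ok: false, reason: `outside calling hours (local hour ${hour})` };
  }
  return { ok: true };
}
```

Checking the lead's local time, not your server's, matters when one campaign spans several time zones.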

API surface

Endpoints you call:

  • POST /v1/calls: place the outbound call with to, from, and an optional clientRequestId for idempotency and tracking.
  • PATCH /v1/numbers/{number_id}: set the routing_url on your from number so Sautikit knows which webhook to fetch on answer.
  • GET /v1/calls/{call_sid}: fetch the call detail record (duration, status, timestamps) after the call ends.
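The three endpoints can be wrapped in a small client. This sketch assumes Node.js 18+ (for the global fetch) and JSON responses; the PATCH body field routing_url follows the wording above, but its exact shape and the response fields should be checked against the API reference. The fetch function is injectable so the client can be tested without network access.

```javascript
// Minimal REST client sketch. Paths come from the endpoint list; the
// { routing_url } PATCH body and JSON responses are assumptions.
const API_BASE = "https://api.sautikit.com";

function makeClient(apiKey, fetchImpl = fetch) {
  async function request(method, path, body) {
    const res = await fetchImpl(`${API_BASE}${path}`, {
      method,
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: body === undefined ? undefined : JSON.stringify(body),
    });
    if (!res.ok) throw new Error(`${method} ${path} failed: HTTP ${res.status}`);
    return res.json();
  }

  return {
    // Place an outbound call; reuse the same clientRequestId when retrying a lead.
    placeCall: ({ to, from, clientRequestId }) =>
      request("POST", "/v1/calls", { to, from, clientRequestId }),
    // Point the caller-ID number at your answer webhook.
    setRoutingUrl: (numberId, routingUrl) =>
      request("PATCH", `/v1/numbers/${encodeURIComponent(numberId)}`, { routing_url: routingUrl }),
    // Fetch the call detail record after the call has ended.
    getCall: (callSid) => request("GET", `/v1/calls/${encodeURIComponent(callSid)}`),
  };
}
```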

Voice actions used:

  • Stream: fork live call audio to your WebSocket for real-time AI. Available via the XML form today.
  • Dial: warm-transfer a hot lead to a live rep or external number.
  • Say: text-to-speech for a fixed intro or fallback message.

Example

1. Place the outbound call

curl -X POST "https://api.sautikit.com/v1/calls" \
  -H "Authorization: Bearer $SAUTIKIT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"to":["+254712345678"],"from":"+254711000001","clientRequestId":"lead-42"}'

Sautikit dials +254712345678. When the lead answers, it fetches the routing_url configured on +254711000001.
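For a contact list, the same request runs once per lead. A minimal sketch of a paced dialer, assuming a hypothetical lead shape of { id, phone } and any async placeCall function (for example one wrapping POST /v1/calls): it caps how many calls are in flight at once and derives clientRequestId from the lead id, so a retry for the same lead carries the same key.

```javascript
// Paced batch dialer sketch. `placeCall` is any async function that places
// one call; the lead shape and default concurrency are illustrative.
async function dialLeads(leads, placeCall, { from, concurrency = 3 } = {}) {
  const results = new Array(leads.length);
  let next = 0; // index of the next lead to dial, shared by all workers

  // Each worker keeps taking the next undialled lead until none are left.
  async function worker() {
    while (next < leads.length) {
      const i = next++;
      const lead = leads[i];
      try {
        // Stable clientRequestId: a retry for this lead reuses the same key.
        const call = await placeCall({ to: [lead.phone], from, clientRequestId: `lead-${lead.id}` });
        results[i] = { id: lead.id, ok: true, call };
      } catch (err) {
        results[i] = { id: lead.id, ok: false, error: err.message };
      }
    }
  }

  // At most `concurrency` calls are in flight at any moment.
  const workers = Array.from({ length: Math.min(concurrency, leads.length) }, worker);
  await Promise.all(workers);
  return results;
}
```

A small cap keeps simultaneous AI sessions, and the number of reps who may need to take transfers, within what your team can handle.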

2. The XML your webhook returns

Your answer webhook responds with an application/xml body that opens the audio stream:

<Response>
  <Stream
    name="lead-qualifier"
    url="wss://your-app.example.com/audio"
    track="both_tracks"
    outputSamplingRate="16000"
    statusCallback="https://your-app.example.com/stream-status"
    statusEvents="stream-started stream-stopped stream-error" />
</Response>

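track="both_tracks" forks both the caller and callee legs. outputSamplingRate="16000" requests 16 kHz audio, a common input rate for speech models (Gemini Live, for example, expects 16 kHz input); if your model uses a different rate, resample in your bridge. Audio frames are 16-bit little-endian PCM.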
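If your webhook generates this XML from code, attribute values need escaping: a raw & in a query string on the Stream url, for example, makes the XML invalid. A sketch of such a webhook in Node.js, assuming it can answer every request with the same static response; the attribute set mirrors the example above and the URLs remain placeholders.

```javascript
// Hypothetical answer webhook for the lead-qualifier Stream.

// Escape the characters that cannot appear raw inside an XML attribute value.
function escapeXmlAttr(value) {
  return String(value)
    .replace(/&/g, "&amp;") // must run first so later entities aren't re-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Render a <Response> containing one <Stream>, one attribute per line.
function buildStreamResponse(attrs) {
  const lines = Object.entries(attrs).map(
    ([key, value]) => `    ${key}="${escapeXmlAttr(value)}"`,
  );
  return `<Response>\n  <Stream\n${lines.join("\n")} />\n</Response>\n`;
}

const STREAM_ATTRS = {
  name: "lead-qualifier",
  url: "wss://your-app.example.com/audio",
  track: "both_tracks",
  outputSamplingRate: 16000,
  statusCallback: "https://your-app.example.com/stream-status",
  statusEvents: "stream-started stream-stopped stream-error",
};

// Matches the (req, res) callback signature of Node's http.createServer.
function answerWebhook(_req, res) {
  res.writeHead(200, { "Content-Type": "application/xml" });
  res.end(buildStreamResponse(STREAM_ATTRS));
}
```

To serve it locally: require("node:http").createServer(answerWebhook).listen(3000).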

3. The WebSocket bridge (Node.js)

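import { WebSocketServer } from "ws";
 
// connectToLLM() and the object it returns (ask, pushAudio, close, and the
// "audio"/"answer" events), plus logAnswer, warmTransfer, and closePolitely,
// are placeholders for your own code, not part of ws or a Sautikit SDK.
 
// Advertise the required subprotocol or Sautikit rejects the handshake.
// Port 8080 and the /audio path assume a TLS-terminating proxy in front
// that forwards wss://your-app.example.com/audio to this process.
const wss = new WebSocketServer({
  port: 8080,
  path: "/audio",
  // ws v8+ passes the offered subprotocols as a Set.
  handleProtocols: (protocols) =>
    protocols.has("audio.drachtio.org") ? "audio.drachtio.org" : false,
});
 
const SCRIPT = [
  "Hi, this is Zawadi from Acme. Is now a good time for a couple of quick questions?",
  "Are you the person who handles buying decisions for your team?",
  "Roughly how many seats would you need?",
  "What is your timeline to get started?",
];
 
wss.on("connection", (socket) => {
  const llm = connectToLLM(); // Gemini Live / OpenAI / self-hosted
  let step = 0;
 
  llm.ask(SCRIPT[step]); // speak the first question
 
  // Live PCM from the call -> your LLM for transcription + turn detection.
  // ws v8 passes (data, isBinary); audio arrives as binary frames.
  socket.on("message", (data, isBinary) => {
    if (isBinary) llm.pushAudio(data);
  });
 
  // LLM audio replies -> back down the same socket to the caller
  llm.on("audio", (pcmFrame) => socket.send(pcmFrame));
 
  // When the model finishes an answer, advance the script or route the lead
  llm.on("answer", ({ text, hot }) => {
    logAnswer(step, text);
    if (hot) return warmTransfer(socket); // hand off to a <Dial> to a live rep
    if (++step < SCRIPT.length) llm.ask(SCRIPT[step]);
    else closePolitely(socket);
  });
 
  // Release the model session when the stream closes.
  socket.on("close", () => llm.close());
});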

connectToLLM, logAnswer, warmTransfer, and closePolitely stand in for your own code. warmTransfer is where you end the stream and hand the in-progress call to your flow's <Dial> step, so a human rep joins the same live call.
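The hot flag in the bridge comes from your own scoring logic. One possible rule is sketched here, under the assumption that your LLM step has already extracted structured fields from each free-text answer; the field names and thresholds (10 seats, 90 days) are hypothetical and should be tuned to your own funnel.

```javascript
// Hypothetical qualification rule for the four SCRIPT questions.
// Unanswered fields are undefined; call this after every answer.
const MIN_SEATS = 10;
const MAX_TIMELINE_DAYS = 90;

function scoreLead({ goodTime, isDecisionMaker, seats, timelineDays }) {
  // The lead said now is a bad time: book a callback instead of scoring.
  if (goodTime === false) return { hot: false, reschedule: true, reasons: [] };

  const reasons = [];
  if (isDecisionMaker === true) reasons.push("decision-maker");
  if (Number.isFinite(seats) && seats >= MIN_SEATS) reasons.push(`${seats} seats`);
  if (Number.isFinite(timelineDays) && timelineDays <= MAX_TIMELINE_DAYS) {
    reasons.push(`starts within ${timelineDays} days`);
  }

  // Hot = decision-maker plus at least one buying signal (size or timeline).
  const hot = isDecisionMaker === true && reasons.length >= 2;
  return { hot, reschedule: false, reasons };
}
```

Because it runs after each answer, a lead can turn hot mid-script and be transferred without hearing the remaining questions; the reasons array is useful context for the rep who picks up.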

Pricing notes

Outbound calls are billed per second in KES on the connected leg — you pay for the time the lead is actually on the line, from answer to hangup. A 90-second qualification call bills 90 seconds; there is no separate charge for running the Stream or for the WebSocket round-trips.

LLM inference runs on your own provider (Gemini, OpenAI, or self-hosted), so those tokens or minutes are billed by that provider, not by Sautikit. When a hot lead is warm-transferred, the rep leg is also billed per second for as long as it stays connected.
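The per-second arithmetic is easy to estimate per call. A sketch, assuming each leg is billed from answer to hangup with partial seconds rounded up; the per-minute rates are inputs you take from the Pricing page, not figures from this document, and the leg shape is hypothetical.

```javascript
// Rough per-call cost estimate for per-second billing (assumptions above).

/** Whole billable seconds between two Dates; an unanswered leg bills 0. */
function billableSeconds(answeredAt, endedAt) {
  if (!answeredAt || !endedAt) return 0;
  return Math.max(0, Math.ceil((endedAt.getTime() - answeredAt.getTime()) / 1000));
}

/**
 * Sum every connected leg of one call, e.g. the lead leg plus the rep leg
 * after a warm transfer. Each leg: { answeredAt, endedAt, ratePerMinuteKes }.
 */
function estimateCostKes(legs) {
  let seconds = 0;
  let cents = 0;
  for (const leg of legs) {
    const legSeconds = billableSeconds(leg.answeredAt, leg.endedAt);
    seconds += legSeconds;
    // Per-second share of the per-minute rate, kept in cents.
    cents += (legSeconds * leg.ratePerMinuteKes * 100) / 60;
  }
  // Round once at the end so per-leg rounding doesn't accumulate.
  return { seconds, kes: Math.round(cents) / 100 };
}
```

For example, a 90-second lead leg plus a 60-second rep leg at a hypothetical 6 KES per minute comes to 150 seconds and 15 KES.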

Because these are outbound calls, responsible use matters: only dial opted-in contacts, disclose recording and AI assistance where required, and keep to lawful calling hours in each market.

Next steps

  • Voice actions concept: the full <Stream> attribute reference and status events.
  • How to build an AI voice agent: end-to-end walkthrough of the stream-to-LLM loop.
  • AI voice engine with Gemini: wiring Gemini Live to a Sautikit audio stream.
  • Dial voice action: warm-transfer options for handing hot leads to a rep.
  • AI support agent use case: the inbound counterpart to outbound qualification.