Outbound AI voice calls that qualify leads and run phone surveys

Outbound lead qualification means your system places the call, not the customer. You feed a list of contacts to the Sautikit API, an AI voice agent dials each one, asks a short set of qualifying questions in natural language, and records the answers. When a lead is hot, the same call is warm-transferred to a live rep so the human picks up a conversation already in progress.

The whole thing runs over a real-time audio Stream. Sautikit forks the live call audio to your WebSocket, your bridge relays it to an LLM (Google Gemini Live, OpenAI, or a self-hosted model) that runs your script, and the model's spoken replies flow back down the same socket. You own the prompt, the scoring logic, and the transfer rules.

Sales teams across Africa qualifying inbound web leads by phone — SaaS, insurance, real-estate, fintech — before a rep spends time on them.
Growth teams running high-volume callbacks where a human-only funnel cannot keep up with lead flow.
NGOs and research firms running phone surveys or follow-up questionnaires at scale.
Product teams that want to plug their own LLM and qualification script into a managed voice layer instead of building telephony from scratch.

The outbound real-time loop:

Place the call. Your backend calls POST /v1/calls with to and from. Sautikit dials the lead.
Answer. When the callee picks up, Sautikit fetches your voice webhook (the routing_url on the from number, or a call-level answer URL), which returns an XML <Response> containing a <Stream>.
Open the socket. Sautikit opens a WebSocket to the url in your <Stream>. Your server must advertise the audio.drachtio.org subprotocol during the handshake or the connection is rejected.
Fork audio both ways. Sautikit streams live call audio as binary 16-bit little-endian PCM frames to your socket, and plays back any PCM frames you send. This is full-duplex — the agent can listen and speak at the same time.
Run the script. Your bridge relays audio to your LLM, which asks the qualifying questions, transcribes the answers, and scores the lead.
Route hot leads. On a qualifying answer, your flow returns a <Dial> to transfer the call to a live rep. Cold leads get a polite close and a logged CDR.
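Because the socket carries raw 16-bit little-endian PCM, a bridge typically needs to turn each binary frame into samples (to measure loudness, resample, or hand audio to a model SDK) and to know how many bytes a given duration occupies. The sketch below is plain JavaScript with hypothetical helper names, and it assumes mono audio at 16 kHz by default. If your <Stream> uses track="both_tracks", check the Stream attribute reference for how the two legs are laid out and set `channels` to match.

```javascript
// Decode a binary frame of 16-bit little-endian PCM into Int16 samples.
// Works on Node Buffers (which may be slices of a shared pool) and any Uint8Array.
function decodePcm16le(frame) {
  const view = new DataView(frame.buffer, frame.byteOffset, frame.byteLength);
  const count = Math.floor(frame.byteLength / 2); // ignore a trailing odd byte
  const samples = new Int16Array(count);
  for (let i = 0; i < count; i++) {
    samples[i] = view.getInt16(i * 2, true); // true = little-endian
  }
  return samples;
}

// Encode Int16 samples back into little-endian bytes, ready to send down the socket.
function encodePcm16le(samples) {
  const bytes = new Uint8Array(samples.length * 2);
  const view = new DataView(bytes.buffer);
  samples.forEach((s, i) => view.setInt16(i * 2, s, true));
  return bytes;
}

// Bytes for a given duration: sampleRate * channels * 2 bytes per sample.
// Assumption: mono (channels = 1) unless you know the stream is interleaved.
function bytesForDuration(ms, sampleRate = 16000, channels = 1) {
  return Math.round((sampleRate * channels * 2 * ms) / 1000);
}

// Root-mean-square level in [0, 1]; a crude "is anyone speaking?" signal.
function rmsLevel(samples) {
  if (samples.length === 0) return 0;
  let sum = 0;
  for (const s of samples) sum += (s / 32768) ** 2;
  return Math.sqrt(sum / samples.length);
}
```

For example, `bytesForDuration(20)` returns 640: 20 ms of 16 kHz mono audio is 320 samples at 2 bytes each.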

Endpoints you call:

POST /v1/calls: place the outbound call with to (an array of destination numbers), from, and an optional clientRequestId for idempotency and tracking.
PATCH /v1/numbers/{number_id}: set the routing_url on your from number so Sautikit knows which webhook to fetch on answer.
GET /v1/calls/{call_sid}: fetch the call detail record (duration, status, timestamps) after the call ends.
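The three endpoints can be wrapped in small request builders plus one sender. This is a sketch using the global `fetch` available in Node 18+. The base URL, the Bearer header, and the `to`/`from`/`clientRequestId` fields come from the curl example on this page; the `routing_url` JSON key for the PATCH and the shape of every response are assumptions to check against the API reference.

```javascript
const BASE_URL = "https://api.sautikit.com";

function jsonHeaders(apiKey) {
  return { Authorization: `Bearer ${apiKey}`, "Content-Type": "application/json" };
}

// POST /v1/calls: `to` is always sent as an array, matching the curl example.
function buildPlaceCall(apiKey, { to, from, clientRequestId }) {
  const body = { to: Array.isArray(to) ? to : [to], from };
  if (clientRequestId) body.clientRequestId = clientRequestId;
  return {
    url: `${BASE_URL}/v1/calls`,
    init: { method: "POST", headers: jsonHeaders(apiKey), body: JSON.stringify(body) },
  };
}

// PATCH /v1/numbers/{number_id}. Assumption: the field is a JSON key named routing_url.
function buildSetRoutingUrl(apiKey, numberId, routingUrl) {
  return {
    url: `${BASE_URL}/v1/numbers/${encodeURIComponent(numberId)}`,
    init: {
      method: "PATCH",
      headers: jsonHeaders(apiKey),
      body: JSON.stringify({ routing_url: routingUrl }),
    },
  };
}

// GET /v1/calls/{call_sid}: fetch the call detail record after the call ends.
function buildGetCall(apiKey, callSid) {
  return {
    url: `${BASE_URL}/v1/calls/${encodeURIComponent(callSid)}`,
    init: { method: "GET", headers: { Authorization: `Bearer ${apiKey}` } },
  };
}

// Send a built request; assumes a JSON response body and throws on non-2xx status.
async function send({ url, init }) {
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`${init.method} ${url} failed: ${res.status}`);
  return res.json();
}
```

For example, a one-time setup step could run `await send(buildSetRoutingUrl(process.env.SAUTIKIT_API_KEY, numberId, "https://your-app.example.com/answer"))`, where `numberId` is your number's ID and the /answer path is hypothetical. Keeping the builders separate from `send` makes the request shapes easy to unit test without network access.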

Voice actions used:

Stream: fork live call audio to your WebSocket for real-time AI. Available via the XML form today.
Dial: warm-transfer a hot lead to a live rep or external number.
Say: text-to-speech for a fixed intro or fallback message.

curl -X POST "https://api.sautikit.com/v1/calls" \
  -H "Authorization: Bearer $SAUTIKIT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"to":["+254712345678"],"from":"+254711000001","clientRequestId":"lead-42"}'

Sautikit dials +254712345678. When the lead answers, it fetches the routing_url configured on +254711000001.
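To work through a contact list rather than a single number, the same request can be issued once per lead with a deterministic clientRequestId (here `lead-<id>`), so re-running a batch reuses the same IDs that the endpoint uses for idempotency instead of minting new ones. A sketch under these assumptions: the `send` function is yours to supply (for example one that POSTs the body to /v1/calls with `fetch`), lead records carry hypothetical `id`, `phone`, and `optedIn` fields, and the concurrency cap is a placeholder to tune against your rep capacity and any account limits.

```javascript
// Dial a list of leads with at most `concurrency` POST /v1/calls requests in flight.
// `send` is injected: it receives the JSON body and returns a promise.
async function dialLeads(leads, { from, send, concurrency = 5 }) {
  // Only dial contacts that have opted in (hypothetical `optedIn` field).
  const queue = leads.filter((lead) => lead.optedIn);
  const results = [];

  // Each worker pulls the next lead until the shared queue is empty.
  async function worker() {
    while (queue.length > 0) {
      const lead = queue.shift();
      const body = { to: [lead.phone], from, clientRequestId: `lead-${lead.id}` };
      try {
        results.push({ id: lead.id, ok: true, response: await send(body) });
      } catch (err) {
        // Record the failure and keep dialing the rest of the list.
        results.push({ id: lead.id, ok: false, error: String(err) });
      }
    }
  }

  const workers = Array.from({ length: Math.min(concurrency, queue.length) }, worker);
  await Promise.all(workers);
  return results;
}
```

Because a failed dial is recorded rather than thrown, you can retry only the `ok: false` entries later with the same clientRequestId values.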

Your answer webhook responds with an application/xml body that opens the audio stream:

<Response>
  <Stream
    name="lead-qualifier"
    url="wss://your-app.example.com/audio"
    track="both_tracks"
    outputSamplingRate="16000"
    statusCallback="https://your-app.example.com/stream-status"
    statusEvents="stream-started stream-stopped stream-error" />
</Response>

track="both_tracks" forks both the caller and callee legs. outputSamplingRate="16000" requests 16 kHz audio, a common input rate for speech models; if your model expects a different rate, resample in your bridge. Audio frames are 16-bit little-endian PCM.
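One way to serve that XML is a small handler for Node's built-in `http` module that builds the <Stream> element from your settings and escapes attribute values. A sketch: the /answer and /stream-status paths are hypothetical (use whatever you configured as the routing_url and statusCallback), and because this page does not specify the status-callback body format, the handler only logs the raw body.

```javascript
// Escape characters that are not allowed raw inside a double-quoted XML attribute.
function xmlAttr(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Build the <Response><Stream .../></Response> document shown above.
function streamResponse({ name, url, statusCallback }) {
  const attrs = {
    name,
    url,
    track: "both_tracks",
    outputSamplingRate: "16000",
    statusCallback,
    statusEvents: "stream-started stream-stopped stream-error",
  };
  const rendered = Object.entries(attrs)
    .map(([key, value]) => `    ${key}="${xmlAttr(value)}"`)
    .join("\n");
  return `<Response>\n  <Stream\n${rendered} />\n</Response>\n`;
}

// Request handler for node:http. Paths are hypothetical.
function handler(req, res) {
  let body = "";
  req.on("data", (chunk) => (body += chunk));
  req.on("end", () => {
    if (req.url.startsWith("/answer")) {
      res.writeHead(200, { "Content-Type": "application/xml" });
      res.end(streamResponse({
        name: "lead-qualifier",
        url: "wss://your-app.example.com/audio",
        statusCallback: "https://your-app.example.com/stream-status",
      }));
    } else if (req.url.startsWith("/stream-status")) {
      console.log("stream status:", body); // format: see the Stream reference
      res.writeHead(204);
      res.end();
    } else {
      res.writeHead(404);
      res.end();
    }
  });
}
```

To run it, pass it to the server: `http.createServer(handler).listen(3000)` after `import http from "node:http"`.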

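import { WebSocketServer, WebSocket } from "ws";
 
// Advertise the required subprotocol or Sautikit rejects the handshake.
// ws v8+ passes the client's offered subprotocols as a Set; returning false selects none.
const SUBPROTOCOL = "audio.drachtio.org";
const wss = new WebSocketServer({
  port: 8080,
  handleProtocols: (protocols) => (protocols.has(SUBPROTOCOL) ? SUBPROTOCOL : false),
});
 
const SCRIPT = [
  "Hi, this is Zawadi from Acme. Is now a good time for a couple of quick questions?",
  "Are you the person who handles buying decisions for your team?",
  "Roughly how many seats would you need?",
  "What is your timeline to get started?",
];
 
// connectToLLM, logAnswer, warmTransfer, and closePolitely are placeholders for your
// own code: an LLM session wrapper (Gemini Live / OpenAI / self-hosted) exposing
// ask/pushAudio/on/close, your CRM logging, and your transfer and hang-up logic.
wss.on("connection", (socket) => {
  const llm = connectToLLM();
  let step = 0;
 
  llm.ask(SCRIPT[step]); // speak the first question
 
  // Live PCM from the call -> your LLM for transcription + turn detection.
  // Only binary frames carry audio; skip any text frames.
  socket.on("message", (data, isBinary) => {
    if (isBinary) llm.pushAudio(data);
  });
 
  // LLM audio replies -> back down the same socket to the caller
  llm.on("audio", (pcmFrame) => {
    if (socket.readyState === WebSocket.OPEN) socket.send(pcmFrame);
  });
 
  // When the model finishes an answer, advance the script or route the lead
  llm.on("answer", ({ text, hot }) => {
    logAnswer(step, text);
    if (hot) return warmTransfer(socket); // hand off to your flow's <Dial> to a live rep
    if (++step < SCRIPT.length) llm.ask(SCRIPT[step]);
    else closePolitely(socket);
  });
 
  // Tear down the LLM session when the stream closes (hangup, transfer, or error)
  socket.on("close", () => llm.close());
});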

In this sketch, warmTransfer is your own helper: it ends the stream and hands the in-progress call to your flow's <Dial> step, so a human rep joins the same live call.
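The `hot` flag in the bridge above is whatever your scoring logic decides. As an illustration only, a rule-based scorer over the three qualifying questions might look like the following; the field names, the 10-seat threshold, and the 90-day timeline cut-off are hypothetical and should come from your own sales criteria. One design option is to have the LLM extract these fields from the conversation and keep the hot/cold decision in plain code, which keeps the transfer rule testable.

```javascript
// Hypothetical thresholds; tune to your own qualification criteria.
const MIN_SEATS = 10;
const MAX_DAYS_TO_START = 90;

// answers: { decisionMaker?: boolean, seats?: number, daysToStart?: number | null }
// Missing or unparseable fields simply count as unmet.
function scoreLead(answers) {
  const checks = {
    decisionMaker: answers.decisionMaker === true,
    enoughSeats: Number.isFinite(answers.seats) && answers.seats >= MIN_SEATS,
    soonEnough: Number.isFinite(answers.daysToStart) && answers.daysToStart <= MAX_DAYS_TO_START,
  };
  const score = Object.values(checks).filter(Boolean).length; // 0..3
  // Transfer only when talking to a decision maker with at least one other signal.
  const hot = checks.decisionMaker && score >= 2;
  return { score, hot, checks };
}
```

Because the bridge checks `hot` after every answer, you can call `scoreLead` on the answers collected so far and transfer as soon as the rule is met.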

Outbound calls are billed per second in KES on the connected leg — you pay for the time the lead is actually on the line, from answer to hangup. A 90-second qualification call bills 90 seconds; there is no separate charge for running the Stream or for the WebSocket round-trips.

LLM inference runs on your own provider (Gemini, OpenAI, or self-hosted), so those tokens or minutes are billed by that provider, not by Sautikit. When a hot lead is warm-transferred, the connected rep leg continues to bill per second for its duration.
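Because both legs bill per second, estimating the Sautikit side of a campaign is simple arithmetic: billed seconds times the per-second rate. A sketch that assumes you know a per-minute KES rate for each leg and pro-rates it per second; the rates you pass in are placeholders, and LLM provider costs stay out of this figure as described above.

```javascript
// Per-second billing: cost = billed seconds x (per-minute rate / 60).
// Rates are placeholders; substitute your actual KES tariffs.
function callCostKes({ leadSeconds, repSeconds = 0, leadRatePerMin, repRatePerMin = leadRatePerMin }) {
  const lead = (leadSeconds * leadRatePerMin) / 60; // qualification leg
  const rep = (repSeconds * repRatePerMin) / 60;    // warm-transferred rep leg
  return { lead, rep, total: lead + rep };
}

// Campaign estimate: transferRate is the fraction of calls that reach a rep.
function campaignCostKes({ calls, avgLeadSeconds, transferRate, avgRepSeconds, leadRatePerMin, repRatePerMin = leadRatePerMin }) {
  const perCall = callCostKes({
    leadSeconds: avgLeadSeconds,
    repSeconds: transferRate * avgRepSeconds, // expected rep seconds per call
    leadRatePerMin,
    repRatePerMin,
  });
  return calls * perCall.total;
}
```

With a hypothetical rate of 2 KES per minute, the 90-second qualification call above comes to 3 KES; if half of 1,000 such calls transfer to a 120-second rep conversation at the same rate, the campaign estimate is 5,000 KES.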

Because these are outbound calls, responsible use matters: only dial opted-in contacts, disclose recording and AI assistance where required, and keep to lawful calling hours in each market.
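A pre-dial guard can enforce the opt-in and calling-hours rules in code before any POST /v1/calls request goes out. A sketch using the built-in `Intl.DateTimeFormat` to read the lead's local hour. The 08:00 to 20:00 window and the `optedIn` and `timeZone` lead fields are placeholders, not legal guidance; take the real windows from the rules in each market you call.

```javascript
// Placeholder calling window in local time, 24-hour clock: [startHour, endHour).
const CALLING_HOURS = { startHour: 8, endHour: 20 };

// Local hour (0-23) at `date` in an IANA time zone such as "Africa/Nairobi".
function localHour(date, timeZone) {
  const parts = new Intl.DateTimeFormat("en-GB", {
    timeZone,
    hour: "numeric",
    hourCycle: "h23", // midnight is 0, never 24
  }).formatToParts(date);
  return Number(parts.find((p) => p.type === "hour").value);
}

// Returns { ok, reason } so skipped leads can be logged with the cause.
// `lead.optedIn` and `lead.timeZone` are hypothetical fields on your lead records.
function canDial(lead, now = new Date(), hours = CALLING_HOURS) {
  if (!lead.optedIn) return { ok: false, reason: "no opt-in on record" };
  const hour = localHour(now, lead.timeZone);
  if (hour < hours.startHour || hour >= hours.endHour) {
    return { ok: false, reason: `outside calling hours (${hour}:00 local)` };
  }
  return { ok: true, reason: "ok" };
}
```

Evaluating the window in the lead's time zone rather than the server's matters when one campaign spans markets with different UTC offsets.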

Voice actions concept: the full <Stream> attribute reference and status events.
How to build an AI voice agent: end-to-end walkthrough of the stream-to-LLM loop.
AI voice engine with Gemini: wiring Gemini Live to a Sautikit audio stream.
Dial voice action: warm-transfer options for handing hot leads to a rep.
AI support agent use case: the inbound counterpart to outbound qualification.
