IVR menus are the backbone of Kenyan fintech customer service: balance checks, loan repayment confirmations, and M-Pesa dispute routing all flow through them. This guide builds a four-part IVR tree with Sautikit's JSON voice-action verbs: a main menu, a payment confirmation branch, an agent transfer branch, and a callback scheduling branch, with localised Swahili/English prompts and KES amounts voiced correctly.
Draw the call flow on paper before touching code. Each branch is a GetDigits → Webhook → next voice response cycle. A four-part tree for a Kenyan fintech product might look like:
Inbound call
└── Main menu (press 1, 2, 3, or 0)
    ├── 1 → Payment confirmation branch
    │       └── Read back last transaction, press 1 to confirm
    ├── 2 → Agent transfer branch
    │       └── Dial to agent pool
    ├── 3 → Callback scheduling branch
    │       └── Record voice message, confirm slot
    └── 0 → Repeat menu
Key design rules before implementation:
Every GetDigits must have a timeout and a fallback action for when no input is received
Every Redirect must respond within 5 seconds or the call falls back to the previous state
Keep menu depth to 3 levels maximum: callers on mobile audio lose context past the third branch
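The first rule can be checked mechanically before a response leaves your server. The helper below is a minimal sketch, not part of Sautikit: `lintActions` is a hypothetical name, and the action shape it inspects is assumed from the handlers in this guide rather than taken from Sautikit's documentation.

```javascript
// Hypothetical lint helper (not a Sautikit API): checks an action list
// against the rule that every getDigits needs a timeout and a fallback.
// Assumes actions look like { getDigits: {...} }, { say: {...} }, and so on,
// as in the handlers in this guide.
function lintActions(actions) {
  const problems = [];
  actions.forEach((action, i) => {
    if (!action.getDigits) return;

    const { timeout } = action.getDigits;
    if (typeof timeout !== "number" || timeout <= 0) {
      problems.push(`action ${i}: getDigits has no positive timeout`);
    }

    // The fallback is whatever runs after getDigits when no digit arrives,
    // so getDigits must not be the last action in the list.
    if (i === actions.length - 1) {
      problems.push(`action ${i}: getDigits has no fallback action after it`);
    }
  });
  return problems;
}
```

Running it in a unit test over every action list your handlers can return is one way to catch a missing fallback before a caller does.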
IVR menu flow
Each digit press posts back to your voice_callback_url. Your server reads the Digits field and returns the next action set.
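As an illustration, the main menu response could be built as below. This is a sketch under assumptions: `buildMainMenu` is a hypothetical helper, the prompt wording is invented, and the `timeout: 8000` value simply mirrors the one used in the payment confirmation handler later in this guide.

```javascript
// Hypothetical builder for the main menu action set. The verb shapes
// (say, getDigits, hangup) follow the other handlers in this guide.
function buildMainMenu(baseUrl) {
  return {
    actions: [
      {
        say: {
          text:
            "Welcome. Press 1 to confirm a payment, 2 to speak to an agent, " +
            "3 to request a callback, or 0 to hear this menu again.",
          language: "en-KE",
        },
      },
      {
        getDigits: {
          numDigits: 1,
          timeout: 8000,
          action: `${baseUrl}/menu-branch`,
        },
      },
      // Fallback: these actions run only if no digit was received.
      { say: { text: "We did not receive any input. Goodbye.", language: "en-KE" } },
      { hangup: {} },
    ],
  };
}
```

In an Express app this would be wired up as `app.post("/voice", (req, res) => res.json(buildMainMenu(BASE_URL)));`, with `/voice` being the path the branch handlers redirect to when the menu must be replayed.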
The action URL receives the digit the caller pressed. Route to the correct branch handler:
// Return a Redirect verb to another webhook path.
function redirectTo(res, path) {
  res.json({
    actions: [{ redirect: { url: `${BASE_URL}${path}` } }],
  });
}

app.post("/menu-branch", (req, res) => {
  const digit = req.body.Digits || "";
  switch (digit) {
    case "1":
      return redirectTo(res, "/payment-confirmation");
    case "2":
      return redirectTo(res, "/agent-transfer");
    case "3":
      return redirectTo(res, "/callback-schedule");
    case "0":
      return redirectTo(res, "/voice");
    default:
      // Unrecognised digit: replay menu
      return redirectTo(res, "/voice");
  }
});
The Redirect verb hands off call control to the target URL. The 5-second deadline is the time your target handler has to return a response. If it takes longer, Sautikit treats it as a timeout and replays the previous menu state. Keep your branch handlers lean: any database query in the handler path must complete within ~2 seconds to leave margin for the network round-trip.
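One way to enforce that budget is to race the query against a timer. The sketch below uses only standard JavaScript promises; `withDeadline` is a hypothetical helper name, and the choice of fallback value is up to your handler.

```javascript
// Resolve with the promise's value if it settles within `ms` milliseconds,
// otherwise resolve with `fallbackValue`. The underlying work is not
// cancelled; only the handler stops waiting for it.
function withDeadline(promise, ms, fallbackValue) {
  let timer;
  const timeout = new Promise((resolve) => {
    timer = setTimeout(() => resolve(fallbackValue), ms);
  });
  // Clear the timer either way so it does not keep the event loop busy.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```

A handler could then call `await withDeadline(getLastTransaction(caller), 2000, null)` and treat `null` as "no transaction available right now", which keeps the response inside the deadline even when the database is slow.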
Reading a KES amount aloud in a way that Kenyan callers understand requires formatting. The Say verb with language: sw-KE can voice Swahili numbers natively, but the amount must be formatted as the TTS engine expects it.
The reliable approach is to format the amount explicitly in text:
const ONES = ["", "moja", "mbili", "tatu", "nne", "tano", "sita", "saba", "nane", "tisa"];
const TENS = ["", "kumi", "ishirini", "thelathini", "arobaini", "hamsini", "sitini", "sabini", "themanini", "tisini"];

// Spell out an integer from 0 to 999 in Swahili words (empty string for 0).
function hundreds(n) {
  const h = Math.floor(n / 100);
  const t = Math.floor((n % 100) / 10);
  const o = n % 10;
  const parts = [];
  if (h > 0) parts.push(`mia ${ONES[h]}`);
  if (t > 0) {
    parts.push(TENS[t]);
    if (o > 0) parts.push(`na ${ONES[o]}`);
  } else if (o > 0) {
    parts.push(ONES[o]);
  }
  return parts.join(" ");
}

// Format a KES amount as Swahili words for the Say verb.
// Handles amounts from 0 to 999,999.
// E.g. 1250.50 → "shilingi elfu moja mia mbili hamsini na senti hamsini"
function formatKesSwahili(amountKes) {
  // Round once, in cents, so that 10.999 becomes 11 shillings
  // instead of 10 shillings and 100 cents.
  const totalCents = Math.round(amountKes * 100);
  const shillings = Math.floor(totalCents / 100);
  const cents = totalCents % 100;
  const parts = [];
  const thousands = Math.floor(shillings / 1000);
  const remainder = shillings % 1000;
  if (thousands > 0) {
    parts.push(thousands === 1 ? "elfu moja" : `elfu ${hundreds(thousands)}`);
  }
  if (remainder > 0) {
    parts.push(hundreds(remainder));
  }
  let result = parts.length ? `shilingi ${parts.join(" ")}` : "shilingi sifuri";
  if (cents > 0) {
    result += ` na senti ${hundreds(cents)}`;
  }
  return result;
}
Use it in the payment confirmation handler:
app.post("/payment-confirmation", async (req, res) => {
  const caller = req.body.From || "";

  // Fetch the last transaction for this caller from your database
  const txn = await getLastTransaction(caller);
  if (!txn) {
    return res.json({
      actions: [
        { say: { text: "We could not find a recent transaction on your account.", language: "en-KE" } },
        { redirect: { url: `${BASE_URL}/voice` } },
      ],
    });
  }

  const amountText = formatKesSwahili(txn.amount_kes);
  const ref = txn.reference;

  res.json({
    actions: [
      { say: { text: "Your last payment was", language: "en-KE" } },
      // The amount is spelled out in Swahili words, so voice it with sw-KE
      { say: { text: amountText, language: "sw-KE" } },
      {
        say: {
          text:
            `Reference ${ref}. ` +
            "Press 1 to confirm receipt, or 2 to report a dispute.",
          language: "en-KE",
        },
      },
      {
        getDigits: {
          numDigits: 1,
          timeout: 8000,
          action: `${BASE_URL}/payment-confirm-response`,
        },
      },
      { hangup: {} },
    ],
  });
});
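The handler above posts the caller's digit to `/payment-confirm-response`, which this guide does not define. A minimal sketch of that step might look like the following; `paymentConfirmResponse` is a hypothetical helper, the prompts are invented, and sending disputes to the agent transfer branch is an assumption about the flow rather than something the guide specifies.

```javascript
// Hypothetical response builder for /payment-confirm-response.
// digit: the value of the Digits field; baseUrl: your webhook base URL.
function paymentConfirmResponse(digit, baseUrl) {
  switch (digit) {
    case "1":
      // Caller confirmed receipt: acknowledge and end the call.
      return {
        actions: [
          { say: { text: "Thank you. Your payment is confirmed.", language: "en-KE" } },
          { hangup: {} },
        ],
      };
    case "2":
      // Caller reported a dispute: assumed here to go to a human agent.
      return {
        actions: [
          { say: { text: "We will connect you to an agent about this payment.", language: "en-KE" } },
          { redirect: { url: `${baseUrl}/agent-transfer` } },
        ],
      };
    default:
      // Unrecognised digit: replay the payment prompt.
      return {
        actions: [{ redirect: { url: `${baseUrl}/payment-confirmation` } }],
      };
  }
}
```

It would be mounted as `app.post("/payment-confirm-response", (req, res) => res.json(paymentConfirmResponse(req.body.Digits || "", BASE_URL)));`.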
The Dial verb connects the caller to an agent. Importantly, the caller's original From number is preserved in the transferred call's metadata; the agent receives it in the X-Original-Caller field in the Sautikit webhook at the agent leg.
app.post("/agent-transfer", (req, res) => {
  // The agent pool number: a queue or direct line
  const AGENT_NUMBER = "+254720000100";

  res.json({
    actions: [
      {
        say: {
          text: "Connecting you to an agent. Please hold.",
          language: "en-KE",
        },
      },
      {
        dial: {
          number: AGENT_NUMBER,
          // The caller's From number is automatically forwarded
          // as X-Original-Caller in the webhook to the agent leg.
          // Your agent system can read it to pre-populate the screen.
          timeout: 30,
          action: `${BASE_URL}/agent-transfer-complete`,
        },
      },
      // If agent doesn't answer, go straight to the callback branch.
      // No digit is collected here, so the prompt must not ask for one.
      {
        say: {
          text: "All agents are busy. You can leave a callback request instead.",
          language: "en-KE",
        },
      },
      { redirect: { url: `${BASE_URL}/callback-schedule` } },
    ],
  });
});
On the agent side, read the caller's number from the Sautikit webhook:
app.post("/agent-webhook", (req, res) => {
  // When Sautikit dials the agent, it includes the original caller's number
  const originalCaller =
    req.get("X-Original-Caller") || req.body.originalCaller || "unknown";

  // Pre-populate the CRM screen for the agent
  crmLookup(originalCaller);

  res.json({
    actions: [{ say: { text: "Connecting customer now.", language: "en-KE" } }],
  });
});
With this pattern the agent never needs to ask "What number are you calling from?" because the number is already in the CRM before the agent picks up.
The callback branch records a short voice message from the caller and schedules a return call. The Record verb captures audio and delivers the recording URL via webhook:
app.post("/callback-schedule", (req, res) => {
  res.json({
    actions: [
      {
        say: {
          text:
            "Please leave your name and a brief reason for your call " +
            "after the tone. Press the hash key when you are done.",
          language: "en-KE",
        },
      },
      {
        record: {
          maxLength: 60, // 60-second maximum
          finishOnKey: "#",
          action: `${BASE_URL}/callback-recorded`,
          playBeep: true,
        },
      },
      { hangup: {} },
    ],
  });
});

app.post("/callback-recorded", (req, res) => {
  const recordingUrl = req.body.RecordingUrl;
  const caller = req.body.From || "";
  const duration = req.body.RecordingDuration || 0;

  // Store the callback request in your queue
  saveCallbackRequest(caller, recordingUrl, duration);

  res.json({
    actions: [
      {
        say: {
          text:
            "Thank you. We have received your callback request " +
            "and will call you back within 2 hours.",
          language: "en-KE",
        },
      },
      { hangup: {} },
    ],
  });
});
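The `saveCallbackRequest` call above is left to your own storage layer. As a rough sketch of what it needs to capture, the in-memory queue below stamps each request with a deadline matching the 2-hour promise in the prompt. `createCallbackQueue` and its field names are hypothetical, and a production system would persist requests in a database rather than in process memory.

```javascript
const TWO_HOURS_MS = 2 * 60 * 60 * 1000;

// Hypothetical in-memory callback queue. `now` is injectable so the
// deadline logic can be tested with a fixed clock.
function createCallbackQueue(now = () => Date.now()) {
  const requests = [];
  return {
    // Store a request and stamp it with the time by which we promised to call.
    save(caller, recordingUrl, durationSeconds) {
      const request = {
        caller,
        recordingUrl,
        // Webhook fields may arrive as strings, so coerce to a number.
        durationSeconds: Number(durationSeconds) || 0,
        dueAt: now() + TWO_HOURS_MS,
      };
      requests.push(request);
      return request;
    },
    // Return the request with the earliest deadline, or null if the queue is empty.
    nextDue() {
      return requests.reduce(
        (earliest, r) => (earliest === null || r.dueAt < earliest.dueAt ? r : earliest),
        null
      );
    },
  };
}
```

With `const queue = createCallbackQueue();`, the handler's `saveCallbackRequest(caller, recordingUrl, duration)` could simply delegate to `queue.save(caller, recordingUrl, duration)`, and an agent dashboard could poll `queue.nextDue()` to pick the most urgent callback.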
This IVR handles voice end-to-end. When a caller needs to be reached over SMS, WhatsApp, or USSD, or handed to a full human-agent desk with ticketing, Helloduty adds those channels on top of the same Sautikit voice layer.