USSD built Kenya's mobile financial services industry and is not going away: it works on every handset, requires no data, and costs the user nothing. But USSD has hard technical limits: 182-character session payloads, synchronous state on the operator's Mobile Switching Centre (MSC), no audio, and no multi-party interaction. This post maps the cases where USSD genuinely wins, the cases where voice wins decisively, and the migration pattern for adding voice to a USSD flow.
USSD (Unstructured Supplementary Service Data) has two properties that make it irreplaceable in Kenya's mobile market:
Works on every handset without data. A 2007 Nokia feature phone with a Safaricom SIM can access a USSD menu. This reach matters in a market where smartphone penetration is growing but not universal, and where data bundle depletion is a daily reality for millions of users.
Costs the user nothing. USSD sessions are billed to the business (approximately KES 0.50–1.50 per session under typical aggregator agreements in Kenya), not the end user. For financial services targeting low-income users, a zero-cost user experience is not optional; it is the product.
USSD also has near-instant session initiation (sub-2-second menu display on Safaricom 2G), which voice cannot match: a call takes 2–4 seconds to set up, plus the time for a TTS prompt to play before the user can respond.
These advantages make USSD the right choice for:
- Balance checks
- Simple PIN entry
- Menu navigation where each step has a clear, short text response
- Microfinance loan applications (collect ID, amount, confirm; under 182 chars per screen)
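The last case, a short loan-application menu, can be sketched as a pure function over the user's input so far. This is a minimal sketch, not a definitive implementation: it assumes the aggregator sends cumulative input as a `*`-separated string and expects replies prefixed with `CON` (continue) or `END` (terminate), which is a common aggregator convention but may differ for yours. The function name `ussdReply` and the menu wording are hypothetical.

```javascript
// Minimal USSD menu sketch.
// Assumptions: input arrives as a "*"-separated string of everything the
// user has typed so far, and replies start with "CON" or "END".
function ussdReply(text) {
  // An empty string means the user has just dialled the shortcode.
  // ("".split("*") would give [""], so handle it explicitly.)
  const steps = text === "" ? [] : text.split("*");

  switch (steps.length) {
    case 0:
      return "CON Enter loan amount (KES):";
    case 1:
      return "CON Repayment period in months:";
    case 2:
      return "CON Enter your national ID number:";
    case 3:
      return `CON Confirm KES ${steps[0]} over ${steps[1]} months?\n1. Yes\n2. No`;
    default:
      // Fourth input is the confirmation choice.
      return steps[3] === "1"
        ? "END Application received."
        : "END Application cancelled.";
  }
}
```

Each screen here is far below the 182-character cap, which is why this kind of flow suits USSD.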
182-character session payload. Each USSD screen is capped at 182 characters of response content. Narrating loan terms, reading a multi-item order back to a customer, or explaining a repayment schedule in any detail is impossible. Developers work around this with numbered sub-menus, but the user experience degrades into a maze of "Press 5 for more options."
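To make the sub-menu workaround concrete, here is a hedged sketch of paginating a long message into screens that each fit the limit. The helper name `paginateForUssd`, the "0. More" hint, and the `CON`/`END` prefixes are assumptions for illustration. The sketch counts the prefix towards the 182 characters, which is the conservative choice if you are unsure how your aggregator counts.

```javascript
const USSD_MAX_CHARS = 182; // per-screen limit quoted in this post

// Split a long message into screens that fit the limit, breaking on word
// boundaries. A single word longer than the budget is not split (sketch only).
function paginateForUssd(message, limit = USSD_MAX_CHARS) {
  const moreHint = "\n0. More";
  // Reserve room for the prefix and the "more" hint on every screen.
  const budget = limit - "CON ".length - moreHint.length;
  const words = message.trim().split(/\s+/);
  const screens = [];
  let current = "";

  for (const word of words) {
    const candidate = current === "" ? word : `${current} ${word}`;
    if (candidate.length > budget && current !== "") {
      screens.push(current); // current screen is full, start a new one
      current = word;
    } else {
      current = candidate;
    }
  }
  if (current !== "") screens.push(current);

  // Every screen except the last keeps the session open.
  return screens.map((body, i) =>
    i < screens.length - 1 ? `CON ${body}${moreHint}` : `END ${body}`
  );
}
```

A multi-paragraph disclosure run through this helper turns into a long chain of "0. More" screens, which is exactly the maze described above.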
Synchronous MSC session state. A USSD session lives in the operator's Mobile Switching Centre. The MSC maintains session state for the duration of the interaction. If the session is idle for more than 180 seconds, the MSC terminates it. There is no grace period and no recovery. Long-running flows (review terms → think → confirm) time out on the MSC before the user finishes reading.
No audio. USSD is a text channel. You cannot play a pre-recorded loan term disclosure, capture a verbal acceptance, or verify a voice biometric. For regulated lending in Kenya, where CBK guidelines require borrower acknowledgement in a verifiable format, USSD's text-only nature creates compliance gaps.
No multi-party. USSD is a 1:1 channel between a single subscriber and the USSD gateway. You cannot use it to connect a customer to a live agent, add a guarantor to a loan conversation, or facilitate any multi-party interaction.
A voice call on Sautikit has no operator-imposed session timeout. A loan terms narration that takes 90 seconds is fine. A guided dispute resolution call that takes 8 minutes is fine. The session persists for as long as the call is active, limited only by your call flow logic.
Voice also enables:
- Audio naturalness: narrating amounts, names, and reference numbers in Swahili or English is clearer than displaying them in 6-point USSD text on a small screen.
- Recorded acceptance: a Record verb captures the borrower's verbal "I agree", creating an auditable evidence trail that satisfies CBK lending guidelines.
- Live agent transfer: a Dial verb can hand the caller to a human agent mid-flow without re-authentication.
- Real-time branching: GetDigits captures user intent at each step with clear audio prompts, without the 182-character constraint.
The voice disadvantage: cost and friction. A 45-second outbound voice OTP call costs the business KES 2.25 at KES 3/min. A USSD session costs the business KES 0.50–1.50 and the user nothing. For high-volume, simple-query flows, USSD remains more economical. Inbound voice calls are free to the business (KES 0/min), which changes the economics for flows where the user initiates the call.
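The arithmetic behind those figures, as a small sketch. The rates are the ones quoted in this post and should be treated as illustrative. Per-second billing is an assumption (per-minute rounding would raise the voice figure), and the function name is hypothetical.

```javascript
// Rates quoted in this post; illustrative, not a tariff.
const VOICE_RATE_KES_PER_MIN = 3;
const USSD_SESSION_COST_KES = { low: 0.5, high: 1.5 };

// Cost of an outbound call, assuming per-second billing.
function outboundCallCostKes(durationSeconds, ratePerMin = VOICE_RATE_KES_PER_MIN) {
  return (durationSeconds / 60) * ratePerMin;
}

const callCost = outboundCallCostKes(45);
console.log(callCost);                              // 2.25
console.log(callCost / USSD_SESSION_COST_KES.high); // 1.5 (vs. the priciest USSD session)
console.log(callCost / USSD_SESSION_COST_KES.low);  // 4.5 (vs. the cheapest USSD session)
```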
The most practical migration path is not to replace USSD but to use it for what it does well (quick input collection) and hand off to voice for the step that exceeds USSD's capabilities.
The pattern:
1. User dials *123# on their handset. USSD session starts on the Safaricom MSC.
2. USSD collects the user's loan amount, repayment period, and national ID via short menus, all within the 182-char constraint.
3. At the final USSD screen, the app server responds with "END Confirm. We will call you to read your loan terms. Please stay on the line."
4. The END response terminates the USSD session on the MSC. Simultaneously, the app server fires a POST /v1/calls to Sautikit.
5. Sautikit places an outbound call to the user's E.164 number within ~2 seconds of the USSD END.
6. The voice flow narrates the loan terms, captures verbal acceptance via Record, and reads back the reference number.
The USSD session termination (END) is the trigger. Here is the handoff in Node.js:
```javascript
// USSD handler: final confirmation screen
app.post("/ussd", async (req, res) => {
  const { text, phoneNumber } = req.body;
  const steps = text.split("*");

  if (steps.length === 4) {
    // User has completed all input steps (amount, period, ID confirmation)
    const loanData = parseLoanInputs(steps);
    await db.savePendingLoan(phoneNumber, loanData);

    // Schedule the voice handoff: fire-and-forget (don't await)
    triggerLoanDisclosureCall(phoneNumber, loanData).catch(console.error);

    // Terminate the USSD session
    res.send("END Your loan terms will be read to you in a moment. Please stay on the line.");
    return;
  }
  // ... earlier menu steps
});

async function triggerLoanDisclosureCall(phoneNumber, loanData) {
  // Normalise USSD phone format to E.164 with +
  const e164 = phoneNumber.startsWith("+") ? phoneNumber : `+${phoneNumber}`;

  await fetch("https://api.sautikit.com/v1/calls", {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${process.env.SAUTIKIT_API_KEY}`,
      "Content-Type": "application/json",
      "Idempotency-Key": `loan-disclosure-${loanData.id}`,
    },
    body: JSON.stringify({
      from: "+254712345678", // Your Sautikit number
      to: [e164],
      voice_actions: {
        actions: [
          {
            say: {
              text: `Hello. You are applying for a loan of KES ${loanData.amount.toLocaleString()} over ${loanData.period} months. Your monthly repayment will be KES ${loanData.monthly.toLocaleString()}. Press 1 to accept these terms, or press 2 to cancel.`,
              language: "en-KE",
            },
          },
          {
            getDigits: {
              numDigits: 1,
              timeout: 10000,
              finishOnKey: "",
              action: `https://your-server.example.com/loan/acceptance?loan_id=${loanData.id}`,
            },
          },
        ],
      },
    }),
  });
}
```
The Idempotency-Key header, set to loan-disclosure-${loanData.id}, prevents double-calls if the USSD webhook fires twice (a known USSD aggregator behaviour on timeout).
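The handler's phone normalisation only prepends a `+`, which works if the aggregator always sends the number with its country code. If yours sometimes sends local formats such as `0712345678`, a slightly more defensive helper may be worth having. The sketch below is hypothetical (`toKenyanE164` is not part of any SDK) and assumes Kenyan mobile numbers have nine national digits starting with 7 or 1.

```javascript
// Normalise common Kenyan mobile number spellings to E.164 (+254XXXXXXXXX).
// Returns null when the input does not look like a Kenyan mobile number.
function toKenyanE164(raw) {
  const digits = String(raw).replace(/[^\d]/g, ""); // drop "+", spaces, dashes

  let national;
  if (digits.startsWith("254")) {
    national = digits.slice(3); // already has the country code
  } else if (digits.startsWith("0")) {
    national = digits.slice(1); // local format with trunk prefix
  } else {
    national = digits;          // bare national number
  }

  // Assumption: nine national digits, beginning with 7 or 1.
  return /^[17]\d{8}$/.test(national) ? `+254${national}` : null;
}
```

Returning `null` instead of guessing lets the caller skip the voice handoff and log the bad input, rather than placing a call to a malformed number.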
For a 45-second outbound voice disclosure at KES 3/min: 45 / 60 × 3 = KES 2.25. This is 1.5–4.5x the cost of a USSD session (KES 0.50–1.50), but voice disclosure is not optional for regulated lending. The USSD-only equivalent would require reading a multi-paragraph disclosure through USSD's 182-char limit, which is practically impossible and legally insufficient.
Developers migrating from USSD to voice sometimes ask whether they need to surrender their USSD shortcode when adding voice. The answer is no.
USSD shortcodes are registered under the Networks and Application Services licence class (or equivalent under the aggregator's licence). VoIP services operate under the Application Service Provider (ASP) licence. These are separate licence classes, and a business can hold both simultaneously.
The recommended architecture keeps USSD for session initiation and adds Sautikit voice for the richer interaction steps that exceed USSD's technical limits. Shortcode-based USSD (*123#) requires no smartphone and no data plan, so the entry point still reaches feature phone users.
Sautikit is voice-only, so if you need to run the USSD menus, SMS fallbacks, or a human-agent desk alongside your voice flow, Helloduty provides those channels in the same family.