Break past USSD limits: when to migrate from USSD to voice, and how

USSD built Kenya's mobile financial services industry and is not going away: it works on every handset, requires no data, and costs the user nothing. But USSD has hard technical limits: 182-character session payloads, synchronous state on the operator's Mobile Switching Centre (MSC), no audio, and no multi-party interaction. This post maps the cases where USSD genuinely wins, the cases where voice wins decisively, and the migration pattern for adding voice to a USSD flow.

USSD (Unstructured Supplementary Service Data) has two properties that make it irreplaceable in Kenya's mobile market:

Works on every handset without data. A 2007 Nokia feature phone with a Safaricom SIM can access a USSD menu. This reach matters in a market where smartphone penetration is growing but not universal, and where data bundle depletion is a daily reality for millions of users.
Costs the user nothing. USSD sessions are billed to the business (approximately KES 0.50–1.50 per session under typical aggregator agreements in Kenya), not the end user. For financial services targeting low-income users, a zero-cost user experience is not optional; it is the product.

USSD also has near-instant session initiation (sub-2-second menu display on Safaricom 2G), which voice cannot match: a call takes 2–4 seconds to set up, plus the time for a TTS prompt to play before the user can respond.

These advantages make USSD the right choice for:

Balance checks
Simple PIN entry
Menu navigation where each step has a clear, short text response
Microfinance loan applications (collect ID, amount, confirm; under 182 chars per screen)
Service registration flows

USSD's limits are architectural, not incidental:

182-character session payload. Each USSD screen is capped at 182 characters of response content. Narrating loan terms, reading a multi-item order back to a customer, or explaining a repayment schedule in any detail is impossible. Developers work around this with numbered sub-menus, but the user experience degrades into a maze of "Press 5 for more options."
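
To make the sub-menu workaround concrete, here is a minimal sketch of splitting a long message into screens that fit the cap. The function name, the "0. More" hint, and the assumption that the 182 characters apply to the text after any aggregator prefix are illustrative, not taken from a specific gateway's documentation.

```javascript
// Assumption: the 182-character cap applies to the screen text itself.
const USSD_MAX_CHARS = 182;

// Hypothetical helper: split a message into USSD-sized pages on word boundaries.
// Every page except the last ends with a "more" hint, which counts toward the cap.
function paginateUssd(message, maxChars = USSD_MAX_CHARS, moreHint = "\n0. More") {
  const budget = maxChars - moreHint.length; // room left for content on each page
  const words = message.trim().split(/\s+/);
  const pages = [];
  let current = "";

  for (const word of words) {
    const candidate = current ? `${current} ${word}` : word;
    if (candidate.length <= budget) {
      current = candidate;
      continue;
    }
    if (current) pages.push(current);
    // Hard-split any single word that is longer than a whole page
    let rest = word;
    while (rest.length > budget) {
      pages.push(rest.slice(0, budget));
      rest = rest.slice(budget);
    }
    current = rest;
  }
  if (current) pages.push(current);

  return pages.map((page, i) => (i < pages.length - 1 ? page + moreHint : page));
}

const terms = "Your loan of KES 5,000 is repayable over 3 months. ".repeat(10);
const pages = paginateUssd(terms);
console.log(`${pages.length} screens needed`);
```

Even a short disclosure turns into several screens, which is the maze the user ends up navigating.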

Synchronous MSC session state. A USSD session lives in the operator's Mobile Switching Centre. The MSC maintains session state for the duration of the interaction. If the session is idle for more than 180 seconds, the MSC terminates it. There is no grace period and no recovery. Long-running flows (review terms → think → confirm) time out on the MSC before the user finishes reading.

No audio. USSD is a text channel. You cannot play a pre-recorded loan term disclosure, capture a verbal acceptance, or verify a voice biometric. For regulated lending in Kenya, where CBK guidelines require borrower acknowledgement in a verifiable format, USSD's text-only nature creates compliance gaps.

No multi-party. USSD is a 1:1 channel between a single subscriber and the USSD gateway. You cannot use it to connect a customer to a live agent, add a guarantor to a loan conversation, or facilitate any multi-party interaction.

A voice call on Sautikit has no operator-imposed session timeout. A loan terms narration that takes 90 seconds is fine. A guided dispute resolution call that takes 8 minutes is fine. The session persists for as long as the call is active, limited only by your call flow logic.

Voice also enables:

Audio naturalness: narrating amounts, names, and reference numbers in Swahili or English is clearer than displaying them in 6-point USSD text on a small screen.
Recorded acceptance: a Record verb captures the borrower's verbal "I agree", creating an auditable evidence trail that satisfies CBK lending guidelines.
Live agent transfer: a Dial verb can hand the caller to a human agent mid-flow without re-authentication.
Real-time branching: GetDigits captures user intent at each step with clear audio prompts, without the 182-character constraint.
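
As a rough sketch of that branching, the function below maps a single keypress to the next step of a loan-acceptance flow. The function name, the step labels, and the idea that the digit reaches your server as a string are illustrative assumptions, not Sautikit's documented callback format.

```javascript
// Hypothetical helper: map one DTMF digit to the next step in the flow.
// Anything other than 1 or 2 (including no input) replays the prompt.
function nextLoanStep(digit) {
  switch (String(digit ?? "").trim()) {
    case "1":
      return "accept";
    case "2":
      return "cancel";
    default:
      return "repeat";
  }
}

console.log(nextLoanStep("1"));
```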

The voice disadvantage: cost and friction. A 45-second outbound voice OTP call costs KES 2.25 at KES 3/min. A USSD session costs the business KES 0.50–1.50 and the user nothing. For high-volume, simple-query flows, USSD remains more economical. Inbound voice calls are free (KES 0/min), which changes the economics for flows where the user initiates the call.

Use case	USSD	Voice	Winner
Balance check	Instant, zero-cost	Overkill	USSD
PIN change	Short, secure, text-only	Unnecessary	USSD
Loan application (collect inputs)	Works well	Slower setup	USSD
Loan term disclosure (regulated)	182-char limit, no audio	Long narration, recordable	Voice
OTP (high-value transaction)	SMS fallback issues	Second-factor via different channel	Voice
Customer dispute resolution	Cannot transfer to agent	Live agent via `Dial`	Voice
Order confirmation (complex)	Limited characters	Audio readback	Voice
Missed-call callback	No audio channel	Core use case	Voice
Delivery ETA notification	Session timeout risk	Outbound call	Voice
Feature phone user, no data	Works perfectly	Also works (no data needed)	USSD
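
One way to read the table is as a rule of thumb: stay on USSD unless a step hits one of the four limits described earlier. The sketch below encodes that reading. The function and field names are hypothetical, and the thresholds are the 182-character and 180-second figures quoted in this post.

```javascript
// Hypothetical rule of thumb: default to USSD, switch to voice when a step
// exceeds one of USSD's limits (screen size, idle timeout, audio, multi-party).
function chooseChannel(step) {
  const {
    maxScreenChars = 0,      // longest single screen of text the step needs
    expectedIdleSeconds = 0, // longest pause the user may need mid-step
    needsAudio = false,      // narration, recording, or voice biometrics
    needsLiveAgent = false,  // transfer to a human or a third party
  } = step;

  const reasons = [];
  if (maxScreenChars > 182) reasons.push("exceeds 182 characters per screen");
  if (expectedIdleSeconds > 180) reasons.push("exceeds the 180-second idle limit");
  if (needsAudio) reasons.push("needs audio");
  if (needsLiveAgent) reasons.push("needs a live agent or third party");

  return reasons.length === 0
    ? { channel: "ussd", reasons: ["fits within USSD limits"] }
    : { channel: "voice", reasons };
}

console.log(chooseChannel({ maxScreenChars: 60 }).channel);
console.log(chooseChannel({ needsAudio: true, maxScreenChars: 900 }).reasons);
```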

The most practical migration path is not to replace USSD but to use it for what it does well (quick input collection) and hand off to voice for the step that exceeds USSD's capabilities.

The pattern:

User dials *123# on their handset. USSD session starts on the Safaricom MSC.
USSD collects the user's loan amount, repayment period, and national ID via short menus, all within the 182-char constraint.
At the final USSD screen, the app server responds with: "END Confirm. We will call you to read your loan terms. Please stay on the line."
The END response terminates the USSD session on the MSC. Simultaneously, the app server fires a POST /v1/calls to Sautikit.
Sautikit places an outbound call to the user's E.164 number within ~2 seconds of the USSD END.
The voice flow narrates the loan terms, captures acceptance (verbally via Record, or by keypress via GetDigits as in the sample below), and reads back the reference number.
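
The outbound call in the steps above needs an E.164 number, and USSD gateways do not always report one in that form. Here is a minimal sketch of a stricter normaliser. It assumes Kenyan mobile numbers have nine national digits starting with 7 or 1, and the function name is hypothetical.

```javascript
// Hypothetical helper: normalise common Kenyan number formats to E.164.
// Accepts "+254712345678", "254712345678", "0712345678", and "712345678".
function toKenyanE164(raw) {
  const digits = String(raw).replace(/\D/g, ""); // drop "+", spaces, dashes

  let national;
  if (digits.startsWith("254")) {
    national = digits.slice(3); // strip the country code
  } else if (digits.startsWith("0")) {
    national = digits.slice(1); // strip the trunk prefix
  } else {
    national = digits;
  }

  // Assumption: mobile numbers are 9 digits and start with 7 or 1
  if (!/^[17]\d{8}$/.test(national)) {
    throw new Error(`Not a recognised Kenyan mobile number: ${raw}`);
  }
  return `+254${national}`;
}

console.log(toKenyanE164("0712 345 678"));
```

Rejecting malformed numbers before placing the call avoids paying for a call attempt that can never connect.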

The USSD session termination (END) is the trigger. Here is the handoff in Node.js:

// USSD handler: final confirmation screen
app.post("/ussd", async (req, res) => {
  const { text, phoneNumber } = req.body;
  const steps = text.split("*");
 
  if (steps.length === 4) {
    // User has completed all input steps (amount, period, ID confirmation)
    const loanData = parseLoanInputs(steps);
    await db.savePendingLoan(phoneNumber, loanData);
 
    // Schedule the voice handoff: fire-and-forget (don't await)
    triggerLoanDisclosureCall(phoneNumber, loanData).catch(console.error);
 
    // Terminate the USSD session
    res.send("END Your loan terms will be read to you in a moment. Please stay on the line.");
    return;
  }
 
  // ... earlier menu steps
});
 
async function triggerLoanDisclosureCall(phoneNumber, loanData) {
  // Normalise USSD phone format to E.164 with +
  const e164 = phoneNumber.startsWith("+") ? phoneNumber : `+${phoneNumber}`;

  const response = await fetch("https://api.sautikit.com/v1/calls", {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${process.env.SAUTIKIT_API_KEY}`,
      "Content-Type": "application/json",
      // Same loan ID always yields the same key, so retries do not place a second call
      "Idempotency-Key": `loan-disclosure-${loanData.id}`,
    },
    body: JSON.stringify({
      from: "+254712345678",   // Your Sautikit number
      to: [e164],
      voice_actions: {
        actions: [
          {
            say: {
              text: `Hello. You are applying for a loan of KES ${loanData.amount.toLocaleString()} over ${loanData.period} months. Your monthly repayment will be KES ${loanData.monthly.toLocaleString()}. Press 1 to accept these terms, or press 2 to cancel.`,
              language: "en-KE"
            }
          },
          {
            getDigits: {
              numDigits: 1,
              timeout: 10000,
              finishOnKey: "",
              action: `https://your-server.example.com/loan/acceptance?loan_id=${loanData.id}`
            }
          }
        ]
      }
    }),
  });

  // fetch() resolves even on HTTP error statuses, so surface them to the caller's .catch()
  if (!response.ok) {
    throw new Error(`Sautikit call request failed with HTTP ${response.status}`);
  }
}

The Idempotency-Key header with loan-disclosure-${loanData.id} prevents double-calls if the USSD webhook fires twice (a known USSD aggregator behaviour on timeout).
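
If you also want to avoid sending the duplicate request in the first place, a small in-process guard is one option. This is a sketch under stated assumptions: a single Node.js process, a hypothetical createCallDeduper helper, and a ten-minute window chosen for illustration. With more than one server instance you would need a shared store instead.

```javascript
// Hypothetical helper: remember which keys were seen recently so a repeated
// webhook does not trigger a second call. In-memory, single process only.
function createCallDeduper(ttlMs = 10 * 60 * 1000, now = () => Date.now()) {
  const seen = new Map(); // key -> expiry timestamp in ms

  return function isFirstTime(key) {
    const t = now();
    // Evict expired entries so the map does not grow without bound
    for (const [k, expiry] of seen) {
      if (expiry <= t) seen.delete(k);
    }
    if (seen.has(key)) return false;
    seen.set(key, t + ttlMs);
    return true;
  };
}

const isFirstTime = createCallDeduper();
console.log(isFirstTime("loan-disclosure-42")); // first webhook delivery
console.log(isFirstTime("loan-disclosure-42")); // duplicate delivery
```

The server-side idempotency key remains the real safety net; the local guard only saves a round trip.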

Interaction type	USSD cost (business)	Voice cost	Notes
Simple balance check	KES 0.50–1.50	KES 1.00 (20 s outbound × KES 3/min)	USSD wins for outbound; inbound is free
Loan application input (4 screens)	KES 2.00–6.00	N/A (use USSD)	USSD for input
Loan term disclosure (45 s outbound call)	N/A	KES 2.25	Voice required by CBK guidelines
OTP verification (25 s outbound call)	KES 1.50 (SMS fallback)	KES 1.25	Voice now cheaper than SMS
Full loan flow (USSD input + voice disclosure)	KES 2.00–6.00 + KES 2.25	N/A	Combined: KES 4.25–8.25 total

For a 45-second outbound voice disclosure at KES 3/min: 45 / 60 × 3 = KES 2.25. This is 1.5–4.5x the cost of a USSD session (KES 2.25 against KES 1.50 and KES 0.50 respectively), but voice disclosure is not optional for regulated lending. The USSD-only equivalent would require reading a multi-paragraph disclosure through USSD's 182-char limit, which is practically impossible and legally insufficient.
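
The arithmetic behind the table can be reproduced with a few lines. This sketch assumes pro-rata per-second billing at the KES 3/min rate used above; the function name is hypothetical, and your actual billing increments may differ.

```javascript
// Hypothetical helper: outbound voice cost in KES, assuming per-second billing.
function voiceCostKes(seconds, ratePerMinute = 3) {
  // Multiply before dividing to limit floating-point drift, then round to cents
  return Math.round(((seconds * ratePerMinute) / 60) * 100) / 100;
}

const ussdSession = { low: 0.5, high: 1.5 };  // KES per USSD session
const ussdInput = { low: 2.0, high: 6.0 };    // KES for the 4-screen input flow
const disclosure = voiceCostKes(45);          // 45-second loan disclosure call

const costs = {
  balanceCheckVoice: voiceCostKes(20),
  otpVoice: voiceCostKes(25),
  disclosure,
  combinedLow: ussdInput.low + disclosure,
  combinedHigh: ussdInput.high + disclosure,
  ratioVsCheapestUssd: disclosure / ussdSession.low,
  ratioVsDearestUssd: disclosure / ussdSession.high,
};
console.log(costs);
```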

Developers migrating from USSD to voice sometimes ask whether they need to surrender their USSD shortcode when adding voice. The answer is no.

USSD shortcodes are registered under the Networks and Application Services licence class (or equivalent under the aggregator's licence). VoIP services operate under the ASP licence. These are separate licence classes and a business can hold both simultaneously.

The recommended architecture keeps USSD for session initiation and adds Sautikit voice for the richer interaction steps that exceed USSD's technical limits. Shortcode-based USSD (*123#) requires no smartphone and no data plan, so keeping it maintains reach to feature phone users.

Sautikit is voice-only, so if you need to run the USSD menus, SMS fallbacks, or a human-agent desk alongside your voice flow, Helloduty provides those channels in the same family.

Create a Sautikit workspace and claim a phone number.
Top up over M-Pesa: KES billing, no card.
Wire your USSD END screen to fire a POST /v1/calls so voice picks up where the session leaves off.
