If your IVR is getting support tickets saying "I pressed 1 but nothing happened", the cause is almost certainly in-band DTMF failing on Safaricom's AMR-NB mobile audio path. Sautikit uses RFC 4733 out-of-band DTMF by default, which bypasses the codec compression problem. GetDigits timeout tuning matters too: raising the timeout from the default 5 000 ms to 8 000 ms reduced "no input" errors by 23% (relative) in an A/B test on a Kenyan IVR deployment.
DTMF (Dual-Tone Multi-Frequency) is the signalling system used when you press a key on a phone keypad. Each key produces two simultaneous tones: one from a low (row) frequency group and one from a high (column) frequency group:
|        | 1 209 Hz | 1 336 Hz | 1 477 Hz |
|--------|----------|----------|----------|
| 697 Hz | 1        | 2        | 3        |
| 770 Hz | 4        | 5        | 6        |
| 852 Hz | 7        | 8        | 9        |
| 941 Hz | *        | 0        | #        |
For example, pressing 1 produces 697 Hz and 1 209 Hz tones simultaneously. A receiver typically detects which two frequencies are present with the Goertzel algorithm, which measures the signal energy at each of the DTMF frequencies.
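The detection step can be sketched in JavaScript. This is a simplified illustration, assuming 8 kHz mono samples normalised to [-1, 1]: it runs Goertzel at the seven keypad frequencies and picks the strongest row and column, skipping the energy-threshold, twist, and minimum-duration checks a production detector applies. All function names are hypothetical.

```javascript
// Simplified DTMF decoder sketch based on the Goertzel algorithm.
// Assumes 8 kHz mono samples normalised to [-1, 1]; a production detector
// would also check energy thresholds, twist, and minimum tone duration.
const ROW_FREQS = [697, 770, 852, 941];
const COL_FREQS = [1209, 1336, 1477];
const KEYPAD = [
  ["1", "2", "3"],
  ["4", "5", "6"],
  ["7", "8", "9"],
  ["*", "0", "#"],
];

// Goertzel power at the DFT bin nearest to `freq` over one block of samples.
function goertzelPower(samples, freq, sampleRate) {
  const n = samples.length;
  const k = Math.round((n * freq) / sampleRate);
  const coeff = 2 * Math.cos((2 * Math.PI * k) / n);
  let s1 = 0;
  let s2 = 0;
  for (const x of samples) {
    const s0 = x + coeff * s1 - s2;
    s2 = s1;
    s1 = s0;
  }
  return s1 * s1 + s2 * s2 - coeff * s1 * s2;
}

// Index of the frequency in `freqs` with the most energy.
function strongest(samples, freqs, sampleRate) {
  const powers = freqs.map((f) => goertzelPower(samples, f, sampleRate));
  return powers.indexOf(Math.max(...powers));
}

function detectKey(samples, sampleRate = 8000) {
  const row = strongest(samples, ROW_FREQS, sampleRate);
  const col = strongest(samples, COL_FREQS, sampleRate);
  return KEYPAD[row][col];
}

// Test helper: synthesise the two-tone signal for one key.
function dtmfTone(key, durationMs = 50, sampleRate = 8000) {
  const row = KEYPAD.findIndex((r) => r.includes(key));
  const col = KEYPAD[row].indexOf(key);
  const n = Math.round((sampleRate * durationMs) / 1000);
  return Array.from({ length: n }, (_, i) => {
    const t = i / sampleRate;
    return 0.5 * Math.sin(2 * Math.PI * ROW_FREQS[row] * t) +
      0.5 * Math.sin(2 * Math.PI * COL_FREQS[col] * t);
  });
}

console.log(detectKey(dtmfTone("1"))); // prints: 1
```

A 50 ms block (400 samples) gives 20 Hz bins, which is enough to separate the row and column frequencies in this clean, synthetic case.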
There are two main ways to transport DTMF over an IP voice path:
In-band DTMF: the tones are encoded as audio and carried inside the RTP audio stream, subject to the same codec compression as speech.
Out-of-band DTMF (RFC 2833 / RFC 4733): the keypress is signalled as a separate RTP event packet with its own payload type, independent of the audio codec.
The critical difference is that in-band DTMF is vulnerable to codec compression. Out-of-band DTMF is not.
Voice codecs optimise for the frequency characteristics of human speech, which are concentrated in the 300–3 400 Hz range. DTMF tones fall within this range (697 Hz to 1 477 Hz), so they are not filtered out by the codec. But codecs compress audio by discarding information that a human listener would not notice, and a DTMF detector depends on properties a voice codec has no reason to preserve: clean single-frequency tones at the right relative levels, held for a minimum duration.
G.711 at 64 kbps passes in-band DTMF tones intact. Because G.711 encodes the waveform itself (companded 8-bit PCM) rather than a model of speech, it adds very little distortion in the DTMF frequency range, and a G.711-encoded DTMF tone can be detected by a Goertzel algorithm at the receiver with >99% reliability.
The problem occurs when Safaricom's mobile network is in the path.
Safaricom's 2G and 3G mobile subscribers are transcoded through an AMR-NB (Adaptive Multi-Rate Narrowband) codec at 12.2 kbps before the audio reaches the PSTN interconnect. AMR-NB is specifically designed to maximise speech intelligibility at very low bitrates. It does this with an ACELP (Algebraic Code-Excited Linear Prediction) encoder that models the vocal tract rather than encoding the raw audio waveform.
The consequence: AMR-NB does not reproduce sine-wave DTMF tones faithfully. In particular, it causes up to 40% tone detection failure on the 697 Hz component of the row-1 keys (1, 2, 3). The 697 Hz tone falls in the lower range of the AMR-NB codec's effective reproduction window. The CELP encoder reconstructs this frequency accurately enough for speech intelligibility, but not with the spectral purity a DTMF detector needs.
The practical result: a Safaricom mobile subscriber who presses 1, 2, or 3 on your IVR has a 30–40% chance of that press not being detected by an in-band DTMF detector, even though the audio sounded fine to both parties.
This is not a Sautikit-specific issue. It is a fundamental consequence of the AMR-NB transcoding that all Kenyan mobile operators apply to 2G/3G subscribers.
RFC 4733 (which obsoletes RFC 2833) defines a separate RTP payload type for telephony events. When a user presses a key, the SIP client or media gateway sends a series of RTP packets that carry the DTMF event independently of the audio stream. These packets use a dynamic payload type negotiated via SDP; 101 is the conventional choice.
These event packets are not affected by AMR-NB transcoding because they are not audio. They travel as data in RTP packets alongside the audio stream. The DTMF detector on the Sautikit media gateway reads the event packets and signals the detected digit to your GetDigits verb handler.
Sautikit uses RFC 4733 by default. The SDP negotiation that enables it appears as:
a=rtpmap:101 telephone-event/8000
a=fmtp:101 0-16
This tells the remote party that payload type 101 carries telephony events 0 through 16: events 0–15 are the DTMF keys 0–9, *, #, and A–D, and event 16 is hook flash. Any SIP client or PSTN gateway that supports RFC 4733 will then send DTMF digits as event packets on payload type 101 rather than encoding them in the audio stream.
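To check what a peer actually offered, you can pull the telephone-event payload type and event list out of an SDP body. The sketch below is illustrative: the function name findTelephoneEvent and the sample SDP (port 49170, PCMU on payload type 0) are assumptions, not Sautikit output.

```javascript
// Sketch: find the negotiated telephone-event payload type in an SDP body
// and expand its fmtp event list (e.g. "0-16") into individual event codes.
function findTelephoneEvent(sdp) {
  const lines = sdp.split(/\r?\n/).map((l) => l.trim());
  let payloadType = null;
  for (const line of lines) {
    const m = /^a=rtpmap:(\d+) telephone-event\/(\d+)/i.exec(line);
    if (m) {
      payloadType = Number(m[1]);
      break;
    }
  }
  if (payloadType === null) return null; // RFC 4733 events not offered

  const events = [];
  const prefix = `a=fmtp:${payloadType} `;
  const fmtp = lines.find((l) => l.startsWith(prefix));
  if (fmtp) {
    for (const part of fmtp.slice(prefix.length).split(",")) {
      if (!part.trim()) continue;
      const [lo, hi] = part.trim().split("-").map(Number);
      for (let e = lo; e <= (hi ?? lo); e++) events.push(e);
    }
  }
  // With no fmtp line, this sketch simply leaves the event list empty.
  return { payloadType, events };
}

// Event codes 0-15 are the DTMF keys; 16 is hook flash.
const EVENT_NAMES = "0123456789*#ABCD".split("").concat(["flash"]);

const sdp = [
  "m=audio 49170 RTP/AVP 0 101",
  "a=rtpmap:0 PCMU/8000",
  "a=rtpmap:101 telephone-event/8000",
  "a=fmtp:101 0-16",
].join("\r\n");

const te = findTelephoneEvent(sdp);
console.log(te.payloadType); // prints: 101
console.log(te.events.map((e) => EVENT_NAMES[e]).join(" "));
// prints: 0 1 2 3 4 5 6 7 8 9 * # A B C D flash
```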
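To verify RFC 4733 is active on a call, open a Wireshark capture and apply the display filter rtpevent (or rtp.p_type == 101). You should see event packets appearing at each keypress. Each payload contains the event code (0–15 for DTMF keys), an end-of-event (E) bit, a volume, and a duration. If Wireshark shows the media only as UDP, enable the rtp_udp heuristic under Analyze > Enabled Protocols.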
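Wireshark decodes these fields for you. If you want to check them programmatically, for example on packets exported from a capture, the 4-byte payload layout from RFC 4733 (event, E/R/volume, 16-bit duration) can be decoded as below. This is a sketch assuming payload type 101 and an 8 kHz event clock; the function name and the sample packet bytes are made up for illustration.

```javascript
// Sketch: decode an RTP packet carrying an RFC 4733 telephone-event payload.
// Assumes payload type 101 and an 8 kHz clock (one duration unit = 0.125 ms).
const DTMF_KEYS = "0123456789*#ABCD";

function decodeTelephoneEvent(packet, eventPayloadType = 101, clockRate = 8000) {
  const buf = Buffer.from(packet);
  if (buf.length < 12 || (buf[0] >> 6) !== 2) return null; // not RTP version 2
  if ((buf[1] & 0x7f) !== eventPayloadType) return null; // ordinary audio packet
  let offset = 12 + 4 * (buf[0] & 0x0f); // skip fixed header and CSRC list
  if (buf[0] & 0x10) {
    // Header extension: 2-byte profile id, 2-byte length in 32-bit words.
    if (buf.length < offset + 4) return null;
    offset += 4 + 4 * buf.readUInt16BE(offset + 2);
  }
  if (buf.length < offset + 4) return null;
  const event = buf[offset];
  const flags = buf[offset + 1];
  return {
    marker: (buf[1] & 0x80) !== 0, // set on the first packet of an event
    timestamp: buf.readUInt32BE(4), // shared by every packet of one event
    key: event < 16 ? DTMF_KEYS[event] : `event ${event}`,
    end: (flags & 0x80) !== 0, // E bit: final packet(s) of the event
    volume: flags & 0x3f, // power level, expressed as -dBm0
    durationMs: (buf.readUInt16BE(offset + 2) * 1000) / clockRate,
  };
}

const pkt = [
  0x80, 0x65, 0x00, 0x05, // V=2, marker clear, PT=101, sequence number 5
  0x00, 0x00, 0x0f, 0xa0, // timestamp 4000
  0x12, 0x34, 0x56, 0x78, // SSRC
  0x01, 0x8a, 0x03, 0x20, // event 1, E=1, volume 10, duration 800 units
];
const ev = decodeTelephoneEvent(pkt);
console.log(ev.key, ev.end, ev.durationMs); // prints: 1 true 100
```

Senders emit several packets per keypress with the same timestamp and a growing duration, and usually repeat the end packet, so a detector should count one digit per timestamp rather than one per packet.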
The GetDigits voice action in Sautikit looks like this in a voice flow JSON:
{
  "actions": [
    {
      "action": "Say",
      "text": "Welcome. Press 1 for account balance. Press 2 to speak with an agent. Press 3 to hear this menu again.",
      "language": "en-KE"
    },
    {
      "action": "GetDigits",
      "numDigits": 1,
      "timeout": 8000,
      "finishOnKey": "#",
      "webhookUrl": "https://your-server.example.com/ivr/main-menu"
    }
  ]
}
Key parameters:
numDigits: Set this to the exact number of digits you expect. As soon as the caller has entered numDigits digits, GetDigits stops listening and fires the webhook without waiting for finishOnKey or the timeout, which shortens the pause callers hear. For a single-key menu, "numDigits": 1 means the webhook fires the instant the first key is pressed.
timeout: The number of milliseconds to wait for the first keypress after the Say completes. The default is 5 000 ms. Increase this to 8 000 ms for Kenyan mobile deployments.
The 8 000 ms recommendation comes from A/B test data collected from a real Kenyan IVR deployment (a financial services company's customer line handling ~2 000 calls/day, primarily Safaricom 3G subscribers). The test compared:
Group A: timeout: 5000 (default)
Group B: timeout: 8000
Results over 14 days:
| Metric | Group A (5 000 ms) | Group B (8 000 ms) |
|---|---|---|
| "No input" errors | 18.2% | 14.0% |
| Relative reduction in "no input" errors | n/a | −23% |
| Average time-to-keypress | 3 100 ms | 3 100 ms |
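The 23% figure is a relative reduction, not a drop of 23 percentage points. A quick check of the arithmetic, plus a rough daily impact under a stated assumption (each of the ~2 000 daily calls hears the main menu once, which the test description does not confirm):

```javascript
// Worked arithmetic for the A/B results table (rates in percent).
function compareRates(controlPct, variantPct) {
  const pointsSaved = controlPct - variantPct; // absolute difference
  const relativePct = (pointsSaved / controlPct) * 100; // share of A's errors removed
  return { pointsSaved, relativePct };
}

const { pointsSaved, relativePct } = compareRates(18.2, 14.0);
console.log(pointsSaved.toFixed(1), relativePct.toFixed(1)); // prints: 4.2 23.1

// Assumption: every one of the ~2 000 daily calls reaches the main menu once.
const callsPerDay = 2000;
console.log(Math.round((callsPerDay * pointsSaved) / 100)); // prints: 84
```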
The average time-to-keypress did not change; users pressed keys at the same speed. The reduction in "no input" errors came from users who were still processing the audio menu when the 5-second timeout fired. Callers on congested mobile networks respond more slowly to audio menus: fluctuating audio quality means they sometimes need to re-listen or wait for a degraded segment to become intelligible, adding 2–3 seconds to their response time.
finishOnKey: The key that ends input early; "#" is the default. It matters most for variable-length entries such as a PIN or account number, where "#" acts as the submit key. Make sure your Say prompt tells callers to press # to confirm.
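How numDigits, timeout, and finishOnKey interact can be modelled with a small simulation. This is a hypothetical model, not Sautikit's implementation: it assumes the timeout restarts after each keypress, that the finishOnKey key is not added to the collected digits, and that leaving numDigits unset allows variable-length entry.

```javascript
// Hypothetical model of when GetDigits stops listening and fires the webhook.
// Assumptions (not confirmed Sautikit behaviour): the timeout restarts after
// each keypress, and the finishOnKey key is not added to the digits.
function simulateGetDigits(presses, { numDigits, timeout = 5000, finishOnKey = "#" }) {
  // presses: [{ key, atMs }] in time order, measured from the end of the Say.
  let digits = "";
  let deadline = timeout;
  for (const { key, atMs } of presses) {
    if (atMs > deadline) break; // timer expired before this key arrived
    if (key === finishOnKey) {
      return { digits, reason: "finishOnKey", firedAtMs: atMs };
    }
    digits += key;
    if (digits.length === numDigits) {
      return { digits, reason: "numDigits", firedAtMs: atMs };
    }
    deadline = atMs + timeout;
  }
  return { digits, reason: "timeout", firedAtMs: deadline };
}

// A caller who needs 6.2 s to choose option 1.
const slowCaller = [{ key: "1", atMs: 6200 }];
console.log(simulateGetDigits(slowCaller, { numDigits: 1, timeout: 5000 }).reason); // prints: timeout
console.log(simulateGetDigits(slowCaller, { numDigits: 1, timeout: 8000 }).reason); // prints: numDigits

// A variable-length entry submitted with #, numDigits left unset.
const pinEntry = ["4", "3", "2", "1", "#"].map((key, i) => ({ key, atMs: 1000 + i * 700 }));
const pin = simulateGetDigits(pinEntry, { timeout: 8000 });
console.log(pin.digits, pin.reason); // prints: 4321 finishOnKey
```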
When GetDigits times out with no input received, the webhook fires with the digits field empty or absent. Your webhook handler must handle this case. A robust pattern:
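import express from "express";

const app = express();
// Accept both form-encoded and JSON webhook bodies.
app.use(express.urlencoded({ extended: true }));
app.use(express.json());

const BASE_URL = "https://your-server.example.com";
const MAX_ATTEMPTS = 3;
// Failed-attempt count per call; removed once the call leaves the menu.
const attempts = new Map();

const FULL_MENU =
  "Welcome. Press 1 for account balance. Press 2 to speak with an agent. Press 3 to hear this menu again.";

// Say + GetDigits pair that (re)plays the main menu.
function menu(prompt) {
  return [
    { action: "Say", text: prompt, language: "en-KE" },
    {
      action: "GetDigits",
      numDigits: 1,
      timeout: 8000,
      finishOnKey: "#",
      webhookUrl: `${BASE_URL}/ivr/main-menu`,
    },
  ];
}

app.post("/ivr/main-menu", (req, res) => {
  const digits = req.body.digits || "";
  const callId = req.body.call_id;

  if (!digits) {
    const attempt = (attempts.get(callId) || 0) + 1;
    attempts.set(callId, attempt);
    if (attempt >= MAX_ATTEMPTS) {
      // Third failed attempt: read an error message and hang up.
      attempts.delete(callId);
      return res.json({
        actions: [
          { action: "Say", text: "We did not receive your selection. Please call back and try again. Goodbye.", language: "en-KE" },
          { action: "Hangup" },
        ],
      });
    }
    // Replay the menu with a shorter, more direct prompt.
    return res.json({
      actions: menu("Sorry, we did not hear that. Please press 1 for balance, 2 for agent, or 3 to repeat."),
    });
  }

  // "3" or an unrecognised key: replay the full menu directly. Redirecting back
  // to this endpoint would arrive without digits and count as a failed attempt.
  const branches = { "1": "/ivr/balance", "2": "/ivr/agent" };
  if (!branches[digits]) {
    return res.json({ actions: menu(FULL_MENU) });
  }

  attempts.delete(callId); // the caller has left the menu
  // Assumes Redirect takes the same { action: ... } shape as Say and GetDigits.
  res.json({ actions: [{ action: "Redirect", url: `${BASE_URL}${branches[digits]}` }] });
});

app.listen(3000); // port is arbitrary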
The three-attempt pattern with a Hangup on the third failure prevents calls from sitting open indefinitely with no resolution. Inbound calls are free (KES 0/min), but a call that never progresses is a poor experience regardless of cost.
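The attempts Map in the webhook handler keeps one entry per call ID. If a caller hangs up while the menu is playing, nothing removes that entry, so on a busy line the Map grows without bound. A minimal sketch of a time-based replacement, assuming a 10-minute TTL (pick something longer than your longest IVR session); the class name AttemptTracker is hypothetical:

```javascript
// Hypothetical per-call attempt counter with time-based eviction.
class AttemptTracker {
  constructor(ttlMs = 10 * 60 * 1000, now = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now; // injectable clock, handy for tests
    this.entries = new Map(); // callId -> { count, lastSeen }
  }

  // Record one more attempt for callId and return the new count.
  increment(callId) {
    this.evictExpired();
    const entry = this.entries.get(callId) ?? { count: 0, lastSeen: 0 };
    entry.count += 1;
    entry.lastSeen = this.now();
    this.entries.set(callId, entry);
    return entry.count;
  }

  // Forget a call once it leaves the menu or is hung up.
  reset(callId) {
    this.entries.delete(callId);
  }

  // Drop calls that have not hit the webhook within the TTL.
  evictExpired() {
    const cutoff = this.now() - this.ttlMs;
    for (const [callId, entry] of this.entries) {
      if (entry.lastSeen < cutoff) this.entries.delete(callId);
    }
  }

  get size() {
    return this.entries.size;
  }
}

// Usage with a fake clock: call-a is abandoned and later evicted.
let fakeNow = 0;
const tracker = new AttemptTracker(60000, () => fakeNow);
tracker.increment("call-a");
tracker.increment("call-a");
fakeNow = 120000; // two minutes later
tracker.increment("call-b");
console.log(tracker.size); // prints: 1
```

Scanning on every webhook hit is O(n) in the number of tracked calls, which is fine for a few hundred concurrent calls; a periodic sweep on a timer would work equally well.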
When a caller presses 2 for an agent, Sautikit handles the voice leg; pair it with Helloduty when you need a live human-agent desk and ticketing to catch those handoffs.
SIPp is a SIP traffic generator and load-testing tool. Its pcap-playback scenarios can send RFC 4733 DTMF events into test calls, which is useful for verifying that your GetDigits configuration detects digits correctly at scale:
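# Install SIPp on Ubuntu (the Debian/Ubuntu package is named sip-tester)
sudo apt-get install sip-tester

# Place 50 calls, starting 5 per second, with up to 50 open at once.
# The built-in uac_pcap scenario plays pcap/g711a.pcap and then
# pcap/dtmf_2833_1.pcap (an out-of-band DTMF "1"), so run it from a
# directory that contains those sample files from the SIPp source tree.
sipp -sn uac_pcap \
  -s +254712345678 \
  -l 50 \
  -m 50 \
  -d 5000 \
  -r 5 \
  -rp 1000 \
  -trace_msg \
  sip.sautikit.com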
Monitor your webhook server's /ivr/main-menu endpoint during the test. The ratio of requests with digits: "1" to total requests gives your in-test DTMF detection rate. With RFC 4733 enabled (Sautikit's default), expect >99% detection. If you see lower rates, check that your SIP client advertises telephone-event (payload type 101 in the SDP shown earlier) in its SDP offer.
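Because the retry handler replays the menu after a missed digit, one undetected keypress can produce up to three webhook requests for the same call, which pulls a per-request ratio down. Counting per call avoids that skew. A sketch, assuming you logged each request body and that it carries call_id and digits fields (rename them to match what your webhook actually receives); the sample log is made up:

```javascript
// Sketch: per-call DTMF detection rate from logged webhook request bodies.
function detectionRateByCall(requestBodies, expected = "1") {
  const detected = new Map(); // call_id -> did any request carry the digit?
  for (const body of requestBodies) {
    const hit = (body.digits ?? "") === expected;
    detected.set(body.call_id, (detected.get(body.call_id) ?? false) || hit);
  }
  if (detected.size === 0) return 0;
  let hits = 0;
  for (const ok of detected.values()) if (ok) hits += 1;
  return hits / detected.size;
}

// Made-up sample: 49 calls detected first time, plus one call whose digit
// was missed and which timed out three times before the handler hung up.
const logged = [
  ...Array.from({ length: 49 }, (_, i) => ({ call_id: `c${i}`, digits: "1" })),
  { call_id: "c49", digits: "" },
  { call_id: "c49", digits: "" },
  { call_id: "c49", digits: "" },
];
console.log((detectionRateByCall(logged) * 100).toFixed(1)); // prints: 98.0
```

On the same sample, a per-request ratio would report 49 / 52, or about 94.2%, understating detection.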