A Sautikit deployment has five composable layers. A DID phone number sits at the top. Attached to it are SIP credentials for device authentication, and a voice URL that receives call state callbacks. Calls are logged as call records with events. Recordings are stored separately and fetched via presigned URL. This post maps how each layer works and how they connect.
A DID (Direct Inward Dialling) number is the entry point. Every number you claim from Sautikit is:
- An E.164 number in the inventory (GET /v1/numbers/available)
- Assigned to exactly one workspace (POST /v1/numbers/{id}/claim)
- Billed at the monthly rate and per-minute rates shown in the inventory record
- Capable of voice; the capabilities array on the inventory row lists what the number can do
Once claimed, the number appears in GET /v1/numbers for your workspace. It has a status that moves through active → suspended → released. Releasing a number (DELETE /v1/numbers/{id}) makes it available for other workspaces to claim in the next inventory cycle.
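The number lifecycle described above can be sketched as a sequence of HTTP requests. This is a minimal illustration, not a client library: the paths come from this post, but `BASE_URL`, the bearer-token header, and the helper names are assumptions made for the example. The requests are built, not sent.

```python
from urllib.request import Request

# Assumption: placeholder base URL and bearer-token auth; neither is confirmed by this post.
BASE_URL = "https://api.example.invalid"


def build_request(method: str, path: str, token: str) -> Request:
    """Build (but do not send) an HTTP request for an endpoint path."""
    return Request(
        BASE_URL + path,
        method=method,
        headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
    )


def number_lifecycle_requests(number_id: str, token: str) -> list[Request]:
    """Return the requests for browse, claim, list, and release, in that order."""
    return [
        build_request("GET", "/v1/numbers/available", token),              # browse inventory
        build_request("POST", f"/v1/numbers/{number_id}/claim", token),    # claim for workspace
        build_request("GET", "/v1/numbers", token),                        # list claimed numbers
        build_request("DELETE", f"/v1/numbers/{number_id}", token),        # release the number
    ]
```

Passing each `Request` to `urllib.request.urlopen` would send it; that step is left out because the real host and authentication scheme are not stated here.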
Numbers have a routing configuration at PATCH /v1/numbers/{id}/routing. This is where you set:
- voice_url: the URL Sautikit POSTs call callbacks to when a call arrives on this number
- events_url: a separate URL for call lifecycle events (started, answered, completed)
- voice_signing_secret: used to verify the HMAC signature on incoming callbacks
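A routing update and the matching signature check might look like the sketch below. The three field names come from this post; everything else is an assumption for illustration, including the use of HMAC-SHA256, hex encoding, and signing the raw request body. Check the callback documentation for the actual algorithm and the header that carries the signature.

```python
import hashlib
import hmac

# Body for PATCH /v1/numbers/{id}/routing. The URLs and secret are placeholders.
routing_update = {
    "voice_url": "https://app.example.invalid/voice",
    "events_url": "https://app.example.invalid/events",
    "voice_signing_secret": "replace-with-a-long-random-secret",
}


def sign_body(secret: str, raw_body: bytes) -> str:
    """Compute a hex HMAC over the raw body (assumed scheme: HMAC-SHA256)."""
    return hmac.new(secret.encode("utf-8"), raw_body, hashlib.sha256).hexdigest()


def verify_callback(secret: str, raw_body: bytes, received_signature: str) -> bool:
    """Return True when the received signature matches the one we compute."""
    expected = sign_body(secret, raw_body)
    # compare_digest runs in constant time, which avoids leaking how many
    # leading characters of the signature were correct.
    return hmac.compare_digest(expected.encode("utf-8"), received_signature.encode("utf-8"))
```

Verify against the exact bytes received, before any JSON parsing; re-serialising the body can change whitespace or key order and break the comparison.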
SIP credentials let a SIP-capable device (softphone, WebRTC client, hardware phone, Asterisk, FreePBX) register against the Sautikit network and make or receive calls through your claimed number.
The password is shown once. If you lose it, rotate it at POST /v1/numbers/{id}/sip-credentials/{credential_id}/rotate.
SIP credentials are bound to a specific number. A device registered with these credentials can only place calls from that number's E.164 identity, and inbound calls to that number can ring the registered device.
If you are building a browser-based softphone, the ws_url is the WebSocket endpoint for SIP over WebSocket (SIP/WS or SIP/WSS), the signalling transport that WebRTC clients use. Standard WebRTC SIP stacks (JsSIP, SIP.js) connect to wss://sip.sautikit.com using these credentials.
The voice URL is where you control call flow. When a call arrives (inbound) or progresses (outbound via PBX callbacks), Sautikit POSTs the call state to your voice_url. Your endpoint returns a JSON response that tells the PBX what to do next.
Voice action loop
The request body from Sautikit contains the PBX call state:
Your response tells the PBX what to do. The voice-actions DSL supports these verbs:
| Verb | Description |
| --- | --- |
| say | Synthesise and speak text (TTS) |
| play | Stream an audio file URL |
| getDigits | Collect DTMF input with optional prompt |
| dial | Connect to a number or SIP URI |
| conference | Place the caller into a named conference room |
| record | Record the caller's audio |
| redirect | Hand off call flow to another URL |
| reject | Reject the call with a reason code |
| hangup | End the call immediately |
A simple IVR response. If you are coming from an XML-based voice API (Africa's Talking, Twilio/TwiML), both the XML and JSON forms work at runtime; return whichever fits your codebase. Sautikit parses and validates the JSON DSL; raw XML is forwarded to the telephony engine byte-for-byte. The XML form:
<Response><GetDigits maxDigits="1" timeout="10" finishOnKey="#"><Say language="en-US">Welcome to Acme support. Press 1 for billing, 2 for technical.</Say></GetDigits></Response>
XML nests any prompt <Say> inside <GetDigits> and uses maxDigits where JSON uses numDigits. An existing XML-based handler can keep returning raw PBX XML unchanged, for example:
<Response> <Say>Connecting you now.</Say> <Dial><Number>+254733000000</Number></Dial></Response>
The PBX supports a standard XML verb set. JSON is preferred for new integrations because the DSL is typed and validated before reaching the PBX.
A response may contain at most 50 actions (MaxActionSteps). Actions are executed sequentially, in the order they appear.
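The JSON side of the IVR example and the action limit can be sketched as a small builder. The verb names, `numDigits`, and the 50-action limit come from this post. The envelope shape (`{"actions": [...]}`), the `verb` key, and the `text`, `timeout`, and `finishOnKey` field names are assumptions for illustration and may not match the real DSL schema.

```python
import json

# Verbs listed in this post's voice-actions table.
VERBS = {
    "say", "play", "getDigits", "dial", "conference",
    "record", "redirect", "reject", "hangup",
}
MAX_ACTION_STEPS = 50  # limit stated in this post


def build_response(actions: list[dict]) -> str:
    """Validate a list of actions locally and serialise it as JSON.

    Assumed shape: each action is a dict with a "verb" key, wrapped in an
    {"actions": [...]} envelope. Both are hypothetical.
    """
    if len(actions) > MAX_ACTION_STEPS:
        raise ValueError(f"{len(actions)} actions exceeds the limit of {MAX_ACTION_STEPS}")
    for index, action in enumerate(actions):
        if action.get("verb") not in VERBS:
            raise ValueError(f"action {index} has unknown verb {action.get('verb')!r}")
    return json.dumps({"actions": actions})


# The IVR prompt from the XML example, expressed as two sequential actions.
ivr = build_response([
    {"verb": "say", "text": "Welcome to Acme support. Press 1 for billing, 2 for technical."},
    {"verb": "getDigits", "numDigits": 1, "timeout": 10, "finishOnKey": "#"},
])
```

Checking the limit and verb names before responding catches mistakes in your own handler early, instead of waiting for the platform to reject the response mid-call.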
Every call, inbound or outbound, generates a call record (calls table). The record is available immediately at GET /v1/calls/{id}.
Call state machine
completed is the only fully successful terminal state. failed, busy, and canceled are additional terminal states not shown above.
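Client code that polls a call usually only needs to know whether a status is terminal. A minimal sketch, using the four terminal states named above; the helper names are made up for this example, and any status outside the set is simply treated as still in progress.

```python
# Terminal statuses named in this post.
TERMINAL_STATUSES = frozenset({"completed", "failed", "busy", "canceled"})


def is_terminal(status: str) -> bool:
    """True once the call can no longer change state."""
    return status in TERMINAL_STATUSES


def is_successful(status: str) -> bool:
    """Only 'completed' counts as a fully successful outcome."""
    return status == "completed"
```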
Key fields on the call record:
| Field | Type | Notes |
| --- | --- | --- |
| id | UUID | Call row ID |
| direction | inbound / outbound | |
| local_e164 | string | Your number |
| remote_e164 | string | The other party |
| status | string | See lifecycle above |
| duration_seconds | integer | Null until completed |
| cost_minor | integer | KES cents; null until debited |
| pbx_call_id | string | HD_… from the PBX |
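The fields above map naturally onto a typed record. The sketch below assumes the JSON keys match the field names exactly; the class name, the `cost_kes` helper, and the sample values are illustrative only.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)
class CallRecord:
    id: str
    direction: str                    # "inbound" or "outbound"
    local_e164: str                   # your number
    remote_e164: str                  # the other party
    status: str
    duration_seconds: Optional[int]   # None until the call completes
    cost_minor: Optional[int]         # KES cents; None until debited
    pbx_call_id: str

    @classmethod
    def from_json(cls, payload: dict) -> "CallRecord":
        """Build a record from a decoded response (key names assumed)."""
        return cls(
            id=payload["id"],
            direction=payload["direction"],
            local_e164=payload["local_e164"],
            remote_e164=payload["remote_e164"],
            status=payload["status"],
            duration_seconds=payload.get("duration_seconds"),
            cost_minor=payload.get("cost_minor"),
            pbx_call_id=payload["pbx_call_id"],
        )

    def cost_kes(self) -> Optional[Decimal]:
        """Convert KES cents to shillings, or None if not yet debited."""
        if self.cost_minor is None:
            return None
        return Decimal(self.cost_minor) / Decimal(100)
```

Keeping `cost_minor` as an integer and converting with `Decimal` only for display avoids floating-point rounding in money values.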
Call events are finer-grained than status: GET /v1/calls/{id}/events returns a timeline of every state transition with occurred_at timestamps. Use this for debugging: a failed call will show what the PBX reported before hanging up.
For aggregate stats across all calls in your workspace, use GET /v1/calls/stats. It returns total call count, failed count, total minutes, and total spend for the selected period.
If the recording is still being captured, the response is HTTP 202 with status: pending, and retry_after_seconds tells you when to poll again.
The presigned URL is valid for 15 minutes. Re-fetch to get a fresh one. Recording retention follows your storage tier: the free 5 GB tier includes the default retention window. Assets pruned by retention policy return status: expired and HTTP 410.
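The three outcomes described here (pending, ready, expired) can be handled with one small decision function. The status codes, `status` values, and `retry_after_seconds` come from this post. The `url` field name, the assumption that a ready recording returns HTTP 200, and the 5-second fallback delay are guesses made for the example.

```python
from dataclasses import dataclass
from typing import Optional

PRESIGNED_URL_TTL_SECONDS = 15 * 60  # presigned URLs are valid for 15 minutes


@dataclass
class RecordingDecision:
    action: str                        # "download", "retry", or "gone"
    url: Optional[str] = None
    wait_seconds: Optional[int] = None


def interpret_recording_response(http_status: int, body: dict) -> RecordingDecision:
    """Map a recording lookup response onto the next client step."""
    if http_status == 202 and body.get("status") == "pending":
        # Still being captured: wait as instructed (fallback delay is an assumption).
        return RecordingDecision("retry", wait_seconds=int(body.get("retry_after_seconds", 5)))
    if http_status == 410 or body.get("status") == "expired":
        # Pruned by retention policy: retrying will not bring it back.
        return RecordingDecision("gone")
    if http_status == 200:
        # "url" is a hypothetical field name for the presigned link.
        return RecordingDecision("download", url=body["url"])
    raise ValueError(f"unexpected recording response: HTTP {http_status}")
```

Because the link expires after `PRESIGNED_URL_TTL_SECONDS`, it is safer to fetch a fresh URL right before each download than to store one for later.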
The wallet deduction happens in the worker after the call.completed event. The cost_minor field on the call record is populated once the debit is confirmed. There is no separate billing delay; the debit typically lands within seconds of the call ending.