Understand Sautikit in one read: how numbers, SIP, voice actions, and recordings compose

A Sautikit deployment has five composable layers. A DID phone number sits at the top. Attached to it are SIP credentials for device authentication, and a voice URL that receives call state callbacks. Calls are logged as call records with events. Recordings are stored separately and fetched via presigned URL. This post maps how each layer works and how they connect.

The call stack: five composable layers

Each layer is independently configurable. Layers 1–3 are set up at provisioning time; Layers 4–5 are generated per call.

A DID (Direct Inward Dialling) number is the entry point. Every number you claim from Sautikit is:

An E.164 number in the inventory (GET /v1/numbers/available)
Assigned to exactly one workspace (POST /v1/numbers/{id}/claim)
Billed at the monthly rate and per-minute rates shown in the inventory record
Capable of voice; the capabilities array on the inventory row lists what the number can do

Once claimed, the number appears in GET /v1/numbers for your workspace. It has a status that moves through active → suspended → released. Releasing a number (DELETE /v1/numbers/{id}) makes it available for other workspaces to claim in the next inventory cycle.

Numbers have a routing configuration at PATCH /v1/numbers/{id}/routing. This is where you set:

voice_url: the URL Sautikit POSTs call-state callbacks to when a call arrives on this number
events_url: a separate URL for call lifecycle events (started, answered, completed)
voice_signing_secret: used to verify the HMAC signature on incoming callbacks
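
As a rough sketch, the routing update could be prepared in Python as shown below. Only the endpoint path and the three field names come from the list above. The flat JSON body shape, the helper name build_routing_request, and the API_BASE constant are assumptions made for illustration. The request is built but not sent.

```python
import json
import urllib.request

API_BASE = "https://api.sautikit.com"  # assumed base URL for the v1 API


def build_routing_request(number_id, api_key, voice_url, events_url, signing_secret):
    """Build (but do not send) a PATCH /v1/numbers/{id}/routing request.

    Assumption: the endpoint accepts a flat JSON object with these three keys.
    """
    body = {
        "voice_url": voice_url,
        "events_url": events_url,
        "voice_signing_secret": signing_secret,
    }
    return urllib.request.Request(
        url=f"{API_BASE}/v1/numbers/{number_id}/routing",
        data=json.dumps(body).encode("utf-8"),
        method="PATCH",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
```

Passing the returned object to urllib.request.urlopen would send it; keeping construction separate makes the payload easy to inspect before it goes out.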

SIP credentials let a SIP-capable device (softphone, WebRTC client, hardware phone, Asterisk, FreePBX) register against the Sautikit network and make or receive calls through your claimed number.

Create credentials for a number:

curl -s -X POST \
  "https://api.sautikit.com/v1/numbers/a1b2c3d4-.../sip-credentials" \
  -H "Authorization: Bearer $SAUTIKIT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"label": "alice-desk"}'

Response (reveal-once):

{
  "id": "...",
  "tenant_number_id": "...",
  "label": "alice-desk",
  "username": "ts_alice_abc123",
  "password": "sk_live_...",
  "sip": {
    "host": "sip.sautikit.com",
    "port": 5060,
    "transport": "tls",
    "ws_url": "wss://sip.sautikit.com"
  }
}

The password is shown once. If you lose it, rotate it at POST /v1/numbers/{id}/sip-credentials/{credential_id}/rotate.

SIP credentials are bound to a specific number. A device registered with these credentials can only place calls from that number's E.164 identity, and inbound calls to that number can ring the registered device.

If you are building a browser-based softphone, the ws_url is the secure WebSocket endpoint for SIP signalling (SIP over WebSocket, SIP/WSS). Standard WebRTC SIP stacks (JsSIP, SIP.js) connect to wss://sip.sautikit.com using these credentials.

The voice URL is where you control call flow. When a call arrives (inbound) or progresses (outbound via PBX callbacks), Sautikit POSTs the call state to your voice_url. Your endpoint returns a JSON response that tells the PBX what to do next.

Voice action loop

The request body from Sautikit contains the PBX call state:

{
  "callSessionState": "Answered",
  "direction": "inbound",
  "callerNumber": "+254700000001",
  "clientDialedNumber": "+254712345678",
  "sessionId": "HD_a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "callStartTime": "2026-06-27T09:21:11.000Z",
  "isActive": true
}

Sautikit signs the request with X-Sautikit-Signature: t=<unix>,v1=<hex>. Verify it:

signed_input = raw_body_bytes + "." + unix_seconds_string
v1 = hmac_sha256(voice_signing_secret, signed_input).hex()
reject if abs(now - t) > 300
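
The three steps above translate fairly directly into Python's standard library. The following is a minimal sketch, not an official client: the function name verify_signature is hypothetical, and encoding the secret as UTF-8 is an assumption. It uses a constant-time comparison so the check does not leak how many characters matched.

```python
import hashlib
import hmac
import time


def verify_signature(header, raw_body, secret, now=None, tolerance=300):
    """Return True if an X-Sautikit-Signature header matches the raw body.

    header   -- header value, e.g. "t=1700000000,v1=ab12..."
    raw_body -- request body bytes exactly as received (do not re-serialise)
    secret   -- voice_signing_secret (UTF-8 encoding is an assumption)
    """
    parts = {}
    for item in header.split(","):
        if "=" in item:
            key, value = item.split("=", 1)
            parts[key.strip()] = value.strip()
    t, v1 = parts.get("t", ""), parts.get("v1", "")
    if not (t.isascii() and t.isdigit()) or not v1:
        return False
    now = time.time() if now is None else now
    if abs(now - int(t)) > tolerance:  # reject stale or future-dated requests
        return False
    signed_input = raw_body + b"." + t.encode("ascii")
    expected = hmac.new(secret.encode("utf-8"), signed_input, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected.encode("ascii"), v1.encode("utf-8"))
```

Call it with the raw request bytes before any JSON parsing, since re-serialising the body can change whitespace or key order and invalidate the signature.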

Your response tells the PBX what to do. The voice-actions DSL supports these verbs:

Verb	Description
`say`	Synthesise and speak text (TTS)
`play`	Stream an audio file URL
`getDigits`	Collect DTMF input with optional prompt
`dial`	Connect to a number or SIP URI
`conference`	Place the caller into a named conference room
`record`	Record the caller's audio
`redirect`	Hand off call flow to another URL
`reject`	Reject the call with a reason code
`hangup`	End the call immediately

A simple IVR response follows. If you are coming from an XML-based voice API (Africa's Talking, Twilio/TwiML), both forms below work at runtime: return the JSON shown first or the XML shown second, whichever fits your codebase. Sautikit parses and validates the JSON DSL; raw XML is forwarded to the telephony engine byte-for-byte:

{
"actions": [
  { "say": { "text": "Welcome to Acme support. Press 1 for billing, 2 for technical.", "language": "en-US" } },
  { "getDigits": { "numDigits": 1, "timeout": 10, "finishOnKey": "#" } }
]
}

<Response>
<Say language="en-US">Welcome to Acme support. Press 1 for billing, 2 for technical.</Say>
<GetDigits maxDigits="1" timeout="10" finishOnKey="#" />
</Response>

XML nests any prompt <Say> inside <GetDigits> and uses maxDigits where JSON uses numDigits. If you already have an XML-based handler, you can keep returning its raw PBX XML unchanged:

<Response>
  <Say>Connecting you now.</Say>
  <Dial><Number>+254733000000</Number></Dial>
</Response>

The PBX supports a standard XML verb set. JSON is preferred for new integrations because the DSL is typed and validated before reaching the PBX.
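
Putting the callback body and the JSON IVR response together, the core of a voice_url handler might look roughly like this. It is a minimal sketch under stated assumptions: the function names are hypothetical, branching on direction and isActive is an illustrative choice, and the empty hangup object is an assumed shape that this post does not specify.

```python
import json


def ivr_menu():
    """Voice-actions payload mirroring the JSON IVR example in this post."""
    return {
        "actions": [
            {"say": {"text": "Welcome to Acme support. Press 1 for billing, 2 for technical.",
                     "language": "en-US"}},
            {"getDigits": {"numDigits": 1, "timeout": 10, "finishOnKey": "#"}},
        ]
    }


def handle_call_state(state):
    """Map a parsed call-state callback to a voice-actions response dict."""
    if state.get("direction") == "inbound" and state.get("isActive"):
        return ivr_menu()
    # Assumption: hangup takes an empty options object.
    return {"actions": [{"hangup": {}}]}


def render(raw_body):
    """Parse the raw callback bytes and return the JSON response bytes."""
    state = json.loads(raw_body)
    return json.dumps(handle_call_state(state)).encode("utf-8")
```

Keeping handle_call_state free of any web framework makes the call flow easy to unit test; the HTTP layer only needs to verify the signature and pass the raw body to render.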

A response may contain at most 50 actions (MaxActionSteps). Actions are executed sequentially, in the order they appear.
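
A local pre-flight check can catch an oversized or misspelled response before it is returned. The sketch below is illustrative only: check_actions is a hypothetical helper, and the rule that each action is an object with exactly one verb key is inferred from the examples in this post, not taken from a published schema.

```python
MAX_ACTION_STEPS = 50  # limit stated in this post (MaxActionSteps)
KNOWN_VERBS = {"say", "play", "getDigits", "dial", "conference",
               "record", "redirect", "reject", "hangup"}


def check_actions(response):
    """Return a list of problems found in a voice-actions response dict."""
    actions = response.get("actions")
    if not isinstance(actions, list):
        return ["'actions' must be a list"]
    problems = []
    if len(actions) > MAX_ACTION_STEPS:
        problems.append(f"{len(actions)} actions exceeds the limit of {MAX_ACTION_STEPS}")
    for i, action in enumerate(actions):
        # Assumption: each action is an object with exactly one verb key.
        if not isinstance(action, dict) or len(action) != 1:
            problems.append(f"action {i}: expected an object with exactly one verb")
            continue
        verb = next(iter(action))
        if verb not in KNOWN_VERBS:
            problems.append(f"action {i}: unknown verb {verb!r}")
    return problems
```

An empty list means nothing was flagged locally; Sautikit's own validation of the JSON DSL still has the final say.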

Every call, inbound or outbound, generates a call record (calls table). The record is available immediately at GET /v1/calls/{id}.

Call state machine

completed is the only fully successful terminal state; failed, busy, and canceled are the other terminal states.

Key fields on the call record:

Field	Type	Notes
`id`	UUID	Call row ID
`direction`	`inbound` / `outbound`
`local_e164`	string	Your number
`remote_e164`	string	The other party
`status`	string	See lifecycle above
`duration_seconds`	integer	Null until completed
`cost_minor`	integer	KES cents; null until debited
`pbx_call_id`	string	`HD_…` from the PBX
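
Because cost_minor is an integer in KES cents and stays null until the debit lands, display code has to handle both cases. A small sketch, with format_cost as a hypothetical helper name and "pending" as an arbitrary placeholder label:

```python
def format_cost(cost_minor):
    """Render cost_minor (KES cents) for display; None means not yet debited."""
    if cost_minor is None:
        return "pending"
    sign = "-" if cost_minor < 0 else ""
    shillings, cents = divmod(abs(cost_minor), 100)
    return f"{sign}KES {shillings}.{cents:02d}"
```

Integer arithmetic avoids the rounding surprises that come with converting minor units to floats.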

Call events are finer-grained than status: GET /v1/calls/{id}/events returns a timeline of every state transition with occurred_at timestamps. Use this for debugging: a failed call will show what the PBX reported before hanging up.
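
When debugging, it often helps to see how long a call spent between transitions. The sketch below assumes each event in the timeline is a dict with a status key alongside occurred_at; the status key name and the helper names are assumptions, since this post only names the occurred_at field.

```python
from datetime import datetime


def parse_ts(value):
    """Parse an ISO 8601 timestamp; 'Z' is rewritten for older Python versions."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def transition_gaps(events):
    """Yield (from_status, to_status, seconds) for consecutive events.

    Assumption: each event is a dict with 'status' and 'occurred_at' keys.
    """
    ordered = sorted(events, key=lambda e: parse_ts(e["occurred_at"]))
    for prev, cur in zip(ordered, ordered[1:]):
        gap = parse_ts(cur["occurred_at"]) - parse_ts(prev["occurred_at"])
        yield prev["status"], cur["status"], gap.total_seconds()
```

Sorting first means the result does not depend on the order in which the API happens to return events.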

For aggregate stats across all calls in your workspace, use GET /v1/calls/stats. It returns total call count, failed count, total minutes, and total spend for the selected period.

Recordings are optional. To enable recording on an outbound call, add a record field to the dial action in your voice-actions response:

{
"actions": [
  { "dial": { "number": "+254700000001", "record": "record-from-answer" } }
]
}

<Response>
<Dial record="record-from-answer">
  <Number>+254700000001</Number>
</Dial>
</Response>

For inbound calls, use the record verb before or after a dial:

{
"actions": [
  {
    "record": {
      "maxLength": 300,
      "finishOnKey": "#",
      "action": "https://your-server.example.com/recording-done"
    }
  }
]
}

<Response>
<Record maxLength="300" finishOnKey="#" action="https://your-server.example.com/recording-done" method="POST" />
</Response>

After the call ends, the recording goes through a two-stage delivery:

Recording lifecycle

To fetch the recording for a call:

curl -s "https://api.sautikit.com/v1/calls/{call_id}/recording" \
  -H "Authorization: Bearer $SAUTIKIT_API_KEY"

If the recording is ready:

{
  "call_id": "9d2b1f53-8c0e-4f1d-9a6b-5d3a8c47e9f0",
  "status": "ready",
  "url": "https://recordings.sautikit.com/9d2b1f53.wav?sig=abc123",
  "mime": "audio/wav",
  "duration_ms": 105000,
  "expires_at": "2026-06-27T10:36:11Z"
}

If the recording is still being captured, the response is HTTP 202 with status: pending, and retry_after_seconds tells you when to poll again.

The presigned URL is valid for 15 minutes. Re-fetch to get a fresh one. Recording retention follows your storage tier: the free 5 GB tier includes the default retention window. Assets pruned by retention policy return status: expired and HTTP 410.
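
The three outcomes described here (ready, pending, expired) can be folded into one decision function that a polling loop calls after each request. This is a sketch under assumptions: next_step and its return labels are hypothetical, and the 5-second fallback applies only if retry_after_seconds is missing.

```python
from datetime import datetime, timezone


def next_step(http_status, body, now=None):
    """Decide what to do with a GET /v1/calls/{id}/recording response.

    Returns ("download", url), ("wait", seconds), ("refetch", None)
    or ("gone", None).
    """
    now = now or datetime.now(timezone.utc)
    if http_status == 410 or body.get("status") == "expired":
        return ("gone", None)  # pruned by retention policy
    if http_status == 202 or body.get("status") == "pending":
        # Assumption: fall back to 5 seconds if retry_after_seconds is absent.
        return ("wait", body.get("retry_after_seconds", 5))
    expires_at = datetime.fromisoformat(body["expires_at"].replace("Z", "+00:00"))
    if expires_at <= now:
        return ("refetch", None)  # presigned URL has lapsed; request a fresh one
    return ("download", body["url"])
```

Checking expires_at before downloading avoids handing a stale presigned URL to a worker that picks the job up late.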

Inbound call flow

The wallet deduction happens in the worker after the call.completed event. The cost_minor field on the call record is populated once the debit is confirmed. There is no separate billing delay; the debit typically lands within seconds of the call ending.

Create a Sautikit workspace and claim a phone number.
Top up over M-Pesa: KES billing, no card.
Point the number's voice_url at your voice-actions endpoint and place a test call.
