Your users are already in the browser: the support agent on a dashboard, the customer clicking "call us" on a pricing page. WebRTC lets you place and receive real phone calls from that same tab, no desk phone or dialer app required.
TL;DR
Sautikit exposes a WebRTC/SIP gateway: the browser SDK opens a WebSocket to it to set up the call, then exchanges G.711/PCMU audio with it in both directions.
Auth is a short-lived WebRTC client token (a compact JWT) minted on your server and handed to the browser; never ship a long-lived API key to the client.
Browser calling fits support desks, click-to-call, and agent softphones; for a full multi-channel desk (SMS/WhatsApp/tickets), bridge to Helloduty.
A browser cannot speak SIP or the PSTN directly. WebRTC bridges that gap: the browser negotiates a media session (SRTP audio) with a gateway, and the gateway translates to and from the telephony network on the other side.
With Sautikit, that role is played by the WebRTC/SIP gateway. The browser SDK opens a secure WebSocket to it, authenticates with a token, and negotiates a G.711/PCMU audio stream. From that point the browser behaves like a softphone: it can originate an outbound call to any E.164 number (for example +254700000000), or register to receive inbound calls routed to your workspace.
Two things make this practical instead of painful:
You never expose your API key in the browser. The client authenticates with a scoped, short-lived token instead.
The gateway handles codecs and NAT traversal. Your app deals with "start call / end call / mute", not RTP.
The golden rule of browser calling: the browser never holds your SAUTIKIT_API_KEY. If it did, anyone with devtools open could drain your prepaid wallet.
Instead, you run a tiny endpoint on your own backend. It authenticates the logged-in user (your session, your rules), then asks Sautikit to mint a short-lived WebRTC client token and returns just that token to the browser. The token is a compact JWT scoped to browser calling that expires in minutes, so a leaked token can do only limited damage, and only for a short time.
Here is the server side as an Express handler. Treat the token request as "your server asks Sautikit for a WebRTC token and returns it". See the guide for the exact request body.
    import express from "express";

    const app = express();

    // requireLoggedInUser is your own auth middleware (not shown here). It
    // decides who is allowed a softphone and populates req.user.
    app.post("/webrtc-token", requireLoggedInUser, async (req, res) => {
      try {
        const r = await fetch("https://api.sautikit.com/v1/webrtc/tokens", {
          method: "POST",
          headers: {
            Authorization: `Bearer ${process.env.SAUTIKIT_API_KEY}`,
            "Content-Type": "application/json",
          },
          // Identify which softphone identity this browser acts as.
          // See /developers/guides/browser-calling-with-webrtc for the
          // authoritative request shape and claims.
          body: JSON.stringify({ identity: req.user.id }),
        });

        if (!r.ok) {
          return res.status(502).json({ error: "token_mint_failed" });
        }

        const { token, expires_at } = await r.json();

        // Hand the browser only the short-lived token, never the API key.
        res.json({ token, expires_at });
      } catch (err) {
        // Network failure or unreadable response: report it as a failed mint
        // instead of leaving the rejected promise unhandled.
        res.status(502).json({ error: "token_mint_failed" });
      }
    });

    app.listen(3000);
On the client, fetch a token from your endpoint and hand it to the Sautikit browser SDK. The SDK opens the WebSocket to the gateway, negotiates audio, and exposes a call object you drive with plain method calls. The exact SDK import and method names are in the guide; the flow looks like this:
    // 1. Get a short-lived token from YOUR server.
    const { token } = await fetch("/webrtc-token", { method: "POST" }).then(
      (r) => r.json(),
    );

    // 2. Initialise the softphone with that token.
    const phone = await Sautikit.createSoftphone({ token });

    // 3. Outbound: click-to-call a customer.
    document.querySelector("#call").addEventListener("click", async () => {
      const call = await phone.connect({ to: "+254700000000" });
      call.on("accepted", () => console.log("connected"));
      call.on("disconnected", () => console.log("call ended"));
      // { once: true } stops hang-up handlers piling up across repeated calls.
      document
        .querySelector("#hangup")
        .addEventListener("click", () => call.disconnect(), { once: true });
    });

    // 4. Inbound: answer calls routed to this identity.
    phone.on("incoming", (call) => {
      if (confirm(`Incoming call from ${call.from}. Answer?`)) call.accept();
      else call.reject();
    });
Because the token expires, long-lived sessions should refresh it before it lapses: request a fresh token shortly before expires_at and re-initialise the softphone. Plan for the microphone permission prompt too: getUserMedia only works on a secure (HTTPS) origin, and starting the call from a user gesture such as a click keeps the prompt tied to an action the user expects.
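The refresh timing can be sketched as a small scheduler. This is a minimal sketch, not the SDK's own mechanism. It assumes expires_at is an ISO 8601 timestamp string (the guide is the authority on its real format). The names refreshDelayMs and scheduleTokenRefresh, the 60-second safety margin, and the 5-second floor are all hypothetical choices made for illustration.

```javascript
// Hypothetical helper: how long to wait before fetching a fresh token.
// Assumes expires_at is an ISO 8601 string; check the guide for the real format.
function refreshDelayMs(expiresAt, nowMs = Date.now(), marginMs = 60_000) {
  const expiryMs = Date.parse(expiresAt);
  if (Number.isNaN(expiryMs)) {
    throw new Error(`Unparseable expires_at: ${expiresAt}`);
  }
  // Refresh marginMs before expiry, but never return a negative delay.
  return Math.max(0, expiryMs - nowMs - marginMs);
}

// Hypothetical wiring: fetchToken() calls your /webrtc-token endpoint and
// resolves to { token, expires_at }; onToken(token) re-initialises the softphone.
function scheduleTokenRefresh(fetchToken, onToken) {
  let timer = null;
  let stopped = false;

  async function refresh() {
    const { token, expires_at } = await fetchToken();
    if (stopped) return;
    onToken(token);
    // The 5-second floor avoids a tight request loop if tokens are very short-lived.
    const delay = Math.max(5_000, refreshDelayMs(expires_at));
    timer = setTimeout(() => refresh().catch(console.error), delay);
  }

  refresh().catch(console.error);

  // Call the returned function on logout or page teardown.
  return () => {
    stopped = true;
    clearTimeout(timer);
  };
}
```

Note that this sketch only logs a failed refresh and then stops; a production version would probably retry with backoff and surface the error to the agent.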
Browser calling shines when the human is already looking at a screen:
Support desks. Agents take and place calls from your dashboard: no separate hardware, no per-seat phone system. Call state lives next to the ticket.
Click-to-call. A "call me" or "call sales" button on your app or marketing site connects a visitor to an agent without either party dialing.
Internal tools. Ops and collections teams calling from an admin panel, with the call bound to the record they're viewing.
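In click-to-call and admin-panel scenarios the number usually comes from a record or a form field, so it is worth checking that it is in E.164 form before handing it to the softphone. The following is a minimal sketch; isE164 and toE164OrNull are hypothetical helper names, not part of the Sautikit SDK.

```javascript
// E.164: a leading "+", then at most 15 digits, the first of which is not 0.
const E164_PATTERN = /^\+[1-9]\d{1,14}$/;

function isE164(number) {
  return typeof number === "string" && E164_PATTERN.test(number);
}

// Hypothetical helper: strip common visual separators, then validate.
// It does not guess a country code, so a local number such as
// "0700 000 000" is rejected instead of silently rewritten.
function toE164OrNull(input) {
  const compact = String(input).replace(/[\s().-]/g, "");
  return isE164(compact) ? compact : null;
}
```

A click handler could call toE164OrNull on the stored number and show a validation message when it returns null, instead of attempting the call.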
When you'd rather not put a softphone in the browser at all (say you just want to trigger a server-orchestrated call), use the Calls API (POST /v1/calls) and control the call with voice actions instead. Browser calling and server-side calling share the same wallet and number pool, so you can mix both.
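For the server-orchestrated route, the request to POST /v1/calls might be assembled roughly as follows. This sketch assumes the Calls API lives on the same base URL and uses the same Bearer authentication as the token request shown earlier. The request body is passed through untouched because its fields are defined by the Calls API reference, not by this post. buildCreateCallRequest and createCall are hypothetical helper names.

```javascript
// Assumption: same base URL and Bearer auth as the WebRTC token request.
const SAUTIKIT_API_BASE = "https://api.sautikit.com";

// Build (but do not send) the POST /v1/calls request. The body is opaque
// here; consult the Calls API reference for its actual fields.
function buildCreateCallRequest(apiKey, body) {
  if (!apiKey) throw new Error("SAUTIKIT_API_KEY is not set");
  return {
    url: `${SAUTIKIT_API_BASE}/v1/calls`,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(body),
    },
  };
}

// Server-side only: the API key must never reach the browser.
async function createCall(apiKey, body) {
  const { url, init } = buildCreateCallRequest(apiKey, body);
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`Call request failed with HTTP ${res.status}`);
  return res.json();
}
```

A backend job could then run something like createCall(process.env.SAUTIKIT_API_KEY, { to: "+254700000000" }), where the to field is only a placeholder for whatever the Calls API actually expects.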
If your agents need voice plus SMS, WhatsApp, and a shared ticket queue in one pane, that's a full agent desk. Helloduty layers those channels on top of the same Sautikit voice stack.
Do I need my own SIP server to make browser calls?
No. Sautikit runs the WebRTC/SIP gateway. Your browser talks WebRTC to the gateway and the gateway handles the telephony side; you only provide a token endpoint on your backend.
Why a short-lived token instead of my API key?
Anything shipped to the browser is public. A scoped JWT that expires in minutes limits damage if it leaks, while your SAUTIKIT_API_KEY, which can drain your prepaid wallet, stays server-side.
What audio codec does the gateway use?
The WebRTC/SIP gateway negotiates G.711/PCMU audio between the browser and the telephony network. The browser SDK handles the negotiation for you.
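If you are sizing agent network links, the G.711 numbers are easy to work out: the codec carries 8,000 one-byte samples per second, which is 64 kbit/s of audio. The sketch below adds RTP, UDP, and IPv4 headers on top. The 20 ms packet interval is a common default and an assumption here, not a documented gateway setting, and the figures exclude the SRTP authentication tag and link-layer framing.

```javascript
// G.711 (PCMU): 8000 samples per second, 8 bits (1 byte) per sample.
const SAMPLE_RATE_HZ = 8000;
const BYTES_PER_SAMPLE = 1;

// Header bytes for RTP (12) over UDP (8) over IPv4 (20).
// Excludes the SRTP auth tag and any link-layer framing.
const RTP_UDP_IPV4_HEADER_BYTES = 12 + 8 + 20;

// packetMs is assumed to be 20 ms unless the gateway negotiates otherwise.
function g711Bandwidth(packetMs = 20) {
  const payloadBytes = (SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * packetMs) / 1000;
  const packetsPerSecond = 1000 / packetMs;
  return {
    payloadBytes,
    codecKbps: (payloadBytes * 8 * packetsPerSecond) / 1000,
    ipKbps:
      ((payloadBytes + RTP_UDP_IPV4_HEADER_BYTES) * 8 * packetsPerSecond) / 1000,
  };
}

console.log(g711Bandwidth());
// → { payloadBytes: 160, codecKbps: 64, ipKbps: 80 }
```

Under these assumptions, budget roughly 80 kbit/s in each direction per concurrent call, plus some headroom for encryption overhead and signaling.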
Can I receive inbound calls in the browser?
Yes. Register the softphone with an identity, route a claimed number's inbound calls to that identity, and handle the incoming event. Inbound minutes are free (KES 0).
Is there a per-call charge?
Outbound browser calls bill from your prepaid wallet at KES 3.00/min (billed per second from the moment the call connects). Inbound is free. See /pricing for the source of truth.
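Per-second billing at KES 3.00/min works out to KES 0.05 per connected second, so a 90-second outbound call costs KES 4.50. Here is a small estimator you could use for dashboards or wallet alerts. It is a sketch under one assumption: partial seconds round up. The actual rounding rule, like the rate itself, should be confirmed on /pricing. outboundCostKes is a hypothetical helper name.

```javascript
// Rate from the answer above: KES 3.00 per minute, billed per second.
// Work in cents to avoid floating-point drift: 300 cents / 60 s = 5 cents/s.
const OUTBOUND_CENTS_PER_SECOND = 300 / 60;

// Assumption: partial seconds round up; check /pricing for the actual rule.
// Returns the estimated cost as a string with two decimals, e.g. "4.50".
function outboundCostKes(connectedSeconds) {
  if (!Number.isFinite(connectedSeconds) || connectedSeconds < 0) {
    throw new RangeError("connectedSeconds must be a non-negative number");
  }
  const cents = Math.ceil(connectedSeconds) * OUTBOUND_CENTS_PER_SECOND;
  const whole = Math.floor(cents / 100);
  const fraction = String(cents % 100).padStart(2, "0");
  return `${whole}.${fraction}`;
}

console.log(outboundCostKes(90)); // → 4.50
```

Inbound calls need no estimator, since they are free.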