Never miss an event: how Sautikit webhook retry semantics survive failures

Sautikit delivers webhooks with an at-least-once guarantee, retrying failed deliveries on a roughly exponential backoff schedule with full jitter: eight delivery attempts in total (the initial delivery plus seven retries) before an undelivered event is dead-lettered. A missed delivery means a missed business event: a call record never written, a confirmation never sent. This post explains the exact retry schedule, what your endpoint must return to stop retries, and how to build an idempotent handler using the event_id ULID field.

Sautikit's webhook delivery system provides at-least-once delivery. As long as your endpoint acknowledges an event before the retries run out, it receives that event at least once. Under failure conditions the same event may be delivered more than once. Examples include a network error, an infrastructure failover, or your endpoint returning a temporary error after it has already done part of the work.

Exactly-once delivery would require a distributed two-phase commit across Sautikit's infrastructure and your system, which imposes unacceptable complexity and latency on the hot path. The correct solution for event consumers is not to demand exactly-once delivery, but to design idempotent handlers: handlers that produce the same outcome whether they receive an event once or ten times.

This is a well-understood pattern. The key insight is that your webhook handler must not use arrival count as state: how many times an event arrives says nothing about how many times it happened. Treat each delivery as a proposal; your system decides whether to act on it based on whether it has already processed that specific event_id.

The retry schedule across all eight delivery attempts is:

Attempt	Base delay before attempt (upper bound)	Actual delay with full jitter
1 (initial)	immediate	n/a
2	30 seconds	0–30s
3	2 minutes	0–120s
4	10 minutes	0–600s
5	30 minutes	0–1 800s
6	2 hours	0–7 200s
7	6 hours	0–21 600s
8	24 hours	0–86 400s
Dead-letter	7 days after attempt 8	n/a

Sautikit uses full jitter (each retry interval is a uniformly random value in [0, base_interval]) rather than equal jitter (which would be [base_interval/2, base_interval]). The distinction matters at scale: equal jitter spreads retries over the top half of the interval window, which still causes thundering-herd spikes when thousands of webhooks fail simultaneously. Full jitter distributes retries across the entire window, halving the expected peak load on your ingress during a shared failure event.

Here is a Go implementation of both strategies so you can see the difference:

package jitter
 
import (
	"math/rand"
	"time"
)
 
// FullJitter returns a uniformly random duration in [0, base).
// Preferred: evenly distributes retries across the full window.
// base must be positive: rand.Int63n panics when n <= 0.
func FullJitter(base time.Duration) time.Duration {
	return time.Duration(rand.Int63n(int64(base)))
}

// EqualJitter returns a random duration in [base/2, base).
// Retries cluster in the top half of the window; higher peak load.
func EqualJitter(base time.Duration) time.Duration {
	half := base / 2
	return half + time.Duration(rand.Int63n(int64(half)))
}

// RetryDelay returns the full-jitter delay before the nth retry.
// Retries are counted from 1 here, so retry n is delivery attempt n+1 in the
// schedule table: retry 1 (attempt 2) uses the 30-second base and retry 7
// (attempt 8) the 24-hour base. Out-of-range values return 0.
func RetryDelay(retry int) time.Duration {
	bases := []time.Duration{
		30 * time.Second,
		2 * time.Minute,
		10 * time.Minute,
		30 * time.Minute,
		2 * time.Hour,
		6 * time.Hour,
		24 * time.Hour,
	}
	if retry < 1 || retry > len(bases) {
		return 0
	}
	return FullJitter(bases[retry-1])
}

The decision tree is:

Response received?
  YES → Status code in 2xx (200–299)?
    YES → ✓ Delivery confirmed. No retry.
    NO  → 3xx redirect?
      YES → ✗ Not followed. Treated as failure. Retry.
      NO  → 4xx or 5xx?
        YES → ✗ Failure. Retry.
  NO (timeout after 30 seconds) → ✗ Failure. Retry.

The key behaviour to note: 3xx redirects are intentionally not followed. Suppose Sautikit did follow redirects. Anyone who could make your endpoint answer with a redirect, for example through an open-redirect bug in your application, could then steer webhook deliveries to an attacker-controlled server. The HMAC would still prevent body tampering, but the redirect itself would exfiltrate the signed event. By refusing to follow redirects, Sautikit delivers only to the URL you registered, so a redirect can never silently change the delivery target. This does not protect you if an attacker controls your DNS, because the registered hostname itself would then resolve to their server.

Return any 2xx status (200 OK is conventional) within 30 seconds to acknowledge delivery. The response body is discarded; Sautikit only looks at the status code.

Every webhook payload includes an event_id field. This is a ULIDv2, a 26-character sortable identifier that is monotonically increasing within the same millisecond. You do not need to understand the ULID spec to use it; you only need to know that it is unique per event, and you can use it as a deduplication key.
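You don't need to decode the ULID to deduplicate on it. A cheap format check can, however, reject malformed event_id values before they reach your table. It also shows what "sortable" means: the first 10 characters encode a millisecond timestamp. The sketch follows the public ULID specification (Crockford Base32, 48-bit timestamp) and decodes strictly. ulidTime is an illustrative name, not a Sautikit SDK function.

```go
package main

import (
	"errors"
	"strings"
	"time"
)

// crockford is the ULID spec's Base32 alphabet (no I, L, O or U).
const crockford = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"

// ulidTime validates a ULID string and returns the millisecond Unix
// timestamp encoded in its first 10 characters.
func ulidTime(id string) (time.Time, error) {
	if len(id) != 26 {
		return time.Time{}, errors.New("ulid: want 26 characters")
	}
	id = strings.ToUpper(id) // ULIDs are case-insensitive
	var ms uint64
	for i, c := range id {
		v := strings.IndexRune(crockford, c)
		if v < 0 {
			return time.Time{}, errors.New("ulid: invalid character")
		}
		if i == 0 && v > 7 {
			// 10 characters hold 50 bits but the timestamp is 48, so the
			// leading character can be at most 7.
			return time.Time{}, errors.New("ulid: timestamp overflow")
		}
		if i < 10 {
			ms = ms*32 + uint64(v)
		}
	}
	return time.UnixMilli(int64(ms)).UTC(), nil
}
```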

The simplest correct implementation is a database unique constraint:

-- PostgreSQL: processed_webhook_events table
CREATE TABLE processed_webhook_events (
  event_id    TEXT        PRIMARY KEY,
  event_type  TEXT        NOT NULL,
  call_id     TEXT,
  processed_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

In your handler:

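import express from "express";
import { Pool } from "pg";

const app = express();
app.use(express.json());

const pool = new Pool({ connectionString: process.env.DATABASE_URL });

app.post("/webhooks/sautikit", async (req, res) => {
  // (signature verification omitted; see the signature verification post)
  const event = req.body;
  const eventId = event.event_id;
  const eventType = event.event_type;
  const callId = event.call_id ?? null;

  // Record the event_id and do the work in ONE transaction. If processing
  // throws, ROLLBACK also removes the dedup row, so the retry is processed.
  let client;
  try {
    client = await pool.connect();
    await client.query("BEGIN");
    const result = await client.query(
      `INSERT INTO processed_webhook_events (event_id, event_type, call_id)
       VALUES ($1, $2, $3)
       ON CONFLICT (event_id) DO NOTHING`,
      [eventId, eventType, callId],
    );

    if (result.rowCount === 0) {
      // This event_id was already processed. Return 200 so Sautikit
      // stops retrying. Do NOT return 4xx, which would trigger another retry.
      await client.query("ROLLBACK");
      return res.sendStatus(200);
    }

    // Your own processing code (not shown). Pass it `client` so its database
    // writes commit or roll back together with the dedup row.
    if (eventType === "call.completed") {
      await handleCallCompleted(client, event);
    }

    await client.query("COMMIT");
    return res.sendStatus(200);
  } catch (err) {
    await client?.query("ROLLBACK").catch(() => {});
    console.error("webhook processing failed", eventId, err);
    // Any non-2xx response tells Sautikit to retry this delivery later.
    return res.sendStatus(500);
  } finally {
    client?.release();
  }
});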

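INSERT ... ON CONFLICT DO NOTHING gives you idempotency without a distributed lock, provided the insert and the processing share one transaction. PostgreSQL's unique check lets only one delivery of a given event_id commit. A concurrent duplicate waits for the first transaction to finish. If that transaction committed, the duplicate finds the row and does nothing; if it rolled back, the duplicate inserts the row itself. Suppose instead you recorded the event_id and processed the event in separate steps. A failure between the two would make the retry look like a duplicate, and the event would be silently dropped. No extra index is needed: the PRIMARY KEY already gives event_id a unique B-tree index with O(log n) lookups. That holds for any key type; the time ordering of ULIDs only means that new keys land near the end of the index.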

The critical detail in the handler above: always return 200 for duplicate events, never 409 or any other 4xx. Sautikit treats every non-2xx response as a failed delivery and schedules another retry. A 4xx for a duplicate therefore keeps the event bouncing until the retries are exhausted and it lands in dead-letter state, even though you processed it successfully.
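The semantics of the transactional handler, including the failure case, can be modelled without a database. The following is a minimal Go sketch under stated assumptions. DedupStore, Handle, and ErrRetryLater are illustrative names, and the in-memory maps stand in for processed_webhook_events, so the sketch only works within a single process. The bare integers are the HTTP statuses your endpoint would send.

```go
package main

import (
	"errors"
	"sync"
)

// ErrRetryLater is an example error your processing code might return for a
// transient failure such as a database outage.
var ErrRetryLater = errors.New("transient failure; retry later")

// DedupStore is an in-memory stand-in for processed_webhook_events. An
// event_id is recorded only if processing succeeds, just as a ROLLBACK
// discards the INSERT in the PostgreSQL version.
type DedupStore struct {
	mu       sync.Mutex
	done     map[string]bool // committed event_ids
	inFlight map[string]bool // event_ids whose processing has not finished
}

func NewDedupStore() *DedupStore {
	return &DedupStore{done: map[string]bool{}, inFlight: map[string]bool{}}
}

// Handle processes one delivery and returns the HTTP status to send back.
func (s *DedupStore) Handle(eventID string, process func() error) int {
	s.mu.Lock()
	if s.done[eventID] {
		s.mu.Unlock()
		return 200 // duplicate of a processed event: acknowledge, never 4xx
	}
	if s.inFlight[eventID] {
		s.mu.Unlock()
		// PostgreSQL would make this INSERT wait for the first transaction;
		// this sketch answers 503 instead so Sautikit retries later.
		return 503
	}
	s.inFlight[eventID] = true
	s.mu.Unlock()

	err := process() // runs outside the lock so other events are not blocked

	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.inFlight, eventID)
	if err != nil {
		return 500 // nothing recorded, so the retry will process the event
	}
	s.done[eventID] = true
	return 200
}
```

A failed attempt leaves no trace, so the next delivery of the same event_id is processed normally. Only success makes later deliveries no-ops.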

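If all eight attempts fail, Sautikit moves the event to a dead-letter state; per the schedule above, that happens 7 days after attempt 8. The attempts themselves are front-loaded. The seven base delays sum to 117 750 seconds, so even with the longest possible jitter, attempt 8 arrives within about 33 hours of the first delivery. Once dead-lettered, the event will not be automatically retried. Dead-lettered events are visible in the Sautikit dashboard under Webhooks → Failed Events.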

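You should build an alerting system that detects when events enter dead-letter state, or better, notices the failures that lead there. Two practical signals help here. The first is a low-balance alert on your wallet, so you notice if your service stopped processing top-up events. The second is monitoring your processed_webhook_events table for an unexpected drop in rows per time window. ULIDs are not sequential, so you cannot spot a missing event by looking for a gap between IDs; watch the volume instead.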

For truly critical events (call completions on voice OTP flows, for example), consider a secondary polling loop that checks GET /v1/calls?status=completed&created_after=<ts> as a reconciliation mechanism. If your webhook handler is down for longer than the retry window, polling is the safety net.

Every delivery attempt includes the X-Sautikit-Delivery-Attempt header with the attempt number (1 for first delivery, 2 for first retry, etc.). Log this on every request:

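import (
	"log/slog"
	"net/http"
	"strconv"
)

func handleWebhook(w http.ResponseWriter, r *http.Request) {
	// 1 = first delivery, 2 = first retry, and so on; 0 if missing or malformed.
	attempt, _ := strconv.Atoi(r.Header.Get("X-Sautikit-Delivery-Attempt"))
	logger := slog.With(
		"attempt", attempt,
		"event_type", extractEventType(r), // your own helper (not shown); must not consume r.Body
		"path", r.URL.Path,
	)

	if attempt > 1 {
		logger.Warn("webhook retry received; check for previous processing failures")
	}

	// ... rest of handler
}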

If you see attempt numbers above 1 in production, a previous delivery of that event failed. Common causes: the handler threw an unhandled exception and returned 500, the handler timed out (responses must arrive within 30 seconds), or the database was temporarily unavailable. Fix the root cause before the retries are exhausted.

Create a Sautikit workspace and claim a phone number.
Top up over M-Pesa: KES billing, no card.
Register a webhook endpoint and log the X-Sautikit-Delivery-Attempt header to catch retries early.
