Best practices for managing webhook event streams

Build reliable, load-tested webhook consumers that are resilient to transient failures and scale with your application.

LiveKit webhooks let your backend receive near-realtime notifications when rooms, participants, and tracks change (and for ingress/egress lifecycle events). They're ideal for triggering application logic, maintaining a "room state" model outside of LiveKit, auditing, billing, or kicking off downstream jobs.

This guide focuses on building a reliable, load-tested webhook consumer that is resilient to transient failures.

With LiveKit Cloud, webhooks are configured in your project's dashboard under Settings -> Webhooks.

For Egress, additional webhooks can also be configured within individual Egress requests.

Understand what LiveKit sends

Before building your consumer, it helps to know how events are formatted and what to expect.

Payload format and headers

  • LiveKit sends HTTP POST requests with a JSON-encoded WebhookEvent in the body.
  • Requests use Content-Type: application/webhook+json (make sure your framework accepts it).
  • Requests include an Authorization header containing a signed JWT with a SHA-256 hash of the payload, used to validate authenticity/integrity.
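
If you use a LiveKit server SDK, its webhook receiver handles both checks for you. A minimal sketch with the Node SDK, assuming a recent version where receive() is async and credentials live in LIVEKIT_API_KEY / LIVEKIT_API_SECRET environment variables:

```ts
import { WebhookReceiver } from 'livekit-server-sdk';

// The API key/secret must belong to the project that signs the webhooks.
const receiver = new WebhookReceiver(
  process.env.LIVEKIT_API_KEY!,
  process.env.LIVEKIT_API_SECRET!,
);

// `rawBody` is the unparsed request body; `authHeader` is the Authorization
// header. receive() verifies the JWT signature and the SHA-256 payload hash,
// then returns the parsed event (it throws if verification fails).
async function verifyWebhook(rawBody: string, authHeader?: string) {
  return receiver.receive(rawBody, authHeader);
}
```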

Event types you'll see

Common webhook events include:

  • room_started, room_finished
  • participant_joined, participant_left, participant_connection_aborted
  • track_published, track_unpublished
  • egress_started, egress_updated, egress_ended
  • ingress_started, ingress_ended

Each event includes an id (UUID) and a createdAt timestamp (UNIX seconds).
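
For orientation, here is an illustrative TypeScript subset of the envelope. Field presence varies by event type (see the note on partially hydrated objects below), and the SDK's WebhookEvent type is the authoritative definition:

```ts
// Illustrative subset only, not the full schema.
interface WebhookEventSubset {
  event: string;     // e.g. 'room_started', 'participant_left'
  id: string;        // UUID, unique per event: your natural dedupe key
  createdAt: number; // UNIX seconds
  room?: { sid: string; name: string };            // on room-scoped events
  participant?: { sid: string; identity: string }; // on participant-scoped events
}
```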

Delivery, retries, and what that implies for your design

LiveKit webhooks are push-based HTTP requests, so there are no guarantees of delivery.

LiveKit mitigates transient failures by retrying delivery multiple times and preserves ordering when events queue up—newer events won't be delivered ahead of older ones until the older ones are delivered or abandoned.

This has two core implications:

  1. Assume at-least-once delivery. You may receive duplicates, so your consumer must be idempotent.
  2. Don't block the webhook request. If your endpoint is slow, you'll increase retries, amplify your load, and fall behind.

The golden rule: ACK fast, process async

Keeping the request path fast and moving work off the critical path is the key to reliable webhook handling.

What not to do in the webhook request path

Avoid doing these synchronously before returning 2xx:

  • Writing to your primary database (especially if it enforces constraints or triggers heavy indexes)
  • Calling other services (payment, CRM, provisioning, LLM jobs, etc.)
  • Running "full validation" that could fail on unexpected new fields
  • Performing expensive enrichment (e.g., looking up customer/account metadata)

If any of that work fails and you return a non-2xx status code, LiveKit will retry the webhook, potentially causing duplicates—and if you've already partially processed the event, you can end up with inconsistent downstream state.

What to do instead

In the request path:

  1. Verify the webhook signature/authenticity.
  2. Run minimal schema sanity checks (e.g., "has event and id").
  3. Enqueue the raw payload (or store it durably).
  4. Return 2xx immediately.

After the response:

  • A worker processes the queue, performing validation, persistence, side effects, and fan-out.

The SDK webhook receivers validate against the raw POST body (not a parsed JSON object). In Express, for example, you must use express.raw({ type: 'application/webhook+json' }) so signature validation works.
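
Putting the request path together, here is a minimal Express sketch. The enqueue() function is a hypothetical stand-in for your queue producer:

```ts
import express from 'express';
import { WebhookReceiver } from 'livekit-server-sdk';

const receiver = new WebhookReceiver(
  process.env.LIVEKIT_API_KEY!,
  process.env.LIVEKIT_API_SECRET!,
);

// Stand-in for your queue producer (SQS, Kafka, Redis Streams, ...).
async function enqueue(msg: { payload: string; receivedAt: number }): Promise<void> {
  // push to your queue here
}

const app = express();

app.post(
  '/webhooks/livekit',
  // Raw body is required: the signature covers the exact bytes LiveKit sent.
  express.raw({ type: 'application/webhook+json' }),
  async (req, res) => {
    const body = req.body.toString('utf8');

    let event;
    try {
      event = await receiver.receive(body, req.get('Authorization'));
    } catch {
      res.status(401).end(); // authenticity check failed
      return;
    }

    // Minimal sanity check only; full validation happens in the worker.
    if (!event.event || !event.id) {
      res.status(400).end();
      return;
    }

    await enqueue({ payload: body, receivedAt: Date.now() });
    res.status(200).end(); // ACK fast; everything else is async
  },
);

app.listen(3000);
```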

Idempotency and deduping

LiveKit includes a unique ID for each webhook event. Use it as your primary dedupe key:

  • Maintain a "seen events" store (a database table with a unique constraint, etc.).
  • If an event id has already been processed, treat the delivery as a no-op and still return 2xx.

This protects you from:

  • Retries due to transient network errors
  • Repeated deliveries caused by your own slowdowns
  • Worker restarts that replay messages from a queue
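
A minimal sketch of that dedupe step, assuming Postgres via the pg library and a webhook_events table keyed by event id:

```ts
import { Pool } from 'pg';

const db = new Pool(); // connection settings come from PG* environment variables

// Assumes: CREATE TABLE webhook_events (id TEXT PRIMARY KEY, processed_at TIMESTAMPTZ DEFAULT now());
async function processOnce(eventId: string, handler: () => Promise<void>): Promise<void> {
  const inserted = await db.query(
    'INSERT INTO webhook_events (id) VALUES ($1) ON CONFLICT (id) DO NOTHING',
    [eventId],
  );
  if (inserted.rowCount === 0) {
    return; // duplicate delivery: treat as a no-op
  }
  // In production, run the insert and the handler in one transaction so a
  // failed handler doesn't leave the event marked as "seen" but unprocessed.
  await handler();
}
```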

Store raw first; be flexible about fields

LiveKit may expand event payloads over time (new event types, new fields). Your consumer should:

  • Parse unknown fields safely
  • Prefer storing the raw JSON (or a minimal normalized subset plus raw)
  • Version your internal schema so you can reprocess historical events if needed

Even within existing events, some sections are intentionally partial. For example, for track publish/unpublish events, the docs note that only sid, identity, and name are sent within the Room and Participant objects.

So downstream consumers should not assume every object is "fully hydrated."
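
One way to honor both points is to persist the raw body alongside a small normalized subset. A sketch, with an illustrative StoredEvent shape and schemaVersion field:

```ts
// Keep the raw payload so historical events can be reprocessed as the
// schema evolves; normalize only the fields you rely on today.
interface StoredEvent {
  id: string;
  event: string;
  createdAt: number;
  schemaVersion: number; // version of *your* normalization, for reprocessing
  raw: string;           // the exact payload LiveKit sent
}

function normalize(rawBody: string): StoredEvent {
  const parsed = JSON.parse(rawBody) as Record<string, unknown>;
  return {
    id: String(parsed.id),
    event: String(parsed.event),
    createdAt: Number(parsed.createdAt),
    schemaVersion: 1,
    raw: rawBody, // unknown or new fields survive here even if ignored above
  };
}
```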

Monitoring and operational guardrails

At minimum, monitor:

  • Request rate, 2xx vs non-2xx
  • p95/p99 latency of the webhook endpoint
  • Queue depth/consumer lag
  • Worker error rate and DLQ (dead-letter queue) growth
  • Dedup hit rate (spikes often indicate retries/backpressure)
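
As one way to instrument these, a sketch using the prom-client library for Node (metric names are illustrative):

```ts
import client from 'prom-client';

const deliveries = new client.Counter({
  name: 'webhook_deliveries_total',
  help: 'Webhook deliveries by HTTP status class',
  labelNames: ['status'],
});

const latency = new client.Histogram({
  name: 'webhook_request_seconds',
  help: 'Webhook endpoint latency',
  buckets: [0.01, 0.05, 0.1, 0.25, 0.5, 1],
});

const dedupHits = new client.Counter({
  name: 'webhook_dedup_hits_total',
  help: 'Events skipped because their id was already processed',
});

// In the endpoint handler:
//   const done = latency.startTimer();
//   ...handle the request...
//   done();
//   deliveries.inc({ status: '2xx' });
// In the worker, call dedupHits.inc() whenever the dedupe check short-circuits.
```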

Alert on patterns like:

  • Sustained non-2xx responses
  • Rising latency combined with rising retries (often correlated with queue depth)
  • Processing lag exceeding your business tolerance (e.g., billing/automation delayed)

Because delivery is best-effort and retries are finite, treat webhook processing as a production system with on-call runbooks.

Suggested reference architecture (simple and robust)

A minimal, production-ready setup looks like this:

Webhook endpoint

  • Raw-body capture + signature verification
  • Enqueue raw payload + headers + received timestamp
  • Return 2xx

Queue

  • SQS / PubSub / Kafka / NATS / Redis Streams (pick your standard)
  • Optional DLQ for poison messages

Worker

  • Dedup by event id
  • Validate/normalize
  • Update state store (room lifecycle, participant sessions, track metrics)
  • Trigger downstream actions (jobs, billing, analytics)
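
A sketch of the worker tying these pieces together. The queue interface is hypothetical (adapt it to your client), and processOnce() is the dedupe helper sketched earlier:

```ts
// Hypothetical queue consumer shape; substitute your queue client's API.
interface QueueMessage {
  payload: string;
  ack(): Promise<void>;
}
interface QueueConsumer {
  consume(): AsyncIterable<QueueMessage>;
}

// Dedupe helper from the idempotency section above.
declare function processOnce(id: string, fn: () => Promise<void>): Promise<void>;

async function runWorker(queue: QueueConsumer): Promise<void> {
  for await (const msg of queue.consume()) {
    const event = JSON.parse(msg.payload);
    await processOnce(event.id, async () => {
      switch (event.event) {
        case 'room_started':
        case 'room_finished':
          // update room lifecycle state
          break;
        case 'participant_joined':
        case 'participant_left':
          // open/close participant session records
          break;
        case 'egress_ended':
          // trigger post-processing of the recording
          break;
        default:
          // Unknown event types: keep the raw payload and move on, so new
          // LiveKit events never break the worker.
          break;
      }
    });
    // Ack only after dedupe + handling succeed; repeated failures should
    // route the message to the DLQ.
    await msg.ack();
  }
}
```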