Stripe Webhooks Done Right: Production Architecture

Stripe retries webhooks for 72 hours. Naive handlers duplicate orders. Here's the production architecture: idempotency, BullMQ queues, signature verification.

Stripe
Next.js
TypeScript
SaaS
BullMQ

Stripe sent one webhook. Your database has three orders. What happened?

This is not a hypothetical. I've seen it in production: on vatnode.dev, on pikkuna.fi, on pi-pi.ee. The Stripe webhook hits your endpoint, your server is too slow to respond, and Stripe times out and schedules a retry. Meanwhile, your server did complete the work. Now you have a duplicate. Then another one arrives 30 minutes later.

The naive implementation — parse the JSON, run your business logic, return 200 — fails in ways that only show up under real traffic. Here's what production actually looks like.

Why the Naive Implementation Breaks

Stripe's retry policy is more aggressive than most developers expect. If your endpoint returns anything other than a 2xx status code, or takes longer than 30 seconds to respond, Stripe retries. In live mode the retries back off exponentially: immediately, then at roughly 5 minutes, 30 minutes, 2 hours, 5 hours, 10 hours, and so on, for up to 72 hours and roughly 15–18 total attempts. (Test mode retries only three times over a few hours, so local testing understates the problem.)

That means if your server returned 500 at 9 AM on Monday due to a database hiccup, Stripe will still be trying to deliver the same event on Tuesday morning. If that failed attempt had already done part of the work before erroring (written the order, say, then thrown while sending the email), the retry runs the whole handler again once your server is back up, creating a duplicate subscription, a duplicate order, or a duplicate email to your customer.

The naive handler looks like this:

// app/api/webhooks/stripe/route.ts — DON'T do this
export async function POST(request: Request) {
  const body = await request.json(); // Wrong: parsed body won't verify
  const event = body as Stripe.Event;

  if (event.type === "payment_intent.succeeded") {
    await createOrder(event.data.object); // No idempotency check
  }

  return new Response("ok", { status: 200 });
}

Three problems here: no signature verification (anyone who finds the URL can POST a forged event), no idempotency check (every redelivery of the same event creates another order), and synchronous processing that blocks the response while your business logic runs. If createOrder is slow enough to exceed Stripe's timeout, Stripe records the delivery as failed and retries, even though the order was created successfully.

Idempotency — The Foundation

Before any queue, before any worker, you need idempotency: the guarantee that processing the same event twice produces the same result as processing it once.

Stripe makes this straightforward — every event has a unique id field (e.g., evt_1OqXyz...). Store this ID when you process the event. On subsequent attempts, check the store first and short-circuit.
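Stripped of storage details, the pattern is small. The sketch below is illustrative only: an in-memory Set stands in for the PostgreSQL/Redis store described next, and IdempotentProcessor and StripeLikeEvent are hypothetical names, not part of Stripe's SDK. The event ID is recorded only after the handler succeeds, so a failed attempt stays retryable; the worker later in this article uses the same ordering.

```typescript
// Illustrative sketch: an in-memory Set stands in for the durable store
// (PostgreSQL + Redis in this article). All names here are hypothetical.
type StripeLikeEvent = { id: string; type: string };

export class IdempotentProcessor {
  private readonly processed = new Set<string>();
  private readonly handle: (event: StripeLikeEvent) => Promise<void>;

  constructor(handle: (event: StripeLikeEvent) => Promise<void>) {
    this.handle = handle;
  }

  /** Runs the handler at most once per event ID; returns false for duplicates. */
  async process(event: StripeLikeEvent): Promise<boolean> {
    // Short-circuit: this event ID was already handled successfully
    if (this.processed.has(event.id)) return false;

    // If the handler throws, the ID is never recorded, so a retry can run it again
    await this.handle(event);
    this.processed.add(event.id);
    return true;
  }
}

// Usage: the same event delivered twice creates one order
async function demo(): Promise<number> {
  let ordersCreated = 0;
  const processor = new IdempotentProcessor(async () => {
    ordersCreated += 1;
  });
  const event = { id: "evt_example", type: "payment_intent.succeeded" };
  await processor.process(event); // handler runs
  await processor.process(event); // duplicate: handler skipped
  return ordersCreated;
}

demo().then((count) => console.log(`orders created: ${count}`)); // orders created: 1
```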

I use PostgreSQL for durable idempotency keys and Redis for a fast pre-check layer. Here's the Drizzle ORM schema:

// packages/db/schema/webhook-events.ts
import { pgTable, text, timestamp, jsonb } from "drizzle-orm/pg-core";

export const webhookEvents = pgTable("webhook_events", {
  id: text("id").primaryKey(), // Stripe event ID — evt_1OqXyz...
  type: text("type").notNull(), // payment_intent.succeeded, etc.
  processedAt: timestamp("processed_at").notNull().defaultNow(),
  payload: jsonb("payload").notNull(),
});

And the idempotency check — run this before any business logic:

// lib/webhook-idempotency.ts
import { db } from "@/packages/db";
import { webhookEvents } from "@/packages/db/schema/webhook-events";
import { eq } from "drizzle-orm";
import { redis } from "@/lib/redis"; // ioredis instance

export async function isAlreadyProcessed(eventId: string): Promise<boolean> {
  // Fast path: Redis check first (sub-millisecond)
  const cached = await redis.get(`webhook:processed:${eventId}`);
  if (cached) return true;

  // Slow path: database check (handles Redis eviction or restarts)
  const existing = await db
    .select({ id: webhookEvents.id })
    .from(webhookEvents)
    .where(eq(webhookEvents.id, eventId))
    .limit(1);

  if (existing.length > 0) {
    // Restore to Redis cache so future checks are fast
    await redis.set(`webhook:processed:${eventId}`, "1", "EX", 86400 * 7);
    return true;
  }

  return false;
}

export async function markAsProcessed(
  eventId: string,
  type: string,
  payload: unknown
): Promise<void> {
  // Write to DB first — this is the source of truth
  await db.insert(webhookEvents).values({
    id: eventId,
    type,
    payload,
  });

  // Then cache in Redis for fast future lookups
  await redis.set(`webhook:processed:${eventId}`, "1", "EX", 86400 * 7);
}

Signature Verification in Next.js App Router

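Here's the subtle thing that breaks almost every Next.js webhook tutorial: request.json() returns a parsed object, but Stripe's signature verification requires the raw bytes of the original request body. Stripe signs the exact payload it sent, whitespace included; once you've parsed it, re-serializing with JSON.stringify produces different bytes and the signature check fails.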
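To see why, here is a hedged sketch of the v1 scheme Stripe documents for manual verification: an HMAC-SHA256 over the timestamp, a dot, and the raw body, keyed with the endpoint's whsec_ secret and sent as t=...,v1=... in the Stripe-Signature header. signPayload and verifySignature are hypothetical helpers for illustration; in the route, keep using stripe.webhooks.constructEvent. The usage at the bottom signs a pretty-printed body, then shows that the same JSON re-serialized no longer verifies.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical helpers illustrating Stripe's documented v1 signature scheme.
// In production, call stripe.webhooks.constructEvent instead.
function hmacHex(secret: string, timestamp: number, payload: string): string {
  // Signed payload = "<timestamp>.<raw body>", keyed with the whsec_... secret
  return createHmac("sha256", secret).update(`${timestamp}.${payload}`, "utf8").digest("hex");
}

export function signPayload(payload: string, secret: string, timestamp: number): string {
  return `t=${timestamp},v1=${hmacHex(secret, timestamp, payload)}`;
}

export function verifySignature(
  payload: string,
  header: string,
  secret: string,
  toleranceSeconds = 300, // stripe-node's default tolerance is 5 minutes
  nowSeconds = Math.floor(Date.now() / 1000),
): boolean {
  let timestamp = Number.NaN;
  const candidates: string[] = [];
  for (const part of header.split(",")) {
    const [key, value] = part.split("=", 2);
    if (key === "t") timestamp = Number(value);
    if (key === "v1" && value) candidates.push(value);
  }
  if (!Number.isFinite(timestamp) || candidates.length === 0) return false;
  // Reject timestamps outside the tolerance window to limit replay attacks
  if (Math.abs(nowSeconds - timestamp) > toleranceSeconds) return false;

  const expected = Buffer.from(hmacHex(secret, timestamp, payload), "utf8");
  // Constant-time comparison against every v1 signature in the header
  return candidates.some((sig) => {
    const actual = Buffer.from(sig, "utf8");
    return actual.length === expected.length && timingSafeEqual(actual, expected);
  });
}

// Usage: a pretty-printed body verifies; the same JSON re-serialized does not
const secret = "whsec_placeholder"; // placeholder, not a real secret
const rawBody = '{\n  "id": "evt_example",\n  "type": "payment_intent.succeeded"\n}';
const header = signPayload(rawBody, secret, Math.floor(Date.now() / 1000));

console.log(verifySignature(rawBody, header, secret)); // true
console.log(verifySignature(JSON.stringify(JSON.parse(rawBody)), header, secret)); // false
```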

In the Pages Router, you'd disable bodyParser. In the App Router, you use request.arrayBuffer():

// app/api/webhooks/stripe/route.ts
import Stripe from "stripe";
import { NextResponse } from "next/server";
import { isAlreadyProcessed } from "@/lib/webhook-idempotency";
import { webhookQueue } from "@/lib/webhook-queue";

const stripe = new Stripe(process.env.STRIPE_SECRET_KEY!, {
  apiVersion: "2025-01-27.acacia",
});

const webhookSecret = process.env.STRIPE_WEBHOOK_SECRET!;

export async function POST(request: Request) {
  // arrayBuffer() gives us raw bytes — required for signature verification
  const rawBody = await request.arrayBuffer();
  const signature = request.headers.get("stripe-signature");

  if (!signature) {
    return NextResponse.json({ error: "Missing signature" }, { status: 400 });
  }

  let event: Stripe.Event;

  try {
    // Convert ArrayBuffer to Buffer for the Stripe SDK
    event = stripe.webhooks.constructEvent(Buffer.from(rawBody), signature, webhookSecret);
  } catch (err) {
    console.error("Webhook signature verification failed:", err);
    return NextResponse.json({ error: "Invalid signature" }, { status: 400 });
  }

  // Check idempotency before doing anything else
  const alreadyProcessed = await isAlreadyProcessed(event.id);
  if (alreadyProcessed) {
    // Return 200 so Stripe stops retrying — this is intentional
    return NextResponse.json({ received: true, duplicate: true });
  }

  // Enqueue the event — don't process synchronously
  await webhookQueue.add(
    event.type,
    { event },
    {
      jobId: event.id, // BullMQ skips the add while a job with this ID is still stored in Redis
      attempts: 5,
      backoff: { type: "exponential", delay: 5000 },
    }
  );

  // Return immediately — Stripe considers this success
  return NextResponse.json({ received: true });
}

Two things to notice. First, the idempotency check runs before enqueuing, so a retry of an event that was already processed gets a 200 immediately and never touches the queue. If Stripe retries while the first job is still waiting in the queue, the check passes (nothing is marked processed yet), but webhookQueue.add with the same jobId is a no-op, so the event is still enqueued only once. Second, the endpoint returns immediately after enqueuing. This response time is typically under 50ms, well within Stripe's timeout window.

BullMQ Queue Architecture

Processing webhooks synchronously in the HTTP handler means your response time is tied to every downstream API call — CRM updates, email sends, database writes. One slow dependency and you're getting retries.

The right architecture: webhook endpoint enqueues, a separate worker processes.

// lib/webhook-queue.ts
import { Queue } from "bullmq";
import { redis } from "@/lib/redis";

export const webhookQueue = new Queue("stripe-webhooks", {
  connection: redis,
  defaultJobOptions: {
    attempts: 5,
    backoff: {
      type: "exponential",
      delay: 5000, // retries after 5s, 10s, 20s, 40s (5 attempts = 1 try + 4 retries)
    },
    removeOnComplete: { count: 1000 }, // Keep last 1000 for debugging
    removeOnFail: false, // Keep failed jobs in the failed set (acts as a DLQ) for inspection
  },
});

// workers/stripe-webhook.worker.ts
import { Worker, Job } from "bullmq";
import Stripe from "stripe";
import { redis } from "@/lib/redis"; // ioredis; BullMQ Workers need maxRetriesPerRequest: null on this connection
import { isAlreadyProcessed, markAsProcessed } from "@/lib/webhook-idempotency";
import { handlePaymentSucceeded } from "./handlers/payment-succeeded";
import { handleSubscriptionDeleted } from "./handlers/subscription-deleted";
import { handleInvoicePaymentFailed } from "./handlers/invoice-payment-failed";

const worker = new Worker(
  "stripe-webhooks",
  async (job: Job<{ event: Stripe.Event }>) => {
    const { event } = job.data;

    // Double-check inside the worker: a stalled job can be re-run if the worker crashed
    // after markAsProcessed but before BullMQ recorded the job as completed
    const alreadyProcessed = await isAlreadyProcessed(event.id);
    if (alreadyProcessed) {
      return { skipped: true, reason: "duplicate" };
    }

    switch (event.type) {
      case "payment_intent.succeeded":
        await handlePaymentSucceeded(event.data.object as Stripe.PaymentIntent);
        break;

      case "customer.subscription.deleted":
        await handleSubscriptionDeleted(event.data.object as Stripe.Subscription);
        break;

      case "invoice.payment_failed":
        await handleInvoicePaymentFailed(event.data.object as Stripe.Invoice);
        break;

      default:
        console.log(`Unhandled webhook type: ${event.type}`);
        return { skipped: true, reason: "unhandled_type" };
    }

    // Mark processed only after the handler succeeds
    // If the handler throws, BullMQ retries the job — we don't mark it done
    await markAsProcessed(event.id, event.type, event.data.object);

    return { processed: true };
  },
  {
    connection: redis,
    concurrency: 5,
  }
);

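worker.on("failed", (job, err) => {
  // "failed" fires on every failed attempt, including ones BullMQ will retry.
  // Alert only once all attempts are used up and the job stays in the failed set.
  if (job && job.attemptsMade >= (job.opts.attempts ?? 1)) {
    console.error(`Webhook job ${job.id} failed permanently:`, err);
  }
});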

BullMQ has no separate dead letter queue, but with removeOnFail: false, jobs that exhaust their retries stay in the failed set, which serves the same purpose: you can inspect them and retry them by hand. I keep a Telegram alert on the final failure so I know immediately when something needs manual intervention.
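To sanity-check the retry timing configured in lib/webhook-queue.ts, this sketch reproduces BullMQ's documented exponential backoff formula, delay × 2^(attemptsMade − 1). backoffSchedule is a hypothetical helper, not a BullMQ export; BullMQ computes these delays internally. With attempts: 5 and delay: 5000, a job gets four retries totalling 75 seconds of backoff. That is well inside Stripe's retry window, but it also means an outage of a couple of minutes can exhaust the retries, which is why the failed set and the alert matter.

```typescript
// Hypothetical helper mirroring BullMQ's "exponential" backoff: delay * 2^(attemptsMade - 1).
// `attempts` counts the first try, so a job gets attempts - 1 retries.
export function backoffSchedule(attempts: number, delayMs: number): number[] {
  const delays: number[] = [];
  for (let attemptsMade = 1; attemptsMade < attempts; attemptsMade++) {
    delays.push(delayMs * 2 ** (attemptsMade - 1));
  }
  return delays;
}

const delays = backoffSchedule(5, 5000);
console.log(delays); // [ 5000, 10000, 20000, 40000 ]
console.log(delays.reduce((sum, d) => sum + d, 0) / 1000); // 75
```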

Handling the Key Events

payment_intent.succeeded — the most critical handler. Wrap it in a database transaction so partial writes don't leave inconsistent state:

// workers/handlers/payment-succeeded.ts
import Stripe from "stripe";
import { db } from "@/packages/db";
import { orders, users } from "@/packages/db/schema";
import { eq } from "drizzle-orm";
import { sendOrderConfirmation } from "@/lib/email"; // hypothetical helper, not shown

export async function handlePaymentSucceeded(paymentIntent: Stripe.PaymentIntent): Promise<void> {
  // Assumes the PaymentIntent was created for a Customer; guest payments have customer: null
  const customerId = paymentIntent.customer as string;
  const metadata = paymentIntent.metadata;

  await db.transaction(async (tx) => {
    await tx
      .insert(orders)
      .values({
        stripePaymentIntentId: paymentIntent.id,
        customerId,
        amount: paymentIntent.amount,
        currency: paymentIntent.currency,
        status: "paid",
      })
      // Assumes a unique index on stripe_payment_intent_id: a retried job
      // skips the insert instead of creating a second order
      .onConflictDoNothing({ target: orders.stripePaymentIntentId });

    if (metadata.plan) {
      await tx
        .update(users)
        .set({ plan: metadata.plan, planActivatedAt: new Date() })
        .where(eq(users.stripeCustomerId, customerId));
    }
  });

  // Send the confirmation email outside the transaction so an email failure
  // doesn't roll back the order. If it throws, BullMQ retries the whole job;
  // the conflict guard above keeps that retry from duplicating the order.
  await sendOrderConfirmation({ paymentIntentId: paymentIntent.id });
}

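customer.subscription.deleted — downgrade the user, don't delete their data. Keep at least 30 days of data post-cancellation. This event fires when the subscription actually ends: right away for an immediate cancellation, or at the end of the billing period for one scheduled with cancel_at_period_end. The scheduled case shows up earlier as customer.subscription.updated with cancel_at_period_end: true, which is the event to use if you want to warn the user before access ends.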

invoice.payment_failed — send a dunning email, don't immediately revoke access. Stripe's Smart Retries will attempt the charge again. Give the customer a grace period to update their payment method.
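A grace period can be as simple as a timestamp check. This is a hypothetical sketch: it assumes your app records when the first payment failure happened (firstFailedAt is an illustrative field of your own, not a Stripe one) and clears it once payment succeeds. GRACE_PERIOD_DAYS = 7 is an arbitrary example; choose a window that covers the period over which your retry settings keep attempting the charge.

```typescript
// Hypothetical grace-period rule for invoice.payment_failed.
// Assumes your app stores firstFailedAt on the first failure and clears it on payment.
const GRACE_PERIOD_DAYS = 7; // example value; align it with your retry settings
const DAY_MS = 24 * 60 * 60 * 1000;

export function graceEndsAt(firstFailedAt: Date, graceDays = GRACE_PERIOD_DAYS): Date {
  return new Date(firstFailedAt.getTime() + graceDays * DAY_MS);
}

export function hasAccess(firstFailedAt: Date | null, now: Date, graceDays = GRACE_PERIOD_DAYS): boolean {
  // No outstanding failure: normal access
  if (firstFailedAt === null) return true;
  // Keep access until the grace window closes, then downgrade
  return now.getTime() < graceEndsAt(firstFailedAt, graceDays).getTime();
}

const failedAt = new Date("2025-03-01T10:00:00Z");
console.log(graceEndsAt(failedAt).toISOString()); // 2025-03-08T10:00:00.000Z
console.log(hasAccess(failedAt, new Date("2025-03-05T10:00:00Z"))); // true
console.log(hasAccess(failedAt, new Date("2025-03-09T10:00:00Z"))); // false
```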

Testing with Stripe CLI

# Forward all events to your local Next.js dev server
stripe listen --forward-to localhost:3000/api/webhooks/stripe

# Trigger specific events
stripe trigger payment_intent.succeeded
stripe trigger customer.subscription.deleted
stripe trigger invoice.payment_failed

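For testing idempotency: capture the raw body and Stripe-Signature header of one delivery and POST them to your endpoint twice. stripe-node rejects signatures older than five minutes by default, so replay within that window. Once the worker has processed the first delivery, the second call should return { received: true, duplicate: true } within milliseconds; if the job is still queued, you'll get { received: true } instead, and BullMQ's jobId check keeps the event from being enqueued twice.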
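The replay can be scripted. A hedged sketch, assuming Node 18+ (global fetch), a dev server running with STRIPE_WEBHOOK_SECRET set to the same secret you pass in, and a saved copy of a real event's JSON as payload. It signs the body following Stripe's documented v1 scheme (HMAC-SHA256 over the timestamp, a dot, and the raw body) and sends the identical request twice. signedHeaders and replayTwice are hypothetical names.

```typescript
import { createHmac } from "node:crypto";

// Hypothetical helper: build headers the way Stripe signs deliveries (v1 scheme).
export function signedHeaders(
  payload: string,
  secret: string,
  timestamp = Math.floor(Date.now() / 1000),
): Record<string, string> {
  const v1 = createHmac("sha256", secret).update(`${timestamp}.${payload}`, "utf8").digest("hex");
  return { "Content-Type": "application/json", "Stripe-Signature": `t=${timestamp},v1=${v1}` };
}

// Hypothetical helper: POST the exact same bytes and headers twice, return both JSON bodies.
export async function replayTwice(
  url: string,
  payload: string,
  secret: string,
  pauseMs = 2000, // arbitrary: time for the worker to process the first delivery
): Promise<unknown[]> {
  const headers = signedHeaders(payload, secret); // same timestamp and signature for both calls
  const responses: unknown[] = [];
  for (let attempt = 0; attempt < 2; attempt++) {
    if (attempt === 1 && pauseMs > 0) {
      await new Promise((resolve) => setTimeout(resolve, pauseMs));
    }
    const res = await fetch(url, { method: "POST", headers, body: payload });
    responses.push(await res.json());
  }
  return responses;
}
```

Call it as replayTwice("http://localhost:3000/api/webhooks/stripe", payload, secret).then(console.log). The default pause is a guess at how long your worker needs; if the second response lacks duplicate: true, the first job was probably still queued.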

Gotchas That Cost Me Real Time

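request.json() silently breaks signature verification. The raw body and the re-serialized JSON aren't byte-for-byte identical, so the HMAC no longer matches. Read the body with request.arrayBuffer() (request.text() also works, since it keeps Stripe's UTF-8 payload intact) and never re-serialize it. This is the most common issue I see in Next.js webhook implementations.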

BullMQ jobId deduplication only lasts as long as the job exists in Redis. Once a completed or failed job is removed (here, removeOnComplete: { count: 1000 } evicts older completed jobs), BullMQ will accept a new job with the same ID. That's why the database idempotency check is still necessary: it covers retries and manual resends that arrive after the original job is gone.

STRIPE_WEBHOOK_SECRET differs between environments. The secret printed by stripe listen (it also starts with whsec_) is different from the one shown for your Dashboard endpoint, and every Dashboard endpoint, test or live, has its own. Give each environment its own value.
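One way to make a missing or wrong secret obvious is to validate it once at startup. The route's process.env.STRIPE_WEBHOOK_SECRET! assertion lets undefined through, and the problem only surfaces as a misleading "Invalid signature" 400 when the first webhook arrives. getWebhookSecret is a hypothetical helper; the whsec_ prefix check catches a pasted API key, but it cannot tell a CLI secret from a Dashboard one, so each environment still needs the right value.

```typescript
// Hypothetical helper: read the signing secret once and fail fast if it looks wrong.
export function getWebhookSecret(env: Record<string, string | undefined> = process.env): string {
  const secret = env["STRIPE_WEBHOOK_SECRET"];
  if (!secret) {
    throw new Error("STRIPE_WEBHOOK_SECRET is not set for this environment");
  }
  // Dashboard and `stripe listen` secrets both start with whsec_; API keys (sk_...) do not
  if (!secret.startsWith("whsec_")) {
    throw new Error("STRIPE_WEBHOOK_SECRET does not look like a webhook signing secret (expected whsec_...)");
  }
  return secret;
}

// Usage at module load in the route: const webhookSecret = getWebhookSecret();
```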

Never trust event.data.object amounts without checking currency. EUR amounts are in cents (100 = €1.00). JPY has no minor unit (100 = ¥100). Always pair the amount with the currency field when storing.
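When you need to display those stored amounts, convert using the currency's number of decimal places rather than always dividing by 100. A minimal sketch: the zero- and three-decimal lists below are based on Stripe's currency documentation at the time of writing and may change (Stripe also documents special cases this ignores), so check them against the current docs. toMajorUnits is a hypothetical helper; keep storing the integer amount and currency as-is and convert only for display.

```typescript
// Based on Stripe's currency docs at the time of writing; verify before relying on it.
const ZERO_DECIMAL = new Set([
  "bif", "clp", "djf", "gnf", "jpy", "kmf", "krw", "mga",
  "pyg", "rwf", "ugx", "vnd", "vuv", "xaf", "xof", "xpf",
]);
const THREE_DECIMAL = new Set(["bhd", "jod", "kwd", "omr", "tnd"]);

// Hypothetical helper: Stripe integer amount (smallest unit) -> major units, for display only.
export function toMajorUnits(amount: number, currency: string): number {
  const code = currency.toLowerCase(); // Stripe returns lowercase ISO codes, e.g. "eur"
  if (ZERO_DECIMAL.has(code)) return amount;
  if (THREE_DECIMAL.has(code)) return amount / 1000;
  return amount / 100;
}

console.log(toMajorUnits(100, "eur")); // 1
console.log(toMajorUnits(100, "jpy")); // 100
```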

What This Looks Like in Production

On vatnode.dev, webhooks hit the endpoint and return within 40–60ms. The BullMQ worker processes the actual subscription logic asynchronously, with a concurrency of 5 to absorb bursts. Zero duplicate subscriptions created since launch.

On pikkuna.fi, the same architecture drives the full order pipeline: Stripe fires the webhook, and the worker triggers Zoho CRM, PostNord shipment creation, Netvisor invoice, and Mailgun confirmation email in sequence. The webhook endpoint returns 200 within 50ms; the full chain completes in under 2 minutes.


If you're building a SaaS or e-commerce platform with Stripe, you'll hit exactly these problems — usually at the worst moment, like during a product launch or after a server restart.

I've built reliable Stripe integrations across several production systems, from subscription SaaS to high-volume international e-commerce. If you need a senior developer who can own the payment infrastructure end-to-end — get in touch. I'm available for freelance projects and long-term engagements.


Related projects: Pikkuna E-commerce Platform — full order pipeline with Stripe, Zoho CRM, and PostNord integration. Vatnode VAT validation SaaS — subscription billing where this webhook architecture is in production.

Iurii Rogulia


Senior Full-Stack Developer | Python, React, TypeScript, SaaS, APIs

Senior full-stack developer based in Finland. I write about Python, React, TypeScript, and real-world software engineering.