
How to Monitor Stripe Webhook Health

Last updated June 2026 · ~9 min read

Stripe webhooks are the quietest part of your payment stack, right until they're the most expensive. When your handler stops working, nothing visibly breaks. A bad deploy, a database refusing writes, an exception thrown on every event: checkouts still succeed, Stripe still charges cards, and the events that tell your system what happened never get processed. Subscriptions don't activate. Cancellations don't propagate. Fulfilment never fires. You find out when a customer emails asking why they paid and got nothing.

Stripe does try to warn you. It retries a failed delivery with exponential backoff for up to three days, emails the account owner once it has seen a run of failures, and can disable the endpoint outright at the end of that window. The emails are real and worth keeping enabled, but they're not the early-warning system most teams assume they are. They land after several failed retries, not on the first miss; they only catch failures Stripe can see from its side; and one sent on a Friday night often goes unread until Saturday morning, hours into Stripe's three-day retry clock. A handler that returns 200 OK while silently skipping events looks perfectly healthy to Stripe, and you won't hear a thing.

This guide covers four things: how to catch a dead webhook pipeline in minutes rather than hours using heartbeat monitoring; the one design decision that makes or breaks the approach; an honest account of what it catches and what it misses; and a Node setup you can paste in.

Why webhook failures are so easy to miss

A webhook handler fails differently from the rest of your app. Break a web page and you get angry users plus a spike in your error logs. Break a webhook handler and you get nothing visible. The traffic is server-to-server, the failures happen in the background, and Stripe's retries smooth over short outages so well that a long one doesn't feel any different until the damage is done.

You've probably read the standard advice for a solid handler: verify the signature, return a 2xx inside Stripe's roughly 10-second timeout, then process asynchronously, and dedupe on event ID because delivery is at-least-once. All correct, all worth doing. None of it tells you when the handler has quietly stopped doing its job. For that you need something outside your app that expects to hear from the handler on a schedule and complains the moment it goes quiet.

The core idea: a heartbeat from inside the handler

A heartbeat monitor runs a normal uptime check backwards. Instead of something pinging your server to ask "are you up?", your code pings a monitoring URL to say "I'm alive and doing my job." The monitor knows roughly how often it should hear from you, and if a ping doesn't land inside the expected window plus a grace period, it raises an incident.
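The timing rule is easy to model. The sketch below is an illustration of how a heartbeat monitor might decide that a ping is overdue, not a description of any particular service's internals; `isOverdue` and its parameter names are hypothetical.

```javascript
// Hypothetical model of a heartbeat monitor's overdue check.
// All times are in milliseconds since the epoch.
function isOverdue(lastPingAt, expectedIntervalMs, graceMs, now = Date.now()) {
  const deadline = lastPingAt + expectedIntervalMs + graceMs;
  return now > deadline;
}

const MINUTE = 60 * 1000;
const lastPing = Date.parse('2026-06-01T12:00:00Z');

// Expected every 5 minutes with a 2-minute grace period:
// 6 minutes of silence is still inside the window, 8 minutes is not.
console.log(isOverdue(lastPing, 5 * MINUTE, 2 * MINUTE, lastPing + 6 * MINUTE)); // false
console.log(isOverdue(lastPing, 5 * MINUTE, 2 * MINUTE, lastPing + 8 * MINUTE)); // true
```

The grace period is what absorbs ordinary jitter, so a ping that is a few seconds late does not open an incident.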

For Stripe webhooks the move is simple: once your handler finishes processing an event, fire a ping at a heartbeat URL. While events flow and get processed, the pings keep arriving and the monitor stays green. When processing stops, whether the handler is down, the route is 404ing, a database write is throwing, or Stripe has disabled the endpoint, the pings stop too, and once the expected interval plus grace period has elapsed, the monitor alerts.

One detail decides whether this works: ping after successful processing, never on receipt. Ping the moment a request arrives and all you've confirmed is that Stripe can reach your URL. A handler that receives every event and then throws on all of them would still look perfectly healthy. Ping at the end of a successful run and a broken handler produces silence, which is the exact signal you're trying to catch.

The one decision that breaks this if you get it wrong: volume

Here's the trap. Webhook traffic is bursty and irregular. A busy storefront might process events every few seconds; a B2B SaaS might go six hours overnight without a single one. Ping only on real events, set a tight expected interval, and a quiet stretch becomes indistinguishable from an outage. You get paged at 4 a.m. because nobody happened to subscribe between midnight and dawn.

There are two honest ways to handle this, and the right one depends on what you're actually worried about:

| Approach | What it proves | Best for |
| --- | --- | --- |
| Per-event ping, wide interval | Events are flowing AND being processed | Endpoints with a steady baseline of traffic |
| Scheduled keepalive ping | Your service is alive and able to ping | Low or irregular-volume endpoints |

With the per-event approach, set the expected interval a bit longer than your realistic worst-case gap between events, then add a grace period on top. You'll catch a real outage at the cost of slower detection on quiet endpoints. With the keepalive approach, a small scheduled job in your app pings the heartbeat on a fixed timer no matter what webhook activity there is. Be honest with yourself about what that proves, though: it confirms your process is up and can reach the monitor, not that Stripe events are arriving and getting handled. That's a liveness check, not a delivery check. Plenty of teams run both, a keepalive answering "is the service alive" and application-level logging of processed event IDs answering "are events being handled correctly."
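For the per-event approach, one way to pick the expected interval is to measure the worst gap in your own recent traffic and pad it. The helper below is a rough sketch under that assumption; `suggestIntervalMs` and the 25% headroom default are hypothetical choices, not a recommendation from Stripe or any monitoring vendor.

```javascript
// Hypothetical helper: derive a per-event heartbeat interval from past traffic.
// `timestamps` are event-processing times in milliseconds, in any order.
function suggestIntervalMs(timestamps, headroom = 1.25) {
  if (timestamps.length < 2) {
    throw new Error('need at least two events to measure a gap');
  }
  const sorted = [...timestamps].sort((a, b) => a - b);
  let worstGap = 0;
  for (let i = 1; i < sorted.length; i++) {
    worstGap = Math.max(worstGap, sorted[i] - sorted[i - 1]);
  }
  // Pad the worst observed gap so an ordinary quiet stretch doesn't page anyone.
  return Math.ceil(worstGap * headroom);
}

const HOUR = 60 * 60 * 1000;
// Events at 0h, 1h, 7h and 8h: the worst gap is the 6-hour overnight lull.
const suggested = suggestIntervalMs([0, 1 * HOUR, 7 * HOUR, 8 * HOUR]);
console.log(suggested / HOUR); // 7.5
```

If the suggested interval comes out at many hours, that is a sign the endpoint is too quiet for per-event pinging alone and a keepalive is the better fit.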

A concrete setup (Node / Express)

First, create a heartbeat monitor and copy its URL. It looks like https://api.failover.io/heartbeat/<uuid>. Know one thing before you wire it up: the endpoint takes POST, not GET. A GET returns a 404; a successful POST returns {"received":true}. Some heartbeat services also accept a plain GET; this one doesn't, so don't expect a browser hit to register a ping.

Then ping it at the end of a successful event handler, fire-and-forget, so a monitoring call can never delay your 200 back to Stripe or break event processing:

```javascript
const express = require('express');
const Stripe = require('stripe');

const app = express();
// STRIPE_SECRET_KEY is an assumed env var name; use whatever your app uses
const stripe = new Stripe(process.env.STRIPE_SECRET_KEY);

// The heartbeat URL from your failover.io monitor
const HEARTBEAT = 'https://api.failover.io/heartbeat/<your-uuid>';

app.post(
  '/stripe/webhook',
  express.raw({ type: 'application/json' }),
  async (req, res) => {
    let event;
    try {
      event = stripe.webhooks.constructEvent(
        req.body,
        req.headers['stripe-signature'],
        process.env.STRIPE_WEBHOOK_SECRET
      );
    } catch (err) {
      // bad signature / payload: tell Stripe, do NOT ping
      return res.status(400).send(`Webhook Error: ${err.message}`);
    }

    // Ack fast: Stripe counts a delivery failed after ~10s
    res.json({ received: true });

    try {
      await handleEvent(event); // your real processing
      pingHeartbeat();          // only after success
    } catch (err) {
      console.error('webhook processing failed', err);
      // deliberately no ping: a failed run should look like silence
    }
  }
);

// Fire-and-forget. A monitoring ping must never throw into your handler.
function pingHeartbeat() {
  fetch(HEARTBEAT, { method: 'POST' }).catch(() => {});
}
```

For the keepalive variant on a low-volume endpoint, drop the per-event call and POST to the same URL every few minutes from a scheduled job instead: a cron, a serverless schedule, or a setInterval in a long-running process. Set the monitor's expected interval to match that cadence.
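A minimal sketch of the setInterval flavour, assuming a long-running Node process, might look like the following. `startKeepalive` is a hypothetical helper; the ping function is passed in so the scheduler itself makes no network calls and is easy to test.

```javascript
// Hypothetical keepalive scheduler. `ping` is any function that sends the
// heartbeat; it is injected so the scheduler itself makes no network calls.
function startKeepalive(ping, everyMs) {
  const safePing = () => {
    try {
      // Swallow both synchronous throws and rejected promises.
      Promise.resolve(ping()).catch(() => {});
    } catch (_) {
      // A failed ping should show up as silence, not crash the process.
    }
  };
  safePing(); // ping once at startup so the monitor turns green immediately
  const timer = setInterval(safePing, everyMs);
  timer.unref(); // don't keep the process alive solely for the keepalive
  return () => clearInterval(timer);
}

// Usage, assuming Node 18+ (global fetch) and a placeholder heartbeat URL:
// const stop = startKeepalive(
//   () => fetch('https://api.failover.io/heartbeat/<your-uuid>', { method: 'POST' }),
//   5 * 60 * 1000
// );
// Call stop() during graceful shutdown.
```

Keep in mind that a timer inside the web process only proves that process is alive; it says nothing about whether Stripe events are arriving.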

What this catches, and what it doesn't

Heartbeat monitoring answers one question, and answers it well: did my webhook pipeline go dark? It catches a handler process that's down, a deploy that broke the route, a database layer throwing on every write, an endpoint Stripe disabled, a server that's become unreachable. Those failures quietly cost you the most, and they're the ones Stripe's own email is slowest to flag, often hours after the first miss.

Be honest about the boundary. A heartbeat does not catch per-event problems: one event type failing while the rest succeed, signature verification breaking for a single source, partial processing, or a logic bug that "succeeds" while doing the wrong thing. For per-event correctness you still want idempotent handling keyed on the event ID, a dead-letter log of failed events, and Stripe's own delivery dashboard. Treat the heartbeat as the smoke alarm for total failure, not a replacement for handling each event right.
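To make the per-event side concrete, here is a small in-memory sketch of deduplication on event ID combined with a dead-letter log. Everything in it (`createEventLedger`, `claim`, `recordFailure`, `failures`) is a hypothetical name for illustration, and a real deployment would most likely keep this state in a database rather than in process memory.

```javascript
// Hypothetical in-memory ledger. A real deployment would back this with a
// database table so it survives restarts and is shared across instances.
function createEventLedger() {
  const processed = new Set();
  const deadLetters = [];
  return {
    // True the first time an event ID is claimed, false on redelivery.
    claim(eventId) {
      if (processed.has(eventId)) return false;
      processed.add(eventId);
      return true;
    },
    // Record a failed event so it can be inspected and replayed later.
    recordFailure(event, err) {
      processed.delete(event.id); // allow a later retry to be processed
      deadLetters.push({
        id: event.id,
        type: event.type,
        error: String(err.message || err),
      });
    },
    failures() {
      return [...deadLetters];
    },
  };
}

const ledger = createEventLedger();
console.log(ledger.claim('evt_1')); // true
console.log(ledger.claim('evt_1')); // false
ledger.recordFailure({ id: 'evt_2', type: 'invoice.paid' }, new Error('db down'));
console.log(ledger.failures());
// [ { id: 'evt_2', type: 'invoice.paid', error: 'db down' } ]
```

In a handler you would call `claim` before processing, skip the event when it returns false, and call `recordFailure` from the catch block.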

Getting the alert to an actual human

Detection is only half the job. A monitor that spots the outage and then sends one email that lands while you're asleep hasn't really helped; you're still down until morning. What you want is an alert that climbs until someone acknowledges it: notify, wait, and if nobody responds, get louder through another channel or another person, rather than firing once and calling it done.
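The notify, wait, escalate pattern can be modelled as a list of steps with delays. The sketch below is purely illustrative: the chain, the channel names, and `firedSteps` are hypothetical and do not reflect any specific product's configuration format.

```javascript
// Hypothetical alert chain: each step fires `afterMinutes` into an
// unacknowledged incident. Channel names are illustrative only.
const chain = [
  { afterMinutes: 0, channel: 'slack' },
  { afterMinutes: 10, channel: 'email' },
  { afterMinutes: 20, channel: 'sms' },
];

// Returns the channels that should have been notified so far.
// `acknowledgedAtMinute` is null while nobody has responded.
function firedSteps(steps, minutesOpen, acknowledgedAtMinute = null) {
  const cutoff = acknowledgedAtMinute === null
    ? minutesOpen
    : Math.min(minutesOpen, acknowledgedAtMinute);
  return steps.filter((s) => s.afterMinutes <= cutoff).map((s) => s.channel);
}

console.log(firedSteps(chain, 15));    // [ 'slack', 'email' ]
console.log(firedSteps(chain, 45));    // [ 'slack', 'email', 'sms' ]
console.log(firedSteps(chain, 45, 5)); // [ 'slack' ]
```

The third call shows the point of acknowledgement: once someone responds at minute 5, the later and louder steps never fire.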

This matters more for webhooks than for an ordinary site outage, precisely because the failure stays invisible everywhere else. No flood of user complaints backs up the alert. The monitor's escalation is your signal, so it has to be one that reliably reaches a waking human.

How failover.io does this. Create a monitor of type Heartbeat, set an Expected Interval and a Grace Period, then point your handler's POST at the generated URL. If pings stop for longer than interval plus grace, failover.io opens an incident and runs your alert chain, climbing step by step until someone acknowledges, then sends a recovery alert on its own once pings resume. The free plan includes heartbeat monitors, 5 monitors total, and 2-step alert chains, with alerts over email and chat apps like Slack, Discord, and Telegram, so you can wire this up and watch it work at no cost. SMS and voice-call escalation, the steps built to wake someone, start on the Pro plan. You can start free and add them later.

The short version

Stripe will email you when webhooks start failing, and again when it disables the endpoint after three days, but the email lands late and only catches what Stripe can see from its side. To know within minutes that your handler has stopped doing real work, have it ping a heartbeat URL after each successful event and let a monitor alert you when the pings stop. Mind the volume trade-off: per-event pinging fits steady traffic, while a scheduled keepalive fits quiet endpoints, so long as you know which question each one answers. Ping after success, never on receipt. Keep the ping fire-and-forget. Use POST, not GET. And make sure the alert climbs until a human acknowledges, because for webhooks there's no other sign that anything has gone wrong.

