Documentation

Setup & usage guide

Contents

HTTP monitors
Heartbeat monitors
Value tracking
Alert channels
Alert chains
Acknowledging alerts
Status pages
Embedding a status page
Team & roles
On-call schedules
Webhook payload
Billing & plans

HTTP monitors

An HTTP monitor checks a URL on a schedule. We send a request, look at the status code (and optionally a keyword in the response body), and mark the monitor up or down.

Creating a monitor

Go to Monitors → New monitor.
Pick HTTP as the type.
Enter the URL you want to check.
Choose a check interval.
(Optional) Set a request method, custom headers, expected status code, body keyword, or body assertion.
Save.

Settings

Field	What it does
URL	The endpoint we probe. Must be publicly reachable.
Method	GET, POST, PUT, HEAD. Default GET.
Check interval	How often we probe. Lower = faster detection. The shortest interval available depends on your plan.
Timeout	How long we wait before marking the check failed. Default 10s.
Expected status	The HTTP status that means "up". Default 200.
Keyword	(Optional) String we look for in the response body. If the body doesn't contain it, the check fails even with a 200.
Retry threshold	How many consecutive failures before we open an incident. Default 2; protects against transient blips.
SSL check	If on, we also check the certificate and warn before expiry.

Why your monitor flapped without alerting

With the default retry threshold of 2, a check that fails once and then passes on the next run opens no incident. The retry threshold exists exactly to prevent paging you for one-off blips. Set the retry threshold to 1 if you want every failure to alert.
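The counting rule fits in a few lines. This is a minimal sketch, assuming a passing check resets the failure count and one incident covers an unbroken run of failures; `incident_opens` is a hypothetical helper that models the behaviour, not part of any Failover API.

```python
from typing import Iterable, List


def incident_opens(results: Iterable[bool], retry_threshold: int = 2) -> List[int]:
    """Indexes of the checks at which an incident would open.

    `results` holds check outcomes in order (True = passed, False = failed).
    A passing check resets the failure count; an incident opens when the
    count reaches `retry_threshold`, once per unbroken run of failures.
    """
    opened: List[int] = []
    consecutive = 0
    for i, ok in enumerate(results):
        if ok:
            consecutive = 0
            continue
        consecutive += 1
        if consecutive == retry_threshold:
            opened.append(i)
    return opened


# One blip between passing checks: nothing opens at the default threshold...
print(incident_opens([True, False, True, True]))                     # []
# ...but it does when every failure should alert.
print(incident_opens([True, False, True, True], retry_threshold=1))  # [1]
```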

Heartbeat monitors

A heartbeat monitor inverts the model. Instead of us probing your service, your service pings us. If we don't hear from you within the expected window, we open an incident.

This is the right tool for cron jobs, batch workers, scheduled scripts, and anything that runs in the background where you want to know when it stops running.

Creating a heartbeat monitor

Go to Monitors → New monitor.
Pick Heartbeat as the type.
Set the Expected interval: how often your job is supposed to ping (e.g. 3600 seconds for an hourly cron).
Set the Grace period: how late a ping can be before we mark you down (e.g. 300 seconds).
Save. You'll get a unique URL after creation.

Pinging the heartbeat URL

After saving, the monitor's detail page shows your heartbeat URL:

https://api.failover.io/heartbeat/<monitor-uuid>

Send a POST request to that URL each time your job runs successfully:

curl -fsS -X POST https://api.failover.io/heartbeat/<monitor-uuid>

In a crontab:

0 * * * * /usr/local/bin/my-job.sh && curl -fsS -X POST https://api.failover.io/heartbeat/<monitor-uuid>

The && ensures we only get pinged when the job exits successfully.

Treat the URL as a credential. Anyone with the URL can ping the heartbeat. Don't commit it to public repos.
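If the job is itself a Python script, the same success-only ping can be sent from inside it. A minimal sketch using only the standard library, assuming the heartbeat URL is supplied through an environment variable; `FAILOVER_HEARTBEAT_URL`, `run_job`, `send_ping`, and `build_ping` are hypothetical names.

```python
import os
import urllib.request


def build_ping(url: str) -> urllib.request.Request:
    """An empty POST to the heartbeat URL, equivalent to `curl -X POST <url>`."""
    return urllib.request.Request(url, data=b"", method="POST")


def send_ping(url: str) -> None:
    # urlopen raises on non-2xx responses, much like curl's -f flag.
    with urllib.request.urlopen(build_ping(url), timeout=10) as resp:
        resp.read()


def run_job() -> None:
    """Stand-in for your real work. Raise an exception if it fails."""


def main() -> None:
    # Keep the URL out of your repo. FAILOVER_HEARTBEAT_URL is a hypothetical
    # variable name; any env var or secret store works.
    url = os.environ["FAILOVER_HEARTBEAT_URL"]
    run_job()       # an exception here skips the ping, like `&&` in the crontab
    send_ping(url)  # reached only when the job finished without raising
```

Call main() as the last step of the script your scheduler runs. Because the ping comes after the work, a job that crashes simply goes quiet and the heartbeat monitor notices.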

Expected interval vs grace period

If you say "expected interval: 3600s, grace: 300s", we mark you down only when we haven't heard from you for 3600 + 300 = 3900 seconds. The grace period exists for jobs that run a bit late or take varying amounts of time. Set grace to 0 for strict on-time enforcement.
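The same arithmetic as a sketch. Whether a ping landing exactly on the deadline counts as on time is an assumption here, and `down_deadline` and `is_down` are hypothetical helpers.

```python
from datetime import datetime, timedelta, timezone


def down_deadline(last_ping: datetime, expected_interval_s: int, grace_s: int) -> datetime:
    """Latest moment the next ping can arrive before the monitor goes down."""
    return last_ping + timedelta(seconds=expected_interval_s + grace_s)


def is_down(last_ping: datetime, now: datetime, expected_interval_s: int, grace_s: int) -> bool:
    # Assumption: a ping landing exactly on the deadline still counts as on time.
    return now > down_deadline(last_ping, expected_interval_s, grace_s)


last = datetime(2026, 4, 28, 10, 0, 0, tzinfo=timezone.utc)
print(down_deadline(last, 3600, 300))  # 2026-04-28 11:05:00+00:00
```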

Value tracking (blockchain & RPC nodes)

A blockchain node can answer every request with HTTP 200 while its block height has been frozen for an hour. Status checks and even body assertions can't see this, because each check looks at one response in isolation. Value tracking compares checks against each other: you point the monitor at a number in the response that should keep changing, and if it stops changing for too long, the monitor goes down.

Settings

Field	What it does
Track value path	JSON path to the value, e.g. `result` for `eth_blockNumber` or `result.sync_info.latest_block_height` for Tendermint. Hex values like `0x16a2c80` are decoded automatically.
Stall window	Seconds the value may sit unchanged before the check fails, from 30 to 86400. Size it to your chain: roughly 60s suits Ethereum's 12-second blocks, while Bitcoin needs 7200s because hour-long gaps between blocks are normal. Leave it empty to record values without ever failing on a stall.
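To see what a path like `result` or `result.sync_info.latest_block_height` selects, here is a minimal sketch that walks dotted keys and decodes hex. It assumes plain dot-separated object keys only (no array indexes); `resolve_path` and `to_number` are hypothetical helpers, not the monitor's actual parser.

```python
from typing import Any


def resolve_path(doc: Any, path: str) -> Any:
    """Walk a dotted path such as 'result.sync_info.latest_block_height'."""
    node = doc
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            raise KeyError(f"path {path!r} did not resolve at {key!r}")
        node = node[key]
    return node


def to_number(value: Any) -> int:
    """Decode hex strings like '0x16a2c80' and decimal strings like '123'."""
    if isinstance(value, bool):
        raise TypeError("booleans are not block heights")
    if isinstance(value, int):
        return value
    if isinstance(value, str):
        text = value.strip()
        if text.lower().startswith("0x"):
            return int(text, 16)
        return int(text)
    raise TypeError(f"cannot treat {value!r} as a number")


# An eth_blockNumber reply carries the height as a hex string.
reply = {"jsonrpc": "2.0", "id": 1, "result": "0x16a2c80"}
height = to_number(resolve_path(reply, "result"))  # 23735424
```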

Presets

The create-monitor form has presets for Bitcoin Core, EVM chains (Ethereum, Polygon, BSC, L2s), Solana, Tendermint/Cosmos, Polkadot/Substrate, XRPL, and NEAR. Picking one fills in the request method, the JSON-RPC body, the value path, and a stall window sized to that chain's block cadence. You can edit any of it afterwards.

How failures behave

A stall is an ordinary check failure: it counts toward the retry threshold, opens an incident, and climbs the alert chain like any downtime. The failure reason names the value and how long it's been frozen, e.g. Tracked value stalled: result = 23735424 unchanged for 179s (window 120s). If the response carries a JSON-RPC error object instead of a result, the path won't resolve, and the check fails with the node's own error message in the reason. That's how you find out your node is amendment-blocked or stuck syncing rather than just "down."

Two details worth knowing. Any change counts as movement, including a decrease, since a node resyncing from a snapshot legitimately reports a lower height; frozen is the failure, not direction. And each check's observed value is stored, so the monitor's check history shows the height per check and you can watch it freeze during an incident.
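A minimal model of these rules, assuming timestamps in seconds and that a value frozen for exactly the window still passes. `StallTracker` and `rpc_error` are hypothetical names; `rpc_error` relies only on the standard JSON-RPC 2.0 `error` object with its `message` member.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StallTracker:
    """Tracks one value across checks and fails when it stops changing."""
    path: str
    window_s: Optional[int]              # None = record only, never fail on a stall
    last_value: Optional[int] = None
    last_change_at: Optional[float] = None

    def observe(self, value: int, now: float) -> Optional[str]:
        """Record the value seen at `now`; return a failure reason or None."""
        if value != self.last_value:
            # Any movement resets the clock, including a drop after a resync.
            self.last_value = value
            self.last_change_at = now
            return None
        frozen_for = int(now - self.last_change_at)
        if self.window_s is not None and frozen_for > self.window_s:
            return (f"Tracked value stalled: {self.path} = {value} "
                    f"unchanged for {frozen_for}s (window {self.window_s}s)")
        return None


def rpc_error(reply: dict) -> Optional[str]:
    """The node's own error message, if the JSON-RPC reply carries one."""
    err = reply.get("error")
    if err is None:
        return None
    if isinstance(err, dict):
        return str(err.get("message", err))
    return str(err)


tracker = StallTracker(path="result", window_s=120)
for t in (0, 60, 120, 179):
    reason = tracker.observe(23735424, now=t)
print(reason)  # Tracked value stalled: result = 23735424 unchanged for 179s (window 120s)
```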

Alert channels

A channel is one way we can reach you. Set up your channels in Channels. We support 10 types:

Channel	Notes
Email	Free on every plan.
SMS	Pro+ plans. Tap the acknowledge link in the message to stop the cascade.
Voice call	Pro+ plans. We call you and read the alert. Press 1 to acknowledge.
Webhook	POST to a URL of your choice. See Webhook payload for the format.
Slack	Incoming webhook URL.
Discord	Channel webhook URL.
Telegram	Bot token + chat ID.
Microsoft Teams	Incoming webhook URL.
PagerDuty	Integration key (Events API v2).
ntfy	Topic on ntfy.sh or your self-hosted ntfy server.

Use the Test button on each channel to verify it's wired up before relying on it.

Alert chains

An alert chain is the sequence of channels we try when a monitor opens an incident. The cascade exists because a single channel can fail: email goes to spam, Slack is down, your phone is on silent. Multiple channels in sequence catch what a single channel misses.

How it works

When a monitor opens an incident, we trigger the first step of the chain. Each later step has a delay counted from the previous step; if nobody has acknowledged when that delay runs out, we trigger the step. The cascade continues until either someone acknowledges, or we run out of steps.

Example chain:

Slack: immediate
Email: wait 2 minutes, trigger if not acked
SMS to on-call: wait 5 more minutes, trigger if not acked
Voice call to on-call: wait 5 more minutes, trigger if not acked
PagerDuty: wait 10 more minutes, trigger if not acked (last resort)

If someone acks during step 2, we never proceed to steps 3, 4, or 5. The cascade halts the moment an ack arrives.
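The example chain's timing can be worked out directly: per-step delays add up to minutes since the incident opened, and an ack cuts off everything after it. A sketch under the assumption that a step scheduled at the same instant as the ack does not fire; `CHAIN`, `trigger_times`, and `steps_fired` are hypothetical.

```python
from typing import List, Optional, Tuple

# (channel, minutes to wait after the previous step before triggering)
CHAIN: List[Tuple[str, int]] = [
    ("Slack", 0),
    ("Email", 2),
    ("SMS to on-call", 5),
    ("Voice call to on-call", 5),
    ("PagerDuty", 10),
]


def trigger_times(chain: List[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Convert per-step delays into minutes since the incident opened."""
    elapsed = 0
    out = []
    for channel, delay in chain:
        elapsed += delay
        out.append((channel, elapsed))
    return out


def steps_fired(chain: List[Tuple[str, int]], ack_at: Optional[float]) -> List[str]:
    """Channels triggered before an ack at `ack_at` minutes (None = never acked)."""
    return [channel for channel, at in trigger_times(chain)
            if ack_at is None or at < ack_at]


print(steps_fired(CHAIN, ack_at=3))  # ['Slack', 'Email']
```

Unacknowledged, the steps land at minutes 0, 2, 7, 12, and 22.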

Assigning a chain to a monitor

Each monitor can be linked to one alert chain. From the monitor's detail page, choose a chain from the dropdown. You can reuse the same chain across many monitors.

Acknowledging alerts

Acknowledging an alert tells us "I've got this" and stops the cascade.

Email: click the acknowledge link in the email.
SMS: tap the acknowledge link in the message.
Voice call: press 1 on the keypad.
Slack / Discord / Teams: click the acknowledge button in the message.
Dashboard: click Acknowledge on the incident.

Acknowledging stops the cascade for the current incident only. If the same monitor goes down again later, a new incident opens and the chain starts fresh.

Status pages

A status page is a public URL showing the current up/down state of your monitors. Useful for customer-facing transparency during outages.

Creating a status page

Go to Status → New status page.
Give it a name and slug (used in the URL).
Pick which monitors appear on the page.
(Optional) Add a logo, custom title, and accent color.
Save. The page is live at https://status.failover.io/<page-id>/<slug>.

What's shown

For each monitor we show: current status, uptime percentage, and a history bar showing recent check results.

You can have multiple status pages: one for customers, one internal, one per product line. Each page has its own URL and can show a different selection of monitors.

Embedding a status page

Drop your status page into your own website with an iframe. From the Status pages list, click the Embed button to copy the snippet:

<iframe
  src="https://status.failover.io/<page-id>/<slug>"
  style="width: 100%; height: 100%; min-height: 600px; border: 0;">
</iframe>

The status page is responsive and fills its container. For a narrower embed, wrap the iframe in a container with the width you want.
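If you render pages server-side, a helper can emit the snippet with or without a width-limiting wrapper. A minimal sketch; `embed_snippet`, the `max_width_px` parameter, and the example page ID and slug are hypothetical, and the iframe markup mirrors the snippet above.

```python
from html import escape
from typing import Optional


def embed_snippet(page_id: str, slug: str, max_width_px: Optional[int] = None) -> str:
    """Return the iframe embed for a status page, optionally width-limited."""
    src = escape(f"https://status.failover.io/{page_id}/{slug}", quote=True)
    iframe = (
        "<iframe\n"
        f'  src="{src}"\n'
        '  style="width: 100%; height: 100%; min-height: 600px; border: 0;">\n'
        "</iframe>"
    )
    if max_width_px is None:
        return iframe
    # The iframe fills 100% of this wrapper, so the wrapper sets the width.
    return f'<div style="max-width: {int(max_width_px)}px;">\n{iframe}\n</div>'


print(embed_snippet("abc123", "acme", max_width_px=480))
```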

Team & roles

Team plans support multiple users in the same workspace. Invite teammates from Team with one of two roles:

Role	Can do
Member	View monitors, channels, incidents, status pages. Acknowledge incidents. Cannot create, edit, or delete.
Admin	Everything Member can do, plus create / edit / delete monitors, channels, alert chains, status pages. Cannot manage billing or invite other admins.

The workspace owner always has full access: billing, team management, account deletion. Owner is implicit, not a role you assign.

Only the owner can invite admins. Admins can invite members. Invites expire after 7 days. Pending invites count toward your plan's seat limit.
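The invite rules above, written as a check. A minimal sketch; `Role`, `can_invite`, and `has_free_seat` are hypothetical, and what counts as an occupied seat on your plan is left to the caller.

```python
from enum import Enum


class Role(Enum):
    MEMBER = "member"
    ADMIN = "admin"
    OWNER = "owner"  # implicit; never granted through an invite


def can_invite(inviter: Role, invitee: Role) -> bool:
    """Only the owner invites admins; the owner or an admin invites members."""
    if invitee is Role.ADMIN:
        return inviter is Role.OWNER
    if invitee is Role.MEMBER:
        return inviter in (Role.OWNER, Role.ADMIN)
    return False  # nobody is invited as owner


def has_free_seat(seat_limit: int, occupied_seats: int, pending_invites: int) -> bool:
    """Pending invites count toward the seat limit just like accepted ones."""
    return occupied_seats + pending_invites < seat_limit
```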

On-call schedules

An on-call schedule is a rotation of teammates who take turns being the alert target. Instead of always paging the same person, the schedule routes alerts to whoever is on duty right now.

How it works

Create a schedule from On-call. Add participants and define the rotation cadence (daily, weekly, custom).
Each participant has a phone number stored on the schedule.
Create a channel of type On-call SMS or On-call Voice pointing at the schedule.
Add that channel to an alert chain like any other channel.

When the chain triggers that step, we look up who's currently on-call and dispatch SMS or voice to that person's phone. The next time the chain triggers, it might be a different person, whoever is on the rota at that moment.
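The "who's on call right now" lookup can be sketched for a simple fixed-length rotation. This assumes shifts of equal length starting at a known moment and doesn't model custom cadences; `on_call` and the participant names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import List


def on_call(participants: List[str], rotation_start: datetime,
            shift: timedelta, now: datetime) -> str:
    """Return who is on call at `now` for a fixed-length rotation."""
    if not participants:
        raise ValueError("schedule has no participants")
    if now < rotation_start:
        raise ValueError("rotation has not started yet")
    shifts_elapsed = (now - rotation_start) // shift  # timedelta // timedelta -> int
    return participants[shifts_elapsed % len(participants)]


start = datetime(2026, 4, 27, 9, 0, tzinfo=timezone.utc)
team = ["ana", "ben", "chen"]
print(on_call(team, start, timedelta(weeks=1), start + timedelta(days=8)))  # ben
```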

Why use it

For a 3-person team rotating weekly: instead of three separate channels and three chain steps, you have one on-call channel and one chain step. The schedule decides who gets the alert.

Phone numbers are stored per schedule, not per user: when someone's number changes, you update it on the schedule, not on every channel or chain step that alerts them.

Webhook payload

Webhook channels POST a JSON payload to your URL. Example payload for an incident-open event:

{
  "event": "incident.opened",
  "incident_id": "01H...",
  "monitor": {
    "id": "44eac8a1-...",
    "name": "Production API",
    "type": "http",
    "url": "https://api.example.com/health"
  },
  "status": "down",
  "error": "Connection timeout after 5000ms",
  "started_at": "2026-04-28T14:32:11Z",
  "ack_url": "https://api.failover.io/ack/<token>"
}

For incident-resolved:

{
  "event": "incident.resolved",
  "incident_id": "01H...",
  "monitor": { ... },
  "status": "up",
  "started_at": "2026-04-28T14:32:11Z",
  "resolved_at": "2026-04-28T14:38:42Z",
  "duration_seconds": 391
}

We retry webhook delivery on 5xx responses with exponential backoff. We don't retry on 4xx: if your endpoint returns 400, we treat it as your decision to refuse the alert.
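A receiver should answer 2xx once it has the alert, reserve 5xx for problems a retry could fix, and use 4xx only when it truly wants the alert dropped. A minimal standard-library sketch; `handle_event`, `seconds_between`, and `WebhookHandler` are hypothetical, only the two event types shown above are handled explicitly, and the timestamp format assumes the UTC `Z` form in the examples.

```python
import json
from datetime import datetime
from http.server import BaseHTTPRequestHandler, HTTPServer  # HTTPServer: see last line
from typing import Tuple


def handle_event(payload: dict) -> Tuple[int, str]:
    """Pick the HTTP status to answer with and a log line for the event."""
    name = payload.get("monitor", {}).get("name", "unknown monitor")
    event = payload.get("event")
    if event == "incident.opened":
        return 200, f"DOWN {name}: {payload.get('error', '')}"
    if event == "incident.resolved":
        return 200, f"UP {name} after {payload.get('duration_seconds')}s"
    # Accept event types we don't recognise; a 4xx would mean "drop this alert".
    return 200, f"ignored event {event!r}"


def seconds_between(started_at: str, resolved_at: str) -> int:
    """Recompute an incident's duration from the payload's UTC timestamps."""
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    delta = datetime.strptime(resolved_at, fmt) - datetime.strptime(started_at, fmt)
    return int(delta.total_seconds())


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length))
            if not isinstance(payload, dict):
                raise ValueError("payload is not a JSON object")
            status, message = handle_event(payload)
        except ValueError as exc:
            # Malformed body: 4xx, so the sender won't retry. JSONDecodeError
            # and UnicodeDecodeError are both ValueError subclasses.
            status, message = 400, f"bad payload: {exc}"
        except Exception as exc:
            # Transient trouble in this receiver: 5xx asks for a retry with backoff.
            status, message = 503, f"temporary failure: {exc}"
        self.log_message("%s", message)
        self.send_response(status)
        self.end_headers()


# To serve on port 8080 (any port works):
# HTTPServer(("0.0.0.0", 8080), WebhookHandler).serve_forever()
```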

Billing & plans

Plans differ on number of monitors, check interval, channel types available, and team seats. See pricing for the current breakdown.

Changing plans

Go to Billing and click Change plan. Upgrades take effect immediately and prorate. Downgrades take effect at the end of the current billing cycle.

Cancelling

Cancel from the customer portal under Billing. Your subscription stays active until the end of the current period, and monitors keep running until then. After that, monitors stop and alerts pause, but your data and configuration stay intact in case you decide to come back.

Trouble with a payment?

If a payment fails, we'll retry it for one week and email you. If it still doesn't go through, your subscription is cancelled. Your monitors stop running and you'll lose access to alerts, but your data and configuration are kept. Pay for another month and everything resumes where you left off.

Still stuck?

Email us or use the contact form.