False positives—uptime monitor false alarms—are the fastest way to make monitoring useless. If your team is getting “DOWN” alerts when the site is fine, you’ll eventually do the worst possible thing: ignore alerts.
The good news: most false alerts are fixable with 5 settings. Once you tune them, you can reduce noise dramatically without losing real incident detection.
This guide explains the common causes of false positives in uptime monitoring, the configuration changes that fix them, and a practical troubleshooting checklist you can use every time an alert looks suspicious.
If you’re not sure how to interpret status codes and redirect behavior, start here: HTTP monitoring explained.
What is a “false positive” in uptime monitoring?
A false positive is any alert that says “The site is down” when, in reality:
- users can still access the site normally, or
- the issue is limited to the monitor (probe/network/tool config), not your site.
False positives create alert fatigue, which leads to slower responses and missed real incidents.
The 5 settings that fix most false alerts
If you want the shortest path to fewer false alarms, start here.
1) Increase timeout (within reason)
If your timeout is too aggressive, a brief slowdown becomes “down.”
Recommended starting point:
- Timeout: ~10 seconds for most websites
Raise it if you regularly see:
- timeouts during known peak traffic
- slow backend calls
- heavy pages (as a temporary measure while you slim them down)
Don’t set it so high that you delay detection of true outages.
2) Add retries (don’t alert on the first failure)
Many false positives are brief network blips. Retries solve that.
Recommended starting point:
- Retries: 2 (or require 2–3 consecutive failures)
This converts “one hiccup” into “confirmed problem.”
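The timeout and retry settings above can be sketched in a few lines. This is a minimal illustration, not any particular tool's implementation: `check_with_retries` and the `probe` callable are hypothetical names, and the probe is injected so the logic can be exercised without a network.

```python
import time
from typing import Callable


def check_with_retries(
    probe: Callable[[float], bool],
    timeout_s: float = 10.0,  # starting point suggested above
    retries: int = 2,         # extra attempts after the first failure
    delay_s: float = 0.0,     # pause between attempts
) -> bool:
    """Return True (up) if any attempt succeeds, False only if all fail."""
    attempts = 1 + retries
    for attempt in range(attempts):
        try:
            if probe(timeout_s):
                return True
        except Exception:
            # Any probe error (timeout, connection reset) counts as a failed attempt.
            pass
        if attempt < attempts - 1:
            time.sleep(delay_s)
    return False


# A hypothetical flaky probe: fails once, then succeeds.
outcomes = iter([False, True])
print(check_with_retries(lambda timeout_s: next(outcomes)))  # True
```

With `retries=2`, the target is reported down only after three failed attempts in a row, so a single blip never reaches the alerting stage.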
3) Use confirmation logic (especially with multi-region)
If your tool supports it, confirm downtime from a second check or region before alerting.
Recommended starting point:
- Alert only after 2 probes agree, or
- alert only after failures persist across multiple checks
This is the biggest “noise reducer” for teams with broad audiences.
More on region strategy: multi-location monitoring.
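One way to picture "alert only after failures persist across multiple checks" is a small streak counter. The sketch below is an assumption about how such logic could look; `FailureStreak` and its `threshold` field are illustrative names, not settings from a specific monitoring product.

```python
from dataclasses import dataclass


@dataclass
class FailureStreak:
    threshold: int = 2  # consecutive failures required before alerting
    streak: int = 0

    def record(self, check_passed: bool) -> bool:
        """Record one check result; return True exactly when an alert should fire."""
        if check_passed:
            self.streak = 0  # any success resets the streak
            return False
        self.streak += 1
        # Fire once, at the moment the threshold is reached, not on every later failure.
        return self.streak == self.threshold


tracker = FailureStreak(threshold=2)
print([tracker.record(ok) for ok in (False, True, False, False, False)])
# [False, False, False, True, False]
```

The isolated failure at the start never alerts, while the later run of failures alerts once when the second consecutive failure arrives.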
4) Follow redirects (or monitor the final canonical URL)
Redirect chains and loops can cause false downtime:
- monitor hits a redirect loop → timeout
- monitor expects 200 but receives 301/302 and flags it incorrectly
- monitor targets HTTP when site forces HTTPS
Fix:
- monitor the final canonical URL (usually HTTPS)
- ensure redirects are followed (or reduce redirect hops)
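To make the redirect failure modes concrete, here is a rough sketch of following redirects with loop detection and a hop limit. The `fetch` callable, the `resolve_redirects` name, the verdict strings, and the example URLs are all hypothetical; a real monitor would issue HTTP requests where this sketch reads from a dictionary.

```python
from typing import Callable, Optional, Tuple
from urllib.parse import urljoin

# fetch(url) returns (status_code, Location header or None)
Fetch = Callable[[str], Tuple[int, Optional[str]]]

REDIRECT_CODES = {301, 302, 303, 307, 308}


def resolve_redirects(fetch: Fetch, url: str, max_hops: int = 5) -> Tuple[str, int, str]:
    """Follow redirects and return (final_url, last_status, verdict)."""
    seen = {url}
    for _ in range(max_hops + 1):
        status, location = fetch(url)
        if status not in REDIRECT_CODES or not location:
            return url, status, "ok" if status == 200 else "unexpected_status"
        next_url = urljoin(url, location)  # resolve relative Location values
        if next_url in seen:
            return next_url, status, "redirect_loop"
        seen.add(next_url)
        url = next_url
    return url, status, "too_many_redirects"


# A simulated site: HTTP -> HTTPS -> www, ending in a 200.
site = {
    "http://example.com/": (301, "https://example.com/"),
    "https://example.com/": (301, "https://www.example.com/"),
    "https://www.example.com/": (200, None),
}
print(resolve_redirects(lambda u: site[u], "http://example.com/"))
# ('https://www.example.com/', 200, 'ok')
```

Pointing the monitor directly at the final URL this returns removes two hops from every check, which is the "monitor the canonical URL" fix in practice.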
5) Add a keyword check (to avoid “200 but wrong content”)
A status-only check misleads when the server returns 200 OK but the page is a maintenance page, bot-block page, login page, or cached error page. The monitor reports “up” while users see the wrong content, which is a false negative rather than a false positive.
A keyword check validates that the correct page content loaded.
Recommended starting point:
- Add one keyword check to your most important page (pricing, booking, login, or checkout)
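The idea behind a keyword check fits in one function. This is a simplified sketch: `content_ok` is a hypothetical name, and real tools may match case-sensitively or support regular expressions.

```python
def content_ok(status: int, body: str, keyword: str) -> bool:
    """Up only if the status is 200 AND the expected keyword appears in the body."""
    return status == 200 and keyword.casefold() in body.casefold()


# A maintenance page served with 200 fails the keyword check.
print(content_ok(200, "<h1>We'll be back soon</h1>", "Add to cart"))    # False
print(content_ok(200, "<button>Add to cart</button>", "Add to cart"))  # True
```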
The main causes of false positives (and what to do)
Cause 1: Transient network issues (the “blip” problem)
Symptoms:
- one-off timeouts
- single failed check then recovery
- failures only in one region
Fix:
- retries + confirmation logic
- don’t alert on a single failure
Cause 2: WAF/bot protection blocks (403/429)
Symptoms:
- monitor shows 403 Forbidden or 429 Too Many Requests
- real users can load the site normally
- failures may be region-specific
Fix options:
- allowlist monitoring IP ranges (if supported)
- relax WAF rules for monitoring probes
- reduce check frequency if you’re triggering rate limits
- add keyword checks (a WAF sometimes returns its block page with a 200 status)
Cause 3: TLS/SSL handshake issues
Symptoms:
- monitor reports SSL error, handshake failure, cert issues
- site works for you in a browser (until it doesn’t)
Common causes:
- expired certificate
- incomplete certificate chain
- hostname mismatch
- older clients rejected by TLS configuration
Fix:
- enable SSL monitoring (if available)
- renew/auto-renew certs
- verify full chain and correct hostname
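Expired certificates are the easiest of these causes to catch ahead of time. The sketch below computes the days remaining from a certificate's `notAfter` string using the standard library's `ssl.cert_time_to_seconds`. The function name `days_until_expiry` is an assumption, and the timestamp is passed in as a string so the example runs offline; in practice it could come from the `notAfter` field of `ssl.SSLSocket.getpeercert()`.

```python
import ssl
import time
from typing import Optional


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days left before a certificate's notAfter timestamp (negative if expired)."""
    expires = ssl.cert_time_to_seconds(not_after)  # seconds since the epoch, UTC
    now = time.time() if now is None else now
    return (expires - now) / 86400


# Illustrative timestamp in the format certificates use; already in the past.
remaining = days_until_expiry("Jan  5 09:34:43 2018 GMT")
print("expired" if remaining < 0 else f"{remaining:.0f} days left")  # expired
```

Alerting when the value drops below a chosen threshold (for example, 14 days) turns a surprise outage into a routine renewal task.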
Cause 4: Redirect loops or long redirect chains
Symptoms:
- “too many redirects”
- timeouts
- flapping between up/down
Fix:
- monitor the canonical destination URL
- simplify redirect rules
- avoid redirect loops involving trailing slashes, www/non-www, HTTP→HTTPS
For a deeper breakdown, see HTTP monitoring explained.
Region strategy: how to reduce false alarms without missing real outages
Regions are a double-edged sword:
- More regions can reveal real regional outages
- But alerting on any single-region failure can increase noise
Recommended approach (balanced)
- Use 2 regions for important services
- Alert only when 2 regions agree, or when failures persist (e.g., 2–3 consecutive checks)
- Treat single-region failures as “degraded/regional anomaly” unless your users heavily depend on that region
If your audience is global, multi-location monitoring is essential—but it must be paired with confirmation logic. See: multi-location monitoring.
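The balanced approach above amounts to a three-way classification. The following sketch assumes per-region results arrive as a simple mapping; `classify`, `min_agree`, the region names, and the state labels are illustrative choices rather than terms from a specific tool.

```python
from typing import Dict


def classify(region_up: Dict[str, bool], min_agree: int = 2) -> str:
    """Classify one round of per-region results."""
    failed = [region for region, up in region_up.items() if not up]
    if len(failed) >= min_agree:
        return "down"              # enough regions agree: alert
    if failed:
        return "regional_anomaly"  # single-region failure: log it, don't page
    return "up"


print(classify({"us-east": True, "eu-west": True}))    # up
print(classify({"us-east": False, "eu-west": True}))   # regional_anomaly
print(classify({"us-east": False, "eu-west": False}))  # down
```

If your users depend heavily on one region, lowering `min_agree` to 1 for that monitor restores single-region alerting at the cost of more noise.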
Keyword checks: choosing the right keyword (so it reduces noise)
Keyword checks only help if the keyword is stable.
Good keywords
- unique to the page
- present on every successful load
- not tied to dynamic content
Examples:
- a unique H1 (“Pricing”, “Checkout”, “Welcome back”)
- your brand name + a page-specific phrase
- a stable UI label (“Add to cart”, “Sign in”)
Bad keywords
- rotating promo text
- dates/times
- personalized names
- dynamic prices
- generic words like “Home” or “Welcome”
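A candidate keyword can be sanity-checked against saved copies of the page before you rely on it. This sketch assumes you have a few sample bodies of successful loads and of known failure pages; `keyword_is_usable` and the sample strings are hypothetical.

```python
from typing import Iterable


def keyword_is_usable(
    keyword: str, good_pages: Iterable[str], bad_pages: Iterable[str]
) -> bool:
    """A usable keyword is in every successful load and in no failure page."""
    in_every_good = all(keyword in page for page in good_pages)
    in_no_bad = not any(keyword in page for page in bad_pages)
    return in_every_good and in_no_bad


good = ["Welcome! <button>Add to cart</button>", "Welcome back <button>Add to cart</button>"]
bad = ["Welcome. We are down for maintenance."]

print(keyword_is_usable("Add to cart", good, bad))  # True
print(keyword_is_usable("Welcome", good, bad))      # False: also on the maintenance page
```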
Recommended settings list (copy/paste defaults)
Start here for most websites:
- Check type: HTTP(S) check for the homepage, plus a keyword check for a key page
- Interval: 5 minutes
- Timeout: 10 seconds
- Retries: 2
- Redirects: follow redirects (or monitor canonical URL)
- Regions: 2 for critical pages; 1 for low-priority
- Alerting: alert only on confirmed failures (never page on a single blip)
Then tighten intervals (1 minute) only for revenue-critical pages and only when you have noise under control.
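If you keep monitor settings in code, the defaults above could be captured roughly like this. The `MonitorDefaults` class and its field names are assumptions made for illustration; map them onto whatever your monitoring tool actually exposes.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class MonitorDefaults:
    check_type: str = "http"          # plus a keyword check on a key page
    interval_s: int = 300             # 5 minutes
    timeout_s: int = 10
    retries: int = 2
    follow_redirects: bool = True     # or monitor the canonical URL directly
    regions: int = 2                  # 1 is enough for low-priority pages
    alert_on_single_failure: bool = False


DEFAULT = MonitorDefaults()
# Tighten only the interval for revenue-critical pages; keep everything else.
REVENUE_CRITICAL = replace(DEFAULT, interval_s=60)

print(REVENUE_CRITICAL.interval_s, REVENUE_CRITICAL.retries)  # 60 2
```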
“If 403 then…” table (fast diagnosis for common false alerts)
| If your monitor shows… | Likely cause | Quick fix |
|---|---|---|
| 403 Forbidden | WAF/bot protection blocking probe | Allowlist monitor IPs; adjust WAF rules; add keyword check |
| 429 Too Many Requests | Rate limiting (monitor frequency too high or WAF threshold) | Reduce frequency; adjust WAF/rate limit; confirm with retries |
| SSL/TLS error | Cert expired/mismatch/chain issue | Fix cert + chain; enable SSL monitoring |
| Timeout | Too-short timeout, transient network, overloaded origin | Increase timeout, add retries, check origin load |
| 301/302 loop | Redirect misconfig (www/non-www, slash rules) | Monitor canonical URL; fix redirect rules; reduce redirect hops |
| 200 OK but wrong content | Maintenance, block, or login page served with a 200 status | Add keyword validation; choose stable keyword |
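The table translates naturally into a lookup you could attach to alert messages. This is a sketch under assumptions: `diagnose`, its parameters, and the error labels (`"ssl"`, `"timeout"`, `"redirect_loop"`) are hypothetical, and the returned hints simply restate the table.

```python
from typing import Optional


def diagnose(
    status: Optional[int] = None,
    error: Optional[str] = None,
    keyword_found: Optional[bool] = None,
) -> str:
    """Map what the monitor observed to a likely cause and a quick fix."""
    if error == "ssl":
        return "Cert expired/mismatch/chain issue: fix cert + chain"
    if error == "timeout":
        return "Short timeout, network blip, or overloaded origin: raise timeout, add retries"
    if error == "redirect_loop":
        return "Redirect misconfig: monitor the canonical URL, fix redirect rules"
    if status == 403:
        return "WAF/bot protection blocking probe: allowlist monitor IPs"
    if status == 429:
        return "Rate limiting: reduce check frequency, confirm with retries"
    if status == 200 and keyword_found is False:
        return "Wrong content page: add or fix keyword validation"
    return "No known false-positive pattern: treat as a real incident"


print(diagnose(status=403))
# WAF/bot protection blocking probe: allowlist monitor IPs
```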
Troubleshooting checklist (use this every time)
When an alert seems wrong, run this:
- Is it confirmed? (retries, consecutive failures)
- Is it regional? (one location or multiple?)
- What status code/error type is it? (403/429/5xx/timeout/SSL)
- Does the URL redirect? (loop/chain/unexpected destination)
- Is WAF/bot protection involved? (403/429 or block pages)
- Is the content correct? (keyword check passes?)
- Did anything change recently? (deploy, DNS, CDN, cert renewal)
If your alerting strategy needs cleanup beyond false positives, see alerts best practices.
Reduce alerts by configuring retries + a keyword check
If you want the fastest win today, do two things:
- Enable retries/confirmation so one blip can’t page you
- Add one keyword check to your most important page to prevent “wrong content” alerts
Configuring retries plus a keyword check is the highest-leverage way to eliminate false positives without losing real downtime detection.