{"id":5504,"date":"2026-01-07T09:40:20","date_gmt":"2026-01-07T17:40:20","guid":{"rendered":"https:\/\/www.sslshopper.com\/website-monitoring\/?p=5504"},"modified":"2026-01-07T09:40:23","modified_gmt":"2026-01-07T17:40:23","slug":"website-down-incident-response","status":"publish","type":"post","link":"https:\/\/www.sslshopper.com\/website-monitoring\/website-down-incident-response\/","title":{"rendered":"Website Down? What to Do in the First 30 Minutes"},"content":{"rendered":"\n<p><strong><mark style=\"background-color:var(--base)\" class=\"has-inline-color has-contrast-3-color\">[1,271 words, 7 minute read time]<\/mark><\/strong><\/p>\n\n\n\n<p>A \u201csite down\u201d alert triggers adrenaline for a reason: downtime threatens revenue, trust, and your sanity. But the fastest way to make an outage worse is to jump straight into random debugging.<\/p>\n\n\n\n<p><strong>Triage first, diagnose second, fix third.<\/strong><\/p>\n\n\n\n<p>This guide is a practical incident response playbook for small teams, on-call rotations, and solo owners. It tells you exactly what to do in the first 30 minutes, how to identify where the failure lives (DNS vs hosting vs app vs third-party), how to communicate, and how to run a simple post-incident review.<\/p>\n\n\n\n<p>If you\u2019re building the full alerting + response system, this is part of the <strong><a href=\"https:\/\/www.sslshopper.com\/website-monitoring\/downtime-alerts\/\">downtime alerts hub<\/a><\/strong>.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">The first 5 minutes: confirm + scope (don\u2019t skip this)<\/h2>\n\n\n\n<p>Your job in the first five minutes is not to solve the outage. 
It\u2019s to answer two questions:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Is it real?<\/strong><\/li>\n\n\n\n<li><strong>How big is it?<\/strong><\/li>\n<\/ol>\n\n\n\n<h3 class=\"wp-block-heading\">First 5 minutes checklist (copy\/paste)<\/h3>\n\n\n\n<p><strong>Confirm<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Check your monitoring dashboard: was the alert <strong>confirmed<\/strong> (retries\/confirmation), or a single blip?<\/li>\n\n\n\n<li>Verify independently:\n<ul class=\"wp-block-list\">\n<li>load the site from your browser <strong>and<\/strong> a second network (phone hotspot is perfect)<\/li>\n\n\n\n<li>check from another location\/tool if available<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li>Look at the error type: timeout, 5xx, DNS failure, SSL error, 403\/429, keyword mismatch<\/li>\n<\/ul>\n\n\n\n<p><strong>Scope<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>What\u2019s affected?\n<ul class=\"wp-block-list\">\n<li>homepage only, login, checkout, API, everything?<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li>Who\u2019s affected?\n<ul class=\"wp-block-list\">\n<li>one region or global?<\/li>\n\n\n\n<li>all users or only logged-in users?<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li>When did it start?\n<ul class=\"wp-block-list\">\n<li>note the start time and whether there was a recent deploy\/config change<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<p><strong>Declare<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Open an incident thread (Slack\/Teams\/text) and assign:\n<ul class=\"wp-block-list\">\n<li><strong>primary responder<\/strong><\/li>\n\n\n\n<li><strong>comms owner<\/strong> (even if that\u2019s you)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<p><strong>Start a simple incident log<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Timestamped notes:\n<ul class=\"wp-block-list\">\n<li>\u201c15:02 alert fired\u201d<\/li>\n\n\n\n<li>\u201c15:04 confirmed from hotspot\u201d<\/li>\n\n\n\n<li>\u201c15:06 suspect 
DNS\u201d<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<p>This prevents confusion later and makes your postmortem easy.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">Identify the layer: DNS vs hosting vs app vs third-party<\/h2>\n\n\n\n<p>Once the incident is confirmed and scoped, you want to identify the <em>layer<\/em> that\u2019s failing. Most outages map cleanly to one of these buckets.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Symptom \u2192 likely cause (quick table)<\/h3>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><th>Symptom<\/th><th>What it usually means<\/th><th>Where to look first<\/th><\/tr><\/thead><tbody><tr><td>Domain won\u2019t resolve \/ \u201csite can\u2019t be reached\u201d<\/td><td>DNS issue, domain expired, resolver problem<\/td><td>DNS provider, registrar, <a href=\"https:\/\/www.sslshopper.com\/website-monitoring\/dns-domain-monitoring\/\">DNS monitoring<\/a><\/td><\/tr><tr><td>TLS\/SSL warning in browser<\/td><td>Expired cert, chain mismatch, TLS config issue<\/td><td>Certificate renewal, CDN\/WAF TLS settings<\/td><\/tr><tr><td>500 errors<\/td><td>Application bug or misconfig<\/td><td>App logs, recent deploys, env vars<\/td><\/tr><tr><td>502\/504 gateway errors<\/td><td>Upstream (app server) failing behind proxy\/load balancer<\/td><td>Load balancer, origin health, app servers<\/td><\/tr><tr><td>503 errors<\/td><td>Overload, maintenance mode, dependency failure<\/td><td>Capacity, maintenance toggles, upstream services<\/td><\/tr><tr><td>Timeout<\/td><td>Overload, networking issue, DB stalls, deadlocks<\/td><td>Host metrics, DB, upstream dependencies<\/td><\/tr><tr><td>Only one region affected<\/td><td>CDN POP issue, routing\/ISP, regional DNS<\/td><td>CDN status, multi-location checks, DNS<\/td><\/tr><tr><td>Checkout\/login broken but homepage loads<\/td><td>Third-party (payments\/auth) or app flow bug<\/td><td>Payment\/auth provider, flow 
tests, recent changes<\/td><\/tr><tr><td>403\/429 in monitors but site \u201cworks\u201d<\/td><td>WAF\/bot protection blocking probes<\/td><td>WAF rules, allowlisting, monitor config<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\">DNS layer (often the sneakiest)<\/h3>\n\n\n\n<p>DNS issues create the classic \u201cworks for me\u201d problem because different resolvers and regions can behave differently.<\/p>\n\n\n\n<p>Check:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>DNS provider status<\/li>\n\n\n\n<li>recent DNS changes<\/li>\n\n\n\n<li>domain expiration and nameserver correctness<\/li>\n<\/ul>\n\n\n\n<p>If you want proactive prevention here, start with <strong><a href=\"https:\/\/www.sslshopper.com\/website-monitoring\/dns-domain-monitoring\/\">DNS monitoring<\/a><\/strong>.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Hosting \/ infrastructure layer<\/h3>\n\n\n\n<p>If DNS is fine but the site times out or returns 502\/503:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>check hosting provider status page<\/li>\n\n\n\n<li>check server health (CPU\/RAM\/disk)<\/li>\n\n\n\n<li>check whether your load balancer sees healthy upstreams<\/li>\n\n\n\n<li>consider whether you hit capacity (traffic spike, bot attack, background job storm)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Application layer<\/h3>\n\n\n\n<p>If you see 500s or a specific flow fails:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>correlate with <strong>recent deploy time<\/strong><\/li>\n\n\n\n<li>roll back quickly if the timing matches<\/li>\n\n\n\n<li>check logs for errors\/exceptions<\/li>\n\n\n\n<li>check DB connectivity and migrations<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Third-party dependencies<\/h3>\n\n\n\n<p>Common external causes:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>payment gateway outage<\/li>\n\n\n\n<li>auth\/OAuth provider issues<\/li>\n\n\n\n<li>critical API dependency latency<\/li>\n\n\n\n<li>CDN 
degradation<\/li>\n\n\n\n<li>email provider failures (for login\/verification flows)<\/li>\n<\/ul>\n\n\n\n<p>If your \u201csite is up\u201d but users can\u2019t complete critical actions, dependencies are a prime suspect.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">The next 10 minutes: stabilize (stop the bleeding)<\/h2>\n\n\n\n<p>Once you have a working hypothesis, prioritize <strong>mitigation<\/strong> over perfect diagnosis.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Fast stabilization moves (choose what fits)<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Roll back the last deploy<\/strong> (if the incident aligns with a change window)<\/li>\n\n\n\n<li>Disable a feature flag or revert a config toggle<\/li>\n\n\n\n<li>Scale up resources temporarily (compute\/database) if overloaded<\/li>\n\n\n\n<li>Bypass or degrade gracefully around a slow dependency<\/li>\n\n\n\n<li>Turn on a maintenance page <em>only if you need to protect data integrity<\/em><\/li>\n<\/ul>\n\n\n\n<p><strong>Guiding principle:<\/strong> restore service first, then investigate deeply.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">Rollback strategy basics (how to do it safely)<\/h2>\n\n\n\n<p>Rollbacks are one of the most effective \u201csmall team\u201d outage tools, provided they are done calmly.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">When to roll back<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>The outage began right after a deploy\/config change<\/li>\n\n\n\n<li>Error rates spiked immediately after release<\/li>\n\n\n\n<li>A single flow (login\/checkout) broke after a change<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">How to roll back (simple version)<\/h3>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Identify the last known good version (release tag\/commit\/build)<\/li>\n\n\n\n<li>Roll back <strong>one step<\/strong> (don\u2019t stack 
changes)<\/li>\n\n\n\n<li>Confirm recovery using monitors + real user checks<\/li>\n\n\n\n<li>Pause further deployments until stable<\/li>\n<\/ol>\n\n\n\n<h3 class=\"wp-block-heading\">Two rollback tips that save pain<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Don\u2019t debug in production first.<\/strong> Roll back to restore users, then debug with breathing room.<\/li>\n\n\n\n<li><strong>Write down what you changed.<\/strong> It helps you avoid reintroducing the issue later.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">Communication steps (internal + status page)<\/h2>\n\n\n\n<p>Communication isn\u2019t \u201cnice.\u201d It prevents chaos and reduces support load.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Internal communication (within 10 minutes)<\/h3>\n\n\n\n<p>Post a quick note:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>what\u2019s happening (symptom, not speculation)<\/li>\n\n\n\n<li>what\u2019s impacted<\/li>\n\n\n\n<li>who\u2019s owning the fix<\/li>\n\n\n\n<li>when the next update will be<\/li>\n<\/ul>\n\n\n\n<p>Example:<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>\u201cInvestigating: users seeing 503s on checkout. Confirmed in US-East and EU. @Sam owning. Next update in 15 minutes.\u201d<\/p>\n<\/blockquote>\n\n\n\n<h3 class=\"wp-block-heading\">Status page: when and how<\/h3>\n\n\n\n<p>If customers are affected and the incident isn\u2019t resolved quickly, use a status page. 
Start here: <strong><a href=\"https:\/\/www.sslshopper.com\/website-monitoring\/status-page-guide\/\">status pages<\/a><\/strong>.<\/p>\n\n\n\n<p>A good rule:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>If the incident affects customers and lasts longer than ~10\u201315 minutes, post a status update.<\/li>\n<\/ul>\n\n\n\n<p><strong>Status update template<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Status:<\/strong> Investigating \/ Identified \/ Monitoring \/ Resolved<\/li>\n\n\n\n<li><strong>Impact:<\/strong> who\/what is affected<\/li>\n\n\n\n<li><strong>Current state:<\/strong> brief and factual<\/li>\n\n\n\n<li><strong>Next update:<\/strong> time-bound commitment<\/li>\n<\/ul>\n\n\n\n<p>Avoid speculation. Be honest and short.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">Minutes 20\u201330: confirm recovery and prevent immediate relapse<\/h2>\n\n\n\n<p>Once you\u2019ve applied a fix or rollback:<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Recovery checklist<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Confirm monitors show <strong>UP<\/strong> across regions<\/li>\n\n\n\n<li>Confirm critical flows:\n<ul class=\"wp-block-list\">\n<li>login (if SaaS)<\/li>\n\n\n\n<li>checkout (if ecommerce)<\/li>\n\n\n\n<li>key API endpoint (if your product relies on an API)<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li>Watch for flapping (up\/down)<\/li>\n\n\n\n<li>Keep comms cadence until stable<\/li>\n<\/ul>\n\n\n\n<p>If it\u2019s \u201cup\u201d but slow, treat it as a performance incident (often a precursor to another outage).<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">Copy\/paste runbook (printable checklist)<\/h2>\n\n\n\n<p>Paste this into a doc and keep it somewhere obvious.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Website Outage Runbook (30 minutes)<\/h3>\n\n\n\n<p><strong>Roles<\/strong><\/p>\n\n\n\n<ul 
class=\"wp-block-list\">\n<li>Primary responder: ________<\/li>\n\n\n\n<li>Comms owner: ________<\/li>\n\n\n\n<li>Backup\/escalation: ________<\/li>\n<\/ul>\n\n\n\n<p><strong>Links<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Monitoring dashboard: ________<\/li>\n\n\n\n<li>Hosting provider status: ________<\/li>\n\n\n\n<li>DNS provider\/registrar: ________<\/li>\n\n\n\n<li>Deploy\/CI pipeline: ________<\/li>\n\n\n\n<li>Logs\/APM: ________<\/li>\n\n\n\n<li>Status page: ________<\/li>\n<\/ul>\n\n\n\n<p><strong>0\u20135 minutes (Confirm + Scope)<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Confirm alert (retries\/regions)<\/li>\n\n\n\n<li>Verify from second network\/location<\/li>\n\n\n\n<li>Identify affected services\/pages<\/li>\n\n\n\n<li>Open incident channel + assign roles<\/li>\n\n\n\n<li>Start incident log (timestamps)<\/li>\n<\/ul>\n\n\n\n<p><strong>5\u201315 minutes (Identify layer)<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>DNS vs hosting vs app vs third-party<\/li>\n\n\n\n<li>Check provider <a href=\"https:\/\/www.sslshopper.com\/website-monitoring\/status-page-guide\/\">status pages<\/a><\/li>\n\n\n\n<li>Check recent changes (deploy\/config\/DNS)<\/li>\n<\/ul>\n\n\n\n<p><strong>15\u201330 minutes (Mitigate + Communicate)<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Roll back if change-correlated<\/li>\n\n\n\n<li>Scale or fail over if overloaded<\/li>\n\n\n\n<li>Post internal update + next update time<\/li>\n\n\n\n<li>Post status page update if customer impact persists<\/li>\n\n\n\n<li>Confirm recovery across regions + critical flow test<\/li>\n<\/ul>\n\n\n\n<p><strong>After recovery<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Capture timeline and root cause<\/li>\n\n\n\n<li>Create action items (prevention + detection + docs)<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">Post-incident review template (keep it 
lightweight)<\/h2>\n\n\n\n<p>A good postmortem isn\u2019t about blame; it\u2019s about preventing repeat incidents.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Template (copy\/paste)<\/h3>\n\n\n\n<p><strong>Incident summary:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>What happened (plain English)<\/li>\n<\/ul>\n\n\n\n<p><strong>Timeline:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Start time:<\/li>\n\n\n\n<li>Detection time:<\/li>\n\n\n\n<li>Mitigation time:<\/li>\n\n\n\n<li>Resolution time:<\/li>\n<\/ul>\n\n\n\n<p><strong>Impact:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Who was affected and how (regions, pages, users)<\/li>\n\n\n\n<li>Revenue\/support impact (if known)<\/li>\n<\/ul>\n\n\n\n<p><strong>Root cause:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Primary cause:<\/li>\n\n\n\n<li>Contributing factors:<\/li>\n<\/ul>\n\n\n\n<p><strong>What went well:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>(e.g., fast rollback, good comms)<\/li>\n<\/ul>\n\n\n\n<p><strong>What didn\u2019t go well:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>(e.g., unclear ownership, missing alerts, false positives)<\/li>\n<\/ul>\n\n\n\n<p><strong>Action items:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Prevention (fix underlying issue)<\/li>\n\n\n\n<li>Detection (add\/adjust monitors, keyword checks, regions)<\/li>\n\n\n\n<li>Response (update runbook, escalation)<\/li>\n\n\n\n<li>Owner + due date for each item<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">Print or save the checklist<\/h2>\n\n\n\n<p>You don\u2019t want to invent a process during an outage.<\/p>\n\n\n\n<p>Print or save the <strong>first 30 minutes checklist<\/strong> and the <strong>runbook template<\/strong> somewhere your team can find them in seconds. That way you\u2019ll triage first, diagnose second, and fix 
third.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>[1,271 words, 7 minute read time] A \u201csite down\u201d alert triggers adrenaline for a reason: downtime threatens revenue, trust, and your sanity. But the fastest way to make an outage worse is to jump straight into random debugging. Triage first, diagnose second, fix third. This guide is a practical incident response playbook for small teams, &#8230; <a title=\"Website Down? What to Do in the First 30 Minutes\" class=\"read-more\" href=\"https:\/\/www.sslshopper.com\/website-monitoring\/website-down-incident-response\/\" aria-label=\"Read more about Website Down? What to Do in the First 30 Minutes\">Read more<\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[110],"tags":[],"class_list":["post-5504","post","type-post","status-publish","format-standard","hentry","category-alerts"],"_links":{"self":[{"href":"https:\/\/www.sslshopper.com\/website-monitoring\/wp-json\/wp\/v2\/posts\/5504","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.sslshopper.com\/website-monitoring\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.sslshopper.com\/website-monitoring\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.sslshopper.com\/website-monitoring\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.sslshopper.com\/website-monitoring\/wp-json\/wp\/v2\/comments?post=5504"}],"version-history":[{"count":2,"href":"https:\/\/www.sslshopper.com\/website-monitoring\/wp-json\/wp\/v2\/posts\/5504\/revisions"}],"predecessor-version":[{"id":5572,"href":"https:\/\/www.sslshopper.com\/website-monitoring\/wp-json\/wp\/v2\/posts\/5504\/revisions\/5572"}],"wp:attachment":[{"href":"https:\/\/www.sslshopper.com\/website-monitoring\/wp-json\/wp\/v2\/media?parent=5504"}],"wp:term":[{"taxonomy":"categor
y","embeddable":true,"href":"https:\/\/www.sslshopper.com\/website-monitoring\/wp-json\/wp\/v2\/categories?post=5504"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.sslshopper.com\/website-monitoring\/wp-json\/wp\/v2\/tags?post=5504"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}