The XML sitemap is the file at /sitemap.xml on most websites, and it is the single piece of SEO infrastructure that goes wrong most often without anyone noticing. Sitemaps are not glamorous. They are not visible to visitors. They contain no copy worth reading. They are also one of the highest-leverage technical-SEO assets a site has, and a broken or missing one is a quiet drag on every page's discoverability.

This post is the version of the sitemap conversation I want my clients to have read. What it is, what it does, what I do with it on every build, and what to check on your own site.

What an XML sitemap actually is

An XML sitemap is a file that lists the public URLs on a website, along with optional metadata about each one (last modified date, change frequency, relative priority). The format is a small XML schema published by sitemaps.org and supported by every major search engine. Of the optional fields, the last modified date is the one that matters in practice: Google has said it ignores change frequency and priority.

A minimal sitemap entry looks like this:

<url>
  <loc>https://example.com/services/roof-repair/</loc>
  <lastmod>2026-04-22</lastmod>
</url>

A complete sitemap is a wrapper around as many of those entries as the site has pages, up to a per-file limit of 50,000 URLs (large sites split into multiple files indexed by a sitemap index file). For a typical service-business site of fifteen to fifty pages, the whole file is a few kilobytes.
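To make the wrapper concrete, here is a minimal Python sketch that turns a list of URLs into a complete sitemap file and, for large sites, splits the list and produces a sitemap index. The function names are illustrative assumptions, not part of any particular generator; only the namespace and the 50,000-URL limit come from the sitemap protocol.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XML_DECLARATION = '<?xml version="1.0" encoding="UTF-8"?>\n'
MAX_URLS_PER_FILE = 50_000  # per-file limit from the sitemap protocol


def build_urlset(entries):
    """Wrap (loc, lastmod) pairs in a <urlset> and return the XML as text.

    lastmod may be None, because the field is optional.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        if lastmod:
            ET.SubElement(url, "lastmod").text = lastmod
    return XML_DECLARATION + ET.tostring(urlset, encoding="unicode")


def split_into_files(entries, limit=MAX_URLS_PER_FILE):
    """Chunk the entries so that no single sitemap file exceeds the limit."""
    return [entries[i:i + limit] for i in range(0, len(entries), limit)]


def build_sitemap_index(sitemap_urls):
    """Build the index file that points at each individual sitemap file."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for loc in sitemap_urls:
        sitemap = ET.SubElement(index, "sitemap")
        ET.SubElement(sitemap, "loc").text = loc
    return XML_DECLARATION + ET.tostring(index, encoding="unicode")
```

A fifteen-page site only ever needs build_urlset; the other two functions matter once a site grows past the per-file limit.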

What it actually does

The sitemap is a hint to search engines about which URLs the site considers canonical and worth indexing. It is not a directive: search engines crawl pages they discover through links regardless of whether those pages are in the sitemap, and they may decline to index pages that are in the sitemap. The sitemap accelerates discovery and helps the crawler prioritize, but it does not override the crawler's judgment.

For new sites, the sitemap is how Google finds pages quickly that would otherwise take weeks to discover through link-following. For established sites, the sitemap is how Google notices when an existing page has been updated, since the <lastmod> timestamp tells the crawler the page is worth a fresh visit.

For sites with internal pages that are not linked from the homepage (deep service-area pages, individual blog posts, older posts buried at the tail of a paginated archive), the sitemap is often how those pages get discovered at all.

What I do with sitemaps on every build

Every site I build ships with a sitemap.xml. The file is generated automatically at build time, not maintained by hand, and it stays current through every content update without anyone having to remember to update it.

Concretely, the build pipeline does five things:

  1. Auto-generate the sitemap from the page index. Every published page in the build's collection is included by default. Pages marked as noindex in their front matter (thank-you pages, private dashboards, hidden landing pages) are excluded. The script handles the canonical-URL logic.
  2. Set the lastmod from each page's front matter date. When a page is updated, its date can be bumped or left as the original publish date depending on whether the change is meaningful. Trivial typo fixes do not bump the date; substantive edits do.
  3. Reference the sitemap from robots.txt. Search engine crawlers fetch /robots.txt routinely before crawling a site, and the Sitemap: directive in robots.txt tells them where to find the sitemap without needing manual submission.
  4. Submit the sitemap to Google Search Console at launch. Manual one-time step, takes thirty seconds, and in my experience can make the difference between a sitemap Google notices in a day versus a week.
  5. Add a human-readable sitemap at /sitemap/. Same content, formatted for visitors who want to navigate the site by index. The XML is for crawlers; the HTML version is for the small number of humans who genuinely use sitemaps to find pages.

Everything except the one-time Search Console submission runs on every deploy. There is no ongoing manual maintenance burden, and the sitemap stays correct as long as the build is correct.
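The automated parts of that list can be sketched in a few lines of Python. This is an illustration under assumptions, not the actual build script: the Page fields (path, published, updated, noindex, draft) are hypothetical stand-ins for whatever a given site generator exposes from front matter, and the draft flag in particular is my addition.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Page:
    # Hypothetical front-matter fields; real generators name these differently.
    path: str                        # site-relative, e.g. "/services/roof-repair/"
    published: date
    updated: Optional[date] = None   # set only when an edit is substantive
    noindex: bool = False
    draft: bool = False


def sitemap_entries(pages, base_url):
    """Return sorted (loc, lastmod) pairs for every published, indexable page."""
    base = base_url.rstrip("/")
    entries = []
    for page in pages:
        if page.draft or page.noindex:
            continue  # thank-you pages, dashboards, hidden landing pages
        # Typo fixes leave `updated` unset, so lastmod stays at the publish date.
        lastmod = (page.updated or page.published).isoformat()
        entries.append((base + page.path, lastmod))
    return sorted(entries)


def robots_txt(base_url):
    """Emit a permissive robots.txt that advertises the sitemap location."""
    base = base_url.rstrip("/")
    return f"User-agent: *\nAllow: /\n\nSitemap: {base}/sitemap.xml\n"
```

Because the entries are derived from the same page collection the build renders, a page cannot be published without also landing in the sitemap, which is the property that removes the maintenance burden.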

The most common ways sitemaps go wrong

I have audited enough small-business sites to see the same five sitemap failures repeat.

The sitemap is missing. The site has no /sitemap.xml at all. Search engines have to discover every page through link-following, which on a deep site can take weeks. I see this most often on older Wix and Squarespace sites where sitemap generation was switched off or broken.

The sitemap exists but is not in robots.txt. The file is there but the crawler does not know where to find it. Adding the Sitemap: line to robots.txt takes thirty seconds and meaningfully accelerates discovery.
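For anyone who prefers to check this with a script, Python's standard library can read the Sitemap: lines out of a robots.txt body. The sketch below assumes the robots.txt text has already been downloaded; the wrapper function name is mine, while RobotFileParser and site_maps() are part of urllib.robotparser (Python 3.8 or later).

```python
from urllib.robotparser import RobotFileParser


def sitemaps_declared(robots_txt_text):
    """Return the sitemap URLs declared in a robots.txt body (empty list if none)."""
    parser = RobotFileParser()
    parser.parse(robots_txt_text.splitlines())
    # site_maps() returns None when no Sitemap: line is present.
    return parser.site_maps() or []
```

An empty list from this function is the failure described above: the sitemap may exist, but robots.txt does not point to it.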

The sitemap lists URLs that no longer exist. Pages were renamed or deleted but the sitemap still references the old paths. Crawlers hit 404s every time they try to index those URLs, which wastes crawl budget and signals an unhealthy site.
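A rough way to catch this is to request every URL in the sitemap and flag anything that does not come back as 200. The sketch below is one possible approach, not a standard tool: find_dead_urls takes the status lookup as a parameter so it can be exercised without network access, and http_status is an assumed helper built on urllib.

```python
import urllib.error
import urllib.request
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_locs(sitemap_xml):
    """Extract every <loc> value from a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS)]


def http_status(url):
    """Return the final HTTP status for a URL, or None if it is unreachable.

    urlopen follows redirects, so a redirected URL reports its destination's status.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code
    except urllib.error.URLError:
        return None


def find_dead_urls(sitemap_xml, get_status=http_status):
    """Return (url, status) pairs for sitemap URLs that do not answer 200."""
    dead = []
    for url in sitemap_locs(sitemap_xml):
        status = get_status(url)
        if status != 200:
            dead.append((url, status))
    return dead
```

Some servers reject HEAD requests, so a non-200 result is a prompt to look at the URL by hand rather than proof that the page is gone.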

The sitemap omits half the site. A WordPress plugin generated the sitemap once at install time and never updated it. New blog posts, new service pages, new service-area pages all live outside the sitemap and take much longer to be discovered.
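Spotting this kind of drift is a set comparison between what the sitemap lists and what the site actually publishes. A minimal sketch, assuming both lists are already available as exact URL strings (for example from a crawl or a CMS export); the function name and result keys are illustrative.

```python
def compare_sitemap_to_site(sitemap_urls, published_urls):
    """Report URLs that are on the site but not in the sitemap, and the reverse."""
    in_sitemap = set(sitemap_urls)
    on_site = set(published_urls)
    return {
        "missing_from_sitemap": sorted(on_site - in_sitemap),
        "stale_in_sitemap": sorted(in_sitemap - on_site),
    }
```

The comparison is exact, so trailing-slash or http/https differences show up as mismatches, which is usually worth knowing about anyway.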

The sitemap includes URLs that should be private. Login pages, admin paths, search-result pages, paginated archive pages, thank-you pages, and other operational URLs that should not appear in search results all end up in the sitemap because the generator was not configured to exclude them. Google sometimes indexes them anyway.
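Most generators solve this with an exclusion list. The sketch below shows the idea in Python; the patterns are hypothetical examples for a WordPress-style URL layout and would need adjusting to the paths a real site uses.

```python
import re
from urllib.parse import urlsplit

# Hypothetical exclusion patterns, matched against the URL path.
EXCLUDE_PATTERNS = [
    r"^/wp-admin/",      # admin paths
    r"^/login/?$",       # login page
    r"^/thank-you/?$",   # post-form confirmation page
    r"/page/\d+/?$",     # paginated archive pages
]


def is_indexable(url):
    """Return True if the URL belongs in the sitemap under the patterns above."""
    parts = urlsplit(url)
    if parts.query:
        # Assumption: URLs with a query string are search results or tracking variants.
        return False
    return not any(re.search(pattern, parts.path) for pattern in EXCLUDE_PATTERNS)
```

Keeping a URL out of the sitemap does not stop it from being indexed, so pages that must stay out of search results still need a noindex tag of their own.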

What to check on your own site

Three quick checks any site owner can run in five minutes:

1. Does the sitemap exist? Visit https://yoursite.com/sitemap.xml in a browser. You should see a structured XML response, not a 404 or a redirect. If the file does not exist, that is the first thing to fix.

2. Is the sitemap referenced from robots.txt? Visit https://yoursite.com/robots.txt and look for a line that says Sitemap: https://yoursite.com/sitemap.xml. If the line is missing, search engines have to find the sitemap through Search Console submission rather than through the standard discovery mechanism.

3. Is the sitemap up to date? Open the sitemap in a browser and look at the dates. If the most recent <lastmod> is from years ago, the sitemap is stale. If the URLs in the sitemap do not match the URLs actually on the site, the generator is broken.
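The third check is easy to script once a sitemap has more entries than is comfortable to read by eye. A small Python sketch, assuming the sitemap text is already downloaded and that every lastmod value starts with a full YYYY-MM-DD date (the protocol also allows year-only and year-month values, which this does not handle); the function names are illustrative.

```python
import xml.etree.ElementTree as ET
from datetime import date

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def newest_lastmod(sitemap_xml):
    """Return the most recent <lastmod> date in the sitemap, or None if there are none."""
    root = ET.fromstring(sitemap_xml)
    dates = [
        date.fromisoformat(element.text.strip()[:10])  # drop any time component
        for element in root.findall("sm:url/sm:lastmod", NS)
        if element.text
    ]
    return max(dates) if dates else None


def days_since_last_update(sitemap_xml, today):
    """Return how many days old the newest <lastmod> is, or None if unknown."""
    newest = newest_lastmod(sitemap_xml)
    return None if newest is None else (today - newest).days
```

A result measured in years, or None on a site that publishes regularly, points at a generator that ran once and never again.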

For deeper inspection, Google Search Console's Sitemaps report shows whether each submitted sitemap was read successfully and how many URLs Google discovered in it. From there, the Page indexing report filtered to that sitemap shows which URLs are indexed and which are not, with a stated reason for each URL that was left out. That is the diagnostic data I work with most often when investigating indexing issues.

What to do with the findings

If the sitemap is missing entirely, the fix depends on the platform.

Custom-coded sites: The sitemap is part of the build pipeline. If it is missing, the pipeline is missing a step. (For sites I build, this never happens; the sitemap is a build artifact tested on every deploy.)

WordPress: Yoast SEO and Rank Math both generate sitemaps automatically. The fix is usually installing one or the other and confirming the sitemap renders. Avoid relying on multiple SEO plugins simultaneously; they sometimes conflict and produce broken sitemaps.

Wix and Squarespace: Both platforms generate sitemaps automatically. If yours is missing or broken, the platform's settings panel has a toggle. Older Squarespace versions had genuine bugs here; Squarespace 7.1 is reliable.

Shopify: Built-in, no configuration needed. If a Shopify sitemap is broken, that is a platform-level bug worth a support ticket.

If the sitemap exists but is stale or contains broken URLs, the regeneration mechanism needs investigating. On WordPress, that usually means clearing the SEO plugin's cache. On a custom-coded site, that means checking whether the build is actually running on every deploy.

What a working sitemap should look like

For reference, the sitemap on this site (pikespeakwebdesigns.com/sitemap.xml) follows the standard pattern: every published page included with a current lastmod, all noindex pages excluded, all blog posts present, all service-area pages present, all seasonal landing pages present. It is regenerated on every build and currently lists about 200 URLs, a count that shifts as content is added.

If your site's sitemap looks meaningfully different from that pattern (much shorter than your actual page count, much longer because of stale URLs, missing entirely), that is a real opportunity for improvement.

If you want me to look at your specific site, the free five-point audit covers sitemap configuration as part of the standard pass. The audit returns a written report with the specific issues and recommended fixes; if the issues are minor, you can address them on whatever platform you are on. If they are not, the rebuild conversation is the next step.
