Astro SEO
Audits and improves the SEO setup of an Astro site against the full stack described in Astro SEO: the definitive guide. The skill covers nine areas (technical foundation, structured data, content, site structure, performance, sitemaps and indexing, agent discovery, redirects, and analytics) and produces drop-in code for anything missing or weak.
The opinionated spine of this skill is
. Most of the fixes route through it. If the project doesn't use it yet, installing it is the first recommendation.
Code recipes live in — read it when you need to implement a specific fix. This file has the workflow and audit checklist.
Workflow
- Detect the project — confirm this is an Astro site and understand its shape.
- Audit — score nine categories and produce actionable findings.
- Improve — generate or modify files to close the gaps. Recipes are in .
- Metadata pass — invoke on every short string the skill generated (titles, descriptions, schema fields, FAQ answers, frontmatter excerpts).
- Verify — run the build, check validations pass, remind the user about non-file tasks (Search Console, Bing Webmaster Tools, IndexNow key verification).
Phase 0: Detect the project
Confirm the basics before auditing:
- / exists.
- has as a dependency.
- is set in — canonicals, sitemaps, and OG image URLs all derive from this. If it's missing, empty, or , flag it as a blocking issue before anything else. This is the single most common misconfiguration.
- Content collections in (or legacy markdown).
- Deployment target — read , , , , or to determine the host. This drives redirect and header syntax in Phase 2.
- Is already installed? If yes, record the version and which features are wired (grep for , , , , , , , , ). Check the installed version against the latest on npm with
npm view @jdevalk/astro-seo-graph version
If the project is behind, recommend an upgrade in Phase 2 before auditing feature gaps: the package ships new defaults and fixes regularly, and an outdated version is a plausible cause for any audit finding. Phase 2 branches on this.
- Is the site multilingual? Check for in or multiple locale directories under . If yes, hreflang matters; if no, skip it.
Ask only what you can't detect. Don't ask the user what the site is about — read
and the homepage content.
Phase 1: Audit
Score each category out of 10. For each, give 2–4 specific findings that quote the actual code or config. Within each category, checks are tiered:
- Must — ship blockers. A failure here causes visible SEO regression.
- Should — standard practice. Skipping costs reach.
- Nice — forward-looking or situational. Useful but not baseline for every site.
Skip Nice checks for small personal blogs unless the user asks for the full treatment.
1. component and head metadata (/10)
- Must — single component for all head metadata (not scattered across layouts).
- Must — in is set to the production origin.
- Must — canonical URLs derived from config with tracking params stripped.
- Must — canonical omitted when is true (per Google's recommendation).
- Must — fallback chain for missing SEO fields: seo.title → title → siteName for titles, and seo.description → excerpt → first paragraph for descriptions. Pages with blank titles or descriptions are the most common symptom of a broken fallback.
- Should — meta includes , , .
- Should — Twitter tags suppressed when they duplicate Open Graph (Twitter falls back automatically).
- Should — alternates present on multilingual sites. Skip if monolingual.
- Nice — uses 's component rather than hand-rolled. (Hand-rolled that covers everything above is fine; this skill nudges toward the package because it handles the fallback chain and robots rules by default.)
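The canonical and fallback checks above can be sketched in plain TypeScript. This is a minimal illustration, not the package's implementation: the function names, the `PageMeta` shape, and the list of tracking parameters are all assumptions made for the example.

```typescript
// Query parameters treated as tracking noise. This list is an assumption;
// extend it to match the campaigns the site actually runs.
const TRACKING_PARAMS = /^(utm_[a-z]+|fbclid|gclid)$/i;

// Build a canonical URL from the configured site origin and a request path,
// dropping tracking parameters and any fragment.
function canonicalUrl(site: string, pathWithQuery: string): string {
  const url = new URL(pathWithQuery, site);
  for (const key of Array.from(url.searchParams.keys())) {
    if (TRACKING_PARAMS.test(key)) url.searchParams.delete(key);
  }
  url.hash = "";
  return url.toString();
}

// Hypothetical page shape, used only for this sketch.
interface PageMeta {
  seo?: { title?: string; description?: string };
  title?: string;
  excerpt?: string;
  body?: string;
}

// Return the first candidate that is neither undefined nor blank.
function firstNonBlank(...candidates: (string | undefined)[]): string | undefined {
  return candidates
    .map((c) => c?.trim())
    .find((c) => c !== undefined && c.length > 0);
}

// First paragraph of the body: everything before the first blank line.
function firstParagraph(body: string | undefined): string | undefined {
  return body?.trim().split(/\n\s*\n/)[0];
}

// seo.title → title → siteName
function resolveTitle(page: PageMeta, siteName: string): string {
  return firstNonBlank(page.seo?.title, page.title) ?? siteName;
}

// seo.description → excerpt → first paragraph
function resolveDescription(page: PageMeta): string {
  return (
    firstNonBlank(page.seo?.description, page.excerpt, firstParagraph(page.body)) ?? ""
  );
}
```

A blank string counts as missing here, which is the case a naive `a || b` chain tends to get wrong when a field contains only whitespace.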
2. Structured data / JSON-LD graph (/10)
- Single flat object, or a linked with multiple entities?
- Entities wired with references?
- , /, /, /, , all present where relevant?
- Trust signals: , , , , ?
- Validates in Rich Results Test and ClassySchema?
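To make the linked-graph check concrete, here is a small sketch of a JSON-LD graph whose entities point at each other through `@id`, plus a helper that lists references with no matching entity. The helper name, the `GraphNode` type, and the example.com identifiers are hypothetical; this illustrates the idea of a dangling-reference check and is not the package's validator.

```typescript
type GraphNode = { "@id"?: string; [key: string]: unknown };

// Walk any JSON value and collect every reference, where a reference is an
// object whose only key is "@id".
function collectReferences(value: unknown, out: Set<string>): void {
  if (Array.isArray(value)) {
    value.forEach((item) => collectReferences(item, out));
    return;
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const keys = Object.keys(obj);
    if (keys.length === 1 && keys[0] === "@id" && typeof obj["@id"] === "string") {
      out.add(obj["@id"]);
      return;
    }
    keys.forEach((key) => collectReferences(obj[key], out));
  }
}

// Return the sorted list of referenced @ids that no entity in the graph defines.
function danglingReferences(graph: GraphNode[]): string[] {
  const defined = new Set<string>();
  for (const node of graph) {
    const id = node["@id"];
    if (typeof id === "string") defined.add(id);
  }
  const referenced = new Set<string>();
  collectReferences(graph, referenced);
  return Array.from(referenced).filter((id) => !defined.has(id)).sort();
}

// Example graph: the article links to a Person that exists and to an
// Organization that was never added.
const exampleGraph: GraphNode[] = [
  { "@type": "WebSite", "@id": "https://example.com/#website", name: "Example" },
  { "@type": "Person", "@id": "https://example.com/#person", name: "Jane Doe" },
  {
    "@type": "BlogPosting",
    "@id": "https://example.com/blog/post/#article",
    isPartOf: { "@id": "https://example.com/#website" },
    author: { "@id": "https://example.com/#person" },
    publisher: { "@id": "https://example.com/#organization" },
  },
];

console.log(danglingReferences(exampleGraph));
// → ["https://example.com/#organization"]
```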
3. Content collections and SEO schema (/10)
- Content collections defined with Zod schemas?
- from enforcing title (5–120) and description (15–160) lengths?
- Required fields (, , ) enforced at build time?
- Markdown-stripped exposed in schema endpoints (up to 10K chars)?
4. Open Graph images (/10)
- Every page has an OG image, or many missing?
- 1200×675 (Google Discover minimum 1200px wide, 16:9 ratio)?
- Generated at build time via satori + sharp, or manual?
- JPEG (social platforms don't reliably support WebP/AVIF)?
- Route derives OG URL from the slug automatically?
- Every in rendered HTML has an attribute (or / for decorative images)? on catches this at build time in ≥ 1.1.0.
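The size check above is simple arithmetic: 1200×675 is exactly 16:9 because 1200 × 9 = 675 × 16 = 10800. A minimal sketch of that check follows; the function name is hypothetical.

```typescript
// True when the image is at least 1200px wide and exactly 16:9.
// Cross-multiplying integers avoids floating-point rounding in the ratio test.
function isValidOgSize(width: number, height: number): boolean {
  const wideEnough = width >= 1200;
  const is16by9 = width * 9 === height * 16;
  return wideEnough && is16by9;
}
```

Under this check 1200×630 fails (it is not 16:9), and 1024×576 fails on width even though its ratio is correct.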
5. Sitemaps and indexing (/10)
- Must — installed, sitemap index reachable.
- Must — references the sitemap index.
- Must — RSS feed exists (), advertised via
<link rel="alternate" type="application/rss+xml">
and contains full post content (not truncated excerpts).
- Should — split per-collection via option (, etc.) — much easier to debug indexing in GSC.
- Should — populated from git commit timestamps, not frontmatter dates or CI file timestamps. ≥ 1.4.0 exports
gitLastmod(filePath, { excludeCommits, depth })
for this — it returns the committer date of the most recent non-excluded commit that touched the file, or if git is unavailable. Wire it into the sitemap callback. If the codebase has a hand-rolled helper, replace it with the package export — the export handles bulk-commit exclusion (imports, reformats) which a naïve lookup can't.
- Should — IndexNow integrated and submitting on each build, with key verification route at . ≥ 1.0.1 excludes from submissions by default. Gate submission on the production host (e.g.
process.env.CF_PAGES === '1' && CF_PAGES_BRANCH === 'main'
, VERCEL_ENV === 'production'
, ). Unconditional submission pings the endpoint on every local and preview deploy with URLs the production host hasn't served yet, which gets the key marked invalid (403) and forces rotation.
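The production gate for IndexNow can be sketched as a pure function over the environment, which keeps it easy to test. It covers only the two conditions spelled out above (Cloudflare Pages and Vercel); the function name is hypothetical, and other hosts would need their own condition.

```typescript
type Env = Record<string, string | undefined>;

// True only for production builds on the hosts this sketch knows about.
function isProductionBuild(env: Env): boolean {
  const cloudflareProd = env["CF_PAGES"] === "1" && env["CF_PAGES_BRANCH"] === "main";
  const vercelProd = env["VERCEL_ENV"] === "production";
  return cloudflareProd || vercelProd;
}

// Typical call site (assumes a Node build environment):
//   if (isProductionBuild(process.env)) { /* submit URLs to IndexNow */ }
```

Local builds and preview deploys fall through to `false`, so nothing is submitted unless a production condition matches explicitly.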
6. Agent discovery (/10)
- Should — schema endpoints () exposing corpus-wide JSON-LD.
- Should — schema map () listing all endpoints, with directive in .
- Should — at the site root listing pages (title + description) for LLM consumers. ≥ 0.9.0 generates this via the integration option.
- Should — markdown-alternate URLs ( next to ) serving clean markdown with YAML frontmatter for AI agents to consume without HTML parsing. ≥ 1.2.0 ships for the route and a integration option that emits
<link rel="alternate" type="text/markdown">
on every page. ≥ 1.3.0 adds post-build verification: the integration walks the build output and strips any link whose target isn't on disk (with a per occurrence) — so a misconfigured endpoint surfaces as a build warning instead of a shipped 404. Pair with a Cloudflare Transform Rule (URL rewrite via , no header — CF strips custom Vary values) for content negotiation on without needing SSR.
- Should — API catalog at per RFC 9727. ≥ 1.4.0 ships , which auto-types schema endpoints to their
https://schema.org/<Type>
URLs and absolutizes paths against . List the schema endpoints, schemamap, and any site-specific APIs (, , etc.). The RFC is finalized so the format is stable; adoption is early but the cost is one route file.
- Should — Content Signals directive in (e.g.
Content-Signal: ai-train=yes, search=yes, ai-input=yes
). Declares preferences for AI training, search grounding, and AI input use independently of crawl access. The spec is an IETF draft and adoption is early, but it's one line in a file every site already has.
- Should — header on pointing to discovery files (sitemap, llms.txt, api-catalog, schemamap, and any MCP / A2A cards the site publishes). Agents reading response headers find them without parsing HTML. On Cloudflare Pages / Netlify this is ; on Vercel it's .
- Nice — MCP server card at /.well-known/mcp/server-card.json (SEP-1649) and/or A2A agent card at /.well-known/agent-card.json (A2A protocol). Only relevant when the site actually exposes an MCP server or A2A agent. Both are static JSON files declaring the endpoint, capabilities, and skills. Skip otherwise.
- Nice — pointing to a conversational endpoint. NLWeb is early; the tag is one line and worth having, but it's not a scoring blocker in 2026.
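As an illustration of the Content Signals check, the sketch below renders a robots.txt body that carries the directive and a sitemap reference. The function names, the placement of the directive under the `User-agent` group, and the sitemap URL passed in are assumptions for the example; check the current draft of the spec before relying on the exact layout.

```typescript
type Signal = "yes" | "no";

interface ContentSignals {
  aiTrain: Signal;
  search: Signal;
  aiInput: Signal;
}

// Render the single Content-Signal line from the three preferences.
function contentSignalLine(signals: ContentSignals): string {
  return `Content-Signal: ai-train=${signals.aiTrain}, search=${signals.search}, ai-input=${signals.aiInput}`;
}

// Render a minimal robots.txt: one wildcard group plus the sitemap reference.
function renderRobotsTxt(sitemapIndexUrl: string, signals: ContentSignals): string {
  return [
    "User-agent: *",
    contentSignalLine(signals),
    "Allow: /",
    "",
    `Sitemap: ${sitemapIndexUrl}`,
    "",
  ].join("\n");
}
```

Calling `renderRobotsTxt` with all three signals set to `"yes"` produces the same directive as the example in the checklist. The signals express usage preferences only; crawl access is still governed by the `Allow` and `Disallow` rules.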
7. Performance (/10)
- Static output by default (no SSR on pages that don't need it)?
- Zero client-side JS unless an island requires it?
- Astro component used for all content images (responsive srcset, WebP, lazy, async)?
- Primary web font preloaded in woff2?
- with
defaultStrategy: 'viewport'
for prefetch?
- Hashed assets under serve
Cache-Control: public, max-age=31536000, immutable
?
- response header stripping UTM parameters from cache key?
8. Redirects and error handling (/10)
- (or equivalent) maintained for every URL that ever existed and moved?
- 301 not 302 for permanent moves?
- component from wired into the 404 page?
- 404 page itself returns a 404 status, not 200?
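One way to think about fuzzy redirects on a 404 page is nearest-match lookup over the list of known paths. The sketch below uses plain Levenshtein edit distance for that; it is a swapped-in technique chosen for illustration, not a description of how the package's component works, and the function names and the default threshold of 3 are assumptions.

```typescript
// Classic Levenshtein edit distance, keeping only two rows of the table.
function levenshtein(a: string, b: string): number {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1, // deletion
        curr[j - 1] + 1, // insertion
        prev[j - 1] + cost, // substitution
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Return the known path closest to the requested one, or undefined when
// nothing is within maxDistance edits.
function closestPath(
  requested: string,
  knownPaths: string[],
  maxDistance = 3,
): string | undefined {
  let best: string | undefined;
  let bestDistance = maxDistance + 1;
  for (const candidate of knownPaths) {
    const distance = levenshtein(requested.toLowerCase(), candidate.toLowerCase());
    if (distance < bestDistance) {
      best = candidate;
      bestDistance = distance;
    }
  }
  return best;
}
```

The threshold matters: without it, every mistyped URL would be sent to some page, which hides genuine 404s instead of surfacing them.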
9. Build-time validation and content quality (/10)
- Must — integration running on each build with and enabled. For JSON-LD validation, pass
warnOnDanglingReferences: true
to in — that's the assembly-time check, not an integration option.
- Should — , , and enabled on (all default in ≥ 1.1.0). They catch missing alt text, titles or descriptions outside SERP bounds (defaults: title 30–65, description 70–200), and internal links that 404 or hit a trailing-slash mismatch. Upgrade to ≥ 1.1.1 if the project is on 1.1.0 — that patch release fixes two validator bugs: now recognises assets as valid targets (no more false positives on or ), and no longer truncates descriptions containing a raw apostrophe. Use only for SSR-only routes, wildcards, and params.
- Should — broken link checker in CI for external links. A lychee GitHub Action on every push to content files catches dead links before they go live; a weekly scheduled run catches link rot as external sites move or disappear. Broken outbound links are bad UX and a negative trust signal. Internal links are covered by at build time; lychee handles everything else.
- Should — SEO strings (titles, descriptions, FAQ answers) audited for metadata quality — front-loading, concreteness, truncation fit, no title/description duplication. Phase 2.5 chains this in via . Individual post prose can be audited separately via .
Phase 2: Improve
Based on the audit, produce the concrete code. Always ask before overwriting.
Read for detailed recipes.
Branch on the Phase 0 findings. If
is already installed, skip the install step and focus on wiring the features the audit flagged as missing (IndexNow, FuzzyRedirect, schema endpoints, build validation). If the user has a hand-rolled setup that already satisfies the Must checks in category 1, don't rip it out; add only what's missing. Replacement is a last resort, not the default.
sections: Install/upgrade, Integration config, BaseHead.astro, Content collection schema, Sitemap + git lastmod, OG image route, Schema endpoints, llms.txt, Markdown alternates, API catalog, Content Signals in robots.txt, MCP and A2A discovery cards, Link headers for agent discovery, RSS feed, Redirects + FuzzyRedirect, Performance headers, Broken link checker in CI.
Phase 2.5: Metadata and readability pass
Invoke the
skill on every short string the skill generated or modified: page titles, meta descriptions, schema
fields, FAQ answers, and any blog post frontmatter
values you wrote. It checks front-loading, concreteness, filler, active voice, title/description duplication, difficult words, SERP-truncation fit (title 30–65, description 70–200 — the same bounds
enforces), and one-idea-per-field. Apply the fixes directly. Skip the pass entirely for technical strings (URLs, schema
values, enum values).
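The truncation-fit and duplication checks in this pass can be approximated with a few lines of code. The sketch below uses the bounds quoted above (title 30–65, description 70–200); the function name and message wording are hypothetical, and it covers only the mechanical checks, not front-loading or concreteness.

```typescript
interface Bounds {
  min: number;
  max: number;
}

const TITLE_BOUNDS: Bounds = { min: 30, max: 65 };
const DESCRIPTION_BOUNDS: Bounds = { min: 70, max: 200 };

// True when the value's length lies outside the inclusive bounds.
function outOfBounds(value: string, bounds: Bounds): boolean {
  return value.length < bounds.min || value.length > bounds.max;
}

// Return a list of human-readable problems; an empty list means the pair passes.
function checkMetadata(title: string, description: string): string[] {
  const t = title.trim();
  const d = description.trim();
  const problems: string[] = [];
  if (outOfBounds(t, TITLE_BOUNDS)) {
    problems.push(`title length ${t.length} is outside ${TITLE_BOUNDS.min}-${TITLE_BOUNDS.max}`);
  }
  if (outOfBounds(d, DESCRIPTION_BOUNDS)) {
    problems.push(
      `description length ${d.length} is outside ${DESCRIPTION_BOUNDS.min}-${DESCRIPTION_BOUNDS.max}`,
    );
  }
  if (t.toLowerCase() === d.toLowerCase()) {
    problems.push("title and description are identical");
  }
  return problems;
}
```

Lengths are counted in UTF-16 code units, which is close enough for a first pass but can overcount strings containing emoji or other characters outside the Basic Multilingual Plane.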
If the project has a blog or docs content collection, mention to the user as a follow-up that the
skill can audit individual posts for multi-paragraph prose quality — but don't audit the entire content corpus yourself.
Phase 3: Verify
- Run . If is wired, this also runs H1 validation, duplicate-meta detection, and schema validation — surface any warnings.
- Spot-check the built HTML: one page's should now be clean, canonical correct, JSON-LD graph present and linked.
- Run the homepage through Rich Results Test and ClassySchema.
- Confirm exists and references per-collection sitemaps.
- If IndexNow is wired, confirm the key verification route returns the key at .
- Remind the user about tasks that can't be automated:
- Register the site in Google Search Console and Bing Webmaster Tools.
- Submit the sitemap index in both.
- Generate an IndexNow key and commit it to config.
- Install Plausible or equivalent privacy-friendly analytics.
Output format
markdown
## Astro SEO audit: [site name]
### Score
| Category | Score |
|---------------------------------------|------:|
| 1. `<Seo>` component and head | x/10 |
| 2. Structured data / JSON-LD graph | x/10 |
| 3. Content collections and schema | x/10 |
| 4. Open Graph images | x/10 |
| 5. Sitemaps and indexing | x/10 |
| 6. Agent discovery | x/10 |
| 7. Performance | x/10 |
| 8. Redirects and error handling | x/10 |
| 9. Build-time validation and content | x/10 |
| **Total** | xx/90 |
### Findings
[Grouped by category. Quote actual code/config. Be specific.]
### Files generated or changed
[List with short description of each.]
### Next steps
[Non-file tasks: GSC, Bing Webmaster Tools, IndexNow key generation, analytics.]
Key principles
- Opinionated defaults over optionality. The guide picks a stack; this skill applies it. Don't offer five alternatives when one works.
- is the spine. Route the component, schema endpoints, IndexNow, FuzzyRedirect, and build validation through it unless the user has a strong reason to hand-roll.
- Topics, not keyphrases. When reviewing content, focus on topical coverage and readability, not keyword density.
- Static, CDN-served HTML is the baseline. Don't add SSR to solve problems that static builds don't have in the first place.
- Agent discovery matters now. Schema endpoints, schema map, NLWeb tags — the crawler is no longer the only consumer.