BetterAISearch
Technical|7 min read

AI crawlability audit: what Writesonic found when they tested 62 webpage elements across 6 AI crawlers — and what it means for your site

BetterAISearch Editorial Team
BetterAISearch

JSON-LD, the format Google recommends for structured data, scored zero out of six in Writesonic's readability test: none of the six major AI crawlers tested could read it. If your AI optimisation strategy is built on schema markup, this study changes the calculation.

0 / 6
AI crawlers that could read JSON-LD structured data
Writesonic, 62 webpage elements tested across 6 AI crawlers, March 2026.

What the study tested

In March 2026, Writesonic published a systematic test of 62 webpage elements across six major AI crawlers: GPTBot (OpenAI), ClaudeBot (Anthropic), PerplexityBot, Gemini crawler (Google), Meta-ExternalAgent, and Applebot. For each element, they scored whether the crawler could reliably extract the information. The score was the number of crawlers out of six that read the element accurately.

The results differ sharply from what most AI optimisation advice assumes.

What AI crawlers cannot read

The worst-performing category was metadata that never appears in the visible page: JSON-LD in script tags, and Open Graph, Twitter card, and meta description values in meta tags. JSON-LD scored zero. Open Graph tags scored zero. Meta descriptions scored zero. Twitter card tags scored zero. These are among the most commonly recommended technical SEO elements, and none of them appear to be inputs that AI crawlers are using.

JavaScript-rendered content scored one out of six. Pages where the body content is populated by React, Vue, or similar frameworks, and is empty in the HTML source, were unreadable to most crawlers. This has significant implications for single-page applications and any site that relies on client-side rendering for its primary content.

| Element type | AI crawler readability (out of 6) | Notes |
| --- | --- | --- |
| Title tag | 5 / 6 | Only metadata element with strong cross-crawler readability |
| Visible body text | 6 / 6 | Primary input for all crawlers; must be in HTML source |
| H1 / H2 / H3 headings | 6 / 6 | Semantic heading structure strongly readable |
| Alt text (images) | 4 / 6 | Varies by crawler; GPTBot and PerplexityBot strongest |
| JSON-LD structured data | 0 / 6 | Not read by any tested crawler |
| Meta description | 0 / 6 | Not extracted by AI crawlers in tests |
| Open Graph tags | 0 / 6 | Not extracted by AI crawlers in tests |
| JS-rendered content | 1 / 6 | Most AI crawlers do not execute JavaScript |

Source: Writesonic, 62 webpage elements, 6 AI crawlers, March 2026 (abridged)

What AI crawlers do read

Visible body text scored six out of six. Every crawler tested could read the text that appears on the page for a human reader. This is the primary content channel for AI retrieval.

Semantic heading structure scored six out of six. H1, H2, and H3 tags were reliably parsed across all crawlers. This confirms what content structure research suggests: clear heading hierarchy is not just a user experience signal, it is a machine-readability signal for AI systems.

The title tag scored five out of six — the only metadata element with meaningful AI crawler readability. Meta-ExternalAgent (Meta's crawler) was the exception. For all other major crawlers, the title tag is the one metadata element you should ensure is accurate and descriptive.

Image alt text scored four out of six. GPTBot and PerplexityBot showed the strongest alt text readability. For image-heavy content, alt text remains a worthwhile signal — but not for all crawlers.
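The title tag and alt text findings above can be checked mechanically. The following is a minimal sketch using only Python's standard library; the names `TitleAltAudit` and `audit_title_and_alt` are hypothetical and not part of the Writesonic study. It reports the page title and any images whose alt attribute is missing or empty.

```python
from html.parser import HTMLParser


class TitleAltAudit(HTMLParser):
    """Collects the <title> text and the src of every <img> lacking alt text."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.images_missing_alt = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "img":
            attr_map = dict(attrs)
            alt = (attr_map.get("alt") or "").strip()
            if not alt:
                # Missing and empty alt are treated the same in this sketch.
                self.images_missing_alt.append(attr_map.get("src") or "(no src)")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def audit_title_and_alt(html):
    """Return the stripped title and a list of image sources without alt text."""
    parser = TitleAltAudit()
    parser.feed(html)
    parser.close()
    return {
        "title": parser.title.strip(),
        "images_missing_alt": parser.images_missing_alt,
    }
```

An empty alt attribute is legitimate for purely decorative images, so treat the list as candidates to review rather than as errors.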

What a proper AI crawlability audit should check

A traditional technical SEO audit focuses on crawl errors, redirect chains, canonical tags, and structured data validation. An AI crawlability audit has a different checklist.

1. Robots.txt permissions for AI crawlers

The first check is whether your robots.txt is blocking AI crawlers. The major crawlers and their user-agent strings are listed below. Any disallow rule that matches these agents prevents those AI systems from indexing your content for potential citations.

robots.txt — allow all AI crawlers
User-agent: GPTBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: anthropic-ai
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: Meta-ExternalAgent
Allow: /
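To confirm that an existing robots.txt does not block these agents, a short script can help. This is a minimal sketch using Python's standard `urllib.robotparser`; the function name `blocked_ai_agents` and the default URL are illustrative assumptions. It takes the robots.txt contents as a string, so it can be run against a local copy of the file.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens taken from the robots.txt example above.
AI_USER_AGENTS = [
    "GPTBot",
    "ChatGPT-User",
    "ClaudeBot",
    "anthropic-ai",
    "PerplexityBot",
    "Google-Extended",
    "Meta-ExternalAgent",
]


def blocked_ai_agents(robots_txt, url="https://example.com/"):
    """Return the AI user agents that this robots.txt forbids from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [ua for ua in AI_USER_AGENTS if not parser.can_fetch(ua, url)]


if __name__ == "__main__":
    example = "User-agent: GPTBot\nDisallow: /\n\nUser-agent: *\nAllow: /\n"
    print(blocked_ai_agents(example))
```

Python's parser applies rules in file order, which may differ from how an individual crawler resolves conflicting Allow and Disallow lines, so treat the result as a first pass rather than a guarantee.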

2. Server-side vs client-side rendering

If your primary content is rendered by JavaScript, check whether AI crawlers can access it. The simplest test is to view the page source (Ctrl+U in Chrome) and search for your key body text. If the text is not present in the raw HTML, most AI crawlers are not reading it.

Next.js, Nuxt, and similar frameworks with server-side rendering (SSR) or static generation (SSG) produce HTML that AI crawlers can read. Client-side-only rendering (CSR) does not.
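The view-source check can be automated for a list of key phrases. The sketch below is one possible approach, assuming you already have the raw HTML as a string (for example, saved from view-source); `RawHtmlText` and `missing_from_source` are hypothetical names. It ignores anything inside script and style tags, because text that exists only in JavaScript is not present as page content in the source.

```python
from html.parser import HTMLParser


class RawHtmlText(HTMLParser):
    """Collects text from raw HTML, skipping script and style contents."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def missing_from_source(html, key_phrases):
    """Return the phrases that do not appear in the raw HTML text."""
    parser = RawHtmlText()
    parser.feed(html)
    parser.close()
    # Normalise whitespace and case so line breaks in the markup do not matter.
    text = " ".join(" ".join(parser.chunks).split()).lower()
    return [p for p in key_phrases if " ".join(p.split()).lower() not in text]
```

An empty result means every phrase was found in the source. Any phrase that is returned is probably being injected by client-side JavaScript and is worth checking by hand.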

3. Heading hierarchy and content structure

Each page should have exactly one H1 that describes the primary topic. H2s should represent major subtopics. H3s should represent subdivisions within H2 sections. Heading text should be descriptive and include relevant topic terms — AI crawlers use headings to understand content structure in the same way users do.

An AirOps analysis of 815,484 AI-cited pages found that pages with 7 to 20 subheadings achieved higher citation rates than pages with fewer or more subheadings. Over-structuring and under-structuring both reduce citation probability.
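These heading rules lend themselves to an automated check. The following is a rough sketch, not a definitive linter: `HeadingCollector` and `audit_headings` are hypothetical names, and counting every H2 and H3 as a "subheading" is an assumption, since the AirOps definition is not given here. The 7 to 20 range is exposed as parameters so it can be adjusted.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Records (level, text) for every H1, H2, and H3 in document order."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self._level = None
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "h2", "h3"):
            self._level = int(tag[1])
            self._buffer = []

    def handle_data(self, data):
        if self._level is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if self._level is not None and tag == f"h{self._level}":
            text = " ".join("".join(self._buffer).split())
            self.headings.append((self._level, text))
            self._level = None


def audit_headings(html, min_sub=7, max_sub=20):
    """Return a list of human-readable issues; an empty list means no issues."""
    collector = HeadingCollector()
    collector.feed(html)
    collector.close()
    levels = [level for level, _ in collector.headings]
    issues = []

    h1_count = levels.count(1)
    if h1_count != 1:
        issues.append(f"expected exactly one H1, found {h1_count}")

    # A jump such as H1 -> H3 skips a level of the hierarchy.
    for prev, cur in zip(levels, levels[1:]):
        if cur > prev + 1:
            issues.append(f"H{prev} is followed directly by H{cur}")

    # Assumption: "subheadings" means every H2 and H3 on the page.
    sub_count = sum(1 for level in levels if level > 1)
    if not min_sub <= sub_count <= max_sub:
        issues.append(f"{sub_count} subheadings, outside the {min_sub}-{max_sub} range")

    for level, text in collector.headings:
        if not text:
            issues.append(f"empty H{level}")
    return issues
```

The check covers structure only. Whether heading text is descriptive still needs a human read.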

4. Crawl rate and server stability

Some AI crawlers honour the non-standard crawl-delay directive in robots.txt and may back off when a server returns a Retry-After header, but support varies by crawler, so check each operator's documentation. If your server responds slowly or intermittently, crawlers may skip pages or fail to complete a crawl. Check your server logs for GPTBot, ClaudeBot, and PerplexityBot access patterns to confirm they are completing successful crawls.
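The log check can be scripted. The sketch below assumes access logs in the common "combined" format used by default in Apache and Nginx; if your server logs in a different layout, the regular expression will need adjusting. `crawler_status_counts` is a hypothetical helper that tallies HTTP status codes per crawler by matching the user-agent string.

```python
import re
from collections import Counter, defaultdict

# Crawlers named in the paragraph above; extend as needed.
AI_CRAWLERS = ("GPTBot", "ClaudeBot", "PerplexityBot")

# Combined Log Format tail: "METHOD path PROTO" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'"[A-Z]+ \S+ [^"]*" (?P<status>\d{3}) \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def crawler_status_counts(lines):
    """Return {crawler: {status_code: hits}} for the AI crawlers found in lines."""
    counts = defaultdict(Counter)
    for line in lines:
        match = LOG_LINE.search(line)
        if not match:
            continue  # Skip lines that are not in the assumed format.
        user_agent = match.group("ua").lower()
        for crawler in AI_CRAWLERS:
            if crawler.lower() in user_agent:
                counts[crawler][match.group("status")] += 1
    return {crawler: dict(statuses) for crawler, statuses in counts.items()}
```

A high share of 5xx or 429 responses for a crawler suggests it is not completing its crawl. User-agent strings can be spoofed, so verify surprising traffic against the IP ranges the crawler operator publishes, where available.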

5. Content accessibility — no login walls, no paywalls

AI crawlers do not authenticate. Any content behind a login, paywall, or cookie consent gate that prevents page load is invisible to them. If you want AI systems to cite specific content, that content must be accessible without authentication.

What schema markup still does

JSON-LD not being read by AI crawlers does not mean schema markup is worthless. It means the value is in a different channel.

Schema markup benefits Google AI Overviews indirectly, because Google AI Overviews builds on Google Search infrastructure that does read structured data. For AI Overviews specifically, Article and Person schema remain relevant. For ChatGPT, Claude, and Perplexity, the evidence suggests schema is not a direct input to their retrieval systems.

The more productive reframe: visible content signals are the universal AI crawlability layer. Schema markup is a Google-specific amplifier. Both have value; they operate in different channels.

The bottom line

AI crawlability is fundamentally different from traditional search crawlability. The technical elements that matter most are the ones visible to human readers: body text, heading structure, title tags, and clean HTML. The metadata layer that traditional SEO builds on — JSON-LD, Open Graph, meta descriptions — does not appear to reach most AI crawlers.

Run an AI crawlability audit that starts with what crawlers can actually read, not what traditional SEO tooling tells you to add. The two audits have significant overlap in outcomes — both reward structured, accessible, clearly attributed content — but the checklist is different.

Frequently asked questions

What is an AI crawlability audit?

An AI crawlability audit reviews your website to determine how effectively AI crawlers, including GPTBot (OpenAI), ClaudeBot (Anthropic), PerplexityBot, and Google's Gemini crawler, can access, parse, and extract content from your pages. Unlike a traditional SEO technical audit, it focuses on the elements that AI retrieval systems actually read: visible page content, title tags, semantic HTML structure, and robots.txt allowances. Metadata that traditional crawlers rely on (JSON-LD, Open Graph, meta descriptions) scored zero out of six in the Writesonic readability test.

Can AI crawlers read JSON-LD schema markup?

No. A Writesonic study published in March 2026 tested 62 webpage elements across 6 major AI crawlers. JSON-LD scored zero out of six for readability: none of the tested AI crawlers reliably extracted information from JSON-LD structured data. This is a significant finding because JSON-LD is the recommended implementation format for schema markup and is widely included on pages as an answer engine optimisation (AEO) signal. The evidence suggests it does not directly influence AI crawler behaviour.

What technical elements do AI crawlers actually read?

The Writesonic study found that the title tag was the only metadata element with strong cross-crawler readability, scoring five out of six. Visible page content — the text a human user sees in their browser — is the primary input for all AI crawlers tested. Semantic HTML structure (proper H1, H2, H3 hierarchy), descriptive anchor text, and clean paragraph formatting also showed positive readability signals. JavaScript-rendered content and metadata embedded in scripts were consistently low-scoring.

Which AI crawlers should I allow in robots.txt?

The major AI crawlers and their robots.txt user-agent strings are: GPTBot (OpenAI), ClaudeBot (Anthropic), PerplexityBot (Perplexity), ChatGPT-User (OpenAI browsing), anthropic-ai (Anthropic), Google-Extended (Google AI training), and Meta-ExternalAgent (Meta). Blocking these agents prevents the respective AI systems from indexing your content for potential citations. If you want your content considered for AI search citations, these agents should be allowed.

Does page speed affect AI crawlability?

Indirectly. AI crawlers typically crawl at lower rates than Googlebot, but slow or unstable server response times can cause crawlers to skip pages entirely. More critically, JavaScript-heavy pages that require rendering before content is accessible present a significant AI crawlability risk — most AI crawlers do not execute JavaScript, meaning dynamically loaded content is invisible to them.

About the author


The BetterAISearch team synthesises peer-reviewed studies, platform documentation, and independent research into actionable, scored tactics.