Semantic HTML for AI: 12-Item Checklist

WHY SEMANTIC HTML MATTERS FOR AI.

AI fetchers are aggressive at extracting content but conservative about ambiguity. When a page has clean semantic structure (proper <article>, <section>, descriptive headings), the bot reliably identifies the body, the navigation, and the metadata. When a page is <div> soup, the bot guesses, sometimes incorrectly, and the citation goes to a competitor whose content is easier to extract with confidence.

The 12-item checklist below covers the items that move citation rate. It is not the full HTML semantics spec; it is the working subset that matters for AI extraction.

THE CHECKLIST.

Twelve items in priority order:

1. One <h1> per page. Reflects the page's primary topic. Not a logo. Not a section heading.
2. Heading hierarchy is monotonic. H1 -> H2 -> H3. No skipping levels on the way down (stepping back up to start a new section is fine). No styling-driven swaps (e.g., <h2> used for visual size with no logical role).
3. Body content in <article> or <main>. Not in <div class="content">. The semantic tag is the hint to the parser.
4. Sections in <section>. Each section has its own heading. Sections with no heading are aside content, not sections.
5. Navigation in <nav>. So the parser can ignore it for content extraction.
6. Header / footer in <header> / <footer>. Same reason.
7. Each page has a unique <meta name="description">. Not the site default duplicated everywhere.
8. Open Graph completeness. As described in article N. 04, plus an og:type that matches the page type (article, website, product).
9. Images have alt attributes. Descriptive, not "image123.jpg".
10. Links have descriptive anchor text. Not "click here". The anchor is a signal of what the destination is about.
11. Lists in <ul> / <ol>. Not <br>-separated paragraphs.
12. Code blocks in <pre><code>. Inline code in <code>. Helps AI distinguish prose from technical content.
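Several of these items can be checked mechanically. The following is a minimal sketch, not a complete audit tool: it uses only Python's standard html.parser and covers items 1, 3, 5, 7, 9, and 10 for a single page. The names ChecklistAudit, audit, and GENERIC_ANCHORS are hypothetical, and the list of "generic" anchor texts is an assumption you would tune for your own site. Item 7's uniqueness across pages cannot be judged from one page, so the sketch only checks that a description is present.

```python
from html.parser import HTMLParser

# Assumption: anchor texts treated as non-descriptive. Adjust for your site.
GENERIC_ANCHORS = {"click here", "here", "read more", "more", "link"}


class ChecklistAudit(HTMLParser):
    """Collects the counts needed for a handful of checklist items."""

    def __init__(self):
        super().__init__()
        self.counts = {}              # tag name -> number of opening tags
        self.images_missing_alt = 0   # <img> with no alt attribute at all
        self.generic_links = []       # anchor texts such as "click here"
        self.has_description = False
        self._link_text = None        # text buffer while inside an <a>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        self.counts[tag] = self.counts.get(tag, 0) + 1
        # Only a missing attribute is flagged; alt="" (decorative) passes.
        if tag == "img" and "alt" not in attrs:
            self.images_missing_alt += 1
        if tag == "meta" and attrs.get("name") == "description":
            self.has_description = bool((attrs.get("content") or "").strip())
        if tag == "a":
            self._link_text = []

    def handle_data(self, data):
        if self._link_text is not None:
            self._link_text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._link_text is not None:
            text = " ".join("".join(self._link_text).split()).lower()
            if text in GENERIC_ANCHORS:
                self.generic_links.append(text)
            self._link_text = None


def audit(html):
    """Return a list of findings; an empty list means nothing was flagged."""
    parser = ChecklistAudit()
    parser.feed(html)
    parser.close()
    c = parser.counts
    findings = []
    if c.get("h1", 0) != 1:
        findings.append(f"item 1: expected one <h1>, found {c.get('h1', 0)}")
    if not (c.get("article") or c.get("main")):
        findings.append("item 3: no <article> or <main>")
    if not c.get("nav"):
        findings.append("item 5: no <nav>")
    if not parser.has_description:
        findings.append('item 7: missing <meta name="description">')
    if parser.images_missing_alt:
        findings.append(f"item 9: {parser.images_missing_alt} <img> without alt")
    if parser.generic_links:
        findings.append(f"item 10: generic anchor text {parser.generic_links}")
    return findings
```

Calling audit(page_source) on a rendered page returns one line per failed item, which makes it easy to run over a batch of saved pages and group the results by template.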

HEADING HIERARCHY.

Most sites get headings wrong in one of two ways: too many H1s (one per visual block, treated as styling) or skipped levels (H1 jumps to H4 because of designer-driven sizes).

AI extraction relies on the heading hierarchy to identify the document's logical structure. One H1 = the page is about this. H2s = main sections. H3s = subsections. Non-monotonic hierarchies confuse the parser; some bots fall back to character-count heuristics, which usually misidentify sidebars or related-content blocks as primary content.
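The two failure modes described above (multiple H1s and skipped levels) are simple to detect. This is a minimal sketch using Python's standard html.parser; HeadingLevels and heading_problems are hypothetical names, and the rule that only downward jumps of more than one level count as a skip is this sketch's reading of "no skipping levels".

```python
import re
from html.parser import HTMLParser

HEADING = re.compile(r"h([1-6])")


class HeadingLevels(HTMLParser):
    """Records heading levels (1-6) in document order."""

    def __init__(self):
        super().__init__()
        self.levels = []

    def handle_starttag(self, tag, attrs):
        match = HEADING.fullmatch(tag)  # matches h1..h6, not hr/head/header
        if match:
            self.levels.append(int(match.group(1)))


def heading_problems(html):
    """Return a list of hierarchy problems; empty means the outline is clean."""
    parser = HeadingLevels()
    parser.feed(html)
    parser.close()
    levels = parser.levels
    problems = []
    if levels.count(1) != 1:
        problems.append(f"expected exactly one h1, found {levels.count(1)}")
    if levels and levels[0] != 1:
        problems.append(f"first heading is h{levels[0]}, not h1")
    # Going deeper by more than one level is a skip; going back up is fine.
    for prev, cur in zip(levels, levels[1:]):
        if cur > prev + 1:
            problems.append(f"h{prev} is followed by h{cur} (skipped level)")
    return problems
```

For example, heading_problems("<h1>A</h1><h4>B</h4>") reports the H1-to-H4 jump, while an outline of H1, H2, H3, H2 passes.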

If your design system uses heading sizes inconsistently, decouple visual size from semantic level via CSS. For example, use h2 { font-size: 28px; } for one section and h2.bigger { font-size: 36px; } for another. The semantic level stays correct; the visual hierarchy is purely a styling choice.

THE FRAMEWORK DEFAULTS THAT BREAK THIS.

Common framework defaults that produce non-semantic HTML by accident:

Component libraries that wrap everything in <div> (e.g., default Material-UI, default Bootstrap layouts).
Page builders (Elementor, Divi) that emit <section class="section"> with no heading - a sign of style-driven section labels.
Headless CMS templates that flatten Markdown body content into <div> instead of <article>.
Marketing-page generators (some Webflow templates, some Framer setups) that produce nested <div> trees with no semantic tags at all.
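One of these patterns, <section> elements emitted with no heading, can be spotted automatically. The sketch below is an illustration under stated assumptions rather than a definitive detector: SectionAudit and headingless_sections are hypothetical names, and a heading is credited only to the innermost open <section>, so a wrapper section whose only heading sits inside a nested section is still flagged.

```python
import re
from html.parser import HTMLParser

HEADING = re.compile(r"h[1-6]")


class SectionAudit(HTMLParser):
    """Counts <section> elements and how many of them lack a heading."""

    def __init__(self):
        super().__init__()
        self.open_sections = []   # one flag per open <section>: heading seen?
        self.total = 0
        self.without_heading = 0

    def handle_starttag(self, tag, attrs):
        if tag == "section":
            self.open_sections.append(False)
            self.total += 1
        elif HEADING.fullmatch(tag) and self.open_sections:
            # Credit the heading to the innermost open section only.
            self.open_sections[-1] = True

    def handle_endtag(self, tag):
        if tag == "section" and self.open_sections:
            if not self.open_sections.pop():
                self.without_heading += 1


def headingless_sections(html):
    """Return (sections_without_heading, total_sections) for the markup."""
    parser = SectionAudit()
    parser.feed(html)
    parser.close()
    # Sections that were never closed are still judged by what was seen.
    unclosed = sum(1 for seen in parser.open_sections if not seen)
    return parser.without_heading + unclosed, parser.total
```

A high ratio of headingless sections on a template is a hint that the section tags are there for styling rather than structure.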

BEFORE / AFTER EXAMPLE.

Before: <div class="main"><div class="hero"><div class="title">Welcome</div><div class="text">Long body...</div></div></div>

After: <main><article><header><h1>Welcome</h1></header><p>Long body...</p></article></main>

Same visual result with appropriate CSS. Massively different parser output. AI fetchers extract the second cleanly; with the first, they sometimes get it right and sometimes attribute the title to a navigation block.
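To make the difference concrete, here is a deliberately naive extractor that trusts only semantic tags: the title is the text of the first <h1>, and the body is any other text inside <main> or <article>. It is a sketch for illustration, not a model of how any particular AI fetcher works; NaiveExtractor and extract are hypothetical names. The two inputs are the before and after snippets from this section.

```python
from html.parser import HTMLParser

BEFORE = ('<div class="main"><div class="hero"><div class="title">Welcome</div>'
          '<div class="text">Long body...</div></div></div>')
AFTER = ('<main><article><header><h1>Welcome</h1></header>'
         '<p>Long body...</p></article></main>')


class NaiveExtractor(HTMLParser):
    """Extracts a title and body using semantic tags only."""

    def __init__(self):
        super().__init__()
        self.title = None
        self.body = []
        self._depth = 0       # nesting depth inside <main>/<article>
        self._in_h1 = False

    def handle_starttag(self, tag, attrs):
        if tag in ("main", "article"):
            self._depth += 1
        elif tag == "h1":
            self._in_h1 = True

    def handle_endtag(self, tag):
        if tag in ("main", "article"):
            self._depth = max(0, self._depth - 1)
        elif tag == "h1":
            self._in_h1 = False

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._in_h1:
            if self.title is None:   # keep the first <h1> only
                self.title = text
        elif self._depth > 0:
            self.body.append(text)


def extract(html):
    """Return the title and body that the semantic tags expose."""
    parser = NaiveExtractor()
    parser.feed(html)
    parser.close()
    return {"title": parser.title, "body": " ".join(parser.body)}


print(extract(BEFORE))  # {'title': None, 'body': ''}
print(extract(AFTER))   # {'title': 'Welcome', 'body': 'Long body...'}
```

With the <div> version, a tag-driven extractor finds nothing and has to fall back to guessing from class names or text length; with the semantic version, the same code returns the title and body directly.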

THE BOTTOM LINE.

Twelve checklist items, four to eight hours of refactoring on a typical site, junior-developer skill level. The payback is extraction reliability across all four AI surfaces. Run the checklist against your top 20 most-trafficked pages first. Most template defects can be fixed in batch once you find the pattern.
