Explore per-page .md file generation alongside bundle-based llms.txt #909

@sirugh

Problem

The current bundle-based approach in generate-llms-full.js relies on path filters to group content (e.g., p => p.startsWith('b2b/')). As pages move, are added, or are removed, these filters can silently drift — bundles may include stale paths or miss new content entirely.
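To make the drift concrete, here is a minimal sketch of how it could be detected. It assumes a bundle map shaped like the path-prefix filters described above; `bundleFilters`, `findUnbundledPages`, `findEmptyBundles`, and the page paths are all hypothetical and not taken from generate-llms-full.js.

```javascript
// Hypothetical bundle definitions, modeled on the path-prefix filters described above.
const bundleFilters = {
  b2b: (p) => p.startsWith('b2b/'),
  checkout: (p) => p.startsWith('dropins/checkout/'),
};

// Pages that no bundle filter matches: new or moved content that was silently dropped.
function findUnbundledPages(pages, filters) {
  const predicates = Object.values(filters);
  return pages.filter((p) => !predicates.some((match) => match(p)));
}

// Bundle names whose filter matches no page at all: likely stale filters.
function findEmptyBundles(pages, filters) {
  return Object.entries(filters)
    .filter(([, match]) => !pages.some((p) => match(p)))
    .map(([name]) => name);
}

// Made-up page list in which the B2B pages have moved under dropins-b2b/.
const pages = ['dropins-b2b/company/overview', 'dropins/checkout/quick-start'];

console.log(findUnbundledPages(pages, bundleFilters)); // [ 'dropins-b2b/company/overview' ]
console.log(findEmptyBundles(pages, bundleFilters));   // [ 'b2b' ]
```

A check like this would only surface drift after the fact; it does not remove the need to maintain the filters.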

Observed pattern in the ecosystem

Other Adobe doc properties serve a .md variant at the same path as each HTML page:

  • EXL: https://experienceleague.adobe.com/en/docs/commerce-on-cloud/user-guide/overview.md
  • developer.adobe.com: https://developer.adobe.com/commerce/extensibility/starter-kit/integration/create-integration.md

This makes every page self-describing and machine-readable without any explicit filter maintenance.

Proposal

Generate a .md (or .txt) file for every materialized page at its corresponding path. This would:

  • Eliminate filter drift entirely — the set of AI-readable files stays in sync with the site automatically
  • Make individual pages citable and retrievable by AI without loading large bundles
  • Follow an established pattern in the Adobe ecosystem
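As a rough sketch of what the per-page step might look like, assuming the build already has each page's route and Markdown body in memory: the `{ route, markdown }` shape, the function names, and the choice to map the site root to `index.md` are assumptions for illustration, not existing code.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Maps a page route to its .md sibling, e.g. '/b2b/overview/' -> 'b2b/overview.md'.
// Treating the site root as 'index.md' is an assumption, not something decided here.
function markdownPathFor(route) {
  const trimmed = route.replace(/^\/+|\/+$/g, '');
  return trimmed === '' ? 'index.md' : `${trimmed}.md`;
}

// Writes one .md file per page under outDir and returns the paths written.
// `pages` is a hypothetical array of { route, markdown } objects from the build.
function writePerPageMarkdown(pages, outDir) {
  const written = [];
  for (const { route, markdown } of pages) {
    const target = path.join(outDir, markdownPathFor(route));
    fs.mkdirSync(path.dirname(target), { recursive: true });
    fs.writeFileSync(target, markdown, 'utf8');
    written.push(target);
  }
  return written;
}
```

Because the output set is derived from the page list itself, there is no filter to keep in sync: a page that moves or disappears takes its `.md` file with it on the next build.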

Tradeoffs to consider

Bundles still have value — they encode our best guess at semantic groupings (e.g., the b2b bundle collects all B2B dropin content). A flat per-page model loses that structure.

Context size is a real constraint: LLM tools generally read a file verbatim only up to some size limit, and beyond that they tend to summarize it. Bundles that are too large already suffer from this. Per-page files would be small by nature, which is an advantage for verbatim comprehension.
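One way to keep an eye on this would be a size check over generated files. The sketch below is illustrative only: `findOversizedFiles` is a hypothetical helper, and the byte budget is an arbitrary placeholder, since the actual limit depends on the consuming tool.

```javascript
// Returns the names of files whose UTF-8 size exceeds maxBytes.
// `files` is a hypothetical array of { name, content } objects.
function findOversizedFiles(files, maxBytes) {
  return files
    .filter(({ content }) => Buffer.byteLength(content, 'utf8') > maxBytes)
    .map(({ name }) => name);
}

// Made-up example: a large bundle next to a small per-page file.
const generated = [
  { name: 'llms-b2b-full.txt', content: 'x'.repeat(5000) },
  { name: 'b2b/overview.md', content: 'x'.repeat(200) },
];

// 1000 bytes is a placeholder budget, not a known limit of any particular model.
console.log(findOversizedFiles(generated, 1000)); // [ 'llms-b2b-full.txt' ]
```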

A hybrid approach may be the right answer: per-page .md files for granular retrieval + smaller, curated bundles for semantic groupings — where bundles are assembled by topic rather than by path prefix, reducing the risk of drift.
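A sketch of what topic-based assembly could look like, assuming each page declares its topics somewhere the build can read (for example in frontmatter). The `topics` field and `buildTopicBundles` are hypothetical; nothing in the current setup is known to provide them.

```javascript
// Groups page routes into bundles by declared topic rather than by path prefix.
// Pages without a `topics` array are simply left out of every bundle.
function buildTopicBundles(pages) {
  const bundles = new Map();
  for (const page of pages) {
    for (const topic of page.topics ?? []) {
      if (!bundles.has(topic)) bundles.set(topic, []);
      bundles.get(topic).push(page.route);
    }
  }
  return bundles;
}

// Made-up pages: the second one lives outside any b2b/ prefix but still declares the topic.
const topicPages = [
  { route: 'b2b/overview', topics: ['b2b'] },
  { route: 'dropins/company/quick-start', topics: ['b2b', 'dropins'] },
  { route: 'get-started/intro' },
];

console.log(buildTopicBundles(topicPages).get('b2b'));
// [ 'b2b/overview', 'dropins/company/quick-start' ]
```

Since membership travels with the page rather than with its path, moving a page would not change which bundle it lands in. The tradeoff is that someone has to tag each page.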

Related

  • #908 — AI-generated blurbs for llms.txt bundles
