Problem
The current bundle-based approach in generate-llms-full.js relies on path filters to group content (e.g., p => p.startsWith('b2b/')). As pages move, are added, or are removed, these filters can silently drift — bundles may include stale paths or miss new content entirely.
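The drift described above can be reproduced in a few lines. This is a minimal sketch, not the actual contents of generate-llms-full.js: only the `b2b/` filter comes from this issue, while the `cart` filter, the page paths, and the helper names `bundlePages` and `unbundled` are hypothetical.

```javascript
// Hypothetical bundle filters; only the 'b2b/' prefix filter appears in the issue.
const bundles = {
  b2b: (p) => p.startsWith('b2b/'),
  cart: (p) => p.startsWith('cart/'),
};

// Hypothetical page paths before and after a directory rename.
const before = ['b2b/company/overview', 'b2b/quotes/overview', 'cart/overview'];
const after = ['dropins-b2b/company/overview', 'dropins-b2b/quotes/overview', 'cart/overview'];

// Group paths into bundles by applying each filter.
function bundlePages(paths, filters) {
  return Object.fromEntries(
    Object.entries(filters).map(([name, match]) => [name, paths.filter(match)])
  );
}

// List the paths that no filter claims. A non-empty result signals drift.
function unbundled(paths, filters) {
  const matchers = Object.values(filters);
  return paths.filter((p) => !matchers.some((match) => match(p)));
}

console.log(bundlePages(after, bundles).b2b); // [] (the bundle is silently empty)
console.log(unbundled(after, bundles));
// [ 'dropins-b2b/company/overview', 'dropins-b2b/quotes/overview' ]
```

Nothing fails when the rename happens: the b2b bundle is still produced, just without content. A check like `unbundled` would surface the problem, but it is one more thing to maintain alongside the filters.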
Observed pattern in the ecosystem
Other Adobe doc properties serve a .md variant at the same path as each HTML page:
- EXL: https://experienceleague.adobe.com/en/docs/commerce-on-cloud/user-guide/overview.md
- developer.adobe.com: https://developer.adobe.com/commerce/extensibility/starter-kit/integration/create-integration.md
This makes every page self-describing and machine-readable without any explicit filter maintenance.
Proposal
Generate a .md (or .txt) file for every materialized page at its corresponding path. This would:
- Eliminate filter drift entirely — the set of AI-readable files stays in sync with the site automatically
- Make individual pages citable and retrievable by AI without loading large bundles
- Follow an established pattern in the Adobe ecosystem
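As a rough illustration of the proposal, the sketch below writes one Markdown file per page at the page's own path. It assumes the build already has each page's path and Markdown body in memory. The `{ path, markdown }` shape, the function names `mdPathFor` and `writePerPageMarkdown`, and the `index.md` fallback for the site root are all hypothetical, not part of the existing script.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Map a page's URL path to the relative path of its Markdown variant.
// '/b2b/quotes/' becomes 'b2b/quotes.md'; the site root becomes 'index.md' (an assumption).
function mdPathFor(pagePath) {
  const trimmed = pagePath.replace(/^\/+|\/+$/g, '').replace(/\.html$/, '');
  return trimmed === '' ? 'index.md' : `${trimmed}.md`;
}

// Write one .md file per page under outDir and return the written file paths.
// `pages` is a hypothetical array of { path, markdown } objects.
function writePerPageMarkdown(pages, outDir) {
  const written = [];
  for (const page of pages) {
    const target = path.join(outDir, mdPathFor(page.path));
    fs.mkdirSync(path.dirname(target), { recursive: true });
    fs.writeFileSync(target, page.markdown, 'utf8');
    written.push(target);
  }
  return written;
}
```

Because the output is derived from the list of materialized pages, a page that moves or disappears changes the generated files on the next build with no filter to update.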
Tradeoffs to consider
Bundles still have value — they encode our best guess at semantic groupings (e.g., the b2b bundle collects all B2B dropin content). A flat per-page model loses that structure.
Context size is a real constraint: LLMs read a file verbatim only up to a certain size and summarize anything beyond that. Bundles that are too large already suffer from this. Per-page files would be small by nature, which favors verbatim comprehension.
A hybrid approach may be the right answer: per-page .md files for granular retrieval, plus smaller, curated bundles for semantic groupings. Assembling those bundles by topic rather than by path prefix would reduce the risk of drift.
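One way to assemble bundles by topic is to let each page declare its own topics and group on that. The sketch below assumes a hypothetical `topics` array on each page (for example, read from frontmatter). Neither that field nor the `bundlesByTopic` helper exists in the current script.

```javascript
// Group page paths by the topics each page declares.
// `pages` is a hypothetical array of { path, topics } objects.
function bundlesByTopic(pages) {
  const bundles = new Map();
  for (const page of pages) {
    for (const topic of page.topics ?? []) {
      if (!bundles.has(topic)) bundles.set(topic, []);
      bundles.get(topic).push(page.path);
    }
  }
  return bundles;
}

// A page keeps its bundle membership even if its path changes.
const pages = [
  { path: 'dropins-b2b/quotes/overview', topics: ['b2b'] },
  { path: 'cart/overview', topics: ['cart'] },
  { path: 'changelog', topics: [] },
];
console.log(bundlesByTopic(pages).get('b2b')); // [ 'dropins-b2b/quotes/overview' ]
```

Membership travels with the page instead of living in a separate filter, so a rename cannot drop a page from its bundle. Pages with no declared topic stay out of every bundle but would still be reachable through their per-page .md file.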
Related
- #908 — AI-generated blurbs for llms.txt bundles