docs: add robots.txt and llms.txt for docs site discoverability #1425
eccogrinder wants to merge 2 commits into main from
Conversation
📝 Walkthrough
Two new public documentation files are added: one describing The Interfold concept of confidential coordination with its goals and resources, and one providing robots.txt directives for search engine and AI training policies.
Estimated code review effort: 🎯 1 (Trivial) | ⏱️ ~3 minutes
🚥 Pre-merge checks: ✅ 3 passed
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/public/robots.txt`:
- Line 5: The robots.txt currently includes a Sitemap:
https://docs.theinterfold.com/sitemap.xml entry before the sitemap is generated;
remove or comment out that Sitemap line in the public robots.txt (or
conditionally add it only when the sitemap.html/xml artifact is produced by the
build) so the deployed robots.txt does not reference a non-existent
/sitemap.xml; locate the Sitemap line in the docs/public/robots.txt and either
delete it or gate its addition into the CI/build that emits sitemap.xml.
- Line 2: Replace the Cloudflare-specific "Content-Signal:
search=yes,ai-train=no" line with explicit robots.txt user-agent sections:
remove the Content-Signal header and add per-crawler directives (e.g.,
User-agent: GPTBot / Anthropic crawlers / PerplexityBot with Disallow: / to
block AI crawlers, and User-agent: Googlebot / bingbot with appropriate
Allow/Disallow rules to permit search indexing). Edit the robots.txt so it
contains explicit User-agent and Disallow/Allow blocks for each crawler instead
of the Content-Signal header.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 3d8b1737-914f-4e05-9f15-55f375baa61d
📒 Files selected for processing (2)
docs/public/llms.txt
docs/public/robots.txt
@@ -0,0 +1,5 @@
User-agent: *
Content-Signal: search=yes,ai-train=no
🧩 Analysis chain
🌐 Web query:
Is Content-Signal: search=yes,ai-train=no in robots.txt currently supported by major crawlers (Googlebot, Bingbot, Common Crawl, GPTBot, ClaudeBot, PerplexityBot)? Please provide official documentation links and recommended alternatives if unsupported.
💡 Result:
Is Content-Signal: search=yes,ai-train=no supported by major crawlers?
No—there is currently no official documentation from Googlebot, Bingbot, Common Crawl (CCBot), OpenAI (GPTBot), Anthropic (Claude bots), or Perplexity (PerplexityBot) stating they parse/obey the non-standard Content-Signal: robots.txt line. It’s a Cloudflare-defined extension/policy for expressing preferences, not part of what these crawlers document as supported robots.txt directives. (blog.cloudflare.com)
What happens in practice (per official docs)
- Googlebot: Google documents support for only `user-agent`, `allow`, `disallow`, and `sitemap` in robots.txt; other fields are not supported (so `Content-Signal` is not a supported directive). (developers.google.com)
- Bingbot: Bing documents that it follows REP directives in robots.txt (no mention of `Content-Signal`). (blogs.bing.com)
- CCBot (Common Crawl): Common Crawl documents blocking via standard `User-agent: CCBot` + `Disallow`. No mention of `Content-Signal`. (commoncrawl.org)
- GPTBot (OpenAI): OpenAI documents control via `GPTBot` / `OAI-SearchBot` robots.txt tags. No mention of `Content-Signal`. (openai.com)
- Claude bots (Anthropic): Anthropic documents opting out by disallowing their user-agents in robots.txt. No mention of `Content-Signal`. (support.anthropic.com)
- PerplexityBot: Perplexity documents control via `PerplexityBot` robots.txt tags. No mention of `Content-Signal`. (docs.perplexity.ai)
Official documentation links (copy/paste)
Cloudflare Content Signals Policy (introduces Content-Signal):
https://blog.cloudflare.com/content-signals-policy/
https://developers.cloudflare.com/bots/additional-configurations/managed-robots-txt/
Google robots.txt spec / supported fields:
https://developers.google.com/crawling/docs/robots-txt/robots-txt-spec
Bing on robots.txt / REP behavior:
https://blogs.bing.com/webmaster/August-2009/Crawl-delay-and-the-Bing-crawler%2C-MSNBot
https://blogs.bing.com/webmaster/May-2012/To-crawl-or-not-to-crawl%2C-that-is-BingBot-s-questi
Common Crawl (CCBot) blocking:
https://commoncrawl.org/ccbot
OpenAI (GPTBot / OAI-SearchBot / ChatGPT-User):
https://openai.com/gptbot
Anthropic (how to block Claude crawlers via robots.txt):
https://support.anthropic.com/en/articles/8896518-does-anthropic-crawl-data-from-the-web-and-how-can-site-owners-block-the-crawler
Perplexity crawlers / robots.txt tags:
https://docs.perplexity.ai/docs/resources/perplexity-crawlers
Recommended alternatives (when Content-Signal is unsupported)
- Use standard robots.txt allow/disallow rules per crawler user-agent:
  - Google indexing control: `User-agent: Googlebot` + `Allow`/`Disallow` (standard REP). (developers.google.com)
  - Bing indexing control: `User-agent: bingbot` + `Allow`/`Disallow` (standard REP). (blogs.bing.com)
  - Common Crawl opt-out: `User-agent: CCBot` + `Disallow: /`. (commoncrawl.org)
- For AI-specific crawlers, disallow their documented user-agents:
  - OpenAI training crawler: disallow `GPTBot` (and separately manage `OAI-SearchBot` if you want search visibility without training). (openai.com)
  - Anthropic: disallow the relevant Anthropic bot user-agent(s) per their help doc. (support.anthropic.com)
  - Perplexity: manage `PerplexityBot` via robots.txt per their docs. (docs.perplexity.ai)
- Enforce with infrastructure controls (recommended if you need real control):
  - Cloudflare explicitly notes content signals are preferences and recommends combining them with WAF/Bot Management. (blog.cloudflare.com)
Citations:
- 1: https://blog.cloudflare.com/content-signals-policy/
- 2: https://developers.google.com/search/docs/crawling-indexing/robots/robots_txt
- 3: https://blogs.bing.com/webmaster/August-2009/Crawl-delay-and-the-Bing-crawler%2C-MSNBot?utm_source=openai
- 4: https://commoncrawl.org/ccbot
- 5: https://openai.com/gptbot
- 6: https://support.anthropic.com/en/articles/8896518-does-anthropic-crawl-data-from-the-web-and-how-can-site-owners-block-the-crawler?utm_source=openai
- 7: https://docs.perplexity.ai/docs/resources/perplexity-crawlers
Use standard robots.txt directives instead of Content-Signal
Line 2's Content-Signal: search=yes,ai-train=no is a Cloudflare-specific extension not supported by any major crawler (Google, Bing, Common Crawl, GPTBot, Claude, PerplexityBot). The ai-train=no intent will not be enforced. Use standard robots.txt rules with per-crawler user-agent directives:
- Disallow AI crawlers: `User-agent: GPTBot`, the Anthropic crawler user-agents, and `User-agent: PerplexityBot` + `Disallow: /`
- Allow search indexing: `User-agent: Googlebot`, `User-agent: bingbot` + standard allow/disallow rules
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/public/robots.txt` at line 2, Replace the Cloudflare-specific
"Content-Signal: search=yes,ai-train=no" line with explicit robots.txt
user-agent sections: remove the Content-Signal header and add per-crawler
directives (e.g., User-agent: GPTBot / Anthropic crawlers / PerplexityBot with
Disallow: / to block AI crawlers, and User-agent: Googlebot / bingbot with
appropriate Allow/Disallow rules to permit search indexing). Edit the robots.txt
so it contains explicit User-agent and Disallow/Allow blocks for each crawler
instead of the Content-Signal header.
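As a sketch of what this review comment is asking for, a standards-based robots.txt with per-crawler blocks could look roughly like the following. The user-agent strings are the ones publicly documented by each vendor at time of writing; this is illustrative, not the PR's final file, and current bot names should be verified against each vendor's docs:

```
# Block AI training crawlers
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

# Allow search engine indexing
User-agent: Googlebot
Allow: /

User-agent: bingbot
Allow: /

# Default for all other crawlers
User-agent: *
Allow: /
```

Note that crawlers pick the most specific matching `User-agent` group, so the explicit `GPTBot` block takes precedence over the `*` default for that bot.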
Content-Signal: search=yes,ai-train=no
Allow: /

Sitemap: https://docs.theinterfold.com/sitemap.xml
(No newline at end of file)
Sitemap URL may be premature without generated file
Line 5 points to /sitemap.xml, but PR objectives say sitemap automation lands in a follow-up PR. If that file is not present at deploy time, this is a temporary broken discovery hint.
Suggested adjustment
- Sitemap: https://docs.theinterfold.com/sitemap.xml
+ # Add after sitemap.xml is generated/deployed
+ # Sitemap: https://docs.theinterfold.com/sitemap.xml
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/public/robots.txt` at line 5, The robots.txt currently includes a
Sitemap: https://docs.theinterfold.com/sitemap.xml entry before the sitemap is
generated; remove or comment out that Sitemap line in the public robots.txt (or
conditionally add it only when the sitemap.html/xml artifact is produced by the
build) so the deployed robots.txt does not reference a non-existent
/sitemap.xml; locate the Sitemap line in the docs/public/robots.txt and either
delete it or gate its addition into the CI/build that emits sitemap.xml.
Purpose
This reflects the ongoing rename from Enclave → The Interfold.
Adds `robots.txt` and `llms.txt` to improve discoverability and AI retrieval of the Interfold documentation. `robots.txt` allows search indexing and AI retrieval while preserving the `ai-train=no` intent. `llms.txt` provides a canonical description and entry points for LLM retrieval systems.
Files added
/public/robots.txt
/public/llms.txt
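For context, the llms.txt format (per the llmstxt.org proposal) is a markdown file with an H1 title, a blockquote summary, and sections of links. A minimal sketch of what such a file might contain for this site follows; the wording and URLs here are illustrative placeholders, not the actual content of the PR's file:

```
# The Interfold

> Documentation for The Interfold, covering confidential coordination,
> its goals, and related resources.

## Documentation

- [Docs home](https://docs.theinterfold.com/): entry point for the documentation
```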
Follow-up (separate PR)
Add automatic sitemap generation for the docs using `next-sitemap`. Proposed approach:
- `pnpm add next-sitemap`
- Create `next-sitemap.config.js`
- Add `"postbuild": "next-sitemap"` to package.json scripts
- `https://docs.theinterfold.com/sitemap.xml` is then generated automatically

Summary by CodeRabbit
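For the follow-up PR described above, a minimal `next-sitemap.config.js` might look like the sketch below. It uses next-sitemap's documented `siteUrl` and `generateRobotsTxt` options; disabling robots.txt generation is an assumption here, chosen so the tool does not overwrite the hand-maintained `public/robots.txt` added in this PR:

```js
/** @type {import('next-sitemap').IConfig} */
module.exports = {
  // Base URL used for every entry in the generated sitemap
  siteUrl: 'https://docs.theinterfold.com',
  // robots.txt is maintained by hand in public/, so don't generate one
  generateRobotsTxt: false,
};
```

With `"postbuild": "next-sitemap"` in scripts, the sitemap is emitted into `public/` after each `next build`, at which point the commented-out `Sitemap:` line in robots.txt can be re-enabled.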