Centralized, AI-readable documentation extracted from 598+ frameworks, libraries, and developer tools. Automated extraction tools keep documentation current with upstream sources.
llm-code-docs/
├── docs/
│ ├── llms-txt/ # 339 sites following llms.txt standard (HIGHEST PRIORITY)
│ ├── github-scraped/ # 136 Git repository extractions
│ ├── web-scraped/ # 122 web-scraped documentation sources
│ └── github-repos/ # Individual GitHub repo docs
├── scripts/ # All extraction and update tools
├── AGENTS.md # Guide for AI agents using these docs
├── CLAUDE.md # AI assistant instructions
├── index.yaml # Index of all documentation sources
└── README.md # This file
See AGENTS.md for detailed guidance on finding and using documentation in this repository.
339 sites following the llms.txt standard - optimized for LLM consumption.
Notable sources include:
- AI/LLM: Anthropic, OpenAI, Vercel AI SDK, LangChain, Ollama
- Web Frameworks: Next.js, React, Vue, Astro, Remix, SvelteKit
- Python: FastAPI, Pydantic, Streamlit, Gradio
- JavaScript: Bun, Deno, Vite, Vitest, Zod
- Databases: Supabase, PlanetScale, Turso, Neon
- Infrastructure: Cloudflare, Vercel, Fly.io, Railway
136 repositories cloned and extracted for comprehensive documentation, including:
| Category | Examples |
|---|---|
| AI/ML | vLLM, TensorRT-LLM, Whisper, Stable Diffusion, RAGFlow, FAISS |
| Python | FastAPI, Flask, Celery, Gunicorn, HTTPX, Matplotlib |
| JavaScript | ESLint, Jest, Express, Electron, Mermaid, XtermJS |
| Go | Go docs, gopls, golangci-lint, Delve, govulncheck |
| DevOps | Caddy, Trivy, Steampipe, SearXNG, WasmEdge |
| Language Servers | Neovim, nvim-lspconfig, pygls, vscode-languageserver |
122 sources scraped from documentation sites without llms.txt support, including:
- Cloud APIs: AWS SDK, Google Cloud, Azure IoT, Datadog, Sentry
- UI Libraries: Emotion, Formik, Storybook, React Flow, Excalidraw
- Dev Tools: DBeaver, Dependabot, Semgrep, Percy, Chromatic
- AI/ML: GPT4All, Lepton AI, Ultralytics YOLOv8, Magenta
./scripts/update.sh
# Update all llms.txt sites (339 sites in parallel)
python3 scripts/llms-txt-scraper.py
# Update single site
python3 scripts/llms-txt-scraper.py --site anthropic
# Update Git repository extractions
python3 scripts/extract_docs.py
# Update Claude Code SDK docs
python3 scripts/claude-code-sdk-docs.py
- Edit scripts/llms-sites.yaml:
  - name: new-site
    base_url: https://example.com/
    description: Site description
- Download: python3 scripts/llms-txt-scraper.py --site new-site
scripts/llms-sites.yaml is the central registry of all llms.txt-compliant documentation sources. Each entry specifies:
- name - Unique identifier and output folder name
- base_url - URL where llms.txt is located
- description - Brief description of the documentation
- rate_limit_seconds (optional) - Delay between requests
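As a rough illustration of the entry format above, the following sketch checks one registry entry that has already been loaded into a Python dict. The function name `validate_site_entry` and the specific checks are assumptions made for this example; they are not taken from the repository's scripts.

```python
# Hypothetical validator for one llms-sites.yaml entry (already parsed to a dict).
REQUIRED_KEYS = {"name", "base_url", "description"}
OPTIONAL_KEYS = {"rate_limit_seconds"}


def validate_site_entry(entry: dict) -> list:
    """Return a list of problems found in one entry (empty when it looks fine)."""
    problems = []

    missing = REQUIRED_KEYS - entry.keys()
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")

    unknown = entry.keys() - REQUIRED_KEYS - OPTIONAL_KEYS
    if unknown:
        problems.append(f"unknown keys: {sorted(unknown)}")

    if "base_url" in entry:
        base_url = entry["base_url"]
        if not isinstance(base_url, str) or not base_url.startswith(("http://", "https://")):
            problems.append("base_url must be an http(s) URL")

    delay = entry.get("rate_limit_seconds")
    if delay is not None and (not isinstance(delay, (int, float)) or delay < 0):
        problems.append("rate_limit_seconds must be a non-negative number")

    return problems


if __name__ == "__main__":
    entry = {
        "name": "new-site",
        "base_url": "https://example.com/",
        "description": "Site description",
    }
    print(validate_site_entry(entry))
```

Running a check like this before a download can catch a misspelled key early, since `name` also determines the output folder.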
scripts/repo_config.yaml holds the configuration for Git-based documentation extraction. Each entry specifies:
- repo_url - GitHub repository URL
- source_folder - Path to documentation within repo
- target_folder - Output path under docs/github-scraped/
- branch - Branch to clone (default: main/master)
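To make the `target_folder` and `branch` fields more concrete, here is a minimal sketch of how an entry might be resolved. The helper names, the path-safety check, and the reading of "main/master" as "try main, then master" are all assumptions for illustration; the actual logic in scripts/extract_docs.py may differ.

```python
from pathlib import PurePosixPath

# Output root named in the field description above.
OUTPUT_ROOT = PurePosixPath("docs/github-scraped")


def resolve_target(entry: dict) -> PurePosixPath:
    """Join target_folder onto the output root, rejecting paths that escape it."""
    target = PurePosixPath(entry["target_folder"])
    if target.is_absolute() or ".." in target.parts:
        raise ValueError(f"target_folder must stay inside {OUTPUT_ROOT}: {target}")
    return OUTPUT_ROOT / target


def branch_candidates(entry: dict) -> list:
    """Branches to try, in order (assumed fallback: main, then master)."""
    branch = entry.get("branch")
    return [branch] if branch else ["main", "master"]


if __name__ == "__main__":
    # Hypothetical entry; the repository URL is a placeholder.
    entry = {
        "repo_url": "https://github.com/example/project",
        "source_folder": "docs",
        "target_folder": "project",
    }
    print(resolve_target(entry), branch_candidates(entry))
```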
- Smart Caching: 23-hour freshness window avoids redundant downloads
- Parallel Downloads: 15 concurrent workers for fast bulk updates
- Source Headers: Each file includes source URL for traceability
- Error Resilience: Individual failures don't stop bulk operations
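The four behaviors listed above can be sketched together in a few lines. This is an illustrative outline only, not the repository's implementation: the function names, the `# Source:` header format, and the use of file modification time for the freshness check are assumptions.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path
from typing import Callable, Optional

FRESHNESS_SECONDS = 23 * 60 * 60  # 23-hour freshness window
MAX_WORKERS = 15                  # concurrent download workers


def is_fresh(path: Path, now: Optional[float] = None) -> bool:
    """True when the file exists and was written within the freshness window."""
    if not path.exists():
        return False
    now = time.time() if now is None else now
    return (now - path.stat().st_mtime) < FRESHNESS_SECONDS


def with_source_header(url: str, body: str) -> str:
    """Prepend the source URL for traceability (header format is assumed)."""
    return f"# Source: {url}\n\n{body}"


def update_all(jobs: dict, fetch: Callable[[str], str]) -> dict:
    """Update every url -> path job in parallel; one failure does not stop the rest."""

    def update_one(url: str, path: Path) -> str:
        if is_fresh(path):
            return "cached"
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(with_source_header(url, fetch(url)), encoding="utf-8")
        return "downloaded"

    results = {}
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
        futures = {pool.submit(update_one, url, path): url for url, path in jobs.items()}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:  # record the failure and keep going
                results[url] = f"failed: {exc}"
    return results
```

Passing `fetch` in as a parameter keeps the sketch independent of any particular HTTP client and makes it easy to exercise with a stub.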
- 339 llms.txt documentation sites
- 136 Git repository extractions
- 122 web-scraped documentation sources
- 43,000+ markdown/RST files
- 5.4GB total documentation
- Check if the site has llms.txt support (visit {docs-url}/llms.txt)
- Edit scripts/llms-sites.yaml with the new entry
- Run python3 scripts/llms-txt-scraper.py --site new-site
- Verify extraction: ls -lh docs/llms-txt/new-site/
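The first step above, checking for llms.txt support, can also be done from a script instead of a browser. The sketch below uses only the Python standard library; the function names are hypothetical, and it assumes the server answers HEAD requests, which some sites reject.

```python
import urllib.request


def llms_txt_url(docs_url: str) -> str:
    """Build the {docs-url}/llms.txt address, tolerating a trailing slash."""
    return docs_url.rstrip("/") + "/llms.txt"


def has_llms_txt(docs_url: str, timeout: float = 10.0) -> bool:
    """Return True when {docs-url}/llms.txt answers a HEAD request with HTTP 200."""
    request = urllib.request.Request(llms_txt_url(docs_url), method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers urllib.error.URLError, HTTPError (404 etc.), and timeouts.
        return False


if __name__ == "__main__":
    print(has_llms_txt("https://example.com/"))
```

A `False` result only means this probe failed; it is still worth opening the URL manually before falling back to another extraction method.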
- Edit scripts/repo_config.yaml with repo details
- Run python3 scripts/extract_docs.py
Check index.yaml under not_yet_fetched for libraries we've identified but haven't extracted.
Priority order:
- llms.txt - Highest quality, official AI-optimized format
- Git repos - Comprehensive but require custom configuration
- Web scraping - Last resort for critical documentation
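The priority order above amounts to a simple first-match lookup. As a small sketch, with the folder names taken from the directory tree and the function name `pick_source` invented for this example:

```python
# Source kinds in priority order, named after the folders under docs/.
PRIORITY = ["llms-txt", "github-scraped", "web-scraped"]


def pick_source(available):
    """Return the highest-priority source kind present in `available`, or None."""
    for kind in PRIORITY:
        if kind in available:
            return kind
    return None


if __name__ == "__main__":
    print(pick_source({"web-scraped", "github-scraped"}))
```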
Maintained for AI-assisted development across multiple frameworks and tools.