An AI-powered CLI tool that automatically generates comprehensive technical documentation for any public GitHub repository — including component docs, architecture overviews, Mermaid diagrams, and interactive AST visualizations.
Component Documentation Prompt
Sent once per file chunk (4,000 characters max), with temperature 0.2 and max_tokens 2048:
```text
You are a senior software architect. Analyze the following code and generate
detailed technical documentation for this component. Include:
- Purpose and responsibilities
- Key classes/functions
- Inputs/outputs
- Usage examples (if possible)
- Any dependencies or integration points

Code:
{chunk}
```
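The character-based splitter described above can be sketched as follows. `chunk_text` is a hypothetical helper name; the actual splitter may differ, e.g. by also respecting line boundaries:

```python
def chunk_text(source: str, max_chars: int = 4000) -> list[str]:
    """Split file contents into fixed-size character chunks for the LLM."""
    return [source[i:i + max_chars] for i in range(0, len(source), max_chars)]

# Each chunk is then substituted into the component prompt shown above,
# e.g. prompt = COMPONENT_PROMPT.format(chunk=chunk)
```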
Architecture Documentation Prompt
Single call with all component summaries concatenated, max_tokens 4096:
```text
You are a senior software architect. Given the following component documentation,
generate:
- An overall technical documentation for the repository
- A high-level architecture description
- A detailed flow of the system
- Optionally, a mermaid diagram for the architecture

Component Documentation:
{all_component_summaries}
```
```console
$ python main.py https://github.com/fastapi/fastapi
2024-01-15 10:30:00 INFO Cloning https://github.com/fastapi/fastapi...
2024-01-15 10:30:12 INFO Cloned repo to /path/to/cloned_repo
2024-01-15 10:30:12 INFO Found 156 relevant files
2024-01-15 10:30:12 INFO Processing 89 Python files for AST analysis...
Processed fastapi/main.py in 0.34 seconds
Processed fastapi/routing.py in 0.51 seconds
Processed fastapi/applications.py in 0.28 seconds
...
2024-01-15 10:30:45 INFO Building consolidated AST graph...
2024-01-15 10:30:47 INFO Saved consolidated AST to docs/CONSOLIDATED_AST.html
2024-01-15 10:30:47 INFO Generating component documentation...
Analyzing: 100%|████████████████████████████| 156/156 [03:42<00:00, 1.43s/file]
2024-01-15 10:34:29 INFO Saved component docs to docs/COMPONENTS.md
2024-01-15 10:34:29 INFO Generating architecture documentation...
2024-01-15 10:34:52 INFO Saved architecture docs to docs/ARCHITECTURE.md
2024-01-15 10:34:52 INFO Documentation complete! Output in docs/
```
Step 4: View Results
```bash
# Open the interactive consolidated dependency graph
open docs/CONSOLIDATED_AST.html       # macOS
# xdg-open docs/CONSOLIDATED_AST.html # Linux

# View architecture documentation
cat docs/ARCHITECTURE.md

# View component documentation
cat docs/COMPONENTS.md

# Browse per-file AST visualizations
ls docs/ast/
```
Note: there is a filter redundancy. `.md`, `.json`, `.yaml`, `.yml`, `.sh`, and `.txt` appear in `EXCLUDE_FILES`, yet some of them have no corresponding `INCLUDE_EXTS` entry, so they would be skipped anyway. The net effect is that configuration files and shell scripts are excluded from documentation.
Example .env File
```bash
# Use OpenAI GPT-4
LLM_BACKEND=openai
LLM_MODEL=gpt-4
OPENAI_API_KEY=sk-proj-xxxxxxxxxxxxxxxxxxxx

# Or use local Ollama (default — no .env needed)
# LLM_BACKEND=ollama
# LLM_MODEL=llama3
# OLLAMA_URL=http://localhost:11434/api/generate
```
🔧 Design Decisions
Why These Choices Were Made
| Decision | Alternatives Considered | Rationale |
|---|---|---|
| Dual LLM backend vs. OpenAI-only | Single provider lock-in | Flexibility — use free local Ollama or powerful cloud GPT |
| NetworkX + pydot vs. D3.js, Graphviz CLI | D3 requires JS runtime; CLI less portable | Python-native graph library with SVG export via pydot |
| `ThreadPoolExecutor(8)` vs. asyncio, multiprocessing | asyncio adds complexity; multiprocessing has pickle issues | Simple, effective parallelism for I/O-bound LLM calls |
| 4000-char chunks vs. token-based splitting | Token counting requires tiktoken dependency | Character-based is simpler, good enough approximation |
| `ast` module (stdlib) vs. tree-sitter, LibCST | External parsers add dependencies | Zero-dependency Python AST parsing; sufficient for structure extraction |
| `sys.argv` vs. argparse, click, typer | CLI frameworks add complexity for single-arg tool | Minimal — only one required argument |
| Module-level LLM singletons vs. dependency injection | DI adds boilerplate for a CLI tool | Simple, direct instantiation; no server lifecycle to manage |
| Two-pass dependency resolution vs. import graph analysis | Import analysis misses dynamic calls | Function-level call graph captures actual usage patterns |
| Destructive clone (rmtree + re-clone) vs. `git pull` | Pull can fail on dirty state | Clean slate ensures consistent analysis |
Architectural Tradeoffs
Simplicity ◀──────────────────────────────▶ Completeness

This project optimizes for:

- ✅ Easy to run (one command)
- ✅ Minimal dependencies
- ✅ Clear pipeline stages
- ✅ Dual LLM backend
- ✅ Visual AST output

At the cost of:

- ❌ Python AST only (no other languages)
- ❌ No incremental updates
- ❌ Re-clones on every run
- ❌ No caching of LLM results
- ❌ No test suite
AST Graph Construction — Two-Pass Algorithm
The consolidated dependency graph uses a two-pass approach to resolve cross-file function calls:
```text
Pass 1 — Function Registry      Pass 2 — Call Resolution
┌──────────────────────┐        ┌──────────────────────────┐
│ For each .py file:   │        │ For each .py file:       │
│   Parse AST          │        │   Walk ast.Call nodes    │
│   Walk FunctionDef   │        │   Extract callee name:   │
│   Register:          │        │    ast.Name → direct     │
│    func_name → file  │        │    ast.Attribute → method│
│                      │        │   Lookup in registry     │
│ Output: {name: file} │        │   If found → add edge    │
└──────────────────────┘        │ caller_file → callee_file│
                                └──────────────────────────┘
```
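The two passes above can be sketched with the stdlib `ast` module and NetworkX. This is a hypothetical `build_call_graph` operating on in-memory sources rather than file paths, illustrating the algorithm rather than reproducing the project's exact code:

```python
import ast
import networkx as nx

def build_call_graph(sources: dict[str, str]) -> nx.DiGraph:
    """Two-pass cross-file call graph: nodes are files, edges are calls."""
    trees = {path: ast.parse(code) for path, code in sources.items()}

    # Pass 1: register every function definition name -> defining file.
    registry: dict[str, str] = {}
    for path, tree in trees.items():
        for node in ast.walk(tree):
            if isinstance(node, ast.FunctionDef):
                registry[node.name] = path

    # Pass 2: resolve each call site against the registry.
    graph = nx.DiGraph()
    graph.add_nodes_from(sources)
    for path, tree in trees.items():
        for node in ast.walk(tree):
            if not isinstance(node, ast.Call):
                continue
            func = node.func
            if isinstance(func, ast.Name):         # direct call: f(...)
                name = func.id
            elif isinstance(func, ast.Attribute):  # method call: obj.m(...)
                name = func.attr
            else:
                continue
            target = registry.get(name)
            if target is not None and target != path:
                graph.add_edge(path, target)
    return graph
```

Note that resolving `ast.Attribute` calls by attribute name alone is a heuristic: two unrelated methods with the same name will collide in the registry.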
🗄 Data Contracts
LLM Client Interface
```python
class LLMClient:
    def __init__(self):
        """
        Reads from environment:
            LLM_BACKEND:    "openai" | "ollama" (default: "ollama")
            LLM_MODEL:      model name (default: "llama3")
            OLLAMA_URL:     Ollama endpoint (default: localhost:11434)
            OPENAI_API_KEY: API key (default: None)
        """

    def complete(self, prompt: str, max_tokens: int = 2048,
                 temperature: float = 0.2) -> str:
        """Returns LLM completion text, stripped of whitespace."""
```
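A minimal sketch of how the constructor could read this configuration, assuming the defaults documented above; the dispatch in `complete()` is indicative only, with the actual API calls elided:

```python
import os

class LLMClient:
    """Env-driven LLM client configuration (sketch, not the exact code)."""

    def __init__(self):
        self.backend = os.environ.get("LLM_BACKEND", "ollama")
        self.model = os.environ.get("LLM_MODEL", "llama3")
        self.ollama_url = os.environ.get(
            "OLLAMA_URL", "http://localhost:11434/api/generate")
        self.api_key = os.environ.get("OPENAI_API_KEY")  # None if unset

    def complete(self, prompt: str, max_tokens: int = 2048,
                 temperature: float = 0.2) -> str:
        if self.backend == "openai":
            ...  # call the OpenAI chat-completions API with self.api_key
        else:
            ...  # POST {"model": self.model, "prompt": prompt} to self.ollama_url
        return ""  # placeholder; the real client returns the stripped text
```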
File Traverser Interface
```python
def get_relevant_files(root_dir: str) -> list[str]:
    """Returns absolute paths of files matching inclusion filters."""

def build_ast_graph(file_path: str) -> networkx.DiGraph:
    """Parses Python file → AST → NetworkX directed graph with labeled nodes."""

def build_consolidated_ast_graph(file_paths: list[str]) -> networkx.DiGraph:
    """Two-pass cross-file dependency graph. Nodes = files, edges = function calls."""

def extract_function_docs(file_path: str) -> str:
    """Returns markdown with function signatures, args, and docstrings."""
```
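The traversal behind `get_relevant_files` can be sketched with `os.walk`. The `INCLUDE_EXTS` and `EXCLUDE_DIRS` values here are illustrative placeholders, not the project's actual filter lists:

```python
import os

# Illustrative filter sets; the project's actual lists may differ.
INCLUDE_EXTS = {".py"}
EXCLUDE_DIRS = {".git", "__pycache__", "node_modules"}

def get_relevant_files(root_dir: str) -> list[str]:
    """Walk the repo and keep only files whose extension is whitelisted."""
    matches = []
    for dirpath, dirnames, filenames in os.walk(root_dir):
        # Prune excluded directories in place so os.walk skips them entirely.
        dirnames[:] = [d for d in dirnames if d not in EXCLUDE_DIRS]
        for name in filenames:
            if os.path.splitext(name)[1] in INCLUDE_EXTS:
                matches.append(os.path.abspath(os.path.join(dirpath, name)))
    return sorted(matches)
```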
Output File Contracts
| File | Format | Content |
|---|---|---|
| docs/ARCHITECTURE.md | Markdown | Architecture overview, system flow, optional Mermaid diagram |
| docs/COMPONENTS.md | Markdown | Per-file documentation with `## filename` headers |
| docs/CONSOLIDATED_AST.html | HTML + embedded SVG | Interactive cross-file dependency visualization |
| docs/ast/{name}.ast.html | HTML + embedded SVG | Per-file AST tree visualization |
| docs/ast/{name}.md | Markdown | Function signatures and docstrings |
📦 Dependencies
| Package | Purpose | Used In |
|---|---|---|
| gitpython | Clone GitHub repositories via `git.Repo.clone_from()` | repo_cloner.py |
| openai | OpenAI API client for GPT chat completions | llm_client.py |
| python-dotenv | Load `.env` files into `os.environ` | llm_client.py |
| tqdm | Progress bars for parallel file processing | doc_generator.py |
| requests | HTTP client for Ollama REST API | llm_client.py |
| networkx | Directed graph data structures for AST and dependency graphs | file_traverser.py |
| matplotlib | Required by NetworkX for rendering support | file_traverser.py |
| pydot | Convert NetworkX graphs to DOT format → SVG via Graphviz | |