This repository was archived by the owner on Feb 14, 2026. It is now read-only.

Commit 283456c

v0.9.0: Consolidate into .refdocs/ directory, replace search with manifest
Move config, manifest, and downloaded docs into a single .refdocs/ folder (.refdocs/config.json, .refdocs/manifest.json, .refdocs/docs/) following the .git/ convention. Remove search indexer, chunker, and eval harness, replaced by lightweight manifest-based discovery. Default download path changes from ref-docs/ to docs/.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 97feb0f commit 283456c

30 files changed

Lines changed: 833 additions & 4572 deletions

.gitignore

Lines changed: 1 addition & 2 deletions
@@ -1,6 +1,5 @@
11
node_modules/
22
dist/
3-
.refdocs-index.json
3+
.refdocs/
44
refdocs
55
*.tsbuildinfo
6-
ref-docs/

CLAUDE.md

Lines changed: 74 additions & 125 deletions
@@ -1,21 +1,22 @@
11
# refdocs
22

3-
A local CLI tool that indexes markdown documentation and exposes fast fuzzy search with intelligent chunking. Designed to give LLM coding agents efficient, token-conscious access to project documentation without MCP servers, network calls, or full-file context dumps.
3+
A local CLI tool that fetches, organizes, and catalogs markdown documentation. Generates a compact manifest that gives LLM coding agents efficient, token-conscious access to project documentation without MCP servers, network calls, or full-file context dumps.
44

55
## Architecture
66

77
```
88
refdocs/
99
├── src/
1010
│ ├── index.ts # CLI entrypoint (commander)
11-
│ ├── indexer.ts # Walks target dir, chunks md files, builds search index
12-
│ ├── chunker.ts # Splits markdown by heading hierarchy into right-sized chunks
13-
│ ├── search.ts # MiniSearch wrapper, query + rank + format results
14-
│ ├── config.ts # Reads/writes .refdocs.json config
15-
│ ├── github.ts # GitHub URL parsing + tarball download
16-
│ ├── add.ts # Orchestration for `refdocs add` (download, extract, config update)
17-
│ └── types.ts # Shared TypeScript interfaces
18-
├── .refdocs.json # Example config
11+
│ ├── manifest.ts # Walks target dirs, extracts headings/summaries, builds manifest
12+
│ ├── config.ts # Reads/writes .refdocs/config.json
13+
│ ├── github.ts # GitHub URL parsing + tarball download
14+
│ ├── add.ts # Orchestration for `refdocs add` (download, extract, config update)
15+
│ └── types.ts # Shared TypeScript interfaces
16+
├── .refdocs/
17+
│ ├── config.json # Project config
18+
│ ├── manifest.json # Generated manifest
19+
│ └── docs/ # Downloaded docs
1920
├── package.json
2021
├── tsconfig.json
2122
└── README.md
@@ -25,185 +26,133 @@ refdocs/
2526

2627
- **Runtime**: Node/Bun (target `bun build --compile` for single binary)
2728
- **Language**: TypeScript, strict mode
28-
- **Search engine**: MiniSearch — pure JS, ~7kb, fuzzy matching, field boosting, prefix search
2929
- **CLI framework**: Commander
30-
- **Markdown parsing**: markdown-it or remark for heading extraction (evaluate which is lighter)
31-
- **Zero external services** — no network calls, no API keys, everything local
30+
- **Zero external services** — no network calls at runtime, no API keys, everything local
3231

3332
## Config
3433

35-
`.refdocs.json` at project root:
34+
`.refdocs/config.json` at project root:
3635

3736
```json
3837
{
39-
"paths": ["ref-docs"],
40-
"index": ".refdocs-index.json",
41-
"chunkMaxTokens": 800,
42-
"chunkMinTokens": 100,
43-
"boostFields": {
44-
"title": 2,
45-
"headings": 1.5,
46-
"body": 1
47-
}
38+
"paths": ["docs"],
39+
"manifest": "manifest.json"
4840
}
4941
```
5042

51-
- `paths` — array of directories to index (relative to project root)
52-
- `index` — where to persist the serialized search index (gitignored)
53-
- `chunkMaxTokens` — upper bound for chunk size, rough estimate (chars / 4)
54-
- `chunkMinTokens` — minimum chunk size; merge small sections with their parent
55-
- `boostFields` — field relevance weights for search ranking
43+
- `paths` — array of directories to catalog (relative to `.refdocs/`)
44+
- `manifest` — where to persist the generated manifest (relative to `.refdocs/`)
5645
- `sources` — (managed by `refdocs add`) tracks GitHub repos added for future updates
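As an illustration of the config shape described above, here is a minimal sketch of parsing `.refdocs/config.json` with defaults. The names `RefdocsConfig`, `DEFAULT_CONFIG`, and `parseConfig` are hypothetical, the fallback-to-defaults behavior is an assumption, and the shape of `sources` is not specified here, so it is left opaque.

```typescript
// Hypothetical config type mirroring the documented keys.
interface RefdocsConfig {
  paths: string[]; // directories to catalog, relative to .refdocs/
  manifest: string; // manifest location, relative to .refdocs/
  sources?: unknown[]; // managed by `refdocs add`; shape not documented here
}

// Defaults taken from the example config above.
const DEFAULT_CONFIG: RefdocsConfig = { paths: ["docs"], manifest: "manifest.json" };

function isStringArray(value: unknown): value is string[] {
  return Array.isArray(value) && value.every((item) => typeof item === "string");
}

// Parse raw JSON text and fill in defaults for missing or malformed keys
// (assumed behavior; the real tool may reject such configs instead).
function parseConfig(raw: string): RefdocsConfig {
  const parsed: unknown = JSON.parse(raw);
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    throw new Error("Invalid .refdocs/config.json: expected a JSON object.");
  }
  const record = parsed as Record<string, unknown>;
  const config: RefdocsConfig = {
    paths: isStringArray(record.paths) ? record.paths : [...DEFAULT_CONFIG.paths],
    manifest: typeof record.manifest === "string" ? record.manifest : DEFAULT_CONFIG.manifest,
  };
  if (Array.isArray(record.sources)) config.sources = record.sources;
  return config;
}
```

Keeping the parser a pure function over a string leaves file I/O at the edges, in line with the code style notes in this file.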
5746

58-
## CLI Commands
47+
## Manifest
5948

60-
### `refdocs init`
49+
The manifest is a compact JSON file that summarizes all documented files. It replaces the old search index with a lightweight catalog that LLM agents can read directly.
6150

62-
Create a `.refdocs.json` config file with full defaults. Errors if the file already exists. Also auto-runs when `refdocs add` is called without an existing config.
51+
`.refdocs/manifest.json` structure:
6352

64-
### `refdocs index`
53+
```json
54+
{
55+
"generated": "2025-01-01T00:00:00.000Z",
56+
"sources": 1,
57+
"files": 12,
58+
"entries": [
59+
{
60+
"file": "docs/owner/repo/guide.md",
61+
"headings": ["Guide", "Installation", "Configuration"],
62+
"lines": 85,
63+
"summary": "Getting started with the project."
64+
}
65+
]
66+
}
67+
```
6568

66-
Walk all configured paths, chunk every `.md` file, build and persist the MiniSearch index.
69+
Each entry contains:
70+
- `file` — relative path to the markdown file
71+
- `headings` — h1-h3 headings extracted from the content
72+
- `lines` — total line count
73+
- `summary` — frontmatter description or first paragraph
6774

68-
- Parse each markdown file into chunks split by heading boundaries (h1 > h2 > h3)
69-
- Each chunk gets metadata: `{ id, file, title, headings, body, startLine, endLine }`
70-
- Small sections (below `chunkMinTokens`) merge into their parent heading's chunk
71-
- Large sections (above `chunkMaxTokens`) split at paragraph boundaries
72-
- Serialize index to `.refdocs-index.json`
73-
- Print summary: files indexed, chunks created, index size
75+
Target: entire manifest for 50 files should be ~500-800 tokens.
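To make the token target concrete, the sketch below estimates manifest size with the rough chars / 4 heuristic mentioned elsewhere in this file. The helper names `estimateTokens` and `checkBudget` are hypothetical, and the heuristic is only an approximation of what a real tokenizer would report.

```typescript
// Rough token estimate: about four characters per token (an approximation, not a tokenizer).
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Hypothetical helper: report whether a serialized manifest fits a token budget.
function checkBudget(
  manifestJson: string,
  maxTokens: number,
): { tokens: number; withinBudget: boolean } {
  const tokens = estimateTokens(manifestJson);
  return { tokens, withinBudget: tokens <= maxTokens };
}
```

Under this heuristic, 800 tokens corresponds to about 3,200 characters, or roughly 64 characters per file for a 50-file manifest.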
7476

75-
### `refdocs search <query>`
77+
## CLI Commands
7678

77-
Fuzzy search the index and return the top chunks.
79+
### `refdocs init`
7880

79-
- Load persisted index (error if not built yet)
80-
- Run MiniSearch with fuzzy matching (fuzzy: 0.2), prefix search enabled
81-
- Return top 3 results by default
82-
- Output format: each chunk preceded by a comment with source file and line range
81+
Create a `.refdocs/config.json` config file with full defaults. Errors if the file already exists. Also auto-runs when `refdocs add` is called without an existing config.
8382

84-
**Flags:**
85-
- `-n, --results <count>` — number of results (default: 3, max: 10)
86-
- `-f, --file <pattern>` — filter results to files matching glob
87-
- `--json` — output results as JSON array instead of formatted text
88-
- `--raw` — output chunk body only, no metadata header (for piping)
83+
### `refdocs manifest`
84+
85+
Walk all configured paths, extract headings and summaries from every markdown file, and generate the manifest.
86+
87+
- Parse each markdown file for h1-h3 headings via regex
88+
- Extract frontmatter `description` or first paragraph as summary
89+
- Count lines per file
90+
- Write to `.refdocs/manifest.json`
91+
- Print summary: files cataloged, sources tracked
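The extraction steps above can be sketched as a pure function over file content. This is a minimal illustration, not the code in `manifest.ts`: the names `ManifestEntry`, `splitFrontmatter`, `extractHeadings`, `extractSummary`, `countLines`, and `buildEntry` are hypothetical, frontmatter is assumed to be a `---` delimited block with a single-line `description:` key, and skipping fenced code blocks is an added assumption.

```typescript
// Hypothetical entry shape, mirroring the fields documented in the Manifest section.
interface ManifestEntry {
  file: string;
  headings: string[];
  lines: number;
  summary: string;
}

// Split optional frontmatter (delimited by `---` lines) from the markdown body.
function splitFrontmatter(content: string): { frontmatter: string; body: string } {
  const match = /^---\r?\n([\s\S]*?)\r?\n---\r?\n?/.exec(content);
  if (!match) return { frontmatter: "", body: content };
  return { frontmatter: match[1], body: content.slice(match[0].length) };
}

// Collect h1-h3 ATX headings via regex, skipping lines inside fenced code blocks.
function extractHeadings(body: string): string[] {
  const headings: string[] = [];
  let inFence = false;
  for (const line of body.split(/\r?\n/)) {
    if (/^\s*(```|~~~)/.test(line)) {
      inFence = !inFence;
      continue;
    }
    if (inFence) continue;
    const match = /^#{1,3}\s+(.+?)\s*$/.exec(line);
    if (match) headings.push(match[1]);
  }
  return headings;
}

// Prefer the frontmatter `description`; otherwise use the first non-heading paragraph.
function extractSummary(frontmatter: string, body: string): string {
  const described = /^description:[ \t]*(.+)$/m.exec(frontmatter);
  if (described) return described[1].trim().replace(/^["']|["']$/g, "");
  for (const block of body.split(/\r?\n\s*\r?\n/)) {
    if (/^\s*(```|~~~)/.test(block)) continue; // skip blocks that open a code fence
    const text = block
      .split(/\r?\n/)
      .filter((line) => !/^#{1,6}\s/.test(line))
      .join(" ")
      .trim();
    if (text) return text.replace(/\s+/g, " ");
  }
  return "";
}

// Count lines without counting a trailing newline as an extra line.
function countLines(content: string): number {
  if (content === "") return 0;
  const parts = content.split(/\r?\n/);
  return content.endsWith("\n") ? parts.length - 1 : parts.length;
}

function buildEntry(file: string, content: string): ManifestEntry {
  const { frontmatter, body } = splitFrontmatter(content);
  return {
    file,
    headings: extractHeadings(body),
    lines: countLines(content),
    summary: extractSummary(frontmatter, body),
  };
}
```

A caller would read each markdown file from the configured paths, pass its text to `buildEntry`, and serialize the collected entries into the manifest.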
8992

9093
### `refdocs add <source>`
9194

9295
Add a local path or download markdown docs from a GitHub repository.
9396

94-
- If source is a URL (`http://` or `https://`), download from GitHub as before
97+
- If source is a URL (`http://` or `https://`), download from GitHub
9598
- If source is a local path, verify it exists with `.md` files and add to `paths`
96-
- Update `.refdocs.json`: add path to `paths`, track source in `sources` (GitHub only)
97-
- Auto re-index unless `--no-index` is passed
99+
- Update `.refdocs/config.json`: add path to `paths`, track source in `sources` (GitHub only)
100+
- Auto regenerate manifest unless `--no-manifest` is passed
98101

99102
**Flags:**
100-
- `--path <dir>` — override local storage directory (default: `ref-docs/{repo}`, GitHub only)
103+
- `--path <dir>` — override local storage directory (default: `docs/{repo}`, GitHub only)
101104
- `--branch <branch>` — override branch detection from URL (GitHub only)
102-
- `--no-index` — skip auto re-indexing after adding
105+
- `--no-manifest` — skip auto manifest generation after adding
103106

104107
Auth via `GITHUB_TOKEN` env var for private repos.
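The source dispatch described above (URL versus local path) might look like the following sketch. The names `Source`, `GithubSource`, `LocalSource`, and `parseSource` are hypothetical, and the assumption that a branch is detected from a `/tree/<branch>/...` URL segment is an illustration rather than confirmed behavior of `github.ts`. Branch names containing slashes are ambiguous in this form and are not handled.

```typescript
// Hypothetical result types for classifying a `refdocs add <source>` argument.
interface GithubSource {
  kind: "github";
  owner: string;
  repo: string;
  branch?: string; // taken from a /tree/<branch> segment, if present (assumed URL form)
  subpath?: string; // directory inside the repo, if present
}
interface LocalSource {
  kind: "local";
  path: string;
}
type Source = GithubSource | LocalSource;

function parseSource(source: string): Source {
  // Anything that is not an http(s) URL is treated as a local path.
  if (!/^https?:\/\//.test(source)) return { kind: "local", path: source };

  const [host, owner, repoRaw, marker, branch, ...rest] = source
    .replace(/^https?:\/\//, "")
    .replace(/[?#].*$/, "") // drop query string and fragment
    .split("/")
    .filter((part) => part !== "");

  if (host !== "github.com" || !owner || !repoRaw) {
    throw new Error(`Unsupported URL: ${source}. Expected https://github.com/<owner>/<repo>.`);
  }

  const result: GithubSource = { kind: "github", owner, repo: repoRaw.replace(/\.git$/, "") };
  if (marker === "tree" && branch) {
    result.branch = branch;
    if (rest.length > 0) result.subpath = rest.join("/");
  }
  return result;
}
```

A `--branch` flag would then simply override `branch` on the parsed result before download.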
105108

106109
### `refdocs remove <path>`
107110

108-
Remove a path from the index configuration.
111+
Remove a path from the configuration.
109112

110-
- Remove path from `paths` in `.refdocs.json`
113+
- Remove path from `paths` in `.refdocs/config.json`
111114
- If path has an associated source, remove from `sources` too
112-
- Auto re-index unless `--no-index` is passed
115+
- Auto regenerate manifest unless `--no-manifest` is passed
113116
- Does not delete files on disk
114117

115118
**Flags:**
116-
- `--no-index` — skip auto re-indexing after removal
119+
- `--no-manifest` — skip auto manifest generation after removal
117120

118121
### `refdocs list`
119122

120-
List all indexed files and their chunk counts. Useful for verifying what's in the index.
121-
122-
### `refdocs info <file>`
123-
124-
Show all chunks for a specific file with their headings and token estimates.
123+
List all documented files and their heading counts. Loads from manifest if available, otherwise scans filesystem directly.
125124

126125
### `refdocs update`
127126

128-
Re-pull all tracked sources from GitHub and re-index.
127+
Re-pull all tracked sources from GitHub and regenerate manifest.
129128

130-
- Iterates over `sources` in `.refdocs.json`
129+
- Iterates over `sources` in `.refdocs/config.json`
131130
- Downloads each repo tarball and extracts `.md` files, overwriting local copies
132-
- Auto re-index unless `--no-index` is passed
131+
- Auto regenerate manifest unless `--no-manifest` is passed
133132

134133
**Flags:**
135-
- `--no-index` — skip auto re-indexing after update
136-
137-
## Chunking Strategy
138-
139-
This is the core value of the tool. Chunks must be:
140-
141-
1. **Semantically coherent** — never split mid-section. Heading boundaries are the primary split points.
142-
2. **Right-sized for LLM context** — 100-800 tokens. Big enough to be useful, small enough to not waste context.
143-
3. **Hierarchical** — each chunk carries its full heading breadcrumb (e.g. `Configuration > Database > Connections`) so the LLM understands where the chunk fits.
144-
145-
Algorithm:
146-
1. Parse markdown into AST
147-
2. Walk AST and split at heading nodes (h1, h2, h3)
148-
3. Each section becomes a candidate chunk with its heading breadcrumb
149-
4. If chunk < minTokens, merge with previous sibling or parent
150-
5. If chunk > maxTokens, split at paragraph boundaries (double newline)
151-
6. Attach metadata: source file path, line range, heading trail
152-
153-
## Output Format
154-
155-
Default output for `refdocs search "data transformers"`:
156-
157-
```
158-
# [1] spatie-laravel-data/transformers.md:15-48
159-
# Transformers > Built-in Transformers
160-
161-
Transformers are used to convert data properties when...
162-
<chunk body here>
163-
164-
---
165-
166-
# [2] spatie-laravel-data/creating-data-objects.md:72-95
167-
# Creating Data Objects > Casting and Transforming
168-
169-
When creating a data object from a request...
170-
<chunk body here>
171-
```
172-
173-
JSON output (`--json`) returns:
174-
175-
```json
176-
[
177-
{
178-
"score": 12.45,
179-
"file": "spatie-laravel-data/transformers.md",
180-
"lines": [15, 48],
181-
"headings": ["Transformers", "Built-in Transformers"],
182-
"body": "..."
183-
}
184-
]
185-
```
134+
- `--no-manifest` — skip auto manifest generation after update
186135

187136
## Design Principles
188137

189138
- **No runtime dependencies beyond the binary** — everything bundles into one file
190-
- **Fast**: indexing a typical ref-docs folder (50 files) should take <1s. Search should be <50ms.
191-
- **Deterministic** — same docs, same index. No embeddings, no ML, no probabilistic retrieval.
192-
- **Composable** — output is plain text or JSON. Pipe it wherever you want.
139+
- **Fast**: manifest generation for a typical doc folder (50 files) should take <1s
140+
- **Deterministic** — same docs, same manifest. No embeddings, no ML, no probabilistic retrieval
141+
- **Composable** — output is plain text or JSON. Pipe it wherever you want
193142
- **Offline** — works air-gapped, on a plane, in a container with no egress
143+
- **Get out of the way** — fetch, organize, catalog, then let the agent read files directly
194144

195145
## Code Style
196146

197147
- Prefer fixing root causes over patching symptoms. If a workaround is needed, explain why the structural fix isn't feasible.
198148
- TypeScript strict mode, no `any`
199149
- Pure functions where possible, side effects at the edges (CLI entrypoint, file I/O)
200150
- No classes unless genuinely needed — prefer modules with exported functions
201-
- Error messages should be actionable: "Index not found. Run `refdocs index` first."
202-
- Tests with Vitest, focus on chunker logic and search relevance
151+
- Error messages should be actionable: "Manifest not found. Run `refdocs manifest` first."
152+
- Tests with Vitest, focus on manifest generation and file discovery
203153

204154
## Future Considerations (not MVP)
205155

206-
- `refdocs watch`: rebuild index on file change
207-
- MCP server mode — expose search as an MCP tool for editors that prefer it
156+
- `refdocs watch`: regenerate manifest on file change
157+
- MCP server mode — expose manifest as an MCP tool for editors that prefer it
208158
- Token counting with tiktoken instead of chars/4 estimate
209-
- Embedding-based search as optional mode (would require onnxruntime or similar)
