This repository was archived by the owner on Feb 14, 2026. It is now read-only.

Commit 21a3bb7

v0.6.0: Add incremental indexing, remove web crawling, fix casts and heading depth
- Incremental indexing via SHA-256 content hashes (only re-chunk changed files)
- --full flag to force complete rebuild
- Remove web crawling module and dependencies (cheerio, jsdom, readability, turndown)
- Fix unsafe TypeScript double-casts in search.ts
- Support heading depths h1-h6 in chunker (was h1-h3)
- Read CLI version from package.json instead of hardcoding
- Bump index version 2 → 3 with automatic migration
- Add searchAllIndexes for merged local+global search
- Add file glob filtering to search

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
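As a rough illustration of the incremental indexing described in the commit message, the sketch below hashes each file's content with SHA-256 and marks for re-chunking only the files whose digest differs from the stored one. The names `HashManifest`, `IndexPlan`, and `planIncrementalIndex` are hypothetical and not taken from the refdocs source; the `full` parameter only mirrors the idea of the `--full` flag.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shape: file path -> SHA-256 hex digest of the file content.
type HashManifest = Record<string, string>;

interface IndexPlan {
  changed: string[]; // new or modified files that need re-chunking
  removed: string[]; // files present in the previous manifest but now gone
  manifest: HashManifest; // updated manifest to persist next to the index
}

function sha256(content: string): string {
  return createHash("sha256").update(content, "utf8").digest("hex");
}

// Compare current file contents against the previous manifest.
// When `full` is true, every file is treated as changed (complete rebuild).
function planIncrementalIndex(
  files: Map<string, string>,
  previous: HashManifest,
  full = false,
): IndexPlan {
  const manifest: HashManifest = {};
  const changed: string[] = [];

  files.forEach((content, path) => {
    const digest = sha256(content);
    manifest[path] = digest;
    if (full || previous[path] !== digest) {
      changed.push(path);
    }
  });

  const removed = Object.keys(previous).filter((path) => !files.has(path));
  return { changed, removed, manifest };
}
```

Under these assumptions, a second run over unchanged files yields an empty `changed` list, so chunks for those files could be reused from the existing index rather than rebuilt.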
1 parent c7c752d commit 21a3bb7

21 files changed

Lines changed: 782 additions & 2072 deletions

README.md

Lines changed: 3 additions & 17 deletions
@@ -26,8 +26,6 @@ refdocs init
 # Add docs from anywhere
 refdocs add ./docs # local directory
 refdocs add https://github.com/laravel/docs --branch 11.x # GitHub repo
-refdocs add https://docs.example.com/api/reference # any web page
-refdocs add https://example.com/docs --crawl # crawl a whole site
 
 # Search
 refdocs search "database connections"
@@ -64,10 +62,6 @@ refdocs init -g # create global config at ~/.refdocs/
 # Add sources
 refdocs add ./docs # local directory
 refdocs add https://github.com/org/repo # GitHub repo (downloads markdown files)
-refdocs add https://example.com/page # any URL (fetches + converts to markdown)
-refdocs add https://example.com/file.md # direct .md or .txt file URL
-refdocs add https://example.com --crawl # crawl a site (multiple pages)
-refdocs add https://example.com --crawl --max-pages 50 --depth 2
 refdocs add <source> -g # add to global ~/.refdocs/ store
 
 # Search
@@ -93,21 +87,16 @@ refdocs remove ref-docs/laravel # remove a path from config
 
 3. **Output** — human-readable by default, `--json` for structured consumption, `--raw` for piping. Each result includes source file, line range, and heading trail.
 
-## Adding from the web
+## Adding sources
 
-`refdocs add` auto-detects the source type:
+`refdocs add` supports two source types:
 
 | Source | Behavior |
 |--------|----------|
 | Local path (`./docs`) | Adds directory to config |
 | GitHub URL | Downloads `.md` files from the repo tarball |
-| `.md` or `.txt` URL | Downloads the file directly |
-| Any other URL | Fetches the page and converts HTML to markdown |
-| Any URL + `--crawl` | Spiders the site, converting each page to markdown |
 
-Single-page URLs are fetched with [Readability](https://github.com/mozilla/readability) (content extraction) + [Turndown](https://github.com/mixmark-io/turndown) (HTML-to-markdown). With `--crawl`, links are followed within the same origin up to `--depth` levels (default: 3) and `--max-pages` (default: 200).
-
-All downloaded sources are tracked in `.refdocs.json` and can be re-pulled with `refdocs update`.
+GitHub sources are tracked in `.refdocs.json` and can be re-pulled with `refdocs update`.
 
 ## Global docs
 
@@ -153,8 +142,5 @@ All fields optional. See [Configuration](docs/configuration.md) for details.
 | [mdast-util-from-markdown](https://github.com/syntax-tree/mdast-util-from-markdown) | Markdown AST parsing |
 | [picomatch](https://github.com/micromatch/picomatch) | Glob pattern matching |
 | [tar-stream](https://github.com/mafintosh/tar-stream) | Tarball extraction for GitHub sources |
-| [@mozilla/readability](https://github.com/mozilla/readability) | Content extraction from web pages |
-| [Turndown](https://github.com/mixmark-io/turndown) | HTML-to-markdown conversion |
-| [cheerio](https://github.com/cheeriojs/cheerio) | Link discovery for crawling |
 
 Zero external services. Works offline, in containers, on planes.
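
The README change above narrows `refdocs add` to two source types: local paths and GitHub URLs. A minimal sketch of how such a classification could look follows. The function `classifySource`, its return type, and the rule for recognising a GitHub repository URL are assumptions made for illustration, not the project's actual code.

```typescript
// Hypothetical: the two source types named in the updated README table.
type SourceType = "local" | "github";

// Classify a source string. Anything that is not an http(s) URL is treated
// as a local path; http(s) URLs must point at github.com/<owner>/<repo>.
function classifySource(source: string): SourceType {
  if (!/^https?:\/\//i.test(source)) {
    return "local";
  }

  const url = new URL(source);
  const segments = url.pathname.split("/").filter(Boolean);
  if (url.hostname === "github.com" && segments.length >= 2) {
    return "github";
  }

  // Arbitrary web pages are rejected in this sketch, since the commit
  // removes web crawling and single-page fetching.
  throw new Error(
    `Unsupported source: ${source} (expected a local path or a GitHub repo URL)`,
  );
}
```

In this sketch, `classifySource("./docs")` returns `"local"`, `classifySource("https://github.com/laravel/docs")` returns `"github"`, and any other URL throws.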
