A filesystem git catalog for your repository collection.
The filesystem is where your work lives. repoindex is where your work becomes legible.
repoindex indexes local git directories. The filesystem path IS the
canonical identity. External platforms (GitHub, Gitea, PyPI, CRAN, Zenodo,
npm, Cargo, and others) provide opt-in enrichment. Forge fields are
unified (stars, topics, is_archived, ...) with forge_id carrying
provenance; registry fields stay namespaced (pypi_version,
cran_version, citation_doi). The SQLite database at
~/.repoindex/index.db is a materialized view of the filesystem plus
external state. refresh is the only writer; every other command reads.
The primary consumer is an LLM using the MCP server. The CLI is the secondary interface, for humans.
pip install repoindex # CLI only
pip install repoindex[mcp] # CLI + MCP server

From source:
git clone https://github.com/queelius/repoindex.git
cd repoindex
make install

repoindex config init # Create ~/.repoindex/config.yaml
repoindex refresh # Populate the database
repoindex status # Dashboard overview

repoindex mcp # stdio transport for Claude / MCP-aware agents

The server exposes six tools. SQL is the API:
| Tool | Purpose |
|---|---|
| get_manifest() | Tables, row counts, last refresh, language breakdown |
| get_schema(table?) | SQL DDL for the whole database or one table |
| run_sql(query) | Read-only SQL (SELECT / WITH), up to 500 rows |
| refresh(github?, pypi?, cran?, external?, full?) | Trigger a refresh |
| tag(repo, action, tag) | Add / remove / list user tags |
| export(output_dir, language?, dirty?, tag?, recent?) | Longecho-compliant arkiv archive |
The assistant can answer any collection-level question by:
- Calling get_schema() to see what tables exist.
- Composing the SQL query it needs.
- Calling run_sql(query) to get results.
No pre-decided "list repos by language" / "get starred repos" / "find dirty repos" tools. The schema is self-describing; SQL does the rest.
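The schema-first loop above can be sketched with Python's standard sqlite3 module. This is an illustration under stated assumptions, not repoindex's implementation: the in-memory repos table is a stand-in whose columns are borrowed from the query examples in this README, the helper names simply mirror the tool names, and the guard logic is one guess at how "read-only, up to 500 rows" could be enforced.

```python
import sqlite3

MAX_ROWS = 500  # row cap taken from the tool table above


def get_schema(conn, table=None):
    """Return the stored DDL for every table, or for one table."""
    sql = "SELECT sql FROM sqlite_master WHERE type = 'table'"
    params = ()
    if table is not None:
        sql += " AND name = ?"
        params = (table,)
    return [row[0] for row in conn.execute(sql, params)]


def run_sql(conn, query):
    """Run a SELECT / WITH statement and return at most MAX_ROWS rows."""
    words = query.lstrip().split(None, 1)
    if not words or words[0].upper() not in ("SELECT", "WITH"):
        raise ValueError("only SELECT / WITH statements are allowed")
    return conn.execute(query).fetchmany(MAX_ROWS)


# Stand-in database (assumption); the real index lives at ~/.repoindex/index.db.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE repos (name TEXT, forge_id TEXT, stars INTEGER)")
conn.executemany(
    "INSERT INTO repos VALUES (?, ?, ?)",
    [("alpha", "github", 12), ("beta", "gitea", 0), ("gamma", "github", 3)],
)

ddl = get_schema(conn, "repos")  # step 1: discover the schema
rows = run_sql(  # steps 2 and 3: compose a query, then run it
    conn, "SELECT name, stars FROM repos WHERE stars > 0 ORDER BY stars DESC"
)
print(ddl)
print(rows)
```

A keyword prefix check like this is not a complete guard on its own, since SQLite also accepts WITH in front of statements that write. Opening the connection read-only is the stronger guarantee.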
Install the Claude Code plugin to wire the server automatically:
/plugin marketplace add queelius/claude-code-marketplace
/plugin install repoindex@queelius

| Command | Purpose |
|---|---|
| status | Dashboard: counts, health, last refresh, next-action hints |
| show <repo> | Detailed view of a single repository |
| events | Query git events (commits, tags, branches, merges, releases) |
| digest | Summarize recent activity by repo |
| sql | Raw SQL + DB maintenance (--info, --schema, --reset, --vacuum) |
| refresh | Sync DB from filesystem (--github, --pypi, --external, --full) |
| export | Longecho-compliant arkiv archive (default) or format plugins |
| copy | Copy filtered repos to a destination directory |
| link | Symlink tree management (tree, refresh, status) |
| ops | Collection operations (git, generate, github, audit, mirror, wip-snapshot) |
| tag | Tag management (add, remove, list, tree, move) |
| config | Settings management |
| mcp | Start the MCP server |
Four shorthand filters are available on every operation command:
repoindex export -o out/ --dirty
repoindex export -o out/ --language python
repoindex export -o out/ --tag "work/*"
repoindex export -o out/ --recent 7d

For anything more expressive, use SQL.
repoindex sql "SELECT name, forge_id, stars FROM repos WHERE stars > 0 ORDER BY stars DESC LIMIT 10"
repoindex sql "SELECT r.name, p.package_name, p.current_version
FROM publications p JOIN repos r ON p.repo_id = r.id
WHERE p.registry = 'pypi' AND p.published = 1"
repoindex sql --info
repoindex sql --schema

The schema is documented in CLAUDE.md and inline via repoindex sql --schema.
The schema is part of the 2.x stable surface; see STABILITY.md.
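Because the index is an ordinary SQLite file, the same queries can be run from a script. The sketch below uses only Python's standard library and opens the database with SQLite's mode=ro URI flag so that writes are refused. It builds a throwaway file with a stand-in repos table (name, forge_id, and stars come from the example query above; the rows are made up), so it makes no claim about the real schema beyond those three columns.

```python
import os
import sqlite3
import tempfile
from pathlib import Path

# Throwaway database so the sketch runs anywhere (assumption: stand-in data).
# With a real install, db_path would be ~/.repoindex/index.db instead.
db_path = Path(tempfile.mkdtemp()) / "index.db"

writer = sqlite3.connect(db_path)
writer.execute("CREATE TABLE repos (name TEXT, forge_id TEXT, stars INTEGER)")
writer.executemany(
    "INSERT INTO repos VALUES (?, ?, ?)",
    [("alpha", "github", 12), ("beta", "gitea", 0), ("gamma", "github", 3)],
)
writer.commit()
writer.close()

# mode=ro asks SQLite to open the file read-only.
reader = sqlite3.connect(db_path.as_uri() + "?mode=ro", uri=True)
top = reader.execute(
    "SELECT name, forge_id, stars FROM repos "
    "WHERE stars > 0 ORDER BY stars DESC LIMIT 10"
).fetchall()

# Any write attempt on the read-only connection raises OperationalError.
try:
    reader.execute("INSERT INTO repos VALUES ('delta', 'github', 1)")
    write_blocked = False
except sqlite3.OperationalError:
    write_blocked = True

reader.close()
os.remove(db_path)
print(top, write_blocked)
```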
repoindex events --since 7d
repoindex events --type git_tag --since 30d
repoindex events --repo myproject
repoindex events --stats
repoindex events --json --since 7d | jq -r '.type' | sort | uniq -c

# Multi-repo git operations
repoindex ops git push --dry-run
repoindex ops git pull --language python
repoindex ops git status --dirty
# Metadata audit
repoindex ops audit --language python
repoindex ops audit --category essentials --severity critical
# Generate boilerplate files
repoindex ops generate license --license mit --dry-run
repoindex ops generate codemeta --language python --dry-run
repoindex ops generate gitignore --lang python --dry-run
# Cross-platform forge actions (works on GitHub, Gitea, Codeberg, ...)
repoindex ops set-topics dreamlog python logic prolog
repoindex ops set-description dreamlog "Prolog in S-expressions"
repoindex ops set-archived old-experiment true --dry-run
# Pull repos you own on configured forges
repoindex ops sync --all --dry-run
# Redundancy (push --mirror to forges with role: mirror)
repoindex ops mirror --to codeberg --language python --dry-run
# Dirty-tree snapshots
repoindex ops wip-snapshot --dry-run

# Default: longecho-compliant arkiv archive with HTML browser
repoindex export -o ~/archives/repos/
# Format plugins
repoindex export csv -o repos.csv
repoindex export bibtex --language python > refs.bib
repoindex export --list-formats

See docs/export.md.
repoindex refresh # Smart refresh (only changed repos)
repoindex refresh --full # Force full rescan
repoindex refresh --github # GitHub metadata
repoindex refresh --source pypi # One registry source
repoindex refresh --external # All external sources
repoindex refresh --since 30d # Events from last 30 days
repoindex sql --reset # Drop and recreate (then refresh --full)

Built-in metadata sources: GitHub, Gitea, CITATION.cff, project keywords, local asset detection, PyPI, CRAN, npm, Cargo, Conda, Docker, RubyGems, Go, Zenodo.
Add your own: drop a Python file under
~/.repoindex/sources/scanners/, ~/.repoindex/sources/forges/, or
~/.repoindex/sources/registries/ exporting a module-level
source = <Subclass>(...). The Source family (LocalScanner,
GitForge, Registry) is part of the stable surface.
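To make the "module-level source" convention concrete, here is one way a loader could pick up such a file, using only importlib from the standard library. This is a guess at the mechanism, not repoindex's actual loader: load_source is a hypothetical helper, ExampleRegistry is a stand-in class rather than the real Registry base, and its source_id attribute is illustrative only. The real base classes and their required methods are not described in this README, so consult the project's own reference before writing a plugin.

```python
import importlib.util
import tempfile
from pathlib import Path


def load_source(path):
    """Import one plugin file and return its module-level `source` object."""
    spec = importlib.util.spec_from_file_location(path.stem, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return getattr(module, "source", None)  # None if the file exports nothing


# Hypothetical plugin file contents. A real plugin would subclass
# LocalScanner, GitForge, or Registry instead of this stand-in class.
plugin_code = '''
class ExampleRegistry:  # stand-in, not the real Registry base
    source_id = "example"

source = ExampleRegistry()
'''

with tempfile.TemporaryDirectory() as tmp:
    plugin_path = Path(tmp) / "example_registry.py"
    plugin_path.write_text(plugin_code)
    source = load_source(plugin_path)

print(type(source).__name__, source.source_id)
```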
# ~/.repoindex/config.yaml
repository_directories:
  - ~/github/**
  - ~/work
github:
  token: ghp_...
forges:
  github:
    token_env: GITHUB_TOKEN
    role: primary
  codeberg:
    source_id: gitea
    host: codeberg.org
    role: mirror
    token_env: CODEBERG_TOKEN
    url_template: "https://codeberg.org/queelius/{repo}.git"
author:
  name: "Alexander Towell"
  alias: "Alex Towell"
  email: "lex@metafunctor.com"
  orcid: "0000-0001-6443-9897"
repository_tags:
  /home/user/github/myproject:
    - work/active
    - topic:ml

repoindex config show
repoindex config get author.name
repoindex config set author.name "Your Name"
repoindex config unset refresh.external_sources.github

Environment variables:
- REPOINDEX_CONFIG: override the config file path
- GITHUB_TOKEN: used if github.token is not set
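The precedence these two variables describe can be written down in a few lines of Python. This is a sketch of the stated rules, not code from repoindex: resolve_config_path and resolve_github_token are hypothetical names, and the config is modeled as a plain dict as it might look after the YAML file is parsed.

```python
import os
from pathlib import Path

# Default location, as created by `repoindex config init`.
DEFAULT_CONFIG = Path.home() / ".repoindex" / "config.yaml"


def resolve_config_path(env):
    """REPOINDEX_CONFIG wins; otherwise fall back to the default path."""
    override = env.get("REPOINDEX_CONFIG")
    return Path(override).expanduser() if override else DEFAULT_CONFIG


def resolve_github_token(config, env):
    """Prefer github.token from the config; fall back to GITHUB_TOKEN."""
    token = (config.get("github") or {}).get("token")
    return token or env.get("GITHUB_TOKEN")


print(resolve_config_path(os.environ))
print(resolve_github_token({"github": {}}, {"GITHUB_TOKEN": "from-env"}))
```

Passing the environment in as a mapping, instead of reading os.environ inside the helpers, keeps both rules easy to test.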
Commands (CLI, Click)
repoindex/commands/*.py
|
v
Services (business logic)
repoindex/services/*.py
|
+---> Database (SQLite, schema migrations, query helpers)
+---> Infrastructure (git, GitHub, Zenodo, PyPI, files)
+---> Domain (frozen dataclasses)
- Domain: Repository, Tag, Event, AuditCheck, OperationDetail. No I/O.
- Database: ~/.repoindex/index.db. Read-only enforced at connection level.
- Infrastructure: everything that reaches outside the process.
- Services: business logic. Internal, not a public API.
- Commands: thin Click wrappers. Pretty output default, --json for JSONL.
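Since --json emits JSONL (one JSON object per line), the output is easy to consume from a script. The sketch below is a Python counterpart to the jq pipeline in the events examples: it tallies the type field across lines. The sample records are made up; only the type key and the git_tag value appear in this README, and the other field names and values are assumptions.

```python
import json
from collections import Counter


def count_event_types(lines):
    """Tally the `type` field across JSONL lines, skipping blank lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if line:
            counts[json.loads(line)["type"]] += 1
    return counts


# Made-up sample records standing in for `repoindex events --json` output.
sample = [
    '{"type": "commit", "repo": "myproject"}',
    '{"type": "git_tag", "repo": "myproject"}',
    '{"type": "commit", "repo": "other"}',
]
counts = count_event_types(sample)
print(counts.most_common())
```

To use it on real output, pass sys.stdin to count_event_types and pipe the events command into the script.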
See DESIGN.md for the full philosophy and STABILITY.md for the
backward-compatibility contract.
make install # Create .venv and install in dev mode
make test # Run tests
pytest --cov=repoindex # With coverage
make build # Wheel and sdist

1800+ tests in tests/. Target coverage above 86%.
- CLAUDE.md: current architecture and operational reference
- DESIGN.md: philosophy and architecture rationale
- STABILITY.md: v2.0 backward-compatibility contract
- docs/index.md: docs landing page
- docs/events.md, docs/export.md, docs/ops.md: per-command references
Alexander Towell (Alex Towell) / GitHub / ORCID / PyPI / Blog
Contact: lex@metafunctor.com
MIT. See LICENSE.