- Repository scanning with `.gitignore` and `.ctxignore` support
- Parallel AST parsing via tree-sitter across 9 languages
- Dependency graph construction using rustworkx
- Composite file scoring — fan-in centrality, git commit frequency, entry-point proximity, recency
- Percentile-based tier assignment (top 15% → structured summary, next 30% → signatures, rest → one-liners)
- Budget-driven inclusion — no static file caps, budget exhaustion is the only stopping condition
- AST-driven structured summaries for Tier 1 non-entry-point files
- Full source for entry point files (cli.py, main.py, app.py, etc.)
- Hard token budget enforcement with partial truncation on overflow
- Deterministic output — identical repositories produce byte-identical CONTEXT.md
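The tier cut-offs and budget rule above can be sketched as follows. This is a minimal illustration, not codectx's actual implementation: `RankedFile`, `assign_tiers`, and `select_within_budget` are hypothetical names, the composite score is assumed to be computed elsewhere, and the per-file token cost is assumed to be known up front.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RankedFile:
    path: str
    score: float  # composite score, assumed to be computed elsewhere
    tokens: int   # hypothetical token cost of this file's rendered entry


def _ranked(files):
    # Sort by score descending, then path, so ties resolve deterministically.
    return sorted(files, key=lambda f: (-f.score, f.path))


def assign_tiers(files):
    """Top 15% of ranked files -> tier 1, next 30% -> tier 2, rest -> tier 3."""
    ordered = _ranked(files)
    n = len(ordered)
    tiers = {}
    for rank, f in enumerate(ordered):
        # Integer arithmetic avoids float rounding at the percentile cut-offs.
        if rank * 100 < 15 * n:
            tiers[f.path] = 1
        elif rank * 100 < 45 * n:
            tiers[f.path] = 2
        else:
            tiers[f.path] = 3
    return tiers


def select_within_budget(files, budget):
    """Walk files in rank order; stop only when the token budget is spent."""
    selected = []
    remaining = budget
    for f in _ranked(files):
        if remaining <= 0:
            break
        granted = min(f.tokens, remaining)  # partial truncation on overflow
        selected.append((f.path, granted))
        remaining -= granted
    return selected
```

Sorting on `(-score, path)` is one way to keep the output deterministic when scores tie; the real tie-breaking rule may differ.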
Python, TypeScript, JavaScript, Go, Rust, Java, C, C++, Ruby.
- `codectx analyze .`: generate context
- `codectx watch .`: regenerate on file changes
- `codectx search <query>`: semantic file search (requires `[semantic]` extra)
- `--tokens`: custom token budget
- `--output`: custom output path
- `--since`: include recent git changes section
- `--task`: ranking profile: `default`, `debug`, `feature`, `architecture`, `refactor`
- `--query`: semantic similarity ranking via sentence-transformers + lancedb
- `--layers`: emit separate REPO_MAP.md and CORE_CONTEXT.md
- `--no-git`: skip git metadata, use filesystem fallback
- `--verbose`: debug logging
ARCHITECTURE, ENTRY_POINTS, SYMBOL_INDEX, IMPORTANT_CALL_PATHS, CORE_MODULES, SUPPORTING_MODULES, DEPENDENCY_GRAPH, RANKED_FILES, PERIPHERY
- JSON-based parse cache with SHA-256 file hashing
- Cache export/import for CI sharing
- Symlink-safe walker
- Proper `.venv` and non-source directory exclusion
- Safety check for sensitive files with interactive prompt
- Tests for walker, parser, ranker, compressor, formatter, cache, resolver, git metadata edge cases
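A content-hashed parse cache of the kind listed above might look roughly like this. It is a sketch under assumptions: the `ParseCache` class, its JSON layout (`path -> {"sha256", "result"}`), and the method names are hypothetical and not taken from the codectx source.

```python
import hashlib
import json
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes, read in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


class ParseCache:
    """Hypothetical JSON cache mapping path -> {"sha256": ..., "result": ...}."""

    def __init__(self, cache_file: Path):
        self.cache_file = cache_file
        if cache_file.exists():
            self.entries = json.loads(cache_file.read_text())
        else:
            self.entries = {}

    def get(self, path: Path):
        """Return the cached result, or None if the file changed or is unknown."""
        entry = self.entries.get(str(path))
        if entry is not None and entry["sha256"] == file_digest(path):
            return entry["result"]
        return None

    def put(self, path: Path, result) -> None:
        """Store a JSON-serializable parse result keyed by the current hash."""
        self.entries[str(path)] = {"sha256": file_digest(path), "result": result}

    def save(self) -> None:
        # sort_keys keeps the cache file stable across runs.
        self.cache_file.write_text(json.dumps(self.entries, sort_keys=True, indent=2))
```

Because the cache is a single JSON file keyed by content hash, exporting it for CI amounts to copying that file; entries whose hashes no longer match are simply treated as misses.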
- Go import resolver does not strip the module prefix from import paths, so the dependency graph has no edges for Go repos until `go.mod` parsing is added
- Dynamic imports and runtime-generated classes are not detected (tree-sitter is static analysis only)
- `codectx watch .` regenerates the full pipeline on every change; no incremental updates yet
- `__version__` in `__init__.py` should be read from `importlib.metadata` rather than hardcoded
- Semantic similarity ranking via `--query`: scaffolding complete, needs embedding cache invalidation on file change
- Call path analysis: multiple paths per entry point, function-level annotations in IMPORTANT_CALL_PATHS
- Symbol cross-referencing — track where symbols defined in one file are used in others
- Improved type annotation extraction — surface return types and parameter types in structured summaries
- Constants section in structured summaries — files with no functions (config, defaults) currently emit only a purpose line
- JSON output format via `--format json`: machine-readable context for programmatic agent use
- Filter config files (pyproject.toml, package.json, Cargo.toml) from SUPPORTING_MODULES; they are not source files
- Debounced watch mode — 3-second inactivity window before triggering regeneration, skip non-source file changes
- Incremental parse cache utilisation — only re-parse changed files on watch, patch affected CONTEXT.md sections
- Go resolver: parse `go.mod` to extract module name, strip it from import paths before file lookup
- `__version__` sync: read from `importlib.metadata` with fallback
- `pyproject.toml` appearing in SUPPORTING_MODULES: force config files to periphery
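The Go resolver fix could be approached along these lines. This is a sketch only: `module_name` and `import_to_dir` are hypothetical helpers, the `module` directive is matched with a simple regular expression rather than a full `go.mod` parser, and `replace` directives and vendored packages are ignored.

```python
import re
from pathlib import PurePosixPath
from typing import Optional

# Matches the `module <path>` directive at the start of a line in go.mod.
_MODULE_RE = re.compile(r"^\s*module\s+(\S+)", re.MULTILINE)


def module_name(go_mod_text: str) -> Optional[str]:
    """Extract the module path from go.mod contents, or None if missing."""
    match = _MODULE_RE.search(go_mod_text)
    return match.group(1).strip('"') if match else None


def import_to_dir(import_path: str, module: str) -> Optional[PurePosixPath]:
    """Map an in-module import path to a directory relative to the repo root.

    Returns None for imports outside the module (stdlib, third-party).
    """
    if import_path == module:
        return PurePosixPath(".")
    prefix = module + "/"
    if import_path.startswith(prefix):
        return PurePosixPath(import_path[len(prefix):])
    return None
```

Matching on `module + "/"` rather than the bare module name avoids treating a sibling module with a longer name (for example `.../tool` versus `.../toolbox`) as local.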
Swift, Kotlin, C#, PHP. Ruby support is present but shallow — improve method and module resolution.
- LLM-based summarization for Tier 3 files via `--llm` flag; Anthropic and OpenAI providers already scaffolded in `compressor/summarizer.py`
- Architecture diagram export: Mermaid diagram as a standalone file, not just embedded in CONTEXT.md
- Test coverage overlay — mark untested modules in RANKED_FILES
- Detailed cyclic dependency report — currently detected but only flagged, not explained
- Monorepo support — analyze interdependent packages within a single repository root
- Cross-repo dependency graph for workspace contexts
- VS Code extension — real-time context generation on save
- Neovim plugin
Role-based presets for common agent workflows — code reviewer, debugger, architect, onboarding. Each preset tunes ranking weights and output sections for the specific task an agent is performing.
- Analyze public GitHub repositories without cloning via the GitHub API
- CI/CD integration — GitHub Actions workflow that regenerates CONTEXT.md on push to main and commits it
- Data flow analysis — trace how data moves between modules
- Architectural pattern detection — identify MVC, CQRS, event-driven patterns and surface them in ARCHITECTURE section
- REST/GraphQL API extraction — identify and document API surfaces automatically
- Streaming output — emit CONTEXT.md sections as each pipeline stage completes rather than at the end
- Distributed processing for repositories exceeding 100k files
v0.1.x — stability and bug fixes
v0.2.x — enhanced ranking, output formats, performance
v0.3.x — language expansion, IDE integration, multi-repo
v1.0 — stable public API, agent-specific presets, remote analysis
Semantic versioning. Minor versions may introduce non-breaking features. No stable API guarantee until v1.0.