This tool helps you (and your AI agent) understand any given codebase easily, concisely, and precisely.

deepakdgupta1/KnowCode

KnowCode

Know a codebase using KnowCode. Ask questions about a codebase in natural language and get natural-language answers. Provide accurate, relevant context to your AI coding agent and make its token budget last up to 10x longer.

Overview

KnowCode analyzes your codebase and builds a semantic graph of entities (functions, classes, modules) and their relationships (calls, imports, dependencies). This structured knowledge enables:

  • Accurate context synthesis for AI assistants
  • Token-efficient context generation (only what's needed)
  • Local-first querying without LLM dependency
  • Traceability back to source code

Installation

# Create and activate virtual environment
uv venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install KnowCode (with dev dependencies)
uv sync --dev

# Set API keys (only needed for the features you use; see aimodels.yaml)
export VOYAGE_API_KEY_1="..."   # embeddings + reranking (semantic search)
export OPENAI_API_KEY="..."     # embeddings (alternative to VoyageAI)
export GOOGLE_API_KEY_1="..."   # LLM (Gemini) for `knowcode ask`

Quick Start

# 1. Analyze your codebase
knowcode analyze src/

# 2. Query the knowledge store
knowcode query search "MyClass"
knowcode query callers "my_function"
knowcode query callees "MyClass.method"

# 3. Generate context for an entity
knowcode context "MyClass.important_method"

# 4. Export documentation
knowcode export -o docs/

# 5. Build semantic search index
knowcode index src/

# 6. Perform semantic search
knowcode semantic-search "How does parsing work?"

# 7. Start the intelligence server with watch mode
knowcode server --port 8080 --watch

# 8. Start MCP server for IDE integration
knowcode mcp-server --store .

# 9. View statistics
knowcode stats

Commands

analyze

Scan and parse a directory to build the knowledge store.

knowcode analyze <directory> [--output <path>] [--ignore <pattern>]

Example:

knowcode analyze src/ --ignore "tests/*" --ignore "*.pyc"

query

Query the knowledge store for relationships.

knowcode query <type> <target> [--store <path>] [--json]

Query types:

  • search <pattern> - Search entities by name
  • callers <entity> - Find what calls this entity
  • callees <entity> - Find what this entity calls
  • deps <entity> - Get all dependencies

Example:

knowcode query search "Parser"
knowcode query callers "GraphBuilder.build_from_directory"
knowcode query deps "PythonParser" --json

context

Generate a context bundle for an entity (ready for AI consumption).

knowcode context <entity> [--store <path>] [--max-chars <n>]

Example:

knowcode context "GraphBuilder.build_from_directory" --max-chars 4000

export

Export the knowledge store as Markdown documentation.

knowcode export [--store <path>] [--output <dir>]

Example:

knowcode export -o docs/

stats

Show statistics about the knowledge store.

knowcode stats [--store <path>]

index

Build a semantic search index for your codebase.

knowcode index <directory> [--output <path>] [--config <path>]

semantic-search

Perform a natural language search against the semantic index.

knowcode semantic-search <query> [--index <path>] [--store <path>] [--config <path>] [--limit <n>]

Example:

knowcode semantic-search "Where is the graph built?"

server

Start the FastAPI intelligence server. This is the preferred way for locally hosted AI agents (IDEs) to interact with KnowCode.

knowcode server [--host <host>] [--port <port>] [--store <path>] [--watch]

Example:

knowcode server --port 8080

Once running, you can access endpoints like:

  • GET /api/v1/context?target=MyClass&task_type=debug
  • GET /api/v1/search?q=parser (lexical search)
  • POST /api/v1/context/query (semantic search)
  • GET /api/v1/trace_calls/{entity_id}?direction=callers&depth=3 (multi-hop call graph)
  • GET /api/v1/impact/{entity_id} (deletion impact analysis)
  • POST /api/v1/reload (to refresh data after a new analyze run)
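
For a locally hosted agent, these endpoints are plain HTTP. A minimal sketch in Python of building the documented GET URLs, assuming the server from the example above is running on localhost:8080 (the helper functions themselves are illustrative, not part of KnowCode):

```python
from urllib.parse import urlencode

BASE = "http://localhost:8080/api/v1"  # matches `knowcode server --port 8080`

def context_url(target: str, task_type: str = "debug") -> str:
    # Builds the GET /api/v1/context URL with the documented query parameters.
    return f"{BASE}/context?" + urlencode({"target": target, "task_type": task_type})

def trace_calls_url(entity_id: str, direction: str = "callers", depth: int = 3) -> str:
    # Builds the multi-hop call-graph URL: GET /api/v1/trace_calls/{entity_id}.
    return f"{BASE}/trace_calls/{entity_id}?" + urlencode(
        {"direction": direction, "depth": depth}
    )
```

Fetching one of these URLs (e.g. with `urllib.request.urlopen`) returns the JSON payload; POST endpoints such as /api/v1/context/query take a JSON body instead.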

history

Show git history for the codebase or specific entities. Requires analysis with --temporal.

knowcode history [target] [--limit <n>]

Example:

# Show recent project history
knowcode history --limit 5

# Show history for a specific class
knowcode history "KnowledgeStore"

ask

Ask questions about the codebase using an LLM agent. Requires an API key for at least one configured model in aimodels.yaml.

knowcode ask <question> [--config <path>]

Configuration: KnowCode looks for a configuration file in the following order:

  1. --config argument
  2. aimodels.yaml in current directory
  3. ~/.aimodels.yaml

Example aimodels.yaml:

natural_language_models:
  - name: gemini-2.5-flash
    provider: google
    api_key_env: GOOGLE_API_KEY_1

Example:

knowcode ask "How does the graph builder work?"

mcp-server

Start an MCP (Model Context Protocol) server for IDE agent integration.

knowcode mcp-server [--store <path>] [--config <path>]

Tools Exposed:

  • search_codebase - Search for code entities by name
  • get_entity_context - Get detailed context for an entity
  • trace_calls - Trace call graph (callers/callees) with depth
  • retrieve_context_for_query - Unified query→retrieval→context bundle (same pipeline as knowcode ask)

MCP Client Configuration (Claude Desktop, VS Code, etc.):

{
  "knowcode": {
    "command": "knowcode",
    "args": ["mcp-server", "--store", "/path/to/project"]
  }
}

Installation with MCP support:

pip install "knowcode[mcp]"

IDE Agent Integration

KnowCode enables token-efficient IDE agent workflows. When your IDE agent needs context, it invokes KnowCode's MCP tools to retrieve relevant code context locally before calling expensive external LLMs.

How It Works:

  1. IDE agent receives user query
  2. Agent invokes retrieve_context_for_query via MCP
  3. KnowCode returns context + sufficiency_score (0.0-1.0)
  4. Score ≥ 0.8: Answer locally (zero external tokens)
  5. Score < 0.8: Use returned context with external LLM
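
The routing decision in steps 4-5 is a threshold check. A minimal sketch, assuming the MCP response is a dict carrying the `sufficiency_score` field (the exact response schema is an assumption for illustration):

```python
def route_query(response: dict, threshold: float = 0.8) -> str:
    """Decide whether the agent can answer locally or must call an external LLM.

    The 0.8 default mirrors the `sufficiency_threshold` setting in aimodels.yaml.
    A missing score is treated as insufficient.
    """
    score = response.get("sufficiency_score", 0.0)
    return "answer_locally" if score >= threshold else "use_external_llm"
```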

Antigravity Configuration (.gemini/mcp_servers.json):

{
  "mcpServers": {
    "knowcode": {
      "command": "knowcode",
      "args": ["mcp-server", "--store", "/path/to/your/project"]
    }
  }
}

Token Savings:

  • Simple "locate" queries → 100% savings (answered locally)
  • Code explanations → 60-80% savings (precise context only)

Supported Languages (MVP)

  • Python (.py) - Full AST parsing (supports Python 3.9-3.12)
  • JavaScript / TypeScript (.js, .ts) - Classes, functions, imports (via tree-sitter)
  • Java (.java) - Classes, methods, imports, inheritance (via tree-sitter)
  • Markdown (.md) - Document structure with heading hierarchy
  • YAML (.yaml, .yml) - Configuration keys with nested structure

Architecture

KnowCode follows a layered architecture:

  1. Scanner - Discovers files with gitignore support
  2. Parsers - Language-specific parsing (Python AST, Tree-sitter for others)
  3. Graph Builder - Constructs semantic graph with entities and relationships
  4. Knowledge Store - In-memory graph with JSON persistence
  5. Indexer - Vector embedding and hybrid retrieval engine (FAISS + BM25)
  6. Context Synthesizer - Generates token-efficient context bundles with priority ranking
  7. CLI - User interface for all operations
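
The core data model behind layers 3-4 is an entity/relationship graph. A minimal sketch of that shape (class and field names are illustrative, not KnowCode's actual internals):

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Entity:
    id: str    # e.g. "GraphBuilder.build_from_directory"
    kind: str  # "module", "class", "function", or "method"

@dataclass
class Graph:
    entities: dict[str, Entity] = field(default_factory=dict)
    # Relationships are (source_id, relation, target_id) triples,
    # e.g. ("main", "calls", "GraphBuilder.build_from_directory").
    relationships: list[tuple[str, str, str]] = field(default_factory=list)

    def add(self, entity: Entity) -> None:
        self.entities[entity.id] = entity

    def relate(self, source: str, relation: str, target: str) -> None:
        self.relationships.append((source, relation, target))

    def callers(self, target: str) -> list[str]:
        # The same lookup `knowcode query callers <entity>` performs.
        return [s for s, rel, t in self.relationships
                if rel == "calls" and t == target]
```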

See KnowCode.md for the complete reference architecture.

Configuration

aimodels.yaml supports:

# LLM models for 'ask' command
natural_language_models:
  - name: gemini-2.0-flash-lite
    provider: google
    api_key_env: GOOGLE_API_KEY_1

# Embedding models
embedding_models:
  - name: voyage-3-lite
    provider: voyageai
    api_key_env: VOYAGE_API_KEY_1

# Reranking models (cross-encoder)
reranking_models:
  - name: rerank-2.5
    provider: voyageai
    api_key_env: VOYAGE_API_KEY_1

# Config
config:
  sufficiency_threshold: 0.8  # For local-first answering

Optional dependencies:

pip install "knowcode[mcp]"      # MCP server support
pip install "knowcode[voyageai]" # VoyageAI embeddings + reranking

Example Output

Stats:

Total Entities: 98
  class: 15
  function: 6
  method: 66
  module: 11

Total Relationships: 616
  calls: 478
  contains: 87
  imports: 47
  inherits: 4

Context Bundle:

# Method: `GraphBuilder.build_from_directory`

**File**: `/path/to/graph_builder.py`
**Lines**: 24-45

## Description
Build graph by scanning and parsing a directory.

## Signature
def build_from_directory(self, root_dir: str | Path, ...) -> 'GraphBuilder'

## Source Code
[full source code]

## Called By
- `main`
- `analyze_command`

## Calls
- `Scanner.__init__`
- `Scanner.scan_all`

Development

# Run tests
pytest

# Type checking
mypy src/

# Linting
ruff check src/

# Format
ruff format src/

Roadmap

See KnowCode.md for the full vision. The MVP focuses on:

  • ✅ Single monorepo support
  • ✅ Python, Markdown, YAML parsing
  • ✅ Snapshot-only analysis (no temporal tracking)
  • ✅ Local CLI tool

Released:

  • ✅ v1.1: Additional languages (JavaScript, TypeScript, Java)
  • ✅ v1.2: Git history integration, temporal tracking
  • ✅ v1.3: Token budget optimization, priority ranking
  • ✅ v1.4: Runtime signal integration
  • ✅ v2.0: Intelligence Server mode (local API for local IDE agents)
  • ✅ v2.1: Semantic search with embeddings, hybrid retrieval, and watch mode
  • ✅ v2.2: Developer Q&A & IDE Agent Integration:
    • Query classification and task-specific templates
    • Multi-hop trace_calls() and impact analysis
    • Local-first smart_answer() with sufficiency scoring
    • MCP server for IDE integration
    • VoyageAI cross-encoder reranking

Future releases:

  • v3.0: Team sharing & Enterprise features (RBAC, SSO, etc.)

License

MIT
