PRISM Architecture Summary

Based on Codebase Analysis - 2025-01-13

This document provides a comprehensive summary of PRISM's architecture based on actual implementation analysis.

System Architecture Overview

PRISM is a token-optimizing RAG system with 6 main phases:

User Query → Intent Detection → Vector Search → Relevance Scoring → 
Chunk Selection → Adaptive Compression → Model Routing → Optimized Prompt

Core Components

1. Token Optimizer (`src/token-optimizer/`)

Purpose: Orchestrate the complete optimization pipeline

Key Classes:

TokenOptimizer - Main orchestrator
IntentDetector - Query analysis
ChunkSelector - Greedy selection
SimpleTokenCounter - Token estimation

Pipeline:

async reconstructPrompt(
  originalPrompt: string,
  chunks: CodeChunk[],
  budget: number,
  context?: Partial<ScoringContext>
): Promise<OptimizedPrompt>

Phases:

Detect intent (bug_fix, feature_add, explain, etc.)
Score chunks (5-feature weighted)
Select within budget (score density)
Compress selected (4 levels)
Reconstruct prompt (structured format)
Choose model (cost optimization)

2. Intent Detector (`src/token-optimizer/IntentDetector.ts`)

Purpose: Analyze queries to determine optimization strategy

Intent Types:

type IntentType =
  | 'bug_fix'      // Fix errors, crashes
  | 'feature_add'  // Implement features
  | 'explain'      // Explain code
  | 'refactor'     // Improve quality
  | 'test'         // Testing/verification
  | 'search'       // Find code
  | 'general';     // Default

Detection Method:

Pattern matching with keyword weights
Entity extraction (symbols, files, types)
Complexity estimation (0-1 scale)
Budget estimation based on intent

Output:

interface QueryIntent {
  type: IntentType;
  query: string;
  embedding: number[];
  entities: QueryEntity[];
  scope: QueryScope;
  complexity: number;
  requiresHistory: boolean;
  estimatedBudget: number;
  options: OptimizationOptions;
}

3. Relevance Scorer (`src/scoring/scores/RelevanceScorer.ts`)

Purpose: Multi-feature relevance scoring

Scoring Formula:

score =
  0.40 * semantic_similarity +  // Cosine similarity
  0.25 * symbol_match +         // Name matching
  0.20 * file_proximity +       // Path distance
  0.10 * recency +              // Last modified
  0.05 * usage_frequency;       // Historical usage

Features:

Semantic (40%): Cosine similarity of embeddings
Symbol (25%): Fuzzy matching of names
Proximity (20%): Path hierarchy distance
Recency (10%): Exponential decay (30-day half-life)
Frequency (5%): Helpful usage ratio

4. Chunk Selector (`src/token-optimizer/ChunkSelector.ts`)

Purpose: Greedy chunk selection using score density

Algorithm:

Sort by score density (score / token_cost)
Filter by minimum relevance
Greedily select highest density chunks
Allow 10% overage for high-value chunks
Optionally diversify across files

Selection Options:

interface SelectionOptions {
  overageAllowance?: number;    // Default: 0.10
  minRelevance?: number;        // Default: 0.0
  maxChunks?: number;           // Default: unlimited
  preferDiversity?: boolean;    // Default: false
}

5. Adaptive Compressor (`src/compression/AdaptiveCompressor.ts`)

Purpose: Progressive compression to meet token targets

Compression Levels:

Level	Description	Ratio	Preserve
Light	Remove comments/blanks	1.2-1.5x	All content
Medium	Collapse whitespace	1.5-3x	Structure
Aggressive	Extract signature + structure	3-10x	Types, imports
Signature-only	Only signature	10-30x	Signature only

Strategy:

// Try progressive levels
for (const level of ['light', 'medium', 'aggressive', 'signature-only']) {
  const compressed = compressAtLevel(chunk, level);
  if (estimateTokens(compressed) <= targetTokens) {
    return compressed;
  }
}

6. Model Router (`src/model-router/ModelRouter.ts`)

Purpose: Select optimal AI model based on cost and complexity

Decision Tree:

if (tokens < 8000 && complexity < 0.6 && ollamaAvailable) {
  return 'ollama:deepseek-coder-v2';  // Free, local
}
if (cloudflareAvailable && canAfford && tokens < 50000) {
  return 'cloudflare:@cf/meta/llama-3.1-8b-instruct';  // Free tier
}
if (tokens < 50000 && complexity < 0.6) {
  return 'anthropic:claude-3-haiku';  // $0.25/M input
}
if (tokens < 100000) {
  return 'anthropic:claude-3.5-sonnet';  // $3/M input
}
return 'anthropic:claude-3-opus';  // $15/M input

Model Specifications:

Model	Provider	Context	Input Cost	Use Case
deepseek-coder-v2	Ollama	16K	$0	Simple coding
llama-3.1-8b	Cloudflare	128K	$0	General (free)
claude-3-haiku	Anthropic	200K	$0.25/M	Fast, simple
claude-3.5-sonnet	Anthropic	200K	$3/M	Balanced
claude-3-opus	Anthropic	200K	$15/M	Complex

Budget Tracking:

class BudgetTracker {
  canAfford(model: string, tokens: number): boolean;
  trackUsage(model: string, tokens: number): void;
  reset(): void;
}

7. Vector Database (`src/vector-db/MemoryVectorDB.ts`)

Purpose: In-memory vector storage and search

Implementation:

Storage: Map<string, VectorEntry>
Search: Brute-force cosine similarity
Suitable for: < 10K chunks

Schema:

interface VectorEntry {
  chunk: CodeChunk;
  embedding: number[];
  timestamp: Date;
  accessCount: number;
  lastAccessed: Date;
}

Performance:

Insert: O(1)
Search: O(n) where n = chunks
Memory: ~2KB per chunk

Limitations:

No persistence
Linear search scaling
Single-node only

8. WASM Indexer (`src/indexer/WasmIndexer.ts`)

Purpose: Fast code parsing using Rust compiled to WASM

Supported Languages:

TypeScript/JavaScript
Python
Rust
Go
Java

Chunking Strategy:

Extract functions, classes, methods
Preserve type signatures
Track dependencies
Generate line numbers

Performance:

10-50x faster than JS parsers
Memory-efficient (WASM sandbox)
Portable across platforms

Core Types

CodeChunk

interface CodeChunk {
  id: string;
  filePath: string;
  name: string;
  kind: 'function' | 'class' | 'method' | 'variable' | 'interface';
  startLine: number;
  endLine: number;
  content: string;
  signature?: string;
  language: string;
  embedding?: number[];
  metadata: CodeChunkMetadata;
}

ScoredChunk

interface ScoredChunk extends CodeChunk {
  relevanceScore: number;
  scoreBreakdown: ScoreBreakdown;
}

interface ScoreBreakdown {
  semantic: number;
  proximity: number;
  symbol: number;
  recency: number;
  frequency: number;
}

OptimizedPrompt

interface OptimizedPrompt {
  prompt: string;
  tokensUsed: number;
  chunks: CompressedChunk[];
  model: string;
  savings: SavingsMetrics;
  routing?: {
    provider: string;
    reason: string;
    estimatedCost: number;
  };
}

Performance Characteristics

Time Complexity

Phase	Complexity
Intent Detection	O(q)
Vector Search	O(n * d)
Scoring	O(n * d)
Selection	O(n log n)
Compression	O(c)
Total	*O(n (d + log n))**

Where:

n = number of chunks
d = embedding dimension (384)
q = query length
c = chunk size

Space Complexity

Component	Complexity
Chunks	O(n * c)
Embeddings	O(n * d)
Scores	O(n)
Total	*O(n (c + d))**

Typical Performance

Codebase Size	Time
Small (< 1K chunks)	50-100ms
Medium (1K-10K)	100-400ms
Large (> 10K)	400-1000ms

Key Metrics

Metric	Target	Actual
Token Savings	>90%	92-95%
Indexing Speed	<30s for 1M LOC	~20s
Memory Usage	<100MB for 1M LOC	~80MB
Search Latency	<500ms	200-400ms
Compression Ratio	5-10x	3-30x (adaptive)

Known Limitations

Current Limitations

Linear Vector Search
- O(n) complexity
- Slow for > 10K chunks
- Planned: SQLite FTS5
No Persistence
- In-memory only
- Data lost on restart
- Planned: SQLite storage
No Learning
- Static weights
- No adaptation
- Planned: Reinforcement learning
Limited Compression
- 4 levels only
- Language-agnostic
- Planned: AST-aware compression

Edge Cases

Empty Results → Fallback to keyword search
Over-Budget → Include high-value chunks anyway
No Embeddings → Rely on other features
Zero Budget → Return empty selection

Architecture Principles

Free Tier First - Design for Cloudflare free tier
Progressive Enhancement - Core works without advanced features
Local-First - Prioritize local operations
Interface-Based - Clean, testable abstractions
Error Transparency - Structured error handling

Technology Stack

Component	Technology
Runtime	Node.js 18+
Language	TypeScript 5.7
Indexer	Rust + WASM
Embeddings	Cloudflare Workers AI
Vector DB	In-memory (Map)
LLM	Ollama + Cloudflare + Claude

File Structure

src/
├── core/
│   ├── types/index.ts          # Core type definitions
│   ├── interfaces/index.ts     # Service interfaces
│   └── utils/                  # Utility functions
├── token-optimizer/
│   ├── TokenOptimizer.ts       # Main orchestrator
│   ├── IntentDetector.ts       # Query analysis
│   ├── ChunkSelector.ts        # Greedy selection
│   └── SimpleTokenCounter.ts   # Token estimation
├── scoring/
│   └── scores/
│       └── RelevanceScorer.ts  # Multi-feature scoring
├── compression/
│   └── AdaptiveCompressor.ts   # Progressive compression
├── model-router/
│   ├── ModelRouter.ts          # Model selection
│   ├── BudgetTracker.ts        # Free tier tracking
│   ├── ComplexityAnalyzer.ts   # Query complexity
│   ├── OllamaClient.ts         # Local LLM
│   └── CloudflareClient.ts     # Cloudflare AI
├── vector-db/
│   └── MemoryVectorDB.ts       # In-memory storage
├── indexer/
│   ├── WasmIndexer.ts          # WASM-based parser
│   └── IndexStorage.ts         # Index persistence
└── embeddings/
    └── EmbeddingService.ts     # Embedding generation

Conclusion

PRISM achieves 90%+ token savings through:

Intelligent intent detection
Multi-feature relevance scoring
Budget-aware chunk selection
Adaptive compression
Smart model routing

The system is designed for:

Free tier optimization
Progressive enhancement
Local-first operation
Interface-based modularity

For detailed documentation, see:

/docs/architecture/00-readme.md - Architecture index
/docs/architecture/01-system-overview.md - System overview
/docs/architecture/02-token-optimizer.md - Token optimizer
/docs/architecture/06-model-router-architecture.md - Model router

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

PRISM Architecture Summary

System Architecture Overview

Core Components

1. Token Optimizer (`src/token-optimizer/`)

2. Intent Detector (`src/token-optimizer/IntentDetector.ts`)

3. Relevance Scorer (`src/scoring/scores/RelevanceScorer.ts`)

4. Chunk Selector (`src/token-optimizer/ChunkSelector.ts`)

5. Adaptive Compressor (`src/compression/AdaptiveCompressor.ts`)

6. Model Router (`src/model-router/ModelRouter.ts`)

7. Vector Database (`src/vector-db/MemoryVectorDB.ts`)

8. WASM Indexer (`src/indexer/WasmIndexer.ts`)

Core Types

CodeChunk

ScoredChunk

OptimizedPrompt

Performance Characteristics

Time Complexity

Space Complexity

Typical Performance

Key Metrics

Known Limitations

Current Limitations

Edge Cases

Architecture Principles

Technology Stack

File Structure

Conclusion

FilesExpand file tree

ARCHITECTURE_SUMMARY.md

Latest commit

History

ARCHITECTURE_SUMMARY.md

File metadata and controls

PRISM Architecture Summary

System Architecture Overview

Core Components

1. Token Optimizer (src/token-optimizer/)

2. Intent Detector (src/token-optimizer/IntentDetector.ts)

3. Relevance Scorer (src/scoring/scores/RelevanceScorer.ts)

4. Chunk Selector (src/token-optimizer/ChunkSelector.ts)

5. Adaptive Compressor (src/compression/AdaptiveCompressor.ts)

6. Model Router (src/model-router/ModelRouter.ts)

7. Vector Database (src/vector-db/MemoryVectorDB.ts)

8. WASM Indexer (src/indexer/WasmIndexer.ts)

Core Types

CodeChunk

ScoredChunk

OptimizedPrompt

Performance Characteristics

Time Complexity

Space Complexity

Typical Performance

Key Metrics

Known Limitations

Current Limitations

Edge Cases

Architecture Principles

Technology Stack

File Structure

Conclusion

1. Token Optimizer (`src/token-optimizer/`)

2. Intent Detector (`src/token-optimizer/IntentDetector.ts`)

3. Relevance Scorer (`src/scoring/scores/RelevanceScorer.ts`)

4. Chunk Selector (`src/token-optimizer/ChunkSelector.ts`)

5. Adaptive Compressor (`src/compression/AdaptiveCompressor.ts`)

6. Model Router (`src/model-router/ModelRouter.ts`)

7. Vector Database (`src/vector-db/MemoryVectorDB.ts`)

8. WASM Indexer (`src/indexer/WasmIndexer.ts`)