Configuration Parameters

⚙️ Configuration Guide

Every knob, dial, and lever in Spector — with sensible defaults and expert tuning advice. Whether you're optimizing for recall, latency, throughput, or memory, this page has you covered.

🎯 Core Parameters

Parameter	Default	Range	Description
`dimensions`	384	1–2048	Vector dimensionality (must match your embedding model)
`capacity`	100,000	1–10,000,000	Maximum document count
`similarityFunction`	COSINE	COSINE, DOT_PRODUCT, EUCLIDEAN	Distance metric

Tip

Quick model reference:

Model	Dimensions
all-MiniLM-L6-v2	384
e5-base-v2	768
text-embedding-ada-002	1536
nomic-embed-text	768

Choosing a similarity function:

COSINE — Normalized embeddings (most models)
DOT_PRODUCT — Unnormalized embeddings where magnitude matters
EUCLIDEAN — Spatial/geometric data
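
To make the difference between the three metrics concrete, here is a minimal sketch in plain Java. It does not use Spector's API; the class `SimilaritySketch` and its method names are hypothetical and only illustrate the math.

```java
final class SimilaritySketch {
    /** Dot product: sensitive to both direction and magnitude. */
    static double dot(float[] a, float[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += (double) a[i] * b[i];
        }
        return sum;
    }

    /** Cosine similarity: the dot product after dividing out both vector lengths. */
    static double cosine(float[] a, float[] b) {
        return dot(a, b) / (Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b)));
    }

    /** Euclidean (L2) distance: smaller means closer. */
    static double euclidean(float[] a, float[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = (double) a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        float[] a = {3f, 4f};
        float[] b = {6f, 8f};  // same direction as a, twice the length
        System.out.println("cosine    = " + cosine(a, b));     // ignores the length difference
        System.out.println("dot       = " + dot(a, b));        // grows with magnitude
        System.out.println("euclidean = " + euclidean(a, b));  // geometric gap between the points
    }
}
```

When both vectors are already unit length, cosine and dot product produce the same ranking, which is why the choice mostly matters for unnormalized embeddings.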

🗜️ Quantization Parameters

Parameter	Default	Range	Description
`quantization`	NONE	NONE, SCALAR_INT8, SCALAR_INT4, SCALAR_INT2, IVF_PQ	Quantization type
`oversamplingFactor`	auto	1–20	Rescore oversampling (auto: INT8→1, INT4→3, INT2→5)

🎛️ Quantization Profiles

Priority	Type	Oversampling	Compression	Recall	Use Case
🎯 Max recall	INT8	1 (none)	4×	95–99%	Quality-critical search
⚖️ Balanced	INT4	3	8×	85–95%	Best compression/recall ratio
💾 Memory-first	INT2	5	16×	75–90%	Fit large datasets in RAM
🚀 Billion-scale	IVF_PQ	—	32×	75–90%	Massive datasets

Tip

Start with INT4 for most workloads. It gives 8× compression and typically 85–95% recall when paired with the default 3× oversampling. Move to INT2 only if memory is the binding constraint, or to IVF_PQ if you're at billion scale.
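
The compression ratios in the table follow directly from the bit width per dimension. The sketch below estimates the raw vector payload only; it is a back-of-the-envelope helper (the class `QuantizationMemorySketch` is hypothetical, not part of Spector) and it ignores the HNSW graph and any full-precision copy kept for rescoring.

```java
final class QuantizationMemorySketch {
    /** Bytes for the vector payload at the given bit width per dimension. */
    static long vectorBytes(long vectorCount, int dimensions, int bitsPerDimension) {
        // Round each vector up to whole bytes before multiplying by the count.
        long bytesPerVector = ((long) dimensions * bitsPerDimension + 7) / 8;
        return vectorCount * bytesPerVector;
    }

    public static void main(String[] args) {
        long count = 1_000_000L;
        int dims = 384;
        long float32 = vectorBytes(count, dims, 32);
        int[] bitWidths = {8, 4, 2};  // INT8, INT4, INT2
        System.out.println("float32: " + float32 + " bytes");
        for (int bits : bitWidths) {
            long bytes = vectorBytes(count, dims, bits);
            System.out.println("INT" + bits + ": " + bytes + " bytes (" + (float32 / bytes) + "x smaller)");
        }
    }
}
```

For one million 384-dimensional vectors this works out to roughly 1.5 GB at float32, 384 MB at INT8, 192 MB at INT4, and 96 MB at INT2.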

Oversampling Tuning

The `oversamplingFactor` sets how many candidates are retrieved, as a multiple of top-K, before they are rescored with exact distances:

1 — No rescore (fastest, quantized scores returned directly)
3 — Good balance for INT4 (retrieves 3×K candidates, rescores to top-K)
5 — Recommended for INT2 (compensates for aggressive quantization)
10+ — Diminishing returns; use only if recall is still insufficient

// INT4 with custom oversampling
var config = SpectorConfig.DEFAULT
    .withDimensions(384)
    .withCapacity(50_000_000)
    .withQuantization(QuantizationType.SCALAR_INT4)
    .withRescore(5);  // Higher oversampling = better recall, slightly slower
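
To see why a larger factor helps recall, here is a simplified two-stage search in plain Java. It is a sketch of the general oversample-then-rescore idea, not Spector's internal code: the class `RescoreSketch`, the score arrays, and the "higher score is better" convention are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.Comparator;

final class RescoreSketch {
    /**
     * Returns the ids of the top-k items by exact score, considering only the
     * (k * oversamplingFactor) best items by approximate score.
     */
    static int[] search(double[] approxScores, double[] exactScores, int k, int oversamplingFactor) {
        int n = approxScores.length;
        int candidateCount = Math.min(n, k * oversamplingFactor);

        // Stage 1: rank every id by its approximate (quantized) score, best first.
        Integer[] ids = new Integer[n];
        for (int i = 0; i < n; i++) {
            ids[i] = i;
        }
        Arrays.sort(ids, Comparator.comparingDouble((Integer id) -> -approxScores[id]));

        // Stage 2: rescore only the shortlist with exact scores, best first.
        Integer[] shortlist = Arrays.copyOf(ids, candidateCount);
        Arrays.sort(shortlist, Comparator.comparingDouble((Integer id) -> -exactScores[id]));

        int resultSize = Math.min(k, candidateCount);
        int[] result = new int[resultSize];
        for (int i = 0; i < resultSize; i++) {
            result[i] = shortlist[i];
        }
        return result;
    }

    public static void main(String[] args) {
        double[] exact  = {0.90, 0.80, 0.70, 0.60};
        double[] approx = {0.50, 0.85, 0.80, 0.40};  // quantization noise pushes id 0 down to third place
        System.out.println(Arrays.toString(search(approx, exact, 1, 1)));  // shortlist too small to contain id 0
        System.out.println(Arrays.toString(search(approx, exact, 1, 3)));  // shortlist now contains id 0
    }
}
```

With a factor of 1 the true best match never reaches the shortlist; with a factor of 3 it does, and the exact rescore puts it back on top. The cost is that stage 2 touches three times as many vectors.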

🌐 HNSW Index Parameters

Parameter	Default	Range	Description
`M`	16	4–64	Max connections per node per layer
`efConstruction`	200	16–800	Construction beam width
`efSearch`	50	10–500	Search beam width

🎛️ Tuning Profiles

Priority	M	efConstruction	efSearch	Trade-off
🎯 High recall	32–64	400–800	200–500	More memory, slower build/search
⚖️ Balanced	16	200	50	Good recall with fast performance
⚡ Low latency	8–12	100	20–30	Faster search, lower recall
💾 Memory-constrained	4–8	100	20	Minimal memory, lower recall

Important

`efSearch` should be ≥ `topK` for meaningful results. With `efSearch` < `topK`, you're asking for more results than the search beam ever holds, so the tail of the result list cannot be filled with well-ranked neighbors.
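
If `topK` varies per query, one defensive option is to raise the beam width on the caller side. This page does not say whether Spector clamps the value internally, so treat the helper below (hypothetical class `EfSearchSketch`) as an application-level guard rather than library behavior.

```java
final class EfSearchSketch {
    /** Returns a beam width that is never smaller than the number of requested results. */
    static int effectiveEfSearch(int configuredEfSearch, int topK) {
        if (topK < 1) {
            throw new IllegalArgumentException("topK must be >= 1");
        }
        return Math.max(configuredEfSearch, topK);
    }

    public static void main(String[] args) {
        System.out.println(effectiveEfSearch(50, 10));   // default beam already covers topK
        System.out.println(effectiveEfSearch(50, 200));  // beam raised to match topK
    }
}
```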

📝 BM25 Parameters

Parameter	Default	Range	Description
`k1`	1.2	0.0–3.0	Term frequency saturation
`b`	0.75	0.0–1.0	Document length normalization

Corpus Type	Recommended k1	Recommended b
Short docs (tweets, titles)	1.2	0.3
Medium docs (articles)	1.2	0.75
Long docs (books, papers)	1.5–2.0	0.75
Mixed lengths	1.2	0.5
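
To show where `k1` and `b` enter the score, here is the textbook BM25 term formula in plain Java. Whether Spector uses exactly this variant (including this IDF form) is an assumption; the class `Bm25Sketch` is hypothetical.

```java
final class Bm25Sketch {
    /** A common smoothed IDF: ln(1 + (N - n + 0.5) / (n + 0.5)). */
    static double idf(long docCount, long docsWithTerm) {
        return Math.log(1.0 + (docCount - docsWithTerm + 0.5) / (docsWithTerm + 0.5));
    }

    /**
     * BM25 contribution of one query term to one document.
     * k1 controls how quickly repeated occurrences stop adding score;
     * b controls how strongly long documents are penalized.
     */
    static double termScore(double tf, double docLength, double avgDocLength,
                            double idf, double k1, double b) {
        double lengthNorm = 1.0 - b + b * (docLength / avgDocLength);
        return idf * (tf * (k1 + 1.0)) / (tf + k1 * lengthNorm);
    }

    public static void main(String[] args) {
        double idf = idf(10_000, 100);
        // Same term frequency, document four times the average length.
        System.out.println(termScore(3, 400, 100, idf, 1.2, 0.75));  // default b: length penalized
        System.out.println(termScore(3, 400, 100, idf, 1.2, 0.3));   // lower b: milder penalty
        System.out.println(termScore(3, 400, 100, idf, 1.2, 0.0));   // b = 0: length ignored
    }
}
```

Lowering `b` shrinks the gap between long and short documents, which is why the short-document row above uses 0.3: length differences between tweets or titles carry little signal.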

🧬 Hybrid Search (RRF)

Parameter	Default	Range	Description
`RRF k`	60	1–1000	Reciprocal Rank Fusion constant

k = 60 — Original paper recommendation, works well generally
Lower k (10–30) — Emphasizes top-ranked results more strongly
Higher k (100+) — Flattens rank importance
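
The effect of `k` is easiest to see in the fusion formula itself: each list contributes 1 / (k + rank) for every document it contains. The sketch below is a plain-Java illustration with a hypothetical class `RrfSketch`; it assumes 1-based ranks and string document ids.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class RrfSketch {
    /** Fuses ranked id lists: score(d) = sum over lists of 1 / (k + rank), with rank starting at 1. */
    static Map<String, Double> fuse(List<List<String>> rankings, int k) {
        Map<String, Double> scores = new HashMap<>();
        for (List<String> ranking : rankings) {
            for (int i = 0; i < ranking.size(); i++) {
                scores.merge(ranking.get(i), 1.0 / (k + i + 1), Double::sum);
            }
        }
        return scores;
    }

    public static void main(String[] args) {
        List<String> vectorHits = List.of("a", "b", "c");
        List<String> keywordHits = List.of("b", "d");
        // "b" appears in both lists, so it collects two contributions and ranks first.
        System.out.println(fuse(List.of(vectorHits, keywordHits), 60));
    }
}
```

At k = 10 a rank-1 hit contributes about 1.8× as much as a rank-10 hit; at k = 60 the ratio drops to about 1.15×, so agreement between the two lists matters more than position within either one.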

🎮 GPU Configuration

Parameter	Default	Range	Description
`gpuEnabled`	false	true/false	Enable CUDA GPU acceleration
`gpuMemoryBudget`	256 MB	256 MB – GPU max	Maximum GPU memory allocation
`gpuBatchWindow`	10 ms	1–100 ms	Batching window for query collection
`gpuMaxBatchSize`	1024	1–1024	Maximum queries per GPU batch

Note

Enable GPU for batch workloads with >10K vectors. Single queries are often faster on CPU SIMD because they avoid kernel launch overhead. For INT4/INT2 quantization, GPU acceleration requires dimensions to be a multiple of 32; non-aligned dimensions automatically fall back to CPU SIMD.
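
You can check the alignment rule before enabling the GPU path. The helper below is a sketch: the class `GpuEligibilitySketch` and its local enum are hypothetical stand-ins that mirror the quantization names on this page, not Spector's `QuantizationType`.

```java
final class GpuEligibilitySketch {
    /** Local stand-in for the quantization types listed on this page. */
    enum Quantization { NONE, SCALAR_INT8, SCALAR_INT4, SCALAR_INT2, IVF_PQ }

    /** True when the documented 32-dimension alignment rule does not rule out the GPU path. */
    static boolean gpuEligible(Quantization quantization, int dimensions) {
        boolean needsAlignment = quantization == Quantization.SCALAR_INT4
                || quantization == Quantization.SCALAR_INT2;
        return !needsAlignment || dimensions % 32 == 0;
    }

    public static void main(String[] args) {
        System.out.println(gpuEligible(Quantization.SCALAR_INT4, 384));  // 384 = 12 * 32, aligned
        System.out.println(gpuEligible(Quantization.SCALAR_INT4, 100));  // not a multiple of 32
        System.out.println(gpuEligible(Quantization.SCALAR_INT8, 100));  // rule only covers INT4/INT2
    }
}
```

All four models in the quick reference above (384, 768, and 1536 dimensions) are multiples of 32, so the fallback mainly affects custom or truncated embeddings.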

🤖 Reranker Configuration

Parameter	Default	Range	Description
`rerankerEnabled`	false	true/false	Enable LLM re-ranking via Ollama
`rerankerModel`	—	Any Ollama model	Model name (e.g., "llama3.2")
`rerankerEndpoint`	http://localhost:11434	URL	Ollama API endpoint
`rerankerMaxCandidates`	20	1–100	Max docs sent to LLM

Warning

Re-ranking adds 100–500ms latency per query. Use only when precision is critical and latency budget allows.

🖥️ Server Configuration

Parameter	Default	Description
`port`	7070	HTTP server port
`apiKey`	—	Optional API key (empty = no auth)
`corsOrigins`	*	Allowed CORS origins

# Format: port dimensions apiKey
mvn exec:java -pl spector-node \
  -Dexec.mainClass="com.spectrayan.spector.server.SpectorNode" \
  -Dexec.args="7070 384 my-secret-key"

🌐 Cluster Configuration

Parameter	Default	Range	Description
`shardCount`	2	2–256	Number of data shards
`replicaCount`	1	1–5	Replicas per shard
`heartbeatInterval`	2s	500ms–30s	Cluster heartbeat interval
`heartbeatTimeout`	10s	3s–120s	Node unavailability timeout
`queryTimeout`	10s	1s–60s	Per-shard query timeout

Tip

Rule of thumb: 100K–500K docs per shard for optimal balance. Set heartbeatTimeout to at least 5× heartbeatInterval.
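
Both rules of thumb are simple arithmetic, sketched here in plain Java. The class `ClusterSizingSketch` is hypothetical; the clamp uses the `shardCount` range from the table above.

```java
import java.time.Duration;

final class ClusterSizingSketch {
    /** Shards needed for a target docs-per-shard, clamped to the documented range of 2 to 256. */
    static int shardsFor(long totalDocs, long docsPerShard) {
        long shards = (totalDocs + docsPerShard - 1) / docsPerShard;  // ceiling division
        return (int) Math.max(2, Math.min(256, shards));
    }

    /** True when the timeout is at least 5x the heartbeat interval. */
    static boolean heartbeatOk(Duration interval, Duration timeout) {
        return timeout.compareTo(interval.multipliedBy(5)) >= 0;
    }

    public static void main(String[] args) {
        System.out.println(shardsFor(5_000_000, 250_000));                            // mid-range docs per shard
        System.out.println(heartbeatOk(Duration.ofSeconds(2), Duration.ofSeconds(10)));  // the defaults
        System.out.println(heartbeatOk(Duration.ofSeconds(2), Duration.ofSeconds(5)));   // timeout too tight
    }
}
```

The defaults (2s interval, 10s timeout) sit exactly at the 5× boundary, so if you raise `heartbeatInterval`, raise `heartbeatTimeout` with it.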

🧠 Memory Configuration

Operating Mode

Parameter	Default	Options	Description
`mode`	`SEARCH`	`SEARCH`, `MEMORY`, `HYBRID`	Which subsystems to initialize

Mode	Engine	Memory	MCP Tools
`SEARCH`	✅	❌	6 engine tools
`MEMORY`	❌	✅	11 memory tools
`HYBRID`	✅	✅	All 17 tools

Memory Tier Parameters

Parameter	Default	Range	Description
`nodesPerPartition`	10,000	1,000–1,000,000	Records per semantic partition file
`workingCapacity`	100	10–10,000	Working memory slots (volatile circular buffer)
`episodicPartitionCapacity`	10,000	1,000–100,000	Records per episodic partition
`semanticCapacity`	5,000	100–1,000,000	Single-file semantic capacity (in-memory mode)
`proceduralCapacity`	500	10–100,000	Procedural memory slots

Partitioned Semantic Storage

When using DISK persistence mode, semantic memories are stored in rolling partition files:

.spector/memory/semantic/
  semantic-000.mem     ← partition 0 (oldest, immutable)
  semantic-001.mem     ← partition 1 (immutable)
  semantic-002.mem     ← partition 2 (active, accepts writes)

Tuning nodesPerPartition:

Smaller partitions (1K–5K) → faster compaction, more parallel search threads, more files
Larger partitions (10K–50K) → fewer files, slightly lower overhead per partition
Default (10K) → good balance for most workloads
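
To estimate how many files a given `nodesPerPartition` produces, divide and round up. The sketch below assumes a partition rolls over once it holds `nodesPerPartition` records and that file names follow the three-digit pattern shown in the listing above; the class `PartitionSketch` is hypothetical.

```java
import java.util.Locale;

final class PartitionSketch {
    /** Number of partition files needed; there is always at least one active partition. */
    static long partitionCount(long records, long nodesPerPartition) {
        return Math.max(1, (records + nodesPerPartition - 1) / nodesPerPartition);
    }

    /** File name in the pattern from the directory listing, e.g. index 2 gives semantic-002.mem. */
    static String partitionFileName(int index) {
        return String.format(Locale.ROOT, "semantic-%03d.mem", index);
    }

    public static void main(String[] args) {
        long count = partitionCount(25_000, 10_000);
        for (int i = 0; i < count; i++) {
            String role = (i == count - 1) ? "active" : "immutable";
            System.out.println(partitionFileName(i) + "  (" + role + ")");
        }
    }
}
```

With 25,000 records at the default 10K, this yields the three files from the listing: two full immutable partitions and one active partition.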

Tip

Existing single-file semantic.mem stores are automatically migrated to the partitioned format on first startup. No manual migration needed.

Cluster Replication for Partitions

Parameter	Default	Description
`partitionReplicationEnabled`	false	Enable file-level partition snapshot shipping
`replicaCount`	1	Replicas per shard (1–5)

When enabled, immutable semantic partitions are shipped as snapshots to replica nodes. Only the active (mutable) partition requires WAL-based delta replication.

🤖 RAG Pipeline Configuration

Parameter	Default	Range	Description
`maxTokens`	512	1–8192	Max tokens per chunk
`overlapTokens`	50	0–maxTokens-1	Overlap between chunks
`embeddingBatchSize`	32	1–256	Batch size for embedding generation
`embeddingRetries`	3	0–10	Retry count for failed batches
`contextTokenLimit`	4096	256–131072	Max tokens in assembled context
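
The interaction between `maxTokens` and `overlapTokens` is easier to reason about as a sliding window: each chunk starts `maxTokens - overlapTokens` tokens after the previous one. The sketch below assumes a plain token-offset window, which may differ from how Spector's chunker treats sentence or paragraph boundaries; the class `ChunkingSketch` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class ChunkingSketch {
    /** Returns [start, end) token offsets for each chunk of a document. */
    static List<int[]> chunkBounds(int totalTokens, int maxTokens, int overlapTokens) {
        if (maxTokens < 1 || overlapTokens < 0 || overlapTokens >= maxTokens) {
            throw new IllegalArgumentException("require 0 <= overlapTokens < maxTokens");
        }
        int stride = maxTokens - overlapTokens;  // how far the window advances each step
        List<int[]> bounds = new ArrayList<>();
        for (int start = 0; start < totalTokens; start += stride) {
            int end = Math.min(start + maxTokens, totalTokens);
            bounds.add(new int[] {start, end});
            if (end == totalTokens) {
                break;  // the last chunk reached the end of the document
            }
        }
        return bounds;
    }

    public static void main(String[] args) {
        // Defaults from the table: 512-token chunks with 50 tokens of overlap.
        for (int[] chunk : chunkBounds(1200, 512, 50)) {
            System.out.println("[" + chunk[0] + ", " + chunk[1] + ")");
        }
    }
}
```

A 1,200-token document becomes three chunks at the defaults. Raising `overlapTokens` shrinks the stride, so chunk count, embedding calls, and index size all grow with it.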

🎯 Configuration Examples

🎯 High-Recall Setup

var config = SpectorConfig.DEFAULT
    .withDimensions(384)
    .withCapacity(500_000)
    .withQuantization(QuantizationType.SCALAR_INT8)
    .withM(32)
    .withEfConstruction(400)
    .withEfSearch(200);

🗜️ Balanced Compression (INT4)

var config = SpectorConfig.DEFAULT
    .withDimensions(384)
    .withCapacity(50_000_000)
    .withQuantization(QuantizationType.SCALAR_INT4)
    .withRescore(3);  // default for INT4

💾 Maximum Compression (INT2)

var config = SpectorConfig.DEFAULT
    .withDimensions(384)
    .withCapacity(200_000_000)
    .withQuantization(QuantizationType.SCALAR_INT2)
    .withRescore(5);  // default for INT2

⚡ Low-Latency Setup

var config = SpectorConfig.DEFAULT
    .withDimensions(128)
    .withCapacity(100_000)
    .withM(12)
    .withEfConstruction(100)
    .withEfSearch(30);

🎮 GPU-Accelerated Batch Processing

var config = SpectorConfig.DEFAULT
    .withDimensions(768)
    .withCapacity(1_000_000)
    .withGpu(true)
    .withGpuMemoryBudget(2048);  // 2 GB

🤖 RAG Pipeline

var config = SpectorConfig.DEFAULT
    .withDimensions(384)
    .withMaxTokens(1024)
    .withOverlapTokens(100)
    .withEmbeddingBatchSize(64);

🔗 See Also

Performance Tuning — Benchmarks and optimization strategies
Architecture Overview — How configuration affects system behavior
Distributed Mode — Cluster-specific configuration
GPU Acceleration — GPU setup requirements

