Configuration Parameters
Every knob, dial, and lever in Spector — with sensible defaults and expert tuning advice. Whether you're optimizing for recall, latency, throughput, or memory, this page has you covered.
| Parameter | Default | Range | Description |
|---|---|---|---|
| `dimensions` | 384 | 1–2048 | Vector dimensionality (must match your embedding model) |
| `capacity` | 100,000 | 1–10,000,000 | Maximum document count |
| `similarityFunction` | COSINE | COSINE, DOT_PRODUCT, EUCLIDEAN | Distance metric |
Tip
Quick model reference:
| Model | Dimensions |
|---|---|
| all-MiniLM-L6-v2 | 384 |
| e5-base-v2 | 768 |
| text-embedding-ada-002 | 1536 |
| nomic-embed-text | 768 |
Choosing a similarity function:
- `COSINE`: normalized embeddings (most models)
- `DOT_PRODUCT`: unnormalized embeddings where magnitude matters
- `EUCLIDEAN`: spatial/geometric data
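To make the difference between the three metrics concrete, here is a minimal sketch of how each is commonly computed. The class and method names are illustrative only and are not part of the Spector API; Spector's own implementation (for example, SIMD kernels) may differ.

```java
// Illustrative only: these helpers are hypothetical and not Spector classes.
final class SimilaritySketch {

    // Dot product: sensitive to both direction and magnitude.
    static double dot(float[] a, float[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += (double) a[i] * b[i];
        }
        return sum;
    }

    // Cosine similarity: dot product divided by both vector lengths,
    // so only the direction of the vectors matters.
    static double cosine(float[] a, float[] b) {
        return dot(a, b) / (Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b)));
    }

    // Euclidean distance: straight-line distance between the two points.
    static double euclidean(float[] a, float[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = (double) a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        float[] a = {1f, 0f};
        float[] b = {2f, 0f}; // same direction as a, twice the magnitude
        System.out.println("cosine    = " + cosine(a, b));
        System.out.println("dot       = " + dot(a, b));
        System.out.println("euclidean = " + euclidean(a, b));
    }
}
```

In the example, the two vectors point in the same direction, so cosine treats them as identical while the dot product and Euclidean distance both reflect the difference in magnitude.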
| Parameter | Default | Range | Description |
|---|---|---|---|
| `quantization` | NONE | NONE, SCALAR_INT8, SCALAR_INT4, SCALAR_INT2, IVF_PQ | Quantization type |
| `oversamplingFactor` | auto | 1–20 | Rescore oversampling (auto: INT8→1, INT4→3, INT2→5) |
| Priority | Type | Oversampling | Compression | Recall | Use Case |
|---|---|---|---|---|---|
| 🎯 Max recall | INT8 | 1 (none) | 4× | 95–99% | Quality-critical search |
| ⚖️ Balanced | INT4 | 3 | 8× | 85–95% | Best compression/recall ratio |
| 💾 Memory-first | INT2 | 5 | 16× | 75–90% | Fit large datasets in RAM |
| 🚀 Billion-scale | IVF_PQ | — | 32× | 75–90% | Massive datasets |
Tip
Start with INT4 for most workloads. It gives 8× compression with excellent recall when paired with the default 3× rescore. Only go to INT2 if memory is the binding constraint, or IVF-PQ if you're at billion scale.
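The compression ratios for the scalar types follow from the number of bits stored per vector component. The sketch below is a back-of-the-envelope estimate of raw vector storage only; it assumes unquantized vectors are 32-bit floats and ignores graph overhead and any full-precision copies kept for rescoring. The class is hypothetical, not a Spector API.

```java
// Illustrative estimate only: hypothetical helper, not part of Spector.
final class QuantizationMemorySketch {

    // Raw bytes needed to store vectorCount vectors at the given bit width.
    static long vectorBytes(int dimensions, int bitsPerComponent, long vectorCount) {
        long bitsPerVector = (long) dimensions * bitsPerComponent;
        long bytesPerVector = (bitsPerVector + 7) / 8; // round up to whole bytes
        return bytesPerVector * vectorCount;
    }

    public static void main(String[] args) {
        int dims = 384;
        long count = 1_000_000L;
        long fp32 = vectorBytes(dims, 32, count); // assumed float32 baseline
        for (int bits : new int[] {8, 4, 2}) {
            long quantized = vectorBytes(dims, bits, count);
            System.out.println("INT" + bits + ": " + quantized + " bytes, "
                    + (fp32 / quantized) + "x smaller than float32");
        }
    }
}
```

For 384 dimensions and one million vectors, this gives roughly 1.5 GB at float32 versus 192 MB at INT4, which matches the 8× figure in the table.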
The `oversamplingFactor` controls how many extra candidates are retrieved before rescoring with exact distances:
- 1: no rescore (fastest; quantized scores are returned directly)
- 3: good balance for INT4 (retrieves 3×K candidates, rescores to top-K)
- 5: recommended for INT2 (compensates for aggressive quantization)
- 10+: diminishing returns; use only if recall is still insufficient
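The two-stage idea behind oversampling can be sketched as follows. This is a simplified illustration under stated assumptions, not Spector's implementation: scores are precomputed arrays where higher means more similar, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative only: a hypothetical helper, not a Spector class.
final class RescoreSketch {

    // Returns the ids of the top-k documents (higher score = more similar).
    static List<Integer> topK(double[] approxScores, double[] exactScores,
                              int k, int oversamplingFactor) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < approxScores.length; i++) {
            ids.add(i);
        }
        // Stage 1: shortlist (oversamplingFactor * k) candidates by quantized score.
        ids.sort(Comparator.comparingDouble((Integer id) -> -approxScores[id]));
        int shortlistSize = Math.min(ids.size(), k * oversamplingFactor);
        List<Integer> shortlist = new ArrayList<>(ids.subList(0, shortlistSize));
        // Stage 2: reorder the shortlist by exact score.
        // A factor of 1 skips this step and returns the quantized ranking directly.
        if (oversamplingFactor > 1) {
            shortlist.sort(Comparator.comparingDouble((Integer id) -> -exactScores[id]));
        }
        return new ArrayList<>(shortlist.subList(0, Math.min(k, shortlist.size())));
    }

    public static void main(String[] args) {
        double[] approx = {0.9, 0.8, 0.7, 0.1};  // noisy quantized scores
        double[] exact = {0.5, 0.6, 0.95, 0.2};  // full-precision scores
        System.out.println(topK(approx, exact, 1, 1)); // [0]
        System.out.println(topK(approx, exact, 1, 3)); // [2]
    }
}
```

With a factor of 1 the quantization noise decides the winner; with a factor of 3 the true best match (id 2) is in the shortlist and rescoring promotes it.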
```java
// INT4 with custom oversampling
var config = SpectorConfig.DEFAULT
    .withDimensions(384)
    .withCapacity(50_000_000)
    .withQuantization(QuantizationType.SCALAR_INT4)
    .withRescore(5); // Higher oversampling = better recall, slightly slower
```

| Parameter | Default | Range | Description |
|---|---|---|---|
| `M` | 16 | 4–64 | Max connections per node per layer |
| `efConstruction` | 200 | 16–800 | Construction beam width |
| `efSearch` | 50 | 10–500 | Search beam width |
| Priority | M | efConstruction | efSearch | Trade-off |
|---|---|---|---|---|
| 🎯 High recall | 32–64 | 400–800 | 200–500 | More memory, slower build/search |
| ⚖️ Balanced | 16 | 200 | 50 | Good recall with fast performance |
| ⚡ Low latency | 8–12 | 100 | 20–30 | Faster search, lower recall |
| 💾 Memory-constrained | 4–8 | 100 | 20 | Minimal memory, lower recall |
Important
efSearch should be ≥ topK for meaningful results. Setting efSearch < topK means you're asking for more results than the algorithm explores.
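One way to guard against this in application code is to raise `efSearch` to at least `topK` before issuing a query. The helper below is a hypothetical sketch; whether Spector performs a similar adjustment internally is not stated here.

```java
// Illustrative only: a hypothetical guard, not part of the Spector API.
final class EfSearchSketch {

    // Ensures the search beam is at least as wide as the number of results requested.
    static int effectiveEfSearch(int configuredEfSearch, int topK) {
        if (topK < 1) {
            throw new IllegalArgumentException("topK must be at least 1");
        }
        return Math.max(configuredEfSearch, topK);
    }

    public static void main(String[] args) {
        System.out.println(effectiveEfSearch(50, 10));  // beam already wide enough
        System.out.println(effectiveEfSearch(50, 100)); // beam widened to topK
    }
}
```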
| Parameter | Default | Range | Description |
|---|---|---|---|
| `k1` | 1.2 | 0.0–3.0 | Term frequency saturation |
| `b` | 0.75 | 0.0–1.0 | Document length normalization |
| Corpus Type | Recommended k1 | Recommended b |
|---|---|---|
| Short docs (tweets, titles) | 1.2 | 0.3 |
| Medium docs (articles) | 1.2 | 0.75 |
| Long docs (books, papers) | 1.5–2.0 | 0.75 |
| Mixed lengths | 1.2 | 0.5 |
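To see how `k1` and `b` interact, the sketch below scores a single term using the widely used Okapi BM25 form. It is an assumption that Spector uses exactly this variant; the class name and the precomputed `idf` argument are illustrative.

```java
// Illustrative only: standard Okapi BM25 term score, not Spector's actual code.
final class Bm25Sketch {

    static double termScore(double termFreq, double docLen, double avgDocLen,
                            double idf, double k1, double b) {
        // b = 0 ignores document length; b = 1 normalizes fully by length.
        double lengthNorm = 1.0 - b + b * (docLen / avgDocLen);
        // k1 controls how quickly repeated occurrences of a term stop adding score.
        return idf * (termFreq * (k1 + 1.0)) / (termFreq + k1 * lengthNorm);
    }

    public static void main(String[] args) {
        double idf = 1.0;
        // Same term frequency, but the second document is ten times longer.
        System.out.println(termScore(3, 100, 100, idf, 1.2, 0.75));
        System.out.println(termScore(3, 1000, 100, idf, 1.2, 0.75));
        // With b = 0 the length penalty disappears.
        System.out.println(termScore(3, 1000, 100, idf, 1.2, 0.0));
    }
}
```

Lowering `b` for short documents, as the table suggests, reduces the penalty that small length differences would otherwise introduce.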
| Parameter | Default | Range | Description |
|---|---|---|---|
| RRF `k` | 60 | 1–1000 | Reciprocal Rank Fusion constant |
- `k` = 60: original paper recommendation; works well generally
- Lower `k` (10–30): emphasizes top-ranked results more strongly
- Higher `k` (100+): flattens rank importance
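The effect of `k` is easiest to see in the fusion formula itself: each document receives 1 / (k + rank) from every ranking in which it appears, and the contributions are summed. The following is a minimal sketch of that formula with hypothetical names; it is not Spector's internal fusion code.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: textbook Reciprocal Rank Fusion, not a Spector class.
final class RrfSketch {

    // Each inner list is one ranking (best first), e.g. vector results and BM25 results.
    static Map<String, Double> fuse(List<List<String>> rankings, int k) {
        Map<String, Double> scores = new HashMap<>();
        for (List<String> ranking : rankings) {
            for (int i = 0; i < ranking.size(); i++) {
                int rank = i + 1; // ranks are 1-based
                scores.merge(ranking.get(i), 1.0 / (k + rank), Double::sum);
            }
        }
        return scores;
    }

    public static void main(String[] args) {
        List<List<String>> rankings = List.of(
                List.of("a", "b"),   // e.g. vector search order
                List.of("b", "c"));  // e.g. keyword search order
        System.out.println(fuse(rankings, 60));
    }
}
```

Document `b` appears in both rankings and therefore outscores `a`, even though `a` is ranked first in one of them. A smaller `k` widens the gap between rank 1 and rank 2 within each list.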
| Parameter | Default | Range | Description |
|---|---|---|---|
| `gpuEnabled` | false | true/false | Enable CUDA GPU acceleration |
| `gpuMemoryBudget` | 256 MB | 256 MB – GPU max | Maximum GPU memory allocation |
| `gpuBatchWindow` | 10 ms | 1–100 ms | Batching window for query collection |
| `gpuMaxBatchSize` | 1024 | 1–1024 | Maximum queries per GPU batch |
Note
Enable GPU for batch workloads with >10K vectors. Single queries are often faster on CPU SIMD due to zero kernel launch overhead. For INT4/INT2 quantization, GPU acceleration requires dimensions to be a multiple of 32. Non-aligned dimensions automatically fall back to CPU/SIMD.
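The alignment rule in the note can be expressed as a small pre-flight check. The enum below simply mirrors the quantization names used on this page; it and the helper are hypothetical and do not represent Spector's own types or fallback logic.

```java
// Illustrative only: hypothetical check mirroring the alignment rule described above.
final class GpuEligibilitySketch {

    enum Quant { NONE, SCALAR_INT8, SCALAR_INT4, SCALAR_INT2 }

    // INT4 and INT2 need dimensions that are a multiple of 32 to run on the GPU.
    static boolean gpuEligible(Quant quantization, int dimensions) {
        boolean needsAlignment =
                quantization == Quant.SCALAR_INT4 || quantization == Quant.SCALAR_INT2;
        return !needsAlignment || dimensions % 32 == 0;
    }

    public static void main(String[] args) {
        System.out.println(gpuEligible(Quant.SCALAR_INT4, 384)); // 384 = 12 * 32
        System.out.println(gpuEligible(Quant.SCALAR_INT4, 100)); // not a multiple of 32
    }
}
```

All four models in the quick reference table (384, 768, and 1536 dimensions) satisfy the multiple-of-32 requirement.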
| Parameter | Default | Range | Description |
|---|---|---|---|
| `rerankerEnabled` | false | true/false | Enable LLM re-ranking via Ollama |
| `rerankerModel` | (none) | Any Ollama model | Model name (e.g., "llama3.2") |
| `rerankerEndpoint` | http://localhost:11434 | URL | Ollama API endpoint |
| `rerankerMaxCandidates` | 20 | 1–100 | Max docs sent to LLM |
Warning
Re-ranking adds 100–500ms latency per query. Use only when precision is critical and latency budget allows.
| Parameter | Default | Description |
|---|---|---|
| `port` | 7070 | HTTP server port |
| `apiKey` | (none) | Optional API key (empty = no auth) |
| `corsOrigins` | `*` | Allowed CORS origins |
```bash
# Format: port dimensions apiKey
mvn exec:java -pl spector-node \
  -Dexec.mainClass="com.spectrayan.spector.server.SpectorNode" \
  -Dexec.args="7070 384 my-secret-key"
```

| Parameter | Default | Range | Description |
|---|---|---|---|
| `shardCount` | 2 | 2–256 | Number of data shards |
| `replicaCount` | 1 | 1–5 | Replicas per shard |
| `heartbeatInterval` | 2s | 500ms–30s | Cluster heartbeat interval |
| `heartbeatTimeout` | 10s | 3s–120s | Node unavailability timeout |
| `queryTimeout` | 10s | 1s–60s | Per-shard query timeout |
Tip
Rule of thumb: 100K–500K docs per shard for optimal balance. Set heartbeatTimeout to at least 5× heartbeatInterval.
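Both rules of thumb reduce to simple arithmetic. The sketch below assumes you pick a target docs-per-shard value within the suggested 100K–500K band and clamps the result to the documented `shardCount` range; the class and method names are hypothetical.

```java
import java.time.Duration;

// Illustrative only: hypothetical sizing helpers, not part of the Spector API.
final class ClusterSizingSketch {

    // Ceiling of totalDocs / docsPerShard, clamped to the documented 2–256 range.
    static int suggestedShardCount(long totalDocs, long docsPerShard) {
        long shards = (totalDocs + docsPerShard - 1) / docsPerShard;
        return (int) Math.max(2, Math.min(256, shards));
    }

    // True when the timeout is at least five heartbeat intervals.
    static boolean heartbeatTimeoutOk(Duration interval, Duration timeout) {
        return timeout.compareTo(interval.multipliedBy(5)) >= 0;
    }

    public static void main(String[] args) {
        System.out.println(suggestedShardCount(10_000_000L, 500_000L));
        // The defaults (2s interval, 10s timeout) sit exactly on the 5x boundary.
        System.out.println(heartbeatTimeoutOk(Duration.ofSeconds(2), Duration.ofSeconds(10)));
    }
}
```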
| Parameter | Default | Options | Description |
|---|---|---|---|
| `mode` | SEARCH | SEARCH, MEMORY, HYBRID | Which subsystems to initialize |
| Mode | Engine | Memory | MCP Tools |
|---|---|---|---|
| SEARCH | ✅ | ❌ | 6 engine tools |
| MEMORY | ❌ | ✅ | 11 memory tools |
| HYBRID | ✅ | ✅ | All 17 tools |
| Parameter | Default | Range | Description |
|---|---|---|---|
| `nodesPerPartition` | 10,000 | 1,000–1,000,000 | Records per semantic partition file |
| `workingCapacity` | 100 | 10–10,000 | Working memory slots (volatile circular buffer) |
| `episodicPartitionCapacity` | 10,000 | 1,000–100,000 | Records per episodic partition |
| `semanticCapacity` | 5,000 | 100–1,000,000 | Single-file semantic capacity (in-memory mode) |
| `proceduralCapacity` | 500 | 10–100,000 | Procedural memory slots |
When using DISK persistence mode, semantic memories are stored in rolling partition files:
```text
.spector/memory/semantic/
  semantic-000.mem   ← partition 0 (oldest, immutable)
  semantic-001.mem   ← partition 1 (immutable)
  semantic-002.mem   ← partition 2 (active, accepts writes)
```
Tuning nodesPerPartition:
- Smaller partitions (1K–5K) → faster compaction, more parallel search threads, more files
- Larger partitions (10K–50K) → fewer files, slightly lower overhead per partition
- Default (10K) → good balance for most workloads
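To estimate how many files a given `nodesPerPartition` produces, the sketch below assumes records fill partitions sequentially from index 0 and that file names follow the `semantic-NNN.mem` pattern shown in the listing above. Both are assumptions drawn from that listing; the helper names are hypothetical.

```java
import java.util.Locale;

// Illustrative only: hypothetical helpers based on the file listing above.
final class PartitionSketch {

    // Zero-based partition that would hold the record at the given zero-based index.
    static int partitionIndex(long recordIndex, int nodesPerPartition) {
        return (int) (recordIndex / nodesPerPartition);
    }

    // Number of partition files needed for totalRecords (ceiling division).
    static long partitionCount(long totalRecords, int nodesPerPartition) {
        return (totalRecords + nodesPerPartition - 1) / nodesPerPartition;
    }

    // File name pattern assumed from the listing: semantic-000.mem, semantic-001.mem, ...
    static String fileName(int partitionIndex) {
        return String.format(Locale.ROOT, "semantic-%03d.mem", partitionIndex);
    }

    public static void main(String[] args) {
        int nodesPerPartition = 10_000;
        System.out.println(partitionCount(25_000L, nodesPerPartition));            // 3
        System.out.println(fileName(partitionIndex(24_999L, nodesPerPartition)));  // semantic-002.mem
    }
}
```

With the default of 10,000, a store of 25,000 semantic memories spans three files, and only the last one accepts writes.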
Tip
Existing single-file semantic.mem stores are automatically migrated to the partitioned format on first startup. No manual migration needed.
| Parameter | Default | Description |
|---|---|---|
| `partitionReplicationEnabled` | false | Enable file-level partition snapshot shipping |
| `replicaCount` | 1 | Replicas per shard (1–5) |
When enabled, immutable semantic partitions are shipped as snapshots to replica nodes. Only the active (mutable) partition requires WAL-based delta replication.
| Parameter | Default | Range | Description |
|---|---|---|---|
| `maxTokens` | 512 | 1–8192 | Max tokens per chunk |
| `overlapTokens` | 50 | 0 to `maxTokens` − 1 | Overlap between chunks |
| `embeddingBatchSize` | 32 | 1–256 | Batch size for embedding generation |
| `embeddingRetries` | 3 | 0–10 | Retry count for failed batches |
| `contextTokenLimit` | 4096 | 256–131072 | Max tokens in assembled context |
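The relationship between `maxTokens` and `overlapTokens` can be illustrated with a simple sliding window: each chunk starts `maxTokens - overlapTokens` tokens after the previous one. This is a generic sketch of overlapping chunking under that assumption, not Spector's chunker, which may also respect sentence or paragraph boundaries.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: generic sliding-window chunking, not a Spector class.
final class ChunkingSketch {

    // Returns [start, end) token offsets for each chunk.
    static List<int[]> chunkOffsets(int totalTokens, int maxTokens, int overlapTokens) {
        if (maxTokens < 1 || overlapTokens < 0 || overlapTokens >= maxTokens) {
            throw new IllegalArgumentException("require 0 <= overlapTokens < maxTokens");
        }
        List<int[]> chunks = new ArrayList<>();
        int step = maxTokens - overlapTokens; // how far each new chunk advances
        for (int start = 0; start < totalTokens; start += step) {
            int end = Math.min(start + maxTokens, totalTokens);
            chunks.add(new int[] {start, end});
            if (end == totalTokens) {
                break; // the last chunk already reaches the end of the document
            }
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Defaults from the table: maxTokens = 512, overlapTokens = 50.
        for (int[] chunk : chunkOffsets(1000, 512, 50)) {
            System.out.println(chunk[0] + " to " + chunk[1]);
        }
    }
}
```

With the defaults, a 1,000-token document yields three chunks, and consecutive chunks share 50 tokens. This also shows why `overlapTokens` must stay below `maxTokens`: otherwise the window would never advance.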
```java
// High recall: INT8 quantization with a denser graph and wider beams
var config = SpectorConfig.DEFAULT
    .withDimensions(384)
    .withCapacity(500_000)
    .withQuantization(QuantizationType.SCALAR_INT8)
    .withM(32)
    .withEfConstruction(400)
    .withEfSearch(200);
```

```java
// Balanced: INT4 quantization with rescoring
var config = SpectorConfig.DEFAULT
    .withDimensions(384)
    .withCapacity(50_000_000)
    .withQuantization(QuantizationType.SCALAR_INT4)
    .withRescore(3); // default for INT4
```

```java
// Memory-first: INT2 quantization with rescoring
var config = SpectorConfig.DEFAULT
    .withDimensions(384)
    .withCapacity(200_000_000)
    .withQuantization(QuantizationType.SCALAR_INT2)
    .withRescore(5); // default for INT2
```

```java
// Low latency: small vectors, sparser graph, narrow beams
var config = SpectorConfig.DEFAULT
    .withDimensions(128)
    .withCapacity(100_000)
    .withM(12)
    .withEfConstruction(100)
    .withEfSearch(30);
```

```java
// GPU acceleration
var config = SpectorConfig.DEFAULT
    .withDimensions(768)
    .withCapacity(1_000_000)
    .withGpu(true)
    .withGpuMemoryBudget(2048); // 2 GB
```

```java
// Ingestion: larger chunks, more overlap, bigger embedding batches
var config = SpectorConfig.DEFAULT
    .withDimensions(384)
    .withMaxTokens(1024)
    .withOverlapTokens(100)
    .withEmbeddingBatchSize(64);
```

- Performance Tuning: benchmarks and optimization strategies
- Architecture Overview: how configuration affects system behavior
- Distributed Mode: cluster-specific configuration
- GPU Acceleration: GPU setup requirements