feat(indexer): content-based importance scoring at indexing time by Mathews-Tom · Pull Request #46 · Mathews-Tom/VaultMind

Mathews-Tom · 2026-03-25T18:16:23Z

Summary

Add content-based importance scoring computed at indexing time, inspired by memora-lab/memory-service-public's observation-level importance scoring pattern. Each note chunk now carries a pre-computed importance_score (0.0-1.0) based on content richness signals. This gives well-connected, well-tagged, link-rich content a ranking boost, and the signals never need to be recomputed at query time.

Importance Score Formula

Four factors with equal weight (0.25 each):

| Factor | Calculation | Rationale |
|---|---|---|
| Entity density | `min(entity_count / 5, 1.0)` | Notes mentioning more graph entities are more connected |
| Link density | `min(wikilink_count / 10, 1.0)` | Notes with more `[[wikilinks]]` are better integrated |
| Content length | `min(word_count / 500, 1.0)` | Longer notes (up to cap) tend to be more substantive |
| Tag count | `min(tag_count / 5, 1.0)` | More tags indicate broader relevance |

importance_score = (entity_score + link_score + length_score + tag_score) / 4.0
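
For reviewers who want to check the arithmetic, here is a minimal sketch of the formula above. It is not the contents of importance.py: the wikilink regex, the default arguments, and the treatment of None as an empty list are assumptions made for illustration.

```python
import re

# Assumed pattern: matches [[Target]] and [[Target|alias]].
# The PR only states that a regex is used, not which one.
WIKILINK_RE = re.compile(r"\[\[([^\[\]]+)\]\]")


def compute_importance(content, tags=None, entities=None):
    """Return an importance score in [0.0, 1.0] from four equal-weight factors."""
    tags = tags or []          # assumption: None means "no tags"
    entities = entities or []  # assumption: None means "no entities"

    entity_score = min(len(entities) / 5, 1.0)
    link_score = min(len(WIKILINK_RE.findall(content)) / 10, 1.0)
    length_score = min(len(content.split()) / 500, 1.0)
    tag_score = min(len(tags) / 5, 1.0)

    return (entity_score + link_score + length_score + tag_score) / 4.0
```

Under these assumptions, a 250-word note with 5 wikilinks, 2 entities, and 1 tag scores (0.4 + 0.5 + 0.5 + 0.2) / 4, which is about 0.4. A note that exceeds every cap scores exactly 1.0.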

Ranking Integration

The importance score provides up to a 15% boost to the final composite score:

if importance > 0:
    final *= 1.0 + 0.15 * importance

This is applied as a post-multiplier after composite scoring (semantic + recency + density + activation + type weights), similar to how mode and status multipliers work.
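
The post-multiplier can be sketched as a standalone helper. The function name apply_importance_boost and the enabled parameter are hypothetical. The parameter stands in for importance_scoring_enabled, and how rank_results() actually consults that toggle is not shown in this description.

```python
def apply_importance_boost(final, importance, enabled=True):
    """Scale a composite score by up to 15% according to importance (0.0-1.0)."""
    # `enabled` is an assumed stand-in for RankingConfig.importance_scoring_enabled.
    if enabled and importance > 0:
        final *= 1.0 + 0.15 * importance
    return final


# An importance of 0.4 gives a multiplier of 1.06.
boosted = apply_importance_boost(0.80, 0.4)
```

With a composite score of 0.80, an importance of 0.4 yields roughly 0.848, and the maximum importance of 1.0 yields roughly 0.92. An importance of 0 returns the score untouched.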

Changes

New Files

src/vaultmind/indexer/importance.py (34 lines) — compute_importance(content, tags, entities) function. Uses regex for wikilink extraction, word splitting for length. All factors capped at 1.0, combined as equal-weight average
tests/test_importance.py (135 lines) — 15 tests across 4 classes

Modified Files

src/vaultmind/vault/models.py — Added importance_score: float = 0.0 field to NoteChunk. Updated to_chroma_metadata() return type to dict[str, str | int | float] and included importance_score in output
src/vaultmind/vault/parser.py — Computes note-level importance in chunk_note() via compute_importance() and propagates to all chunks created from that note
src/vaultmind/indexer/ranking.py — Added importance_score: float = 0.0 to RankedResult. Extracts importance from chunk metadata in rank_results() and applies 15% boost to final score
src/vaultmind/config.py — Added importance_scoring_enabled: bool = True to RankingConfig
config/default.toml — Added importance_scoring_enabled = true to [ranking]
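
As a rough illustration of how the model and parser changes fit together, the sketch below uses a pared-down NoteChunk. Only importance_score and to_chroma_metadata() come from this PR. The other fields, the chunk_note_sketch helper, and its signature are placeholders, and the real chunk_note() calls compute_importance() itself rather than receiving a score.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class NoteChunk:
    # Placeholder fields; the real model has more than these.
    note_path: str
    chunk_index: int
    content: str
    # Field added by this PR, defaulting to "no boost".
    importance_score: float = 0.0

    def to_chroma_metadata(self) -> dict[str, str | int | float]:
        # The float value is why the return type was widened.
        return {
            "note_path": self.note_path,
            "chunk_index": self.chunk_index,
            "importance_score": self.importance_score,
        }


def chunk_note_sketch(note_path, paragraphs, note_importance):
    """Hypothetical helper: one chunk per paragraph, all sharing the note-level score."""
    return [
        NoteChunk(note_path, i, text, importance_score=note_importance)
        for i, text in enumerate(paragraphs)
    ]
```

Because the score is computed once per note, every chunk of a note gets the same boost regardless of which chunk matches a query.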

Backward Compatibility

importance_score defaults to 0.0 on NoteChunk — existing chunks without the field rank as before
The importance boost is a multiplier with a floor of 1.0, so an importance of 0 leaves the score unchanged
to_chroma_metadata() return type widened from dict[str, str | int] to dict[str, str | int | float] — strictly additive
All existing ranking/composite tests pass unchanged
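
The fallback for chunks indexed before this change can be illustrated with a small sketch. Both helper names are hypothetical, and the defensive handling of malformed values is an assumption rather than something this PR describes.

```python
def importance_from_metadata(metadata):
    """Read importance_score from chunk metadata, defaulting to 0.0 when absent."""
    try:
        return float(metadata.get("importance_score", 0.0))
    except (TypeError, ValueError):
        # Assumed behavior: treat unparseable values like a missing field.
        return 0.0


def boost_multiplier(metadata):
    """Return the score multiplier: 1.0 for legacy chunks, up to 1.15 otherwise."""
    importance = importance_from_metadata(metadata)
    return 1.0 + 0.15 * importance if importance > 0 else 1.0
```

Metadata written by an older index has no importance_score key, so the multiplier is exactly 1.0 and ranking is unchanged until the vault is re-indexed.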

Test plan

15 new tests in test_importance.py across 4 classes:
- compute_importance (10): empty/rich content, entity/link/tag/length factors, capping at 1.0, None handling, score range validation
- NoteChunk metadata (2): importance in to_chroma_metadata(), default value
- RankedResult field (2): default 0.0, explicit population
- RankingConfig (1): importance_scoring_enabled default
All existing ranking tests pass unchanged (60 in test_composite_ranking.py + test_ranking.py)
Full suite: 981/981 tests pass, 0 regressions
ruff check — clean
mypy --ignore-missing-imports — clean
Manual: re-index vault, verify importance_score appears in ChromaDB metadata
Manual: confirm link-rich permanent notes rank higher than sparse fleeting notes

New module indexer/importance.py computes importance_score (0.0-1.0) from four equal-weight factors: entity density, wikilink density, content length, and tag count. Score stored in NoteChunk and persisted to ChromaDB metadata for use in ranking. Add importance_score field to NoteChunk and to_chroma_metadata(). Add importance_scoring_enabled toggle to RankingConfig.

Compute note-level importance in chunk_note() and propagate to all chunks. In rank_results(), extract importance_score from metadata and apply up to 15% boost to the final composite score for high-importance notes. Add importance_score field to RankedResult.

Mathews-Tom added 3 commits March 25, 2026 23:43

test(indexer): add importance scoring tests

0f1ccc5

15 tests across 4 classes: compute_importance function (10) covering empty/rich content, entity/link/tag/length factors, capping, None handling; NoteChunk metadata (2); RankedResult field (2); RankingConfig toggle (1).

Mathews-Tom merged commit c8d2672 into main Mar 25, 2026
3 checks passed

Mathews-Tom deleted the feat/importance-scoring branch March 25, 2026 18:19

Mathews-Tom merged 3 commits into main from feat/importance-scoring
