feat(indexer): content-based importance scoring at indexing time #46

Merged
Mathews-Tom merged 3 commits into main from feat/importance-scoring
Mar 25, 2026

@Mathews-Tom

Owner

Summary

Add content-based importance scoring computed at indexing time, inspired by memora-lab/memory-service-public's observation-level importance scoring pattern. Each note chunk now carries a pre-computed importance_score (0.0-1.0) based on content richness signals, providing a ranking boost for well-connected, well-tagged, link-rich content without additional query-time computation.

Importance Score Formula

Four factors with equal weight (0.25 each):

| Factor | Calculation | Rationale |
|---|---|---|
| Entity density | min(entity_count / 5, 1.0) | Notes mentioning more graph entities are more connected |
| Link density | min(wikilink_count / 10, 1.0) | Notes with more [[wikilinks]] are better integrated |
| Content length | min(word_count / 500, 1.0) | Longer notes (up to cap) tend to be more substantive |
| Tag count | min(tag_count / 5, 1.0) | More tags indicate broader relevance |

importance_score = (entity_score + link_score + length_score + tag_score) / 4.0
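The formula above can be sketched in a few lines. This is a minimal illustration rather than the merged implementation: the function name and argument order follow the PR description, but the wikilink regex, the whitespace-based word count, and the handling of `None` inputs are assumptions.

```python
import re

# Assumed pattern: matches [[Target]] and [[Target|alias]] style wikilinks.
WIKILINK_RE = re.compile(r"\[\[([^\[\]]+)\]\]")


def compute_importance(content, tags=None, entities=None):
    """Equal-weight average of four capped richness factors, in [0.0, 1.0]."""
    text = content or ""  # treat None content as empty (assumption)
    entity_score = min(len(entities or []) / 5, 1.0)
    link_score = min(len(WIKILINK_RE.findall(text)) / 10, 1.0)
    length_score = min(len(text.split()) / 500, 1.0)
    tag_score = min(len(tags or []) / 5, 1.0)
    return (entity_score + link_score + length_score + tag_score) / 4.0
```

Because every factor is capped at 1.0 before averaging, a very long note cannot compensate for having no links, tags, or entities: length alone contributes at most 0.25.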

Ranking Integration

The importance score provides up to a 15% boost to the final composite score:

if importance > 0:
    final *= 1.0 + 0.15 * importance

This is applied as a post-multiplier after composite scoring (semantic + recency + density + activation + type weights), similar to how mode and status multipliers work.
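To make the size of the boost concrete, here is a small sketch of the post-multiplier in isolation. The helper name `apply_importance_boost` and its `boost` parameter are hypothetical; in the PR this logic lives inside rank_results().

```python
def apply_importance_boost(final, importance, boost=0.15):
    """Scale a composite score by 1.0 + boost * importance.

    Hypothetical standalone helper; an importance of 0 leaves the
    score untouched, an importance of 1.0 gives the full boost.
    """
    if importance > 0:
        final *= 1.0 + boost * importance
    return final


# A composite score of 0.80 under three importance values:
unboosted = apply_importance_boost(0.80, 0.0)  # stays 0.80
half = apply_importance_boost(0.80, 0.5)       # about 0.86 (x1.075)
full = apply_importance_boost(0.80, 1.0)       # about 0.92 (x1.15)
```

Since the multiplier never drops below 1.0, the boost can reorder results that are close in composite score but cannot demote a result below its unboosted value.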

Changes

New Files

  • src/vaultmind/indexer/importance.py (34 lines) — compute_importance(content, tags, entities) function. Uses regex for wikilink extraction, word splitting for length. All factors capped at 1.0, combined as equal-weight average
  • tests/test_importance.py (135 lines) — 15 tests across 4 classes

Modified Files

  • src/vaultmind/vault/models.py — Added importance_score: float = 0.0 field to NoteChunk. Updated to_chroma_metadata() return type to dict[str, str | int | float] and included importance_score in output
  • src/vaultmind/vault/parser.py — Computes note-level importance in chunk_note() via compute_importance() and propagates to all chunks created from that note
  • src/vaultmind/indexer/ranking.py — Added importance_score: float = 0.0 to RankedResult. Extracts importance from chunk metadata in rank_results() and applies 15% boost to final score
  • src/vaultmind/config.py — Added importance_scoring_enabled: bool = True to RankingConfig
  • config/default.toml — Added importance_scoring_enabled = true to [ranking]
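As a rough picture of the models.py change, the sketch below shows a chunk type carrying the new field and emitting it in its metadata. It assumes NoteChunk is a dataclass; only `importance_score`, its default, and the `to_chroma_metadata()` return type come from the PR, and the other fields are hypothetical placeholders.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class NoteChunk:
    # chunk_id, content, and chunk_index are placeholder fields (assumptions).
    chunk_id: str
    content: str
    chunk_index: int = 0
    # New in this PR: note-level score copied onto every chunk of the note.
    importance_score: float = 0.0

    def to_chroma_metadata(self) -> dict[str, str | int | float]:
        """Flat metadata dict; the float value is why the type was widened."""
        return {
            "chunk_id": self.chunk_id,
            "chunk_index": self.chunk_index,
            "importance_score": self.importance_score,
        }


chunk = NoteChunk(chunk_id="note-1#0", content="...", importance_score=0.75)
metadata = chunk.to_chroma_metadata()
```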

Backward Compatibility

  • importance_score defaults to 0.0 on NoteChunk — existing chunks without the field rank as before
  • The importance boost is a multiplier between 1.0 and 1.15, so it can never lower a score (an importance of 0 means no boost)
  • to_chroma_metadata() return type widened from dict[str, str | int] to dict[str, str | int | float] — strictly additive
  • All existing ranking/composite tests pass unchanged
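The first two points can be illustrated with a sketch of how a ranking step might read the score from stored metadata. The helper names and the defensive fallback for malformed values are assumptions, not taken from the PR.

```python
def importance_from_metadata(metadata):
    """Read importance_score from chunk metadata, defaulting to 0.0.

    Hypothetical helper: chunks indexed before this change have no
    importance_score key, so they fall back to 0.0.
    """
    try:
        return float(metadata.get("importance_score", 0.0))
    except (TypeError, ValueError):
        # Defensive fallback for malformed values (assumption).
        return 0.0


def boost_multiplier(metadata):
    """Multiplier applied to the composite score: 1.0 up to 1.15."""
    importance = importance_from_metadata(metadata)
    return 1.0 + 0.15 * importance if importance > 0 else 1.0


legacy = boost_multiplier({"source": "old-note.md"})  # no key, so exactly 1.0
rich = boost_multiplier({"importance_score": 1.0})    # about 1.15
```

Under these assumptions a vault that has not been re-indexed ranks exactly as before, and the boost only starts to apply once chunks are re-indexed with a non-zero score.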

Test plan

  • 15 new tests in test_importance.py across 4 classes:
    • compute_importance (10): empty/rich content, entity/link/tag/length factors, capping at 1.0, None handling, score range validation
    • NoteChunk metadata (2): importance in to_chroma_metadata(), default value
    • RankedResult field (2): default 0.0, explicit population
    • RankingConfig (1): importance_scoring_enabled default
  • All existing ranking tests pass unchanged (60 in test_composite_ranking.py + test_ranking.py)
  • Full suite: 981/981 tests pass, 0 regressions
  • ruff check — clean
  • mypy --ignore-missing-imports — clean
  • Manual: re-index vault, verify importance_score appears in ChromaDB metadata
  • Manual: confirm link-rich permanent notes rank higher than sparse fleeting notes

New module indexer/importance.py computes importance_score (0.0-1.0)
from four equal-weight factors: entity density, wikilink density,
content length, and tag count. Score stored in NoteChunk and persisted
to ChromaDB metadata for use in ranking.

Add importance_score field to NoteChunk and to_chroma_metadata().
Add importance_scoring_enabled toggle to RankingConfig.
Compute note-level importance in chunk_note() and propagate to all
chunks. In rank_results(), extract importance_score from metadata and
apply up to 15% boost to the final composite score for high-importance
notes. Add importance_score field to RankedResult.

15 tests across 4 classes: compute_importance function (10) covering
empty/rich content, entity/link/tag/length factors, capping, None
handling; NoteChunk metadata (2); RankedResult field (2); RankingConfig
toggle (1).
@Mathews-Tom Mathews-Tom merged commit c8d2672 into main Mar 25, 2026
3 checks passed
@Mathews-Tom Mathews-Tom deleted the feat/importance-scoring branch March 25, 2026 18:19