feat(indexer): content-based importance scoring at indexing time #46
Merged
New module indexer/importance.py computes importance_score (0.0-1.0) from four equal-weight factors: entity density, wikilink density, content length, and tag count. Score stored in NoteChunk and persisted to ChromaDB metadata for use in ranking. Add importance_score field to NoteChunk and to_chroma_metadata(). Add importance_scoring_enabled toggle to RankingConfig.
Compute note-level importance in chunk_note() and propagate to all chunks. In rank_results(), extract importance_score from metadata and apply up to 15% boost to the final composite score for high-importance notes. Add importance_score field to RankedResult.
15 tests across 4 classes: compute_importance function (10) covering empty/rich content, entity/link/tag/length factors, capping, None handling; NoteChunk metadata (2); RankedResult field (2); RankingConfig toggle (1).
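A minimal sketch of the scoring described above, using the per-factor caps listed under Importance Score Formula in the Summary. This is not the contents of indexer/importance.py: the wikilink regex, the list-typed tags and entities arguments, and the treatment of None as empty are assumptions made for illustration.

```python
import re

# Assumed pattern: matches [[Target]] and [[Target|alias]] style wikilinks.
WIKILINK_RE = re.compile(r"\[\[[^\[\]]+\]\]")


def compute_importance(content, tags=None, entities=None):
    """Return a score in [0.0, 1.0] as the equal-weight average of four factors."""
    text = content or ""                      # assumption: None counts as empty
    entity_count = len(entities or [])
    tag_count = len(tags or [])
    wikilink_count = len(WIKILINK_RE.findall(text))
    word_count = len(text.split())            # whitespace word splitting

    factors = [
        min(entity_count / 5, 1.0),           # entity density
        min(wikilink_count / 10, 1.0),        # wikilink density
        min(word_count / 500, 1.0),           # content length
        min(tag_count / 5, 1.0),              # tag count
    ]
    return sum(factors) / len(factors)
```

Because every factor is capped at 1.0 before averaging, a note that is very long but has no tags, links, or entities can reach at most 0.25.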
Summary
Add content-based importance scoring computed at indexing time, inspired by memora-lab/memory-service-public's observation-level importance scoring pattern. Each note chunk now carries a pre-computed importance_score (0.0-1.0) based on content richness signals, providing a ranking boost for well-connected, well-tagged, link-rich content without additional query-time computation.
Importance Score Formula
Four factors with equal weight (0.25 each):
- Entity density: min(entity_count / 5, 1.0)
- Wikilink density: min(wikilink_count / 10, 1.0) (notes with [[wikilinks]] are better integrated)
- Content length: min(word_count / 500, 1.0)
- Tag count: min(tag_count / 5, 1.0)
Ranking Integration
The importance score provides a boost of up to 15% to the final composite score.
The boost is applied as a post-multiplier after composite scoring (semantic + recency + density + activation + type weights), similar to how the mode and status multipliers work.
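One way such a post-multiplier could look is sketched below. The linear scaling (a score of 1.0 gives the full 15%, a score of 0.0 gives none), the helper name apply_importance_boost, and the clamping of out-of-range values are assumptions; the PR only states that the boost is at most 15% and is read from chunk metadata in rank_results().

```python
IMPORTANCE_BOOST = 0.15  # maximum boost stated in the PR


def apply_importance_boost(composite_score, metadata, enabled=True):
    """Scale an already-computed composite score by the stored importance.

    Hypothetical helper: assumes the boost grows linearly with importance_score.
    """
    if not enabled:  # mirrors the importance_scoring_enabled toggle
        return composite_score
    # Chunks indexed before this change have no importance_score; treat as 0.0.
    raw = metadata.get("importance_score") or 0.0
    importance = min(max(float(raw), 0.0), 1.0)
    return composite_score * (1.0 + IMPORTANCE_BOOST * importance)
```

With this shape, a missing or zero importance_score yields a multiplier of exactly 1.0, which is consistent with older chunks ranking as before.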
Changes
New Files
- src/vaultmind/indexer/importance.py (34 lines): compute_importance(content, tags, entities) function. Uses regex for wikilink extraction and word splitting for length. All factors are capped at 1.0 and combined as an equal-weight average.
- tests/test_importance.py (135 lines): 15 tests across 4 classes.
Modified Files
- src/vaultmind/vault/models.py: Added importance_score: float = 0.0 field to NoteChunk. Updated to_chroma_metadata() return type to dict[str, str | int | float] and included importance_score in the output.
- src/vaultmind/vault/parser.py: Computes note-level importance in chunk_note() via compute_importance() and propagates it to all chunks created from that note.
- src/vaultmind/indexer/ranking.py: Added importance_score: float = 0.0 to RankedResult. Extracts importance from chunk metadata in rank_results() and applies up to a 15% boost to the final score.
- src/vaultmind/config.py: Added importance_scoring_enabled: bool = True to RankingConfig.
- config/default.toml: Added importance_scoring_enabled = true to [ranking].
Backward Compatibility
- importance_score defaults to 0.0 on NoteChunk, so existing chunks without the field rank as before.
- to_chroma_metadata() return type widened from dict[str, str | int] to dict[str, str | int | float], which is strictly additive.
Test plan
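- 15 tests in test_importance.py across 4 classes: compute_importance function (10), NoteChunk metadata (2), RankedResult field (2), RankingConfig toggle (1)
- ruff check: clean
- mypy --ignore-missing-imports: clean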
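To make the model changes listed under Modified Files more concrete, here is a rough sketch of a chunk carrying the new field and of note-level propagation. Only importance_score, its 0.0 default, and to_chroma_metadata() come from the PR; the other fields (note_path, chunk_index, text) and the propagate_importance helper are hypothetical stand-ins, not the real NoteChunk or chunk_note().

```python
from dataclasses import dataclass
from typing import Union

# Matches the widened value type described in the PR: str | int | float.
MetadataValue = Union[str, int, float]


@dataclass
class NoteChunk:
    # Hypothetical fields for illustration only.
    note_path: str
    chunk_index: int
    text: str
    # From the PR: defaults to 0.0 so chunks without a score rank as before.
    importance_score: float = 0.0

    def to_chroma_metadata(self) -> dict[str, MetadataValue]:
        return {
            "note_path": self.note_path,
            "chunk_index": self.chunk_index,
            "importance_score": self.importance_score,
        }


def propagate_importance(chunks: list[NoteChunk], note_score: float) -> None:
    """Copy one note-level score onto every chunk created from that note."""
    for chunk in chunks:
        chunk.importance_score = note_score
```

Computing the score once per note and copying it to each chunk keeps all chunks of a note on the same footing, regardless of how the note happens to be split.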