feat: exam-memory V2 — numpy-based semantic retrieval, question bank, knowledge sources #6

Merged
Tenstu merged 1 commit into main from clean-main on Jun 17, 2026

Tenstu (Owner) commented Jun 17, 2026

Summary

  • exam-memory V2: replace ChromaDB with numpy-based bge-m3 semantic retrieval (CPU-only, Python 3.14 compatible)
  • FTS + hybrid search: lexical fts_store + RRF-based hybrid_search integration
  • QuestionBank: three extraction modes (train/mock/live) + ReviewGate dedup + full test coverage
  • KnowledgeSource Protocol: DirConnector + SourceRegistry + sources.yaml + MCP mount tools
  • Restructured codebase: move exam_memory/ to shared/exam_memory/, organize targets, cheatsheets, and skill references
  • Clean public branch: all private/dev-only content excluded via .gitignore
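
As an illustration of the numpy-based retrieval named in the first bullet, here is a minimal sketch of brute-force cosine top-k search over an in-memory embedding matrix. The function name `top_k_cosine` and the toy 3-dimensional vectors are assumptions made for this example; the PR's actual vector store and its bge-m3 embeddings are not reproduced here.

```python
import numpy as np


def top_k_cosine(query: np.ndarray, matrix: np.ndarray, k: int = 3) -> list[tuple[int, float]]:
    """Return (row_index, cosine_similarity) for the k rows most similar to query."""
    # Normalize both sides so that a dot product equals cosine similarity.
    q = query / (np.linalg.norm(query) + 1e-12)
    m = matrix / (np.linalg.norm(matrix, axis=1, keepdims=True) + 1e-12)
    scores = m @ q
    k = min(k, len(scores))
    # Sort descending by score and keep the first k row indices.
    order = np.argsort(scores)[::-1][:k]
    return [(int(i), float(scores[i])) for i in order]


# Toy "embeddings" (hypothetical); a real store would hold bge-m3 vectors.
docs = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]])
hits = top_k_cosine(np.array([1.0, 0.1, 0.0]), docs, k=2)
```

A full scan like this is linear in the number of stored vectors, which is usually acceptable for a personal note collection and avoids a separate database dependency.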

Test coverage

  • 118 tests across 6 test modules, covering question bank, knowledge source, source connector, vector store, FTS, hybrid search, chunking, and security
  • All tests passing

Key changes

67 files changed, 4185 insertions(+), 1263 deletions(-)

Copilot AI review requested due to automatic review settings June 17, 2026 12:36

Copilot AI left a comment

⚠️ Not ready to approve

There are confirmed functional issues in newly added target assets and exam-memory V2 configuration/packaging that will break intended workflows (window match update logic, source mount path resolution, and test/entrypoint inconsistencies).

Pull request overview

This PR upgrades the repo into a multi-target exam-prep harness with exam-memory V2 (numpy vector store + SQLite FTS + hybrid search), introduces a KnowledgeSource mounting/retrieval layer, and restructures content under shared/ + targets/ to separate reusable assets from target-specific materials.
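
To make the mounting/retrieval layer described above more concrete, the following is a rough sketch of a protocol-based source plus a small registry. All names here (`ToySource`, `ToyDirSource`, `ToyRegistry`, and their methods) are hypothetical stand-ins; the PR's real `KnowledgeSource`, `DirConnector`, and `SourceRegistry` interfaces may differ.

```python
from pathlib import Path
from typing import Protocol


class ToySource(Protocol):
    """Hypothetical structural interface for a mountable knowledge source."""

    name: str

    def fetch(self, query: str) -> list[str]: ...


class ToyDirSource:
    """Illustrative local-directory source: returns files whose text contains the query."""

    def __init__(self, name: str, path: str, glob: str = "*.md") -> None:
        self.name = name
        self.root = Path(path)
        self.glob = glob

    def fetch(self, query: str) -> list[str]:
        if not self.root.is_dir():
            return []  # a missing directory yields no results instead of raising
        needle = query.lower()
        return sorted(
            str(p)
            for p in self.root.glob(self.glob)
            if needle in p.read_text(encoding="utf-8", errors="ignore").lower()
        )


class ToyRegistry:
    """Illustrative mount/unmount/fetch lifecycle keyed by source name."""

    def __init__(self) -> None:
        self._sources: dict[str, ToySource] = {}

    def mount(self, source: ToySource) -> None:
        self._sources[source.name] = source

    def unmount(self, name: str) -> bool:
        return self._sources.pop(name, None) is not None

    def names(self) -> list[str]:
        return sorted(self._sources)

    def fetch(self, name: str, query: str) -> list[str]:
        if name not in self._sources:
            raise KeyError(f"source not mounted: {name}")
        return self._sources[name].fetch(query)
```

Because the protocol is structural, any object with a `name` attribute and a `fetch` method can be mounted without inheriting from a base class.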

Changes:

  • Add exam-memory V2 building blocks: frontmatter parsing, chunking, FTS5 store, weighted-RRF hybrid search, and source mount/fetch tools.
  • Introduce target-specific configs/prompts/cheatsheets and update Skills to reference targets/{target}/... paths instead of legacy algorithms/ and daily/.
  • Add unit tests for SourceConnector/KnowledgeSource/SourceRegistry and adjust pytest configuration.
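
The weighted-RRF fusion mentioned in the first item can be sketched as follows. This is a generic reciprocal rank fusion under stated assumptions, not the PR's `hybrid_search` code: the function name `weighted_rrf`, the per-ranker weights, and the constant `k = 60` (a commonly used default) are all choices made for this example.

```python
from collections import defaultdict


def weighted_rrf(
    rankings: dict[str, list[str]],
    weights: dict[str, float],
    k: int = 60,
) -> list[tuple[str, float]]:
    """Fuse ranked ID lists: each list adds weight / (k + rank) to a document's score."""
    scores: dict[str, float] = defaultdict(float)
    for ranker, ranked_ids in rankings.items():
        w = weights.get(ranker, 1.0)
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] += w / (k + rank)
    # Highest fused score first; ties are broken by document ID for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


fused = weighted_rrf(
    {"fts": ["a", "b", "c"], "vector": ["b", "c", "d"]},
    weights={"fts": 1.0, "vector": 1.0},
)
```

Documents that appear in both lists ("b" and "c" here) outrank documents found by only one ranker, which is the main reason to fuse ranks rather than compare raw lexical and vector scores directly.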
File summaries

| File | Description |
| --- | --- |
| `tests/test_source_connector.py` | Adds SourceConnector routing/error-boundary/parameter-forwarding tests |
| `tests/test_knowledge_source.py` | Adds DirConnector + SourceRegistry behavior and edge-case tests |
| `tests/conftest.py` | Adds pytest config/fixtures and ensures shared/ is importable in direct test runs |
| `pytest.ini` | Configures pytest discovery and base temp directory |
| `shared/exam_memory/source_connector.py` | Adds unified input adapter (text/file/chunks/source registry) |
| `shared/exam_memory/knowledge_source.py` | Adds KnowledgeSource protocol + local dir connector implementation |
| `shared/exam_memory/source_registry.py` | Adds mount/unmount/list/fetch lifecycle manager for knowledge sources |
| `shared/exam_memory/sources.yaml` | Declares mountable knowledge sources (DirConnector-based) |
| `shared/exam_memory/server.py` | Extends MCP server with hybrid retrieval and source mount/fetch tools |
| `shared/exam_memory/vector_store.py` | Updates numpy vector store to reuse shared frontmatter parsing and add canonical keys |
| `shared/exam_memory/frontmatter.py` | Introduces shared YAML frontmatter parsing + body extraction |
| `shared/exam_memory/chunking.py` | Adds paragraph-preserving chunking utilities (OneFind-aligned sizing) |
| `shared/exam_memory/fts_store.py` | Adds SQLite FTS5 lexical indexing/search layer |
| `shared/exam_memory/hybrid_search.py` | Adds weighted RRF fusion across FTS and vector search |
| `shared/exam_memory/embedding.py` | Adds embedder pooling verification + logging and OneFind alignment notes |
| `shared/exam_memory/rebuild_index.py` | Adds CLI to rebuild vector + FTS indexes |
| `shared/exam_memory/pyproject.toml` | Defines exam-memory package metadata and optional dependency groups |
| `shared/exam_memory/__init__.py` | Exposes exam-memory V2 public API + version |
| `shared/exam_memory/bank/README.md` | Adds generated bank index README scaffold |
| `shared/exam_memory/vectorstore/.gitkeep` | Ensures vectorstore directory exists in repo |
| `shared/cheatsheets/llm_core_cheatsheet.md` | Adds shared LLM core quick reference |
| `shared/cheatsheets/sft_lora_miniproject.md` | Adds minimal LoRA/SFT mini-project guide |
| `shared/cheatsheets/agent_project_pitch.md` | Adds interview pitch framing notes |
| `shared/cheatsheets/.gitkeep` | Keeps shared cheatsheets directory |
| `targets/exam_config_template.md` | Adds template for target exam configuration |
| `targets/ai-lab/exam_config.md` | Adds AI Lab exam config (counts/points/timing/knowledge sources) |
| `targets/ai-lab/sources/source_index.md` | Adds curated source index with usage notes |
| `targets/ai-lab/prompts/mock_exam_prompt.md` | Adds target prompt for mock exam generation |
| `targets/ai-lab/prompts/daily_review_prompt.md` | Adds daily review prompt template |
| `targets/ai-lab/cheatsheets/math_fundamentals.md` | Adds math fundamentals quick reference |
| `targets/ai-lab/cheatsheets/gnn_diffusion_cheatsheet.md` | Adds GNN/diffusion quick reference |
| `targets/ai-lab/cheatsheets/ai_lab_context.md` | Adds org/context briefing notes |
| `targets/pdd-algo/exam_config.md` | Adds PDD target exam config scaffold |
| `targets/pdd-algo/topic_checklist.md` | Adds algorithm topic checklist for the PDD target |
| `targets/pdd-algo/practice/sliding_window.py` | Adds a sliding-window practice implementation |
| `targets/pdd-algo/practice/bfs_grid.py` | Adds a BFS grid shortest-path practice implementation |
| `targets/pdd-algo/practice/.gitkeep` | Keeps practice directory |
| `targets/pdd-algo/cheatsheets/.gitkeep` | Keeps cheatsheets directory |
| `START_HERE.md` | Updates bootstrap instructions to new shared/ + targets/ layout |
| `README.md` | Updates repo naming and documents new directory structure/target workflow |
| `README_CN.md` | Same as README.md (Chinese) |
| `prompts/new_session_prompt.md` | Updates session prompt to new target-aware layout |
| `skills/init-guide.md` | Updates onboarding to write target config and new paths |
| `skills/review-tracker.md` | Updates tracker data sources to target-aware paths |
| `skills/solve-skeleton/SKILL.md` | Updates references to target-scoped mistake logs |
| `skills/solve-skeleton/references/exam-patterns.md` | Generalizes patterns beyond a single lab |
| `skills/solve-analyze/SKILL.md` | Updates MCP tool names and mistake-log paths |
| `skills/solve-analyze/references/root-cause-tags.md` | Updates tag alignment to target-scoped mistake logs |
| `skills/solve-analyze/references/comparison-template.md` | Renames mastery level usage (partial→struggling) and updates MCP naming |
| `skills/algo-annotation.md` | Updates mistake log path reference to target-scoped location |
| `skills/exam-assistant.md` | Updates MCP tool names to full-prefixed forms |
| `skills/choice-q-drill.md` | Updates scoring config sourcing and target-scoped paths |
| `.gitignore` | Updates ignores for new layout and adds patterns for temp/dev-only artifacts |
| `llm/transformer-review.md` | Deletes legacy cheatsheet file (moved/replaced by shared cheatsheets) |
| `llm/transformer-forward-pass.md` | Deletes legacy cheatsheet file (moved/replaced by shared cheatsheets) |
| `exam_memory/rebuild_index.py` | Removes legacy V1 CLI (migrated to shared/exam_memory) |
| `exam_memory/pyproject.toml` | Removes legacy V1 packaging (migrated to shared/exam_memory) |
| `daily/README.md` | Removes legacy daily directory README (daily now under shared/ and targets/) |
| `algorithms/solutions/example_ring_substring.md` | Removes legacy example solution (content reorganized under targets/) |
| `algorithms/mock_exam_log.md` | Removes legacy mock log (now target-scoped) |
| `algorithms/mistake_log.md` | Removes legacy mistake log (now target-scoped) |

Copilot's findings

  • Files reviewed: 47/67 changed files
  • Comments generated: 3

Comment on lines +35 to +37
[project.scripts]
exam-memory = "exam_memory.server:main"

Comment on lines +163 to +177
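def _looks_like_path(s: str) -> bool:
    """Heuristically decide whether a string looks like a file path.

    Conservative strategy: treat the string as a path only if it has a known
    file extension or the path actually exists. This avoids misclassifying
    text that merely contains "/", such as "算法/数据结构"
    ("algorithms/data structures").
    """
    if not s:
        return False
    p = Path(s)
    _KNOWN_EXTS = {".md", ".txt", ".py", ".json", ".yaml", ".yml", ".csv", ".html"}
    if p.suffix.lower() in _KNOWN_EXTS:
        return True
    if p.exists():
        return True
    return False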
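
To show how the conservative heuristic in the snippet above behaves on a few inputs, here is a self-contained copy with the `pathlib` import it needs. The function is renamed `looks_like_path` to mark it as an illustrative copy, and the sample strings are made up for this example.

```python
from pathlib import Path

# Same extension list as the snippet under review.
KNOWN_EXTS = {".md", ".txt", ".py", ".json", ".yaml", ".yml", ".csv", ".html"}


def looks_like_path(s: str) -> bool:
    """Illustrative copy: a path is anything with a known extension or that exists."""
    if not s:
        return False
    p = Path(s)
    if p.suffix.lower() in KNOWN_EXTS:
        return True
    return p.exists()


samples = [
    "notes.md",                      # known extension, existence not required
    "README.MD",                     # the suffix check is case-insensitive
    "算法/数据结构",                   # contains "/", but no extension
    "no-such-dir-xyz/cheatsheets/",  # extensionless, so only existence counts
    "",                              # empty input
]
results = {s: looks_like_path(s) for s in samples}
```

One consequence worth noting: an extensionless directory string is classified by `exists()` alone, so for a relative path the result depends on the process's current working directory.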
Comment on lines +8 to +13
sources:
  - name: "pdd-algo-notes"
    type: "local_dir"
    config:
      path: "targets/pdd-algo/cheatsheets/"
      glob: "*.md"
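
The `path` in this entry is relative, so where it points depends on what the loader resolves it against. One possible approach, sketched below, anchors relative paths to an explicit base directory (for example the repository root) instead of the current working directory. The helpers `resolve_source_path` and `list_source_files` are hypothetical and not taken from the PR.

```python
from pathlib import Path


def resolve_source_path(raw: str, base_dir: Path) -> Path:
    """Resolve a configured path against base_dir unless it is already absolute."""
    p = Path(raw).expanduser()
    if not p.is_absolute():
        p = base_dir / p
    return p.resolve()


def list_source_files(raw: str, glob: str, base_dir: Path) -> list[Path]:
    """Return the files matched by glob under the resolved source directory."""
    root = resolve_source_path(raw, base_dir)
    if not root.is_dir():
        # Fail loudly so that a misconfigured mount is visible at mount time.
        raise FileNotFoundError(f"source directory not found: {root}")
    return sorted(root.glob(glob))
```

With this shape, `list_source_files("targets/pdd-algo/cheatsheets/", "*.md", repo_root)` would return the same files no matter which directory the server is started from, given a `repo_root` the caller supplies.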
Tenstu merged commit 6f63196 into main Jun 17, 2026
1 check passed