feat: exam-memory V2 — numpy-based semantic retrieval, question bank, knowledge sources #6
⚠️ Not ready to approve
There are confirmed functional issues in newly added target assets and exam-memory V2 configuration/packaging that will break intended workflows (window match update logic, source mount path resolution, and test/entrypoint inconsistencies).
Pull request overview
This PR upgrades the repo into a multi-target exam-prep harness with exam-memory V2 (numpy vector store + SQLite FTS + hybrid search), introduces a KnowledgeSource mounting/retrieval layer, and restructures content under shared/ + targets/ to separate reusable assets from target-specific materials.
Changes:
- Add exam-memory V2 building blocks: frontmatter parsing, chunking, FTS5 store, weighted-RRF hybrid search, and source mount/fetch tools.
- Introduce target-specific configs/prompts/cheatsheets and update Skills to reference targets/{target}/... paths instead of legacy algorithms/ and daily/.
- Add unit tests for SourceConnector/KnowledgeSource/SourceRegistry and adjust pytest configuration.
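For readers unfamiliar with the weighted-RRF fusion mentioned in the change list, here is a minimal sketch of the general technique, not the PR's actual `hybrid_search` code. The function name `weighted_rrf`, the `k=60` smoothing constant, and the sample weights and IDs are all assumptions made for illustration.

```python
from collections import defaultdict


def weighted_rrf(rankings, weights, k=60):
    """Fuse ranked ID lists with weighted Reciprocal Rank Fusion.

    score(d) = sum over retrievers i of  w_i / (k + rank_i(d)),
    where rank_i(d) starts at 1 and a missing document contributes 0.
    """
    scores = defaultdict(float)
    for name, ranked_ids in rankings.items():
        w = weights.get(name, 1.0)  # hypothetical default weight
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] += w / (k + rank)
    # Highest score first; ties broken by ID so the order is deterministic.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical result lists from a lexical (FTS) and a vector retriever.
fused = weighted_rrf(
    {"fts": ["q1", "q2", "q3"], "vector": ["q3", "q1", "q4"]},
    {"fts": 1.0, "vector": 2.0},
)
fused_ids = [doc_id for doc_id, _ in fused]
```

Because RRF uses ranks rather than raw scores, the FTS and cosine-similarity scales never need to be normalized against each other; the weights only express how much each retriever's ordering is trusted.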
File summaries
| File | Description |
|---|---|
| tests/test_source_connector.py | Adds SourceConnector routing/error-boundary/parameter-forwarding tests |
| tests/test_knowledge_source.py | Adds DirConnector + SourceRegistry behavior and edge-case tests |
| tests/conftest.py | Adds pytest config/fixtures and ensures shared/ is importable in direct test runs |
| pytest.ini | Configures pytest discovery and base temp directory |
| shared/exam_memory/source_connector.py | Adds unified input adapter (text/file/chunks/source registry) |
| shared/exam_memory/knowledge_source.py | Adds KnowledgeSource protocol + local dir connector implementation |
| shared/exam_memory/source_registry.py | Adds mount/unmount/list/fetch lifecycle manager for knowledge sources |
| shared/exam_memory/sources.yaml | Declares mountable knowledge sources (DirConnector-based) |
| shared/exam_memory/server.py | Extends MCP server with hybrid retrieval and source mount/fetch tools |
| shared/exam_memory/vector_store.py | Updates numpy vector store to reuse shared frontmatter parsing and add canonical keys |
| shared/exam_memory/frontmatter.py | Introduces shared YAML frontmatter parsing + body extraction |
| shared/exam_memory/chunking.py | Adds paragraph-preserving chunking utilities (OneFind-aligned sizing) |
| shared/exam_memory/fts_store.py | Adds SQLite FTS5 lexical indexing/search layer |
| shared/exam_memory/hybrid_search.py | Adds weighted RRF fusion across FTS and vector search |
| shared/exam_memory/embedding.py | Adds embedder pooling verification + logging and OneFind alignment notes |
| shared/exam_memory/rebuild_index.py | Adds CLI to rebuild vector + FTS indexes |
| shared/exam_memory/pyproject.toml | Defines exam-memory package metadata and optional dependency groups |
| shared/exam_memory/__init__.py | Exposes exam-memory V2 public API + version |
| shared/exam_memory/bank/README.md | Adds generated bank index README scaffold |
| shared/exam_memory/vectorstore/.gitkeep | Ensures vectorstore directory exists in repo |
| shared/cheatsheets/llm_core_cheatsheet.md | Adds shared LLM core quick reference |
| shared/cheatsheets/sft_lora_miniproject.md | Adds minimal LoRA/SFT mini-project guide |
| shared/cheatsheets/agent_project_pitch.md | Adds interview pitch framing notes |
| shared/cheatsheets/.gitkeep | Keeps shared cheatsheets directory |
| targets/exam_config_template.md | Adds template for target exam configuration |
| targets/ai-lab/exam_config.md | Adds AI Lab exam config (counts/points/timing/knowledge sources) |
| targets/ai-lab/sources/source_index.md | Adds curated source index with usage notes |
| targets/ai-lab/prompts/mock_exam_prompt.md | Adds target prompt for mock exam generation |
| targets/ai-lab/prompts/daily_review_prompt.md | Adds daily review prompt template |
| targets/ai-lab/cheatsheets/math_fundamentals.md | Adds math fundamentals quick reference |
| targets/ai-lab/cheatsheets/gnn_diffusion_cheatsheet.md | Adds GNN/diffusion quick reference |
| targets/ai-lab/cheatsheets/ai_lab_context.md | Adds org/context briefing notes |
| targets/pdd-algo/exam_config.md | Adds PDD target exam config scaffold |
| targets/pdd-algo/topic_checklist.md | Adds algorithm topic checklist for the PDD target |
| targets/pdd-algo/practice/sliding_window.py | Adds a sliding-window practice implementation |
| targets/pdd-algo/practice/bfs_grid.py | Adds a BFS grid shortest-path practice implementation |
| targets/pdd-algo/practice/.gitkeep | Keeps practice directory |
| targets/pdd-algo/cheatsheets/.gitkeep | Keeps cheatsheets directory |
| START_HERE.md | Updates bootstrap instructions to new shared/ + targets/ layout |
| README.md | Updates repo naming and documents new directory structure/target workflow |
| README_CN.md | Same as README.md (Chinese) |
| prompts/new_session_prompt.md | Updates session prompt to new target-aware layout |
| skills/init-guide.md | Updates onboarding to write target config and new paths |
| skills/review-tracker.md | Updates tracker data sources to target-aware paths |
| skills/solve-skeleton/SKILL.md | Updates references to target-scoped mistake logs |
| skills/solve-skeleton/references/exam-patterns.md | Generalizes patterns beyond a single lab |
| skills/solve-analyze/SKILL.md | Updates MCP tool names and mistake-log paths |
| skills/solve-analyze/references/root-cause-tags.md | Updates tag alignment to target-scoped mistake logs |
| skills/solve-analyze/references/comparison-template.md | Renames mastery level usage (partial→struggling) and updates MCP naming |
| skills/algo-annotation.md | Updates mistake log path reference to target-scoped location |
| skills/exam-assistant.md | Updates MCP tool names to full-prefixed forms |
| skills/choice-q-drill.md | Updates scoring config sourcing and target-scoped paths |
| .gitignore | Updates ignores for new layout and adds patterns for temp/dev-only artifacts |
| llm/transformer-review.md | Deletes legacy cheatsheet file (moved/replaced by shared cheatsheets) |
| llm/transformer-forward-pass.md | Deletes legacy cheatsheet file (moved/replaced by shared cheatsheets) |
| exam_memory/rebuild_index.py | Removes legacy V1 CLI (migrated to shared/exam_memory) |
| exam_memory/pyproject.toml | Removes legacy V1 packaging (migrated to shared/exam_memory) |
| daily/README.md | Removes legacy daily directory README (daily now under shared/ and targets/) |
| algorithms/solutions/example_ring_substring.md | Removes legacy example solution (content reorganized under targets/) |
| algorithms/mock_exam_log.md | Removes legacy mock log (now target-scoped) |
| algorithms/mistake_log.md | Removes legacy mistake log (now target-scoped) |
Copilot's findings
- Files reviewed: 47/67 changed files
- Comments generated: 3
Code under review ([project.scripts] entry point):

    [project.scripts]
    exam-memory = "exam_memory.server:main"

Code under review (_looks_like_path helper):

    def _looks_like_path(s: str) -> bool:
        """Heuristically decide whether a string looks like a file path.

        Conservative strategy: treat it as a path only if it has a known
        file extension or the path actually exists. This avoids misclassifying
        Chinese text that contains "/", such as "算法/数据结构"
        ("algorithms/data structures").
        """
        if not s:
            return False
        p = Path(s)
        _KNOWN_EXTS = {".md", ".txt", ".py", ".json", ".yaml", ".yml", ".csv", ".html"}
        if p.suffix.lower() in _KNOWN_EXTS:
            return True
        if p.exists():
            return True
        return False

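The heuristic above calls the filesystem on arbitrary input, so it may be worth checking how it behaves on long free text. The following is a hedged sketch of a defensive variant, not the PR's code: the name `looks_like_path_safe` and the choice to treat filesystem errors as "not a path" are assumptions for illustration.

```python
from pathlib import Path

# Same extension set as the snippet under review.
KNOWN_EXTS = {".md", ".txt", ".py", ".json", ".yaml", ".yml", ".csv", ".html"}


def looks_like_path_safe(s: str) -> bool:
    """Hypothetical variant of the path heuristic that never raises."""
    if not s:
        return False
    p = Path(s)
    if p.suffix.lower() in KNOWN_EXTS:
        return True
    try:
        # Depending on the Python version and platform, exists() may raise
        # for over-long names or embedded NUL bytes; treat that as "no".
        return p.exists()
    except (OSError, ValueError):
        return False


checks = {
    "empty": looks_like_path_safe(""),
    "known_ext": looks_like_path_safe("notes/review.md"),
    "long_text": looks_like_path_safe("x" * 5000),
}
```

Note that the extension check still accepts any string ending in a known suffix, whether or not the file exists; that is the behavior the docstring describes, and this sketch leaves it unchanged.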
Code under review (knowledge source declaration):

    sources:
      - name: "pdd-algo-notes"
        type: "local_dir"
        config:
          path: "targets/pdd-algo/cheatsheets/"
          glob: "*.md"

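Since the declared `path` is relative, what it points to depends on what it is resolved against. Below is a minimal sketch, assuming the intent is to resolve relative paths against an explicit base directory (for example the repository root) rather than the process working directory. The helpers `resolve_source_path` and `list_source_files` are hypothetical and are not part of the PR's DirConnector.

```python
import tempfile
from pathlib import Path


def resolve_source_path(raw_path: str, base_dir: Path) -> Path:
    """Resolve a configured path; relative paths are anchored at base_dir."""
    p = Path(raw_path).expanduser()
    if not p.is_absolute():
        p = base_dir / p
    return p.resolve()


def list_source_files(entry: dict, base_dir: Path) -> list:
    """Return files matching the entry's glob, or [] if the dir is missing."""
    cfg = entry["config"]
    root = resolve_source_path(cfg["path"], base_dir)
    if not root.is_dir():
        return []
    return sorted(p for p in root.glob(cfg.get("glob", "*")) if p.is_file())


# Dict mirroring the YAML entry above (parsed form assumed for illustration).
entry = {
    "name": "pdd-algo-notes",
    "type": "local_dir",
    "config": {"path": "targets/pdd-algo/cheatsheets/", "glob": "*.md"},
}

with tempfile.TemporaryDirectory() as tmp:
    repo_root = Path(tmp)
    notes = repo_root / "targets" / "pdd-algo" / "cheatsheets"
    notes.mkdir(parents=True)
    (notes / "a.md").write_text("# notes", encoding="utf-8")
    (notes / ".gitkeep").write_text("", encoding="utf-8")
    found = [p.name for p in list_source_files(entry, repo_root)]
    missing = list_source_files(entry, repo_root / "elsewhere")
```

Passing the base directory explicitly keeps the result the same whether the server is launched from the repository root, from shared/exam_memory/, or by a test runner.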
Summary
- bge-m3 semantic retrieval (CPU-only, Python 3.14 compatible)
- fts_store + RRF-based hybrid_search integration
- ReviewGate dedup + full test coverage
- exam_memory/ → shared/exam_memory/, organize targets, cheatsheets, and skill references
- .gitignore
Test coverage
Key changes
67 files changed, 4185 insertions(+), 1263 deletions(-)