fix(rag): vectorize chapters from epub toc structure #430

Open
codedogQBY wants to merge 1 commit into main from codex/fix-rag-chapter-structure

@codedogQBY

Owner

Summary

  • Add a reusable RAG helper that maps EPUB TOC leaves to logical chapter section ranges, using all TOC nodes as boundaries.
  • Update desktop book extraction to vectorize logical TOC chapters instead of treating every spine section as a chapter.
  • Return a human-readable number from ragToc while keeping index as the value to pass into chapter tools.
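
The distinction between number and index in the last bullet can be illustrated with a minimal sketch. The names `LogicalChapterRef`, `RagTocEntry`, and `toRagToc` are hypothetical, and the 1-based numbering is an assumption; the actual shape returned by ragToc may differ.

```typescript
// Hypothetical shapes for illustration only; not the actual ragToc types.
interface LogicalChapterRef {
  index: number; // identifier the chapter tools expect
  title: string;
}

interface RagTocEntry {
  number: number; // human-readable position (assumed 1-based here)
  index: number;  // value to pass into chapter tools, left untouched
  title: string;
}

// Adds a display number to each chapter without altering the tool-facing index.
function toRagToc(chapters: LogicalChapterRef[]): RagTocEntry[] {
  return chapters.map((chapter, position) => ({
    number: position + 1,
    index: chapter.index,
    title: chapter.title,
  }));
}
```

Under these assumptions, a model can refer to "chapter 3" when talking to the reader while still passing the unchanged index to the chapter tools.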

Analysis

Before this change, desktop vectorization iterated book.sections and created one ChapterData per spine section. As a result, front matter, volume title pages, split XHTML files, and other non-chapter sections could become AI-visible “chapters”. For EPUBs with multi-volume TOCs, asking about a numbered chapter could lead the model to inspect the wrong section, or to rely on a fallback Section N title, instead of following the actual book structure.

The new grouping follows the EPUB TOC: leaf TOC entries become logical chapters, parent entries such as volume nodes are used as boundaries, and multiple spine sections between chapter anchors are merged into one vectorized chapter. If an EPUB has no usable TOC anchors, extraction falls back to the previous one-section-per-entry behavior.
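
The grouping described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the code in chapter-structure.ts: the types `TocNode` and `LogicalChapter`, the function `groupSectionsByToc`, and the assumption that each TOC entry has already been resolved to a spine section index are all hypothetical.

```typescript
// Hypothetical types; the real helper's shapes may differ.
interface TocNode {
  label: string;
  sectionIndex: number | null; // spine index the TOC href resolves to, or null if unresolved
  children: TocNode[];
}

interface LogicalChapter {
  title: string;
  startSection: number; // inclusive spine index
  endSection: number;   // exclusive spine index
}

function groupSectionsByToc(toc: TocNode[], sectionCount: number): LogicalChapter[] {
  // Flatten the TOC in reading order, keeping only entries with a usable anchor.
  const flat: { label: string; sectionIndex: number; isLeaf: boolean }[] = [];
  const walk = (nodes: TocNode[]): void => {
    for (const node of nodes) {
      const idx = node.sectionIndex;
      if (idx !== null && idx >= 0 && idx < sectionCount) {
        flat.push({ label: node.label, sectionIndex: idx, isLeaf: node.children.length === 0 });
      }
      walk(node.children);
    }
  };
  walk(toc);

  // Every anchored TOC node, leaf or parent, acts as a boundary.
  const boundaries = Array.from(new Set(flat.map((e) => e.sectionIndex))).sort((a, b) => a - b);

  // Only leaves become chapters; each one runs until the next boundary.
  const chapters: LogicalChapter[] = [];
  const seenStarts = new Set<number>();
  for (const entry of flat) {
    // Simplification: a second leaf anchored in the same section is skipped.
    if (!entry.isLeaf || seenStarts.has(entry.sectionIndex)) continue;
    seenStarts.add(entry.sectionIndex);
    const next = boundaries.find((b) => b > entry.sectionIndex);
    chapters.push({
      title: entry.label,
      startSection: entry.sectionIndex,
      endSection: next ?? sectionCount,
    });
  }

  // Fallback: no usable TOC anchors, so keep one entry per spine section.
  if (chapters.length === 0) {
    return Array.from({ length: sectionCount }, (_, i) => ({
      title: `Section ${i + 1}`,
      startSection: i,
      endSection: i + 1,
    }));
  }
  return chapters;
}
```

In this sketch, a volume node anchored at a title page ends the preceding chapter but never becomes a chapter itself, and sections before the first leaf anchor (such as front matter) are left out of the result.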

Fixes #411

Verification

  • git diff --check
  • pnpm --filter @readany/core test -- chapter-structure tools
  • pnpm --filter @readany/core exec tsc --noEmit
  • pnpm --filter app exec tsc --noEmit
  • pnpm exec biome check packages/core/src/rag/chapter-structure.ts packages/core/src/rag/chapter-structure.test.ts packages/core/src/rag/index.ts packages/core/src/ai/tools/rag-tools.ts packages/core/src/ai/__tests__/tools.test.ts packages/app/src/lib/rag/book-extractor.ts (passes with existing noExplicitAny warnings in legacy test/extractor code)

codedogQBY added the bug, priority:p1, area:ai, and area:reader labels on Jun 12, 2026

Labels

  • area:ai (AI, model configuration, vectorization, citations, prompts)
  • area:reader (Reader, pagination, scrolling, layout, TOC)
  • bug (Something isn't working)
  • priority:p1 (High: important feature broken or major platform/workflow regression)

Development

Successfully merging this pull request may close these issues.

[Bug] When asking questions about a book, chapters cannot be correctly understood or located