fix(mobile): pass PDF format into vectorization extractor by codedogQBY · Pull Request #451 · codedogQBY/ReadAny

codedogQBY · 2026-06-13T05:11:19Z

Analysis

Issue #301 reports that small EPUB files can be vectorized, but searchable/text-layer PDFs cannot. The mobile vectorization flow extracts chapters through the hidden reader WebView before handing text to the core vectorization pipeline.

The manual mobile vectorization queue always called the extractor as if the file were an EPUB:

useVectorizationQueue passed application/epub+zip for every book format.
ExtractorWebView always sent fileName: "book.epub" to the reader asset.

That means a PDF selected for vectorization could enter the hidden reader/extractor with EPUB identity instead of PDF identity. The reader can rely on both MIME and filename extension for format-specific loading paths, so PDF chapter extraction could fail before vectorization ever receives text.
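To illustrate the failure mode described above, here is a minimal sketch of a reader that routes on MIME and filename extension. It is not taken from the ReadAny reader asset: pickLoader and the Loader type are hypothetical names, and the real detection logic may differ.

```typescript
// Hypothetical loader routing; the real reader asset may work differently.
type Loader = "pdf" | "epub";

// Choose a loader from the declared identity only (MIME type and filename),
// without inspecting the file bytes.
function pickLoader(mimeType: string, fileName: string): Loader {
  const isPdf =
    mimeType === "application/pdf" || fileName.toLowerCase().endsWith(".pdf");
  return isPdf ? "pdf" : "epub";
}

// A PDF sent with the hard-coded EPUB identity: both signals say EPUB.
const withEpubIdentity = pickLoader("application/epub+zip", "book.epub");

// The same PDF sent with PDF identity on both signals.
const withPdfIdentity = pickLoader("application/pdf", "book.pdf");
```

Under these assumptions, withEpubIdentity selects the EPUB loader even though the bytes are a PDF, which matches the symptom reported in #301.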

Changes

Add mobile vectorization MIME mapping by book format in useVectorizationQueue.
Pass application/pdf for PDF books instead of hard-coded EPUB MIME.
Derive the hidden extractor filename from MIME in ExtractorWebView, e.g. book.pdf for application/pdf.
Keep existing EPUB behavior as the fallback for unknown formats.
Remove an existing non-null assertion on the queue while touching the file so the targeted Biome check stays clean.
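The mapping and filename derivation listed above could look roughly like the sketch below. The helper names (mimeForFormat, extractorFileName) and the format keys are assumptions for illustration, not the identifiers used in useVectorizationQueue or ExtractorWebView.

```typescript
// Hypothetical helpers; names and format keys are illustrative only.
const EPUB_MIME = "application/epub+zip";
const PDF_MIME = "application/pdf";

// Known book formats mapped to the MIME type passed to the extractor.
const MIME_BY_FORMAT = new Map<string, string>([
  ["epub", EPUB_MIME],
  ["pdf", PDF_MIME],
]);

// Unknown or missing formats fall back to EPUB, keeping the previous behavior.
function mimeForFormat(format?: string | null): string {
  const key = (format ?? "").toLowerCase();
  return MIME_BY_FORMAT.get(key) ?? EPUB_MIME;
}

// Derive the hidden extractor filename from the MIME type.
function extractorFileName(mimeType: string): string {
  return mimeType === PDF_MIME ? "book.pdf" : "book.epub";
}
```

Deriving the filename from the MIME type keeps the two signals consistent, so the reader never sees a PDF MIME paired with an .epub extension.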

Scope Notes

Desktop already has an explicit PDF text extraction path in packages/app/src/lib/rag/book-extractor.ts; this PR targets the mobile hidden-reader extraction path used by manual vectorization and fallback content extraction. The broader large-file/mobile memory constraints are covered by the separate large-file PR work, while this PR fixes the format-routing bug that blocks text-layer PDFs from reaching the vectorization pipeline.

Verification

pnpm --filter @readany/app-expo exec tsc --noEmit
pnpm exec biome check packages/app-expo/src/components/rag/ExtractorWebView.tsx packages/app-expo/src/screens/library/useVectorizationQueue.ts
git diff --check

Fixes #301

fix(mobile): pass book format into vectorization extractor

5b51ccb

codedogQBY wants to merge 1 commit into main from codex/fix-mobile-pdf-vectorization-extraction
