feat: send PDFs as native document attachments#154
Closed
pranavp311 wants to merge 1 commit into
Closed
Conversation
Co-Authored-By: blackfloofie-a codegraff agent <265516171+blackfloofie@users.noreply.github.com>
8e2f8ef to
8c783af
Compare
Contributor
Author
|
Closing this PR: the PDF provider-shape changes are no longer wanted and should not be reviewed or merged. Co-Authored-By: blackfloofie-a codegraff agent 265516171+blackfloofie@users.noreply.github.com |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Pivot PDF handling from local text extraction/OCR to native provider document blocks, removing liteParse/PDFium/Tesseract while preserving PDF attachment support in harness, codegraff TUI, and graff REPL flows.
Context
The previous approach extracted PDF text locally through liteParse, which introduced heavyweight native/network-fetching dependencies through vendored PDFium and Tesseract OCR. The underlying bug was that PDFs were carried as
Content::Imagewithapplication/pdf, then serialized as image blocks even though providers only acceptimage/*for image payloads.Changes
forge_pdfcrate and the liteParse-based local PDF extraction path..pdfattachments asapplication/pdfand returned fs_read PDFs as byte carriers instead of extracted text./image <path>, direct path paste, and drag/drop can attach PDFs as document attachments.Key Implementation Details
PDFs intentionally reuse the existing image byte carrier because it already stores MIME type. Provider serialization branches on
application/pdf: supported providers emit document/file blocks, while regular images continue through image serialization. This avoids threading a new exhaustive content variant through the codebase.TUI and REPL behavior is covered through the existing
@[path]attachment syntax:/image <path>and through pasted/dropped file paths.@[path], which then uses the same attachment classification path that now maps.pdftoapplication/pdf.Testing
Validated locally in the PR worktree:
Also verified by search that the active tree has no remaining
liteparse,liteParse,pdfium,tesseract, orforge_pdfreferences.Known Limitations
input_file.