feat: send PDFs as native document attachments #154

Closed
pranavp311 wants to merge 1 commit into justrach:main from pranavp311:feat/pdf-attachment-parsing

pranavp311 commented Jun 2, 2026

Summary

Pivot PDF handling from local text extraction/OCR to native provider document blocks, removing liteParse/PDFium/Tesseract while preserving PDF attachment support in harness, codegraff TUI, and graff REPL flows.

Context

The previous approach extracted PDF text locally through liteParse, which introduced heavyweight native/network-fetching dependencies through vendored PDFium and Tesseract OCR. The underlying bug was that PDFs were carried as Content::Image with application/pdf, then serialized as image blocks even though providers only accept image/* for image payloads.

Changes

  • Removed the forge_pdf crate and the liteParse-based local PDF extraction path.
  • Kept PDFs on the existing byte-carrier path and added PDF detection/capability metadata.
  • Classified .pdf attachments as application/pdf and returned fs_read PDFs as byte carriers instead of extracted text.
  • Serialized PDFs as native document/file parts for supported providers.
  • Added explicit unsupported-PDF errors for provider paths that do not currently support native PDF document blocks.
  • Updated codegraff TUI attachment handling so /image <path>, direct path paste, and drag/drop can attach PDFs as document attachments.
  • Left GUI handling out of scope for a separate PR.
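
The classification step in the list above can be sketched as follows. This is a minimal illustration, not the PR's code: `classify_attachment` and `is_pdf_path` are hypothetical names, and the set of image extensions is an assumption made for the example. Only the `.pdf` to `application/pdf` mapping comes from the description.

```rust
use std::path::Path;

/// MIME type used for PDF attachments, as named in the PR description.
pub const PDF_MIME: &str = "application/pdf";

/// Hypothetical helper: map a file extension to a MIME type.
/// Returns `None` for extensions this sketch does not recognise.
pub fn classify_attachment(path: &Path) -> Option<&'static str> {
    // Compare case-insensitively so `REPORT.PDF` and `report.pdf` match.
    let ext = path.extension()?.to_str()?.to_ascii_lowercase();
    match ext.as_str() {
        "pdf" => Some(PDF_MIME),
        // The image mappings below are assumed for illustration only.
        "png" => Some("image/png"),
        "jpg" | "jpeg" => Some("image/jpeg"),
        "gif" => Some("image/gif"),
        "webp" => Some("image/webp"),
        _ => None,
    }
}

/// True when the path classifies as a PDF document rather than an image.
pub fn is_pdf_path(path: &Path) -> bool {
    classify_attachment(path) == Some(PDF_MIME)
}
```

Keeping classification separate from serialization means every entry point that ends up with a file path can share one mapping.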

Key Implementation Details

PDFs intentionally reuse the existing image byte carrier because it already stores MIME type. Provider serialization branches on application/pdf: supported providers emit document/file blocks, while regular images continue through image serialization. This avoids threading a new exhaustive content variant through the codebase.
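
A rough sketch of the MIME-type branch described above, under stated assumptions: `ByteAttachment`, `ProviderPart`, `SerializeError`, and `serialize_attachment` are hypothetical stand-ins, not the types in the codebase, and the `supports_pdf` flag stands in for whatever capability metadata the provider layer exposes.

```rust
/// Hypothetical stand-in for the existing byte carrier (the description
/// says PDFs ride on the image carrier because it already stores a MIME type).
#[derive(Debug, Clone, PartialEq)]
pub struct ByteAttachment {
    pub mime_type: String,
    pub base64_data: String,
}

/// Hypothetical provider-facing part; real provider payloads differ in shape.
#[derive(Debug, Clone, PartialEq)]
pub enum ProviderPart {
    Image { media_type: String, data: String },
    Document { media_type: String, data: String },
}

/// Hypothetical error for provider paths without native PDF support.
#[derive(Debug, Clone, PartialEq)]
pub enum SerializeError {
    PdfUnsupported { provider: String },
}

/// Branch on the MIME type: PDFs become document parts on supporting
/// providers, fail explicitly elsewhere, and images are left untouched.
pub fn serialize_attachment(
    att: &ByteAttachment,
    provider: &str,
    supports_pdf: bool,
) -> Result<ProviderPart, SerializeError> {
    if att.mime_type.eq_ignore_ascii_case("application/pdf") {
        if supports_pdf {
            Ok(ProviderPart::Document {
                media_type: att.mime_type.clone(),
                data: att.base64_data.clone(),
            })
        } else {
            // Fail loudly instead of emitting an image block with a non-image type.
            Err(SerializeError::PdfUnsupported { provider: provider.to_string() })
        }
    } else {
        Ok(ProviderPart::Image {
            media_type: att.mime_type.clone(),
            data: att.base64_data.clone(),
        })
    }
}
```

The explicit error arm is the part that addresses the original bug: a PDF can no longer fall through to image serialization on any provider path.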

TUI and REPL behavior is covered through the existing @[path] attachment syntax:

  • codegraff TUI accepts PDFs through /image <path> and through pasted/dropped file paths.
  • graff REPL drag/drop and direct path paste are already normalized into @[path], which then uses the same attachment classification path that now maps .pdf to application/pdf.
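
The normalization into `@[path]` might look roughly like the sketch below. It is an assumption-laden illustration: `normalize_pasted_path` and the extension list are hypothetical, and the real REPL logic (for example how it handles escaped spaces or checks that the file exists) is not shown in this PR description.

```rust
use std::path::Path;

/// Extensions this sketch treats as attachable; the list is assumed.
const ATTACHABLE_EXTENSIONS: [&str; 5] = ["pdf", "png", "jpg", "jpeg", "webp"];

/// Hypothetical helper: turn a pasted or dropped file path into the
/// `@[path]` attachment syntax, or return `None` for ordinary text.
pub fn normalize_pasted_path(input: &str) -> Option<String> {
    // Terminals often append a newline or wrap dropped paths in quotes.
    let trimmed = input.trim();
    let unquoted = trimmed.trim_matches(|c| c == '\'' || c == '"');
    if unquoted.is_empty() || unquoted.contains('\n') {
        return None;
    }
    // Only wrap inputs whose extension marks them as attachable.
    let ext = Path::new(unquoted).extension()?.to_str()?.to_ascii_lowercase();
    if ATTACHABLE_EXTENSIONS.contains(&ext.as_str()) {
        Some(format!("@[{unquoted}]"))
    } else {
        None
    }
}
```

Once the input is in `@[path]` form, the shared attachment classification decides whether the bytes are treated as an image or as a PDF document.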

Testing

Validated locally in the PR worktree:

cargo test -p forge_domain
cargo test -p forge_services
cargo test -p forge_repo
cargo test -p codegraff image_command_accepts_pdf_as_document_attachment
cargo test -p forge_main wrap_pasted_text
cargo check

Also verified by search that the active tree has no remaining liteparse, liteParse, pdfium, tesseract, or forge_pdf references.

Known Limitations

  • Provider gating is provider-level, not per-model; text-only models on supported providers may return provider API errors.
  • The OpenAI Responses path still rejects PDFs; support could be added later via input_file.
  • Snapshot runner was unavailable locally, so snapshot verification used the standard affected-crate test fallback.
  • Existing unrelated branch issues remain outside this PR: forge_app unit-test breakage and forge_sdk_python stale interruption reason references.
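
To make the first limitation concrete, here is a small sketch of provider-level gating. `PdfCapabilities` and the provider and model identifiers are hypothetical placeholders; the PR description does not list which providers are marked as supporting native PDFs.

```rust
use std::collections::HashMap;

/// Hypothetical capability table keyed by provider id only.
#[derive(Debug, Default)]
pub struct PdfCapabilities {
    by_provider: HashMap<String, bool>,
}

impl PdfCapabilities {
    pub fn new() -> Self {
        Self::default()
    }

    /// Record whether a provider accepts native PDF document parts.
    pub fn with_provider(mut self, provider_id: &str, native_pdf: bool) -> Self {
        self.by_provider.insert(provider_id.to_string(), native_pdf);
        self
    }

    /// Provider-level gate: the model id is deliberately ignored, so a
    /// text-only model on a supporting provider still passes this check
    /// and any rejection comes from the provider API instead.
    pub fn allows_pdf(&self, provider_id: &str, _model_id: &str) -> bool {
        // Unknown providers default to "unsupported".
        self.by_provider.get(provider_id).copied().unwrap_or(false)
    }
}
```

A per-model gate would need the lookup to include the model id, which is the refinement this limitation leaves open.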

github-actions (bot) added the "type: feature" label (Brand new functionality, features, pages, workflows, endpoints, etc.) Jun 2, 2026
pranavp311 changed the title from "feat: parse PDF attachments as read-tool text input" to "feat: send PDFs as native document attachments" Jun 2, 2026
github-actions (bot) added the "type: docs" label (Related to documentation and information.) Jun 2, 2026
Co-Authored-By: blackfloofie-a codegraff agent <265516171+blackfloofie@users.noreply.github.com>
pranavp311 force-pushed the feat/pdf-attachment-parsing branch from 8e2f8ef to 8c783af June 2, 2026 10:56
@pranavp311

Closing this PR: the PDF provider-shape changes are no longer wanted and should not be reviewed or merged.

Co-Authored-By: blackfloofie-a codegraff agent <265516171+blackfloofie@users.noreply.github.com>

pranavp311 closed this Jun 2, 2026
pranavp311 deleted the feat/pdf-attachment-parsing branch June 2, 2026 13:34