fix: improve error handling and status updates for document processing and add OCR requirement check for image files by ricofurtado · Pull Request #1881 · langflow-ai/openrag

ricofurtado · 2026-06-15T19:31:37Z

Supported File Types by Ingestion Path

There are currently two different supported file type lists depending on how content is ingested into OpenRAG.

Direct Upload (Frontend File Picker)

Defined in knowledge-dropdown.tsx (lines 60–76).

| Category | Extensions |
| --- | --- |
| Documents | `.txt`, `.md`, `.pdf`, `.docx`, `.html`, `.htm`, `.adoc`, `.asciidoc`, `.asc` |
| Spreadsheets | `.csv`, `.xlsx` |
| Presentations | `.pptx` |

Connector Ingestion (Backend Validator)

Defined in processors.py (lines 645–668).

Includes all file types supported by Direct Upload, plus additional Office formats, XHTML, and image formats.

| Category | Extensions |
| --- | --- |
| Documents | `.txt`, `.md`, `.pdf`, `.docx`, `.html`, `.htm`, `.adoc`, `.asciidoc`, `.asc` |
| Additional Office Documents | `.dotx`, `.dotm`, `.docm` |
| Spreadsheets | `.csv`, `.xlsx`, `.xls` |
| Presentations | `.pptx`, `.potx`, `.ppsx`, `.pptm`, `.potm`, `.ppsm` |
| Web Documents | `.xhtml` |
| Images | `.jpg`, `.jpeg`, `.png`, `.tiff`, `.bmp`, `.webp` |

Root Cause of the Reported Defect

The image formats (.jpg, .jpeg, .png, .tiff, .bmp, .webp) are the source of the reported issue.

What Happens

1. The connector ingestion validator accepts image files based on extension.
2. The files are sent through the ingestion pipeline.
3. Images only produce extractable text when Docling OCR is enabled.
4. When OCR is disabled, no text content is extracted.
5. No chunks are generated for indexing.
6. The updated ingestion logic now correctly marks these files as Failed rather than Active.
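
The sequence above can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: `IngestionResult`, `resolve_status`, the `ocr_enabled` flag, the status strings, and the error messages are all hypothetical names chosen for the example. Only the image extension list comes from the table above.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional

# Image extensions accepted by the connector validator (from the table above).
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".tiff", ".bmp", ".webp"}


@dataclass
class IngestionResult:
    """Hypothetical result type; the real status model may differ."""
    status: str                  # "active" or "failed"
    error: Optional[str] = None  # human-readable reason when status is "failed"


def resolve_status(filename: str, chunk_count: int, ocr_enabled: bool) -> IngestionResult:
    """Derive the final status of a file from how many chunks were indexed."""
    # At least one chunk was indexed, so the file is searchable.
    if chunk_count > 0:
        return IngestionResult(status="active")

    # Zero chunks: never report success. Give image files a specific hint,
    # since they only yield text when OCR is turned on.
    suffix = Path(filename).suffix.lower()
    if suffix in IMAGE_EXTENSIONS and not ocr_enabled:
        return IngestionResult(
            status="failed",
            error=f"{filename}: image files require OCR to extract text; enable OCR and retry.",
        )

    return IngestionResult(
        status="failed",
        error=f"{filename}: no text chunks were produced, so nothing was indexed.",
    )
```

The point of the sketch is that the status is derived from the indexing outcome (the chunk count) rather than from the pipeline merely finishing without an exception.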

Existing Frontend Limitation

The frontend intentionally excludes several file types that are accepted by connector ingestion.

A TODO comment in knowledge-dropdown.tsx (line 57) states:

> Re-add other MIME/extension groups (images, xlsx/xls/ppt, rtf/odt, etc.) once ingestion is verified end-to-end in Langflow.

This indicates that support for images and additional document types was intentionally withheld from the file picker until end-to-end ingestion behavior was fully validated.

Gap Between Frontend and Connector Validation

| Area | Images Allowed? |
| --- | --- |
| Frontend File Picker | ❌ No |
| Connector Ingestion Validator | ✅ Yes |

This inconsistency allows image files to be ingested through connectors while bypassing the frontend restrictions, creating the gap that exposed the bug.
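
As a quick way to see the gap, the two lists from the tables above can be compared as sets. The variable names below are illustrative only and are not the identifiers used in knowledge-dropdown.tsx or processors.py.

```python
# Extensions offered by the frontend file picker (Direct Upload table).
FRONTEND_EXTENSIONS = {
    ".txt", ".md", ".pdf", ".docx", ".html", ".htm",
    ".adoc", ".asciidoc", ".asc",
    ".csv", ".xlsx",
    ".pptx",
}

# Image extensions accepted only on the connector path.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".tiff", ".bmp", ".webp"}

# Extensions accepted by the connector validator (Connector Ingestion table).
CONNECTOR_EXTENSIONS = FRONTEND_EXTENSIONS | IMAGE_EXTENSIONS | {
    ".dotx", ".dotm", ".docm",
    ".xls",
    ".potx", ".ppsx", ".pptm", ".potm", ".ppsm",
    ".xhtml",
}

# Everything a connector can ingest that the file picker never offers.
connector_only = sorted(CONNECTOR_EXTENSIONS - FRONTEND_EXTENSIONS)

if __name__ == "__main__":
    print(connector_only)
```

A check like this could, as one option, back a test that flags any future drift between the two lists.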

Summary by CodeRabbit

Bug Fixes

Improved validation of document processing results to ensure failures are correctly detected and reported when content is not properly indexed
Enhanced error messages for image files to provide specific guidance when OCR processing is required for content extraction


coderabbitai · 2026-06-15T19:31:48Z

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 2589f73b-0035-4332-8dc7-fa1048d1b7a1


fix: improve error handling and status updates for document processin…

960b02a

…g and add OCR requirement check for image files

github-actions Bot added the backend 🔷 Issues related to backend services (OpenSearch, Langflow, APIs) label Jun 15, 2026

style: ruff autofix (auto)

795337d

ricofurtado requested a review from lucaseduoli June 15, 2026 19:32

github-actions Bot added bug 🔴 Something isn't working. tests and removed bug 🔴 Something isn't working. labels Jun 15, 2026

ricofurtado force-pushed the cpd-files-with-unsupported-extensions-are-incorrectly-displayed-as-Active-Completed-during-connector-ingestion branch from 5b36bb9 to 795337d Compare June 15, 2026 22:19

github-actions Bot added bug 🔴 Something isn't working. and removed bug 🔴 Something isn't working. labels Jun 15, 2026

ricofurtado wants to merge 2 commits into release-cpd from cpd-files-with-unsupported-extensions-are-incorrectly-displayed-as-Active-Completed-during-connector-ingestion
