
Create Corpus: skip empty documents #1130

Open
leskovecg wants to merge 1 commit into biolab:master from leskovecg:fix-skip-empty-documents


@leskovecg

Collaborator

Issue

Fixes #1104: the Create Corpus widget should not output completely empty documents.

Description of changes

This PR updates the commit() method of the Create Corpus widget to filter out documents whose text is empty or contains only whitespace (e.g. "" or "   "). Only documents with non-empty text are sent to the output.

If no non-empty documents remain, the widget outputs None instead of an empty corpus.

Changes

  • commit() now skips documents with empty or whitespace-only text
  • The fallback title ("?") is still used for documents without a title
  • The existing test test_output_skips_empty_documents is updated accordingly
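The filtering rules described above can be sketched in plain Python. This is not the widget's actual code: the real commit() builds an Orange Corpus, which is omitted here. Documents are assumed to be (title, text) pairs, and the helper name documents_to_send is hypothetical. A title is assumed to be "missing" when it is None, empty, or whitespace-only.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical stand-in for the widget's documents: (title, text) pairs.
Document = Tuple[Optional[str], str]


def documents_to_send(documents: Sequence[Document]) -> Optional[List[Document]]:
    """Return the non-empty documents to output, or None if none remain.

    Sketch only: the actual widget constructs an Orange Corpus instead of a list.
    """
    kept: List[Document] = []
    for title, text in documents:
        # Texts that are "" or whitespace-only (e.g. "   ", "\n\t") count as empty.
        if not text.strip():
            continue
        # Assumed rule: a None/empty/whitespace title falls back to "?".
        kept.append((title if title and title.strip() else "?", text))
    # Output None rather than an empty collection.
    return kept or None


print(documents_to_send([("A", "hello"), ("B", "   "), ("", "world")]))
# [('A', 'hello'), ('?', 'world')]
print(documents_to_send([("A", ""), ("B", "\n")]))
# None
```

Returning None when nothing survives mirrors the PR's choice: downstream widgets receive no data instead of an empty corpus.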

Includes

  • Code changes
  • Tests
  • Documentation

@leskovecg force-pushed the fix-skip-empty-documents branch from c01e8e8 to 0eea60f on July 14, 2025 08:30

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Create Corpus: do not send out empty data

1 participant