
fix: Document deletion now properly updates the stale knowledge filter source count #1929

Open
Vchen7629 wants to merge 10 commits into main from stale-knowledge-filter-source-count-fix

Vchen7629 commented Jun 18, 2026
Collaborator

Summary

Fixes #1254. Knowledge filters containing specific documents kept showing a stale source count after one of those documents was deleted, and the count became inflated when more sources were added, even after a page refresh. The cause is that the source count was computed from query_data.filters.data_sources, a saved snapshot of filenames that is never pruned when a document is deleted.
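To make the root cause concrete, here is a minimal Python sketch contrasting the old snapshot-based count with a live count. The function names and the idea of passing in the set of still-indexed filenames are illustrative assumptions, not code from this PR.

```python
from collections.abc import Iterable


def stale_source_count(data_sources: list[str]) -> int:
    """Old behavior (sketch): count every filename saved in query_data.filters.data_sources."""
    return len(data_sources)


def active_source_count(data_sources: list[str], indexed_filenames: Iterable[str]) -> int:
    """Fixed behavior (sketch): count distinct saved filenames that still have indexed documents."""
    indexed = set(indexed_filenames)
    return len({name for name in data_sources if name in indexed})


# Hypothetical filter saved with two documents, one of which was later deleted.
saved = ["report.pdf", "notes.md"]
still_indexed = ["notes.md", "unrelated.txt"]
print(stale_source_count(saved))                  # 2, the stale badge value
print(active_source_count(saved, still_indexed))  # 1
```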

Demo (Before)

screen-capture.5.webm

Demo (After)

screen-capture.3.webm

Changes

  • search_knowledge_filters now computes active_source_count per filter by checking which of its data_sources filenames still have indexed documents (one batched OpenSearch query via the admin client, so the count is consistent for every viewer of a shared filter)
  • Added build_existing_filenames_agg_body helper in opensearch_queries.py
  • knowledge-filter-list.tsx badge now uses active_source_count, falling back to the old dataSources.length when the field is absent
  • knowledge-filter-panel.tsx source dropdown now filters its selected sources against the currently-active sources before display, so its count matches too
  • Added unit tests for search_knowledge_filters
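A rough idea of what one batched "which of these filenames still exist" request could look like, written as a hypothetical Python sketch of a build_existing_filenames_agg_body-style helper. It assumes the indexed documents store their source name in a keyword field called filename and uses a standard OpenSearch terms query plus terms aggregation with size 0; the actual field name and body built in opensearch_queries.py may differ.

```python
from typing import Any

# Assumption: source names live in a keyword field called "filename".
FILENAME_FIELD = "filename"


def build_existing_filenames_agg_body(filenames: list[str]) -> dict[str, Any]:
    """Sketch of a request body asking which of `filenames` still have documents."""
    unique = sorted(set(filenames))
    return {
        "size": 0,  # only the aggregation is needed, not the matching hits
        "query": {"terms": {FILENAME_FIELD: unique}},
        "aggs": {
            "existing_filenames": {
                # The query limits matches to `unique`, so len(unique) buckets cover
                # every possible key; terms size must be at least 1.
                "terms": {"field": FILENAME_FIELD, "size": max(len(unique), 1)},
            }
        },
    }


def existing_filenames_from_response(response: dict[str, Any]) -> set[str]:
    """Collect bucket keys, i.e. the filenames that still have indexed documents."""
    aggs = response.get("aggregations", {})
    buckets = aggs.get("existing_filenames", {}).get("buckets", [])
    return {bucket["key"] for bucket in buckets}


body = build_existing_filenames_agg_body(["notes.md", "report.pdf", "notes.md"])
print(body["query"])  # {'terms': {'filename': ['notes.md', 'report.pdf']}}
fake_response = {
    "aggregations": {"existing_filenames": {"buckets": [{"key": "notes.md", "doc_count": 4}]}}
}
print(existing_filenames_from_response(fake_response))  # {'notes.md'}
```

The body would be sent in a single search call through the admin client (for example opensearch-py's client.search(index=..., body=body)). Because a terms aggregation only returns buckets with at least one matching document by default, deleted files simply do not appear in the result.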

Summary by CodeRabbit

  • New Features
    • Filters now show an accurate “active sources” count based on which sources currently have indexed documents, with correct pluralization.
  • Bug Fixes
    • Filter source selections are now sanitized before display, preserving wildcard behavior while removing invalid/unknown selections.
  • Tests
    • Added coverage for active source counting, malformed filter metadata handling, and deduping shared sources into fewer admin lookups.
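The panel sanitization mentioned above (preserve wildcard behavior, drop unknown selections) follows the rule CodeRabbit's walkthrough spells out: keep the selection as-is when the wildcard "*" comes first, otherwise keep only values present in sourceOptions. Below is a Python restatement of that rule for clarity; the real logic lives in knowledge-filter-panel.tsx in TypeScript, and the function name here is hypothetical.

```python
WILDCARD = "*"


def sanitize_selected_sources(selected: list[str], source_options: list[str]) -> list[str]:
    """Keep a wildcard-first selection untouched; otherwise drop sources that are no
    longer offered as options (for example, files whose documents were deleted)."""
    if selected and selected[0] == WILDCARD:
        return list(selected)
    available = set(source_options)
    return [value for value in selected if value in available]


print(sanitize_selected_sources(["*"], ["a.pdf"]))                              # ['*']
print(sanitize_selected_sources(["a.pdf", "deleted.pdf"], ["a.pdf", "b.pdf"]))  # ['a.pdf']
```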

github-actions Bot added the frontend 🟨, backend 🔷, and tests labels Jun 18, 2026

coderabbitai Bot commented Jun 18, 2026

Contributor

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 080cfa71-cc72-49c2-a973-0ba12c0901e0

📥 Commits

Reviewing files that changed from the base of the PR and between 15627df and 0939eb9.

📒 Files selected for processing (1)
  • src/services/knowledge_filter_service.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/services/knowledge_filter_service.py

Walkthrough

Adds an active_source_count field to knowledge filters by introducing a backend OpenSearch aggregation that determines which referenced filenames still have indexed documents. The service enriches filter results in place; the frontend type is updated and the filter list and panel UI consume the live count.

Changes

Active source count for knowledge filters

Layer | File(s) | Summary
OpenSearch query builder and service enrichment method | src/utils/opensearch_queries.py, src/services/knowledge_filter_service.py | build_existing_filenames_agg_body constructs a terms aggregation body; _attach_active_source_counts parses each filter's query_data, deduplicates filenames, runs a single batched admin aggregation, and mutates filters with active_source_count; search_knowledge_filters now awaits this method before returning.
Service tests for active_source_count behavior | tests/unit/test_knowledge_filter_service.py | Adds _filter and _setup_search helpers with monkeypatched OpenSearch clients, and three async tests covering zero count on deleted documents, silent handling of malformed query_data, and deduplication of shared filenames into one admin query.
Frontend type and UI updates | frontend/app/api/queries/useGetFiltersSearchQuery.ts, frontend/components/knowledge-filter-list.tsx, frontend/components/knowledge-filter-panel.tsx | Adds optional active_source_count to KnowledgeFilter; the filter list prefers it over dataSources.length; the panel's MultiSelect filters selected data source values to only those present in sourceOptions unless the wildcard "*" is first.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested labels

enhancement

Suggested reviewers

  • edwinjosechittilappilly
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name | Status | Explanation | Resolution
Docstring Coverage | ⚠️ Warning | Docstring coverage is 36.84%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled.
Title check | ✅ Passed | The title accurately summarizes the main change: fixing stale knowledge filter source counts after document deletion. It is specific, clear, and directly related to the primary objective of the PR.
Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request.

Warning

Review ran into problems

🔥 Problems

Stopped waiting for pipeline failures after 30000ms. One of your pipelines takes longer than our 30000ms fetch window to run, so review may not consider pipeline-failure results for inline comments if any failures occurred after the fetch window. Increase the timeout if you want to wait longer or run a @coderabbit review after the pipeline has finished.


github-actions Bot added and removed the bug 🔴 label Jun 18, 2026

coderabbitai Bot left a comment

Contributor

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/services/knowledge_filter_service.py`:
- Around line 68-72: The zip() call that iterates over filters and
data_sources_by_filter is missing the strict=True parameter, which could allow
silent truncation bugs if the iterables have mismatched lengths during future
refactoring. Add strict=True as a parameter to the zip() function call to ensure
both iterables have matching lengths and raise a ValueError if they don't,
improving code safety and maintainability.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 0f46164f-0076-4d85-9a9c-7937deea4236

📥 Commits

Reviewing files that changed from the base of the PR and between 89e0400 and 15627df.

📒 Files selected for processing (6)
  • frontend/app/api/queries/useGetFiltersSearchQuery.ts
  • frontend/components/knowledge-filter-list.tsx
  • frontend/components/knowledge-filter-panel.tsx
  • src/services/knowledge_filter_service.py
  • src/utils/opensearch_queries.py
  • tests/unit/test_knowledge_filter_service.py

Comment thread src/services/knowledge_filter_service.py Outdated

Labels

backend 🔷 Issues related to backend services (OpenSearch, Langflow, APIs) bug 🔴 Something isn't working. frontend 🟨 Issues related to the UI/UX tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: Knowledge filter shows 1 source even when filter contains no documents

1 participant