fix: add warning logs for 10k aggregation limit in sync operations #1912

Open
yogichipalkatti wants to merge 1 commit into langflow-ai:release-0.5.1 from yogichipalkatti:fix/1548aggregationcapped

Conversation

@yogichipalkatti

Collaborator

Added warning logs to alert operators when OpenSearch terms aggregations
hit the 10,000 bucket limit in connector sync operations. This helps
detect potential data truncation issues in workspaces with >10k unique
document IDs, filenames, or connector file IDs.

Affected functions:

  • get_synced_file_ids_for_connector(): Warns when connector_file_id,
    document_id, or filename aggregations reach the limit
  • get_synced_id_to_filename_map(): Warns when document_id aggregation
    reaches the limit
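
The check described above can be sketched as follows. This is a minimal illustration, not the code in this PR: the helper name `warn_if_truncated`, the constant `AGG_BUCKET_LIMIT`, and the aggregation name used in the usage note are hypothetical, and the sketch assumes the limit equals the `size` passed to the terms aggregation.

```python
import logging

logger = logging.getLogger(__name__)

# Assumption: mirrors the `size` requested in the terms aggregation.
AGG_BUCKET_LIMIT = 10_000


def warn_if_truncated(response: dict, agg_name: str, limit: int = AGG_BUCKET_LIMIT) -> bool:
    """Log a warning and return True when a terms aggregation may be truncated."""
    # Missing keys are treated as "no buckets" so the check never raises.
    buckets = response.get("aggregations", {}).get(agg_name, {}).get("buckets", [])
    if len(buckets) >= limit:
        logger.warning(
            "Aggregation %r returned %d buckets (limit %d); results may be truncated",
            agg_name,
            len(buckets),
            limit,
        )
        return True
    return False
```

A caller would run this once per aggregation after the search returns, for example `warn_if_truncated(response, "document_ids")`. Reaching the limit only signals possible truncation: a field with exactly 10,000 unique values also triggers the warning.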

This is an interim solution to surface the issue. A future enhancement
should implement composite aggregation with pagination to handle
arbitrarily large result sets without truncation.
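
One possible shape for that future enhancement is sketched below, assuming the standard composite aggregation request (`sources`, `size`, `after`) and response (`buckets`, `after_key`) format. The function name `iter_unique_values`, the aggregation name `unique_values`, and the `search` callable are hypothetical; `search` stands in for whatever sends a request body to OpenSearch and returns the parsed response.

```python
from typing import Any, Callable, Dict, Iterator, Optional


def iter_unique_values(
    search: Callable[[Dict[str, Any]], Dict[str, Any]],
    field: str,
    page_size: int = 1000,
) -> Iterator[Any]:
    """Yield every unique value of `field`, one composite-aggregation page at a time."""
    after: Optional[Dict[str, Any]] = None
    while True:
        composite: Dict[str, Any] = {
            "size": page_size,
            "sources": [{"value": {"terms": {"field": field}}}],
        }
        if after is not None:
            # Resume from the last key of the previous page.
            composite["after"] = after
        body = {"size": 0, "aggs": {"unique_values": {"composite": composite}}}
        agg = search(body)["aggregations"]["unique_values"]
        for bucket in agg["buckets"]:
            yield bucket["key"]["value"]
        after = agg.get("after_key")
        # Stop when the page is empty or no continuation key is returned.
        if after is None or not agg["buckets"]:
            return
```

Passing the search function in as a parameter keeps the pagination logic independent of any particular client, so it can be exercised with a stub before being wired into the sync functions.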

Related: Connector sync may miss documents when a workspace exceeds 10k
unique values per field.

@coderabbitai

coderabbitai Bot commented Jun 17, 2026

Contributor

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 321cef32-fef6-471c-b094-8d9d01919360

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Warning

.coderabbit.yaml has a parsing error

The CodeRabbit configuration file in this repository has a parsing error and default settings were used instead. Please fix the error(s) in the configuration file. You can initialize chat with CodeRabbit to get help with the configuration file.

💥 Parsing errors (1)
Validation error: Too big: expected number to be <=900000 at "reviews.tools.github-checks.timeout_ms"
⚙️ Configuration instructions
  • Please see the configuration documentation for more information.
  • You can also validate your configuration using the online YAML validator.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json
@github-actions github-actions Bot added community documentation 📘 Improvements or additions to documentation frontend 🟨 Issues related to the UI/UX backend 🔷 Issues related to backend services (OpenSearch, Langflow, APIs) ci ⬛ CI/CD, build, and infrastructure issues docker tests bug 🔴 Something isn't working. and removed community documentation 📘 Improvements or additions to documentation labels Jun 17, 2026
@yogichipalkatti yogichipalkatti force-pushed the fix/1548aggregationcapped branch from 84ace72 to e7531b7 Compare June 17, 2026 18:39
@github-actions github-actions Bot added bug 🔴 Something isn't working. and removed bug 🔴 Something isn't working. labels Jun 17, 2026

@richardz403 richardz403 left a comment

Collaborator

Looks good to me.

The added code correctly logs a warning when the connector file ID, document ID, or filename aggregations reach 10k buckets.

I ran it locally to check for conflicting issues; none were found.

@Vchen7629 Vchen7629 left a comment

Collaborator

LGTM. Verified that all the tests pass when run locally.

@NishadA05 NishadA05 left a comment

Collaborator

This adds useful warning logs when the OpenSearch aggregations hit the 10,000 bucket limit, which makes possible truncation visible instead of silently returning incomplete sync data.

One note: this does not fully remove the 10k limit, but it is a helpful improvement for now.

@Wallgau Wallgau left a comment

Collaborator

Great work. Please create an issue for the enhanced solution.

Labels

backend 🔷 Issues related to backend services (OpenSearch, Langflow, APIs) bug 🔴 Something isn't working. ci ⬛ CI/CD, build, and infrastructure issues docker frontend 🟨 Issues related to the UI/UX tests

5 participants