fix: add warning logs for 10k aggregation limit in sync operations #1912
yogichipalkatti wants to merge 1 commit into
84ace72 to e7531b7
richardz403 left a comment
Looks good to me.
The additional code correctly alerts and warns when connector file IDs, document IDs, or filename aggregations hit 10k items.
Ran locally to check for other conflicting issues; none were found.
NishadA05 left a comment
This adds useful warning logs when the OpenSearch aggregations hit the 10,000 bucket limit, which makes possible truncation visible instead of silently returning incomplete sync data.
One note: this does not fully remove the 10k limit, but it is a helpful improvement for now.
Wallgau left a comment
Great work. Please create an issue for the enhanced solution.
Added warning logs to alert operators when OpenSearch terms aggregations
hit the 10,000 bucket limit in connector sync operations. This helps
detect potential data truncation issues in workspaces with >10k unique
document IDs, filenames, or connector file IDs.
Affected functions:
document_id, or filename aggregations reach the limit
reaches the limit
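
For illustration, the check described above might look like the following minimal sketch. It is not the code from this PR: the helper name `warn_if_truncated`, the constant `AGG_BUCKET_LIMIT`, and the aggregation name used in the usage note are hypothetical, and it assumes each terms aggregation is requested with `size` set to 10,000.

```python
import logging

logger = logging.getLogger(__name__)

# Assumption: each terms aggregation is requested with size=10_000.
AGG_BUCKET_LIMIT = 10_000


def warn_if_truncated(response: dict, agg_name: str, limit: int = AGG_BUCKET_LIMIT) -> bool:
    """Log a warning and return True when a terms aggregation filled every requested bucket."""
    buckets = response.get("aggregations", {}).get(agg_name, {}).get("buckets", [])
    # A full bucket list means further unique values may exist but were not returned.
    truncated = len(buckets) >= limit
    if truncated:
        logger.warning(
            "Aggregation '%s' returned %d buckets (limit %d); results may be truncated",
            agg_name,
            len(buckets),
            limit,
        )
    return truncated
```

A caller would run this once per aggregation after the search returns, for example `warn_if_truncated(response, "document_ids")`. Reaching the limit does not prove that values were dropped, only that they may have been, which is why the sketch warns rather than raises.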
This is an interim solution to surface the issue. A future enhancement
should implement composite aggregation with pagination to handle
arbitrarily large result sets without truncation.
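
A rough sketch of what that follow-up could look like, under stated assumptions: `iter_unique_values`, the `search` callable, and the aggregation name `unique_values` are hypothetical, and the function only builds the request body and follows `after_key`, so any client that accepts a search body and returns the response as a dict can be plugged in.

```python
from typing import Any, Callable, Dict, Iterator, Optional

# Hypothetical signature: takes a search body, returns the response as a dict.
SearchFn = Callable[[Dict[str, Any]], Dict[str, Any]]


def iter_unique_values(search: SearchFn, field: str, page_size: int = 1000) -> Iterator[Any]:
    """Yield every unique value of `field`, paging with a composite aggregation."""
    after: Optional[Dict[str, Any]] = None
    while True:
        composite: Dict[str, Any] = {
            "size": page_size,
            "sources": [{"value": {"terms": {"field": field}}}],
        }
        if after is not None:
            # Resume from the last key of the previous page.
            composite["after"] = after
        body = {"size": 0, "aggs": {"unique_values": {"composite": composite}}}
        agg = search(body).get("aggregations", {}).get("unique_values", {})
        buckets = agg.get("buckets", [])
        for bucket in buckets:
            yield bucket["key"]["value"]
        after = agg.get("after_key")
        # Stop when a page is empty or the response carries no key to resume from.
        if not buckets or after is None:
            return
```

With a synchronous client, `search` could be as small as `lambda body: client.search(index="documents", body=body)`, where the index name is a placeholder. If the project uses an async client, the same loop would need an `async` variant.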
Related: Connector sync may miss documents when a workspace exceeds 10k
unique values per field