bug: OpenSearch terms aggregation capped at 10,000 — silent data truncation in connector sync

## Summary

`src/api/connectors.py` uses an OpenSearch `terms` aggregation with `"size": 10000` to retrieve unique `document_id` and `filename` buckets. A `terms` aggregation returns at most `size` buckets, so when a workspace contains more than 10,000 unique values the response is truncated without any error. The remaining document IDs and filenames are invisible to the sync logic, which can cause missed syncs or stale data.

## Location

`src/api/connectors.py` — approximately lines 44 and 50 (pre-existing on `main`)

## Expected behaviour

All unique `document_id` / `filename` values should be retrieved. Either:
- Switch to a [composite aggregation](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-composite-aggregation.html) with pagination to handle arbitrarily large result sets, or
- At minimum, emit a warning log when the returned bucket count equals 10,000 so operators are alerted to the cap.
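
As a rough illustration of the first option, the sketch below pages through a composite aggregation using `after_key` until no buckets remain. It is not the code in `src/api/connectors.py`: the helper name `iter_unique_values`, the aggregation name `unique_values`, the source name `value`, and the default page size are all hypothetical. The `search` argument is assumed to be any callable with the shape of an OpenSearch client's `search(index=..., body=...)` method, and the field is assumed to be aggregatable (for example a keyword field).

```python
from typing import Any, Callable, Dict, Iterator, Optional

# Assumption: `search` behaves like an OpenSearch client's search method,
# i.e. it accepts index= and body= and returns the response as a dict.
SearchFn = Callable[..., Dict[str, Any]]


def iter_unique_values(
    search: SearchFn,
    index: str,
    field: str,
    page_size: int = 1000,  # hypothetical default, tune per cluster
) -> Iterator[Any]:
    """Yield every unique value of `field`, one composite page at a time."""
    after_key: Optional[Dict[str, Any]] = None
    while True:
        composite: Dict[str, Any] = {
            "size": page_size,
            # "value" is an arbitrary source name chosen for this sketch.
            "sources": [{"value": {"terms": {"field": field}}}],
        }
        if after_key is not None:
            # Resume after the last bucket of the previous page.
            composite["after"] = after_key

        body = {"size": 0, "aggs": {"unique_values": {"composite": composite}}}
        response = search(index=index, body=body)
        agg = response["aggregations"]["unique_values"]

        buckets = agg.get("buckets", [])
        if not buckets:
            return
        for bucket in buckets:
            yield bucket["key"]["value"]

        after_key = agg.get("after_key")
        if after_key is None:
            return
```

With a real client this might be called as `list(iter_unique_values(client.search, "documents", "document_id"))`, where the index name `documents` is a placeholder. Because values are yielded page by page, memory use on the client is bounded by the page size rather than by the number of unique values.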

## Actual behaviour

Each aggregation returns at most 10,000 buckets. In workspaces with more than 10,000 unique document IDs or filenames, the excess values are dropped without any error or warning.
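
Even without changing the query, this truncation can be detected from the response. The sketch below is a hypothetical helper (the name `terms_agg_truncated` and the constant `TERMS_AGG_SIZE` are not from the codebase) that inspects one `terms` aggregation result and logs a warning when the bucket count reaches the requested size or when `sum_other_doc_count`, which a `terms` aggregation reports for documents outside the returned buckets, is non-zero.

```python
import logging
from typing import Any, Dict

logger = logging.getLogger(__name__)

# Mirrors the "size": 10000 used by the existing query (name is hypothetical).
TERMS_AGG_SIZE = 10_000


def terms_agg_truncated(
    agg: Dict[str, Any],
    size: int = TERMS_AGG_SIZE,
    name: str = "terms",
) -> bool:
    """Return True and log a warning if a terms aggregation looks truncated."""
    bucket_count = len(agg.get("buckets", []))
    # Count of documents that fell outside the returned buckets.
    other_docs = agg.get("sum_other_doc_count", 0)

    truncated = bucket_count >= size or other_docs > 0
    if truncated:
        logger.warning(
            "Aggregation %r may be truncated: %d buckets returned "
            "(size=%d), sum_other_doc_count=%d",
            name, bucket_count, size, other_docs,
        )
    return truncated
```

A bucket count exactly equal to `size` does not prove that values were dropped, so this check can produce a false positive at the boundary; a non-zero `sum_other_doc_count` is the stronger signal.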

## Impact

Silent data loss during connector sync for large workspaces. No error or warning is surfaced to the operator.

## References

- Identified during review of PR #1538 (https://github.com/langflow-ai/openrag/pull/1538) by @edwinjosechittilappilly
- Pre-existing issue on `main`, not introduced by that PR

