fix: improve pagination and scroll to the file row that is currently processing the ingestion by Wallgau · Pull Request #1884 · langflow-ai/openrag

Wallgau · 2026-06-15T20:06:50Z

Screen.Recording.2026-06-15.at.4.03.46.PM.mov

Summary
When ingesting files on the Knowledge page, the grid now automatically jumps to the correct pagination page so users can see processing status without manually hunting through pages.

- Overwrite (file already indexed): navigates to the page containing that file’s existing row
- New file: navigates to the last page, where the processing overlay is appended
- Cloud uploads (Google Drive, OneDrive, SharePoint): focus targets persist across the redirect from /upload/[provider] → /knowledge
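Persisting focus targets across the /upload/[provider] → /knowledge redirect matters because a window event dispatched before navigation has no listener once the Knowledge page is unmounted. The sketch below shows one way to do it with session-scoped storage. The storage key, payload shape, and the `persistFocusTargets`/`consumeFocusTargets` names are hypothetical and are not taken from the PR; only the `"existing"`/`"new"` mode values come from the review thread.

```typescript
// Minimal sketch; the storage key, payload shape, and function names are hypothetical.
type IngestFocusMode = "existing" | "new";

interface PendingFocusTarget {
  identity: string; // filename or source URL of the ingested file
  mode: IngestFocusMode;
}

// The subset of the Web Storage API used here, so the sketch also runs outside a browser.
interface StorageLike {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

const FOCUS_STORAGE_KEY = "knowledge-ingest-focus";

/** On /upload/[provider]: save targets before navigating to /knowledge. */
export function persistFocusTargets(storage: StorageLike, targets: PendingFocusTarget[]): void {
  storage.setItem(FOCUS_STORAGE_KEY, JSON.stringify(targets));
}

/** On Knowledge page mount: read once and clear, so targets are applied a single time. */
export function consumeFocusTargets(storage: StorageLike): PendingFocusTarget[] {
  const raw = storage.getItem(FOCUS_STORAGE_KEY);
  storage.removeItem(FOCUS_STORAGE_KEY);
  if (!raw) return [];
  try {
    const parsed: unknown = JSON.parse(raw);
    return Array.isArray(parsed) ? (parsed as PendingFocusTarget[]) : [];
  } catch {
    return []; // malformed entry: ignore rather than break the page
  }
}
```

In a browser, `window.sessionStorage` satisfies `StorageLike`, and it scopes the targets to the current tab session.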
Problem
The Knowledge table uses AG Grid with client-side pagination (25 rows/page). After uploading or overwriting a file, the grid stayed on the current page — e.g. staying on page 1 while an overwritten file on page 2 entered processing. Local uploads and cloud connector uploads both had this gap; cloud uploads additionally lost focus events because navigation unmounted the Knowledge page before the event could fire.
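With client-side pagination, the target page follows directly from a row's position in the filtered and sorted display order. The helper below is a hypothetical sketch of that arithmetic, not the PR's `goToGridRowIdentities`; the choice to return `null` for an overwrite whose row is not resolvable yet (so the caller can retry) is an assumption.

```typescript
// Hypothetical helper; not the PR's actual goToGridRowIdentities.
type IngestFocusMode = "existing" | "new";

/**
 * Zero-based pagination page to show for a focused row.
 * displayedIndex is the row's position in the filtered/sorted display order,
 * or -1 when the row has not been found there yet.
 */
export function targetPageForRow(
  mode: IngestFocusMode,
  displayedIndex: number,
  totalDisplayedRows: number,
  pageSize = 25, // the Knowledge table's page size per the PR description
): number | null {
  if (pageSize <= 0) throw new RangeError("pageSize must be positive");
  if (displayedIndex >= 0) {
    // Row is already in the grid: jump to the page that contains it.
    return Math.floor(displayedIndex / pageSize);
  }
  if (mode === "new") {
    // New file: the processing overlay is appended, so it lands on the last page.
    return Math.max(0, Math.ceil(totalDisplayedRows / pageSize) - 1);
  }
  // Overwrite whose row is not resolvable yet: stay put and retry later.
  return null;
}
```

In the browser, a non-null result would presumably be handed to AG Grid's `api.paginationGoToPage(page)`; the PR's registration of `ScrollApiModule` suggests the row is then scrolled into view, for example with `api.ensureIndexVisible`.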

Test plan

- Overwrite on another page: On page 1, overwrite a file whose row is on page 2 → grid jumps to page 2, row shows processing
- New local upload: On page 1, upload a new file → grid jumps to last page when processing row appears
- Google Drive upload: Ingest from /upload/google_drive, redirected to Knowledge → grid jumps to correct page
- Google Drive overwrite: Pick an existing file, confirm overwrite → grid jumps to that file’s page
- Already on correct page: Overwrite while viewing the file’s page → stays on same page, status updates
- No duplicate/missing rows: Paginate through all pages after mixed uploads; no ghost or duplicate rows

Summary by CodeRabbit

New Features
- Knowledge grid now automatically navigates to and focuses newly added or processing ingest rows during ingestion, including restore after pagination/row updates; focus is triggered from both upload and sync actions.
Improvements
- Task overlays now match files using multiple filename/source URL variations with priority-based status resolution (processing over failed) for more reliable row mapping.
- Ingest-focus state is persisted for the session and applied on page load.
Other
- Added additional AG Grid modules to improve row navigation.
- Backend connector sync now propagates connector type into file processing.
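The priority-based status resolution in the summary (processing over failed, otherwise the backend status) can be sketched as below. This is a minimal sketch assuming a simple numeric ranking; the `pickOverlayStatus`/`mergeRowStatus` names are hypothetical and not the internals of `buildKnowledgeTableRows`.

```typescript
// Minimal sketch; the ranking and function names are assumptions, not the
// PR's buildKnowledgeTableRows internals.
const OVERLAY_RANK = new Map<string, number>([
  ["processing", 2], // an in-flight ingest beats everything
  ["failed", 1],
]);

/** Strongest overlay status among task files matching one row, if any. */
export function pickOverlayStatus(taskStatuses: string[]): string | undefined {
  let best: string | undefined;
  let bestRank = 0;
  for (const status of taskStatuses) {
    const rank = OVERLAY_RANK.get(status) ?? 0; // other statuses never override
    if (rank > bestRank) {
      best = status;
      bestRank = rank;
    }
  }
  return best;
}

/** Task overlay wins only for processing/failed; otherwise keep backend status. */
export function mergeRowStatus(backendStatus: string, taskStatuses: string[]): string {
  return pickOverlayStatus(taskStatuses) ?? backendStatus;
}
```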

coderabbitai · 2026-06-15T20:07:07Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Walkthrough

Adds a new knowledge-grid-pagination.ts module with helpers to locate, paginate to, and scroll-focus knowledge grid rows by identity. Refactors buildKnowledgeTableRows to use multi-key alias matching for taskFile overlays and updated status fallback logic. Integrates focus queuing into upload flows, registers additional AgGrid modules, creates a useKnowledgeIngestFocus hook to coordinate pending-row focusing, wires the hook into the knowledge page via grid callbacks. Threads connector_type parameter through backend file processors and service calls.

Changes

Auto-focus pending ingest rows in knowledge grid

Layer / File(s)	Summary
Alias-key file matching for taskFile overlays `frontend/lib/knowledge-table-state.ts`	`buildKnowledgeTableRows` now generates and indexes taskFile overlays across multiple alias keys (filename, source_url, and basename) with priority ordering. Merged status uses taskFile value only for `processing`/`failed`, otherwise falls back to backend status. Backend filtering excludes task files whose alias keys already exist in the backend set.
Grid pagination and row identity helpers `frontend/lib/knowledge-grid-pagination.ts`	New module defines `IngestFocusMode` type and session persistence utilities for ingest-focus targets. Exports `dispatchKnowledgeIngestFocus` to emit events and `queueKnowledgeIngestFocusForCloudFiles` to persist focus targets. Implements row resolution logic to match by alias-key sets, infer focus mode, locate visible matching rows after filter/sort, and collect resolvable identities. Exports `collectNewIngestFocusIdentities` and `collectProcessingFocusIdentities` to extract identity targets from status transitions. Exports `goToGridRowIdentities` to compute focus mode and jump/scroll to rows, and `focusPendingIngestRows` as the high-level focus routine.
Upload flow integration `frontend/app/upload/[provider]/page.tsx`, `frontend/components/knowledge-dropdown.tsx`	Cloud and dropdown file uploads now dispatch/queue knowledge ingest focus actions using the new helpers, passing through the replace/overwrite flag to signal focus mode.
AgGrid module registration `frontend/components/AgGrid/registerAgGridModules.ts`	Registers `ClientSideRowModelApiModule`, `RowApiModule`, and `ScrollApiModule` to support programmatic pagination and row scrolling.
useKnowledgeIngestFocus hook `frontend/hooks/useKnowledgeIngestFocus.ts`	New React hook coordinates focusing ingest-related rows in an AG Grid instance. Maintains a pending-focus queue via refs, resolves focus via `focusPendingIngestRows` with double `requestAnimationFrame` scheduling, loads and queues persisted focus targets on mount, listens for window `KNOWLEDGE_INGEST_FOCUS_EVENT` events, detects newly-processing identities on data changes, and exposes `gridRowsSelectionKey` and callbacks for grid-ready and row-data-updated events.
Knowledge page focus orchestration `frontend/app/knowledge/page.tsx`	Integrates the `useKnowledgeIngestFocus` hook into the knowledge page. Replaces local `gridRowsSelectionKey` computation with hook-provided values and callbacks. Wires `onKnowledgeGridReady` and `onKnowledgeRowDataUpdated` handlers on both cloud-brand and non-cloud AgGrid instances.
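The alias-key matching in the first row of the table can be pictured roughly as follows. This is a minimal sketch, not the PR's `getKnowledgeFileAliasKeys`: the `aliasKeys`/`rowMatchesAny` names, the trimming, and the query-string stripping in `basename` are assumptions.

```typescript
// Hypothetical stand-in for the PR's alias-key helper; normalization rules are assumed.
interface FileIdentityLike {
  filename?: string | null;
  source_url?: string | null;
}

/** Last path segment of a filename or URL, ignoring any query string or fragment. */
function basename(value: string): string {
  const path = value.split(/[?#]/, 1)[0];
  const parts = path.split(/[\\/]/).filter(Boolean);
  return parts.length > 0 ? parts[parts.length - 1] : value;
}

/** Ordered, de-duplicated alias keys: filename, source_url, then their basenames. */
export function aliasKeys(file: FileIdentityLike): string[] {
  const keys: string[] = [];
  const push = (candidate: string | null | undefined): void => {
    const key = candidate?.trim();
    if (key && !keys.includes(key)) keys.push(key);
  };
  push(file.filename);
  push(file.source_url);
  if (file.filename) push(basename(file.filename));
  if (file.source_url) push(basename(file.source_url));
  return keys;
}

/** True when any alias of the row appears in the target identity set. */
export function rowMatchesAny(row: FileIdentityLike, targets: Set<string>): boolean {
  return aliasKeys(row).some((key) => targets.has(key));
}
```

Keeping the keys ordered lets a caller prefer the filename match over a looser basename match when several rows collide.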

Backend connector type parameter threading

Layer / File(s)	Summary
ConnectorFileProcessor type parameter initialization `src/models/processors.py`	`ConnectorFileProcessor` constructor now accepts an optional `connector_type: str | None` parameter, allowing callers to override the effective connector type used during processing.
Connector type derivation in processing pipeline `src/models/processors.py`	`process_item` derives a local `connector_type` value from the instance override or falls back to the connection's type, then threads it through Langflow ingestion, standard OpenRAG processing, and connector-metadata update calls.
ConnectorService type wiring `src/connectors/service.py`	`ConnectorService` now passes `connector_type` (from `connector.CONNECTOR_TYPE` or defaulting to `"local"`) into `ConnectorFileProcessor` in both `sync_connector_files` and `sync_specific_files`.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~28 minutes

Possibly related PRs

langflow-ai/openrag#1706: Both PRs modify frontend/app/upload/[provider]/page.tsx around submitSync/the replaceDuplicates flag—main PR adds queueKnowledgeIngestFocusForCloudFiles inside submitSync, while retrieved PR changes the duplicate-checking flow that determines when/how submitSync(..., false) is called—so the changes are directly coupled in that function.
langflow-ai/openrag#1663: Both PRs touch the upload provider ingest flow (frontend/app/upload/[provider]/page.tsx) by extending submitSync around the replaceDuplicates/replace_duplicates behavior—main PR uses it to enqueue ingest-focus actions, while the retrieved PR uses it to drive filename duplicate replacement during sync.

Suggested reviewers

lucaseduoli
ricofurtado

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 48.39% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately describes the main change: improving pagination and scroll behavior to navigate to files currently processing ingestion.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@frontend/lib/knowledge-grid-pagination.ts`:
- Around line 73-75: The code at lines 73-75 queues unseen identities for focus
without verifying their status, which can trigger unwanted pagination jumps for
identities that have already reached `active` or other non-processing states.
Add a status check to ensure the identity is in `processing` state before
pushing it to the identities array in the condition that checks if (!prev). Only
queue identities that are actively processing to maintain the intended
pagination behavior.
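A minimal sketch of the guard requested above, assuming each row exposes an identity and a status string. The `collectNewProcessingIdentities` name and `RowStatus` shape are hypothetical; the PR's `collectNewIngestFocusIdentities` may be shaped differently.

```typescript
// Hypothetical sketch of the guard; the real collectNewIngestFocusIdentities may differ.
interface RowStatus {
  identity: string;
  status: string; // e.g. "processing", "active", "failed"
}

/**
 * Identities that should trigger a focus jump: rows that are "processing" now
 * and were not "processing" before (including previously unseen rows).
 */
export function collectNewProcessingIdentities(prev: RowStatus[], next: RowStatus[]): string[] {
  const prevStatus = new Map(prev.map((row) => [row.identity, row.status]));
  const identities: string[] = [];
  for (const row of next) {
    if (row.status !== "processing") continue; // never jump for active/failed rows
    if (prevStatus.get(row.identity) !== "processing") identities.push(row.identity);
  }
  return identities;
}
```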

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: de265f7b-29cf-435b-86e3-5b9ce21f0f08

📥 Commits

Reviewing files that changed from the base of the PR and between 6f982a9 and cfb6afb.

📒 Files selected for processing (3)

frontend/app/knowledge/page.tsx
frontend/lib/knowledge-grid-pagination.ts
frontend/lib/knowledge-table-state.ts

coderabbitai

Actionable comments posted: 2

♻️ Duplicate comments (1)

frontend/lib/knowledge-grid-pagination.ts (1)

127-131: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Expand the queued identity before inferring "new" vs "existing".

This only checks the raw queued identity against each row’s aliases. If the event queued a filename but the existing backend row is only identifiable by source_url/basename, overwrites are misclassified as "new" and jump to the last page instead of the existing row.

🐛 Proposed fix

 export function inferIngestFocusMode(
   identity: string,
   rowData: GridRowLike[],
 ): IngestFocusMode {
-  const identitySet = new Set([identity]);
+  const identitySet = new Set(
+    getKnowledgeFileAliasKeys({ filename: identity, source_url: identity }),
+  );
   const hasMatch = rowData.some((row) =>
     rowMatchesIdentitySet(row, identitySet),
   );
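The effect of the proposed fix can be checked with a self-contained sketch. Here `aliasKeysFor` is a simplified, hypothetical stand-in for `getKnowledgeFileAliasKeys` (the value plus its last path segment), and `GridRowLike` is reduced to the two identity fields.

```typescript
// Hypothetical sketch; aliasKeysFor is a simplified stand-in for getKnowledgeFileAliasKeys.
type IngestFocusMode = "existing" | "new";

interface GridRowLike {
  filename?: string;
  source_url?: string;
}

/** The value itself plus its last path segment (query string and fragment dropped). */
function aliasKeysFor(value: string | undefined): string[] {
  if (!value) return [];
  const last = value.split(/[?#]/, 1)[0].split("/").filter(Boolean).pop();
  return last && last !== value ? [value, last] : [value];
}

export function inferIngestFocusMode(identity: string, rowData: GridRowLike[]): IngestFocusMode {
  // Expand the queued identity first, so "report.pdf" can match a row that is only
  // identifiable by "https://host.example/docs/report.pdf", and vice versa.
  const identitySet = new Set(aliasKeysFor(identity));
  const hasMatch = rowData.some((row) =>
    [...aliasKeysFor(row.filename), ...aliasKeysFor(row.source_url)].some((key) =>
      identitySet.has(key),
    ),
  );
  return hasMatch ? "existing" : "new";
}
```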

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@frontend/lib/knowledge-grid-pagination.ts` around lines 127 - 131, The
function only checks the raw queued identity value against each row's aliases,
which causes misclassification when the identity format differs between the
queued event and existing backend rows (e.g., filename vs source_url/basename).
Before creating the identitySet on line 127, expand the raw identity to include
all its possible identifying variants (such as source_url and basename
equivalents) so that rowMatchesIdentitySet can properly detect existing rows
regardless of which identifier format was used. Look for an existing expand or
normalize function that converts a single identity into its full set of
equivalent identifiers.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@frontend/lib/knowledge-grid-pagination.ts`:
- Around line 231-267: In the `collectResolvableIdentities` function, when an
alias key is found to match an identity in the `identities` set, the function
currently adds that alias key to the `found` result. However, the function
should instead return the original pending identity that corresponds to the
matched alias. Modify both the loop within `api.forEachNodeAfterFilterAndSort`
(around the `getKnowledgeFileAliasKeys(data)` iteration) and the loop iterating
over `rowData` (around the second `getKnowledgeFileAliasKeys(row)` iteration) so
that when `identities.has(key)` is true, you add the original identity itself to
the `found` set rather than the alias key. This ensures that callers receive the
original pending ids and avoid repeated pagination attempts when rows are
matched through their aliases.
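One way to satisfy this comment is to map every alias of each pending identity back to that identity, so a match always reports the id the caller queued. The sketch below iterates a plain array instead of `api.forEachNodeAfterFilterAndSort`, and its `aliasKeys` helper is a simplified, hypothetical stand-in.

```typescript
// Hypothetical sketch; the real function walks AG Grid nodes, here rows are a plain array.
interface RowLike {
  filename?: string;
  source_url?: string;
}

/** Stand-in alias expansion: each identity field plus its last path segment. */
function aliasKeys(row: RowLike): string[] {
  const keys = new Set<string>();
  for (const value of [row.filename, row.source_url]) {
    if (!value) continue;
    keys.add(value);
    const last = value.split("/").filter(Boolean).pop();
    if (last) keys.add(last);
  }
  return [...keys];
}

/** Returns the original pending identities whose aliases match some row. */
export function collectResolvableIdentities(pending: Set<string>, rows: RowLike[]): Set<string> {
  // alias key -> pending identity it was derived from
  const aliasToPending = new Map<string, string>();
  for (const id of pending) {
    for (const key of aliasKeys({ filename: id, source_url: id })) {
      if (!aliasToPending.has(key)) aliasToPending.set(key, id);
    }
  }
  const found = new Set<string>();
  for (const row of rows) {
    for (const key of aliasKeys(row)) {
      const original = aliasToPending.get(key);
      if (original !== undefined) found.add(original); // report the queued id, not the alias
    }
  }
  return found;
}
```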

In `@frontend/lib/knowledge-table-state.ts`:
- Around line 152-157: The backendIdentityKeys are being built from backendFiles
which has been mutated with aliases copied from taskFile at lines 139-140. This
causes unrelated pending tasks to incorrectly match as already represented and
get dropped. To fix this, build the backendIdentityKeys from the original raw
backend files data before any aliases were merged in, rather than from the
mutated backendFiles variable. This ensures the dedupe keys only contain the
authentic backend identities and prevents false matches with pending tasks.
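The rule behind this fix is ordering: capture the dedupe keys from the untouched backend payload before any overlay is merged. A minimal sketch follows; `FileRow`, `keysOf`, and `buildRows` are hypothetical simplifications of `buildKnowledgeTableRows`, and matching here uses only `filename`/`source_url` rather than the full alias set.

```typescript
// Hypothetical sketch of the dedupe ordering; names and shapes are assumptions.
interface FileRow {
  filename: string;
  source_url?: string;
  status: string;
}

function keysOf(file: { filename?: string; source_url?: string }): string[] {
  return [file.filename, file.source_url].filter((k): k is string => Boolean(k));
}

export function buildRows(rawBackend: FileRow[], taskFiles: FileRow[]): FileRow[] {
  // Dedupe keys come from the raw backend payload, captured before any merge.
  const backendIdentityKeys = new Set(rawBackend.flatMap(keysOf));

  // Merge overlays onto matching backend rows as copies, so rawBackend is never mutated.
  const merged = rawBackend.map((row) => {
    const task = taskFiles.find((t) => keysOf(t).some((k) => keysOf(row).includes(k)));
    return task && (task.status === "processing" || task.status === "failed")
      ? { ...row, status: task.status }
      : { ...row };
  });

  // Tasks with no authentic backend identity become new, appended rows.
  const pendingOnly = taskFiles.filter((t) => !keysOf(t).some((k) => backendIdentityKeys.has(k)));
  return [...merged, ...pendingOnly];
}
```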

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 29a27e4d-2415-4b2d-beac-24f14e237f6b

📥 Commits

Reviewing files that changed from the base of the PR and between cfb6afb and d01d60c.

📒 Files selected for processing (6)

frontend/app/knowledge/page.tsx
frontend/app/upload/[provider]/page.tsx
frontend/components/AgGrid/registerAgGridModules.ts
frontend/components/knowledge-dropdown.tsx
frontend/lib/knowledge-grid-pagination.ts
frontend/lib/knowledge-table-state.ts

🚧 Files skipped from review as they are similar to previous changes (1)

frontend/app/knowledge/page.tsx

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@frontend/hooks/useKnowledgeIngestFocus.ts`:
- Around line 67-70: The mode value is being unconditionally set at the
pending.modes.set call, which overwrites any previously stored explicit mode
with an inferred mode when a later queue call doesn't provide an explicit mode
parameter. Instead of always using the nullish coalescing operator to fall back
to inferIngestFocusMode, preserve the existing mode for an identity if one is
already stored in pending.modes. Check if the identity already exists in
pending.modes before deciding whether to use the provided mode, infer a new one,
or keep the existing one. This ensures that an explicit mode set in a prior
queue call is not replaced by an inferred mode in subsequent calls for the same
identity.
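The precedence this comment asks for fits in one expression: an explicit mode from the current call, then a mode stored by an earlier call, then inference. The sketch uses a plain `Map` where the real hook keeps state in a React ref; the `queueFocus` name and `infer` callback are hypothetical.

```typescript
// Hypothetical sketch of the pending-focus mode map in useKnowledgeIngestFocus.
type IngestFocusMode = "existing" | "new";

export function queueFocus(
  modes: Map<string, IngestFocusMode>,
  identity: string,
  explicitMode: IngestFocusMode | undefined,
  infer: (identity: string) => IngestFocusMode,
): IngestFocusMode {
  // explicit (this call) > stored (earlier call) > inferred; `??` short-circuits,
  // so infer() only runs when neither of the first two is available.
  const mode = explicitMode ?? modes.get(identity) ?? infer(identity);
  modes.set(identity, mode);
  return mode;
}
```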

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 0774bc52-6437-4c61-95af-c8359978d3a4

📥 Commits

Reviewing files that changed from the base of the PR and between 3c53962 and abd286e.

📒 Files selected for processing (4)

frontend/app/knowledge/page.tsx
frontend/hooks/useKnowledgeIngestFocus.ts
frontend/lib/knowledge-grid-pagination.ts
frontend/lib/knowledge-table-state.ts

🚧 Files skipped from review as they are similar to previous changes (2)

frontend/lib/knowledge-grid-pagination.ts
frontend/lib/knowledge-table-state.ts

correct source icon, and fix connector overwrite so re-ingests merge onto existing indexed rows instead of duplicating.
- Pass connectorType from connector ingest call sites into addTask
- Store per-task connector type in task-context and resolve overlays via inferTaskFileConnectorType
- Restore main icon helpers (inferTaskFileConnectorType, resolveKnowledgeRowConnectorType) in knowledge-table-state
- Match task overlays to backend rows via filename aliases and cleanConnectorFilename; prefer backend filename/source_url when merging onto indexed rows so identity fields stay paired
- Use cleaned connector filenames for cloud ingest focus targets
- Use file_task.filename for connector duplicate check/delete in the backend

The two connector sync entry points passed `getattr(connector, "CONNECTOR_TYPE", None) or "local"` into ConnectorFileProcessor. The `or "local"` made the value always truthy, so the processor's `self.connector_type or connection.connector_type` fallback never engaged — a connector with CONNECTOR_TYPE unset would be mislabeled "local" instead of falling through to the connection's real stored type. Pass `connector.CONNECTOR_TYPE` directly instead: `connector` is already guarded non-None at both sites and CONNECTOR_TYPE is a declared BaseConnector attribute, so the getattr default was unreachable. Result precedence is now connector type → connection type, with no "local" masking. Real connectors (SharePoint/OneDrive/Drive/S3) set CONNECTOR_TYPE so their resolved type is unchanged.
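The backend change is in Python. To keep one language across examples, the precedence bug is shown below as an illustrative TypeScript analogue, with JavaScript's `||` standing in for Python's `or` (both treat `null`/`None` and `""` as falsy). All function names are hypothetical; `"google_drive"` and `"sharepoint"` are only sample values.

```typescript
// Illustrative TypeScript analogue of the Python change; names are hypothetical.

// Before: the call site forced a truthy fallback, so the processor's own
// fallback to the connection's stored type could never run.
function callSiteBefore(connectorTypeAttr: string | null): string {
  return connectorTypeAttr || "local";
}

// After: pass the attribute through unchanged.
function callSiteAfter(connectorTypeAttr: string | null): string | null {
  return connectorTypeAttr;
}

// Processor side: mirrors `self.connector_type or connection.connector_type`.
export function effectiveConnectorType(override: string | null, connectionType: string): string {
  return override || connectionType;
}

export function resolveBefore(attr: string | null, connectionType: string): string {
  return effectiveConnectorType(callSiteBefore(attr), connectionType);
}

export function resolveAfter(attr: string | null, connectionType: string): string {
  return effectiveConnectorType(callSiteAfter(attr), connectionType);
}
```

With an unset attribute, `resolveBefore` masks the connection's stored type as `"local"`, while `resolveAfter` falls through to it, matching the precedence described in the commit message.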

lucaseduoli

Tested and approved

Olfa Maslah added 2 commits June 15, 2026 13:59

improve pagination and scroll to the file row that is currently processing the ingestion

cfb6afb

improve pagination and scroll to the file row that is currently processing the ingestion, make sure it works for connectors too

a1b7856

github-actions Bot added the frontend 🟨 Issues related to the UI/UX label Jun 15, 2026

coderabbitai Bot reviewed Jun 15, 2026

Comment thread frontend/lib/knowledge-grid-pagination.ts

adressed coderabbit comment1

05609ab

Wallgau force-pushed the pagination-improvement branch from e1e140f to 05609ab on June 15, 2026 20:25

Merge branch 'main' into pagination-improvement

d01d60c

coderabbitai Bot reviewed Jun 15, 2026

Comment thread frontend/lib/knowledge-grid-pagination.ts

Comment thread frontend/lib/knowledge-table-state.ts

fix connector ingest error

9af6032

github-actions Bot added the backend 🔷 Issues related to backend services (OpenSearch, Langflow, APIs) label Jun 15, 2026

Build backendIdentityKeys from raw searchData instead of merged backendFiles, so dedupe only reflects authentic backend identities.

3c53962

coderabbitai Bot reviewed Jun 15, 2026

Comment thread frontend/hooks/useKnowledgeIngestFocus.ts

Preserve stored mode when no explicit mode is passed

4517407

Resolve task-context and knowledge-table-state conflicts by keeping pagination/overwrite alias matching alongside main's connector icon helpers.

4806b76

github-actions Bot added the tests label Jun 16, 2026

style: ruff autofix (auto)

29c523f

Merge remote-tracking branch 'origin/main' into pagination-improvement

8c98ff7

lucaseduoli approved these changes Jun 16, 2026

github-actions Bot added the lgtm label Jun 16, 2026

Merge branch 'main' into pagination-improvement

652a902

Wallgau wants to merge 14 commits into main from pagination-improvement

3 participants