fix: improve pagination and scroll to the file row that is currently processing the ingestion #1884

Open
Wallgau wants to merge 14 commits into main from pagination-improvement

Conversation

@Wallgau Wallgau commented Jun 15, 2026

Collaborator
Screen.Recording.2026-06-15.at.4.03.46.PM.mov

Summary
When ingesting files on the Knowledge page, the grid now automatically jumps to the correct pagination page so users can see processing status without manually hunting through pages.

Overwrite (file already indexed): navigates to the page containing that file’s existing row
New file: navigates to the last page where the processing overlay is appended
Cloud uploads (Google Drive, OneDrive, SharePoint): focus targets persist across the redirect from /upload/[provider] → /knowledge
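
The two navigation cases above (jump to the page of an existing row, or to the last page for a new row) can be sketched as follows. This is a minimal illustration and not the PR's code: `PagedGridLike` is a hypothetical structural subset whose method names mirror AG Grid's pagination and scroll API, and `pageForRowIndex` and `jumpToRow` are names invented for this sketch.

```typescript
// Hypothetical subset of a paginated grid API (method names mirror AG Grid's GridApi).
interface PagedGridLike {
  paginationGetPageSize(): number;
  paginationGetCurrentPage(): number;
  paginationGoToPage(page: number): void;
  ensureIndexVisible(index: number, position?: "top" | "middle" | "bottom" | null): void;
}

/** Zero-based page holding the row at `rowIndex` (index after filter and sort). */
function pageForRowIndex(rowIndex: number, pageSize: number): number {
  if (rowIndex < 0 || pageSize <= 0) return 0;
  return Math.floor(rowIndex / pageSize);
}

/** Change page only when the row lives elsewhere, then scroll the row into view. */
function jumpToRow(api: PagedGridLike, rowIndex: number): number {
  const target = pageForRowIndex(rowIndex, api.paginationGetPageSize());
  if (api.paginationGetCurrentPage() !== target) {
    api.paginationGoToPage(target);
  }
  api.ensureIndexVisible(rowIndex, "middle");
  return target;
}
```

For an overwrite, `rowIndex` would be the index of the existing row; for a new file, the index of the last row, which lands on the last page.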
Problem
The Knowledge table uses AG Grid with client-side pagination (25 rows/page). After uploading or overwriting a file, the grid stayed on the current page — e.g. staying on page 1 while an overwritten file on page 2 entered processing. Local uploads and cloud connector uploads both had this gap; cloud uploads additionally lost focus events because navigation unmounted the Knowledge page before the event could fire.
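
Because the redirect unmounts the page before a window event can be handled, focus targets have to outlive the navigation. A minimal sketch of that idea follows, assuming a session-scoped key/value store. The storage key, the `FocusTarget` shape, and the function names are hypothetical; the PR's actual storage format is not shown in this conversation.

```typescript
// Hypothetical shapes and key name, for illustration only.
type IngestFocusMode = "new" | "existing";
interface FocusTarget { identity: string; mode?: IngestFocusMode; }
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}
const FOCUS_STORAGE_KEY = "knowledge-ingest-focus";

function readTargets(store: KeyValueStore): FocusTarget[] {
  try {
    const parsed: unknown = JSON.parse(store.getItem(FOCUS_STORAGE_KEY) ?? "[]");
    return Array.isArray(parsed) ? (parsed as FocusTarget[]) : [];
  } catch {
    return []; // malformed data is treated as "nothing queued"
  }
}

/** Upload page: append targets before redirecting. */
function persistFocusTargets(store: KeyValueStore, targets: FocusTarget[]): void {
  store.setItem(FOCUS_STORAGE_KEY, JSON.stringify(readTargets(store).concat(targets)));
}

/** Knowledge page mount: read the queued targets once, then clear them. */
function consumeFocusTargets(store: KeyValueStore): FocusTarget[] {
  const targets = readTargets(store);
  store.removeItem(FOCUS_STORAGE_KEY);
  return targets;
}
```

In a browser, `window.sessionStorage` satisfies `KeyValueStore` structurally; injecting the store keeps the helpers testable outside the browser.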

Test plan

Overwrite on another page: On page 1, overwrite a file whose row is on page 2 → grid jumps to page 2, row shows processing

New local upload: On page 1, upload a new file → grid jumps to last page when processing row appears

Google Drive upload: Ingest from /upload/google_drive, redirected to Knowledge → grid jumps to correct page

Google Drive overwrite: Pick an existing file, confirm overwrite → grid jumps to that file’s page

Already on correct page: Overwrite while viewing the file’s page → stays on same page, status updates

No duplicate/missing rows: Paginate through all pages after mixed uploads; no ghost or duplicate rows

Summary by CodeRabbit

  • New Features
    • Knowledge grid now automatically navigates to and focuses newly added or processing ingest rows during ingestion, and restores that focus after pagination or row updates; focus is triggered from both upload and sync actions.
  • Improvements
    • Task overlays now match files using multiple filename/source URL variations with priority-based status resolution (processing over failed) for more reliable row mapping.
    • Ingest-focus state is persisted for the session and applied on page load.
  • Other
    • Added additional AG Grid modules to improve row navigation.
    • Backend connector sync now propagates connector type into file processing.
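
The priority-based status resolution mentioned under Improvements can be illustrated with a small sketch. It assumes the overlay statuses are plain strings and that only `processing` and `failed` may override the backend status; `OVERLAY_PRIORITY`, `pickOverlayStatus`, and `mergeStatus` are hypothetical names, not the PR's helpers.

```typescript
// Lower number wins. Statuses not listed here never override the backend status.
const OVERLAY_PRIORITY = new Map<string, number>([
  ["processing", 0],
  ["failed", 1],
]);

/** Highest-priority overlay status among the candidates, if any qualifies. */
function pickOverlayStatus(candidates: string[]): string | undefined {
  const ranked = candidates
    .filter((status) => OVERLAY_PRIORITY.has(status))
    .sort((a, b) => (OVERLAY_PRIORITY.get(a) ?? 0) - (OVERLAY_PRIORITY.get(b) ?? 0));
  return ranked[0];
}

/** Use the overlay status only for processing/failed; otherwise keep the backend status. */
function mergeStatus(backendStatus: string, overlayStatuses: string[]): string {
  return pickOverlayStatus(overlayStatuses) ?? backendStatus;
}
```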

Olfa Maslah added 2 commits June 15, 2026 13:59
@github-actions github-actions Bot added the frontend 🟨 Issues related to the UI/UX label Jun 15, 2026
@coderabbitai

coderabbitai Bot commented Jun 15, 2026

Contributor

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


Walkthrough

Adds a new knowledge-grid-pagination.ts module with helpers to locate, paginate to, and scroll-focus knowledge grid rows by identity. Refactors buildKnowledgeTableRows to use multi-key alias matching for taskFile overlays and updates the status fallback logic. Integrates focus queuing into upload flows, registers additional AG Grid modules, creates a useKnowledgeIngestFocus hook to coordinate pending-row focusing, and wires the hook into the knowledge page via grid callbacks. Threads a connector_type parameter through backend file processors and service calls.

Changes

Auto-focus pending ingest rows in knowledge grid

| Layer | File(s) | Summary |
| --- | --- | --- |
| Alias-key file matching for taskFile overlays | frontend/lib/knowledge-table-state.ts | buildKnowledgeTableRows now generates and indexes taskFile overlays across multiple alias keys (filename, source_url, and basename) with priority ordering. Merged status uses taskFile value only for processing/failed, otherwise falls back to backend status. Backend filtering excludes task files whose alias keys already exist in the backend set. |
| Grid pagination and row identity helpers | frontend/lib/knowledge-grid-pagination.ts | New module defines IngestFocusMode type and session persistence utilities for ingest-focus targets. Exports dispatchKnowledgeIngestFocus to emit events and queueKnowledgeIngestFocusForCloudFiles to persist focus targets. Implements row resolution logic to match by alias-key sets, infer focus mode, locate visible matching rows after filter/sort, and collect resolvable identities. Exports collectNewIngestFocusIdentities and collectProcessingFocusIdentities to extract identity targets from status transitions. Exports goToGridRowIdentities to compute focus mode and jump/scroll to rows, and focusPendingIngestRows as the high-level focus routine. |
| Upload flow integration | frontend/app/upload/[provider]/page.tsx, frontend/components/knowledge-dropdown.tsx | Cloud and dropdown file uploads now dispatch/queue knowledge ingest focus actions using the new helpers, passing through the replace/overwrite flag to signal focus mode. |
| AgGrid module registration | frontend/components/AgGrid/registerAgGridModules.ts | Registers ClientSideRowModelApiModule, RowApiModule, and ScrollApiModule to support programmatic pagination and row scrolling. |
| useKnowledgeIngestFocus hook | frontend/hooks/useKnowledgeIngestFocus.ts | New React hook coordinates focusing ingest-related rows in an AG Grid instance. Maintains a pending-focus queue via refs, resolves focus via focusPendingIngestRows with double requestAnimationFrame scheduling, loads and queues persisted focus targets on mount, listens for window KNOWLEDGE_INGEST_FOCUS_EVENT events, detects newly-processing identities on data changes, and exposes gridRowsSelectionKey and callbacks for grid-ready and row-data-updated events. |
| Knowledge page focus orchestration | frontend/app/knowledge/page.tsx | Integrates the useKnowledgeIngestFocus hook into the knowledge page. Replaces local gridRowsSelectionKey computation with hook-provided values and callbacks. Wires onKnowledgeGridReady and onKnowledgeRowDataUpdated handlers on both cloud-brand and non-cloud AgGrid instances. |
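
The hook summary mentions double requestAnimationFrame scheduling. A plausible reason, which is an assumption and not stated in the PR, is to let the grid finish rendering updated rows before the jump is attempted. The pattern itself can be sketched with an injectable scheduler; `FrameScheduler` and `afterTwoFrames` are hypothetical names.

```typescript
// A scheduler that runs a callback on the next frame. In a browser this would
// wrap requestAnimationFrame; it is injected here so the sketch runs anywhere.
type FrameScheduler = (callback: () => void) => void;

/** Run `task` only after two scheduled frames have elapsed. */
function afterTwoFrames(task: () => void, schedule: FrameScheduler): void {
  schedule(() => {
    schedule(task);
  });
}
```

In a browser the scheduler could be `(cb) => requestAnimationFrame(() => cb())`.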

Backend connector type parameter threading

| Layer | File(s) | Summary |
| --- | --- | --- |
| ConnectorFileProcessor type parameter initialization | src/models/processors.py | ConnectorFileProcessor constructor now accepts an optional `connector_type: str \| None` parameter, allowing callers to override the effective connector type used during processing. |
| Connector type derivation in processing pipeline | src/models/processors.py | process_item derives a local connector_type value from the instance override or falls back to the connection's type, then threads it through Langflow ingestion, standard OpenRAG processing, and connector-metadata update calls. |
| ConnectorService type wiring | src/connectors/service.py | ConnectorService now passes connector_type (from connector.CONNECTOR_TYPE or defaulting to "local") into ConnectorFileProcessor in both sync_connector_files and sync_specific_files. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~28 minutes

Possibly related PRs

  • langflow-ai/openrag#1706: Both PRs modify submitSync and its replaceDuplicates flag in frontend/app/upload/[provider]/page.tsx. This PR adds queueKnowledgeIngestFocusForCloudFiles inside submitSync, while #1706 changes the duplicate-checking flow that determines when and how submitSync(..., false) is called, so the two changes are directly coupled in that function.
  • langflow-ai/openrag#1663: Both PRs extend submitSync in the upload provider ingest flow (frontend/app/upload/[provider]/page.tsx) around the replaceDuplicates/replace_duplicates behavior. This PR uses the flag to enqueue ingest-focus actions, while #1663 uses it to drive filename duplicate replacement during sync.

Suggested reviewers

  • lucaseduoli
  • ricofurtado
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 48.39% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately describes the main change: improving pagination and scroll behavior to navigate to files currently processing ingestion. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |


@github-actions github-actions Bot added bug 🔴 Something isn't working. and removed bug 🔴 Something isn't working. labels Jun 15, 2026

@coderabbitai coderabbitai Bot left a comment

Contributor


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@frontend/lib/knowledge-grid-pagination.ts`:
- Around line 73-75: The code at lines 73-75 queues unseen identities for focus
without verifying their status, which can trigger unwanted pagination jumps for
identities that have already reached `active` or other non-processing states.
Add a status check to ensure the identity is in `processing` state before
pushing it to the identities array in the condition that checks if (!prev). Only
queue identities that are actively processing to maintain the intended
pagination behavior.
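
The status guard suggested in this comment could look roughly like the sketch below. It assumes rows carry an identity and a string status and that the previous pass is tracked as an identity-to-status map; `StatusRow` and `collectProcessingTransitions` are hypothetical names and not the function under review.

```typescript
// Hypothetical row shape for illustration.
interface StatusRow { identity: string; status: string; }

/** Identities that should trigger a focus jump after a data refresh. */
function collectProcessingTransitions(
  previous: Map<string, string>, // identity -> status seen on the last pass
  rows: StatusRow[],
): string[] {
  const identities: string[] = [];
  for (const row of rows) {
    // Guard first: rows that are not processing never queue a jump,
    // even when they have not been seen before.
    if (row.status !== "processing") continue;
    // Unseen rows (undefined) and rows that just entered processing both qualify.
    if (previous.get(row.identity) !== "processing") {
      identities.push(row.identity);
    }
  }
  return identities;
}
```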

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: de265f7b-29cf-435b-86e3-5b9ce21f0f08

📥 Commits

Reviewing files that changed from the base of the PR and between 6f982a9 and cfb6afb.

📒 Files selected for processing (3)
  • frontend/app/knowledge/page.tsx
  • frontend/lib/knowledge-grid-pagination.ts
  • frontend/lib/knowledge-table-state.ts

Comment thread frontend/lib/knowledge-grid-pagination.ts
@Wallgau Wallgau force-pushed the pagination-improvement branch from e1e140f to 05609ab Compare June 15, 2026 20:25
@github-actions github-actions Bot added bug 🔴 Something isn't working. and removed bug 🔴 Something isn't working. labels Jun 15, 2026

@coderabbitai coderabbitai Bot left a comment

Contributor


Actionable comments posted: 2

♻️ Duplicate comments (1)
frontend/lib/knowledge-grid-pagination.ts (1)

127-131: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Expand the queued identity before inferring "new" vs "existing".

This only checks the raw queued identity against each row’s aliases. If the event queued a filename but the existing backend row is only identifiable by source_url/basename, overwrites are misclassified as "new" and jump to the last page instead of the existing row.

🐛 Proposed fix
 export function inferIngestFocusMode(
   identity: string,
   rowData: GridRowLike[],
 ): IngestFocusMode {
-  const identitySet = new Set([identity]);
+  const identitySet = new Set(
+    getKnowledgeFileAliasKeys({ filename: identity, source_url: identity }),
+  );
   const hasMatch = rowData.some((row) =>
     rowMatchesIdentitySet(row, identitySet),
   );
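
To make the proposed fix concrete, here is a self-contained sketch of alias expansion. The bodies of `getKnowledgeFileAliasKeys` and `rowMatchesIdentitySet` are assumptions written for this example (the real helpers are not shown in this thread), as are the `GridRowLike` fields, the `basename` helper, and the final return expression, which the excerpt above cuts off.

```typescript
type IngestFocusMode = "new" | "existing";
interface GridRowLike { filename?: string; source_url?: string; }

// Assumed behavior: the last path segment, ignoring any query string or fragment.
function basename(value: string): string {
  const parts = value.split(/[?#]/)[0].split("/").filter(Boolean);
  return parts.length > 0 ? parts[parts.length - 1] : value;
}

// Assumed behavior: each raw identifier plus its basename.
function getKnowledgeFileAliasKeys(row: GridRowLike): string[] {
  const keys = new Set<string>();
  for (const raw of [row.filename, row.source_url]) {
    const value = raw?.trim();
    if (!value) continue;
    keys.add(value);
    keys.add(basename(value));
  }
  return Array.from(keys);
}

function rowMatchesIdentitySet(row: GridRowLike, identities: Set<string>): boolean {
  return getKnowledgeFileAliasKeys(row).some((key) => identities.has(key));
}

function inferIngestFocusMode(identity: string, rowData: GridRowLike[]): IngestFocusMode {
  // Expand the queued identity first, so a URL can match a row known only by filename.
  const identitySet = new Set(
    getKnowledgeFileAliasKeys({ filename: identity, source_url: identity }),
  );
  const hasMatch = rowData.some((row) => rowMatchesIdentitySet(row, identitySet));
  return hasMatch ? "existing" : "new";
}
```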
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@frontend/lib/knowledge-grid-pagination.ts` around lines 127 - 131, The
function only checks the raw queued identity value against each row's aliases,
which causes misclassification when the identity format differs between the
queued event and existing backend rows (e.g., filename vs source_url/basename).
Before creating the identitySet on line 127, expand the raw identity to include
all its possible identifying variants (such as source_url and basename
equivalents) so that rowMatchesIdentitySet can properly detect existing rows
regardless of which identifier format was used. Look for an existing expand or
normalize function that converts a single identity into its full set of
equivalent identifiers.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@frontend/lib/knowledge-grid-pagination.ts`:
- Around line 231-267: In the `collectResolvableIdentities` function, when an
alias key is found to match an identity in the `identities` set, the function
currently adds that alias key to the `found` result. However, the function
should instead return the original pending identity that corresponds to the
matched alias. Modify both the loop within `api.forEachNodeAfterFilterAndSort`
(around the `getKnowledgeFileAliasKeys(data)` iteration) and the loop iterating
over `rowData` (around the second `getKnowledgeFileAliasKeys(row)` iteration) so
that when `identities.has(key)` is true, you add the original identity itself to
the `found` set rather than the alias key. This ensures that callers receive the
original pending ids and avoid repeated pagination attempts when rows are
matched through their aliases.

In `@frontend/lib/knowledge-table-state.ts`:
- Around line 152-157: The backendIdentityKeys are being built from backendFiles
which has been mutated with aliases copied from taskFile at lines 139-140. This
causes unrelated pending tasks to incorrectly match as already represented and
get dropped. To fix this, build the backendIdentityKeys from the original raw
backend files data before any aliases were merged in, rather than from the
mutated backendFiles variable. This ensures the dedupe keys only contain the
authentic backend identities and prevents false matches with pending tasks.
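
One way to read the first comment above is that callers should get back the identities they queued, whatever alias produced the match. The sketch below works on plain row data only and uses a hypothetical `aliasKeys` stand-in for the PR's alias helper; it is an illustration of the idea, not the reviewed function.

```typescript
interface RowLike { filename?: string; source_url?: string; }

// Hypothetical stand-in for the alias helper: raw values plus their last path segment.
function aliasKeys(row: RowLike): string[] {
  const values = [row.filename, row.source_url].filter((v): v is string => !!v);
  const last = (v: string): string => v.split("/").filter(Boolean).pop() ?? v;
  return Array.from(new Set(values.concat(values.map(last))));
}

/** Return the pending identities (not the matched alias keys) that have a row. */
function collectResolvableIdentities(pending: Set<string>, rows: RowLike[]): Set<string> {
  const rowKeys = new Set<string>();
  for (const row of rows) {
    for (const key of aliasKeys(row)) rowKeys.add(key);
  }
  const found = new Set<string>();
  pending.forEach((identity) => {
    const variants = aliasKeys({ filename: identity, source_url: identity });
    // Record the original queued identity so the caller can clear it from its queue.
    if (variants.some((key) => rowKeys.has(key))) found.add(identity);
  });
  return found;
}
```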


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 29a27e4d-2415-4b2d-beac-24f14e237f6b

📥 Commits

Reviewing files that changed from the base of the PR and between cfb6afb and d01d60c.

📒 Files selected for processing (6)
  • frontend/app/knowledge/page.tsx
  • frontend/app/upload/[provider]/page.tsx
  • frontend/components/AgGrid/registerAgGridModules.ts
  • frontend/components/knowledge-dropdown.tsx
  • frontend/lib/knowledge-grid-pagination.ts
  • frontend/lib/knowledge-table-state.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • frontend/app/knowledge/page.tsx

Comment thread frontend/lib/knowledge-grid-pagination.ts
Comment thread frontend/lib/knowledge-table-state.ts
@github-actions github-actions Bot added backend 🔷 Issues related to backend services (OpenSearch, Langflow, APIs) bug 🔴 Something isn't working. and removed bug 🔴 Something isn't working. labels Jun 15, 2026
…ndFiles, so dedupe only reflects authentic backend identities.
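
A minimal sketch of the dedupe ordering this commit describes: the identity keys are taken from the untouched backend list, before any task aliases are merged in. The `FileLike` shape and both function names are hypothetical.

```typescript
interface FileLike { filename: string; source_url?: string; }

/** Identity keys taken from the files exactly as given. */
function identityKeys(files: FileLike[]): Set<string> {
  const keys = new Set<string>();
  for (const file of files) {
    keys.add(file.filename);
    if (file.source_url) keys.add(file.source_url);
  }
  return keys;
}

/** Pending tasks that the backend does not already represent. */
function pendingTasksNotInBackend(rawBackendFiles: FileLike[], tasks: FileLike[]): FileLike[] {
  // Snapshot the keys from the raw backend data, not from rows that were
  // later enriched with aliases copied from task files.
  const backendKeys = identityKeys(rawBackendFiles);
  return tasks.filter(
    (task) =>
      !backendKeys.has(task.filename) &&
      !(task.source_url !== undefined && backendKeys.has(task.source_url)),
  );
}
```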
@github-actions github-actions Bot added bug 🔴 Something isn't working. and removed bug 🔴 Something isn't working. labels Jun 15, 2026

@coderabbitai coderabbitai Bot left a comment

Contributor


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@frontend/hooks/useKnowledgeIngestFocus.ts`:
- Around line 67-70: The mode value is being unconditionally set at the
pending.modes.set call, which overwrites any previously stored explicit mode
with an inferred mode when a later queue call doesn't provide an explicit mode
parameter. Instead of always using the nullish coalescing operator to fall back
to inferIngestFocusMode, preserve the existing mode for an identity if one is
already stored in pending.modes. Check if the identity already exists in
pending.modes before deciding whether to use the provided mode, infer a new one,
or keep the existing one. This ensures that an explicit mode set in a prior
queue call is not replaced by an inferred mode in subsequent calls for the same
identity.
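
The precedence this comment asks for (explicit mode, then an already stored mode, then an inferred mode) can be sketched as below. `queueFocusMode` and its parameters are hypothetical; only the `pending.modes` map and the inference fallback come from the comment.

```typescript
type IngestFocusMode = "new" | "existing";

function queueFocusMode(
  modes: Map<string, IngestFocusMode>,
  identity: string,
  explicitMode: IngestFocusMode | undefined,
  infer: (identity: string) => IngestFocusMode,
): void {
  if (explicitMode !== undefined) {
    modes.set(identity, explicitMode); // an explicit mode always wins
  } else if (!modes.has(identity)) {
    modes.set(identity, infer(identity)); // infer only when nothing is stored yet
  }
  // Otherwise the stored mode is left untouched.
}
```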

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 0774bc52-6437-4c61-95af-c8359978d3a4

📥 Commits

Reviewing files that changed from the base of the PR and between 3c53962 and abd286e.

📒 Files selected for processing (4)
  • frontend/app/knowledge/page.tsx
  • frontend/hooks/useKnowledgeIngestFocus.ts
  • frontend/lib/knowledge-grid-pagination.ts
  • frontend/lib/knowledge-table-state.ts
🚧 Files skipped from review as they are similar to previous changes (2)
  • frontend/lib/knowledge-grid-pagination.ts
  • frontend/lib/knowledge-table-state.ts

Comment thread frontend/hooks/useKnowledgeIngestFocus.ts
@github-actions github-actions Bot added bug 🔴 Something isn't working. and removed bug 🔴 Something isn't working. labels Jun 15, 2026
correct source icon, and fix connector overwrite so re-ingests merge onto
existing indexed rows instead of duplicating.

- Pass connectorType from connector ingest call sites into addTask
- Store per-task connector type in task-context and resolve overlays via
  inferTaskFileConnectorType
- Restore main icon helpers (inferTaskFileConnectorType,
  resolveKnowledgeRowConnectorType) in knowledge-table-state
- Match task overlays to backend rows via filename aliases and
  cleanConnectorFilename; prefer backend filename/source_url when merging
  onto indexed rows so identity fields stay paired
- Use cleaned connector filenames for cloud ingest focus targets
- Use file_task.filename for connector duplicate check/delete in the backend
@github-actions github-actions Bot added bug 🔴 Something isn't working. and removed bug 🔴 Something isn't working. labels Jun 15, 2026
pagination/overwrite alias matching alongside main's connector icon helpers.
@github-actions github-actions Bot added tests bug 🔴 Something isn't working. and removed bug 🔴 Something isn't working. labels Jun 16, 2026
@github-actions github-actions Bot added bug 🔴 Something isn't working. and removed bug 🔴 Something isn't working. labels Jun 16, 2026
  The two connector sync entry points passed
  `getattr(connector, "CONNECTOR_TYPE", None) or "local"` into
  ConnectorFileProcessor. The `or "local"` made the value always truthy,
  so the processor's `self.connector_type or connection.connector_type`
  fallback never engaged — a connector with CONNECTOR_TYPE unset would be
  mislabeled "local" instead of falling through to the connection's real
  stored type.

  Pass `connector.CONNECTOR_TYPE` directly instead: `connector` is already
  guarded non-None at both sites and CONNECTOR_TYPE is a declared
  BaseConnector attribute, so the getattr default was unreachable. Result
  precedence is now connector type → connection type, with no "local"
  masking. Real connectors (SharePoint/OneDrive/Drive/S3) set
  CONNECTOR_TYPE so their resolved type is unchanged.
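
The masking effect described in this commit message can be reproduced in a few lines. The backend code is Python; the sketch below is a TypeScript transliteration in which `||` plays the role of Python's `or`, and every name in it (`ConnectorLike`, `ConnectionLike`, `resolveConnectorType`, `resolveBeforeFix`, `resolveAfterFix`) is hypothetical.

```typescript
interface ConnectorLike { CONNECTOR_TYPE?: string; }
interface ConnectionLike { connector_type: string; }

// Processor-side fallback: use the override if it is set, else the connection's stored type.
function resolveConnectorType(override: string | undefined, connection: ConnectionLike): string {
  return override || connection.connector_type;
}

// Before the fix: the "local" default is always truthy, so the fallback never runs.
function resolveBeforeFix(connector: ConnectorLike, connection: ConnectionLike): string {
  return resolveConnectorType(connector.CONNECTOR_TYPE || "local", connection);
}

// After the fix: an unset type falls through to the connection's stored type.
function resolveAfterFix(connector: ConnectorLike, connection: ConnectionLike): string {
  return resolveConnectorType(connector.CONNECTOR_TYPE, connection);
}
```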
@github-actions github-actions Bot added bug 🔴 Something isn't working. and removed bug 🔴 Something isn't working. labels Jun 16, 2026

@lucaseduoli lucaseduoli left a comment

Collaborator


Tested and approved

@github-actions github-actions Bot added the lgtm label Jun 16, 2026
@github-actions github-actions Bot added bug 🔴 Something isn't working. and removed bug 🔴 Something isn't working. labels Jun 16, 2026
Labels

backend 🔷 Issues related to backend services (OpenSearch, Langflow, APIs) bug 🔴 Something isn't working. frontend 🟨 Issues related to the UI/UX lgtm tests

3 participants