fix: improve pagination and scroll to the file row that is currently processing the ingestion#1884
fix: improve pagination and scroll to the file row that is currently processing the ingestion#1884Wallgau wants to merge 14 commits into
Conversation
…ssing the ingestion
…ssing the ingestion, make sure it works for connectors too
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughAdds a new ChangesAuto-focus pending ingest rows in knowledge grid
Backend connector type parameter threading
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~28 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@frontend/lib/knowledge-grid-pagination.ts`:
- Around line 73-75: The code at lines 73-75 queues unseen identities for focus
without verifying their status, which can trigger unwanted pagination jumps for
identities that have already reached `active` or other non-processing states.
Add a status check to ensure the identity is in `processing` state before
pushing it to the identities array in the condition that checks if (!prev). Only
queue identities that are actively processing to maintain the intended
pagination behavior.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: de265f7b-29cf-435b-86e3-5b9ce21f0f08
📒 Files selected for processing (3)
frontend/app/knowledge/page.tsxfrontend/lib/knowledge-grid-pagination.tsfrontend/lib/knowledge-table-state.ts
e1e140f to
05609ab
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (1)
frontend/lib/knowledge-grid-pagination.ts (1)
127-131:⚠️ Potential issue | 🟠 Major | ⚡ Quick winExpand the queued identity before inferring
"new"vs"existing".This only checks the raw queued
identityagainst each row’s aliases. If the event queued a filename but the existing backend row is only identifiable bysource_url/basename, overwrites are misclassified as"new"and jump to the last page instead of the existing row.🐛 Proposed fix
export function inferIngestFocusMode( identity: string, rowData: GridRowLike[], ): IngestFocusMode { - const identitySet = new Set([identity]); + const identitySet = new Set( + getKnowledgeFileAliasKeys({ filename: identity, source_url: identity }), + ); const hasMatch = rowData.some((row) => rowMatchesIdentitySet(row, identitySet), );🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@frontend/lib/knowledge-grid-pagination.ts` around lines 127 - 131, The function only checks the raw queued identity value against each row's aliases, which causes misclassification when the identity format differs between the queued event and existing backend rows (e.g., filename vs source_url/basename). Before creating the identitySet on line 127, expand the raw identity to include all its possible identifying variants (such as source_url and basename equivalents) so that rowMatchesIdentitySet can properly detect existing rows regardless of which identifier format was used. Look for an existing expand or normalize function that converts a single identity into its full set of equivalent identifiers.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@frontend/lib/knowledge-grid-pagination.ts`:
- Around line 231-267: In the `collectResolvableIdentities` function, when an
alias key is found to match an identity in the `identities` set, the function
currently adds that alias key to the `found` result. However, the function
should instead return the original pending identity that corresponds to the
matched alias. Modify both the loop within `api.forEachNodeAfterFilterAndSort`
(around the `getKnowledgeFileAliasKeys(data)` iteration) and the loop iterating
over `rowData` (around the second `getKnowledgeFileAliasKeys(row)` iteration) so
that when `identities.has(key)` is true, you add the original identity itself to
the `found` set rather than the alias key. This ensures that callers receive the
original pending ids and avoid repeated pagination attempts when rows are
matched through their aliases.
In `@frontend/lib/knowledge-table-state.ts`:
- Around line 152-157: The backendIdentityKeys are being built from backendFiles
which has been mutated with aliases copied from taskFile at lines 139-140. This
causes unrelated pending tasks to incorrectly match as already represented and
get dropped. To fix this, build the backendIdentityKeys from the original raw
backend files data before any aliases were merged in, rather than from the
mutated backendFiles variable. This ensures the dedupe keys only contain the
authentic backend identities and prevents false matches with pending tasks.
---
Duplicate comments:
In `@frontend/lib/knowledge-grid-pagination.ts`:
- Around line 127-131: The function only checks the raw queued identity value
against each row's aliases, which causes misclassification when the identity
format differs between the queued event and existing backend rows (e.g.,
filename vs source_url/basename). Before creating the identitySet on line 127,
expand the raw identity to include all its possible identifying variants (such
as source_url and basename equivalents) so that rowMatchesIdentitySet can
properly detect existing rows regardless of which identifier format was used.
Look for an existing expand or normalize function that converts a single
identity into its full set of equivalent identifiers.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 29a27e4d-2415-4b2d-beac-24f14e237f6b
📒 Files selected for processing (6)
frontend/app/knowledge/page.tsxfrontend/app/upload/[provider]/page.tsxfrontend/components/AgGrid/registerAgGridModules.tsfrontend/components/knowledge-dropdown.tsxfrontend/lib/knowledge-grid-pagination.tsfrontend/lib/knowledge-table-state.ts
🚧 Files skipped from review as they are similar to previous changes (1)
- frontend/app/knowledge/page.tsx
…ndFiles, so dedupe only reflects authentic backend identities.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@frontend/hooks/useKnowledgeIngestFocus.ts`:
- Around line 67-70: The mode value is being unconditionally set at the
pending.modes.set call, which overwrites any previously stored explicit mode
with an inferred mode when a later queue call doesn't provide an explicit mode
parameter. Instead of always using the nullish coalescing operator to fall back
to inferIngestFocusMode, preserve the existing mode for an identity if one is
already stored in pending.modes. Check if the identity already exists in
pending.modes before deciding whether to use the provided mode, infer a new one,
or keep the existing one. This ensures that an explicit mode set in a prior
queue call is not replaced by an inferred mode in subsequent calls for the same
identity.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 0774bc52-6437-4c61-95af-c8359978d3a4
📒 Files selected for processing (4)
frontend/app/knowledge/page.tsxfrontend/hooks/useKnowledgeIngestFocus.tsfrontend/lib/knowledge-grid-pagination.tsfrontend/lib/knowledge-table-state.ts
🚧 Files skipped from review as they are similar to previous changes (2)
- frontend/lib/knowledge-grid-pagination.ts
- frontend/lib/knowledge-table-state.ts
correct source icon, and fix connector overwrite so re-ingests merge onto existing indexed rows instead of duplicating. - Pass connectorType from connector ingest call sites into addTask - Store per-task connector type in task-context and resolve overlays via inferTaskFileConnectorType - Restore main icon helpers (inferTaskFileConnectorType, resolveKnowledgeRowConnectorType) in knowledge-table-state - Match task overlays to backend rows via filename aliases and cleanConnectorFilename; prefer backend filename/source_url when merging onto indexed rows so identity fields stay paired - Use cleaned connector filenames for cloud ingest focus targets - Use file_task.filename for connector duplicate check/delete in the backend
pagination/overwrite alias matching alongside main's connector icon helpers.
The two connector sync entry points passed `getattr(connector, "CONNECTOR_TYPE", None) or "local"` into ConnectorFileProcessor. The `or "local"` made the value always truthy, so the processor's `self.connector_type or connection.connector_type` fallback never engaged — a connector with CONNECTOR_TYPE unset would be mislabeled "local" instead of falling through to the connection's real stored type. Pass `connector.CONNECTOR_TYPE` directly instead: `connector` is already guarded non-None at both sites and CONNECTOR_TYPE is a declared BaseConnector attribute, so the getattr default was unreachable. Result precedence is now connector type → connection type, with no "local" masking. Real connectors (SharePoint/OneDrive/Drive/S3) set CONNECTOR_TYPE so their resolved type is unchanged.
lucaseduoli
left a comment
There was a problem hiding this comment.
Tested and approved
Screen.Recording.2026-06-15.at.4.03.46.PM.mov
Summary
When ingesting files on the Knowledge page, the grid now automatically jumps to the correct pagination page so users can see processing status without manually hunting through pages.
Overwrite (file already indexed): navigates to the page containing that file’s existing row
New file: navigates to the last page where the processing overlay is appended
Cloud uploads (Google Drive, OneDrive, SharePoint): focus targets persist across the redirect from /upload/[provider] → /knowledge
Problem
The Knowledge table uses AG Grid with client-side pagination (25 rows/page). After uploading or overwriting a file, the grid stayed on the current page — e.g. staying on page 1 while an overwritten file on page 2 entered processing. Local uploads and cloud connector uploads both had this gap; cloud uploads additionally lost focus events because navigation unmounted the Knowledge page before the event could fire.
Test plan
Overwrite on another page: On page 1, overwrite a file whose row is on page 2 → grid jumps to page 2, row shows processing
New local upload: On page 1, upload a new file → grid jumps to last page when processing row appears
Google Drive upload: Ingest from /upload/google_drive, redirected to Knowledge → grid jumps to correct page
Google Drive overwrite: Pick an existing file, confirm overwrite → grid jumps to that file’s page
Already on correct page: Overwrite while viewing the file’s page → stays on same page, status updates
No duplicate/missing rows: Paginate through all pages after mixed uploads; no ghost or duplicate rows
Summary by CodeRabbit