feat: improve ingest tracking events#1933
Conversation
WalkthroughAdds a new ChangesIngestion Analytics Instrumentation
Sequence Diagram(s)sequenceDiagram
participant EntryPoint as Ingestion Entry Point<br/>(Page/Component)
participant trackStartProcess
participant syncMutation as Sync Mutation
participant TaskProvider
participant trackProcessFailure
participant trackProcessSuccess
EntryPoint->>trackStartProcess: processType, category, source, connector_type, file/bucket count
EntryPoint->>syncMutation: initiate ingestion
alt Ingestion succeeds
syncMutation-->>TaskProvider: addTask(taskId, {source, connectorType})
TaskProvider->>TaskProvider: store source in taskSourcesRef
TaskProvider->>TaskProvider: store connectorType in taskConnectorTypesRef
Note over TaskProvider: Task completion handler
TaskProvider->>TaskProvider: compute embedding_model, connector_type, source
TaskProvider->>trackProcessSuccess: embedding_model, connector_type, source
else Ingestion fails
syncMutation-->>trackProcessFailure: error message
trackProcessFailure->>EntryPoint: failure analytics recorded
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
…rag into improve-ingest-tracking
mpawlow
left a comment
There was a problem hiding this comment.
Code Review 1
- ✅ Approved / LGTM 🚀
- See PR comments (1a), (1b), (1d) for Minor feedback for potential consideration
- See PR comment: (1c) to verify intended behavior
| }: T & ButtonEventParams): void => | ||
| track("Button Clicked", { action, ...rest } as Record<string, unknown>); | ||
|
|
||
| interface StartProcessParams { |
There was a problem hiding this comment.
(1a) [Minor] source field is untyped — bypasses compile-time validation
Problem
StartProcessParamsdefinesprocessType,process, andcategorybut omitssource- Every call site passes
sourceas an extra generic prop (typed asT = Record<string, unknown>), which bypasses TypeScript compile-time checking - Valid values (
"file","folder","path","connector") are undocumented at the type level — a future caller could pass any arbitrary string with no warning
Code References
frontend/app/upload/[provider]/page.tsxline 100 (source: "connector")frontend/components/connectors/shared-bucket-view.tsxline 85 (source: "connector")frontend/components/knowledge-dropdown.tsxlines 341, 373, 600 (source: "file","folder","path")
Potential Solution
- Add
sourceas an optional typed union toStartProcessParams:interface StartProcessParams { processType: string; process?: string; category?: string; source?: "file" | "folder" | "path" | "connector"; }
- This follows the same pattern as
EndProcessParamsand makes valid values discoverable/enforced by the compiler
There was a problem hiding this comment.
Because this is for tracking only type safety is pretty low priority
|
|
||
| const uploadFile = async (file: File, replace: boolean) => { | ||
| setFileUploading(true); | ||
| trackStartProcess({ |
There was a problem hiding this comment.
(1b) [Minor] "Started Process" event fires even when uploadFile HTTP request fails — orphaned start event
Problem
- In
knowledge-dropdown.tsx:uploadFile,trackStartProcessis called unconditionally before thetryblock - If
uploadFileUtilthrows (e.g., network failure, 4xx/5xx response), thecatchblock shows an error toast but no task is created server-side - Without a task,
task-context.tsxnever firestrackProcessSuccessortrackProcessFailure - Result: a "Started Process" analytics event has no corresponding "Ended Process" event — these orphaned events inflate apparent funnel drop-off and skew conversion metrics
Code References
frontend/lib/analytics.tslines 79–81 (trackStartProcess)frontend/contexts/task-context.tsxlines 438–463 (completion event paths)
Potential Solution
- Move
trackStartProcessinto thetryblock, after a successful response fromuploadFileUtil:const uploadFile = async (file: File, replace: boolean) => { setFileUploading(true); try { await uploadFileUtil(file, replace); trackStartProcess({ processType: "Ingestion", process: "Document Upload", category: "Knowledge", source: "file", total_files: 1, }); refetchTasks(); } catch (error) { ... }
- This ensures a "Started Process" event only fires when a task has been accepted by the server
There was a problem hiding this comment.
fixed by adding a failed event in the catch blocks
| filesToUpload: File[], | ||
| replace: boolean, | ||
| ) => { | ||
| trackStartProcess({ |
There was a problem hiding this comment.
(1c) [Normal] Batch folder uploads produce a 1:N start-to-end event ratio
Problem
uploadFolderBatchesfires a singletrackStartProcessevent once at entry- It then creates one task per batch via
uploadFilesin a loop (line 389:addTask(result.taskId)) - Each task completion independently fires
trackProcessSuccessortrackProcessFailureviatask-context.tsx - For a 100-file folder upload with
uploadBatchSize = 10, this produces 1 start event and 10 end events — a 1:10 mismatch
Background Information
- Funnel analysis tools (Segment included) assume a 1:1 start/end relationship per user action
- A 1:N ratio causes the funnel to show an inflated completion rate (multiple end events per start)
- It also makes aggregate metrics like
total_filesinconsistent: the start event reportstotal_files: 100, but each end event reportstotal_files: 10
Code References
frontend/contexts/task-context.tsxlines 421–463 (per-task completion tracking)
Potential Solution
- Track a single logical start and single logical end by waiting for all batches to resolve, then emitting one summary end event from
uploadFolderBatchesrather than relying on per-task completion events - Alternatively, move the start event inside the batch loop so each batch has its own paired start/end:
for (const batch of batches) { try { const result = await uploadFiles(batch, replace); trackStartProcess({ ..., total_files: batch.length }); addTask(result.taskId); } catch (error) { ... } }
There was a problem hiding this comment.
I think we keep this for now unless there are issues w/ the amplitude dashboards.
| total_files: currentTask.total_files, | ||
| failed_files: failedFiles, | ||
| duration_seconds: currentTask.duration_seconds, | ||
| embedding_model: embeddingModel, |
There was a problem hiding this comment.
(1d) [Minor] source dimension missing from completion events — breaks source-level funnel analysis
Problem
- All five
trackStartProcesscall sites include asourcefield ("file","folder","path","connector") - Neither
trackProcessSuccessnortrackProcessFailureintask-context.tsxincludes asourcefield - This means it is impossible to segment completion rates by upload source (e.g., "what % of connector ingestions succeed vs file uploads?")
Code References
frontend/contexts/task-context.tsxlines 438–463 (trackProcessFailure/trackProcessSuccesscalls)frontend/components/knowledge-dropdown.tsxlines 337, 370, 596 (start events withsource)
Potential Solution
- Derive or propagate
sourcealongsideconnector_typeand include it in completion payloads:const source = connectorType === "local" ? "file" // or use a separate ref to store the source per task_id : "connector"; trackProcessSuccess({ ..., connector_type: connectorType, source, });
- For finer granularity (
"file"vs"folder"vs"path"), store the source in a ref alongsidetaskConnectorTypesRef, similar to how connector type is stored
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
frontend/contexts/task-context.tsx (1)
437-474:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftTerminal failure path missing computed metadata — inconsistent analytics.
The completed-with-failures path (lines 447-474) correctly computes and includes
embedding_model,connector_type, andsourcein thetrackProcessFailureandtrackProcessSuccesspayloads. However, the terminal failure path at lines 618-627 callstrackProcessFailurewithout these computed fields:trackProcessFailure({ processType: "Ingestion", process: "Document Upload", category: "Knowledge", task_id: currentTask.task_id, total_files: currentTask.total_files, failed_files: currentTask.failed_files, duration_seconds: currentTask.duration_seconds, resultValue: currentTask.error, // ❌ Missing: embedding_model, connector_type, source });This creates inconsistent analytics data where some ingestion failures include source attribution and others don't, breaking source-level funnel analysis and completion-rate segmentation for terminal failures.
🔧 Proposed fix
Move the metadata computation logic outside the completed handler or duplicate it before the terminal failure tracking:
if ( shouldShowToast && previousTask && !isTerminalFailedTask(previousTask) && isTerminalFailedTask(currentTask) ) { if (!isOnboardingActive) { selectTask(currentTask.task_id); setIsMenuOpen(true); setIsRecentTasksExpanded(true); } + + const firstFile = currentTask.files + ? Object.values(currentTask.files)[0] + : undefined; + const embeddingModel = firstFile?.embedding_model; + const connectorType = + taskConnectorTypesRef.current.get(currentTask.task_id) || "local"; + const source = + taskSourcesRef.current.get(currentTask.task_id) || + (connectorType === "local" ? "file" : "connector"); + // Task just failed - show error toast trackProcessFailure({ processType: "Ingestion", process: "Document Upload", category: "Knowledge", task_id: currentTask.task_id, total_files: currentTask.total_files, failed_files: currentTask.failed_files, duration_seconds: currentTask.duration_seconds, resultValue: currentTask.error, + embedding_model: embeddingModel, + connector_type: connectorType, + source, });This resolves the past review comment for completed tasks and extends the fix to terminal failures.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@frontend/contexts/task-context.tsx` around lines 437 - 474, The terminal failure path for trackProcessFailure calls (around lines 618-627) is missing the computed metadata fields (embedding_model, connector_type, and source) that are correctly included in the completed-with-failures path (lines 437-474). Extract the metadata computation logic that derives embedding_model from currentTask.files, connector_type from taskConnectorTypesRef, and source from taskSourcesRef, and duplicate this logic before the terminal failure trackProcessFailure call to ensure it includes all the same fields as the completed path, making the analytics payloads consistent across all failure tracking scenarios.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@frontend/contexts/task-context.tsx`:
- Around line 437-474: The terminal failure path for trackProcessFailure calls
(around lines 618-627) is missing the computed metadata fields (embedding_model,
connector_type, and source) that are correctly included in the
completed-with-failures path (lines 437-474). Extract the metadata computation
logic that derives embedding_model from currentTask.files, connector_type from
taskConnectorTypesRef, and source from taskSourcesRef, and duplicate this logic
before the terminal failure trackProcessFailure call to ensure it includes all
the same fields as the completed path, making the analytics payloads consistent
across all failure tracking scenarios.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: d7f8f9dd-b3e1-44df-861a-8c1fdccd7afc
📒 Files selected for processing (6)
frontend/app/chat/page.tsxfrontend/app/connectors/page.tsxfrontend/app/upload/[provider]/page.tsxfrontend/components/connectors/shared-bucket-view.tsxfrontend/components/knowledge-dropdown.tsxfrontend/contexts/task-context.tsx
🚧 Files skipped from review as they are similar to previous changes (1)
- frontend/app/upload/[provider]/page.tsx
Improves event tracking details around start and completion of ingest
Summary by CodeRabbit
source(such as file/folder/path/connector), improving the accuracy of completion and error tracking.