
feat: improve ingest tracking events#1933

Merged
mfortman11 merged 5 commits into main from improve-ingest-tracking on Jun 22, 2026

@mfortman11 mfortman11 commented Jun 18, 2026

Contributor

Improves event tracking details around the start and completion of ingest.

Summary by CodeRabbit

  • Chores
    • Improved ingestion analytics across document upload, shared bucket ingestion, and connector sync by emitting “process started” and failure events with richer context (source/origin, connector type, totals, and embedding model when available).
    • Updated task creation/registration so ingestion tasks are consistently tagged with a source (such as file/folder/path/connector), improving the accuracy of completion and error tracking.

@github-actions github-actions Bot added the frontend 🟨 and community labels Jun 18, 2026
@coderabbitai

coderabbitai Bot commented Jun 18, 2026

Contributor

Walkthrough

Adds a new trackStartProcess analytics function emitting a "Started Process" event called at every ingestion entry point. Introduces a source field to task metadata tracking alongside existing connectorType. Calls trackProcessFailure at ingestion error paths. Enriches task completion analytics with embedding_model, connector_type, and source fields computed from tracked refs.
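As a rough illustration of the walkthrough, the new helper could look something like the sketch below. The `track` function here is a hypothetical in-memory stand-in for the existing helper in frontend/lib/analytics.ts, and the generic parameter mirrors the pattern visible in the review thread rather than the exact merged code.

```typescript
// Hypothetical stand-in for the existing track() helper; it only records
// events in memory so the sketch stays self-contained.
type EventProps = Record<string, unknown>;
const recordedEvents: { name: string; props: EventProps }[] = [];
const track = (name: string, props: EventProps): void => {
  recordedEvents.push({ name, props });
};

// Shape described in the walkthrough: required processType,
// optional process and category.
interface StartProcessParams {
  processType: string;
  process?: string;
  category?: string;
}

// Extra properties (source, connector_type, total_files, ...) ride along
// through the generic T, as in the other helpers in the analytics module.
const trackStartProcess = <T extends EventProps>(
  params: T & StartProcessParams,
): void => track("Started Process", { ...params } as EventProps);

// Example call at an ingestion entry point.
trackStartProcess({
  processType: "Ingestion",
  process: "Document Upload",
  category: "Knowledge",
  source: "file",
  total_files: 1,
});
```

Keeping the extra fields in a generic keeps call sites flexible, at the cost of leaving those fields unchecked by the compiler.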

Changes

Ingestion Analytics Instrumentation

| Layer / File(s) | Summary |
| --- | --- |
| trackStartProcess analytics event<br>frontend/lib/analytics.ts | Defines StartProcessParams (with processType, optional process, category) and exports trackStartProcess, which calls the existing track() helper under the "Started Process" event name. |
| Task metadata infrastructure<br>frontend/contexts/task-context.tsx | Introduces taskSourcesRef to track per-task source, consolidates metadata cleanup into clearTaskMetadata helper that clears both connector type and source, updates addTask signature to accept optional source field, and applies unified cleanup across all task lifecycle events. |
| Ingestion entry points<br>frontend/app/upload/[provider]/page.tsx, frontend/components/connectors/shared-bucket-view.tsx, frontend/components/knowledge-dropdown.tsx | Calls trackStartProcess at ingestion start with process metadata and source; calls trackProcessFailure on errors with error message; and calls addTask with source alongside connectorType. Updated SharedBucketViewProps.addTask type to include optional source field. |
| Source metadata propagation<br>frontend/app/chat/page.tsx, frontend/app/connectors/page.tsx | Updates addTask calls to include source field in metadata object alongside connectorType. |
| Task completion analytics enrichment<br>frontend/contexts/task-context.tsx | Derives embedding_model from first task file, connector_type from taskConnectorTypesRef (defaulting to "local"), and source from taskSourcesRef (defaulting to "file" for local, else "connector"), then includes all three fields in trackProcessFailure and trackProcessSuccess payloads. |
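
The task metadata bookkeeping summarized above can be pictured with a small framework-free sketch. In the PR the maps live in React refs (taskConnectorTypesRef, taskSourcesRef); plain Maps and the simplified `addTask` signature below are assumptions made so the example runs without React.

```typescript
// Hypothetical, simplified metadata shape accepted when a task is registered.
interface TaskMetadata {
  connectorType?: string;
  source?: string;
}

// Plain Maps standing in for taskConnectorTypesRef.current and
// taskSourcesRef.current.
const taskConnectorTypes = new Map<string, string>();
const taskSources = new Map<string, string>();

// Register a task and remember whichever metadata fields were supplied.
function addTask(taskId: string, metadata: TaskMetadata = {}): void {
  if (metadata.connectorType) {
    taskConnectorTypes.set(taskId, metadata.connectorType);
  }
  if (metadata.source) {
    taskSources.set(taskId, metadata.source);
  }
}

// One cleanup helper so both maps are always cleared together when a task
// reaches the end of its lifecycle.
function clearTaskMetadata(taskId: string): void {
  taskConnectorTypes.delete(taskId);
  taskSources.delete(taskId);
}

addTask("task-1", { connectorType: "local", source: "folder" });
```

Routing every lifecycle exit through a single cleanup helper is what keeps the two maps from drifting apart.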

Sequence Diagram(s)

sequenceDiagram
  participant EntryPoint as Ingestion Entry Point<br/>(Page/Component)
  participant trackStartProcess
  participant syncMutation as Sync Mutation
  participant TaskProvider
  participant trackProcessFailure
  participant trackProcessSuccess

  EntryPoint->>trackStartProcess: processType, category, source, connector_type, file/bucket count
  EntryPoint->>syncMutation: initiate ingestion
  alt Ingestion succeeds
    syncMutation-->>TaskProvider: addTask(taskId, {source, connectorType})
    TaskProvider->>TaskProvider: store source in taskSourcesRef
    TaskProvider->>TaskProvider: store connectorType in taskConnectorTypesRef
    Note over TaskProvider: Task completion handler
    TaskProvider->>TaskProvider: compute embedding_model, connector_type, source
    TaskProvider->>trackProcessSuccess: embedding_model, connector_type, source
  else Ingestion fails
    syncMutation-->>trackProcessFailure: error message
    trackProcessFailure->>EntryPoint: failure analytics recorded
  end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • langflow-ai/openrag#1887: Directly related — modifies task-context.tsx to store and infer per-task connector_type via taskConnectorTypesRef, which this PR consumes and extends with the new source metadata field.

Suggested reviewers

  • ricofurtado
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main change: adding improved tracking events throughout the ingestion process across multiple components. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@github-actions github-actions Bot added enhancement 🔵 New feature or request and removed enhancement 🔵 New feature or request labels Jun 18, 2026
@github-actions github-actions Bot added enhancement 🔵 New feature or request and removed enhancement 🔵 New feature or request labels Jun 22, 2026
@mpawlow mpawlow self-requested a review June 22, 2026 17:04

@mpawlow mpawlow left a comment

Collaborator

Code Review 1

  • ✅ Approved / LGTM 🚀
  • See PR comments (1a), (1b), (1d) for Minor feedback for potential consideration
  • See PR comment: (1c) to verify intended behavior

Comment thread frontend/lib/analytics.ts
}: T & ButtonEventParams): void =>
track("Button Clicked", { action, ...rest } as Record<string, unknown>);

interface StartProcessParams {

Collaborator

(1a) [Minor] source field is untyped — bypasses compile-time validation

Problem

  • StartProcessParams defines processType, process, and category but omits source
  • Every call site passes source as an extra generic prop (typed as T = Record<string, unknown>), which bypasses TypeScript compile-time checking
  • Valid values ("file", "folder", "path", "connector") are undocumented at the type level — a future caller could pass any arbitrary string with no warning

Code References

  • frontend/app/upload/[provider]/page.tsx line 100 (source: "connector")
  • frontend/components/connectors/shared-bucket-view.tsx line 85 (source: "connector")
  • frontend/components/knowledge-dropdown.tsx lines 341, 373, 600 (source: "file", "folder", "path")

Potential Solution

  • Add source as an optional typed union to StartProcessParams:
    interface StartProcessParams {
      processType: string;
      process?: string;
      category?: string;
      source?: "file" | "folder" | "path" | "connector";
    }
  • This follows the same pattern as EndProcessParams and makes valid values discoverable/enforced by the compiler
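
One possible variation on the suggestion above, sketched under the assumption that the four values named in this comment are the complete set: deriving the union from a const tuple keeps the list in one place and also permits a runtime check. `INGEST_SOURCES`, `IngestSource`, and `isIngestSource` are hypothetical names, not identifiers from the PR.

```typescript
// Assumed list of valid sources, taken from the values named in this comment.
const INGEST_SOURCES = ["file", "folder", "path", "connector"] as const;

// Union derived from the tuple: "file" | "folder" | "path" | "connector".
type IngestSource = (typeof INGEST_SOURCES)[number];

// Runtime guard for values that arrive as plain strings (hypothetical helper).
function isIngestSource(value: string): value is IngestSource {
  return INGEST_SOURCES.some((candidate) => candidate === value);
}

interface StartProcessParams {
  processType: string;
  process?: string;
  category?: string;
  source?: IngestSource;
}

// A well-typed payload; an unknown source string would be flagged here
// because `source` is declared with the union type.
const params: StartProcessParams = {
  processType: "Ingestion",
  source: "folder",
};
```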

Contributor Author

Because this is for tracking only, type safety is pretty low priority.


const uploadFile = async (file: File, replace: boolean) => {
setFileUploading(true);
trackStartProcess({

Collaborator

(1b) [Minor] "Started Process" event fires even when uploadFile HTTP request fails — orphaned start event

Problem

  • In knowledge-dropdown.tsx:uploadFile, trackStartProcess is called unconditionally before the try block
  • If uploadFileUtil throws (e.g., network failure, 4xx/5xx response), the catch block shows an error toast but no task is created server-side
  • Without a task, task-context.tsx never fires trackProcessSuccess or trackProcessFailure
  • Result: a "Started Process" analytics event has no corresponding "Ended Process" event — these orphaned events inflate apparent funnel drop-off and skew conversion metrics

Code References

  • frontend/lib/analytics.ts lines 79–81 (trackStartProcess)
  • frontend/contexts/task-context.tsx lines 438–463 (completion event paths)

Potential Solution

  • Move trackStartProcess into the try block, after a successful response from uploadFileUtil:
    const uploadFile = async (file: File, replace: boolean) => {
      setFileUploading(true);
      try {
        await uploadFileUtil(file, replace);
        trackStartProcess({
          processType: "Ingestion",
          process: "Document Upload",
          category: "Knowledge",
          source: "file",
          total_files: 1,
        });
        refetchTasks();
      } catch (error) { ... }
  • This ensures a "Started Process" event only fires when a task has been accepted by the server

Contributor Author

Fixed by adding a failed event in the catch blocks.
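
A minimal sketch of the approach described in this reply, assuming the start event stays before the request and the catch block emits the matching failure. The in-memory recorder, the injected `upload` callback (a stand-in for uploadFileUtil), and the payload fields are illustrative assumptions, not the merged code.

```typescript
// Hypothetical in-memory recorder standing in for the real analytics helpers.
type Detail = Record<string, unknown>;
const events: { kind: "start" | "failure"; detail: Detail }[] = [];
const trackStartProcess = (detail: Detail): void => {
  events.push({ kind: "start", detail });
};
const trackProcessFailure = (detail: Detail): void => {
  events.push({ kind: "failure", detail });
};

// `upload` is injected so the sketch can be exercised without a network.
async function uploadFile(
  fileName: string,
  upload: (name: string) => Promise<void>,
): Promise<void> {
  const base = {
    processType: "Ingestion",
    process: "Document Upload",
    category: "Knowledge",
    source: "file",
  };
  trackStartProcess({ ...base, total_files: 1 });
  try {
    await upload(fileName);
  } catch (error) {
    // No task is created when the request fails, so the end event has to be
    // emitted here to pair with the start event above.
    const message = error instanceof Error ? error.message : String(error);
    trackProcessFailure({ ...base, resultValue: message });
  }
}
```

With this shape a failed request yields one start and one failure event, while a successful request leaves the end event to the task completion handler.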

filesToUpload: File[],
replace: boolean,
) => {
trackStartProcess({

Collaborator

(1c) [Normal] Batch folder uploads produce a 1:N start-to-end event ratio

Problem

  • uploadFolderBatches fires a single trackStartProcess event once at entry
  • It then creates one task per batch via uploadFiles in a loop (line 389: addTask(result.taskId))
  • Each task completion independently fires trackProcessSuccess or trackProcessFailure via task-context.tsx
  • For a 100-file folder upload with uploadBatchSize = 10, this produces 1 start event and 10 end events — a 1:10 mismatch

Background Information

  • Funnel analysis tools (Segment included) assume a 1:1 start/end relationship per user action
  • A 1:N ratio causes the funnel to show an inflated completion rate (multiple end events per start)
  • It also makes aggregate metrics like total_files inconsistent: the start event reports total_files: 100, but each end event reports total_files: 10

Code References

  • frontend/contexts/task-context.tsx lines 421–463 (per-task completion tracking)

Potential Solution

  • Track a single logical start and single logical end by waiting for all batches to resolve, then emitting one summary end event from uploadFolderBatches rather than relying on per-task completion events
  • Alternatively, move the start event inside the batch loop so each batch has its own paired start/end:
    for (const batch of batches) {
      try {
        const result = await uploadFiles(batch, replace);
        trackStartProcess({ ..., total_files: batch.length });
        addTask(result.taskId);
      } catch (error) { ... }
    }
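
The 1:10 arithmetic from the problem statement can be checked with a small sketch. The `chunk` helper and the placeholder file names are hypothetical; only the 100-file and batch-size-10 figures come from the comment above.

```typescript
// Split a list into consecutive batches of at most `size` items
// (hypothetical helper, not taken from the PR).
function chunk<T>(items: T[], size: number): T[][] {
  if (size <= 0) throw new RangeError("size must be positive");
  const result: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    result.push(items.slice(i, i + size));
  }
  return result;
}

// 100 placeholder file names and the batch size used in the example above.
const files: string[] = [];
for (let i = 0; i < 100; i += 1) {
  files.push("file-" + i + ".txt");
}
const uploadBatchSize = 10;
const batches = chunk(files, uploadBatchSize);

// One trackStartProcess call at entry versus one completion event per task,
// with one task created per batch.
const startEvents = 1;
const endEvents = batches.length;
```

Under these assumptions `endEvents` comes out to 10 against a single start event, which is the mismatch the comment describes.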

Contributor Author

I think we keep this for now unless there are issues with the Amplitude dashboards.

total_files: currentTask.total_files,
failed_files: failedFiles,
duration_seconds: currentTask.duration_seconds,
embedding_model: embeddingModel,

Collaborator

(1d) [Minor] source dimension missing from completion events — breaks source-level funnel analysis

Problem

  • All five trackStartProcess call sites include a source field ("file", "folder", "path", "connector")
  • Neither trackProcessSuccess nor trackProcessFailure in task-context.tsx includes a source field
  • This means it is impossible to segment completion rates by upload source (e.g., "what % of connector ingestions succeed vs file uploads?")

Code References

  • frontend/contexts/task-context.tsx lines 438–463 (trackProcessFailure / trackProcessSuccess calls)
  • frontend/components/knowledge-dropdown.tsx lines 337, 370, 596 (start events with source)

Potential Solution

  • Derive or propagate source alongside connector_type and include it in completion payloads:
    const source = connectorType === "local"
      ? "file"   // or use a separate ref to store the source per task_id
      : "connector";
    
    trackProcessSuccess({
      ...,
      connector_type: connectorType,
      source,
    });
  • For finer granularity ("file" vs "folder" vs "path"), store the source in a ref alongside taskConnectorTypesRef, similar to how connector type is stored

@github-actions github-actions Bot added the lgtm label Jun 22, 2026
@github-actions github-actions Bot added enhancement 🔵 New feature or request and removed enhancement 🔵 New feature or request labels Jun 22, 2026

@coderabbitai coderabbitai Bot left a comment

Contributor

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
frontend/contexts/task-context.tsx (1)

437-474: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Terminal failure path missing computed metadata — inconsistent analytics.

The completed-with-failures path (lines 447-474) correctly computes and includes embedding_model, connector_type, and source in the trackProcessFailure and trackProcessSuccess payloads. However, the terminal failure path at lines 618-627 calls trackProcessFailure without these computed fields:

trackProcessFailure({
  processType: "Ingestion",
  process: "Document Upload",
  category: "Knowledge",
  task_id: currentTask.task_id,
  total_files: currentTask.total_files,
  failed_files: currentTask.failed_files,
  duration_seconds: currentTask.duration_seconds,
  resultValue: currentTask.error,
  // ❌ Missing: embedding_model, connector_type, source
});

This creates inconsistent analytics data where some ingestion failures include source attribution and others don't, breaking source-level funnel analysis and completion-rate segmentation for terminal failures.

🔧 Proposed fix

Move the metadata computation logic outside the completed handler or duplicate it before the terminal failure tracking:

       if (
         shouldShowToast &&
         previousTask &&
         !isTerminalFailedTask(previousTask) &&
         isTerminalFailedTask(currentTask)
       ) {
         if (!isOnboardingActive) {
           selectTask(currentTask.task_id);
           setIsMenuOpen(true);
           setIsRecentTasksExpanded(true);
         }
+        
+        const firstFile = currentTask.files
+          ? Object.values(currentTask.files)[0]
+          : undefined;
+        const embeddingModel = firstFile?.embedding_model;
+        const connectorType =
+          taskConnectorTypesRef.current.get(currentTask.task_id) || "local";
+        const source =
+          taskSourcesRef.current.get(currentTask.task_id) ||
+          (connectorType === "local" ? "file" : "connector");
+        
         // Task just failed - show error toast
         trackProcessFailure({
           processType: "Ingestion",
           process: "Document Upload",
           category: "Knowledge",
           task_id: currentTask.task_id,
           total_files: currentTask.total_files,
           failed_files: currentTask.failed_files,
           duration_seconds: currentTask.duration_seconds,
           resultValue: currentTask.error,
+          embedding_model: embeddingModel,
+          connector_type: connectorType,
+          source,
         });

This resolves the past review comment for completed tasks and extends the fix to terminal failures.
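
As an alternative to duplicating the computation, the first option in the proposed fix (moving the logic out of the completed handler) might look roughly like the helper below. `deriveIngestMetadata`, `TaskLike`, and passing the maps as arguments are assumptions for illustration; the fallback rules are the ones shown in the diff above.

```typescript
// Minimal task shape for illustration; the real task type in
// task-context.tsx has more fields.
interface TaskFile {
  embedding_model?: string;
}
interface TaskLike {
  task_id: string;
  files?: Record<string, TaskFile>;
}

interface IngestAnalyticsMetadata {
  embedding_model: string | undefined;
  connector_type: string;
  source: string;
}

// Hypothetical helper: derives the three fields once so the completed and
// terminal-failure paths can spread the same result into their payloads.
function deriveIngestMetadata(
  task: TaskLike,
  connectorTypes: Map<string, string>,
  sources: Map<string, string>,
): IngestAnalyticsMetadata {
  const firstFile = task.files ? Object.values(task.files)[0] : undefined;
  const connectorType = connectorTypes.get(task.task_id) || "local";
  const source =
    sources.get(task.task_id) ||
    (connectorType === "local" ? "file" : "connector");
  return {
    embedding_model: firstFile?.embedding_model,
    connector_type: connectorType,
    source,
  };
}
```

Each tracking call could then spread the helper's return value into its payload, so the two failure paths cannot diverge again.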

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@frontend/contexts/task-context.tsx` around lines 437 - 474, The terminal
failure path for trackProcessFailure calls (around lines 618-627) is missing the
computed metadata fields (embedding_model, connector_type, and source) that are
correctly included in the completed-with-failures path (lines 437-474). Extract
the metadata computation logic that derives embedding_model from
currentTask.files, connector_type from taskConnectorTypesRef, and source from
taskSourcesRef, and duplicate this logic before the terminal failure
trackProcessFailure call to ensure it includes all the same fields as the
completed path, making the analytics payloads consistent across all failure
tracking scenarios.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: d7f8f9dd-b3e1-44df-861a-8c1fdccd7afc

📥 Commits

Reviewing files that changed from the base of the PR and between 537c1c6 and 1faaa01.

📒 Files selected for processing (6)
  • frontend/app/chat/page.tsx
  • frontend/app/connectors/page.tsx
  • frontend/app/upload/[provider]/page.tsx
  • frontend/components/connectors/shared-bucket-view.tsx
  • frontend/components/knowledge-dropdown.tsx
  • frontend/contexts/task-context.tsx
🚧 Files skipped from review as they are similar to previous changes (1)
  • frontend/app/upload/[provider]/page.tsx

@mfortman11 mfortman11 merged commit 08fe04e into main Jun 22, 2026
18 of 19 checks passed
@github-actions github-actions Bot deleted the improve-ingest-tracking branch June 22, 2026 20:23
mfortman11 added a commit that referenced this pull request Jun 23, 2026
* improve ingest tracking events

* PR comments

Labels

community, enhancement 🔵, frontend 🟨, lgtm

2 participants