This repository was archived by the owner on Jun 3, 2026. It is now read-only.
Problem
The current ingestion pipeline is either:
Fully sequential (safe but slow), or
Fully parallel (fast but risks race conditions during updates)
We need a hybrid approach to improve throughput while maintaining consistency.
Goal
Introduce staged parallelism to balance performance and correctness.
Proposed Approach
Split the pipeline into three stages:
Phase A — Classification / Extraction (Parallel)
Run LLM-based extraction for all batch items concurrently
Independent per item (no shared state)
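Because Phase A items share no state, they can be fanned out concurrently. The following is a minimal sketch, assuming an async Python pipeline where the LLM call can be awaited. `Extraction`, `extract` and `phase_a` are hypothetical names, the sleep stands in for a real LLM request, and the default limit of 8 is an arbitrary placeholder. A semaphore caps in-flight calls so a large batch does not flood the model provider.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Extraction:
    item_id: int
    facts: list[str]


async def extract(item_id: int, text: str) -> Extraction:
    """Hypothetical stand-in for the LLM-based extraction call."""
    await asyncio.sleep(0.01)  # simulate LLM request latency
    # Toy extraction: treat each sentence as one fact.
    facts = [s.strip() for s in text.split(".") if s.strip()]
    return Extraction(item_id, facts)


async def phase_a(items: list[str], max_concurrency: int = 8) -> list[Extraction]:
    """Run extraction for every batch item concurrently, bounded by a semaphore."""
    sem = asyncio.Semaphore(max_concurrency)

    async def run_one(i: int, text: str) -> Extraction:
        async with sem:  # cap the number of in-flight LLM calls
            return await extract(i, text)

    # gather() returns results in input order, regardless of completion order.
    return await asyncio.gather(*(run_one(i, t) for i, t in enumerate(items)))


if __name__ == "__main__":
    batch = ["Alice lives in Paris. She likes tea.", "Bob works at Acme."]
    for ex in asyncio.run(phase_a(batch)):
        print(ex.item_id, ex.facts)
```

Keeping results in input order matters for the next stage: the Judge can then process items in a deterministic order even though extraction finished in arbitrary order.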
Phase B — Judge (Sequential)
Critical section
Evaluate each item and decide its update one at a time
Ensures deterministic memory updates
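The sequential Judge can be sketched as a plain loop over the Phase A output. This is a minimal sketch, assuming memory is a simple key/value mapping and the judge rule is a direct comparison; a real judge would likely be an LLM call, and if it is async it should be awaited inside the loop rather than gathered. `Action`, `Decision`, `judge` and `phase_b` are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    ADD = "add"
    UPDATE = "update"
    NOOP = "noop"


@dataclass(frozen=True)
class Decision:
    action: Action
    key: str
    value: str


def judge(key: str, value: str, memory: dict[str, str]) -> Decision:
    """Hypothetical judge rule: compare a candidate fact with current memory."""
    if key not in memory:
        return Decision(Action.ADD, key, value)
    if memory[key] != value:
        return Decision(Action.UPDATE, key, value)
    return Decision(Action.NOOP, key, value)


def phase_b(candidates: list[tuple[str, str]], memory: dict[str, str]) -> list[Decision]:
    """Critical section: judge candidates strictly one at a time, in input order.

    Each decision is applied to a working copy of memory before the next
    candidate is judged, so later items see the effect of earlier ones.
    """
    working = dict(memory)  # snapshot; the real store is only written later
    decisions = []
    for key, value in candidates:
        d = judge(key, value, working)
        if d.action is not Action.NOOP:
            working[key] = value
        decisions.append(d)
    return decisions


if __name__ == "__main__":
    for d in phase_b([("city", "Paris"), ("city", "Lyon")], {}):
        print(d.action.name, d.key, d.value)
```

With the two `city` facts in the usage example, the loop yields ADD then UPDATE. A parallel judge would let both candidates see the same empty memory and emit two conflicting ADDs, which is the race the sequential phase is meant to prevent.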
Phase C — Weaver (Parallel)
Apply final writes (DB / vector store)
Parallelizable as long as concurrent writes do not touch the same shared state
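One way to make the Weaver safe is to run writes to different keys concurrently while chaining writes to the same key in Judge order. This is a minimal sketch, assuming every write is addressed by a key; `InMemoryStore` and `phase_c` are hypothetical stand-ins for the DB / vector store and the weaver entry point.

```python
import asyncio
from collections import defaultdict


class InMemoryStore:
    """Hypothetical stand-in for the DB / vector store."""

    def __init__(self) -> None:
        self.data: dict[str, str] = {}

    async def write(self, key: str, value: str) -> None:
        await asyncio.sleep(0.01)  # simulate storage I/O latency
        self.data[key] = value


async def phase_c(writes: list[tuple[str, str]], store: InMemoryStore) -> None:
    """Apply writes concurrently across keys, but in order within each key.

    Writes to different keys share no state and run in parallel; writes to the
    same key are chained so the last decision from the Judge phase wins.
    """
    by_key: dict[str, list[str]] = defaultdict(list)
    for key, value in writes:  # preserves Judge order within each key
        by_key[key].append(value)

    async def write_chain(key: str, values: list[str]) -> None:
        for value in values:
            await store.write(key, value)

    await asyncio.gather(*(write_chain(k, vs) for k, vs in by_key.items()))


if __name__ == "__main__":
    s = InMemoryStore()
    asyncio.run(phase_c([("a", "1"), ("b", "2"), ("a", "3")], s))
    print(s.data)
```

An alternative design is to collapse the Judge output to one final value per key before writing, which removes same-key conflicts entirely at the cost of skipping intermediate versions.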
Benefits
Reduces batch latency, since per-item LLM extraction no longer runs one item at a time
Preserves correctness in the update (Judge) step
Scales better for batch ingestion
Scope
Refactor pipeline execution into stages
Introduce controlled concurrency boundaries
Acceptance Criteria
Classification runs concurrently
Judge phase is strictly sequential
Weaver phase can run concurrently
No race conditions in memory updates
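The acceptance criteria can be checked by instrumenting each phase with a counter of concurrently running calls. This is a minimal sketch, assuming an asyncio-based pipeline; `ConcurrencyProbe`, `run_pipeline` and the `key=value` input format are hypothetical, and the sleeps simulate LLM and storage latency.

```python
import asyncio
from typing import Optional


class ConcurrencyProbe:
    """Tracks the peak number of simultaneously running calls in one phase."""

    def __init__(self) -> None:
        self.active = 0
        self.peak = 0

    def __enter__(self) -> "ConcurrencyProbe":
        self.active += 1
        self.peak = max(self.peak, self.active)
        return self

    def __exit__(self, *exc) -> bool:
        self.active -= 1
        return False


async def run_pipeline(items: list[str]) -> tuple[dict[str, str], dict[str, int]]:
    probes = {name: ConcurrencyProbe() for name in ("extract", "judge", "weave")}
    memory: dict[str, str] = {}

    async def extract(text: str) -> tuple[str, str]:
        with probes["extract"]:
            await asyncio.sleep(0.01)  # simulated LLM extraction call
            key, _, value = text.partition("=")
            return key.strip(), value.strip()

    async def judge(cand: tuple[str, str], working: dict[str, str]) -> Optional[tuple[str, str]]:
        with probes["judge"]:
            await asyncio.sleep(0.01)  # simulated LLM judge call
            return cand if working.get(cand[0]) != cand[1] else None

    async def weave(key: str, value: str) -> None:
        with probes["weave"]:
            await asyncio.sleep(0.01)  # simulated DB / vector-store write
            memory[key] = value

    # Phase A: parallel extraction.
    candidates = await asyncio.gather(*(extract(t) for t in items))

    # Phase B: strictly sequential; each judge call finishes before the next starts.
    working = dict(memory)
    final: dict[str, str] = {}
    for cand in candidates:
        decision = await judge(cand, working)
        if decision is not None:
            working[decision[0]] = decision[1]
            final[decision[0]] = decision[1]  # keep only the last decision per key

    # Phase C: parallel writes; at most one write per key, so none conflict.
    await asyncio.gather(*(weave(k, v) for k, v in final.items()))
    return memory, {name: p.peak for name, p in probes.items()}


if __name__ == "__main__":
    print(asyncio.run(run_pipeline(["a=1", "b=2", "a=3", "c=4"])))
```

A peak of 1 for `judge` corresponds to "Judge phase is strictly sequential", while peaks above 1 for `extract` and `weave` show the other two phases actually overlap.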