Bug/Chore: Resolve pipeline anomalies and modernize testing infrastructure
Description
Problem Statement
The main branch transition is currently carrying several critical runtime bugs affecting the local ingestion pipeline and cache management. Additionally, the continuous integration (CI) and local testing workflows require optimization to prevent false negatives and ensure accurate code coverage reporting without relying on legacy dependencies.
🚨 High-Priority Bug Fixes
Fix Watch Mode Infinite Event Cascade:
Location: src/index.js
Issue: The chokidar watcher uses a faulty regex (ignored: [ ... /data_exports[\/\\]?$/]). The end-of-string anchor $ makes the pattern match the export directory itself but not the files generated inside it. When the pipeline writes a markdown report into that directory, the write raises a new change event, which re-runs the pipeline, which writes another report, so --watch execution loops indefinitely.
Resolution: Remove the anchor. Update the regex to ignore the directory and all of its children (e.g., /data_exports/).
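To illustrate why the anchor matters, the sketch below runs both patterns against a few sample paths. The subtree pattern is an assumption of this sketch, not code from the repository. It is slightly stricter than the bare /data_exports/ suggested above, so a sibling directory such as data_exports_old is not ignored by accident.

```javascript
// Pattern quoted in the issue: matches only when the path ends at the directory.
const anchored = /data_exports[\/\\]?$/;

// Hypothetical replacement: matches the directory itself and anything beneath it,
// with either POSIX (/) or Windows (\) separators.
const subtree = /data_exports(?:[\/\\]|$)/;

const samples = [
  'data_exports',
  'data_exports/report.md',
  'data_exports\\report.md',
  'data_exports_old/notes.md',
];

// Print how each pattern classifies each sample path.
for (const p of samples) {
  console.log(p, { anchored: anchored.test(p), subtree: subtree.test(p) });
}
```

Since chokidar's ignored option accepts regular expressions, a pattern of this shape could replace the anchored one in the existing array, assuming the watcher is configured as the issue describes.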
Resolve Cache Eviction Path Collisions:
Location: src/ingestion/file-ingestion.js
Issue: Stale cache cleanup currently evaluates key.startsWith(sourceDirectory). A bare prefix match cannot tell a directory from a sibling whose name merely starts with the same characters: on Windows 11, targeting C:\UAP_Data also matches keys under C:\UAP_Data_Archive, causing severe cache thrashing and forcing redundant rasterization loops.
Resolution: Append the OS directory separator to the prefix before comparing: key.startsWith(sourceDirectory + path.sep).
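The collision and the fix can be reproduced in a few lines. This is a minimal sketch: isInsideDirectory is a hypothetical helper name, and the separator is passed in explicitly so the Windows case behaves the same on any host OS. It also guards against a source directory that already ends with a separator, which the one-line fix above would turn into a doubled separator.

```javascript
const path = require('node:path');

// Hypothetical helper (not from the repository). `sep` defaults to the host
// separator but can be overridden to reproduce the Windows case on any OS.
function isInsideDirectory(key, sourceDirectory, sep = path.sep) {
  // Avoid a doubled separator when the directory already ends with one.
  const prefix = sourceDirectory.endsWith(sep)
    ? sourceDirectory
    : sourceDirectory + sep;
  return key.startsWith(prefix);
}

const dir = 'C:\\UAP_Data';
const sibling = 'C:\\UAP_Data_Archive\\scan.pdf';
const child = 'C:\\UAP_Data\\scan.pdf';

console.log(sibling.startsWith(dir));               // naive check from the issue
console.log(isInsideDirectory(sibling, dir, '\\')); // separator-aware check
console.log(isInsideDirectory(child, dir, '\\'));
```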
Restore Stage 1/2 NLP and Rasterization Integrations:
Location: src/ingestion/worker.js
Issue: There is an architectural regression. ROADMAP.md claims NLP Entity Recognition (compromise) and PDF/Image rasterization are integrated, but the worker pool still uses hardcoded regular expressions for metadata extraction and entirely omits .pdf and image extensions from SUPPORTED_TEXT_EXTENSIONS.
Resolution: Reintegrate the Stage 1 & 2 logic into the worker pool so the active codebase matches the current roadmap claims.
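One possible shape for the extension handling is sketched below. Only the name SUPPORTED_TEXT_EXTENSIONS comes from the issue. Its contents, the SUPPORTED_RASTER_EXTENSIONS set, and classifyFile are assumptions made for illustration, and the actual Stage 1/2 logic is not shown.

```javascript
const path = require('node:path');

// SUPPORTED_TEXT_EXTENSIONS is named in the issue; its contents here are assumed.
const SUPPORTED_TEXT_EXTENSIONS = new Set(['.txt', '.md']);
// Hypothetical companion set for inputs that would need rasterization.
const SUPPORTED_RASTER_EXTENSIONS = new Set(['.pdf', '.png', '.jpg', '.jpeg']);

// Decide which (hypothetical) worker stage should handle a file.
function classifyFile(filePath) {
  const ext = path.extname(filePath).toLowerCase();
  if (SUPPORTED_TEXT_EXTENSIONS.has(ext)) return 'text';
  if (SUPPORTED_RASTER_EXTENSIONS.has(ext)) return 'raster';
  return 'unsupported';
}

console.log(classifyFile('sighting-report.PDF'));
console.log(classifyFile('notes.md'));
```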
🛠️ Testing & CI Pipeline Improvements
Implement Native V8 Code Coverage:
Issue: We need coverage telemetry without bloating node_modules with legacy third-party test runners.
Resolution: Update package.json to use the native node:test runner's built-in V8 coverage support.
Prune GitHub Actions CI Matrix:
Location: .github/workflows/test.yml
Issue: The matrix currently tests Node 18.x, but chokidar v5 explicitly requires Node >= 20.19.0. The Node 18 job will fail immediately in CI.
Resolution: Remove Node 18.x from the testing matrix so CI no longer reports a guaranteed failure that is unrelated to the code under test.
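The version constraint can also be checked locally before relying on CI. The following is a rough sketch: meetsMinimum is a hypothetical helper that compares plain major.minor.patch strings only, and the minimum is the one quoted in the issue.

```javascript
// Minimum taken from the issue text (chokidar v5 requires Node >= 20.19.0).
const MINIMUM_NODE = '20.19.0';

// Hypothetical helper: compare two plain "major.minor.patch" strings numerically.
function meetsMinimum(version, minimum = MINIMUM_NODE) {
  const a = version.split('.').map(Number);
  const b = minimum.split('.').map(Number);
  for (let i = 0; i < 3; i += 1) {
    if (a[i] !== b[i]) return a[i] > b[i];
  }
  return true; // identical versions
}

// Report whether the Node version running this script satisfies the minimum.
console.log(process.versions.node, meetsMinimum(process.versions.node));
```

Any 18.x release fails this check, which is why the Node 18 matrix entry cannot pass.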
Integrate Testing into Pre-Commit Hook:
Issue: Broken pipeline changes can currently be committed locally.
Resolution: Expand the planned Husky .husky/pre-commit workflow. The final PowerShell automation sequence should act as a strict local gatekeeper:
npm run docs:generate
git add docs/
npm test (Ensure local V8 coverage passes before allowing the commit)
npm run docs:check
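The gatekeeper behaviour (stop at the first failing step) can be sketched as follows. Note that this swaps the PowerShell sequence for a small Node script purely for illustration. runSteps and runInShell are hypothetical names, and the executor is injectable so the control flow can be exercised without actually invoking npm or git.

```javascript
const { spawnSync } = require('node:child_process');

// Command sequence listed in the issue, in order.
const STEPS = [
  'npm run docs:generate',
  'git add docs/',
  'npm test',
  'npm run docs:check',
];

// Default executor: run one command through the shell and return its exit status.
function runInShell(command) {
  return spawnSync(command, { shell: true, stdio: 'inherit' }).status;
}

// Hypothetical runner: returns the first failing command, or null when all pass.
// Later steps are skipped once a step exits with a non-zero (or null) status.
function runSteps(steps = STEPS, exec = runInShell) {
  for (const command of steps) {
    if (exec(command) !== 0) return command;
  }
  return null;
}

// Example wiring for a hook script (left commented so the sketch has no side effects):
// const failed = runSteps();
// if (failed) { console.error(`pre-commit blocked by: ${failed}`); process.exit(1); }
```

Exiting with a non-zero status from the hook is what makes Git abort the commit.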
Definition of Done
--watch mode can be left running without triggering infinite recursive executions.
[...] \ path separators properly).
.github/workflows/test.yml matrix successfully runs and returns green on Node 20+.