
Bug/Chore: Resolve pipeline anomalies and modernize testing infrastructure #13

Description

@aj1126


Problem Statement

The transition to the main branch currently carries several critical runtime bugs in the local ingestion pipeline and cache management. In addition, the continuous integration (CI) and local testing workflows need to be tightened to prevent false negatives and to report accurate code coverage without relying on legacy dependencies.

🚨 High-Priority Bug Fixes

  • Fix Watch Mode Infinite Event Cascade:

  • Location: src/index.js

  • Issue: The chokidar watcher uses a faulty regex (ignored: [ ... /data_exports[\/\\]?$/]). Because of the end-of-string anchor $, the pattern matches the export directory itself but not the files generated inside it. Each markdown report the pipeline writes therefore fires a new change event, which re-runs the pipeline and produces an infinite loop during --watch execution.

  • Resolution: Remove the anchor so the pattern matches the directory and all of its children (e.g., /data_exports/).

  • Resolve Cache Eviction Path Collisions:

  • Location: src/ingestion/file-ingestion.js

  • Issue: Stale cache cleanup currently checks key.startsWith(sourceDirectory). On Windows 11, a source directory such as C:\UAP_Data also matches keys from sibling directories such as C:\UAP_Data_Archive, so their entries are falsely flagged during cleanup. This causes severe cache thrashing and forces redundant rasterization passes.

  • Resolution: Append the platform directory separator to the prefix check: key.startsWith(sourceDirectory + path.sep).

  • Restore Stage 1/2 NLP and Rasterization Integrations:

  • Location: src/ingestion/worker.js

  • Issue: This is an architectural regression. ROADMAP.md states that NLP entity recognition (compromise) and PDF/image rasterization are integrated, but the worker pool still extracts metadata with hardcoded regular expressions and omits .pdf and image files entirely from SUPPORTED_TEXT_EXTENSIONS.

  • Resolution: Reintegrate the Stage 1 & 2 logic into the worker pool to align the active codebase with the current roadmap claims.
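For the watch-mode fix, the sketch below compares the original pattern with the anchor-free one from the Resolution, using a few hypothetical paths. It also shows a stricter segment-matching variant. That variant is an assumption, not something the issue proposes; it is included because the unanchored /data_exports/ also matches sibling names such as data_exports_old.

```javascript
// Minimal sketch comparing `ignored` patterns for the chokidar watcher.
// The sample paths are hypothetical; only the first two regexes come from the issue.

// Original pattern: the `$` anchor means only paths that END in
// "data_exports" (optionally followed by one slash) are matched.
const faultyPattern = /data_exports[\/\\]?$/;

// Fix from the issue: no anchor, so any path containing "data_exports"
// is matched, including files written inside the directory.
const fixedPattern = /data_exports/;

// Stricter variant (assumption): match "data_exports" as a whole path
// segment, so siblings such as "data_exports_old" are still watched.
const segmentPattern = /(^|[\/\\])data_exports([\/\\]|$)/;

const samplePaths = [
  "project/data_exports",
  "project/data_exports/report.md",
  "C:\\project\\data_exports\\report.md",
  "project/data_exports_old/notes.md",
];

for (const p of samplePaths) {
  console.log(
    p.padEnd(40),
    "faulty:", faultyPattern.test(p),
    "fixed:", fixedPattern.test(p),
    "segment:", segmentPattern.test(p),
  );
}
```

Whichever pattern is chosen would replace the anchored entry in the existing ignored array passed to chokidar in src/index.js.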
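For the cache-eviction fix, here is a sketch of the separator-aware prefix check. It goes slightly beyond the one-line Resolution by normalizing both paths first, so a sourceDirectory that already ends in a separator (e.g. C:\UAP_Data\) does not produce a doubled separator. The helper names isInsideDirectory and evictStaleEntries, and the assumption that the cache is a Map keyed by absolute file paths, are hypothetical.

```javascript
const path = require("node:path");

// Hypothetical helper: true only if `key` lies inside `sourceDirectory`.
// `pathApi` defaults to path.win32 so the Windows case from the issue can be
// reproduced on any OS; pass `path` to use the host platform's rules instead.
// Note: startsWith is case-sensitive, whereas Windows paths usually are not.
function isInsideDirectory(key, sourceDirectory, pathApi = path.win32) {
  // resolve() normalizes separators and drops a trailing separator
  // (except on a root such as "C:\\"), so both inputs compare consistently.
  const dir = pathApi.resolve(sourceDirectory);
  const prefix = dir.endsWith(pathApi.sep) ? dir : dir + pathApi.sep;
  return pathApi.resolve(key).startsWith(prefix);
}

// Hypothetical eviction pass: delete entries under `sourceDirectory` that were
// not seen in the current scan. Deleting from a Map while iterating is safe.
function evictStaleEntries(cache, sourceDirectory, seenKeys) {
  for (const key of cache.keys()) {
    if (isInsideDirectory(key, sourceDirectory) && !seenKeys.has(key)) {
      cache.delete(key);
    }
  }
}

// The original check wrongly matches the sibling directory:
console.log("C:\\UAP_Data_Archive\\scan.pdf".startsWith("C:\\UAP_Data")); // true
console.log(isInsideDirectory("C:\\UAP_Data_Archive\\scan.pdf", "C:\\UAP_Data")); // false
```

With this check, entries cached for C:\UAP_Data_Archive survive a cleanup pass over C:\UAP_Data, which is the isolation the Definition of Done asks for.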
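For the worker regression, the sketch below shows one way the worker could route files once .pdf and image extensions are accepted again, and how Stage 1 entity extraction might call compromise instead of hardcoded regular expressions. The extension lists, classifyFile, and extractEntities are hypothetical, and keeping raster formats in a separate set (rather than adding them to SUPPORTED_TEXT_EXTENSIONS) is a design choice of this sketch. The rasterization step itself is not shown because the issue does not say which library Stage 2 used. compromise's people(), places(), and organizations() selectors with out("array") are part of its documented API; the require is deferred so the file loads even where the package is not installed.

```javascript
const path = require("node:path");

// Hypothetical extension lists: the exact sets in src/ingestion/worker.js
// are not shown in the issue.
const SUPPORTED_TEXT_EXTENSIONS = new Set([".txt", ".md", ".csv", ".json"]);
const RASTER_EXTENSIONS = new Set([".pdf", ".png", ".jpg", ".jpeg", ".tif", ".tiff"]);

// Decide which Stage 1/2 path a file should take inside the worker.
function classifyFile(filePath) {
  const ext = path.extname(filePath).toLowerCase();
  if (SUPPORTED_TEXT_EXTENSIONS.has(ext)) return "text";
  if (RASTER_EXTENSIONS.has(ext)) return "raster"; // rasterize first, then extract text
  return "unsupported";
}

// Stage 1 entity extraction with compromise instead of hardcoded regexes.
// The require is deferred so this module loads without compromise installed.
function extractEntities(text) {
  const nlp = require("compromise");
  const doc = nlp(text);
  return {
    people: doc.people().out("array"),
    places: doc.places().out("array"),
    organizations: doc.organizations().out("array"),
  };
}
```

In the worker, a "raster" result would send the file through the Stage 2 rasterizer before its extracted text reaches extractEntities; "unsupported" files can be skipped.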


🛠️ Testing & CI Pipeline Improvements

  • Implement Native V8 Code Coverage:

  • Issue: We need coverage telemetry without bloating node_modules with legacy third-party test runners.

  • Resolution: Update package.json to use the V8-based coverage built into the native node:test runner:

```json
{
  "scripts": {
    "test": "node --test --experimental-test-coverage"
  }
}
```

  • Prune GitHub Actions CI Matrix:

  • Location: .github/workflows/test.yml

  • Issue: The matrix currently includes Node 18.x, but chokidar v5 requires Node >= 20.19.0, so the Node 18.x job fails in CI regardless of the code under test.

  • Resolution: Remove Node 18.x from the testing matrix to prevent guaranteed false-negative build failures.

  • Integrate Testing into Pre-Commit Hook:

  • Issue: Broken pipeline changes can currently be committed locally.

  • Resolution: Expand the planned Husky .husky/pre-commit workflow. The final PowerShell automation sequence should act as a strict local gatekeeper:

  1. npm run docs:generate
  2. git add docs/
  3. npm test (the commit is blocked unless the test suite passes; V8 coverage is reported as part of the same run)
  4. npm run docs:check
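Pruning Node 18.x from the CI matrix addresses the symptom; the same 20.19.0 floor can also be made explicit locally so contributors on older runtimes get a clear message instead of an obscure chokidar failure. One standard option is an "engines" entry in package.json; another, sketched below as an assumption rather than part of the issue, is a small runtime guard. The helper names and the --check-node flag are hypothetical.

```javascript
// Minimal sketch of a fail-fast runtime check. The 20.19.0 floor comes from
// the issue (chokidar v5's requirement); names and the flag are hypothetical.
const MIN_NODE_VERSION = "20.19.0";

// "v20.19.0" or "20.19.0-nightly..." -> [20, 19, 0]
function parseVersion(version) {
  return version.replace(/^v/, "").split("-")[0].split(".").map(Number);
}

// True when `version` is greater than or equal to `minimum`.
function meetsMinimum(version, minimum = MIN_NODE_VERSION) {
  const current = parseVersion(version);
  const required = parseVersion(minimum);
  for (let i = 0; i < 3; i++) {
    const a = current[i] || 0;
    const b = required[i] || 0;
    if (a !== b) return a > b;
  }
  return true;
}

// Only enforce the check when explicitly requested, e.g. from a startup script.
if (process.argv.includes("--check-node") && !meetsMinimum(process.versions.node)) {
  console.error(`Node ${process.versions.node} is below the required ${MIN_NODE_VERSION}.`);
  process.exitCode = 1;
}
```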
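The four-step pre-commit sequence can also be expressed as a small Node script that the hook calls, which keeps the stop-on-first-failure behavior identical whether the hook runs under PowerShell or a POSIX shell. This is a sketch under assumptions: the file name scripts/pre-commit.js, the --run flag, and invoking it from .husky/pre-commit with a line such as node scripts/pre-commit.js --run are hypothetical. Only the four commands come from the issue.

```javascript
const { spawnSync } = require("node:child_process");

// The four steps from the issue, in order.
const PRE_COMMIT_STEPS = [
  ["npm", ["run", "docs:generate"]],
  ["git", ["add", "docs/"]],
  ["npm", ["test"]],
  ["npm", ["run", "docs:check"]],
];

// Default runner: run through the shell so Windows resolves npm.cmd, and
// inherit stdio so test and coverage output stay visible. The arguments are
// fixed literals without spaces, so joining them into one string is safe.
function defaultRun(cmd, args) {
  const result = spawnSync(`${cmd} ${args.join(" ")}`, { stdio: "inherit", shell: true });
  return result.status ?? 1; // null status (spawn error or signal) counts as failure
}

// Run each step in order and stop at the first non-zero exit code.
function runPreCommit(steps = PRE_COMMIT_STEPS, run = defaultRun) {
  for (const [cmd, args] of steps) {
    const status = run(cmd, args);
    if (status !== 0) {
      console.error(`pre-commit: "${cmd} ${args.join(" ")}" failed (exit ${status}); commit blocked.`);
      return status;
    }
  }
  return 0;
}

// Only execute the real commands when invoked from the hook.
if (process.argv.includes("--run")) {
  process.exitCode = runPreCommit();
}
```

A non-zero exit code from the hook makes Git abort the commit, so a failing npm test prevents npm run docs:check from running and blocks the commit.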

Definition of Done

  • --watch mode can be left running without triggering infinite recursive executions.
  • Ingestion cache strictly isolates exact directory paths (handling Windows \ path separators properly).
  • Worker scripts accurately reflect the roadmap features (NLP/PDF support).
  • Native Node test coverage output is printed successfully in the terminal.
  • The .github/workflows/test.yml matrix successfully runs and returns green on Node 20+.

Metadata

Labels: bug (Something isn't working), documentation (Improvements or additions to documentation), enhancement (New feature or request)
