- Core Loop: Ingest -> Analyze -> Report
- Analytics Scope
- Core Features
- Telemetry & Extension Pipeline
- Current Implementation
- CLI Runtime Behavior
- Command Reference
- Supported File Types
- Repository Layout
- Testing
- Documentation Workflow
- Non-Destructive Guarantee
- Notes for Contributors and Copilot
- Installation & Setup
- Alternative & Manual Installation
- Usage
- Troubleshooting & Diagnostics
- Planned Technical Optimizations
UAP AnalyticsBot is a file-first analytics system built around a repeatable three-stage loop:
- Ingest: Discover source files from a target folder using read-only access.
- Analyze: Extract and model text + metadata into analytics outputs.
- Report: Produce structured summaries and recommendations for decision-making.
This loop is the domain model Copilot should assume when assisting in this repository.
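The three stages can be pictured with a minimal in-memory sketch. The function names `ingest`, `analyze`, and `report` below are hypothetical stand-ins chosen for illustration; the actual modules under `src/` may be organized differently, and real ingestion reads from disk rather than from an array.

```javascript
// Hypothetical stand-ins for the three stages (not the project's real API).

// Ingest: copy the incoming documents without mutating the originals.
function ingest(documents) {
  return documents.map((doc) => ({ path: doc.path, text: String(doc.text) }));
}

// Analyze: count lowercase word occurrences across all ingested records.
function analyze(records) {
  const termCounts = new Map();
  for (const { text } of records) {
    for (const word of text.toLowerCase().match(/[a-z]+/g) ?? []) {
      termCounts.set(word, (termCounts.get(word) ?? 0) + 1);
    }
  }
  return { fileCount: records.length, termCounts };
}

// Report: reduce the analysis to a small JSON-serializable summary.
function report(analysis, topN = 3) {
  const topTerms = [...analysis.termCounts.entries()]
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .slice(0, topN)
    .map(([term, count]) => ({ term, count }));
  return { fileCount: analysis.fileCount, topTerms };
}

const summary = report(analyze(ingest([
  { path: "a.txt", text: "Lights over Phoenix" },
  { path: "b.txt", text: "Phoenix lights reported again" },
])));
console.log(JSON.stringify(summary, null, 2));
```

Keeping each stage a plain function of its input makes it easy to test the stages in isolation and to keep ingestion separate from analytics.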
The analysis stage is intentionally split into four tiers:
- Descriptive: What happened? (term frequencies, glossaries, dates, locations)
- Diagnostic: Why did it happen? (correlations across terms, dates, and locations)
- Predictive: What is likely to happen? (trend forecasting from historical timestamps)
- Prescriptive: What should we do? (actionable recommendations and data-quality flags)
- Multi-Format Ingestion: Natively processes `.txt`, `.md`, `.json`, `.csv`, `.log`, and `.pdf` files.
- WebAssembly OCR Fallback: Automatically detects scanned government PDFs and rasterizes their pages into images via `mupdf` (running as WebAssembly inside the V8 engine) before passing them to Tesseract for optical character recognition.
- Intelligent Noise Filtering: Applies a static stop-word and artifact culling pass with $O(1)$ lookups at the ingestion layer to strip grammatical glue and OCR noise, so downstream analytics receive cleaner input.
- Multi-Tiered Analytics:
  - Descriptive: Top keywords, location extraction, and timeline mapping.
  - Diagnostic: Location-based keyword clustering.
  - Predictive: Next-likely location hotspots based on frequency modeling.
  - Prescriptive: Automated recommendations for folder restructuring and missing metadata alerts.
- Automated Markdown Reporting: Formats raw JSON telemetry into a clean, human-readable intelligence report.
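The noise-filtering pass described in the list above can be sketched with a `Set`, whose membership test is constant time on average. The stop-word list and the rule that treats single-character or letterless tokens as OCR noise are assumptions made for this example, not the project's actual list or heuristics.

```javascript
// Illustrative stop-word list (assumption); the project's real list is not shown here.
const STOP_WORDS = new Set(["the", "a", "an", "of", "and", "to", "in", "was"]);

// Lowercase the text, split on non-alphanumeric runs, then drop:
//  - tokens shorter than two characters,
//  - tokens with no letters (treated here as OCR noise),
//  - tokens present in the stop-word set.
function filterTokens(text) {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((token) => token.length > 1 && /[a-z]/.test(token) && !STOP_WORDS.has(token));
}

console.log(filterTokens("The object was seen in the sky ~ 7 |"));
// prints [ 'object', 'seen', 'sky' ]
```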
UAP AnalyticsBot includes a pipeline that parses and stores repository operational telemetry and hands it off to agents, so developer workflows can be monitored continuously.
- Database Ingestion (`src/telemetry/db.js`): Manages the local SQLite database (`uap_telemetry.db`) to store raw webhook payloads, calculated metrics, and anomaly events.
- Metric Ingestion & Extraction (`src/telemetry/ingestion.js`): Parses GitHub webhook events (e.g. pull request and push actions) to extract velocity and churn metrics (like cycle velocities, codebase churn ratios, and commit success frequencies).
- Validation & Drift Detection (`src/telemetry/analytics.js`): Detects metric drift and legacy environment configurations, and registers alerts in the SQLite database.
- Virtual Subagent Delegation (`src/telemetry/handoff.js`): Simulates the `invoke_subagent` task handoff routine, formatting and forwarding analytical telemetry to AI agents.
- E2E Simulation (`verify.js`): A standalone script that injects mock webhook payloads, executes analysis, updates the database, and prints a final telemetry execution report.
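To make the metric extraction and drift detection steps more concrete, here is a rough sketch. It relies on fields that GitHub includes in `pull_request` webhook payloads (`action`, `pull_request.merged`, `created_at`, `merged_at`, `additions`, `deletions`). The function names, the churn-ratio definition (deletions divided by total changed lines), and the 50% drift tolerance are all assumptions for illustration and may not match what `src/telemetry/` actually does.

```javascript
// Hypothetical helper: derive cycle time and a churn ratio from a merged pull request.
// Returns null for events that are not a merged pull request.
function extractPullRequestMetrics(payload) {
  const pr = payload.pull_request;
  if (payload.action !== "closed" || !pr || !pr.merged) return null;
  const cycleHours = (Date.parse(pr.merged_at) - Date.parse(pr.created_at)) / 3_600_000;
  const changedLines = pr.additions + pr.deletions;
  // Assumed definition: share of changed lines that were deletions.
  const churnRatio = changedLines === 0 ? 0 : pr.deletions / changedLines;
  return { cycleHours, churnRatio };
}

// Hypothetical drift rule: flag a value that deviates from the baseline mean
// by more than `tolerance` (as a fraction of that mean).
function hasDrifted(baseline, latest, tolerance = 0.5) {
  if (baseline.length === 0) return false;
  const mean = baseline.reduce((sum, value) => sum + value, 0) / baseline.length;
  return Math.abs(latest - mean) > tolerance * Math.abs(mean);
}

const metrics = extractPullRequestMetrics({
  action: "closed",
  pull_request: {
    merged: true,
    created_at: "2024-01-01T00:00:00Z",
    merged_at: "2024-01-02T12:00:00Z",
    additions: 30,
    deletions: 10,
  },
});
console.log(metrics, hasDrifted([10, 12, 14], metrics.cycleHours));
```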
The active implementation is a Node.js CLI that:
- resolves the source directory from the first CLI argument, or defaults to the current working directory
- recursively scans supported text files in read-only mode
- extracts words, dates, locations, and file metadata
- builds descriptive, diagnostic, predictive, and prescriptive analytics
- emits a formatted JSON report to standard output
See docs/architecture.md for the current-vs-planned architecture view. Historical Python prototype details live in docs/legacy-prototype.md.
On a successful run, the CLI prints a JSON report to stdout that includes:
- `sourceDirectory`
- `descriptive`
- `diagnostic`
- `predictive`
- `prescriptive`
If an error occurs, the CLI prints the error message to stderr and exits with a non-zero status.
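The runtime contract described above (source directory from the first argument, JSON on stdout, message on stderr plus a non-zero status on failure) could be wired up roughly as follows. `resolveSourceDirectory`, `runCli`, and the injected `analyze` callback are hypothetical names for this sketch; `src/index.js` may be structured differently.

```javascript
const path = require("node:path");

// Use the first CLI argument as the source directory, or fall back to cwd.
function resolveSourceDirectory(argv, cwd = process.cwd()) {
  const [firstArg] = argv.slice(2);
  return path.resolve(cwd, firstArg ?? ".");
}

// Run the analysis and return an exit status instead of exiting directly,
// which keeps the function testable. `analyze` is a hypothetical callback
// that returns the four analytics tiers for a directory.
function runCli(argv, analyze, io = { out: console.log, err: console.error }) {
  try {
    const sourceDirectory = resolveSourceDirectory(argv);
    const report = { sourceDirectory, ...analyze(sourceDirectory) };
    io.out(JSON.stringify(report, null, 2));
    return 0;
  } catch (error) {
    io.err(error.message);
    return 1;
  }
}

// Possible wiring in an entry point (left commented out in this sketch):
// process.exitCode = runCli(process.argv, analyze);
```

Setting `process.exitCode` rather than calling `process.exit()` lets pending stdout writes flush before the process ends.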
| Command | Purpose |
|---|---|
| `npm start -- /absolute/path/to/source-folder` | Run the active Node CLI and emit a JSON analytics report. |
| `npm test` | Run the current Node test suite. |
| `npm run docs:generate` | Refresh autogenerated documentation sections. |
| `npm run docs:check` | Verify autogenerated docs are current and that required documentation references remain valid. |
| `npm run release` | Auto-generate CHANGELOG.md, bump the semantic version, and create a Git release tag based on conventional commit history. |
The current Node ingestion pipeline only analyzes text-oriented files.
| Extension | Status |
|---|---|
| `.txt` | Ingested by the active Node pipeline |
| `.md` | Ingested by the active Node pipeline |
| `.json` | Ingested by the active Node pipeline |
| `.csv` | Ingested by the active Node pipeline |
| `.log` | Ingested by the active Node pipeline |
| `.pdf` | Ingested by the active Node pipeline |
| `.png` | Ingested by the active Node pipeline |
| `.jpg` | Ingested by the active Node pipeline |
| `.jpeg` | Ingested by the active Node pipeline |
- `src/index.js`: Node CLI entry point.
- `src/pipeline.js`: Pipeline coordinator that assembles all analytics tiers.
- `src/ingestion/file-ingestion.js`: Read-only recursive file ingestion for supported files.
- `src/analytics/`: Descriptive, diagnostic, predictive, and prescriptive analytics modules.
- `src/telemetry/db.js`: SQLite database layer for storing repository telemetry.
- `src/telemetry/ingestion.js`: Telemetry ingestion engine for parsing webhook events.
- `src/telemetry/analytics.js`: Telemetry analytics and drift detection.
- `src/telemetry/handoff.js`: Subagent handoff simulator (simulates `invoke_subagent`).
- `verify.js`: E2E simulation script for the telemetry extension.
- `test/pipeline.test.js`: Node test coverage for core pipeline behavior.
- `test/telemetry.test.js`: Test suite for the telemetry extension.
- `docs/architecture.md`: Hand-authored architecture overview for current and planned system design.
- `docs/legacy-prototype.md`: Historical Python prototype reference.
- `docs/ROADMAP.md`: Active development tracker and planned stages.
- `docs/USER_GUIDE.md`: Installation and execution instructions for end-users.
The current Node test suite verifies that:
- the full analytics report is produced for supported text fixtures
- dates and locations are extracted into analytics outputs
- prescriptive recommendations flag files with missing metadata
Run npm test to execute the existing suite.
- Narrative documentation stays hand-authored.
- Command reference, supported file types, and repository layout are generated from repository metadata.
- Run `npm run docs:generate` after changing documented commands, supported file types, or layout metadata.
- Run `npm run docs:check` before submitting changes; the repository is configured to enforce this on pull requests and releases.
The bot must never modify, move, or delete ingested source files. Ingestion is read-only by design.
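One way to honor this guarantee in code is to touch source files only through read APIs and to open them with the `"r"` flag. The sketch below is an assumption about how that could look, not a copy of `src/ingestion/file-ingestion.js`; the names `walkReadOnly` and `readSource` are hypothetical, and the extension set is limited to the text formats for brevity.

```javascript
const fs = require("node:fs/promises");
const path = require("node:path");

// Text extensions only, for brevity (assumption for this sketch).
const SUPPORTED = new Set([".txt", ".md", ".json", ".csv", ".log"]);

// Recursively yield supported file paths. Only readdir is used, so the
// directory tree is never created, renamed, or deleted.
async function* walkReadOnly(dir) {
  for (const entry of await fs.readdir(dir, { withFileTypes: true })) {
    const fullPath = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      yield* walkReadOnly(fullPath);
    } else if (entry.isFile() && SUPPORTED.has(path.extname(entry.name).toLowerCase())) {
      yield fullPath;
    }
  }
}

// Open with the "r" flag so the handle cannot be used for writing,
// and always close it, even if reading fails.
async function readSource(filePath) {
  const handle = await fs.open(filePath, "r");
  try {
    return await handle.readFile({ encoding: "utf8" });
  } finally {
    await handle.close();
  }
}
```

A caller would iterate with `for await (const file of walkReadOnly(dir))` and pass each path to `readSource`.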
- Keep ingestion logic modular and separate from analytics logic.
- Prefer asynchronous and streaming patterns for large datasets.
- Preserve strict read-only behavior for source directories.
- When adding analytics, classify behavior under one of the four analytics tiers.
- Update docs/architecture.md when implementation changes affect current-vs-planned system boundaries.
Important
Simply double-click the install.bat file at the root of the project.
This automatically verifies, downloads, and installs Node.js, npm, and all dependencies.
If you prefer command-line setups or are on a non-Windows platform:
- Windows PowerShell (Automated): Run the following in PowerShell:
Set-ExecutionPolicy Bypass -Scope Process -Force; .\setup.ps1
Prerequisites: Ensure you have Node.js installed (version 18, 20, or 22+ recommended).
- Clone the repository:
git clone https://github.com/aj1126/uap_analyticsbot.git
cd uap_analyticsbot
- Install dependencies: This project installs as a standard Node.js CLI package, so there are no extra native build steps required for the current worker-thread ingestion flow. Simply run:
npm install
- Verify the installation: Run the local test suite to ensure the multithreaded worker pool and caching engine are functioning correctly on your machine:
npm test
(If all tests pass green, you are ready to start analyzing documents!)
Tip
Simply double-click gui.bat at the root of the project.
This spins up the local server and automatically launches your default browser to browse directories, run analyses, and view telemetry metrics.
To run the terminal bot without typing commands:
- Drag-and-Drop: Drag any data folder directly onto `run.bat` in Windows Explorer.
- Double-click: Double-click `run.bat` and paste the folder path when prompted.
To run the AnalyticsBot via command line, pass the target directory containing your files as the first argument:
node src/index.js ./my_folder/
By default, this will parse the documents and output a formatted JSON report directly to your console.
Keep the pipeline running in the background with the --watch flag. It automatically re-analyzes the documents and recomputes the analytics whenever you add, edit, or delete a file in the target directory:
node src/index.js ./my_folder/ --watch
Instead of dumping JSON directly to the console, you can generate formatted report files that are automatically saved to the /data_exports/ directory:
node src/index.js ./my_folder/ --format=md
(Supports md for Markdown or csv for spreadsheet datasets).
The v1.2.0 AnalyticsBot engine supports multithreading and memoization caching. You can control these via CLI arguments:
- `node src/index.js ./my_folder --workers=4`: Manually set the number of Node.js worker threads (defaults to the number of CPU cores).
- `node src/index.js ./my_folder --clear-cache`: Bypasses the `.analytics_cache.json` file and forces a fresh read of all documents.
- `node src/index.js ./my_folder --format=csv`: Exports the final report as a spreadsheet-compatible `.csv` file.
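For readers curious how flags like these might be parsed, the sketch below uses Node's built-in `util.parseArgs` (available since Node 18.3). The function name `parseCliOptions`, the returned field names, and the `"json"` default for `format` are assumptions for illustration; the project's own argument handling may differ.

```javascript
const os = require("node:os");
const { parseArgs } = require("node:util");

// Parse the flags documented above from an argument list (without the
// leading "node" and script path). Unknown flags cause parseArgs to throw.
function parseCliOptions(args) {
  const { values, positionals } = parseArgs({
    args,
    allowPositionals: true,
    options: {
      workers: { type: "string" },
      "clear-cache": { type: "boolean" },
      format: { type: "string" },
      watch: { type: "boolean" },
    },
  });

  // Default the worker count to the number of logical CPU cores.
  const workers = values.workers === undefined
    ? os.cpus().length
    : Number.parseInt(values.workers, 10);
  if (!Number.isInteger(workers) || workers < 1) {
    throw new Error(`--workers must be a positive integer, got "${values.workers}"`);
  }

  return {
    sourceDirectory: positionals[0] ?? process.cwd(),
    workers,
    clearCache: values["clear-cache"] ?? false,
    format: values.format ?? "json", // assumed name for the default console output
    watch: values.watch ?? false,
  };
}

console.log(parseCliOptions(["./my_folder", "--workers=4", "--format=csv"]));
```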
If you encounter issues launching the Web GUI, installing packages, or running analyses:
- Double-click `diagnose.bat` at the root of the project.
- This runs comprehensive environment checks, package validation, database integrity checks, and network port analysis.
- Review the terminal output or the generated `diagnostics_report.txt` file in the root folder for troubleshooting details.
- Multithreaded Ingestion: Transition the CPU-bound WebAssembly and OCR tasks to a `Worker Pool` using `node:worker_threads` to achieve near-linear scaling on multi-core Windows 11 systems.
- Ingestion Caching: Implement a fingerprinting system using file metadata (size + mtime) to cache extraction results in a `.analytics_cache.json` file, bypassing heavy processing for unchanged documents.
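The size-plus-mtime fingerprint idea can be sketched as follows. The cache is modeled as a plain object keyed by file path, which could later be serialized to `.analytics_cache.json`; the function names and the cache entry shape are assumptions for this example. The `stats` argument is expected to look like the result of `fs.statSync(filePath)`, which exposes `size` and `mtimeMs`.

```javascript
// Build a cheap fingerprint from file metadata. If either the size or the
// modification time changes, the fingerprint changes too.
function fingerprint(stats) {
  return `${stats.size}:${stats.mtimeMs}`;
}

// Return the cached extraction result when the fingerprint still matches;
// otherwise run the (expensive) extract callback and update the cache.
function extractWithCache(filePath, stats, cache, extract) {
  const key = fingerprint(stats);
  const hit = cache[filePath];
  if (hit && hit.fingerprint === key) {
    return hit.result;
  }
  const result = extract(filePath);
  cache[filePath] = { fingerprint: key, result };
  return result;
}
```

A metadata fingerprint avoids reading file contents at all, at the cost of missing edits that preserve both size and mtime; hashing the content would close that gap but reintroduces a full read.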
- TF-IDF Weighting: Implement Term Frequency-Inverse Document Frequency to mathematically penalize common stop-words and highlight unique, document-defining keywords.
- Semantic Similarity: Utilize Cosine Similarity matrices to discover conceptual overlaps between separate PDF reports in the repository.
- Entity Unification: Enhance the normalization pass to merge variations of locations (e.g., "Mexico:", "Mexico's", and "MEXICO") into a single canonical token.
- Advanced Stop-Word Culling: Expand the static $O(1)$ filter set to include common OCR artifacts and administrative government jargon.
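The TF-IDF and cosine-similarity items above can be illustrated with a small sketch over pre-tokenized documents. It uses the plain formulation tf = count / document length and idf = ln(N / df), which is one common variant and an assumption here; the planned implementation may choose different smoothing or normalization. Function names are hypothetical.

```javascript
// Compute one sparse TF-IDF vector (Map of term -> weight) per document.
// `docs` is an array of token arrays.
function tfidfVectors(docs) {
  // Document frequency: in how many documents does each term appear?
  const documentFrequency = new Map();
  for (const tokens of docs) {
    for (const term of new Set(tokens)) {
      documentFrequency.set(term, (documentFrequency.get(term) ?? 0) + 1);
    }
  }
  return docs.map((tokens) => {
    const counts = new Map();
    for (const term of tokens) counts.set(term, (counts.get(term) ?? 0) + 1);
    const vector = new Map();
    for (const [term, count] of counts) {
      const tf = count / tokens.length;
      const idf = Math.log(docs.length / documentFrequency.get(term));
      vector.set(term, tf * idf);
    }
    return vector;
  });
}

// Cosine similarity between two sparse vectors; 0 when either has zero norm.
function cosineSimilarity(a, b) {
  let dot = 0;
  for (const [term, weight] of a) dot += weight * (b.get(term) ?? 0);
  const norm = (v) => Math.sqrt([...v.values()].reduce((sum, w) => sum + w * w, 0));
  const denominator = norm(a) * norm(b);
  return denominator === 0 ? 0 : dot / denominator;
}

const vectors = tfidfVectors([
  ["the", "ufo", "sighting", "report"],
  ["the", "ufo", "radar", "report"],
  ["the", "budget", "memo"],
]);
console.log(cosineSimilarity(vectors[0], vectors[1]), cosineSimilarity(vectors[0], vectors[2]));
```

A term that occurs in every document, such as "the" in this example, gets an idf of ln(1) = 0 and so contributes nothing to similarity, which is the penalizing effect the TF-IDF item describes.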
- Time-Series Clustering: Group location hotspots by temporal quarters to improve the accuracy of the `likelyNextHotspot` forecast.
- Metadata Validation: Extend data-quality flags to check for specific required fields in PDF `metadata.info` objects.
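As a rough illustration of the time-series clustering idea, the sketch below buckets location observations by calendar quarter (in UTC) and finds the most frequent location within a bucket. The observation shape `{ location, date }` and the function names are assumptions; how the result would feed into `likelyNextHotspot` is not specified in this document.

```javascript
// Map a date to a key like "2024-Q2" using UTC; returns null for invalid dates.
function quarterKey(date) {
  const parsed = new Date(date);
  if (Number.isNaN(parsed.getTime())) return null;
  return `${parsed.getUTCFullYear()}-Q${Math.floor(parsed.getUTCMonth() / 3) + 1}`;
}

// Group observations into quarter -> (location -> count). Observations with
// unparseable dates are skipped.
function hotspotsByQuarter(observations) {
  const buckets = new Map();
  for (const { location, date } of observations) {
    const key = quarterKey(date);
    if (key === null) continue;
    const counts = buckets.get(key) ?? new Map();
    counts.set(location, (counts.get(location) ?? 0) + 1);
    buckets.set(key, counts);
  }
  return buckets;
}

// Pick the most frequent location in one bucket (first seen wins ties).
function topLocation(counts) {
  let best = null;
  for (const [location, count] of counts) {
    if (best === null || count > best.count) best = { location, count };
  }
  return best;
}

const buckets = hotspotsByQuarter([
  { location: "Phoenix", date: "2024-01-15" },
  { location: "Phoenix", date: "2024-03-02" },
  { location: "Roswell", date: "2024-02-20" },
  { location: "Roswell", date: "2024-05-10" },
]);
console.log(topLocation(buckets.get("2024-Q1")));
```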