A Node.js CLI tool that batch-parses PDF reports, extracts time-tracking data with regular expressions, and calculates aggregated totals. It includes parallel processing, tests, and CI.
This project was created to solve a real-world productivity bottleneck encountered during a consulting engagement.
Billable hours were distributed across dozens of auto-generated, unstructured PDF reports. Manually opening each file, locating time entries, and summing values was slow, repetitive, and error-prone.
The goal was to build a reliable, reusable, and auditable CLI tool that performs this task automatically.
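The extract-and-sum step can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the report format is not documented here, so the `Duration: H:MM` pattern and the names `TIME_ENTRY`, `extractMinutes`, `sumMinutes`, and `formatMinutes` are all assumptions, and the input is plain text that would already have been extracted from a PDF.

```javascript
// Hypothetical pattern: assumes entries look like "Duration: 1:45" (hours:minutes).
// The real reports may use a different layout.
const TIME_ENTRY = /Duration:\s*(\d{1,3}):([0-5]\d)/g;

// Return every matched entry as a number of minutes.
function extractMinutes(text) {
  const entries = [];
  for (const match of text.matchAll(TIME_ENTRY)) {
    entries.push(Number(match[1]) * 60 + Number(match[2]));
  }
  return entries;
}

// Sum a list of minute values.
function sumMinutes(entries) {
  return entries.reduce((total, minutes) => total + minutes, 0);
}

// Format a minute total back into H:MM.
function formatMinutes(total) {
  const hours = Math.floor(total / 60);
  const minutes = String(total % 60).padStart(2, '0');
  return `${hours}:${minutes}`;
}

// Stand-in for text extracted from one report.
const sample = [
  'Task A  Duration: 1:45',
  'Task B  Duration: 0:30',
  'Task C  Duration: 12:05',
].join('\n');

console.log(formatMinutes(sumMinutes(extractMinutes(sample)))); // prints "14:20"
```

Working in whole minutes keeps the arithmetic in integers, which avoids rounding drift when many entries are added together.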
- Batch processing of PDF files
- Controlled parallel processing
- Configurable concurrency
- Command Line Interface (CLI)
- Optional debug logging
- Unit tests and CLI contract tests
- GitHub Actions CI pipeline
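Controlled parallel processing with a configurable limit is often built as a small worker pool. The sketch below shows one common way to do it and is not necessarily how this tool does it: `mapWithConcurrency` and `fakeParse` are hypothetical names, and a timer stands in for real PDF parsing.

```javascript
// Run `worker` over `items` with at most `limit` calls in flight at once.
// Results are returned in the same order as the input.
async function mapWithConcurrency(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;

  // Each runner repeatedly claims the next unprocessed index until none remain.
  async function run() {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  const runnerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: runnerCount }, () => run()));
  return results;
}

// Demo: track how many "parses" overlap to confirm the limit is respected.
let active = 0;
let peak = 0;

async function fakeParse(file) {
  active++;
  peak = Math.max(peak, active);
  await new Promise((resolve) => setTimeout(resolve, 10)); // simulated I/O
  active--;
  return file.toUpperCase();
}

const files = ['a.pdf', 'b.pdf', 'c.pdf', 'd.pdf', 'e.pdf'];
const done = mapWithConcurrency(files, 2, fakeParse).then((results) => {
  console.log(results, 'peak concurrency:', peak);
  return results;
});
```

A pool like this bounds the number of files open at once, so raising the limit trades memory and file handles for throughput.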
npm install
npx pdf-time-extractor
pdf-time-extractor [directory] [options]
# Default (uses ./documents)
pdf-time-extractor
# Custom directory
pdf-time-extractor ./documents
# Parallel processing
pdf-time-extractor ./documents --concurrency 6
# Enable debug logs
pdf-time-extractor ./documents --verbose
- --concurrency <number>: Number of parallel workers (default: 4)
- --verbose: Enable debug output
- --help: Display usage information
- --version: Display the current version
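For illustration, the directory argument and the `--concurrency` and `--verbose` options could be parsed with Node's built-in `util.parseArgs` (the `default` option field needs Node 18.11 or later). This is a sketch under assumptions: the project may well use a different argument parser, `parseCliArgs` is a hypothetical name, and `--help` and `--version` handling is left out for brevity.

```javascript
const { parseArgs } = require('node:util');

// Parse an argv-style array into the settings described in the options list.
function parseCliArgs(argv) {
  const { values, positionals } = parseArgs({
    args: argv,
    allowPositionals: true, // the optional [directory] argument
    options: {
      concurrency: { type: 'string', default: '4' },
      verbose: { type: 'boolean', default: false },
    },
  });

  // parseArgs only yields strings and booleans, so convert and validate here.
  const concurrency = Number(values.concurrency);
  if (!Number.isInteger(concurrency) || concurrency < 1) {
    throw new Error(
      `--concurrency must be a positive integer, got "${values.concurrency}"`
    );
  }

  return {
    directory: positionals[0] ?? './documents', // documented default directory
    concurrency,
    verbose: values.verbose,
  };
}

console.log(parseCliArgs(['./documents', '--concurrency', '6']));
console.log(parseCliArgs([]));
```

In a real entry point the function would receive `process.argv.slice(2)`. Validating the worker count up front means a bad value fails fast with a clear message instead of surfacing later during processing.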
src/
├── cli/ # CLI interface and argument parsing
├── core/ # Core processing logic
├── services/ # External I/O and integrations
└── utils/ # Pure utility functions
tests/ # Unit and CLI tests
.github/workflows/ # CI configuration
documents/ # Sample / input PDFs
Run tests locally:
npm test
npm run test:coverage
All tests and coverage checks run automatically on every push and pull request via GitHub Actions.
MIT License