Home

DEMI — Digital Ecosystem Modernization Initiative

Architecture documentation for BC EAO's Digital File Library (DFL) and AI Classifier Tool (AICT).

Key Decision

Extend the existing Eagle ecosystem (eagle-api + Typesense) with a Docling microservice for document ingestion. eagle-api keeps workflow, auth, and classification; docling-service runs as a separate Python pod and handles all document parsing, OCR, table extraction, and chunking.

Architecture Summary

OpenShift (free, CPU-only):
├── eagle-api (Node.js/Express) — workflow, auth, classification, API
├── docling-service (Python) — document parsing, OCR, tables, chunking
│   ├── Standard pipeline: Tesseract CLI + TableFormer (CPU)
│   └── Local VLM: Granite-Docling-258M (CPU, for complex layouts)
├── Typesense 30.x — keyword + semantic + faceted search + RAG
├── MongoDB — documents, metadata, audit log
├── S3 Object Storage — file binaries
└── ClamAV — virus scanning

Azure (pay-per-use API calls only):
├── GPT-4.1-mini vision — remote VLM fallback for degraded docs (~2%, after Granite fails)
└── GPT-4.1-nano — conversational search answers (v2, user-triggered)
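
The parsing tiers above (standard pipeline, then local Granite-Docling, then the remote GPT-4.1-mini fallback) can be read as an ordered fallback chain. The following is a minimal sketch under stated assumptions, not the docling-service implementation: `parse_with_fallback`, `ParseResult`, the `min_confidence` threshold, and the three stub parsers are hypothetical names invented for illustration, and the source does not define how a tier is judged to have failed.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple


@dataclass
class ParseResult:
    text: str
    confidence: float  # hypothetical 0.0-1.0 quality score
    tier: str          # name of the tier that produced the text


# A parser takes raw file bytes and returns (text, confidence); it may raise.
Parser = Callable[[bytes], Tuple[str, float]]


def parse_with_fallback(
    data: bytes,
    tiers: Sequence[Tuple[str, Parser]],
    min_confidence: float = 0.8,  # assumed threshold, not from the source
) -> Optional[ParseResult]:
    """Try each tier in order; stop at the first result that is good enough.

    If no tier reaches min_confidence, return the best result seen,
    or None when every tier raised.
    """
    best: Optional[ParseResult] = None
    for name, parser in tiers:
        try:
            text, confidence = parser(data)
        except Exception:
            continue  # treat an exception as "this tier failed"
        result = ParseResult(text, confidence, name)
        if confidence >= min_confidence:
            return result
        if best is None or confidence > best.confidence:
            best = result
    return best


# Stub parsers standing in for the real tiers (all hypothetical).
def standard_pipeline(data: bytes) -> Tuple[str, float]:
    return ("", 0.2)  # pretend OCR recovered almost nothing


def granite_local(data: bytes) -> Tuple[str, float]:
    raise RuntimeError("layout too degraded")


def remote_vlm(data: bytes) -> Tuple[str, float]:
    return ("recovered text", 0.9)


TIERS = [
    ("standard", standard_pipeline),
    ("granite-docling", granite_local),
    ("gpt-4.1-mini", remote_vlm),
]

result = parse_with_fallback(b"%PDF-...", TIERS)
print(result)
```

Ordering the tiers from cheapest to most expensive keeps the paid Azure call as the last resort, which is what makes the "~2%" remote share in the summary plausible.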

Documentation

Document	Purpose
Project Plan	Business needs → work packages → delivery sequence
Technical Decisions	Research-backed technology choices with evidence
Architecture Overview	Master plan — Typesense-first, extend eagle-api
OCR Pipeline	OCR + text extraction detailed design
Eagle vs EPIC.search	Capability comparison — why Eagle is the better fit
ADR-001: Typesense	ADR: Typesense as unified search engine
ADR-002: Async Processing	ADR: MongoDB-based async job queue
ADR-003: Classification	ADR: No-LLM classification in v1
Implementation Proposal	Costs, hosting, work packages
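
As an illustration of what ADR-003's no-LLM classification could look like, here is a minimal rule-based sketch. It assumes keyword patterns matched against extracted text; the tag names, the patterns, and the `classify` function are hypothetical and are not taken from the ADR or the developer briefs.

```python
import re
from typing import Dict, List, Pattern

# Hypothetical tag -> patterns table; real rules would come from the briefs.
RULES: Dict[str, List[Pattern[str]]] = {
    "inspection-report": [re.compile(r"\binspection (report|record)\b", re.I)],
    "certificate": [re.compile(r"\benvironmental assessment certificate\b", re.I)],
    "public-comment": [re.compile(r"\bpublic comments?\b", re.I)],
}


def classify(text: str) -> List[str]:
    """Return every tag whose rules match the text, in sorted order."""
    return sorted(
        tag
        for tag, patterns in RULES.items()
        if any(p.search(text) for p in patterns)
    )


print(classify("Compliance Inspection Report, March 2024"))
```

Rules like these are deterministic and auditable, and they add no per-document API cost.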

Source Briefs

In briefs/:

Developer Brief - Digital File Library - APR 2026 2.docx
Developer Brief - AI Classifier Tool - APR 2026 3.docx
Developer Brief - EPIC.system Integration - APR 2026 1.docx
Context Based Tags in the DFL.vsdx

Cost

Component	Monthly	Notes
OpenShift (all infra)	$0	Free — Typesense, MongoDB, ClamAV, S3, docling-service all run here
docling-service (CPU)	$0	MIT license, CPU-only pod in OpenShift
VLM fallback (degraded docs)	~$0–5	Granite-Docling-258M local ($0) + GPT-4.1-mini remote (~2% of pages)
Conversational Search (optional v2)	~$1–3	GPT-4.1-nano at $0.0005/query, charged only when a user triggers it
Classification	$0	Rule-based, no LLM
Total	$0–5 (v1) / $1–8 (v2)	vs. $500–2,000/mo for the EPIC.search approach

Pricing sources (May 2026): GPT-4.1-nano: $0.10/1M input tokens, $0.40/1M output tokens. GPT-4.1-mini: $0.40/1M input tokens, $1.60/1M output tokens. Granite-Docling-258M: $0 (local CPU). Azure DI Layout (upgrade path): $10/1K pages. Docling: $0 (MIT). One-time ETL: ~$15–75.
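
The $0.0005/query figure for conversational search can be checked against the GPT-4.1-nano token prices above. The sketch below assumes a query sends about 3,000 input tokens (retrieved chunks plus the question) and gets about 500 output tokens back. That token mix is an assumption chosen for illustration, not a measurement from the project.

```python
NANO_INPUT_PER_M = 0.10   # USD per 1M input tokens (from the pricing note)
NANO_OUTPUT_PER_M = 0.40  # USD per 1M output tokens (from the pricing note)


def query_cost(
    input_tokens: int,
    output_tokens: int,
    input_per_m: float = NANO_INPUT_PER_M,
    output_per_m: float = NANO_OUTPUT_PER_M,
) -> float:
    """Cost in USD of one call, given token counts and per-1M-token prices."""
    return (
        input_tokens / 1_000_000 * input_per_m
        + output_tokens / 1_000_000 * output_per_m
    )


# Assumed token mix per conversational query (hypothetical).
cost = query_cost(input_tokens=3_000, output_tokens=500)

# Queries per month implied by the ~$1-3 monthly budget in the table.
low_queries = round(1 / cost)
high_queries = round(3 / cost)

print(f"cost per query: ${cost:.4f}")
print(f"queries per month for $1-3: {low_queries}-{high_queries}")
```

Under that assumption the per-query cost comes to $0.0005, so the ~$1–3 monthly estimate corresponds to roughly 2,000 to 6,000 user-triggered queries.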

Related Repositories

Repo	Role
eagle-api	Backend — extends with DFL features
eagle-admin	Staff UI — search + upload + metadata review
eagle-public	Public UI — document search
EPIC.search	Reference — OCR patterns salvageable, architecture not adopted
