Project Plan

Project Plan — DEMI

Business needs → technical work → delivery sequence.

Business Needs

The EAO needs to modernize document management for Environmental Assessments. Three developer briefs define the scope:

1. Digital File Library (DFL)

Problem: Staff cannot efficiently find, file, or retrieve EA documents. 65,000+ existing documents in EPIC have limited metadata, no full-text search, and no structured filing system.

Business Outcomes:

Staff find documents in seconds (full-text + faceted search)
Documents are classified by type, topic, and ORCS code
Visibility controls enforce who sees what (team → EAO → IDIR → public)
Paper records and legacy scans become searchable
Filing backlog (27,000 paper records) has a digital path forward
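The visibility ladder above (team → EAO → IDIR → public) can be illustrated with a small check. This is only a sketch: it assumes the tiers nest, so that a team member can also see everything the wider audiences see. The `Visibility` enum, `can_view`, and the sample records are hypothetical names, not part of the EPIC codebase.

```python
from enum import IntEnum


class Visibility(IntEnum):
    """Audience tiers, narrowest to widest (team → EAO → IDIR → public)."""
    TEAM = 0
    EAO = 1
    IDIR = 2
    PUBLIC = 3


def can_view(doc_visibility: Visibility, viewer_tier: Visibility) -> bool:
    """True when the document's audience is at least as wide as the viewer's tier.

    Assumption: tiers nest, so a team member is also an EAO, IDIR, and public viewer.
    """
    return viewer_tier <= doc_visibility


def visible_titles(docs: list[dict], viewer_tier: Visibility) -> list[str]:
    """Filter hypothetical document records down to what one viewer may see."""
    return [d["title"] for d in docs if can_view(d["visibility"], viewer_tier)]


# Illustrative records only; titles and tiers are made up for the example.
docs = [
    {"title": "Draft assessment notes", "visibility": Visibility.TEAM},
    {"title": "Internal briefing", "visibility": Visibility.EAO},
    {"title": "Certificate decision", "visibility": Visibility.PUBLIC},
]
print(visible_titles(docs, Visibility.IDIR))  # only the public document
```

In the actual system, enforcement would sit in the existing RBAC layer rather than in application code like this; the sketch only shows the ordering the outcome implies.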

2. AI Classifier Tool (AICT)

Problem: Manual metadata entry for every uploaded document is slow and inconsistent. Staff skip fields, use wrong categories, or leave documents untagged.

Business Outcomes:

Uploaded documents get automatic metadata suggestions (type, topic, ORCS)
Staff review and confirm — not replace — AI suggestions
Classification is consistent across 65,000 documents
Zero per-document cost (rule-based, no AI billing)
An LLM upgrade path remains available if rule-based accuracy proves insufficient

3. EPIC.system Integration

Problem: DFL/AICT must work within the existing EPIC platform (eagle-api, eagle-admin, eagle-public), not as standalone services.

Business Outcomes:

No new applications to deploy or maintain
Existing authentication, RBAC, upload pipelines reused
Single system of record (MongoDB) for all document metadata
Search works across both existing EPIC content and new DFL filings

How Work Maps to Business Needs

Business Need	Work Package(s)	Outcome
Find documents fast	WP5 (Typesense schema) + WP6 (UI)	Full-text + semantic + faceted search
Classify documents	WP3 (Rule-based classifier)	Auto-suggestions, $0/doc
Human review before publish	WP4 (Metadata hold)	Nothing goes live without staff approval
Search scanned PDFs	WP2 (docling-service)	Tesseract CLI OCR + TableFormer, CPU-only
Search Office/Email/HTML	WP2 (docling-service)	Unified parsing for all formats
Handle complex/degraded docs	WP2 (docling-service VLM)	Granite-Docling-258M local + Azure OpenAI GPT-4.1-mini vision remote fallback
Handle legacy 65K docs	WP7 (ETL)	Batch reprocess via scaled docling replicas
Search across entities	WP5 (Typesense federated)	One search bar → docs + projects + activities
Search UX (suggestions, groups)	WP5 (Typesense analytics + grouped hits)	Query suggestions, results grouped by project
Admin result control	WP5 (Typesense curation)	Pin important docs, bury superseded, domain synonyms
Visibility controls	WP1 (Schema) + existing RBAC proxy	Existing `allowed_roles` check already enforces visibility
AI answers (future)	WP8 (Conversational search)	GPT-4.1-nano via Typesense, user-triggered
No new infrastructure cost	All	Everything in OpenShift (free), Azure API calls only
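To make the WP5 rows more concrete, the following is a rough sketch of what a Typesense collection definition and search request could look like for faceted, grouped, hybrid search. The collection name and every field name except `allowed_roles` are assumptions for illustration; the embedding model shown is one of Typesense's built-in models and may not be the one the project selects. The plan states that the existing RBAC proxy enforces `allowed_roles`, so the `filter_by` clause here only mimics what that proxy would presumably inject.

```python
# Hypothetical collection definition; field names are illustrative, not the EPIC schema.
DFL_COLLECTION = {
    "name": "dfl_documents",
    "fields": [
        {"name": "title", "type": "string"},
        {"name": "content", "type": "string"},
        {"name": "document_type", "type": "string", "facet": True},
        {"name": "topics", "type": "string[]", "facet": True},
        {"name": "orcs_code", "type": "string", "facet": True, "optional": True},
        {"name": "project_id", "type": "string", "facet": True},
        {"name": "allowed_roles", "type": "string[]"},
        {
            # Auto-generated embedding field used for semantic and hybrid search.
            "name": "embedding",
            "type": "float[]",
            "embed": {
                "from": ["title", "content"],
                "model_config": {"model_name": "ts/all-MiniLM-L12-v2"},
            },
        },
    ],
}


def build_search_params(query: str, roles: list[str]) -> dict:
    """Build search parameters for a faceted, grouped, role-filtered hybrid query."""
    if not roles:
        raise ValueError("at least one role is required to build the visibility filter")
    return {
        "q": query,
        "query_by": "title,content,embedding",  # keyword fields plus the embedding field
        "facet_by": "document_type,topics,orcs_code",
        "group_by": "project_id",  # results grouped by project
        "filter_by": f"allowed_roles:=[{','.join(roles)}]",
    }


params = build_search_params("wildlife mitigation plan", ["eao-staff", "idir"])
print(params["filter_by"])
```

Role names containing commas or other special characters would need escaping before being placed in `filter_by`; the sketch assumes simple identifiers.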

Delivery Phases

Phase 1: Foundation

WP	Task	Delivers
WP1	Schema extension — add DFL fields to Document model	Visibility, classification, holdState fields
WP5	Typesense schema update — facets, embeddings, synonyms, analytics, curation	Semantic + hybrid search, query suggestions, grouped results, domain synonyms
WP4	Metadata verification hold — staged/admitted workflow	Upload → hold → review → publish pipeline

Milestone: Documents can be uploaded with DFL metadata and held for review.
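The upload → hold → review → publish pipeline can be pictured as a two-state machine. The sketch below assumes the `holdState` field takes the values `staged` and `admitted` (taken from the WP4 row); the `Document` class, `admit`, and `is_published` are hypothetical helpers, not the eagle-api model.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class HoldState(str, Enum):
    STAGED = "staged"      # uploaded, waiting for staff review
    ADMITTED = "admitted"  # reviewed and released


@dataclass
class Document:
    title: str
    suggested: dict                                # classifier suggestions
    metadata: dict = field(default_factory=dict)   # values confirmed by staff
    hold_state: HoldState = HoldState.STAGED
    reviewed_by: Optional[str] = None


def admit(doc: Document, reviewer: str, confirmed: dict) -> Document:
    """Move a staged document to admitted once a named reviewer confirms metadata."""
    if doc.hold_state is not HoldState.STAGED:
        raise ValueError(f"cannot admit a document in state {doc.hold_state.value!r}")
    if not reviewer:
        raise ValueError("a reviewer is required; nothing goes live without staff approval")
    doc.metadata = dict(confirmed)
    doc.reviewed_by = reviewer
    doc.hold_state = HoldState.ADMITTED
    return doc


def is_published(doc: Document) -> bool:
    """Only admitted documents count as live."""
    return doc.hold_state is HoldState.ADMITTED


doc = Document(title="Inspection record", suggested={"type": "Inspection Report"})
print(is_published(doc))  # False while staged
admit(doc, reviewer="staff.user", confirmed={"type": "Inspection Report"})
print(is_published(doc))  # True after review
```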

Phase 2: Intelligence

WP	Task	Delivers
WP2	docling-service — Python microservice for parsing, OCR, tables, chunking	All document formats → searchable, chunked text (RAG-ready)
WP3	Rule-based classifier — keyword scoring + vocabularies	Auto-suggest type, topics, ORCS code

Milestone: Uploaded documents get auto-extracted text (via Docling) and classification suggestions.
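As an illustration of "keyword scoring + vocabularies", a minimal classifier might weight phrase matches per document type and return the top few as suggestions. The vocabulary, weights, and document type labels below are invented for the example; the real vocabularies are still to be defined (see Next Steps).

```python
import re

# Hypothetical vocabulary: document type -> {phrase: weight}.
VOCABULARY = {
    "Inspection Report": {"inspection": 3, "compliance": 2, "site visit": 2},
    "Application": {"application": 3, "proponent": 1, "submission": 2},
    "Order": {"order": 3, "minister": 2},
}


def suggest_types(text: str, vocabulary: dict = VOCABULARY, top_n: int = 3) -> list:
    """Return up to top_n (label, share_of_score) suggestions, best first."""
    lowered = text.lower()
    scores = {}
    for label, keywords in vocabulary.items():
        score = 0
        for phrase, weight in keywords.items():
            # Whole-word match so "order" does not fire inside "border".
            hits = len(re.findall(rf"\b{re.escape(phrase)}\b", lowered))
            score += weight * hits
        if score > 0:
            scores[label] = score
    total = sum(scores.values())
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(label, score / total) for label, score in ranked[:top_n]]


sample = "Inspection report following the site visit; compliance inspection completed."
print(suggest_types(sample))
```

The share-of-score value is only a relative ranking signal, not a calibrated probability. Suggestions produced this way would still pass through the metadata hold for staff confirmation.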

Phase 3: Interface (parallel with Phase 2)

WP	Task	Delivers
WP6	eagle-admin UI — DFL search, upload, review screens	Staff can search, upload, review, admit documents

Milestone: Staff-usable DFL interface in eagle-admin.

Phase 4: Migration

WP	Task	Delivers
WP7	Priority 1A ETL — batch-process 65K existing docs (scale docling replicas)	Legacy documents searchable + classified

Milestone: All existing EPIC documents available in DFL search.
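One way to combine batch processing with the "prioritize most-accessed docs" mitigation is to sort the legacy corpus by access count and hand fixed-size batches to the docling replicas. The sketch assumes each record carries an `access_count` value, which the plan does not confirm; the function name is hypothetical.

```python
from itertools import islice
from typing import Iterable, Iterator


def prioritized_batches(docs: Iterable[dict], batch_size: int) -> Iterator[list]:
    """Yield batches of documents, most-accessed first."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    ordered = sorted(docs, key=lambda d: d["access_count"], reverse=True)
    it = iter(ordered)
    # islice pulls the next batch_size items; an empty list ends the loop.
    while batch := list(islice(it, batch_size)):
        yield batch


# Illustrative records only.
legacy = [
    {"id": "a", "access_count": 1},
    {"id": "b", "access_count": 9},
    {"id": "c", "access_count": 5},
]
for batch in prioritized_batches(legacy, batch_size=2):
    print([d["id"] for d in batch])
```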

Phase 5: AI Enhancement (post-launch, optional)

WP	Task	Delivers
WP8	Conversational search — Typesense + GPT-4.1-nano	"Ask a question" AI answers with source citations

Milestone: Users can ask natural language questions and get answers grounded in actual documents.

Dependency Graph

graph LR
    WP1[WP1: Schema] --> WP4[WP4: Metadata Hold]
    WP1 --> WP5[WP5: Typesense Schema]
    WP2[WP2: docling-service] --> WP3[WP3: Classifier]
    WP3 --> WP7[WP7: ETL 65K docs]
    WP4 --> WP6[WP6: eagle-admin UI]
    WP5 --> WP6
    WP5 --> WP8[WP8: Conversational Search]

    classDef critical fill:#4a1a1a,stroke:#3a0a0a,color:#fff
    classDef parallel fill:#1a3a4a,stroke:#0a2a3a,color:#fff
    classDef optional fill:#3a3a1a,stroke:#2a2a0a,color:#fff
    class WP1,WP2,WP3,WP7 critical
    class WP4,WP5,WP6 parallel
    class WP8 optional

Critical path: WP1 + WP2 (parallel, no dependency) → WP3 → WP7 (searchable legacy corpus).

Parallel track: WP4/WP5 → WP6 (UI) runs alongside intelligence work.
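The dependency graph can be checked mechanically. The sketch below encodes only the edges drawn in the graph and uses Python's standard `graphlib` (3.9+) to compute which work packages could start together. It reflects dependency order only, not the delivery phases, which also account for priorities such as shipping WP8 post-launch.

```python
from graphlib import TopologicalSorter

# Each key depends on the work packages in its set (edges copied from the graph).
DEPENDS_ON = {
    "WP4": {"WP1"},
    "WP5": {"WP1"},
    "WP3": {"WP2"},
    "WP7": {"WP3"},
    "WP6": {"WP4", "WP5"},
    "WP8": {"WP5"},
}


def parallel_waves(depends_on: dict) -> list:
    """Group work packages into waves; everything in a wave can start together."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


for number, wave in enumerate(parallel_waves(DEPENDS_ON), start=1):
    print(number, wave)
```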

Cost Summary

Phase	Monthly	One-Time	Notes
Phases 1–4 (v1)	$0	~$15–75	Remote VLM fallback for degraded docs during ETL
Phase 5 (v2 AI answers)	$1–3	—	GPT-4.1-nano, user-triggered only

Full pricing breakdown: Implementation Proposal

Risks and Mitigations

Risk	Impact	Mitigation
Docling accuracy on degraded scans	Some docs poorly extracted	Two-tier VLM: Granite-Docling-258M (local CPU) → Azure OpenAI GPT-4.1-mini vision (remote). Upgrade path: Azure DI Layout ($10/1K pages)
Rule-based accuracy below 60%	Staff find suggestions unhelpful	Upgrade to Typesense embedding similarity (still $0)
65K ETL takes too long	Delayed availability	Scale docling replicas to 3+, prioritize most-accessed docs
docling-service resource usage	Pod eviction/OOM	Set memory limits at 2Gi, monitor during ETL, scale horizontally
UI complexity	Timeline slip	Ship minimal viable search first, iterate
Staff adoption	DFL unused	Involve staff in hold workflow design, make it faster than current process

Decision Points (Need Business Input)

#	Decision	Options	Recommendation
1	VLM confidence threshold?	A) 0.70 (more Azure calls), B) 0.50 (only worst docs)	A — better quality, small cost delta
2	Conversational search timing?	A) Ship with v1, B) Add after core DFL proven	B — core value is filing + finding
3	Auto-admit high-confidence classifications?	A) Always hold, B) Auto-admit >95% confidence	A for v1, revisit after accuracy data
4	Paper records (27K) timeline?	A) After digital, B) Parallel	A — prove pipeline on digital first
5	Public search (eagle-public) scope?	A) Staff-only initially, B) Public from day 1	A — staff validates before public exposure
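Decision 1 is easier to reason about with the routing logic written out. In this sketch a document is escalated to the remote tier when the local extraction's confidence falls below the threshold, which is why 0.70 produces more Azure calls than 0.50. How the confidence score is derived is not specified in this plan; `Extraction`, `extract_with_fallback`, and the stub extractors are hypothetical stand-ins for the local and remote VLM calls.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Extraction:
    text: str
    confidence: float  # assumed to be in [0, 1]
    engine: str


def extract_with_fallback(
    data: bytes,
    local_extract: Callable[[bytes], Extraction],
    remote_extract: Callable[[bytes], Extraction],
    threshold: float = 0.70,
) -> Extraction:
    """Use the local result unless its confidence is below the threshold."""
    local = local_extract(data)
    if local.confidence >= threshold:
        return local
    return remote_extract(data)


# Stub extractors standing in for the local model and the remote fallback.
def fake_local(data: bytes) -> Extraction:
    return Extraction(text="partial text", confidence=0.60, engine="local")


def fake_remote(data: bytes) -> Extraction:
    return Extraction(text="full text", confidence=0.95, engine="remote")


print(extract_with_fallback(b"%PDF", fake_local, fake_remote, threshold=0.70).engine)
print(extract_with_fallback(b"%PDF", fake_local, fake_remote, threshold=0.50).engine)
```

With a local confidence of 0.60, option A (0.70) escalates the document and option B (0.50) keeps the local result.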

Success Metrics

Metric	Target	How Measured
Document findability	<5 seconds to locate any document	Search latency + user feedback
Classification accuracy	>70% of documents have the correct value among the top 3 suggestions	Staff override rate during hold
Filing speed	2x faster than current manual process	Time-to-admit per document
Corpus coverage	100% of 65K docs searchable	ETL completion tracking
Cost	$0/month for v1 operations	Azure billing dashboard
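The classification accuracy metric could be computed from hold-review records along these lines. The record shape (`suggested` as a ranked list, `confirmed` as the value staff accepted) is an assumption about what the hold workflow would log.

```python
def top3_accuracy(reviews: list) -> float:
    """Share of reviewed documents whose confirmed value was among the top 3 suggestions."""
    if not reviews:
        raise ValueError("no reviews to score")
    hits = sum(1 for r in reviews if r["confirmed"] in r["suggested"][:3])
    return hits / len(reviews)


# Illustrative review log; values are made up for the example.
reviews = [
    {"suggested": ["Order", "Application", "Letter"], "confirmed": "Order"},
    {"suggested": ["Application", "Order", "Letter"], "confirmed": "Letter"},
    {"suggested": ["Letter", "Order", "Application", "Report"], "confirmed": "Report"},
]
accuracy = top3_accuracy(reviews)
print(f"{accuracy:.0%}")
```

In this sample two of three confirmed values fall within the top 3, so the result sits just under the 70% target.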

Next Steps

Approve plan — confirm scope and decision points
Start WP1 — schema extension (unblocks everything else)
Define vocabularies — document types, topics, ORCS mappings for classifier
UI wireframes — eagle-admin DFL screens (can run in parallel with WP1–WP3)
