Asmith-M/Noetic-Vault-DocAI-

NOETIC VAULT

Where knowledge stays private, but works smarter.

Noetic Vault is a privacy-first, offline AI-powered document intelligence system designed to extract, analyze, and answer queries from large internal documents such as policies, contracts, SOPs, and government PDFs.

It combines Retrieval-Augmented Generation (RAG), multi-agent reasoning, OCR pipelines, and verification layers to deliver accurate, explainable, and secure answers without relying on the cloud.
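The following is a minimal sketch of the retrieve-then-prompt loop behind RAG, under stated assumptions. It uses only the Python standard library, so term-frequency cosine similarity stands in for Sentence-Transformers embeddings and ChromaDB search, and fixed-size overlapping word windows stand in for semantic chunking. The function names (`chunk_text`, `retrieve`, `build_prompt`) and the sample policy text are hypothetical and are not taken from the project.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into overlapping windows of `size` words (a simple stand-in
    for semantic chunking); the overlap preserves some context across boundaries."""
    if not 0 <= overlap < size:
        raise ValueError("need 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break
    return chunks


def _vector(text: str) -> Counter:
    # Stand-in "embedding": lowercase term frequencies.
    return Counter(re.findall(r"\w+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], k: int = 3) -> list[tuple[int, str]]:
    """Return the k most similar chunks with their indices, so answers can cite them."""
    q = _vector(query)
    ranked = sorted(enumerate(chunks), key=lambda item: _cosine(q, _vector(item[1])), reverse=True)
    return ranked[:k]


def build_prompt(query: str, hits: list[tuple[int, str]]) -> str:
    """Ground the LLM: only retrieved passages go into the prompt, each tagged for citation."""
    context = "\n".join(f"[chunk {i}] {chunk}" for i, chunk in hits)
    return (
        "Answer using only the passages below and cite chunk numbers.\n"
        f"{context}\n\nQuestion: {query}\nAnswer:"
    )


# Made-up sample text for illustration.
policy = (
    "Employees may work remotely up to three days per week. "
    "Remote work requires manager approval. "
    "All laptops must use full-disk encryption. "
    "Lost devices must be reported to IT within 24 hours."
)
chunks = chunk_text(policy, size=8, overlap=2)
question = "How soon must a lost laptop be reported?"
hits = retrieve(question, chunks, k=1)
print(build_prompt(question, hits))
```

In this sketch only the best-matching passage reaches the prompt, which is what allows the answer to carry a chunk citation. In a real pipeline, the prompt string would be sent to the local model instead of printed.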

Built with: FastAPI, React, Ollama, ChromaDB, Sentence-Transformers, PyMuPDF, Tesseract, EasyOCR


Executive Summary

Noetic Vault is built around one critical question:

How can organizations safely use AI on sensitive documents without sending data to the cloud?

This system answers that by combining:

| Layer | Responsibility | Key Components |
|---|---|---|
| Document Processing | Extract text from all PDF types | PyMuPDF, Tesseract, EasyOCR |
| Intelligence Layer | Semantic understanding & retrieval | Embeddings + ChromaDB |
| Reasoning Layer | Multi-agent orchestration | Retriever, Generator, Verifier |
| Security Layer | Privacy & integrity | Offline execution + Hashing |
| Output Layer | Explainable results | Citations, confidence, reports |
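The Document Processing layer lists three extractors, which suggests a tiered fallback: use a PDF's embedded text layer when it has one, and run OCR only on pages that lack it. The sketch below shows that control flow with plain callables, so it runs without any engine installed. The tier order, the `min_chars` threshold, and the names `extract_page_text` and `Extractor` are assumptions for illustration. In practice, the callables might wrap PyMuPDF's `page.get_text()`, `pytesseract.image_to_string()`, and an `easyocr.Reader(...).readtext()` call.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical type: an extractor takes a page object and returns its text (or None/"").
Extractor = Callable[[object], Optional[str]]


def _visible_len(text: str) -> int:
    # Count non-whitespace characters so blank or whitespace-only output scores 0.
    return len("".join(text.split()))


def extract_page_text(
    page: object,
    extractors: Sequence[Tuple[str, Extractor]],
    min_chars: int = 20,
) -> Tuple[str, str]:
    """Try each (name, extractor) tier in order; return (tier_name, text) for the
    first tier whose output has at least `min_chars` visible characters."""
    best_name, best_text = "none", ""
    for name, extract in extractors:
        try:
            text = (extract(page) or "").strip()
        except Exception:
            # One engine failing on a page should not abort the whole document.
            continue
        if _visible_len(text) >= min_chars:
            return name, text
        if _visible_len(text) > _visible_len(best_text):
            best_name, best_text = name, text
    # No tier met the threshold: keep the longest partial result rather than nothing.
    return best_name, best_text


# Hypothetical stand-ins: a scanned page has no embedded text layer,
# so the first tier comes back empty and the next tier is tried.
scanned_page = {"text_layer": "", "image": b"..."}
tiers = [
    ("text_layer", lambda p: p["text_layer"]),
    ("tesseract", lambda p: "Circular No. 12: revised office hours"),  # pretend OCR output
    ("easyocr", lambda p: "not reached"),
]
result = extract_page_text(scanned_page, tiers)
print(result)  # ('tesseract', 'Circular No. 12: revised office hours')
```

Returning the tier name alongside the text makes it possible to record which engine read each page. This is useful when an answer drawn from OCR output later needs to be double-checked.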


What Noetic Vault Does

  • Accepts digital and scanned PDFs (including government documents)
  • Extracts structured text using a tiered OCR pipeline
  • Converts documents into semantic embeddings
  • Retrieves relevant information using vector search
  • Generates natural-language answers using locally hosted LLMs
  • Verifies each answer against the retrieved source passages
  • Supports multilingual queries (English, Hindi, Marathi)
  • Enables cross-document reasoning and contradiction detection
  • Runs completely offline, with no cloud dependency
  • Provides optional blockchain-style document hashing
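The README does not say how the optional blockchain-style hashing works. One common reading, assumed here, is a hash chain. In a hash chain, each document's SHA-256 digest is recorded together with the hash of the previous record, so altering any stored record breaks the chain from that point on. The `HashLedger` class and its methods are hypothetical, and the sketch uses only Python's standard library.

```python
import hashlib
import json
import time


class HashLedger:
    """Append-only chain of document fingerprints (hypothetical sketch)."""

    GENESIS = "0" * 64  # placeholder "previous hash" for the first record

    def __init__(self) -> None:
        self.records: list[dict] = []

    @staticmethod
    def _record_hash(record: dict) -> str:
        # Hash a canonical JSON form so key order cannot change the digest.
        payload = {k: record[k] for k in ("doc_name", "doc_sha256", "prev_hash", "timestamp")}
        return hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()

    def add(self, doc_name: str, content: bytes) -> dict:
        prev = self.records[-1]["record_hash"] if self.records else self.GENESIS
        record = {
            "doc_name": doc_name,
            "doc_sha256": hashlib.sha256(content).hexdigest(),
            "prev_hash": prev,
            "timestamp": time.time(),
        }
        record["record_hash"] = self._record_hash(record)
        self.records.append(record)
        return record

    def verify(self) -> bool:
        """Recompute every link; an edited record or broken link returns False."""
        prev = self.GENESIS
        for record in self.records:
            if record["prev_hash"] != prev or record["record_hash"] != self._record_hash(record):
                return False
            prev = record["record_hash"]
        return True

    def matches(self, doc_name: str, content: bytes) -> bool:
        """Check a file against the most recent fingerprint recorded under its name."""
        digest = hashlib.sha256(content).hexdigest()
        for record in reversed(self.records):
            if record["doc_name"] == doc_name:
                return record["doc_sha256"] == digest
        return False


ledger = HashLedger()
ledger.add("leave_policy.pdf", b"%PDF-1.7 leave policy v1")
ledger.add("vendor_contract.pdf", b"%PDF-1.7 contract v3")
print(ledger.verify())                                                  # True
print(ledger.matches("leave_policy.pdf", b"%PDF-1.7 leave policy v2"))  # False: file changed
ledger.records[0]["doc_sha256"] = "tampered"
print(ledger.verify())                                                  # False
```

The ledger stays entirely local, which fits the offline design. Whether the project also anchors these hashes anywhere external is not stated.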

System Flow

User Upload (PDF)
        |
        v
OCR Engine (PyMuPDF / Tesseract / EasyOCR)
        |
        v
Text Cleaning + Semantic Chunking
        |
        v
Embedding Generation (Sentence-Transformers)
        |
        v
ChromaDB (Vector Storage)
        |
        v
Multi-Agent Pipeline
    |
    +--> RankerAgent (Retrieval)
    +--> GeneratorAgent (Answer)
    +--> VerifierAgent (Validation)
        |
        v
RAG Pipeline (Context + LLM)
        |
        v
Output:
- Answer
- Citations
- Confidence Score
- (Future: Risk Heatmap + Report)
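The agent stage of the flow can be pictured as three small objects passing a draft along. The sketch below is hypothetical. The class names mirror the diagram, but the generator is an extractive stub that returns the best-matching sentence, standing in for a call to a local LLM. The confidence score is the fraction of answer words found in the cited chunks, and the 0.8 threshold is an illustrative assumption. Treat it as an outline of the hand-offs, not as the project's actual agents.

```python
import re
from dataclasses import dataclass, field


def _words(text: str) -> set[str]:
    return set(re.findall(r"\w+", text.lower()))


@dataclass
class Result:
    answer: str = ""
    citations: list[str] = field(default_factory=list)
    confidence: float = 0.0


class RankerAgent:
    """Orders chunk IDs by word overlap with the question (stand-in for vector search)."""

    def run(self, question: str, chunks: dict[str, str], k: int = 2) -> list[str]:
        q = _words(question)
        ranked = sorted(chunks, key=lambda cid: len(q & _words(chunks[cid])), reverse=True)
        return ranked[:k]


class GeneratorAgent:
    """Extractive stub: returns the sentence of the top chunk that best matches the
    question. A real deployment would prompt a local LLM with the ranked chunks."""

    def run(self, question: str, chunks: dict[str, str], ranked: list[str]) -> Result:
        if not ranked:
            return Result()
        q = _words(question)
        sentences = re.split(r"(?<=[.!?])\s+", chunks[ranked[0]])
        best = max(sentences, key=lambda s: len(q & _words(s)))
        return Result(answer=best, citations=[ranked[0]])


class VerifierAgent:
    """Confidence = share of answer words found in the cited chunks; weak answers are withheld."""

    def run(self, draft: Result, chunks: dict[str, str], threshold: float = 0.8) -> Result:
        answer = _words(draft.answer)
        support = set().union(*(_words(chunks[c]) for c in draft.citations))
        draft.confidence = len(answer & support) / len(answer) if answer else 0.0
        if draft.confidence < threshold:
            draft.answer = "Insufficient support in the source documents."
        return draft


def answer_question(question: str, chunks: dict[str, str]) -> Result:
    ranked = RankerAgent().run(question, chunks)
    draft = GeneratorAgent().run(question, chunks, ranked)
    return VerifierAgent().run(draft, chunks)


# Made-up chunks keyed by hypothetical "file#page" IDs.
chunks = {
    "hr_policy.pdf#p2": "Employees accrue 1.5 days of leave per month. Unused leave lapses in March.",
    "it_sop.pdf#p4": "Report lost devices to IT within 24 hours. Devices must be encrypted.",
}
result = answer_question("How many days of leave do employees accrue per month?", chunks)
print(result)

# The verifier applied to an unsupported draft, as if an LLM had hallucinated.
bad = VerifierAgent().run(
    Result(answer="Employees get 30 days of paid sick leave.", citations=["it_sop.pdf#p4"]), chunks
)
print(bad.confidence, bad.answer)  # 0.0 Insufficient support in the source documents.
```

Keeping the verifier separate from the generator means an unsupported draft can be withheld instead of shown, regardless of which model produced it.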

About

DocAI is a multi-agent, offline Q&A system that answers questions from internal PDFs using LLMs. Built with React, FastAPI, and LangChain/CrewAI, it ensures privacy and works fully locally for sensitive use cases.
