Shard is a high-performance, "Database-First" invoice processing engine that transforms unstructured documents into structured, actionable financial data. Built for speed and reliability, Shard solves the "Trust Gap" in AI by combining LLM-based semantic extraction with an Intelligent Review Queue and Computer Vision pre-validation.
Deploy link: https://shard-9qyi.onrender.com/
Features
- Semantic AI Extraction: Powered by Groq/Grok, Shard understands the context of financial documents, making it layout-agnostic.
- Image Quality Guardrail: Uses Laplacian variance and histogram analysis to detect blur or low-contrast uploads before they hit the AI.
- Human-in-the-Loop (HITL): A dedicated Review Queue holds low-confidence extractions for manual approval, so unverified data is never accepted automatically.
- Database-First Architecture: Strict synchronization between React/Zustand and MongoDB, moving away from stale local caching.
- Real-Time Analytics: Visualize spend trends, vendor distributions, and AI accuracy metrics instantly.
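The Image Quality Guardrail can be illustrated with a small, dependency-free sketch. It computes the variance of a 4-neighbour Laplacian response (a common sharpness measure) and a histogram-based intensity spread on a grayscale image given as a list of rows. The function names, the percentile cut-offs, and both thresholds are assumptions for illustration, not values taken from the Shard backend, which would more likely run the equivalent operations through OpenCV.

```python
from statistics import pvariance


def laplacian_variance(gray):
    """Variance of the 4-neighbour Laplacian response over interior pixels."""
    height, width = len(gray), len(gray[0])
    responses = [
        gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1] + gray[y][x + 1]
        - 4 * gray[y][x]
        for y in range(1, height - 1)
        for x in range(1, width - 1)
    ]
    # Images smaller than 3x3 have no interior pixels; treat them as blurry.
    return float(pvariance(responses)) if responses else 0.0


def intensity_spread(gray, low=0.05, high=0.95):
    """Distance between the low and high percentile levels of the histogram."""
    hist = [0] * 256
    for row in gray:
        for value in row:
            hist[value] += 1
    total = sum(hist)

    def level_at(fraction):
        # Walk the cumulative histogram until the requested fraction is covered.
        target, cumulative = fraction * total, 0
        for level, count in enumerate(hist):
            cumulative += count
            if cumulative >= target:
                return level
        return 255

    return level_at(high) - level_at(low)


def check_image_quality(gray, blur_threshold=100.0, contrast_threshold=50):
    """Return a verdict plus the reasons an upload would be rejected.

    Both thresholds are hypothetical and would need tuning on real invoices.
    """
    reasons = []
    if laplacian_variance(gray) < blur_threshold:
        reasons.append("blurry")
    if intensity_spread(gray) < contrast_threshold:
        reasons.append("low_contrast")
    return {"ok": not reasons, "reasons": reasons}
```

A uniformly gray image fails both checks, while a sharp black-and-white checkerboard passes both. Rejecting at this stage means an unreadable upload never reaches the AI extraction step.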
Tech Stack
- Frontend: React, Vite, Tailwind CSS, Framer Motion, Zustand
- Backend: Flask (Python), OpenCV (Image Processing)
- AI/LLM: Groq LPU™ Inference / Grok
- Database: MongoDB
- Cloud Readiness: Architected via Kiro for AWS (S3, DocumentDB, Cognito)
Workflow
- Ingestion: The user uploads a PDF or image.
- Validation: Backend performs blur/contrast checks.
- Extraction: AI identifies vendor, line items, taxes, and totals.
- Persistence: Data is saved to MongoDB with a unique userId identity token.
- Review: Extractions with a confidence score below 85% are quarantined for manual approval.
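The persistence and review steps can be sketched as a single routing function. This is a minimal illustration that assumes each extraction carries a `confidence` value between 0 and 1; the field names `confidence` and `status`, the status labels, and the function name are hypothetical. Only the 85% cut-off and the `userId` tag come from the description above.

```python
REVIEW_THRESHOLD = 0.85  # scores below 85% go to the Review Queue


def route_extraction(extraction, user_id, threshold=REVIEW_THRESHOLD):
    """Tag an extraction with its owner and decide whether it needs review.

    A missing confidence score is treated as 0.0, so it is always reviewed.
    """
    confidence = extraction.get("confidence", 0.0)
    status = "needs_review" if confidence < threshold else "approved"
    # Return a new dict so the caller's extraction is not mutated.
    return {**extraction, "userId": user_id, "status": status}
```

For example, `route_extraction({"vendor": "Acme", "confidence": 0.72}, "u_123")` yields a document with `status` set to `"needs_review"`, while a score of exactly 0.85 is approved because the comparison is strictly "below".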
Prerequisites
- Node.js (v18+)
- Python 3.9+
- MongoDB instance
Installation
1. Clone the Repo
    git clone https://github.com/ASHUTOSH-A-49/Shard.git
    cd Shard
2. Backend Setup
    cd backend
    pip install -r requirements.txt
   Create a .env file with your MONGO_URI and GROQ_API_KEY, then start the server:
    python app.py
3. Frontend Setup
    cd frontend
    npm install
    npm run dev
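The backend depends on the two values from the .env file, so it helps to fail fast when either is missing. The sketch below assumes the .env values have already been loaded into the process environment (for example by a dotenv loader; whether the repository uses one is not stated). `load_settings` and `REQUIRED_KEYS` are hypothetical names, not part of the Shard codebase.

```python
import os

REQUIRED_KEYS = ("MONGO_URI", "GROQ_API_KEY")


def load_settings(env=None):
    """Return the required settings, or raise if any is missing or empty."""
    env = os.environ if env is None else env
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise RuntimeError("Missing required settings: " + ", ".join(missing))
    return {key: env[key] for key in REQUIRED_KEYS}
```

Calling `load_settings()` once at startup turns a misconfigured environment into an immediate, readable error instead of a failure on the first database or LLM call.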
🛡️ Identity & Security
Shard uses a standardized JWT identity payload, stringified into the Authorization header. Every API request is therefore partitioned by user identity, which prevents data leaks in a multi-tenant environment.
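As a rough illustration of per-user partitioning, the sketch below parses a JSON-stringified identity payload from the Authorization header and forces every database filter to include that user's ID. The payload shape (a JSON object with a `userId` field), the optional `Bearer ` prefix, and both function names are assumptions. The README does not state how the payload is signed or verified, and this sketch performs no signature verification.

```python
import json


def identity_from_header(authorization):
    """Extract userId from a stringified identity payload, or return None."""
    if not authorization:
        return None
    raw = authorization.removeprefix("Bearer ").strip()  # prefix is optional
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(payload, dict):
        return None
    user_id = payload.get("userId")
    return user_id if isinstance(user_id, str) and user_id else None


def scoped_filter(user_id, extra=None):
    """Build a query filter that is always restricted to one user."""
    query = dict(extra or {})
    # Set userId last so caller-supplied filters cannot override the scope.
    query["userId"] = user_id
    return query
```

A request handler would reject the call when `identity_from_header` returns `None`, and otherwise pass `scoped_filter(user_id, {...})` to every MongoDB query so that one tenant's invoices are never returned to another.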
Future Roadmap
- Direct AWS S3 integration for document archiving.
- A chatbot for asking questions directly about dashboard and analytics insights.
- Export to QuickBooks/Xero API.
- Multi-currency conversion via real-time exchange rate APIs.
Built with ❤️ by team JETT-2-HOLIDAY for Hackxios.
Contributors and team members
- Ashutosh Behera - https://github.com/ASHUTOSH-A-49
- Rahul Sahu - https://github.com/Rahulsahu7389
- S Vaibhavi - https://github.com/Vaibhaviii14
- Shourya Sinha - https://github.com/ShouryaGit023
- Samir Tiwari - https://github.com/samirtiwari020
- Akshat Sharma - https://github.com/Sharmaakshat369