A modular Retrieval-Augmented Generation (RAG) system for document analysis, prompt engineering, and data management. It is designed for research, prototyping, and production workflows involving invoices, PDFs, and custom prompts.
- Overview
- Features
- Directory Structure
- Installation
- Usage
- Prompt Engineering
- Data & Storage
- Contributing
- License
- Links
This repository provides a flexible framework for:
- Document ingestion and processing (PDFs, JSON)
- Retrieval-Augmented Generation (RAG) pipelines
- Custom prompt engineering and evaluation
- Data storage and management
- Extensible chains and loaders for NLP tasks
- RAG Pipelines: Easily ingest, retrieve, and query documents
- Prompt Library: Curated prompt templates for zero-shot, few-shot, CoT, and more
- Invoice Analysis: Scripts and data for invoice extraction and validation
- Modular Design: Organized codebase for easy extension and maintenance
- Docker Support: Containerized setup for reproducibility
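The ingest, retrieve, and query flow listed above can be sketched in a few lines. This is a minimal illustration only, not the code in src/RAG/: it assumes a plain bag-of-words cosine similarity in place of real embeddings, and the names `ingest`, `retrieve`, and `build_prompt` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def ingest(documents):
    """Build an in-memory index: one term-count vector per document."""
    return [(doc, Counter(tokenize(doc))) for doc in documents]


def retrieve(index, question, k=1):
    """Return the k documents most similar to the question."""
    query = Counter(tokenize(question))
    ranked = sorted(index, key=lambda item: cosine(query, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]


def build_prompt(question, context):
    """Combine retrieved context and the question into one prompt string."""
    joined = "\n".join(context)
    return f"Context:\n{joined}\n\nQuestion: {question}\nAnswer:"
```

The string returned by `build_prompt` is what would be sent to a language model. A real pipeline would likely swap the term-count vectors for embeddings and the list for a vector store.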
├── build/ # Docker and deployment files
│ ├── docker-compose.yml
│ └── data/ # Database and assets
├── prompt/ # Prompt templates and meta-prompt generator
├── src/ # Source code
│ ├── chains.py # RAG chains and pipelines
│ ├── loaders.py # Data/document loaders
│ ├── prompts.py # Prompt utilities
│ ├── utils.py # General utilities
│ ├── cnk-emb/ # Embedding and chunking modules
│ ├── data/ # Processed datasets
│ ├── RAG/ # RAG-specific modules
│ └── scripts/ # Analysis scripts
├── storage/ # Raw documents (PDF, PNG)
├── requirements.txt # Python dependencies
└── readme.md # Project documentation
- Python 3.8+
- Docker (optional)
git clone https://github.com/yourusername/ws-prompt-rag.git
cd ws-prompt-rag
pip install -r requirements.txt
cd build
docker-compose up --build
Run the main RAG pipeline:
python src/RAG/ingest.py
python src/RAG/qa.py
python src/scripts/analyze_invoice.py
python src/cnk-emb/main.py
Explore and use prompt templates in prompt/:
- 1-zero-shot.txt: Zero-shot prompt
- 3-few-shot.txt: Few-shot prompt
- 4-cot-persona.txt: Chain-of-thought persona prompt
- meta-prompt-generator.md: Meta prompt generation guide
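To illustrate how zero-shot and few-shot prompts differ, here is a small sketch that assembles both from the same parts. It does not read the template files in prompt/, whose contents are not shown here; the `Input:`/`Output:` layout and the function names are assumptions made for the example.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Assemble a few-shot prompt: instruction, worked examples, then the new input."""
    parts = [instruction.strip(), ""]
    for source, target in examples:
        parts.append(f"Input: {source}")
        parts.append(f"Output: {target}")
        parts.append("")  # blank line between examples
    parts.append(f"Input: {query}")
    parts.append("Output:")
    return "\n".join(parts)


def build_zero_shot_prompt(instruction, query):
    """A zero-shot prompt uses the same layout with no worked examples."""
    return build_few_shot_prompt(instruction, [], query)
```

For instance, `build_few_shot_prompt("Extract the invoice total.", [("Total due: 40 USD", "40 USD")], "Amount payable: 75 EUR")` yields the instruction, one worked example, and an open `Output:` line for the model to complete.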
- Processed Data: src/data/processed/
- Raw Documents: storage/
- Database & Assets: build/data/
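A quick way to check what is in one of these locations is to group its files by extension. The helper below is a hypothetical convenience, not part of the repository, and relies only on the standard library.

```python
from collections import defaultdict
from pathlib import Path


def inventory(root):
    """Group the files under root (recursively) by lowercase extension, e.g. '.pdf'."""
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            groups[path.suffix.lower()].append(path.name)
    return dict(groups)
```

Calling `inventory("storage")` from the repository root would return a dictionary mapping each extension to the file names found, which makes it easy to confirm that the expected PDFs and PNGs are present before ingestion.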
Contributions are welcome! Please see CONTRIBUTING.md for guidelines.
Distributed under the MIT License. See LICENSE for details.
- LangChain - RAG framework inspiration
- Docker Documentation
- Prompt Engineering Guide
For questions or support, open an issue or contact the maintainer.