DocQnA: Document Question Answering System

Overview

DocQnA is a two-phase document question-answering system that combines natural language processing with a data ingestion and transformation pipeline. The first phase fine-tunes a language model; the second phase deploys the system for real-time document question answering.
The application is deployed on Azure: [https://dockyqna.azurewebsites.net/]
My fine-tuned version of the Zephyr-7B model for document question answering is available on Hugging Face: [https://huggingface.co/Feluda/Zephyr-7b-QnA]

Phases

Phase 1: Fine-Tuning:

This phase fine-tunes the Zephyr-7B language model on a dataset of choice. The fine-tuned model is then saved for later use in the DocQnA system.
These are the training results of my model:

Phase 2: Deployment:

In this phase, the fine-tuned language model is integrated with a document retrieval system to provide accurate and contextually relevant responses to user queries.
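
As a rough illustration of how retrieved passages and a user query might be combined before they reach the model, here is a minimal sketch. The function name `build_prompt` and the prompt wording are hypothetical and are not taken from the repository; the actual prompt format used by DocQnA may differ.

```python
def build_prompt(question, context_chunks):
    """Join retrieved passages and the question into a single prompt string.

    Hypothetical helper for illustration only; not DocQnA's actual code.
    """
    # Separate passages with blank lines so the model can tell them apart.
    context = "\n\n".join(chunk.strip() for chunk in context_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


prompt = build_prompt(
    "What is the notice period?",
    ["The notice period is 30 days.", "Either party may terminate the agreement."],
)
print(prompt)
```

Keeping prompt assembly in one small function makes it easy to swap in a different template if the model expects one.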

Features

Two-Phase Process:

Clear separation between the model training and deployment phases, so each phase can be updated and maintained independently.

Advanced Language Model Integration:

Utilizes a fine-tuned language model for generating high-quality, human-like responses.

Efficient Document Retrieval:

Implements a FAISS vector store for fast and precise document retrieval, ensuring that the most relevant information is quickly accessible.
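
To make the idea of vector retrieval concrete, the sketch below shows the kind of nearest-neighbour lookup a vector store performs, written in plain Python rather than with FAISS itself. The vectors are made-up toy values, the helper names are hypothetical, and cosine similarity is an assumption; FAISS indexes can also rank by L2 distance or inner product.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, doc_vecs, k=2):
    """Return the indices of the k document vectors most similar to the query."""
    scored = [(cosine(query_vec, vec), idx) for idx, vec in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:k]]


# Toy embeddings standing in for real document chunk embeddings.
doc_vecs = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
]
nearest = top_k([1.0, 0.05, 0.0], doc_vecs, k=2)
print(nearest)
```

This brute-force scan compares the query against every vector; a library such as FAISS exists to make the same lookup fast on large collections.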

Conversational Memory Management:

Employs a conversational memory mechanism to maintain continuity in the conversation, allowing the chatbot to consider past interactions and provide contextually relevant responses.
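
One simple way to keep past interactions available is a fixed-size buffer of recent question and answer pairs. The sketch below assumes that approach; the class name `ConversationMemory` and the `max_turns` parameter are hypothetical, and the memory mechanism in DocQnA may work differently.

```python
from collections import deque


class ConversationMemory:
    """Keep only the most recent question/answer turns (illustrative sketch)."""

    def __init__(self, max_turns=3):
        # A deque with maxlen silently drops the oldest turn when full.
        self.turns = deque(maxlen=max_turns)

    def add(self, question, answer):
        self.turns.append((question, answer))

    def as_text(self):
        """Render the stored turns as text that can be prepended to a prompt."""
        return "\n".join(f"User: {q}\nAssistant: {a}" for q, a in self.turns)


memory = ConversationMemory(max_turns=2)
memory.add("Who wrote the report?", "The finance team.")
memory.add("When was it published?", "In March.")
memory.add("How long is it?", "Twelve pages.")
history = memory.as_text()
print(history)
```

Bounding the number of stored turns keeps the prompt from growing without limit as the conversation continues.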

Intuitive User Interface:

Offers a user-friendly interface for users to interact with the system, making it easy to ask questions and receive detailed answers.

Scalability:

Designed to scale with the size of the document collection, accommodating large volumes of data without compromising performance.

Installation

Prerequisites
Python 3.x
Flask
langchain-community libraries
PyPDF2 (for handling PDF files)
Hugging Face Transformers (for embeddings)

Steps

Clone the repository:

git clone https://github.com/shrey2003/docq.git

Navigate to the project directory:

cd docq

Install the required packages:

pip install -r requirements.txt

Run the Flask application:

python app.py

Usage

Open a web browser and navigate to http://localhost:7000.
Upload a PDF document.
Ask a question related to the content of the document.
Receive an answer generated by the fine-tuned language model.
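
Between uploading a PDF and answering a question, the extracted text typically has to be split into smaller pieces that can be embedded and retrieved. The sketch below shows one common way to do that with overlapping character windows. It is an assumption about the ingestion step, not the repository's code, and the function name and default sizes are hypothetical.

```python
def split_into_chunks(text, chunk_size=200, overlap=50):
    """Split text into character windows that overlap by `overlap` characters."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


sample = "abcdefghij" * 5  # 50 characters standing in for extracted PDF text
chunks = split_into_chunks(sample, chunk_size=20, overlap=5)
print(len(chunks), [len(c) for c in chunks])
```

The overlap means a sentence that straddles a window boundary still appears whole in at least one chunk.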

Deployment

The application can be deployed using Docker. A Dockerfile is provided for building a container image.

Build the Docker image:

docker build -t docqna .

Run the Docker container:

docker run -p 7000:7000 docqna
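
After starting the container, it can be useful to confirm that the application is answering on the published port. The following is a small sketch using only the Python standard library; the helper name `is_reachable` is hypothetical, and the URL assumes the port mapping shown above.

```python
import urllib.error
import urllib.request


def is_reachable(url, timeout=3.0):
    """Return True if the server answers with any HTTP response, else False."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server responded, just not with a success status.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, and similar errors.
        return False


if __name__ == "__main__":
    print(is_reachable("http://localhost:7000"))
```

The same check works when the application is started directly with python app.py.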

Workflows

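To publish the image to Azure Container Registry, build and tag the image:

docker build -t docqna.azurecr.io/docqna:latest .

Log in to the registry:

docker login docqna.azurecr.io

Push the image:

docker push docqna.azurecr.io/docqna:latest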

License

Distributed under the MIT License. See LICENSE for more information.

