DocQnA: Document Question Answering System

Overview

DocQnA is a two-phase document question-answering system that combines natural language processing techniques with a data ingestion and transformation pipeline. The first phase fine-tunes a language model; the second phase deploys the system for real-time document question answering.
The application is deployed on Azure: [https://dockyqna.azurewebsites.net/]
My fine-tuned version of the Zephyr-7B model for document question answering is available on Hugging Face: [https://huggingface.co/Feluda/Zephyr-7b-QnA]

Phases

Phase 1: Fine-Tuning:

This phase fine-tunes the Zephyr-7B language model on a dataset of choice. The fine-tuned model is then saved for later use in the DocQnA system.
[Image: training results of my fine-tuned model]

Phase 2: Deployment:

In this phase, the fine-tuned language model is integrated with a document retrieval system to provide accurate and contextually relevant responses to user queries.
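As a rough illustration of how retrieved passages and a user query might be combined before they reach the model, here is a minimal sketch. The function name `build_prompt`, the prompt wording, and the `max_chars` budget are all hypothetical and are not taken from this repository.

```python
def build_prompt(question, retrieved_chunks, max_chars=2000):
    """Combine retrieved passages and the user question into one prompt.

    Hypothetical helper for illustration; the real app may build prompts differently.
    """
    context_parts = []
    used = 0
    for chunk in retrieved_chunks:
        # Stop adding passages once the assumed character budget would be exceeded.
        if used + len(chunk) > max_chars:
            break
        context_parts.append(chunk)
        used += len(chunk)
    context = "\n\n".join(context_parts)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


prompt = build_prompt(
    "What is the notice period?",
    ["The notice period is 30 days.", "Salary is paid monthly."],
)
print(prompt)
```

Keeping the retrieved text in a clearly labeled context section is one common way to steer the model toward answers grounded in the uploaded document.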

Features

Two-Phase Process:

Clear separation between the model training and deployment phases, so that each phase can be maintained and updated independently.

Advanced Language Model Integration:

Utilizes a fine-tuned language model for generating high-quality, human-like responses.

Efficient Document Retrieval:

Implements a FAISS vector store for fast and precise document retrieval, ensuring that the most relevant information is quickly accessible.
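To show the idea behind vector-store retrieval without requiring FAISS to be installed, the sketch below uses a pure-Python stand-in that performs an exhaustive inner-product search over normalized vectors. It is not FAISS and not this project's code: the `FlatIndex` class and the three-dimensional toy vectors are assumptions made for illustration, whereas the real system would use embeddings produced by a model.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so inner product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


class FlatIndex:
    """Exhaustive inner-product index (a conceptual stand-in, not FAISS itself)."""

    def __init__(self):
        self.vectors = []
        self.texts = []

    def add(self, vec, text):
        # Store the normalized vector alongside the text chunk it represents.
        self.vectors.append(normalize(vec))
        self.texts.append(text)

    def search(self, query, k=2):
        # Score every stored vector against the query and keep the k best.
        q = normalize(query)
        scored = [
            (sum(a * b for a, b in zip(q, v)), text)
            for v, text in zip(self.vectors, self.texts)
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


index = FlatIndex()
# Toy vectors standing in for real embeddings.
index.add([1.0, 0.0, 0.0], "chunk about invoices")
index.add([0.0, 1.0, 0.0], "chunk about shipping")
index.add([0.9, 0.1, 0.0], "chunk about payments")

results = index.search([1.0, 0.0, 0.0], k=2)
top_texts = [text for _, text in results]
print(top_texts)
```

A dedicated library such as FAISS serves the same purpose with optimized and, optionally, approximate indexes, which matters once the collection grows beyond what a linear scan handles comfortably.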

Conversational Memory Management:

Employs a conversational memory mechanism to maintain continuity in the conversation, allowing the chatbot to consider past interactions and provide contextually relevant responses.
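One simple way to picture such a memory mechanism is a bounded buffer of recent question and answer pairs that is rendered into text and included with the next query. The sketch below assumes a hypothetical `ConversationMemory` class and a `max_turns` limit; the actual application may rely on a library-provided memory component instead.

```python
from collections import deque


class ConversationMemory:
    """Keeps only the most recent turns (hypothetical helper for illustration)."""

    def __init__(self, max_turns=3):
        # A deque with maxlen silently discards the oldest turn when full.
        self.turns = deque(maxlen=max_turns)

    def add_turn(self, question, answer):
        self.turns.append((question, answer))

    def as_context(self):
        # Render the stored turns as plain text to prepend to the next prompt.
        return "\n".join(f"User: {q}\nAssistant: {a}" for q, a in self.turns)


memory = ConversationMemory(max_turns=2)
memory.add_turn("Who wrote the report?", "The finance team.")
memory.add_turn("When was it published?", "In March.")
memory.add_turn("How long is it?", "Twelve pages.")

history = memory.as_context()
print(history)
```

Bounding the buffer keeps the prompt within the model's context window, at the cost of forgetting older turns.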

Intuitive User Interface:

Offers a user-friendly interface for users to interact with the system, making it easy to ask questions and receive detailed answers.

Scalability:

Designed to scale with the size of the document collection, accommodating large volumes of data without compromising performance.

Installation

Prerequisites
Python 3.x
Flask
langchain-community libraries
PyPDF2 (for handling PDF files)
Hugging Face Transformers (for embeddings)

Steps

Clone the repository:

git clone https://github.com/shrey2003/docq.git

Navigate to the project directory:

cd docq

Install the required packages:

pip install -r requirements.txt

Run the Flask application:

python app.py

Usage

Open a web browser and navigate to http://localhost:7000.
Upload a PDF document.
Ask a question related to the content of the document.
Receive an answer generated by the fine-tuned language model.

Deployment

The application can be deployed using Docker. A Dockerfile is provided for building a container image.

Build the Docker image:

docker build -t docqna .

Run the Docker container:

docker run -p 7000:7000 docqna

Workflows

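Build and tag the image for the Azure container registry:

docker build -t docqna.azurecr.io/docqna:latest .

Log in to the registry:

docker login docqna.azurecr.io

Push the image to the registry:

docker push docqna.azurecr.io/docqna:latest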

License

Distributed under the MIT License. See LICENSE for more information.
