🚀 AI-Powered Knowledge Base with pgvector and Sentence Transformer 🧠📚

🌟 Overview

This project uses pgvector for vector storage and retrieval, combined with an open-source sentence transformer for text embedding. The default OpenAI embedder requires an API key, and its usage can run into credit limits, so the system is configured to use a sentence transformer model that produces 1024-dimensional embeddings instead of the default 1536.

🔥 Open-Source PDF Assistant

I have created an open-source PDF assistant that avoids the limitations imposed by OpenAI's embedder. Because embeddings are computed by an open-source model, the number of queries and documents you can process is not capped by OpenAI credit limits.
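
To make the retrieval idea concrete, here is a minimal pure-Python sketch of nearest-neighbor ranking by cosine distance, which is the kind of comparison pgvector performs inside PostgreSQL (its `<=>` operator computes cosine distance). The function names `cosine_distance` and `top_k` and the toy 3-dimensional vectors are illustrative assumptions, not code from this repository; real embeddings in this project would have 1024 dimensions.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity; smaller means more similar."""
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float],
    documents: Dict[str, Sequence[float]],
    k: int,
) -> List[Tuple[str, float]]:
    """Rank stored vectors by distance to the query and keep the k closest."""
    scored = [(name, cosine_distance(query, vec)) for name, vec in documents.items()]
    return sorted(scored, key=lambda pair: pair[1])[:k]


if __name__ == "__main__":
    # Hypothetical toy "embeddings"; a real model would produce 1024 floats each.
    store = {
        "chunk_a": [1.0, 0.0, 0.0],
        "chunk_b": [0.0, 1.0, 0.0],
        "chunk_c": [1.0, 1.0, 0.0],
    }
    for name, distance in top_k([1.0, 0.0, 0.0], store, k=2):
        print(name, round(distance, 3))
```

In the actual application this ranking happens in the database, so the Python side only needs to supply a query vector with the same number of dimensions as the stored ones.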

🛠️ Setup Instructions

✅ Prerequisites

🐳 Docker & Docker Compose installed
🐍 Python 3.8+ installed
🗄️ PostgreSQL with pgvector extension enabled
🌐 Streamlit for the frontend

🚀 Running the Project

Step 0: 🔗 Clone the GitHub Repository

First, clone the project repository from GitHub:

git clone https://github.com/tejas-130704/PDF_Assistant.git
cd PDF_Assistant

Step 1: 🏗️ Start pgvector with Docker

Ensure your docker-compose.yaml file is correctly set up, then run:

docker-compose up -d

Step 2: 🗄️ Configure the Database

After the Docker container is running, open a psql session inside it:

docker exec -it <container_id> psql -U root -d mydb

Replace <container_id> with the actual container ID, which you can find with docker ps.

If you are not already connected to mydb, switch to it:

\c mydb

Check existing tables. Plain \dt only lists tables on the search path, so include the ai schema explicitly:

\dt ai.*

Drop the existing embeddings table if it exists:

DROP TABLE IF EXISTS ai.embeddings;

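Create the new table with 1024-dimensional embeddings. The first two statements do nothing if the vector extension and the ai schema already exist:

CREATE EXTENSION IF NOT EXISTS vector;
CREATE SCHEMA IF NOT EXISTS ai;
CREATE TABLE ai.embeddings (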
    id VARCHAR PRIMARY KEY,
    name VARCHAR NOT NULL,
    meta_data JSONB,
    filters JSONB,
    content TEXT NOT NULL,
    embedding vector(1024), -- Adjusted to match the embedding model dimensions
    usage JSONB,
    content_hash VARCHAR UNIQUE
);

Verify that the table was created successfully:

\dt ai.*
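
As a rough illustration of how a row for this table could be prepared on the Python side, the sketch below validates the embedding length, formats it in pgvector's text form (`[v1,v2,...]`), and derives a `content_hash`. The helper names, the use of SHA-256 for the hash, and the psycopg-style `%(name)s` placeholders are assumptions made for this example; the application may build its rows differently.

```python
import hashlib
import json
from typing import Dict, Sequence

EMBEDDING_DIM = 1024  # must match vector(1024) in the table definition


def to_pgvector_literal(embedding: Sequence[float], dim: int = EMBEDDING_DIM) -> str:
    """Format a list of floats as pgvector's text input, e.g. '[0.1,0.2]'."""
    if len(embedding) != dim:
        raise ValueError(f"expected {dim} dimensions, got {len(embedding)}")
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


def build_row(doc_id: str, name: str, content: str, embedding: Sequence[float]) -> Dict[str, str]:
    """Assemble parameters for one ai.embeddings row (hypothetical helper)."""
    return {
        "id": doc_id,
        "name": name,
        "meta_data": json.dumps({}),
        "content": content,
        "embedding": to_pgvector_literal(embedding),
        # Assumption: any stable digest works for deduplication; SHA-256 is used here.
        "content_hash": hashlib.sha256(content.encode("utf-8")).hexdigest(),
    }


# Parameterized statement in psycopg-style named placeholders (an assumption).
# The UNIQUE constraint on content_hash lets duplicate content be skipped.
INSERT_SQL = (
    "INSERT INTO ai.embeddings (id, name, meta_data, content, embedding, content_hash) "
    "VALUES (%(id)s, %(name)s, %(meta_data)s::jsonb, %(content)s, "
    "%(embedding)s::vector, %(content_hash)s) "
    "ON CONFLICT (content_hash) DO NOTHING"
)
```

Validating the length before the insert turns a database-side dimension error into an immediate, readable Python exception.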

Step 3: 📦 Install Dependencies

Navigate to your project directory and install required packages:

pip install -r requirements.txt

Step 4: 🚀 Run the Application

Start the Streamlit application:

streamlit run app.py

Once running, open your browser and go to:

http://localhost:8501/

Step 5: 📚 Load the Knowledge Base

Enter your GROQ_API_KEY in the sidebar.
Provide a link to the PDF that contains the knowledge base content.
Click "Load Knowledge Base".
Once you see the message "Knowledge Base Loaded Successfully!", you can start asking questions. 🎉

🛠️ Troubleshooting

⚠️ If you get an error about mismatched vector dimensions, ensure that the embedding column in PostgreSQL is declared as vector(1024), matching the output dimension of the sentence transformer.
🛑 If OpenAI is still being used, check that your Python script is configured to use the sentence transformer embedder instead of OpenAI embeddings.
✅ Ensure that all required dependencies are installed by running pip install -r requirements.txt.
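
For the dimension-mismatch case, a small check like the following can help pin down which side is wrong. It is a sketch under assumptions: it takes the column type as a string (for example the `vector(1024)` shown by `\d ai.embeddings` in psql) and compares it with the dimension your model produces. The names `parse_vector_dim` and `diagnose` are hypothetical and not part of this repository.

```python
import re
from typing import Optional

# Matches a fixed-size pgvector column type such as "vector(1024)".
_VECTOR_TYPE = re.compile(r"^\s*vector\((\d+)\)\s*$", re.IGNORECASE)


def parse_vector_dim(column_type: str) -> Optional[int]:
    """Extract N from 'vector(N)', or return None if the type has no fixed size."""
    match = _VECTOR_TYPE.match(column_type)
    return int(match.group(1)) if match else None


def diagnose(column_type: str, model_dim: int) -> str:
    """Compare the table's declared dimension with the model's output dimension."""
    table_dim = parse_vector_dim(column_type)
    if table_dim is None:
        return f"'{column_type}' is not a fixed-size vector column"
    if table_dim == model_dim:
        return "ok"
    return (
        f"mismatch: table expects {table_dim}, model produces {model_dim}; "
        f"recreate ai.embeddings with vector({model_dim})"
    )


if __name__ == "__main__":
    # A table left over from the OpenAI default (1536) checked against a 1024-dim model.
    print(diagnose("vector(1536)", 1024))
```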

🚀 Future Enhancements

🔒 Adding user authentication for secure access
🚀 Implementing cache storage to speed up repeated queries
🎨 Enhancing UI/UX for a more interactive experience

🎖️ Contributors

Tejas Narayan Jadhav - GitHub

🤝 Feel free to contribute by submitting pull requests or reporting issues! 🚀
