I have created an open-source PDF assistant that removes the limitations imposed by OpenAI's embedder, allowing unlimited queries and document processing. The project uses pgvector for vector-based storage and retrieval, combined with an open-source sentence transformer for text embedding. The default OpenAI embedder requires an API key and paid credits, so the system is configured to use a sentence transformer model that produces 1024-dimensional embeddings instead of OpenAI's default 1536.
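For reference, this is roughly how a 1024-dimensional embedding is produced with the `sentence-transformers` library. The model name below (`BAAI/bge-large-en-v1.5`) is only an example of a model with 1024-dimensional output; the project may ship with a different one, so treat this as a minimal sketch rather than the exact configuration used in `app.py`.

```python
# Minimal sketch: producing 1024-dimensional embeddings with an open-source
# sentence transformer instead of OpenAI's 1536-dimensional embedder.
# The model name is an assumption -- any model that outputs 1024 dimensions
# will match the vector(1024) table schema created later in this README.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-large-en-v1.5")

embedding = model.encode("What does this PDF say about pricing?")
print(embedding.shape)                            # (1024,)
print(model.get_sentence_embedding_dimension())   # 1024
```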
- Docker & Docker Compose installed
- Python 3.8+ installed
- PostgreSQL with pgvector extension enabled
- Streamlit for the frontend
First, clone the project repository from GitHub:
```bash
git clone https://github.com/tejas-130704/PDF_Assistant.git
cd PDF_Assistant
```

Ensure your docker-compose.yaml file is correctly set up, then run:
```bash
docker-compose up -d
```

After the Docker container is running, execute the following commands:
```bash
docker exec -it <container_id> psql -U root -d mydb
```

Replace `<container_id>` with the actual container ID (can be found using `docker ps`).
Connect to the database:
```sql
\c mydb
```

Check existing tables:
```sql
\dt
```

Drop the existing embeddings table if it exists:
```sql
DROP TABLE IF EXISTS ai.embeddings;
```

Create the new table with 1024-dimensional embeddings:

```sql
CREATE TABLE ai.embeddings (
    id VARCHAR PRIMARY KEY,
    name VARCHAR NOT NULL,
    meta_data JSONB,
    filters JSONB,
    content TEXT NOT NULL,
    embedding vector(1024), -- Adjusted to match the embedding model dimensions
    usage JSONB,
    content_hash VARCHAR UNIQUE
);
```

Verify that the table was created successfully:
```sql
\dt ai.*
```
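As an optional smoke test, the sketch below inserts a single 1024-dimensional embedding into `ai.embeddings` and runs a nearest-neighbour query against it. The connection settings and the model name are assumptions; adjust them to whatever your docker-compose.yaml and app configuration actually use.

```python
# Minimal sketch: verifying that a 1024-dimensional embedding fits the new
# ai.embeddings table. Host, port, user, and password are assumptions --
# adjust them to match your docker-compose.yaml.
import hashlib
import psycopg2
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-large-en-v1.5")  # assumed 1024-dim model
text = "pgvector stores the document chunks for the PDF assistant."
vec = model.encode(text)
# pgvector accepts vectors as '[x1,x2,...]' text literals.
vec_literal = "[" + ",".join(f"{x:.6f}" for x in vec) + "]"

conn = psycopg2.connect(host="localhost", port=5432, dbname="mydb",
                        user="root", password="root")
with conn, conn.cursor() as cur:
    cur.execute(
        """
        INSERT INTO ai.embeddings (id, name, content, embedding, content_hash)
        VALUES (%s, %s, %s, %s::vector, %s)
        ON CONFLICT DO NOTHING
        """,
        ("smoke-test-1", "smoke-test", text, vec_literal,
         hashlib.md5(text.encode()).hexdigest()),
    )
    # Nearest-neighbour lookup: <-> is pgvector's L2 distance operator.
    cur.execute(
        "SELECT name, content FROM ai.embeddings "
        "ORDER BY embedding <-> %s::vector LIMIT 1",
        (vec_literal,),
    )
    print(cur.fetchone())
conn.close()
```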
Navigate to your project directory and install required packages:

```bash
pip install -r requirements.txt
```

Start the Streamlit application:
```bash
streamlit run app.py
```

Once running, open your browser and go to:
http://localhost:8501/
- Add `GROQ_API_KEY` in the sidebar.
- Provide the PDF link containing knowledge base content.
- Click "Load Knowledge Base".
- Once you see the message "Knowledge Base Loaded Successfully!", you can start asking questions (a rough sketch of this sidebar flow is shown after these steps).
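For readers who want to adapt the app, the following is a rough Streamlit sketch of the sidebar flow described above. It is not the project's actual `app.py`; the widget labels and the commented-out `load_knowledge_base()` helper are hypothetical placeholders.

```python
# Minimal sketch (not the project's actual app.py): how the sidebar flow
# described above is typically wired in Streamlit.
import streamlit as st

groq_api_key = st.sidebar.text_input("GROQ_API_KEY", type="password")
pdf_url = st.sidebar.text_input("PDF link")

if st.sidebar.button("Load Knowledge Base"):
    if groq_api_key and pdf_url:
        # load_knowledge_base() is a hypothetical helper that would download
        # the PDF, embed its chunks, and upsert them into ai.embeddings.
        # load_knowledge_base(pdf_url, groq_api_key)
        st.success("Knowledge Base Loaded Successfully!")
    else:
        st.warning("Please provide both a GROQ_API_KEY and a PDF link.")

question = st.text_input("Ask a question about the PDF")
if question:
    st.write("The assistant's answer would appear here.")
```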
- If you get an error about mismatched vector dimensions, ensure that the embedding dimension in PostgreSQL matches the sentence transformer (1024); a quick check for this is sketched after this list.
- If OpenAI is still being used, check that your Python script is correctly configured to use sentence transformers instead of OpenAI embeddings.
- Ensure that all required dependencies are installed using `pip install -r requirements.txt`.
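If you hit the dimension-mismatch error above, a quick sanity check like the one below can confirm whether the embedder's output size and the `vector(1024)` column agree. The model name and connection settings are assumptions; substitute your own.

```python
# Minimal sketch: checking that the embedder's output dimension matches the
# declared vector(1024) column before loading documents. Model name and
# connection settings are assumptions -- adjust them to your setup.
import psycopg2
from sentence_transformers import SentenceTransformer

EXPECTED_DIM = 1024

model = SentenceTransformer("BAAI/bge-large-en-v1.5")  # assumed 1024-dim model
model_dim = model.get_sentence_embedding_dimension()

conn = psycopg2.connect(host="localhost", port=5432, dbname="mydb",
                        user="root", password="root")
with conn.cursor() as cur:
    # format_type() renders the declared column type, e.g. 'vector(1024)'.
    cur.execute(
        """
        SELECT format_type(atttypid, atttypmod)
        FROM pg_attribute
        WHERE attrelid = 'ai.embeddings'::regclass AND attname = 'embedding'
        """
    )
    column_type = cur.fetchone()[0]
conn.close()

print(f"model dimension:       {model_dim}")
print(f"embedding column type: {column_type}")
if model_dim != EXPECTED_DIM or f"({EXPECTED_DIM})" not in column_type:
    raise SystemExit("Embedding dimension and pgvector column size do not match")
```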
- Adding user authentication for secure access
- Implementing cache storage to speed up repeated queries
- Enhancing UI/UX for a more interactive experience
- Tejas Narayan Jadhav - GitHub
Feel free to contribute by submitting pull requests or reporting issues!

