📚 AI-Search

A simple semantic search engine for PDFs, built on embeddings and the Pinecone vector database and written in Python.

Given a set of PDF documents, this project:

✔ Loads and chunks text

✔ Generates vector embeddings

✔ Stores vectors in Pinecone

✔ Queries them with natural language

✔ Returns relevant passages, document names, and page numbers
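The chunking step can be sketched in plain Python. This is a minimal illustration, not the project's actual code: the function name `chunk_pages`, the character-based window, and the default sizes are all assumptions, and the page texts are expected to come from whatever PDF extractor you use.

```python
from typing import Dict, Iterator, List


def chunk_pages(
    doc_name: str,
    pages: List[str],
    chunk_size: int = 200,
    overlap: int = 50,
) -> Iterator[Dict[str, object]]:
    """Yield overlapping character chunks tagged with document name and page.

    Hypothetical helper: `pages` holds one extracted text string per PDF page.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    for page_number, text in enumerate(pages, start=1):
        for start in range(0, len(text), step):
            piece = text[start:start + chunk_size].strip()
            if piece:
                # Keep the metadata needed to report document and page later.
                yield {"document": doc_name, "page": page_number, "text": piece}
            if start + chunk_size >= len(text):
                break  # this window already reached the end of the page
```

Keeping the document name and page number on every chunk is what lets a search result point back to its source. The overlap reduces the chance that a relevant sentence is split across two chunks.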

🧩 Features

Extract text from PDF files

Compute vector embeddings via your adapter (e.g., LangChain, OpenAI)

Index vectors in a Pinecone index

Run semantic search queries

Filter and dedupe results

Stream or show exact matches + context
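As one way to picture the filter and dedupe step, the sketch below drops low-scoring matches and keeps the best-scoring copy of each passage. The match shape (`score`, `document`, `page`, `text`), the function name `filter_and_dedupe`, and the 0.75 threshold are assumptions for illustration; they are not Pinecone's response format or this project's confirmed behavior.

```python
from typing import Dict, List, Tuple


def filter_and_dedupe(matches: List[Dict], min_score: float = 0.75) -> List[Dict]:
    """Drop weak matches, collapse duplicate passages, and sort by score.

    Hypothetical helper: each match is assumed to be a dict with the keys
    "score", "document", "page", and "text".
    """
    best: Dict[Tuple[str, int, str], Dict] = {}
    for match in matches:
        if match["score"] < min_score:
            continue  # below the relevance threshold
        # Normalize whitespace and case so near-identical chunks collide.
        normalized = " ".join(match["text"].split()).lower()
        key = (match["document"], match["page"], normalized)
        if key not in best or match["score"] > best[key]["score"]:
            best[key] = match
    return sorted(best.values(), key=lambda m: m["score"], reverse=True)
```

Deduplication matters mostly when chunks overlap, since the same passage can then be returned more than once for a single query.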

🚀 Quick Start

Requirements

Before you begin, make sure you have:

✔ Python 3.8 or higher

✔ Pinecone API key

✔ OpenAI key (or other embedding provider)
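A quick preflight check for these requirements might look like the sketch below. The environment variable names `PINECONE_API_KEY` and `OPENAI_API_KEY` are assumptions (this README does not say how keys are supplied), and `missing_requirements` is a hypothetical helper; adjust both to match your setup or embedding provider.

```python
import os
import sys
from typing import List, Mapping, Optional

# Assumed variable names; change these if your configuration differs.
REQUIRED_ENV_VARS = ("PINECONE_API_KEY", "OPENAI_API_KEY")


def missing_requirements(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return a list of human-readable problems; an empty list means ready."""
    env = os.environ if env is None else env
    problems = []
    if sys.version_info < (3, 8):
        found = ".".join(str(part) for part in sys.version_info[:3])
        problems.append(f"Python 3.8 or higher is required, found {found}")
    for name in REQUIRED_ENV_VARS:
        if not env.get(name):
            problems.append(f"environment variable {name} is not set")
    return problems


if __name__ == "__main__":
    for problem in missing_requirements():
        print(problem)
```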