A Retrieval-Augmented Generation (RAG) application built with a modern tech stack, featuring a Next.js frontend and a FastAPI backend.
The application implements a robust RAG pipeline to provide accurate, context-aware answers from your documents.
When a document is uploaded:
- Loading: Documents are fetched from S3 using `S3FileLoader`.
- Chunking: Text is split into manageable chunks (1000 characters) using `RecursiveCharacterTextSplitter` to ensure optimal context window usage.
- Embedding: Each chunk is converted into a vector embedding using the BAAI/bge-small-en model via `HuggingFaceEmbeddings`. This model is optimized for retrieval tasks.
- Storage: Vectors and metadata are stored in ChromaDB, a high-performance open-source vector database.
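The chunking step can be sketched in plain Python. This is a simplified stand-in for `RecursiveCharacterTextSplitter` (the real splitter also prefers paragraph and sentence boundaries), and the 200-character overlap is an illustrative assumption, not the app's actual setting:

```python
def split_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share `overlap` characters so that context
    spanning a chunk boundary is not lost.
    """
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

# A 2500-character document yields four chunks of 1000/1000/900/100 chars.
chunks = split_text("x" * 2500)
print([len(c) for c in chunks])
```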
When a user asks a question:
- Query Embedding: The user's question is converted into a vector using the same embedding model.
- Semantic Search: ChromaDB performs a similarity search to find the most relevant document chunks.
- Context Assembly: Retrieved chunks are combined with the conversation history (to support follow-up questions).
- LLM Generation: The assembled context and user query are sent to Google Gemini 2.5 Flash (`gemini-2.5-flash`).
- Response: The LLM generates a concise, accurate answer based only on the provided context, citing sources where possible.
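The retrieval and context-assembly steps above can be illustrated with a small, dependency-free sketch: rank chunks by cosine similarity, then fold the top results and the conversation history into a prompt. The vectors here are hand-made stand-ins for real embeddings, and the prompt template is hypothetical, not the app's actual one:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, chunk_vecs, chunks, k=2):
    """Return the k chunks whose vectors are most similar to the query."""
    ranked = sorted(zip(chunks, chunk_vecs),
                    key=lambda cv: cosine(query_vec, cv[1]),
                    reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

def build_prompt(question: str, context_chunks: list[str], history: list[str]) -> str:
    """Assemble retrieved chunks and chat history into one prompt string."""
    context = "\n\n".join(context_chunks)
    past = "\n".join(history)
    return (f"Answer using only the context below.\n\n"
            f"Context:\n{context}\n\nHistory:\n{past}\n\nQuestion: {question}")

chunks = ["ChromaDB stores vectors.", "Gemini generates answers.", "Bun runs the frontend."]
vecs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
query = [1.0, 0.05]  # pretend embedding of the user's question

relevant = top_k(query, vecs, chunks)
prompt = build_prompt("What stores the vectors?", relevant, ["User: hi", "AI: hello"])
print(relevant)
```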
- LLM: Google Gemini 2.5 Flash (via `langchain-google-genai`)
- Embeddings: BAAI/bge-small-en (via `langchain-huggingface`)
- Vector Store: ChromaDB
- Orchestration: LangChain
Ensure you have the following installed:
- Node.js (v18+)
- Bun (`npm install -g bun`)
- Python (v3.13+)
- Docker & Docker Compose
- uv (Recommended for Python dependency management)
- Framework: Next.js 16 (App Router)
- Language: TypeScript
- Styling: Tailwind CSS 4
- State Management: Zustand
- Data Fetching: TanStack Query (React Query)
- Icons: Lucide React
- Framework: FastAPI
- Language: Python 3.13+
- AI/ML:
- Database:
- PostgreSQL (Application Data)
- ChromaDB (Vector Database)
- Migrations: Alembic
- Package Manager: uv
- Monorepo: Turborepo
- Runtime: Bun (Frontend), Python (Backend)
- Containerization: Docker & Docker Compose
- Clone the repository:

  ```bash
  git clone <repository-url>
  cd rag-docs
  ```
- Install Frontend Dependencies:

  ```bash
  bun install
  ```
- Install Backend Dependencies:

  ```bash
  cd apps/backend
  uv sync
  ```
- Navigate to `apps/backend`.
- Copy the example environment file:

  ```bash
  cp .env.example .env
  ```

- Update `.env` with your API keys (Google GenAI, etc.) and database credentials.
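For reference, a backend `.env` might look like the fragment below. These names are illustrative guesses except `GOOGLE_API_KEY`, which is the variable `langchain-google-genai` reads by default; always defer to the names in `.env.example`:

```env
# Illustrative only — the authoritative variable names live in .env.example
GOOGLE_API_KEY=your-google-genai-key                              # read by langchain-google-genai
DATABASE_URL=postgresql://user:password@localhost:5432/ragdocs    # hypothetical name/format
```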
- Navigate to `apps/web`.
- Create a `.env` file (if not present) and configure the necessary environment variables (e.g., the API base URL).
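For example (the variable name below is hypothetical — check the frontend code for the exact one; note that Next.js only exposes variables prefixed with `NEXT_PUBLIC_` to the browser):

```env
# Hypothetical — confirm the exact variable name used by the frontend
NEXT_PUBLIC_API_URL=http://localhost:8080
```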
Run the infrastructure (databases) in Docker, and the apps locally for hot reloading.
- Start Databases (Postgres & Chroma):

  ```bash
  docker-compose up -d db chroma
  ```
- Start Backend:

  ```bash
  cd apps/backend

  # Apply migrations
  alembic upgrade head

  # Start server
  uvicorn app.main:app --reload --port 8080

  # OR, if using just:
  just dev
  ```
- Start Frontend: From the root directory:

  ```bash
  bun dev
  # OR
  turbo run dev
  ```

  The web app will be available at http://localhost:3000.
Run the entire backend stack in Docker.
- Start Backend & DBs:

  ```bash
  docker-compose up -d --build
  ```

  This starts Postgres, Chroma, and the FastAPI backend (on port 8080).
- Start Frontend:

  ```bash
  bun dev
  ```
```
rag-docs/
├── apps/
│   ├── web/          # Next.js Frontend Application
│   └── backend/      # FastAPI Backend Application
├── packages/         # Shared packages (if any)
├── docker/           # Docker configurations
├── docker-compose.yml
├── turbo.json        # Turborepo configuration
└── package.json
```
- `bun dev`: Start the development server (Frontend).
- `bun build`: Build the application.
- `bun lint`: Lint the codebase.
- `bun format`: Format code using Prettier.
- `bun check-types`: Run TypeScript type checking.