A production-ready Retrieval-Augmented Generation (RAG) system built using:
- Spring Boot 3
- Spring AI
- PostgreSQL + pgvector
- OpenAI (Embeddings + LLM)
- Docker
This project demonstrates modern AI system design integrated into a clean, scalable backend architecture suitable for enterprise environments.
This application implements a Retrieval-Augmented Generation (RAG) pipeline:
- Documents are ingested and converted into vector embeddings.
- Embeddings are stored in PostgreSQL using pgvector.
- User queries are embedded and matched using vector similarity search.
- Retrieved context is injected into an LLM prompt.
- The LLM generates grounded, context-aware responses.
Because the prompt constrains the model to the retrieved context, responses stay grounded in stored domain data, which reduces hallucinations and improves reliability.
This project demonstrates:
- Real-world AI backend integration
- Vector database usage
- Semantic search implementation
- LLM prompt engineering
- Cost-aware AI architecture
- Clean layered Spring architecture
- Containerized infrastructure
It bridges traditional backend engineering with modern AI system design.
## 🏗 Architecture

```mermaid
flowchart TD
    A[User Request] --> B[REST Controller]
    B --> C[Service Layer]
    C --> D[Embedding Model - OpenAI]
    D --> E[Vector Similarity Search - pgvector]
    E --> F[Context Injection]
    F --> G[LLM - OpenAI GPT]
    G --> H[Response]
```
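The flow above maps onto a thin REST layer delegating to a service. A minimal controller sketch (the class, endpoint wiring, and `RagService` API are illustrative assumptions, not the project's actual code):

```java
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.*;

// Hypothetical controller matching the endpoints shown later in this README.
@RestController
@RequestMapping("/api/rag")
public class RagController {

    private final RagService ragService; // sketched in the pipeline sections below

    public RagController(RagService ragService) {
        this.ragService = ragService;
    }

    // Ingest a plain-text document into the vector store.
    @PostMapping("/documents")
    public ResponseEntity<Void> addDocument(@RequestBody String content) {
        ragService.ingest(content);
        return ResponseEntity.accepted().build();
    }

    // Answer a question grounded in stored documents.
    @GetMapping("/ask")
    public String ask(@RequestParam String question) {
        return ragService.ask(question);
    }
}
```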
When a document is added:
- Text is sent to the embedding model.
- A high-dimensional vector is generated (1,536 dimensions for text-embedding-3-small).
- The vector and original content are stored in PostgreSQL.
This allows semantic similarity comparison rather than keyword matching.
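With Spring AI, both steps are handled by the `VectorStore` abstraction: `add()` calls the configured embedding model and persists the vector alongside the content. A minimal ingestion sketch (the `RagService` name is an assumption; the `VectorStore` bean comes from the pgvector starter's auto-configuration):

```java
import java.util.List;

import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.stereotype.Service;

@Service
public class RagService {

    private final VectorStore vectorStore;

    public RagService(VectorStore vectorStore) {
        this.vectorStore = vectorStore;
    }

    // Embeds the text via the configured OpenAI model and stores
    // vector + original content in the pgvector-backed table.
    public void ingest(String content) {
        vectorStore.add(List.of(new Document(content)));
    }
}
```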
When a user asks a question:
- The query is embedded.
- Vector similarity search retrieves top-K relevant documents.
- Documents are merged into structured prompt context.
- The LLM generates a response using only retrieved context.
This enforces grounding and reduces hallucinations.
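A sketch of that retrieve-and-generate step as an `ask()` method on the hypothetical `RagService` above. It assumes an additional injected `ChatClient` field (built from the auto-configured `ChatClient.Builder`); API shapes follow Spring AI 1.x, and older milestones differ slightly (e.g. `Document.getContent()` instead of `getText()`):

```java
// Additional imports: java.util.stream.Collectors,
// org.springframework.ai.chat.client.ChatClient,
// org.springframework.ai.vectorstore.SearchRequest

public String ask(String question) {
    // 1. Embed the query and retrieve the top-K most similar documents.
    List<Document> docs = vectorStore.similaritySearch(
            SearchRequest.builder().query(question).topK(4).build());

    // 2. Merge retrieved content into a single context block.
    String context = docs.stream()
            .map(Document::getText)
            .collect(Collectors.joining("\n---\n"));

    // 3. Constrain the model to that context.
    return chatClient.prompt()
            .system("Answer using ONLY the context below. "
                    + "If the answer is not in the context, say you don't know.\n\n"
                    + context)
            .user(question)
            .call()
            .content();
}
```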
## 🛠 Tech Stack

| Layer | Technology |
|---|---|
| Backend | Spring Boot 3 |
| AI Integration | Spring AI |
| Vector Database | PostgreSQL + pgvector |
| LLM Provider | OpenAI |
| Containerization | Docker |
| Build Tool | Maven |
| Java Version | 17+ |
## 🐳 Database Setup (Docker)

```bash
docker-compose up -d
```

The container includes:
- PostgreSQL 16
- pgvector extension

Database:
- Name: `ragdb`
- User: `postgres`
- Password: `postgres`
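A minimal `docker-compose.yml` matching this setup could look like the following (the image tag, port mapping, and volume name are assumptions; adjust to your environment):

```yaml
# Illustrative compose file, not the project's actual one.
services:
  postgres:
    image: pgvector/pgvector:pg16   # PostgreSQL 16 with pgvector preinstalled
    environment:
      POSTGRES_DB: ragdb
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres
    ports:
      - "5432:5432"
    volumes:
      - pgdata:/var/lib/postgresql/data

volumes:
  pgdata:
```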
## ⚙️ Application Configuration

`application.yml`:

```yaml
spring:
  datasource:
    url: jdbc:postgresql://localhost:5432/ragdb
    username: postgres
    password: postgres
  ai:
    openai:
      api-key: ${OPENAI_API_KEY}
      embedding:
        options:
          model: text-embedding-3-small
    vectorstore:
      pgvector:
        initialize-schema: true
```
## ▶️ Running the Application

```bash
mvn clean install
mvn spring-boot:run
```
## 🧪 API Endpoints

### Add Document

```bash
curl -X POST http://localhost:8080/api/rag/documents \
  -H "Content-Type: text/plain" \
  -d "Spring Boot is a Java framework for building microservices."
```

### Ask Question

```bash
curl --get http://localhost:8080/api/rag/ask \
  --data-urlencode "question=What is Spring Boot?"
```

## ⚡ Performance & Scalability

- Vector search indexed with pgvector
- Configurable top-K retrieval
- Ready for an embedding-caching strategy
- Dockerized for cloud deployment
- Lightweight, cost-efficient embedding model
- Similarity search executed inside PostgreSQL
- Stateless service layer for horizontal scaling
## 🔒 Security

- API key externalized via environment variables
- No sensitive data stored in prompts
- Database credentials configurable
## 🔮 Future Extensions

Can be extended with:

- PDF ingestion + automatic chunking
- Metadata filtering
- Streaming responses
- Multi-tenant isolation
- Hybrid search (BM25 + vector)
- Observability (Micrometer, OpenTelemetry)
## 🚧 Production Hardening

- Token-aware context truncation (see the sketch after this list)
- Vector index tuning
- Response caching
- Rate limiting
- Circuit breaker for LLM calls
- Prompt versioning
- Audit logging for AI interactions
- Evaluation metrics (RAG accuracy testing)
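As a sketch of the first item, a token-aware truncation helper. The four-characters-per-token ratio is a rough heuristic assumption; a real implementation would count tokens with the model's tokenizer (e.g. via a library such as jtokkit):

```java
import java.util.List;

// Illustrative helper: trims retrieved chunks so the merged context
// stays under a token budget, using a rough chars-per-token estimate.
public final class ContextBudget {

    private static final double AVG_CHARS_PER_TOKEN = 4.0; // heuristic, not exact

    private ContextBudget() {
    }

    public static String truncate(List<String> rankedChunks, int maxTokens) {
        int budgetChars = (int) (maxTokens * AVG_CHARS_PER_TOKEN);
        StringBuilder context = new StringBuilder();
        for (String chunk : rankedChunks) {
            if (context.length() + chunk.length() > budgetChars) {
                break; // drop lower-ranked chunks rather than exceed the budget
            }
            context.append(chunk).append("\n---\n");
        }
        return context.toString();
    }
}
```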
## 🧠 Skills Demonstrated

- Separation of concerns (Controller / Service / AI layer)
- Clean prompt engineering
- AI cost-awareness
- Retrieval pipeline design
- Infrastructure as code
- Container-based local development
- JVM ecosystem + AI integration
- Understanding of embedding vs. generation models
- Understanding of vector similarity metrics
## ☁️ Production Deployment

- App: Kubernetes / ECS / Azure Container Apps
- DB: Managed PostgreSQL with pgvector
- Secrets: Vault / AWS Secrets Manager
- Observability: Prometheus + Grafana
- LLM: External provider (OpenAI) or self-hosted model
## 💡 Why It Matters

- Modern AI system integration
- Backend architectural maturity
- Clean, production-grade engineering practices
- Understanding of vector databases
- Understanding of the LLM lifecycle
- Cost/performance trade-offs
- Practical GenAI backend implementation
## 👤 Author

Senior backend engineer exploring AI-native system design and production-grade GenAI integration within the Spring ecosystem.

## 📄 License

MIT