Privacy-first AI support backend for documentation search, grounded answers, and multi-turn support conversations.
Built to run natively in your environment, with optional OpenAI and Anthropic support when you want cloud LLM generation.
- Native-first deployment — run the stack in your own environment with Spring Boot, PostgreSQL, pgvector, and the Haystack sidecar.
- Optional cloud support — connect OpenAI or Anthropic APIs when you want cloud inference, without making the platform cloud-dependent.
- State-of-the-art retrieval — pgvector-backed semantic search with HNSW indexing for fast, high-quality retrieval over your support corpus.
- Privacy-first deep retrieval — retrieval, vector search, and conversational memory stay inside your stack; when confidence is low, the system can broaden retrieval scope before relying on optional external LLM providers.
- Grounded support answers — responses are generated with source context, confidence signals, and groundedness checks.
- Conversational memory — session-aware memory helps the system handle follow-up questions and multi-turn support flows.
- Production-ready backend — stable REST APIs, API-key auth, streaming responses, document ingestion, and evaluation endpoints.
RAG Support Engine gives product and support teams a backend for:
- answering documentation and support questions,
- ingesting PDFs, markdown, and URLs,
- retrieving relevant context from pgvector-backed knowledge stores,
- maintaining conversation context across sessions,
- streaming answers to client apps,
- switching between native and cloud-backed inference strategies.
The platform is designed to run as a self-hosted backend:
- Spring Boot provides the client-facing API layer.
- The Haystack sidecar handles indexing, retrieval, query orchestration, streaming, and agent flows.
- PostgreSQL + pgvector store metadata, vectors, and conversation memory.
- An optional local LLM runtime can be enabled in hybrid deployments.
Retrieval is built on PostgreSQL + pgvector with vector indexing for semantic search.
- pgvector extension enabled in the shared database
- HNSW vector indexes for similarity search
- embedding-based retrieval pipelines via Haystack
- shared document and memory storage for grounded, contextual answers
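To illustrate what HNSW-indexed similarity search computes, here is a minimal cosine-similarity ranking over toy embeddings. The document names and vectors are invented for the example; in the real stack this ranking happens inside pgvector, with HNSW approximating it at scale:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def top_k(query_vec, docs, k=2):
    """Rank documents by cosine similarity to the query vector."""
    scored = [(cosine_similarity(query_vec, vec), doc) for doc, vec in docs.items()]
    scored.sort(reverse=True)
    return [doc for _, doc in scored[:k]]

# Toy 3-dimensional "embeddings"; real embeddings have hundreds of dimensions.
docs = {
    "reset-password.md": [0.9, 0.1, 0.0],
    "billing-faq.md":    [0.1, 0.9, 0.1],
    "api-limits.md":     [0.2, 0.2, 0.9],
}
print(top_k([0.8, 0.2, 0.1], docs, k=2))  # best match: reset-password.md
```

An HNSW index trades a small amount of recall for large speedups by searching a layered proximity graph instead of comparing the query against every stored vector.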
The system is designed to keep retrieval operations close to your data:
- documents, vectors, and memory live in your environment,
- retrieval can expand and refine search scope when confidence is low,
- source selection, confidence scoring, and groundedness checks help reduce weak answers,
- cloud LLM usage is optional and separate from the core retrieval layer.
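The low-confidence broadening behavior can be sketched as a threshold ladder. The term-overlap scorer below is an invented stand-in for the real embedding retriever, and the threshold values are illustrative:

```python
def retrieve(corpus, query_terms, min_score):
    """Toy retriever: score = fraction of query terms found in the document text."""
    hits = []
    for doc, text in corpus.items():
        score = sum(t in text for t in query_terms) / len(query_terms)
        if score >= min_score:
            hits.append((score, doc))
    return sorted(hits, reverse=True)

def retrieve_with_fallback(corpus, query_terms, thresholds=(0.75, 0.5, 0.25)):
    """Broaden scope by lowering the confidence threshold until something matches."""
    for threshold in thresholds:
        hits = retrieve(corpus, query_terms, threshold)
        if hits:
            return threshold, hits
    return None, []

corpus = {
    "reset.md":   "how to reset your password",
    "billing.md": "billing plans and invoices",
}
threshold, hits = retrieve_with_fallback(corpus, ["reset", "password", "token"])
```

Only after the ladder is exhausted (or by explicit configuration) would the request fall through to an external LLM provider, keeping cloud usage separate from core retrieval.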
When you want managed cloud inference, the platform supports both:
- OpenAI APIs
- Anthropic APIs
This makes it easy to choose the right model strategy for cost, latency, quality, or compliance needs.
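One way such a model strategy can be expressed is a simple preference-ordered dispatch. The provider names and native-first ordering below are illustrative, not the engine's actual selection logic:

```python
def choose_provider(available, prefer="native", require_cloud=False):
    """Pick a generation backend: native-first by default, cloud-first on request."""
    order = {
        "native": ["local-llm", "openai", "anthropic"],
        "cloud":  ["openai", "anthropic", "local-llm"],
    }[prefer]
    if require_cloud:  # e.g. a compliance or quality requirement for managed models
        order = [p for p in order if p != "local-llm"]
    for provider in order:
        if provider in available:
            return provider
    raise RuntimeError("no generation provider configured")

print(choose_provider({"local-llm", "openai"}))                  # local-llm
print(choose_provider({"openai", "anthropic"}, prefer="cloud"))  # openai
```

Keeping this decision in one place makes it easy to change the cost/latency/compliance trade-off without touching retrieval code.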
The backend supports session-based conversational memory stored in PostgreSQL, enabling:
- better follow-up question handling,
- more coherent multi-turn support experiences,
- retention and cleanup policies for operational control.
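An in-memory sketch of what the PostgreSQL-backed session store provides; the class name, turn cap, and TTL values are invented for illustration:

```python
import time
from collections import defaultdict

class SessionMemory:
    """Keeps the last `max_turns` turns per session and expires sessions
    whose last activity is older than `ttl` seconds."""
    def __init__(self, max_turns=10, ttl=3600):
        self.max_turns = max_turns
        self.ttl = ttl
        self.turns = defaultdict(list)
        self.last_seen = {}

    def append(self, session_id, role, text, now=None):
        now = time.time() if now is None else now
        self.turns[session_id].append((role, text))
        self.turns[session_id] = self.turns[session_id][-self.max_turns:]
        self.last_seen[session_id] = now

    def context(self, session_id):
        """Return the retained turns used to ground follow-up questions."""
        return self.turns.get(session_id, [])

    def cleanup(self, now=None):
        """Drop expired sessions; returns how many were removed."""
        now = time.time() if now is None else now
        expired = [s for s, t in self.last_seen.items() if now - t > self.ttl]
        for s in expired:
            self.turns.pop(s, None)
            self.last_seen.pop(s, None)
        return len(expired)
```

In the real backend the same retention and cleanup policies run against the shared PostgreSQL store rather than process memory.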
Key API capabilities include:
- `/query` — synchronous grounded question answering
- `/query/stream` — streaming responses via SSE
- `/documents/upload` — document ingestion
- `/documents/ingest-url` — URL ingestion
- `/sources` — source management
- `/evaluation/*` — evaluation and audit-oriented endpoints
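A minimal client-side sketch for calling `/query`. The `X-API-Key` header name and request-body fields are assumptions for illustration, so check your deployment's API contract before relying on them:

```python
import json

BASE_URL = "http://localhost:8080"  # default API endpoint

def build_query_request(question, session_id=None, api_key="YOUR_API_KEY"):
    """Assemble the URL, headers, and JSON body for a POST /query call.
    Header and field names are assumed, not taken from the API spec."""
    body = {"question": question}
    if session_id:
        body["session_id"] = session_id  # reuse to keep conversational memory
    return {
        "url": f"{BASE_URL}/query",
        "headers": {"X-API-Key": api_key, "Content-Type": "application/json"},
        "body": json.dumps(body),
    }

req = build_query_request("How do I rotate my API key?", session_id="sess-42")
```

The same shape works for the other endpoints; `/query/stream` additionally requires an SSE-capable client.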
- `src/main/java` — Spring Boot API gateway, auth, controllers, evaluation services
- `haystack-sidecar/` — Python Haystack retrieval and generation service
- `src/main/resources/db/init.sql` — shared database initialization with pgvector
- A client sends a support query to the Spring Boot API.
- Spring Boot validates the API key and forwards retrieval/generation work to the sidecar.
- The sidecar retrieves relevant context from PostgreSQL + pgvector.
- The system scores confidence and groundedness, and can broaden retrieval when needed.
- A grounded answer is returned with sources, with streaming available for real-time UX.
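The flow above can be sketched end to end with stub components. Every function name here is hypothetical; the real work is split between the Spring Boot gateway and the Haystack sidecar:

```python
def handle_query(question, api_key, valid_keys, retriever, scorer, generator):
    """Sketch of the request flow: validate the key, retrieve context,
    broaden retrieval on low confidence, then generate a grounded answer."""
    if api_key not in valid_keys:
        return {"error": "unauthorized"}
    context = retriever(question, broad=False)
    if scorer(context) < 0.5:                 # low confidence: broaden scope
        context = retriever(question, broad=True)
    answer = generator(question, context)
    return {"answer": answer, "sources": [c["id"] for c in context]}

# Stubs standing in for the sidecar's real retriever, scorer, and generator.
retriever = lambda q, broad: [{"id": "doc-1", "text": "..."}] if broad else []
scorer = lambda ctx: 1.0 if ctx else 0.0
generator = lambda q, ctx: f"grounded answer from {len(ctx)} source(s)"

result = handle_query("How do I reset my password?", "key-1", {"key-1"},
                      retriever, scorer, generator)
```

Returning the source IDs alongside the answer is what gives clients the traceability described above.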
Use OpenAI or Anthropic API keys for cloud-backed generation:
```
docker compose up --build
```

Run with the hybrid profile to enable the local LLM path:

```
docker compose --profile hybrid up --build
```

API endpoint: `http://localhost:8080`
- Copy and configure environment variables from `.env.example`.
- Provide your API key and any optional provider keys.
- Start the stack with Docker Compose.
- Ingest documents or URLs.
- Query the API from your support application, agent console, or internal tools.
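For the streaming path, `/query/stream` delivers Server-Sent Events; a minimal parser for an SSE response body might look like the sketch below (event field names beyond the standard `data:` prefix are not assumed):

```python
def parse_sse(lines):
    """Parse Server-Sent Events lines into a list of event payloads.
    Per the SSE format, `data:` lines accumulate and a blank line ends an event."""
    events, buf = [], []
    for line in lines:
        if line.startswith("data:"):
            buf.append(line[5:].lstrip())
        elif line == "" and buf:
            events.append("\n".join(buf))
            buf = []
    if buf:  # flush a trailing event without a final blank line
        events.append("\n".join(buf))
    return events

print(parse_sse(["data: Hello", "", "data: wor", "data: ld", ""]))
```

In practice you would feed this the decoded lines of the streaming HTTP response and render each event payload incrementally in the client UI.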
RAG Support Engine is a strong fit for organizations that want to:
- keep support knowledge and retrieval infrastructure under their control,
- avoid making cloud LLMs a hard dependency,
- deliver grounded support answers with source traceability,
- support regulated or privacy-sensitive environments,
- combine native deployment with optional best-of-breed cloud models.
Manual/opt-in agentic fallback E2E suites are available:
- Java: `src/test/java/com/ragsupport/e2e/AgenticFallbackE2EHttpTest.java`
- Python: `haystack-sidecar/tests/test_agentic_fallback_e2e_http.py`
Prepare environment:
```
cp .env.e2e.example .env.e2e
set -a; source .env.e2e; set +a
```

Run Java E2E:

```
mvn -Pagentic-e2e test \
  -DAGENTIC_JAVA_BASE_URL="$AGENTIC_JAVA_BASE_URL" \
  -DAGENTIC_API_KEY="$AGENTIC_API_KEY" \
  -DAGENTIC_E2E_MODE="$AGENTIC_E2E_MODE"
```

Run Python E2E:

```
pytest -m agentic_e2e -q haystack-sidecar/tests/test_agentic_fallback_e2e_http.py
```

RAG Support Engine is a privacy-first, client-ready AI support backend that combines:
- native deployment,
- optional OpenAI and Anthropic cloud support,
- pgvector-based retrieval,
- confidence-aware deep retrieval behavior,
- conversational memory,
- grounded answer delivery for support workflows.