Privacy-first AI support backend for documentation search, grounded answers, and multi-turn support conversations.
Built to run natively in your environment, with optional OpenAI and Anthropic support when you want cloud LLM generation.
- Native-first deployment — run the stack in your own environment with Spring Boot, PostgreSQL, pgvector, and the Haystack sidecar.
- Optional cloud support — connect OpenAI or Anthropic APIs when you want cloud inference, without making the platform cloud-dependent.
- State-of-the-art retrieval — pgvector-backed semantic search with HNSW indexing for fast, high-quality retrieval over your support corpus.
- Privacy-first deep retrieval — retrieval, vector search, and conversational memory stay inside your stack; when confidence is low, the system can broaden retrieval scope before relying on optional external LLM providers.
- Grounded support answers — responses are generated with source context, confidence signals, and groundedness checks.
- Conversational memory — session-aware memory helps the system handle follow-up questions and multi-turn support flows.
- Production-ready backend — stable REST APIs, API-key auth, streaming responses, document ingestion, and evaluation endpoints.
RAG Support Engine gives product and support teams a backend for:
- answering documentation and support questions,
- ingesting PDFs, markdown, and URLs,
- retrieving relevant context from pgvector-backed knowledge stores,
- maintaining conversation context across sessions,
- streaming answers to client apps,
- switching between native and cloud-backed inference strategies.
The platform is designed to run as a self-hosted backend:
- Spring Boot provides the client-facing API layer.
- The Haystack sidecar handles indexing, retrieval, query orchestration, streaming, and agent flows.
- PostgreSQL + pgvector store metadata, vectors, and conversation memory.
- An optional local LLM runtime can be enabled in hybrid deployments.
Retrieval is built on PostgreSQL + pgvector with vector indexing for semantic search.
- pgvector extension enabled in the shared database
- HNSW vector indexes for similarity search
- embedding-based retrieval pipelines via Haystack
- shared document and memory storage for grounded, contextual answers
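To illustrate what HNSW-indexed similarity search computes, here is a minimal cosine-similarity ranking over toy embeddings. The document names and vectors are invented for the example; in the real stack this ranking happens inside pgvector, with HNSW approximating it at scale:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def top_k(query_vec, docs, k=2):
    """Rank documents by cosine similarity to the query vector."""
    scored = [(cosine_similarity(query_vec, vec), doc) for doc, vec in docs.items()]
    scored.sort(reverse=True)
    return [doc for _, doc in scored[:k]]

# Toy 3-dimensional "embeddings"; real embeddings have hundreds of dimensions.
docs = {
    "reset-password.md": [0.9, 0.1, 0.0],
    "billing-faq.md":    [0.1, 0.9, 0.1],
    "api-limits.md":     [0.2, 0.2, 0.9],
}
print(top_k([0.8, 0.2, 0.1], docs, k=2))  # best match: reset-password.md
```

An HNSW index trades a small amount of recall for large speedups by searching a layered proximity graph instead of comparing the query against every stored vector.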
The system is designed to keep retrieval operations close to your data:
- documents, vectors, and memory live in your environment,
- retrieval can expand and refine search scope when confidence is low,
- source selection, confidence scoring, and groundedness checks help reduce weak answers,
- cloud LLM usage is optional and separate from the core retrieval layer.
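The low-confidence broadening behavior can be sketched as a threshold ladder. The term-overlap scorer below is an invented stand-in for the real embedding retriever, and the threshold values are illustrative:

```python
def retrieve(corpus, query_terms, min_score):
    """Toy retriever: score = fraction of query terms found in the document text."""
    hits = []
    for doc, text in corpus.items():
        score = sum(t in text for t in query_terms) / len(query_terms)
        if score >= min_score:
            hits.append((score, doc))
    return sorted(hits, reverse=True)

def retrieve_with_fallback(corpus, query_terms, thresholds=(0.75, 0.5, 0.25)):
    """Broaden scope by lowering the confidence threshold until something matches."""
    for threshold in thresholds:
        hits = retrieve(corpus, query_terms, threshold)
        if hits:
            return threshold, hits
    return None, []

corpus = {
    "reset.md":   "how to reset your password",
    "billing.md": "billing plans and invoices",
}
threshold, hits = retrieve_with_fallback(corpus, ["reset", "password", "token"])
```

Only after the ladder is exhausted (or by explicit configuration) would the request fall through to an external LLM provider, keeping cloud usage separate from core retrieval.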
When you want managed cloud inference, the platform supports both:
- OpenAI APIs
- Anthropic APIs
This makes it easy to choose the right model strategy for cost, latency, quality, or compliance needs.
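One way such a model strategy can be expressed is a simple preference-ordered dispatch. The provider names and native-first ordering below are illustrative, not the engine's actual selection logic:

```python
def choose_provider(available, prefer="native", require_cloud=False):
    """Pick a generation backend: native-first by default, cloud-first on request."""
    order = {
        "native": ["local-llm", "openai", "anthropic"],
        "cloud":  ["openai", "anthropic", "local-llm"],
    }[prefer]
    if require_cloud:  # e.g. a compliance or quality requirement for managed models
        order = [p for p in order if p != "local-llm"]
    for provider in order:
        if provider in available:
            return provider
    raise RuntimeError("no generation provider configured")

print(choose_provider({"local-llm", "openai"}))                  # local-llm
print(choose_provider({"openai", "anthropic"}, prefer="cloud"))  # openai
```

Keeping this decision in one place makes it easy to change the cost/latency/compliance trade-off without touching retrieval code.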
The backend supports session-based conversational memory stored in PostgreSQL, enabling:
- better follow-up question handling,
- more coherent multi-turn support experiences,
- retention and cleanup policies for operational control.
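An in-memory sketch of what the PostgreSQL-backed session store provides; the class name, turn cap, and TTL values are invented for illustration:

```python
import time
from collections import defaultdict

class SessionMemory:
    """Keeps the last `max_turns` turns per session and expires sessions
    whose last activity is older than `ttl` seconds."""
    def __init__(self, max_turns=10, ttl=3600):
        self.max_turns = max_turns
        self.ttl = ttl
        self.turns = defaultdict(list)
        self.last_seen = {}

    def append(self, session_id, role, text, now=None):
        now = time.time() if now is None else now
        self.turns[session_id].append((role, text))
        self.turns[session_id] = self.turns[session_id][-self.max_turns:]
        self.last_seen[session_id] = now

    def context(self, session_id):
        """Return the retained turns used to ground follow-up questions."""
        return self.turns.get(session_id, [])

    def cleanup(self, now=None):
        """Drop expired sessions; returns how many were removed."""
        now = time.time() if now is None else now
        expired = [s for s, t in self.last_seen.items() if now - t > self.ttl]
        for s in expired:
            self.turns.pop(s, None)
            self.last_seen.pop(s, None)
        return len(expired)
```

In the real backend the same retention and cleanup policies run against the shared PostgreSQL store rather than process memory.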
Key API capabilities include:
- `/query` — synchronous grounded question answering
- `/query/stream` — streaming responses via SSE
- `/documents/upload` — document ingestion
- `/documents/ingest-url` — URL ingestion
- `/sources` — source management
- `/evaluation/*` — evaluation and audit-oriented endpoints
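A minimal client-side sketch for calling `/query`. The `X-API-Key` header name and request-body fields are assumptions for illustration, so check your deployment's API contract before relying on them:

```python
import json

BASE_URL = "http://localhost:8080"  # default API endpoint

def build_query_request(question, session_id=None, api_key="YOUR_API_KEY"):
    """Assemble the URL, headers, and JSON body for a POST /query call.
    Header and field names are assumed, not taken from the API spec."""
    body = {"question": question}
    if session_id:
        body["session_id"] = session_id  # reuse to keep conversational memory
    return {
        "url": f"{BASE_URL}/query",
        "headers": {"X-API-Key": api_key, "Content-Type": "application/json"},
        "body": json.dumps(body),
    }

req = build_query_request("How do I rotate my API key?", session_id="sess-42")
```

The same shape works for the other endpoints; `/query/stream` additionally requires an SSE-capable client.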
- `src/main/java` — Spring Boot API gateway, auth, controllers, evaluation services
- `haystack-sidecar/` — Python Haystack retrieval and generation service
- `src/main/resources/db/init.sql` — shared database initialization with pgvector
- A client sends a support query to the Spring Boot API.
- Spring Boot validates the API key and forwards retrieval/generation work to the sidecar.
- The sidecar retrieves relevant context from PostgreSQL + pgvector.
- The system scores confidence and groundedness, and can broaden retrieval when needed.
- A grounded answer is returned with sources, with streaming available for real-time UX.
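The flow above can be sketched end to end with stub components. Every function name here is hypothetical; the real work is split between the Spring Boot gateway and the Haystack sidecar:

```python
def handle_query(question, api_key, valid_keys, retriever, scorer, generator):
    """Sketch of the request flow: validate the key, retrieve context,
    broaden retrieval on low confidence, then generate a grounded answer."""
    if api_key not in valid_keys:
        return {"error": "unauthorized"}
    context = retriever(question, broad=False)
    if scorer(context) < 0.5:                 # low confidence: broaden scope
        context = retriever(question, broad=True)
    answer = generator(question, context)
    return {"answer": answer, "sources": [c["id"] for c in context]}

# Stubs standing in for the sidecar's real retriever, scorer, and generator.
retriever = lambda q, broad: [{"id": "doc-1", "text": "..."}] if broad else []
scorer = lambda ctx: 1.0 if ctx else 0.0
generator = lambda q, ctx: f"grounded answer from {len(ctx)} source(s)"

result = handle_query("How do I reset my password?", "key-1", {"key-1"},
                      retriever, scorer, generator)
```

Returning the source IDs alongside the answer is what gives clients the traceability described above.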
Use OpenAI or Anthropic API keys for cloud-backed generation:
```
docker compose up --build
```

Run with the hybrid profile to enable the local LLM path:

```
docker compose --profile hybrid up --build
```

API endpoint: `http://localhost:8080`
- Copy and configure environment variables from `.env.example`.
- Provide your API key and any optional provider keys.
- Start the stack with Docker Compose.
- Ingest documents or URLs.
- Query the API from your support application, agent console, or internal tools.
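For the streaming path, `/query/stream` delivers Server-Sent Events; a minimal parser for an SSE response body might look like the sketch below (event field names beyond the standard `data:` prefix are not assumed):

```python
def parse_sse(lines):
    """Parse Server-Sent Events lines into a list of event payloads.
    Per the SSE format, `data:` lines accumulate and a blank line ends an event."""
    events, buf = [], []
    for line in lines:
        if line.startswith("data:"):
            buf.append(line[5:].lstrip())
        elif line == "" and buf:
            events.append("\n".join(buf))
            buf = []
    if buf:  # flush a trailing event without a final blank line
        events.append("\n".join(buf))
    return events

print(parse_sse(["data: Hello", "", "data: wor", "data: ld", ""]))
```

In practice you would feed this the decoded lines of the streaming HTTP response and render each event payload incrementally in the client UI.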
RAG Support Engine is a strong fit for organizations that want to:
- keep support knowledge and retrieval infrastructure under their control,
- avoid making cloud LLMs a hard dependency,
- deliver grounded support answers with source traceability,
- support regulated or privacy-sensitive environments,
- combine native deployment with optional best-of-breed cloud models.
Manual/opt-in agentic fallback E2E suites are available:
- Java: `src/test/java/com/ragsupport/e2e/AgenticFallbackE2EHttpTest.java`
- Python: `haystack-sidecar/tests/test_agentic_fallback_e2e_http.py`
Prepare environment:
```
cp .env.e2e.example .env.e2e
set -a; source .env.e2e; set +a
```

Run Java E2E:

```
mvn -Pagentic-e2e test \
  -DAGENTIC_JAVA_BASE_URL="$AGENTIC_JAVA_BASE_URL" \
  -DAGENTIC_API_KEY="$AGENTIC_API_KEY" \
  -DAGENTIC_E2E_MODE="$AGENTIC_E2E_MODE"
```

Run Python E2E:

```
pytest -m agentic_e2e -q haystack-sidecar/tests/test_agentic_fallback_e2e_http.py
```

RAG Support Engine is a privacy-first, client-ready AI support backend that combines:
- native deployment,
- optional OpenAI and Anthropic cloud support,
- pgvector-based retrieval,
- confidence-aware deep retrieval behavior,
- conversational memory,
- grounded answer delivery for support workflows.