
sdeonvacation/AI-support-engine


RAG Support Engine

Privacy-first AI support backend for documentation search, grounded answers, and multi-turn support conversations.

Built to run natively in your environment, with optional OpenAI and Anthropic support when you want cloud LLM generation.

Why teams choose it

  • Native-first deployment — run the stack in your own environment with Spring Boot, PostgreSQL, pgvector, and the Haystack sidecar.
  • Optional cloud support — connect OpenAI or Anthropic APIs when you want cloud inference, without making the platform cloud-dependent.
  • Fast semantic retrieval — pgvector-backed vector search with HNSW indexing for low-latency, high-quality retrieval over your support corpus.
  • Privacy-first deep retrieval — retrieval, vector search, and conversational memory stay inside your stack; when confidence is low, the system can broaden retrieval scope before relying on optional external LLM providers.
  • Grounded support answers — responses are generated with source context, confidence signals, and groundedness checks.
  • Conversational memory — session-aware memory helps the system handle follow-up questions and multi-turn support flows.
  • Production-ready backend — stable REST APIs, API-key auth, streaming responses, document ingestion, and evaluation endpoints.

What it does

RAG Support Engine gives product and support teams a backend for:

  • answering documentation and support questions,
  • ingesting PDFs, markdown, and URLs,
  • retrieving relevant context from pgvector-backed knowledge stores,
  • maintaining conversation context across sessions,
  • streaming answers to client apps,
  • switching between native and cloud-backed inference strategies.

Core capabilities

Native AI support backend

The platform is designed to run as a self-hosted backend:

  • Spring Boot provides the client-facing API layer.
  • Haystack sidecar handles indexing, retrieval, query orchestration, streaming, and agent flows.
  • PostgreSQL + pgvector store metadata, vectors, and conversation memory.
  • Optional local LLM runtime can be enabled in hybrid deployments.
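A minimal Docker Compose sketch of how these components could fit together. This is an illustration only: service names, images, and paths other than the ones listed above are assumptions, and the project's real compose file may differ.

```yaml
services:
  api:
    build: .                       # Spring Boot API gateway (client-facing layer)
    ports: ["8080:8080"]
    depends_on: [db, sidecar]
  sidecar:
    build: ./haystack-sidecar      # Python Haystack retrieval/generation service
    depends_on: [db]
  db:
    image: pgvector/pgvector:pg16  # PostgreSQL with the pgvector extension
    volumes:
      # Shared schema initialization, as referenced under Architecture
      - ./src/main/resources/db/init.sql:/docker-entrypoint-initdb.d/init.sql
```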

Advanced retrieval with pgvector

Retrieval is built on PostgreSQL + pgvector with vector indexing for semantic search.

  • pgvector extension enabled in the shared database
  • HNSW vector indexes for similarity search
  • embedding-based retrieval pipelines via Haystack
  • shared document and memory storage for grounded, contextual answers
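The similarity measure behind this retrieval — cosine distance, which pgvector exposes as the `<=>` operator and which HNSW indexes approximate at scale — can be illustrated in plain Python (the toy 3-dimensional embeddings below are made up for the example):

```python
import math

def cosine_distance(a, b):
    """Cosine distance as pgvector's `<=>` operator computes it: 1 - cos(a, b)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

query = [0.1, 0.9, 0.0]
docs = {
    "install-guide": [0.1, 0.8, 0.1],
    "billing-faq": [0.9, 0.1, 0.0],
}

# Rank documents by ascending distance, as `ORDER BY embedding <=> :query` would.
ranked = sorted(docs, key=lambda name: cosine_distance(query, docs[name]))
```

In production the same ordering is computed inside PostgreSQL, so the HNSW index — not application code — does the heavy lifting.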

Privacy-first deep retrieval

The system is designed to keep retrieval operations close to your data:

  • documents, vectors, and memory live in your environment,
  • retrieval can expand and refine search scope when confidence is low,
  • source selection, confidence scoring, and groundedness checks help reduce weak answers,
  • cloud LLM usage is optional and separate from the core retrieval layer.
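The confidence-gated broadening described above might look like the following sketch. The threshold, the `top_k` schedule, and the `retrieve`/`score` helpers are illustrative assumptions, not the engine's actual internals:

```python
def answer_with_broadening(query, retrieve, score, threshold=0.7, k_schedule=(5, 10, 20)):
    """Retry retrieval with a wider scope until confidence clears the threshold.

    retrieve(query, top_k) -> list of context chunks
    score(query, context)  -> confidence in [0.0, 1.0]
    """
    context, confidence = [], 0.0
    for top_k in k_schedule:
        context = retrieve(query, top_k)
        confidence = score(query, context)
        if confidence >= threshold:
            break  # grounded enough; no need to widen further
    return context, confidence

# Toy demo with stub components: confidence grows as more chunks are retrieved.
stub_retrieve = lambda query, top_k: [f"chunk-{i}" for i in range(top_k)]
stub_score = lambda query, context: min(1.0, len(context) / 15)
context, confidence = answer_with_broadening("how do I rotate my API key?", stub_retrieve, stub_score)
```

The key property is that widening happens entirely inside the retrieval layer, before any optional cloud provider is consulted.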

OpenAI and Anthropic support

When you want managed cloud inference, the platform supports both:

  • OpenAI APIs
  • Anthropic APIs

This makes it easy to choose the right model strategy for cost, latency, quality, or compliance needs.
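Because neither provider is a hard dependency, selection can be as simple as an environment-driven switch. The variable names and the native fallback below are assumptions for illustration, not the project's actual configuration keys:

```python
import os

def pick_provider():
    """Choose a generation backend without making any cloud provider a hard dependency."""
    if os.getenv("OPENAI_API_KEY"):
        return "openai"
    if os.getenv("ANTHROPIC_API_KEY"):
        return "anthropic"
    return "native"  # hybrid/local LLM path
```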

Conversational memory

The backend supports session-based conversational memory stored in PostgreSQL, enabling:

  • better follow-up question handling,
  • more coherent multi-turn support experiences,
  • retention and cleanup policies for operational control.
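A toy in-memory sketch of what session memory with a retention cutoff involves; the PostgreSQL-backed version would do the same with timestamped rows, and all names here are illustrative:

```python
import time

class SessionMemory:
    """Per-session message history with an age-based retention policy."""

    def __init__(self, retention_seconds=3600):
        self.retention_seconds = retention_seconds
        self.sessions = {}  # session_id -> list of (timestamp, role, text)

    def append(self, session_id, role, text, now=None):
        now = time.time() if now is None else now
        self.sessions.setdefault(session_id, []).append((now, role, text))

    def history(self, session_id, now=None):
        """Return messages still inside the retention window, oldest first."""
        now = time.time() if now is None else now
        cutoff = now - self.retention_seconds
        kept = [m for m in self.sessions.get(session_id, []) if m[0] >= cutoff]
        self.sessions[session_id] = kept  # cleanup: drop expired rows
        return [(role, text) for _, role, text in kept]
```

Feeding `history()` back into retrieval and generation is what lets follow-ups like "and how do I undo that?" resolve against earlier turns.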

Client-ready API surface

Key API capabilities include:

  • /query — synchronous grounded question answering
  • /query/stream — streaming responses via SSE
  • /documents/upload — document ingestion
  • /documents/ingest-url — URL ingestion
  • /sources — source management
  • /evaluation/* — evaluation and audit-oriented endpoints
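As a client-side illustration, here is a sketch of a /query request body and a parser for lines from the /query/stream SSE response. Field names such as `question` and `session_id` are assumptions about the request schema, not documented fields:

```python
import json

def build_query_payload(question, session_id=None):
    """JSON body for POST /query (field names are illustrative)."""
    payload = {"question": question}
    if session_id:
        payload["session_id"] = session_id
    return json.dumps(payload)

def parse_sse_line(line):
    """Extract the JSON event from one `data: ...` line of a text/event-stream body."""
    if line.startswith("data: "):
        return json.loads(line[len("data: "):])
    return None  # SSE comments, blank keep-alive lines, etc.
```

Clients would send the payload with an API key header and feed each streamed line through `parse_sse_line` to assemble the answer incrementally.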

Architecture

Main components

  • src/main/java — Spring Boot API gateway, auth, controllers, evaluation services
  • haystack-sidecar/ — Python Haystack retrieval and generation service
  • src/main/resources/db/init.sql — shared database initialization with pgvector

Runtime flow

  1. A client sends a support query to the Spring Boot API.
  2. Spring Boot validates the API key and forwards retrieval/generation work to the sidecar.
  3. The sidecar retrieves relevant context from PostgreSQL + pgvector.
  4. The system scores confidence and groundedness, and can broaden retrieval when needed.
  5. A grounded answer is returned with sources, with streaming available for real-time UX.
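The five steps above, condensed into a single pipeline sketch. Every helper here is a stub standing in for a real component (the gateway, the sidecar, pgvector, the scorer, the generator), and the threshold is an assumed value:

```python
def handle_query(query, api_key, valid_keys, retrieve, score, generate, threshold=0.7):
    """Steps 1-5: auth, retrieval, confidence check with broadening, grounded answer."""
    if api_key not in valid_keys:              # step 2: API-key validation
        raise PermissionError("invalid API key")
    context = retrieve(query, top_k=5)         # step 3: pgvector-backed retrieval
    if score(query, context) < threshold:      # step 4: broaden when confidence is low
        context = retrieve(query, top_k=20)
    answer = generate(query, context)          # step 5: grounded generation
    return {"answer": answer, "sources": context}
```

Streaming changes only the last step: instead of returning one dict, the generator's tokens are flushed to the client as SSE events.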

Deployment modes

Cloud-enabled mode

Use OpenAI or Anthropic API keys for cloud-backed generation:

```shell
docker compose up --build
```

Hybrid / native-first mode

Run with the hybrid profile to enable the local LLM path:

```shell
docker compose --profile hybrid up --build
```

API endpoint:

http://localhost:8080

Quick start

  1. Copy and configure environment variables from .env.example.
  2. Provide your API key and any optional provider keys.
  3. Start the stack with Docker Compose.
  4. Ingest documents or URLs.
  5. Query the API from your support application, agent console, or internal tools.
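A hedged sketch of the environment file from step 1. The variable names below are assumptions based on the providers mentioned in this README; check .env.example for the real keys:

```shell
# .env — copied from .env.example
API_KEY=change-me        # key your clients send to the Spring Boot API
OPENAI_API_KEY=          # optional: enables OpenAI-backed generation
ANTHROPIC_API_KEY=       # optional: enables Anthropic-backed generation
```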

Who it's for

RAG Support Engine is a strong fit for organizations that want to:

  • keep support knowledge and retrieval infrastructure under their control,
  • avoid making cloud LLMs a hard dependency,
  • deliver grounded support answers with source traceability,
  • support regulated or privacy-sensitive environments,
  • combine native deployment with optional best-of-breed cloud models.

Testing

Manual/opt-in agentic fallback E2E suites are available:

  • Java: src/test/java/com/ragsupport/e2e/AgenticFallbackE2EHttpTest.java
  • Python: haystack-sidecar/tests/test_agentic_fallback_e2e_http.py

Prepare environment:

```shell
cp .env.e2e.example .env.e2e
set -a; source .env.e2e; set +a
```

Run Java E2E:

```shell
mvn -Pagentic-e2e test \
  -DAGENTIC_JAVA_BASE_URL="$AGENTIC_JAVA_BASE_URL" \
  -DAGENTIC_API_KEY="$AGENTIC_API_KEY" \
  -DAGENTIC_E2E_MODE="$AGENTIC_E2E_MODE"
```

Run Python E2E:

```shell
pytest -m agentic_e2e -q haystack-sidecar/tests/test_agentic_fallback_e2e_http.py
```

Summary

RAG Support Engine is a privacy-first, client-ready AI support backend that combines:

  • native deployment,
  • optional OpenAI and Anthropic cloud support,
  • pgvector-based retrieval,
  • confidence-aware deep retrieval behavior,
  • conversational memory,
  • grounded answer delivery for support workflows.
