Encompass RAG Assistant

A Retrieval-Augmented Generation (RAG) system for querying Encompass API documentation with natural language, featuring hybrid retrieval and multiple LLM support.

Unofficial. Not affiliated with ICE Mortgage Technology. A community-built retrieval index and RAG pipeline over the publicly available Encompass Developer Connect documentation, intended as a developer reference and for educational/research use. For canonical, up-to-date documentation, always defer to the official source: https://developer.icemortgagetechnology.com/.

Overview

The Encompass RAG Assistant is a tool that allows users to query Encompass API documentation using natural language. It leverages a hybrid retrieval system combining semantic and keyword search over the publicly available Encompass Developer Connect documentation and the Postman collection to provide contextual answers to API-related questions.

This application combines cutting-edge RAG techniques with a user-friendly interface to make Encompass API documentation more accessible and easier to navigate.

Features

Hybrid Retrieval System: Combines FAISS semantic search with BM25 keyword search, fused via Reciprocal Rank Fusion (RRF)
Multi-LLM Support: Choose between Ollama (qwen2.5-coder:7b) and Google Gemini (gemini-1.5-flash)
Jina v3 Embeddings: jinaai/jina-embeddings-v3 (1024-d) with task-specific prompts for retrieval
Postman Endpoint Gate: Token-overlap + score-ratio gate surfaces 0–3 relevant API endpoints separately from prose chunks
Two Data Sources: Crawled Encompass Developer Connect documentation + Postman collection
Environment Configuration: Full .env support for easy deployment and configuration
Enhanced FastAPI Backend: Robust API with health checks, configuration endpoints, and CORS support
Streamlit UI: Clean, intuitive interface with enhanced source display
Source Attribution: Detailed source tracking with metadata and full content access

Architecture

The system has a multi-component architecture:

Data Processing & Storage

Jina v3 Embeddings: jinaai/jina-embeddings-v3 (1024-d); task=retrieval.passage at ingest, retrieval.query at query time
FAISS-IP Vector Store: Inner-product similarity search over Jina vectors
BM25 Index: rank_bm25.BM25Okapi over the same chunks for lexical matching
Metadata Store: Per-chunk title, breadcrumb, kind, and source URL

Hybrid Retrieval System

HybridRetriever: FAISS semantic + BM25 lexical over the prose chunk corpus
RRF Fusion: Reciprocal Rank Fusion (k=60) merges semantic and lexical rankings into one top-k list
Postman Endpoint Gate: Separate BM25 over Postman entries → token-overlap filter (≥0.3) → score-ratio split returns 0–3 endpoints

LLM Integration

Ollama Support: Local LLM hosting with qwen2.5-coder:7b
Google Gemini: Cloud-based gemini-1.5-flash model option
Direct Prompt Construction: ## Documentation and ## Relevant API endpoint(s) are built as separate prompt sections (no RetrievalQA chain)

API & Interface

FastAPI Backend: Production-ready API with comprehensive endpoints
Environment Configuration: Full .env support for deployment flexibility
Streamlit UI: Enhanced interface with categorized source display
CORS Support: Ready for web application integration

flowchart TD
    subgraph Ingest["Build pipeline (offline)"]
        direction TB
        Crawler["scripts/crawler/<br/>(skips auth-gated)"] --> JSONL["developer_connect.jsonl"]
        JSONL --> FDC["filter → dedupe → chunk<br/>page-wise ≤ 2K, else 1500/200"]
        FDC --> Embed["Jina v3 embed<br/>task=retrieval.passage"]
        Postman["Encompass_Developer_Connect_<br/>postman_collection.json"]
    end

    Embed --> FAISS[("FAISS-IP<br/>jsonl_faiss/")]
    Embed --> BM25C[("BM25 over chunks<br/>jsonl_bm25.pkl")]
    Postman --> PEntries[("Postman BM25 + entries<br/>postman_*.pkl")]

    HF[("HF dataset (fallback)<br/>Richie-rk/encompass-developer-connect-index")] -. "snapshot_download<br/>if local missing" .-> FAISS

    subgraph Runtime["Query pipeline (runtime)"]
        direction TB
        Q["User query"] --> QE["Embed via Jina v3<br/>task=retrieval.query"]
        QE --> Sem["FAISS top-N"]
        Q --> Lex["BM25 top-N over chunks"]
        Sem --> RRF["RRF fuse → top-K chunks"]
        Lex --> RRF
        Q --> PG["Postman BM25 +<br/>token-overlap gate"]
        RRF --> Prompt["Prompt builder<br/>## Documentation + ## Endpoints"]
        PG --> Prompt
        Prompt --> LLM["LLM<br/>Ollama qwen2.5-coder · Gemini 1.5 Flash"]
        LLM --> Ans["Answer + sources + endpoints"]
    end

    FAISS --> Sem
    BM25C --> Lex
    PEntries --> PG

Installation

Prerequisites

Python 3.11+ (Python 3.13 compatible)
Ollama (for local LLM hosting) OR Google Gemini API key
CUDA-compatible GPU recommended for embeddings
16GB+ RAM recommended for optimal performance
Optional: uv for faster package installation

Setup

Clone the repository:

git clone https://github.com/richie-rk/encompass-rag-assistant.git
cd encompass-rag-assistant

Create a virtual environment:

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install dependencies — pick the path that matches your hardware. The only thing that differs is which torch build gets installed; the application code dispatches to CPU or GPU at runtime via JINA_DEVICE in .env.

Option A — uv with pyproject.toml (recommended)

CPU only (works everywhere; embedding pipeline runs on CPU):
```
uv pip install -e .
```
NVIDIA GPU (CUDA 12.1) — pulls CUDA-enabled torch + faiss-gpu:
```
uv pip install -e ".[gpu]"
```
With dev extras:
```
uv pip install -e ".[dev]"
uv pip install -e ".[dev,gpu]"   # GPU + dev tools
```
Everything:
```
uv pip install -e ".[all]"
```
Option B — plain pip (no uv)

CPU:
```
pip install -e .
# or:  pip install -r requirements.txt
```
NVIDIA GPU (CUDA 12.1) — pip can't auto-route torch to the PyTorch index from pyproject.toml, so the GPU build is a two-step:
```
pip install -e .
pip uninstall -y torch
pip install torch --index-url https://download.pytorch.org/whl/cu121
pip install faiss-gpu>=1.7.4
```
💡 Which method to choose?
- uv (Option A): significantly faster installs; auto-routes torch to the right index when you pick [gpu]. Recommended.
- pip (Option B): works without extra tooling; needs the manual torch swap above for GPU.
💡 Hardware setup After install, set JINA_DEVICE=cuda (or mps on Apple silicon) in .env to actually use the GPU at embedding time. Default is cpu.

LLM Setup (Choose one):

Option A: Ollama (Local)

# Install Ollama from https://ollama.ai/
ollama pull qwen2.5-coder:7b

Option B: Google Gemini (Cloud)

# Get API key from https://ai.google.dev/
# Set in .env file: GEMINI_API_KEY=your_key_here

Environment Configuration:

# Create .env file with your settings
cp .env.example .env
# Edit .env with your preferred configuration

Crawl the Developer Connect documentation:

# From the repo root — fetches all guide / API-reference / changelog pages
# for /developer-connect/ and writes scripts/data/developer_connect.jsonl
python -m scripts.crawler

The crawler enumerates the page frontier from the ReadMe sidebar (no BFS) and extracts each page's authored Markdown, OpenAPI spec, and metadata directly from the embedded ssr-props JSON. First run takes ~5–7 minutes (216 pages at 0.5 s delay); HTML is cached to scripts/data/cache/, so re-runs are instant.

Useful flags:

python -m scripts.crawler --dry-run              # enumerate frontier, no fetches
python -m scripts.crawler --limit 10             # debug: only first 10 pages
python -m scripts.crawler --no-cache             # ignore cache, refetch all
python -m scripts.crawler --include-hidden       # include hidden:true pages
python -m scripts.crawler --delay 1.0            # slower pace for politeness
python -m scripts.crawler --help                 # full option list

Outputs:

scripts/data/developer_connect.jsonl — one record per page (slug, title, URL, breadcrumb, body_md, OpenAPI oas, updated_at, …)
scripts/data/developer_connect_manifest.jsonl — per-URL fetch log with status, cache-hit, body hash, and any error reason

Create the vector store:
```
python scripts/create_vector_store.py
```
💡 Skip steps 6–7 if you just want to run the API. If ./vector_store/ is missing on first boot, the API automatically fetches the published index from Hugging Face into the HF cache (~/.cache/huggingface/hub/...) and uses it. Configurable via VECTOR_STORE_HF_REPO and VECTOR_STORE_HF_REVISION in .env (defaults point at Richie-rk/encompass-developer-connect-index on main). Build locally only when you want to re-ingest your own crawl.

Usage

Quick Start

Start the FastAPI backend:

cd rag_docs/src
python -m rag_docs.rag_app

In a separate terminal, start the Streamlit UI:

cd rag_docs/src
streamlit run rag_docs/web_ui.py

Access the application:
- Streamlit UI: http://localhost:8501
- FastAPI docs: http://localhost:8000/docs
- Health check: http://localhost:8000/api/health

API Endpoints

POST /api/query - Submit questions about Encompass API
GET /api/health - Check system status
GET /api/config - View current configuration

Configuration

The application is fully configurable through environment variables or a .env file:

Environment Variables

# LLM Configuration
OLLAMA_MODEL=qwen2.5-coder:7b
USE_GEMINI=false
GEMINI_API_KEY=your_gemini_api_key_here
TEMPERATURE=0.1

# Vector Store Configuration
VECTOR_STORE_PATH=vector_store

# Embeddings (Jina v3)
JINA_BACKEND=local              # `local` (sentence-transformers) or `api`
JINA_API_KEY=                   # required when JINA_BACKEND=api
JINA_MODEL=jinaai/jina-embeddings-v3
JINA_DEVICE=cpu                 # `cpu`, `cuda`, or `mps`

# API Configuration
HOST=0.0.0.0
PORT=8000

# Logging Configuration
LOG_LEVEL=INFO
LOG_FORMAT=%(asctime)s - %(levelname)s - [%(filename)s:%(lineno)d] - %(message)s
LOG_FILE=logs/app.log
LOG_MAX_SIZE=10485760
LOG_BACKUP_COUNT=5

# Retrieval Configuration
HYBRID_ALPHA=0.6  # Weight for semantic vs keyword search
RETRIEVAL_K=5     # Number of documents to retrieve

Example .env File

# Choose your LLM provider
USE_GEMINI=false
OLLAMA_MODEL=qwen2.5-coder:7b
# GEMINI_API_KEY=your_key_here

# Vector store settings
VECTOR_STORE_PATH=vector_store
TEMPERATURE=0.1

# API settings
HOST=0.0.0.0
PORT=8000

# Logging
LOG_LEVEL=INFO

Building the Package

# Build wheel and source distribution
python -m build

# Or with uv (faster)
uv build

Built with ❤️ for Encompass API users

Name		Name	Last commit message	Last commit date
Latest commit History 16 Commits
rag_docs		rag_docs
scripts		scripts
.env.example		.env.example
.gitignore		.gitignore
Flowchart.png		Flowchart.png
LICENSE		LICENSE
README.md		README.md
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Encompass RAG Assistant

Table of Contents

Overview

Features

Architecture

Data Processing & Storage

Hybrid Retrieval System

LLM Integration

API & Interface

Installation

Prerequisites

Setup

Usage

Quick Start

API Endpoints

Configuration

Environment Variables

Example .env File

Building the Package

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Encompass RAG Assistant

Table of Contents

Overview

Features

Architecture

Data Processing & Storage

Hybrid Retrieval System

LLM Integration

API & Interface

Installation

Prerequisites

Setup

Usage

Quick Start

API Endpoints

Configuration

Environment Variables

Example .env File

Building the Package

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages