Skip to content

richie-rk/EncompassRest-RAG-Assistant

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

16 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Encompass RAG Assistant

License: MIT Dataset: 🤗 Hugging Face

A Retrieval-Augmented Generation (RAG) system for querying Encompass API documentation with natural language, featuring hybrid retrieval and multiple LLM support.

Unofficial. Not affiliated with ICE Mortgage Technology. A community-built retrieval index and RAG pipeline over the publicly available Encompass Developer Connect documentation, intended as a developer reference and for educational/research use. For canonical, up-to-date documentation, always defer to the official source: https://developer.icemortgagetechnology.com/.

Table of Contents

Overview

The Encompass RAG Assistant is a tool that allows users to query Encompass API documentation using natural language. It leverages a hybrid retrieval system combining semantic and keyword search over the publicly available Encompass Developer Connect documentation and the Postman collection to provide contextual answers to API-related questions.

This application combines cutting-edge RAG techniques with a user-friendly interface to make Encompass API documentation more accessible and easier to navigate.

Features

  • Hybrid Retrieval System: Combines FAISS semantic search with BM25 keyword search, fused via Reciprocal Rank Fusion (RRF)
  • Multi-LLM Support: Choose between Ollama (qwen2.5-coder:7b) and Google Gemini (gemini-1.5-flash)
  • Jina v3 Embeddings: jinaai/jina-embeddings-v3 (1024-d) with task-specific prompts for retrieval
  • Postman Endpoint Gate: Token-overlap + score-ratio gate surfaces 0–3 relevant API endpoints separately from prose chunks
  • Two Data Sources: Crawled Encompass Developer Connect documentation + Postman collection
  • Environment Configuration: Full .env support for easy deployment and configuration
  • Enhanced FastAPI Backend: Robust API with health checks, configuration endpoints, and CORS support
  • Streamlit UI: Clean, intuitive interface with enhanced source display
  • Source Attribution: Detailed source tracking with metadata and full content access

Architecture

The system has a multi-component architecture:

Data Processing & Storage

  • Jina v3 Embeddings: jinaai/jina-embeddings-v3 (1024-d); task=retrieval.passage at ingest, retrieval.query at query time
  • FAISS-IP Vector Store: Inner-product similarity search over Jina vectors
  • BM25 Index: rank_bm25.BM25Okapi over the same chunks for lexical matching
  • Metadata Store: Per-chunk title, breadcrumb, kind, and source URL

Hybrid Retrieval System

  • HybridRetriever: FAISS semantic + BM25 lexical over the prose chunk corpus
  • RRF Fusion: Reciprocal Rank Fusion (k=60) merges semantic and lexical rankings into one top-k list
  • Postman Endpoint Gate: Separate BM25 over Postman entries → token-overlap filter (≥0.3) → score-ratio split returns 0–3 endpoints

LLM Integration

  • Ollama Support: Local LLM hosting with qwen2.5-coder:7b
  • Google Gemini: Cloud-based gemini-1.5-flash model option
  • Direct Prompt Construction: ## Documentation and ## Relevant API endpoint(s) are built as separate prompt sections (no RetrievalQA chain)

API & Interface

  • FastAPI Backend: Production-ready API with comprehensive endpoints
  • Environment Configuration: Full .env support for deployment flexibility
  • Streamlit UI: Enhanced interface with categorized source display
  • CORS Support: Ready for web application integration
flowchart TD
    subgraph Ingest["Build pipeline (offline)"]
        direction TB
        Crawler["scripts/crawler/<br/>(skips auth-gated)"] --> JSONL["developer_connect.jsonl"]
        JSONL --> FDC["filter → dedupe → chunk<br/>page-wise ≤ 2K, else 1500/200"]
        FDC --> Embed["Jina v3 embed<br/>task=retrieval.passage"]
        Postman["Encompass_Developer_Connect_<br/>postman_collection.json"]
    end

    Embed --> FAISS[("FAISS-IP<br/>jsonl_faiss/")]
    Embed --> BM25C[("BM25 over chunks<br/>jsonl_bm25.pkl")]
    Postman --> PEntries[("Postman BM25 + entries<br/>postman_*.pkl")]

    HF[("HF dataset (fallback)<br/>Richie-rk/encompass-developer-connect-index")] -. "snapshot_download<br/>if local missing" .-> FAISS

    subgraph Runtime["Query pipeline (runtime)"]
        direction TB
        Q["User query"] --> QE["Embed via Jina v3<br/>task=retrieval.query"]
        QE --> Sem["FAISS top-N"]
        Q --> Lex["BM25 top-N over chunks"]
        Sem --> RRF["RRF fuse → top-K chunks"]
        Lex --> RRF
        Q --> PG["Postman BM25 +<br/>token-overlap gate"]
        RRF --> Prompt["Prompt builder<br/>## Documentation + ## Endpoints"]
        PG --> Prompt
        Prompt --> LLM["LLM<br/>Ollama qwen2.5-coder · Gemini 1.5 Flash"]
        LLM --> Ans["Answer + sources + endpoints"]
    end

    FAISS --> Sem
    BM25C --> Lex
    PEntries --> PG
Loading

Installation

Prerequisites

  • Python 3.11+ (Python 3.13 compatible)
  • Ollama (for local LLM hosting) OR Google Gemini API key
  • CUDA-compatible GPU recommended for embeddings
  • 16GB+ RAM recommended for optimal performance
  • Optional: uv for faster package installation

Setup

  1. Clone the repository:

    git clone https://github.com/richie-rk/encompass-rag-assistant.git
    cd encompass-rag-assistant
  2. Create a virtual environment:

    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install dependencies — pick the path that matches your hardware. The only thing that differs is which torch build gets installed; the application code dispatches to CPU or GPU at runtime via JINA_DEVICE in .env.

    Option A — uv with pyproject.toml (recommended)

    CPU only (works everywhere; embedding pipeline runs on CPU):

    uv pip install -e .

    NVIDIA GPU (CUDA 12.1) — pulls CUDA-enabled torch + faiss-gpu:

    uv pip install -e ".[gpu]"

    With dev extras:

    uv pip install -e ".[dev]"
    uv pip install -e ".[dev,gpu]"   # GPU + dev tools

    Everything:

    uv pip install -e ".[all]"

    Option B — plain pip (no uv)

    CPU:

    pip install -e .
    # or:  pip install -r requirements.txt

    NVIDIA GPU (CUDA 12.1) — pip can't auto-route torch to the PyTorch index from pyproject.toml, so the GPU build is a two-step:

    pip install -e .
    pip uninstall -y torch
    pip install torch --index-url https://download.pytorch.org/whl/cu121
    pip install faiss-gpu>=1.7.4

    💡 Which method to choose?

    • uv (Option A): significantly faster installs; auto-routes torch to the right index when you pick [gpu]. Recommended.
    • pip (Option B): works without extra tooling; needs the manual torch swap above for GPU.

    💡 Hardware setup After install, set JINA_DEVICE=cuda (or mps on Apple silicon) in .env to actually use the GPU at embedding time. Default is cpu.

  4. LLM Setup (Choose one):

    Option A: Ollama (Local)

    # Install Ollama from https://ollama.ai/
    ollama pull qwen2.5-coder:7b

    Option B: Google Gemini (Cloud)

    # Get API key from https://ai.google.dev/
    # Set in .env file: GEMINI_API_KEY=your_key_here
  5. Environment Configuration:

    # Create .env file with your settings
    cp .env.example .env
    # Edit .env with your preferred configuration
  6. Crawl the Developer Connect documentation:

    # From the repo root — fetches all guide / API-reference / changelog pages
    # for /developer-connect/ and writes scripts/data/developer_connect.jsonl
    python -m scripts.crawler

    The crawler enumerates the page frontier from the ReadMe sidebar (no BFS) and extracts each page's authored Markdown, OpenAPI spec, and metadata directly from the embedded ssr-props JSON. First run takes ~5–7 minutes (216 pages at 0.5 s delay); HTML is cached to scripts/data/cache/, so re-runs are instant.

    Useful flags:

    python -m scripts.crawler --dry-run              # enumerate frontier, no fetches
    python -m scripts.crawler --limit 10             # debug: only first 10 pages
    python -m scripts.crawler --no-cache             # ignore cache, refetch all
    python -m scripts.crawler --include-hidden       # include hidden:true pages
    python -m scripts.crawler --delay 1.0            # slower pace for politeness
    python -m scripts.crawler --help                 # full option list

    Outputs:

    • scripts/data/developer_connect.jsonl — one record per page (slug, title, URL, breadcrumb, body_md, OpenAPI oas, updated_at, …)
    • scripts/data/developer_connect_manifest.jsonl — per-URL fetch log with status, cache-hit, body hash, and any error reason
  7. Create the vector store:

    python scripts/create_vector_store.py

    💡 Skip steps 6–7 if you just want to run the API. If ./vector_store/ is missing on first boot, the API automatically fetches the published index from Hugging Face into the HF cache (~/.cache/huggingface/hub/...) and uses it. Configurable via VECTOR_STORE_HF_REPO and VECTOR_STORE_HF_REVISION in .env (defaults point at Richie-rk/encompass-developer-connect-index on main). Build locally only when you want to re-ingest your own crawl.

Usage

Quick Start

  1. Start the FastAPI backend:

    cd rag_docs/src
    python -m rag_docs.rag_app
  2. In a separate terminal, start the Streamlit UI:

    cd rag_docs/src
    streamlit run rag_docs/web_ui.py
  3. Access the application:

API Endpoints

  • POST /api/query - Submit questions about Encompass API
  • GET /api/health - Check system status
  • GET /api/config - View current configuration

Configuration

The application is fully configurable through environment variables or a .env file:

Environment Variables

# LLM Configuration
OLLAMA_MODEL=qwen2.5-coder:7b
USE_GEMINI=false
GEMINI_API_KEY=your_gemini_api_key_here
TEMPERATURE=0.1

# Vector Store Configuration
VECTOR_STORE_PATH=vector_store

# Embeddings (Jina v3)
JINA_BACKEND=local              # `local` (sentence-transformers) or `api`
JINA_API_KEY=                   # required when JINA_BACKEND=api
JINA_MODEL=jinaai/jina-embeddings-v3
JINA_DEVICE=cpu                 # `cpu`, `cuda`, or `mps`

# API Configuration
HOST=0.0.0.0
PORT=8000

# Logging Configuration
LOG_LEVEL=INFO
LOG_FORMAT=%(asctime)s - %(levelname)s - [%(filename)s:%(lineno)d] - %(message)s
LOG_FILE=logs/app.log
LOG_MAX_SIZE=10485760
LOG_BACKUP_COUNT=5

# Retrieval Configuration
HYBRID_ALPHA=0.6  # Weight for semantic vs keyword search
RETRIEVAL_K=5     # Number of documents to retrieve

Example .env File

# Choose your LLM provider
USE_GEMINI=false
OLLAMA_MODEL=qwen2.5-coder:7b
# GEMINI_API_KEY=your_key_here

# Vector store settings
VECTOR_STORE_PATH=vector_store
TEMPERATURE=0.1

# API settings
HOST=0.0.0.0
PORT=8000

# Logging
LOG_LEVEL=INFO

Building the Package

# Build wheel and source distribution
python -m build

# Or with uv (faster)
uv build

Built with ❤️ for Encompass API users

About

A Retrieval-Augmented Generation (RAG) system querying ICE mortgage Encompass API documentation with natural language, featuring hybrid retrieval and multiple LLM support.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages