vaskoyudha/omnibus-rag


Omnibus Legal Compass, AI-powered Indonesian legal assistant

omnibus-rag

Production-grade Indonesian Legal RAG framework
Hybrid retrieval · Grounding verification · Confidence scoring · Knowledge graph · Eval toolkit


Live Demo

The RAG components in this library power Omnibus Legal Compass, a production Indonesian legal AI assistant covering 44 indexed regulations and 11,969+ document segments.


[Screenshot: Omnibus Legal Compass, Chat Interface]

The hybrid retrieval + grounding pipeline in action. Every answer cites the exact legal article it draws from.




Overview

The omnibus-rag library was extracted and open-sourced from Omnibus Legal Compass, a production-grade Indonesian legal AI assistant. This framework provides a collection of modular components specifically designed for building high-reliability Retrieval-Augmented Generation (RAG) systems in the Indonesian legal domain. It includes specialized support for handling various regulation types, including Undang-Undang (UU), Peraturan Pemerintah (PP), Peraturan Presiden (Perpres), and Peraturan Menteri (Permen).


The package is provider-agnostic. It integrates with Qdrant for vector storage and with NVIDIA NIM or Jina AI for embeddings, and it also runs with local sentence-transformers models for offline or on-premise deployments. Key features include a hybrid retrieval engine that fuses keyword and vector search, a grounding verifier that flags unsupported claims, and a multi-factor confidence score that gates responses.


Building legal AI requires more than simple vector search. This library provides NLP tools tailored to Indonesian legal text, such as abbreviation expansion and hierarchical relationship mapping. A knowledge graph tracks how laws relate to each other, so retrieval strategies can respect the structure of the legal system and place each retrieved passage in the context of the regulation, chapter, and article it belongs to.
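As a rough illustration of abbreviation expansion and stopword removal, here is a minimal sketch. The function name `tokenize_sketch`, the abbreviation table, and the stopword list are assumptions made for this example; they are not the library's `tokenize_indonesian()` or its real word lists, which this README does not show.

```python
import re

# Hypothetical tables for illustration only. The four expansions match the
# regulation types named in the Overview; the stopword list is a tiny sample.
ABBREVIATIONS = {
    "uu": "undang-undang",
    "pp": "peraturan pemerintah",
    "perpres": "peraturan presiden",
    "permen": "peraturan menteri",
}
STOPWORDS = {"yang", "dan", "di", "dari", "tentang", "untuk"}


def tokenize_sketch(text: str) -> list[str]:
    """Lowercase, split into word tokens, drop stopwords, expand abbreviations."""
    # Hyphenated words such as "undang-undang" stay together as one token.
    tokens = re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())
    expanded: list[str] = []
    for token in tokens:
        if token in STOPWORDS:
            continue
        # Keep the abbreviation and append its expansion so both forms can match.
        expanded.append(token)
        if token in ABBREVIATIONS:
            expanded.extend(ABBREVIATIONS[token].split())
    return expanded


print(tokenize_sketch("Perpres tentang perizinan dan PP 5/2021"))
```

Keeping the original abbreviation next to its expansion is one possible design choice: a keyword index can then match documents that use either form.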




Features

  • Hybrid Retrieval: Combines BM25 sparse search with dense vector retrieval using Reciprocal Rank Fusion (RRF). Optional CrossEncoder reranking improves precision in the top-K results, so the most relevant legal context tends to rank first.

  • Indonesian NLP: Features a domain-specific tokenizer that handles stopword removal, legal abbreviation expansion (e.g., UU, PP, Perpres, Permen), and synonym expansion for common Indonesian legal terminology.

  • Grounding Verifier: Implements an LLM-as-judge protocol to verify whether generated claims are actually supported by retrieved documents. This provides an extra layer of safety against model hallucinations in sensitive legal contexts.

  • Confidence Scoring: Calculates a multi-factor confidence score based on retrieval similarity, document authority, citation coverage, and query complexity. It helps identify when a query is outside the system's knowledge base.

  • Refusal Gate: Includes a configurable gate that automatically issues a refusal message when confidence falls below a threshold. This prevents the system from providing potentially incorrect or misleading legal information.

  • Knowledge Graph: Uses Pydantic and NetworkX to represent the Indonesian legal hierarchy. This allows the system to understand the relationships between regulations, chapters, and articles (e.g., UU to Bab to Pasal relationships).

  • Eval Toolkit: Provides a suite of offline evaluation tools, including implementations for MRR, Recall@K, and NDCG@K. These are compatible with custom GoldenDataset benchmarks for regression testing retrieval pipelines.
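The Confidence Scoring and Refusal Gate items above can be pictured with a small sketch. It assumes the four factors are each normalized to the range 0 to 1 and combined by a weighted average; the weights, the `Factors` dataclass, and the function names are hypothetical, since this README names the factors but does not say how `ConfidenceScorer` combines them.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical weights: the README lists the factors, not their weighting.
WEIGHTS = {"similarity": 0.4, "authority": 0.3, "coverage": 0.2, "simplicity": 0.1}


@dataclass
class Factors:
    similarity: float  # retrieval similarity, 0..1
    authority: float   # document authority, 0..1
    coverage: float    # citation coverage, 0..1
    simplicity: float  # 1 minus query complexity, 0..1


def combine_factors(factors: Factors) -> float:
    """Weighted average of the four factors, clamped to [0, 1]."""
    total = sum(weight * getattr(factors, name) for name, weight in WEIGHTS.items())
    return max(0.0, min(1.0, total))


def refusal_gate(score: float, threshold: float = 0.30) -> Optional[str]:
    """Return a refusal message when the score is below the threshold, else None."""
    if score < threshold:
        return "Refused: confidence is below the configured threshold."
    return None


strong = combine_factors(Factors(similarity=0.87, authority=1.0, coverage=0.5, simplicity=0.8))
weak = combine_factors(Factors(similarity=0.2, authority=0.3, coverage=0.0, simplicity=0.5))
print(f"{strong:.2f}", refusal_gate(strong))
print(f"{weak:.2f}", refusal_gate(weak))
```

The default threshold of 0.30 mirrors the default stated in the pipeline description; everything else in the sketch is illustrative.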




How It Works: 5-Stage Verification Pipeline



Every query passes through five sequential stages before an answer is returned:


| Stage | Name | What It Does |
|-------|------|--------------|
| 01 | Hybrid Search | Fuses BM25 keyword matching + dense vector retrieval via Reciprocal Rank Fusion (k=60) |
| 02 | CrossEncoder Reranking | Neural reranker re-scores top-K candidates for maximum precision |
| 03 | LLM Generation | Answer generated with forced citation to specific Pasal/UU articles |
| 04 | LLM-as-Judge | A second LLM call independently verifies every claim against the cited sources |
| 05 | Confidence Gate | Response is refused if confidence score falls below the configured threshold (default 0.30) |

The "refuse if unsure" design is intentional. In legal contexts, a confident wrong answer is worse than no answer, so the pipeline declines to respond when the confidence score is low rather than risk an unsupported claim in a domain where errors have real-world consequences.
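Stage 01 relies on Reciprocal Rank Fusion, which scores each document as the sum of 1 / (k + rank) over every ranking it appears in. The sketch below shows that formula with k=60; the function name `rrf_fuse` and the document IDs are made up for this example and are not taken from the library's `HybridRetriever`.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked ID lists with Reciprocal Rank Fusion."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        # Ranks start at 1, so the top hit contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID to keep the order stable.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical result lists from a keyword search and a vector search.
bm25_ranking = ["pasal_33", "pasal_12", "pasal_7"]
dense_ranking = ["pasal_12", "pasal_40", "pasal_33"]

fused = rrf_fuse([bm25_ranking, dense_ranking])
for doc_id, score in fused:
    print(f"{doc_id}: {score:.5f}")
```

Because RRF uses only ranks, it needs no calibration between BM25 scores and cosine similarities, which live on different scales.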




Installation

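You can install the core package on its own or add optional dependencies (extras) for specific embedding providers or vector databases. Note that the Roadmap still lists PyPI publication as planned; if the commands below cannot find the package, install from a clone as shown under Development.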


# Core (metrics, confidence, grounding, knowledge graph)
pip install omnibus-rag

# With local embeddings (sentence-transformers)
pip install "omnibus-rag[embeddings]"

# With Qdrant vector DB support
pip install "omnibus-rag[qdrant]"

# With NVIDIA NIM embedding integration
pip install "omnibus-rag[nvidia]"

# With Jina AI embedding integration
pip install "omnibus-rag[jina]"

# Install all optional dependencies
pip install "omnibus-rag[all]"



Quick Start

1. Confidence scoring + refusal gate

Use the ConfidenceScorer to evaluate the quality of your retrieval results. The RefusalGate can then decide whether the confidence is high enough to proceed with answer generation.


from omnibus_rag.confidence import ConfidenceScorer, RefusalGate
from omnibus_rag.retrieval import SearchResult

# Initialize the scorer and gate
scorer = ConfidenceScorer()
gate = RefusalGate(threshold=0.30)

# Example search results from a retriever
results = [
    SearchResult(
        id=1, 
        text="Pasal 33 UU No. 11 Tahun 2020...", 
        citation="UU 11/2020",
        citation_id="uu_11_2020", 
        score=0.87, 
        metadata={}
    ),
]

# Calculate confidence for a specific query
# This score takes into account similarity, document authority, and more
confidence = scorer.score(results, query="hak cipta software")

# Check if the system should refuse to answer
# If confidence is too low, we avoid giving potentially wrong advice
refusal = gate.check(confidence)
if refusal:
    # If confidence is below threshold, print the refusal message
    print(refusal)  # Returns a professional Indonesian refusal message
else:
    # Otherwise, proceed with the confidence label and score
    print(f"Confidence: {confidence.label} ({confidence.numeric:.2f})")

2. Knowledge graph

The knowledge graph allows you to model the structure of legal documents and their interconnections. This is useful for navigating the hierarchy of Indonesian regulations.


from omnibus_rag.knowledge_graph import LegalKnowledgeGraph
from omnibus_rag.knowledge_graph.schema import Law, Article, EdgeType

kg = LegalKnowledgeGraph()

# Create legal document entities using Pydantic schemas
# These represent the structured hierarchy of law
law = Law(id="uu_11_2020", number=11, year=2020, title="UU Cipta Kerja", about="Omnibus Law")
article = Article(
    id="uu_11_2020_pasal_33", 
    number="33",
    full_text="...", 
    parent_regulation_id="uu_11_2020"
)

# Add nodes to the graph
kg.add_node(law)
kg.add_node(article)

# Define relationships between nodes
# The graph models how laws contain specific articles
kg.add_edge("uu_11_2020", "uu_11_2020_pasal_33", EdgeType.CONTAINS)

# Retrieve related nodes for a given entity
neighbors = kg.get_neighbors("uu_11_2020")
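
To make the UU to Bab to Pasal hierarchy concrete without the library installed, here is a standard-library stand-in. `TinyLegalGraph`, its `descendants` method, and the chapter ID are hypothetical; the real `LegalKnowledgeGraph` is built on NetworkX and its traversal API may differ.

```python
from collections import defaultdict, deque


class TinyLegalGraph:
    """Directed graph stored as adjacency lists (illustrative stand-in only)."""

    def __init__(self) -> None:
        self._out: dict[str, list[tuple[str, str]]] = defaultdict(list)

    def add_edge(self, source: str, target: str, edge_type: str) -> None:
        self._out[source].append((target, edge_type))

    def get_neighbors(self, node_id: str) -> list[str]:
        # .get avoids creating empty entries for unknown nodes.
        return [target for target, _ in self._out.get(node_id, [])]

    def descendants(self, node_id: str) -> list[str]:
        """Breadth-first walk over everything reachable from node_id."""
        seen, order, queue = {node_id}, [], deque([node_id])
        while queue:
            for target in self.get_neighbors(queue.popleft()):
                if target not in seen:
                    seen.add(target)
                    order.append(target)
                    queue.append(target)
        return order


graph = TinyLegalGraph()
# Illustrative IDs; the chapter that holds Pasal 33 is made up for this example.
graph.add_edge("uu_11_2020", "uu_11_2020_bab_1", "CONTAINS")
graph.add_edge("uu_11_2020_bab_1", "uu_11_2020_pasal_33", "CONTAINS")

print(graph.get_neighbors("uu_11_2020"))
print(graph.descendants("uu_11_2020"))
```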

3. Eval metrics

Evaluate your retrieval pipeline using standard Information Retrieval metrics. The library provides pure function implementations for easy integration into testing scripts.


from omnibus_rag.eval import compute_mrr, compute_ndcg, compute_recall_at_k

# List of relevance labels (1 for relevant, 0 for not) in ranked order
# In this example, the 2nd and 5th results are relevant
ranked_relevance = [0, 1, 0, 0, 1]

# Compute metrics for benchmarking your retriever
# compute_mrr expects a list of reciprocal ranks, one per query;
# here the first relevant result is at rank 2, so the reciprocal rank is 1/2
mrr_val = compute_mrr([1 / 2])
ndcg_val = compute_ndcg(ranked_relevance, k=5)
recall_val = compute_recall_at_k(ranked_relevance, k=3)

print(f"MRR:       {mrr_val:.3f}")     # Expected: 0.500
print(f"NDCG@5:    {ndcg_val:.3f}")    # Normalized Discounted Cumulative Gain
# Only 1 of the 2 relevant results (rank 2) is in the top 3, so the standard
# Recall@3 is 0.500; a hit-rate style metric would report 1.0 instead
print(f"Recall@3:  {recall_val:.3f}")
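
For reference, the textbook definitions of these metrics fit in a few lines of plain Python. The functions below are an independent sketch with hypothetical names, not the library's `compute_*` implementations, and `recall_at_k` assumes every relevant item appears somewhere in the ranked list.

```python
import math


def reciprocal_rank(relevance: list[int]) -> float:
    """1 / rank of the first relevant result, or 0.0 if there is none."""
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(per_query: list[list[int]]) -> float:
    """Average reciprocal rank over queries."""
    if not per_query:
        return 0.0
    return sum(reciprocal_rank(r) for r in per_query) / len(per_query)


def recall_at_k(relevance: list[int], k: int) -> float:
    """Relevant results in the top k divided by all relevant results in the list."""
    total = sum(1 for rel in relevance if rel)
    if total == 0:
        return 0.0
    return sum(1 for rel in relevance[:k] if rel) / total


def dcg_at_k(relevance: list[int], k: int) -> float:
    """Discounted cumulative gain with a log2(rank + 1) discount."""
    return sum(rel / math.log2(rank + 1) for rank, rel in enumerate(relevance[:k], start=1))


def ndcg_at_k(relevance: list[int], k: int) -> float:
    """DCG normalized by the DCG of the ideal (sorted) ordering."""
    ideal = dcg_at_k(sorted(relevance, reverse=True), k)
    return dcg_at_k(relevance, k) / ideal if ideal > 0 else 0.0


ranked = [0, 1, 0, 0, 1]  # relevant results at ranks 2 and 5
print(f"RR:       {reciprocal_rank(ranked):.3f}")
print(f"Recall@3: {recall_at_k(ranked, 3):.3f}")
print(f"NDCG@5:   {ndcg_at_k(ranked, 5):.3f}")
```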



Module Reference

The package is organized into several sub-modules, each focusing on a specific part of the RAG pipeline.


| Module | Description | Key Components |
|--------|-------------|----------------|
| omnibus_rag.retrieval | Core search functionality. | HybridRetriever, SearchResult, BaseRetriever protocol, tokenize_indonesian() |
| omnibus_rag.confidence | Scoring and refusal logic. | ConfidenceScorer, ConfidenceScore, ValidationResult, RefusalGate |
| omnibus_rag.grounding | Hallucination detection. | GroundingVerifier, BaseLLMClient protocol |
| omnibus_rag.knowledge_graph | Graph-based legal modeling. | LegalKnowledgeGraph, Law, Article, Chapter, EdgeType |
| omnibus_rag.eval | Performance measurement. | compute_mrr, compute_ndcg, compute_recall_at_k, RetrievalEvaluator, GoldenDataset |
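
The grounding module pairs a verifier with an LLM client protocol. The sketch below shows the general LLM-as-judge shape under stated assumptions: the `LLMClient` protocol, its `complete` method, the prompt wording, and `verify_claims` are all hypothetical and do not reproduce the signatures of `GroundingVerifier` or `BaseLLMClient`. A deterministic stub stands in for a real model so the example runs offline.

```python
from typing import Protocol


class LLMClient(Protocol):
    """Hypothetical minimal interface: one prompt in, one text reply out."""

    def complete(self, prompt: str) -> str: ...


PROMPT = (
    "You are a strict legal fact checker.\n"
    "SOURCES:\n{sources}\n"
    "CLAIM:\n{claim}\n"
    "Answer with exactly one word: SUPPORTED or UNSUPPORTED."
)


def verify_claims(claims: list[str], sources: list[str], llm: LLMClient) -> dict[str, bool]:
    """Ask the judge about each claim separately and record its verdict."""
    joined = "\n".join(sources)
    verdicts: dict[str, bool] = {}
    for claim in claims:
        reply = llm.complete(PROMPT.format(sources=joined, claim=claim))
        verdicts[claim] = reply.strip().upper() == "SUPPORTED"
    return verdicts


class SubstringJudge:
    """Offline stub: 'supports' a claim only if it appears verbatim in the sources."""

    def complete(self, prompt: str) -> str:
        sources = prompt.split("SOURCES:\n", 1)[1].split("\nCLAIM:\n", 1)[0]
        claim = prompt.split("\nCLAIM:\n", 1)[1].split("\nAnswer", 1)[0]
        return "SUPPORTED" if claim.lower() in sources.lower() else "UNSUPPORTED"


# Made-up source text and claims, used only to exercise the control flow.
sources = ["Pasal X: Employers must pay wages on time."]
claims = ["employers must pay wages on time", "employers may withhold wages"]
verdicts = verify_claims(claims, sources, SubstringJudge())
print(verdicts)
```

Checking claims one at a time keeps each verdict attributable to a single statement, at the cost of one model call per claim.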



Architecture

The following tree shows the internal structure of the omnibus_rag package:


omnibus_rag/
├── retrieval/           # Hybrid search logic
│   ├── base.py          # Protocols and shared data models
│   ├── embedders.py     # Provider-specific embedding clients
│   ├── hybrid.py        # Main retrieval engine
│   └── tokenizer.py     # Indonesian legal text processing
├── confidence/          # Reliability scoring
│   ├── models.py        # Confidence score data structures
│   ├── scorer.py        # Heuristic scoring implementation
│   └── refusal.py       # Decision gate for refusals
├── grounding/           # Verification tools
│   ├── base.py          # LLM interface protocols
│   └── verifier.py      # LLM-as-judge implementation
├── knowledge_graph/     # Legal hierarchy modeling
│   ├── graph.py         # NetworkX graph management
│   ├── schema.py        # Pydantic models for legal entities
│   └── ingest.py        # Utilities for data ingestion
└── eval/                # Evaluation framework
    ├── metrics.py       # Core IR metric implementations
    ├── evaluator.py     # Pipeline evaluation runner
    └── dataset.py       # Dataset management protocols



Screenshots


Knowledge Graph: Legal Hierarchy Explorer


[Screenshot: Knowledge Graph page]

Browse the structured hierarchy of Indonesian regulations. From Undang-Undang down to individual Pasal nodes, the system models cross-regulation edge relationships via NetworkX. This visualization helps developers explore the interconnections within the legal landscape.




Compliance Checker


[Screenshot: Compliance Checker page]

Input a business scenario and receive a structured compliance analysis grounded in specific regulatory articles. This feature leverages the grounding verifier to ensure that every compliance claim is backed by a valid legal source.




Dashboard


[Screenshot: Dashboard page]

Real-time monitoring of retrieval pipeline health, query volume, and confidence score distributions. The dashboard provides insights into how the RAG system is performing in production, helping you identify trends and potential issues.




Built With


| Layer | Technology |
|-------|------------|
| Vector Database | Qdrant |
| Dense Embeddings | NVIDIA NIM · Jina AI · sentence-transformers |
| Sparse Search | rank-bm25 |
| Reranking | BAAI/bge-reranker-v2-m3 |
| Knowledge Graph | NetworkX + Pydantic |
| LLM Backend | Kimi K2 via NVIDIA NIM |
| Frontend (Live Demo) | Next.js + Tailwind CSS |
| API (Live Demo) | FastAPI |



Environment Variables

The library can be configured using environment variables for various backend services. These allow you to switch between local models and cloud providers without changing code.


| Variable | Description | Default |
|----------|-------------|---------|
| QDRANT_URL | The URL of your Qdrant server instance. | http://localhost:6333 |
| QDRANT_API_KEY | API key for authenticated Qdrant instances. | None |
| NVIDIA_API_KEY | Your NVIDIA NIM API key for cloud-based embeddings. | None |
| JINA_API_KEY | Your Jina AI API key for cloud-based embeddings. | None |
| USE_NVIDIA_EMBEDDINGS | Set to true to prioritize NVIDIA NIM over local models. | false |
| USE_JINA_EMBEDDINGS | Set to true to prioritize Jina AI over local models. | false |
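
One way to read these variables in your own application code is sketched below. The `Settings` dataclass, `load_settings`, and `pick_embedder` are hypothetical helpers, not part of the library; in particular, the rule that NVIDIA wins when both flags are set, and the fallback to local models when a key is missing, are assumptions this README does not confirm.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


def _flag(value: Optional[str]) -> bool:
    """Treat only the string 'true' (any case) as enabled."""
    return (value or "").strip().lower() == "true"


@dataclass(frozen=True)
class Settings:
    qdrant_url: str
    qdrant_api_key: Optional[str]
    nvidia_api_key: Optional[str]
    jina_api_key: Optional[str]
    use_nvidia: bool
    use_jina: bool


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the documented variables, applying the documented defaults."""
    return Settings(
        qdrant_url=env.get("QDRANT_URL", "http://localhost:6333"),
        qdrant_api_key=env.get("QDRANT_API_KEY"),
        nvidia_api_key=env.get("NVIDIA_API_KEY"),
        jina_api_key=env.get("JINA_API_KEY"),
        use_nvidia=_flag(env.get("USE_NVIDIA_EMBEDDINGS")),
        use_jina=_flag(env.get("USE_JINA_EMBEDDINGS")),
    )


def pick_embedder(settings: Settings) -> str:
    """Assumed precedence: NVIDIA, then Jina, then local sentence-transformers."""
    if settings.use_nvidia and settings.nvidia_api_key:
        return "nvidia"
    if settings.use_jina and settings.jina_api_key:
        return "jina"
    return "local"


print(pick_embedder(load_settings({})))
print(pick_embedder(load_settings({"USE_JINA_EMBEDDINGS": "true", "JINA_API_KEY": "example-key"})))
```

Passing the environment as a mapping keeps the loader easy to test, because a plain dict can replace `os.environ`.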



Roadmap


  • Async Support: Native asyncio support for the HybridRetriever to improve concurrent performance.
  • LangChain Integration: Official adapter in omnibus_rag.integrations.langchain for seamless chain usage.
  • LlamaIndex Integration: Official adapter in omnibus_rag.integrations.llamaindex for index-based retrieval.
  • Multi-language Support: Expanding the tokenizer and NLP components to support languages beyond Indonesian.
  • PyPI Publication: Official release on the Python Package Index for easier installation.
  • Streaming Eval: Support for evaluating streaming LLM responses in real-time.
  • RAGAS Adapter: Compatibility layer for the RAGAS evaluation framework to leverage its metric suite.



Development

To set up the library for local development or contribution, we recommend using a virtual environment.


# Clone the repository from GitHub
git clone https://github.com/vaskoyudha/omnibus-rag
cd omnibus-rag

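# Create and activate a virtual environment
# (POSIX shells; on Windows run .venv\Scripts\activate instead)
python -m venv .venv
source .venv/bin/activate

# Install with development dependencies (ruff, pytest, etc.)
pip install -e ".[dev]"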

# Run the full test suite with verbose output
pytest tests/ -v



Contributing

Contributions are welcome. Please open an issue first to discuss what you'd like to change.

  1. Fork the repository
  2. Create a feature branch: git checkout -b feat/your-feature
  3. Ensure all tests pass: pytest tests/ -v
  4. Follow the existing code style (ruff, type annotations, docstrings)
  5. Open a pull request

Please do not include proprietary datasets, prompt templates, or model routing configurations in pull requests.




License

MIT. See LICENSE for the full license text.




Acknowledgements

The omnibus-rag project is built on top of several open-source libraries that make production RAG possible. We are grateful to the maintainers of these projects:

  • Qdrant: High-performance vector search engine for production deployments.
  • rank-bm25: Reliable implementation of the BM25 ranking algorithm.
  • sentence-transformers: Framework for state-of-the-art sentence and text embeddings.
  • NetworkX: Powerful tool for the creation, manipulation, and study of complex networks.
  • Pydantic: Most widely used data validation and settings management library for Python.



Copyright (c) 2026 Vasco Yudha Nodyatama Sera
