omnibus-rag

Production-grade Indonesian Legal RAG framework
Hybrid retrieval · Grounding verification · Confidence scoring · Knowledge graph · Eval toolkit

Live Demo

The RAG components in this library power Omnibus Legal Compass, a production Indonesian legal AI assistant covering 44 indexed regulations and 11,969+ document segments.

The hybrid retrieval + grounding pipeline in action. Every answer cites the exact legal article it draws from.

Overview

The omnibus-rag library was extracted and open-sourced from Omnibus Legal Compass, a production-grade Indonesian legal AI assistant. This framework provides a collection of modular components specifically designed for building high-reliability Retrieval-Augmented Generation (RAG) systems in the Indonesian legal domain. It includes specialized support for handling various regulation types, including Undang-Undang (UU), Peraturan Pemerintah (PP), Peraturan Presiden (Perpres), and Peraturan Menteri (Permen).

The package is provider-agnostic: it integrates with Qdrant for vector storage and with NVIDIA NIM or Jina AI for embeddings, and it can also run on local sentence-transformers models for offline or on-premise deployments. Key features include a hybrid retrieval engine that fuses keyword and vector search, a grounding verifier to detect hallucinations, and a multi-factor confidence-scoring system to gate responses.

Building legal AI requires more than simple vector search. This library provides the specialized NLP tools needed for Indonesian legal text, such as abbreviation expansion and hierarchical relationship mapping. By using a knowledge graph to track how laws relate to each other, omnibus-rag enables retrieval strategies that respect the structure of the legal system. The goal is for a retrieved passage to carry its place in the legal hierarchy with it, rather than being returned only because its text matches the query.
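As a rough illustration of abbreviation expansion, the sketch below lowercases a query, drops a few stopwords, and appends the long form of each regulation-type abbreviation named in the Overview. The function name `expand_and_tokenize`, the `ABBREVIATIONS` mapping, and the tiny `STOPWORDS` set are assumptions made for this example; they are not the library's own tokenizer.

```python
import re

# Hypothetical mapping built from the four regulation types listed in the Overview.
ABBREVIATIONS = {
    "uu": "undang-undang",
    "pp": "peraturan pemerintah",
    "perpres": "peraturan presiden",
    "permen": "peraturan menteri",
}

# Tiny illustrative stopword set (an assumption, not the library's list).
STOPWORDS = {"yang", "dan", "di", "dalam", "tentang"}


def expand_and_tokenize(text: str) -> list[str]:
    """Lowercase, split into tokens, drop stopwords, and expand abbreviations."""
    tokens = re.findall(r"[a-z0-9-]+", text.lower())
    expanded: list[str] = []
    for token in tokens:
        if token in STOPWORDS:
            continue
        expanded.append(token)  # keep the original token for exact matches
        long_form = ABBREVIATIONS.get(token)
        if long_form:
            expanded.extend(long_form.split())
    return expanded


print(expand_and_tokenize("Pasal 33 dalam UU tentang Cipta Kerja"))
# ['pasal', '33', 'uu', 'undang-undang', 'cipta', 'kerja']
```

Keeping the abbreviation alongside its expansion lets a keyword index match documents that use either form.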

Features

Hybrid Retrieval: Combines BM25 sparse search with dense vector retrieval using Reciprocal Rank Fusion (RRF). Optional CrossEncoder reranking re-scores the fused candidates to improve precision in the top-K results.
Indonesian NLP: Features a domain-specific tokenizer that handles stopword removal, legal abbreviation expansion (e.g., UU, PP, Perpres, Permen), and synonym expansion for common Indonesian legal terminology.
Grounding Verifier: Implements an LLM-as-judge protocol to verify whether generated claims are actually supported by retrieved documents. This provides an extra layer of safety against model hallucinations in sensitive legal contexts.
Confidence Scoring: Calculates a multi-factor confidence score based on retrieval similarity, document authority, citation coverage, and query complexity. It helps identify when a query is outside the system's knowledge base.
Refusal Gate: Includes a configurable gate that automatically issues a refusal message when confidence falls below a threshold. This prevents the system from providing potentially incorrect or misleading legal information.
Knowledge Graph: Uses Pydantic and NetworkX to represent the Indonesian legal hierarchy. This allows the system to understand the relationships between regulations, chapters, and articles (e.g., UU to Bab to Pasal relationships).
Eval Toolkit: Provides a suite of offline evaluation tools, including implementations for MRR, Recall@K, and NDCG@K. These are compatible with custom GoldenDataset benchmarks for regression testing retrieval pipelines.
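To make the Hybrid Retrieval feature concrete, here is a minimal sketch of Reciprocal Rank Fusion over two ranked lists. It is a generic implementation of the technique, not the code inside `HybridRetriever`; the function name `reciprocal_rank_fusion` and the document ids are invented for the example, and `k=60` is the constant this README gives for the pipeline.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked id lists; each hit contributes 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical results from a sparse (BM25) and a dense retriever.
bm25_ranking = ["pasal_33", "pasal_12", "pasal_7"]
dense_ranking = ["pasal_12", "pasal_40", "pasal_33"]

fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
print([doc_id for doc_id, _ in fused])
# ['pasal_12', 'pasal_33', 'pasal_40', 'pasal_7']
```

Because RRF only looks at ranks, the BM25 and cosine-similarity scores never need to be put on a common scale.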

How It Works: 5-Stage Verification Pipeline

Every query passes through five sequential stages before an answer is returned:

| Stage | Name | What It Does |
|-------|------|--------------|
| 01 | Hybrid Search | Fuses BM25 keyword matching + dense vector retrieval via Reciprocal Rank Fusion (k=60) |
| 02 | CrossEncoder Reranking | Neural reranker re-scores top-K candidates for maximum precision |
| 03 | LLM Generation | Answer generated with forced citation to specific Pasal/UU articles |
| 04 | LLM-as-Judge | A second LLM call independently verifies every claim against the cited sources |
| 05 | Confidence Gate | Response is refused if confidence score falls below the configured threshold (default 0.30) |

The "refuse if unsure" design is intentional. In legal contexts, a confident wrong answer is worse than no answer. This rigorous pipeline helps maintain high accuracy and trustworthiness in a domain where errors have real-world consequences.
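The LLM-as-judge stage can be pictured as a loop that asks a judge whether each generated claim is supported by at least one retrieved source. The sketch below is only an outline under stated assumptions: `JudgeClient`, `KeywordOverlapJudge`, and `grounding_ratio` are hypothetical names, the word-overlap judge is an offline stand-in for a real LLM call, and the Indonesian strings are invented placeholder text rather than quotations from any regulation.

```python
from typing import Protocol


class JudgeClient(Protocol):
    """Hypothetical judge interface: does `source` support `claim`?"""

    def supports(self, claim: str, source: str) -> bool:
        ...


class KeywordOverlapJudge:
    """Offline stand-in; a real judge would prompt a second LLM instead."""

    def supports(self, claim: str, source: str) -> bool:
        claim_words = set(claim.lower().split())
        if not claim_words:
            return False
        overlap = claim_words & set(source.lower().split())
        return len(overlap) / len(claim_words) >= 0.5


def grounding_ratio(claims: list[str], sources: list[str], judge: JudgeClient) -> float:
    """Fraction of claims supported by at least one source."""
    if not claims:
        return 0.0
    supported = sum(
        1 for claim in claims
        if any(judge.supports(claim, source) for source in sources)
    )
    return supported / len(claims)


# Placeholder text for illustration only.
sources = ["contoh teks pasal tentang perizinan berusaha berbasis risiko"]
claims = ["perizinan berusaha berbasis risiko", "sanksi pidana lima tahun"]

ratio = grounding_ratio(claims, sources, KeywordOverlapJudge())
print(f"Supported claims: {ratio:.2f}")  # Supported claims: 0.50
```

A ratio like this is one natural input to the confidence gate in the final stage, since an answer with unsupported claims should score lower.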

Installation

You can install the core package or include optional dependencies for specific embedding providers or vector databases.

# Core (metrics, confidence, grounding, knowledge graph)
pip install omnibus-rag

# With local embeddings (sentence-transformers)
pip install "omnibus-rag[embeddings]"

# With Qdrant vector DB support
pip install "omnibus-rag[qdrant]"

# With NVIDIA NIM embedding integration
pip install "omnibus-rag[nvidia]"

# With Jina AI embedding integration
pip install "omnibus-rag[jina]"

# Install all optional dependencies
pip install "omnibus-rag[all]"

Quick Start

1. Confidence scoring + refusal gate

Use the ConfidenceScorer to evaluate the quality of your retrieval results. The RefusalGate can then decide whether the confidence is high enough to proceed with answer generation.

from omnibus_rag.confidence import ConfidenceScorer, RefusalGate
from omnibus_rag.retrieval import SearchResult

# Initialize the scorer and gate
scorer = ConfidenceScorer()
gate = RefusalGate(threshold=0.30)

# Example search results from a retriever
results = [
    SearchResult(
        id=1, 
        text="Pasal 33 UU No. 11 Tahun 2020...", 
        citation="UU 11/2020",
        citation_id="uu_11_2020", 
        score=0.87, 
        metadata={}
    ),
]

# Calculate confidence for a specific query
# This score takes into account similarity, document authority, and more
confidence = scorer.score(results, query="hak cipta software")

# Check if the system should refuse to answer
# If confidence is too low, we avoid giving potentially wrong advice
refusal = gate.check(confidence)
if refusal:
    # If confidence is below threshold, print the refusal message
    print(refusal)  # Prints the refusal message, which is written in Indonesian
else:
    # Otherwise, proceed with the confidence label and score
    print(f"Confidence: {confidence.label} ({confidence.numeric:.2f})")
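For intuition about what a multi-factor score and a refusal gate do together, the following sketch combines four factor values with a weighted sum and refuses below a threshold of 0.30. Everything in it is an assumption for illustration: the README names the factors but not how `ConfidenceScorer` weighs them, so the `WEIGHTS`, the label cut-offs, the `Confidence` dataclass, and the English refusal text are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical weights; they sum to 1.0 so the result stays in [0, 1].
WEIGHTS = {"similarity": 0.4, "authority": 0.2, "coverage": 0.3, "simplicity": 0.1}


@dataclass
class Confidence:
    numeric: float
    label: str


def combine(factors: dict[str, float]) -> Confidence:
    """Weighted sum of factor values, each expected to lie in [0, 1]."""
    numeric = sum(weight * factors[name] for name, weight in WEIGHTS.items())
    numeric = max(0.0, min(1.0, numeric))
    # Label cut-offs are illustrative only.
    label = "high" if numeric >= 0.7 else "medium" if numeric >= 0.3 else "low"
    return Confidence(numeric=numeric, label=label)


def check_gate(confidence: Confidence, threshold: float = 0.30) -> Optional[str]:
    """Return a refusal message below the threshold, otherwise None."""
    if confidence.numeric < threshold:
        return "Refused: confidence is below the configured threshold."
    return None


strong = combine({"similarity": 0.87, "authority": 1.0, "coverage": 0.5, "simplicity": 0.8})
weak = combine({"similarity": 0.2, "authority": 0.5, "coverage": 0.0, "simplicity": 0.5})

print(strong.label, round(strong.numeric, 3), check_gate(strong))  # high 0.778 None
print(weak.label, round(weak.numeric, 3), check_gate(weak) is not None)  # low 0.23 True
```

Returning `None` when the answer may proceed mirrors the `if refusal:` check used in the Quick Start snippet above.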

2. Knowledge graph

The knowledge graph allows you to model the structure of legal documents and their interconnections. This is useful for navigating the hierarchy of Indonesian regulations.

from omnibus_rag.knowledge_graph import LegalKnowledgeGraph
from omnibus_rag.knowledge_graph.schema import Law, Article, EdgeType

kg = LegalKnowledgeGraph()

# Create legal document entities using Pydantic schemas
# These represent the structured hierarchy of law
law = Law(id="uu_11_2020", number=11, year=2020, title="UU Cipta Kerja", about="Omnibus Law")
article = Article(
    id="uu_11_2020_pasal_33", 
    number="33",
    full_text="...", 
    parent_regulation_id="uu_11_2020"
)

# Add nodes to the graph
kg.add_node(law)
kg.add_node(article)

# Define relationships between nodes
# The graph models how laws contain specific articles
kg.add_edge("uu_11_2020", "uu_11_2020_pasal_33", EdgeType.CONTAINS)

# Retrieve related nodes for a given entity
neighbors = kg.get_neighbors("uu_11_2020")
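To show the kind of traversal a hierarchy graph makes possible, the sketch below walks CONTAINS edges from a law down to its articles using a breadth-first search. It deliberately uses a plain dictionary as a stand-in for the NetworkX-backed graph so it runs with the standard library alone; `add_contains`, `descendants`, and the Bab and Pasal ids are hypothetical and do not describe the real structure of any regulation.

```python
from collections import defaultdict, deque

# parent id -> list of child ids (a stand-in for CONTAINS edges)
contains = defaultdict(list)


def add_contains(parent: str, child: str) -> None:
    contains[parent].append(child)


def descendants(root: str) -> list[str]:
    """Breadth-first walk over CONTAINS edges, returning ids in visit order."""
    seen = {root}
    queue = deque([root])
    order: list[str] = []
    while queue:
        node = queue.popleft()
        for child in contains.get(node, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


# Hypothetical hierarchy: law (UU) -> chapter (Bab) -> articles (Pasal).
add_contains("uu_11_2020", "uu_11_2020_bab_a")
add_contains("uu_11_2020_bab_a", "uu_11_2020_pasal_33")
add_contains("uu_11_2020_bab_a", "uu_11_2020_pasal_34")

print(descendants("uu_11_2020"))
# ['uu_11_2020_bab_a', 'uu_11_2020_pasal_33', 'uu_11_2020_pasal_34']
```

A retriever could use a walk like this to pull in sibling articles from the same chapter when a single Pasal is matched.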

3. Eval metrics

Evaluate your retrieval pipeline using standard Information Retrieval metrics. The library provides pure function implementations for easy integration into testing scripts.

from omnibus_rag.eval import compute_mrr, compute_ndcg, compute_recall_at_k

# List of relevance labels (1 for relevant, 0 for not) in ranked order
# In this example, the 2nd and 5th results are relevant
ranked_relevance = [0, 1, 0, 0, 1]

# Compute various metrics for benchmarking your search engine
# compute_mrr expects a list of reciprocal ranks
mrr_val = compute_mrr([1/2])  
ndcg_val = compute_ndcg(ranked_relevance, k=5)
recall_val = compute_recall_at_k(ranked_relevance, k=3)

print(f"MRR:       {mrr_val:.3f}")    # Expected: 0.500
print(f"NDCG@5:    {ndcg_val:.3f}")   # Normalized Discounted Cumulative Gain
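# Only one of the two relevant results (rank 2) is in the top 3, so standard
# Recall@3 is 0.5; a hit-rate style definition would report 1.0 instead.
print(f"Recall@3:  {recall_val:.3f}")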
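For reference, the three metrics can be written in a few lines of plain Python. These are the textbook definitions with binary relevance, offered as a sketch; the helper names `reciprocal_rank`, `recall_at_k`, and `ndcg_at_k` are hypothetical, and the library's own functions may use different signatures or normalization.

```python
import math


def reciprocal_rank(relevance):
    """1 / rank of the first relevant result, or 0.0 if there is none."""
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            return 1.0 / rank
    return 0.0


def recall_at_k(relevance, k):
    """Share of all relevant results that appear in the top k."""
    total_relevant = sum(relevance)
    if total_relevant == 0:
        return 0.0
    return sum(relevance[:k]) / total_relevant


def ndcg_at_k(relevance, k):
    """DCG of the top k divided by the DCG of an ideal ordering."""
    def dcg(rels):
        return sum(rel / math.log2(i + 2) for i, rel in enumerate(rels))

    ideal = dcg(sorted(relevance, reverse=True)[:k])
    return dcg(relevance[:k]) / ideal if ideal else 0.0


ranked = [0, 1, 0, 0, 1]  # relevant results at ranks 2 and 5

print(f"RR:       {reciprocal_rank(ranked):.3f}")  # 0.500
print(f"Recall@3: {recall_at_k(ranked, 3):.3f}")   # 0.500
print(f"NDCG@5:   {ndcg_at_k(ranked, 5):.3f}")     # 0.624
```

MRR is then the mean of `reciprocal_rank` over all queries in a benchmark.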

Module Reference

The package is organized into several sub-modules, each focusing on a specific part of the RAG pipeline.

| Module | Description | Key Components |
|--------|-------------|----------------|
| `omnibus_rag.retrieval` | Core search functionality. | `HybridRetriever`, `SearchResult`, `BaseRetriever` protocol, `tokenize_indonesian()` |
| `omnibus_rag.confidence` | Scoring and refusal logic. | `ConfidenceScorer`, `ConfidenceScore`, `ValidationResult`, `RefusalGate` |
| `omnibus_rag.grounding` | Hallucination detection. | `GroundingVerifier`, `BaseLLMClient` protocol |
| `omnibus_rag.knowledge_graph` | Graph-based legal modeling. | `LegalKnowledgeGraph`, `Law`, `Article`, `Chapter`, `EdgeType` |
| `omnibus_rag.eval` | Performance measurement. | `compute_mrr`, `compute_ndcg`, `compute_recall_at_k`, `RetrievalEvaluator`, `GoldenDataset` |

Architecture

The following tree shows the internal structure of the omnibus_rag package:

omnibus_rag/
├── retrieval/           # Hybrid search logic
│   ├── base.py          # Protocols and shared data models
│   ├── embedders.py     # Provider-specific embedding clients
│   ├── hybrid.py        # Main retrieval engine
│   └── tokenizer.py     # Indonesian legal text processing
├── confidence/          # Reliability scoring
│   ├── models.py        # Confidence score data structures
│   ├── scorer.py        # Heuristic scoring implementation
│   └── refusal.py       # Decision gate for refusals
├── grounding/           # Verification tools
│   ├── base.py          # LLM interface protocols
│   └── verifier.py      # LLM-as-judge implementation
├── knowledge_graph/     # Legal hierarchy modeling
│   ├── graph.py         # NetworkX graph management
│   ├── schema.py        # Pydantic models for legal entities
│   └── ingest.py        # Utilities for data ingestion
└── eval/                # Evaluation framework
    ├── metrics.py       # Core IR metric implementations
    ├── evaluator.py     # Pipeline evaluation runner
    └── dataset.py       # Dataset management protocols

Screenshots

Knowledge Graph: Legal Hierarchy Explorer

Browse the structured hierarchy of Indonesian regulations. From Undang-Undang down to individual Pasal nodes, the system models cross-regulation edge relationships via NetworkX. This visualization helps developers explore the interconnections within the legal landscape.

Compliance Checker

Input a business scenario and receive a structured compliance analysis grounded in specific regulatory articles. This feature leverages the grounding verifier to ensure that every compliance claim is backed by a valid legal source.

Dashboard

Real-time monitoring of retrieval pipeline health, query volume, and confidence score distributions. The dashboard provides insights into how the RAG system is performing in production, helping you identify trends and potential issues.

Built With

| Layer | Technology |
|-------|------------|
| Vector Database | Qdrant |
| Dense Embeddings | NVIDIA NIM · Jina AI · sentence-transformers |
| Sparse Search | rank-bm25 |
| Reranking | BAAI/bge-reranker-v2-m3 |
| Knowledge Graph | NetworkX + Pydantic |
| LLM Backend | Kimi K2 via NVIDIA NIM |
| Frontend (Live Demo) | Next.js + Tailwind CSS |
| API (Live Demo) | FastAPI |

Environment Variables

The library can be configured using environment variables for various backend services. These allow you to switch between local models and cloud providers without changing code.

| Variable | Description | Default |
|----------|-------------|---------|
| `QDRANT_URL` | The URL of your Qdrant server instance. | `http://localhost:6333` |
| `QDRANT_API_KEY` | API key for authenticated Qdrant instances. | `None` |
| `NVIDIA_API_KEY` | Your NVIDIA NIM API key for cloud-based embeddings. | `None` |
| `JINA_API_KEY` | Your Jina AI API key for cloud-based embeddings. | `None` |
| `USE_NVIDIA_EMBEDDINGS` | Set to `true` to prioritize NVIDIA NIM over local models. | `false` |
| `USE_JINA_EMBEDDINGS` | Set to `true` to prioritize Jina AI over local models. | `false` |
Roadmap

Async Support: Native asyncio support for the HybridRetriever to improve concurrent performance.
LangChain Integration: Official adapter in omnibus_rag.integrations.langchain for seamless chain usage.
LlamaIndex Integration: Official adapter in omnibus_rag.integrations.llamaindex for index-based retrieval.
Multi-language Support: Expanding the tokenizer and NLP components to support languages beyond Indonesian.
PyPI Publication: Official release on the Python Package Index for easier installation.
Streaming Eval: Support for evaluating streaming LLM responses in real-time.
RAGAS Adapter: Compatibility layer for the RAGAS evaluation framework to leverage its metric suite.

Development

To set up the library for local development or contribution, we recommend using a virtual environment.

# Clone the repository from GitHub
git clone https://github.com/vaskoyudha/omnibus-rag
cd omnibus-rag

# Install with development dependencies (ruff, pytest, etc.)
pip install -e ".[dev]"

# Run the full test suite with verbose output
pytest tests/ -v

Contributing

Contributions are welcome. Please open an issue first to discuss what you'd like to change.

Fork the repository
Create a feature branch: git checkout -b feat/your-feature
Ensure all tests pass: pytest tests/ -v
Follow the existing code style (ruff, type annotations, docstrings)
Open a pull request

Please do not include proprietary datasets, prompt templates, or model routing configurations in pull requests.

License

Released under the MIT License. See LICENSE for the full license text.

Acknowledgements

The omnibus-rag project is built on top of several open-source libraries that make production RAG possible. We are grateful to the maintainers of these projects:

Qdrant: High-performance vector search engine for production deployments.
rank-bm25: Reliable implementation of the BM25 ranking algorithm.
sentence-transformers: Framework for state-of-the-art sentence and text embeddings.
NetworkX: Powerful tool for the creation, manipulation, and study of complex networks.
Pydantic: Most widely used data validation and settings management library for Python.
