Production-grade Indonesian Legal RAG framework
Hybrid retrieval · Grounding verification · Confidence scoring · Knowledge graph · Eval toolkit
Live Demo · Module Reference · Examples · Report Bug · Request Feature
The RAG components in this library power Omnibus Legal Compass, a production Indonesian legal AI assistant covering 44 indexed regulations and 11,969+ document segments.
The hybrid retrieval + grounding pipeline in action. Every answer cites the exact legal article it draws from.
The omnibus-rag library was extracted and open-sourced from Omnibus Legal Compass, a production-grade Indonesian legal AI assistant. This framework provides a collection of modular components specifically designed for building high-reliability Retrieval-Augmented Generation (RAG) systems in the Indonesian legal domain. It includes specialized support for handling various regulation types, including Undang-Undang (UU), Peraturan Pemerintah (PP), Peraturan Presiden (Perpres), and Peraturan Menteri (Permen).
The package is provider-agnostic: it integrates with Qdrant, NVIDIA NIM, and Jina AI, and also runs with local sentence-transformers models for offline or on-premise deployments. Key features include a hybrid retrieval engine that fuses keyword and vector search, a grounding verifier that detects hallucinations, and a multi-factor confidence-scoring system that gates responses.
Building legal AI requires more than simple vector search. This library provides specialized NLP tools for Indonesian legal text, such as abbreviation expansion and hierarchical relationship mapping. By using a knowledge graph to track how laws relate to each other, omnibus-rag enables retrieval strategies that respect the structure of the legal system, so that retrieved text is placed in the context of the regulation it belongs to rather than matched on similarity alone.
- Hybrid Retrieval: Combines BM25 sparse search with dense vector retrieval using Reciprocal Rank Fusion (RRF). It supports optional CrossEncoder reranking for improved precision in top-K results, ensuring the most relevant legal context is always prioritized.
- Indonesian NLP: Features a domain-specific tokenizer that handles stopword removal, legal abbreviation expansion (e.g., UU, PP, Perpres, Permen), and synonym expansion for common Indonesian legal terminology.
- Grounding Verifier: Implements an LLM-as-judge protocol to verify whether generated claims are actually supported by retrieved documents. This provides an extra layer of safety against model hallucinations in sensitive legal contexts.
- Confidence Scoring: Calculates a multi-factor confidence score based on retrieval similarity, document authority, citation coverage, and query complexity. It helps identify when a query is outside the system's knowledge base.
- Refusal Gate: Includes a configurable gate that automatically issues a refusal message when confidence falls below a threshold. This prevents the system from providing potentially incorrect or misleading legal information.
- Knowledge Graph: Uses Pydantic and NetworkX to represent the Indonesian legal hierarchy. This allows the system to understand the relationships between regulations, chapters, and articles (e.g., UU to Bab to Pasal relationships).
- Eval Toolkit: Provides a suite of offline evaluation tools, including implementations for MRR, Recall@K, and NDCG@K. These are compatible with custom GoldenDataset benchmarks for regression testing retrieval pipelines.
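To make the Indonesian NLP item above more concrete, here is a minimal, dependency-free sketch of abbreviation expansion followed by stopword removal. It is not the library's tokenizer: the function name `tokenize_legal_query`, the lookup tables, and the choice to keep the original abbreviation alongside its expansion are all assumptions made for illustration, and synonym expansion is left out.

```python
import re

# Hypothetical, heavily abbreviated lookup tables (assumptions for illustration).
ABBREVIATIONS = {
    "uu": ["undang", "undang"],
    "pp": ["peraturan", "pemerintah"],
    "perpres": ["peraturan", "presiden"],
    "permen": ["peraturan", "menteri"],
}
STOPWORDS = {"yang", "dan", "di", "ke", "dari", "untuk", "pada", "dengan", "dalam", "tentang"}


def tokenize_legal_query(text: str) -> list[str]:
    """Lowercase, split on non-alphanumerics, expand abbreviations, drop stopwords."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    expanded: list[str] = []
    for token in tokens:
        expanded.append(token)  # keep the abbreviation itself so exact matches still work
        expanded.extend(ABBREVIATIONS.get(token, []))
    return [token for token in expanded if token not in STOPWORDS]


print(tokenize_legal_query("PP tentang pengupahan"))
# ['pp', 'peraturan', 'pemerintah', 'pengupahan']
```

Keeping both the abbreviation and its expansion lets a BM25 index match documents that write either form, at the cost of slightly longer queries.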
Every query passes through five sequential stages before an answer is returned:
| Stage | Name | What It Does |
|---|---|---|
| 01 | Hybrid Search | Fuses BM25 keyword matching + dense vector retrieval via Reciprocal Rank Fusion (k=60) |
| 02 | CrossEncoder Reranking | Neural reranker re-scores top-K candidates for maximum precision |
| 03 | LLM Generation | Answer generated with forced citation to specific Pasal/UU articles |
| 04 | LLM-as-Judge | A second LLM call independently verifies every claim against the cited sources |
| 05 | Confidence Gate | Response is refused if confidence score falls below the configured threshold (default 0.30) |
The "refuse if unsure" design is intentional. In legal contexts, a confident wrong answer is worse than no answer. This rigorous pipeline helps maintain high accuracy and trustworthiness in a domain where errors have real-world consequences.
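Stage 01 relies on Reciprocal Rank Fusion, which scores each document as the sum of 1 / (k + rank) over every ranking it appears in. The sketch below shows that formula with k=60, the value named in the table. The function name and the document IDs are hypothetical, and the library's HybridRetriever may differ in details such as tie-breaking or score normalization.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document IDs into one list sorted by RRF score."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):  # ranks are 1-based
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical rankings from a sparse (BM25) and a dense retriever.
bm25_ranking = ["pasal_33", "pasal_12", "pasal_7"]
dense_ranking = ["pasal_12", "pasal_81", "pasal_33"]

fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
print([doc_id for doc_id, _ in fused])
# ['pasal_12', 'pasal_33', 'pasal_81', 'pasal_7']
```

Because RRF uses only ranks, it needs no calibration between BM25 scores and cosine similarities, which live on different scales.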
You can install the core package or include optional dependencies for specific embedding providers or vector databases.
# Core (metrics, confidence, grounding, knowledge graph)
pip install omnibus-rag
# With local embeddings (sentence-transformers)
pip install "omnibus-rag[embeddings]"
# With Qdrant vector DB support
pip install "omnibus-rag[qdrant]"
# With NVIDIA NIM embedding integration
pip install "omnibus-rag[nvidia]"
# With Jina AI embedding integration
pip install "omnibus-rag[jina]"
# Install all optional dependencies
pip install "omnibus-rag[all]"
Use the ConfidenceScorer to evaluate the quality of your retrieval results. The RefusalGate can then decide whether the confidence is high enough to proceed with answer generation.
from omnibus_rag.confidence import ConfidenceScorer, RefusalGate
from omnibus_rag.retrieval import SearchResult
# Initialize the scorer and gate
scorer = ConfidenceScorer()
gate = RefusalGate(threshold=0.30)
# Example search results from a retriever
results = [
    SearchResult(
        id=1,
        text="Pasal 33 UU No. 11 Tahun 2020...",
        citation="UU 11/2020",
        citation_id="uu_11_2020",
        score=0.87,
        metadata={}
    ),
]
# Calculate confidence for a specific query
# This score takes into account similarity, document authority, and more
confidence = scorer.score(results, query="hak cipta software")
# Check if the system should refuse to answer
# If confidence is too low, we avoid giving potentially wrong advice
refusal = gate.check(confidence)
if refusal:
    # If confidence is below threshold, print the refusal message
    print(refusal) # Returns a professional Indonesian refusal message
else:
    # Otherwise, proceed with the confidence label and score
    print(f"Confidence: {confidence.label} ({confidence.numeric:.2f})")
The knowledge graph allows you to model the structure of legal documents and their interconnections. This is useful for navigating the hierarchy of Indonesian regulations.
from omnibus_rag.knowledge_graph import LegalKnowledgeGraph
from omnibus_rag.knowledge_graph.schema import Law, Article, EdgeType
kg = LegalKnowledgeGraph()
# Create legal document entities using Pydantic schemas
# These represent the structured hierarchy of law
law = Law(id="uu_11_2020", number=11, year=2020, title="UU Cipta Kerja", about="Omnibus Law")
article = Article(
    id="uu_11_2020_pasal_33",
    number="33",
    full_text="...",
    parent_regulation_id="uu_11_2020"
)
# Add nodes to the graph
kg.add_node(law)
kg.add_node(article)
# Define relationships between nodes
# The graph models how laws contain specific articles
kg.add_edge("uu_11_2020", "uu_11_2020_pasal_33", EdgeType.CONTAINS)
# Retrieve related nodes for a given entity
neighbors = kg.get_neighbors("uu_11_2020")
Evaluate your retrieval pipeline using standard Information Retrieval metrics. The library provides pure function implementations for easy integration into testing scripts.
from omnibus_rag.eval import compute_mrr, compute_ndcg, compute_recall_at_k
# List of relevance labels (1 for relevant, 0 for not) in ranked order
# In this example, the 2nd and 5th results are relevant
ranked_relevance = [0, 1, 0, 0, 1]
# Compute various metrics for benchmarking your search engine
# compute_mrr expects a list of reciprocal ranks
mrr_val = compute_mrr([1/2])
ndcg_val = compute_ndcg(ranked_relevance, k=5)
recall_val = compute_recall_at_k(ranked_relevance, k=3)
print(f"MRR: {mrr_val:.3f}") # Expected: 0.500
print(f"NDCG@5: {ndcg_val:.3f}") # Normalized Discounted Cumulative Gain
print(f"Recall@3: {recall_val}") # Rank 2 is in the top 3, rank 5 is not (0.5 under the standard recall definition)
The package is organized into several sub-modules, each focusing on a specific part of the RAG pipeline.
| Module | Description | Key Components |
|---|---|---|
| omnibus_rag.retrieval | Core search functionality. | HybridRetriever, SearchResult, BaseRetriever protocol, tokenize_indonesian() |
| omnibus_rag.confidence | Scoring and refusal logic. | ConfidenceScorer, ConfidenceScore, ValidationResult, RefusalGate |
| omnibus_rag.grounding | Hallucination detection. | GroundingVerifier, BaseLLMClient protocol |
| omnibus_rag.knowledge_graph | Graph-based legal modeling. | LegalKnowledgeGraph, Law, Article, Chapter, EdgeType |
| omnibus_rag.eval | Performance measurement. | compute_mrr, compute_ndcg, compute_recall_at_k, RetrievalEvaluator, GoldenDataset |
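For readers who want to check what the metrics in omnibus_rag.eval measure, the following is a dependency-free sketch of the textbook definitions of reciprocal rank, Recall@K, and NDCG@K over a binary relevance list. The function names and signatures here are hypothetical and are not the library's compute_mrr, compute_recall_at_k, or compute_ndcg. The NDCG variant also assumes every relevant document appears somewhere in the ranked list, since the ideal ordering is built from that list.

```python
import math


def reciprocal_rank(relevance: list[int]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none is relevant."""
    for rank, rel in enumerate(relevance, start=1):
        if rel > 0:
            return 1.0 / rank
    return 0.0


def recall_at_k(relevance: list[int], k: int, total_relevant: int) -> float:
    """Fraction of all relevant documents that appear in the top k."""
    if total_relevant == 0:
        return 0.0
    return sum(1 for rel in relevance[:k] if rel > 0) / total_relevant


def ndcg_at_k(relevance: list[int], k: int) -> float:
    """DCG of the top k divided by the DCG of the ideal (sorted) ordering."""
    def dcg(rels: list[int]) -> float:
        return sum(rel / math.log2(i + 1) for i, rel in enumerate(rels, start=1))

    ideal = dcg(sorted(relevance, reverse=True)[:k])
    return dcg(relevance[:k]) / ideal if ideal > 0 else 0.0


ranked_relevance = [0, 1, 0, 0, 1]  # relevant results at ranks 2 and 5
print(reciprocal_rank(ranked_relevance))                       # 0.5
print(recall_at_k(ranked_relevance, k=3, total_relevant=2))    # 0.5
print(f"{ndcg_at_k(ranked_relevance, k=5):.3f}")               # 0.624
```

MRR over a query set is then the mean of the per-query reciprocal ranks.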
The following tree shows the internal structure of the omnibus_rag package:
omnibus_rag/
├── retrieval/ # Hybrid search logic
│ ├── base.py # Protocols and shared data models
│ ├── embedders.py # Provider-specific embedding clients
│ ├── hybrid.py # Main retrieval engine
│ └── tokenizer.py # Indonesian legal text processing
├── confidence/ # Reliability scoring
│ ├── models.py # Confidence score data structures
│ ├── scorer.py # Heuristic scoring implementation
│ └── refusal.py # Decision gate for refusals
├── grounding/ # Verification tools
│ ├── base.py # LLM interface protocols
│ └── verifier.py # LLM-as-judge implementation
├── knowledge_graph/ # Legal hierarchy modeling
│ ├── graph.py # NetworkX graph management
│ ├── schema.py # Pydantic models for legal entities
│ └── ingest.py # Utilities for data ingestion
└── eval/ # Evaluation framework
  ├── metrics.py # Core IR metric implementations
  ├── evaluator.py # Pipeline evaluation runner
  └── dataset.py # Dataset management protocols
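The grounding/ package pairs an LLM interface protocol with an LLM-as-judge verifier. A rough sketch of that pattern is shown below under stated assumptions: `JudgeClient`, `verify_claims`, the prompt wording, and the SUPPORTED/UNSUPPORTED reply format are all hypothetical and do not reflect the actual BaseLLMClient or GroundingVerifier interfaces. The `WordMatchJudge` class is only an offline stand-in so the example runs without a model.

```python
from typing import Protocol


class JudgeClient(Protocol):
    """Hypothetical minimal LLM interface (an assumption, not the library's protocol)."""

    def complete(self, prompt: str) -> str: ...


JUDGE_PROMPT = (
    "Sources:\n{sources}\n\nClaim: {claim}\n"
    "Answer SUPPORTED or UNSUPPORTED based only on the sources."
)


def verify_claims(claims: list[str], sources: list[str], judge: JudgeClient) -> dict[str, bool]:
    """Ask the judge about each claim separately and record whether it is supported."""
    joined = "\n".join(f"[{i}] {source}" for i, source in enumerate(sources, start=1))
    verdicts: dict[str, bool] = {}
    for claim in claims:
        reply = judge.complete(JUDGE_PROMPT.format(sources=joined, claim=claim))
        verdicts[claim] = reply.strip().upper().startswith("SUPPORTED")
    return verdicts


class WordMatchJudge:
    """Offline stand-in: 'supports' a claim only if all of its words occur in the sources."""

    def complete(self, prompt: str) -> str:
        sources, _, rest = prompt.partition("\n\nClaim: ")
        claim = rest.split("\n", 1)[0]
        supported = all(word in sources.lower() for word in claim.lower().split())
        return "SUPPORTED" if supported else "UNSUPPORTED"


verdicts = verify_claims(
    claims=["Pasal X mengatur perizinan", "Pasal X mengatur pajak"],
    sources=["Pasal X mengatur perizinan berusaha."],
    judge=WordMatchJudge(),
)
print(verdicts)
# {'Pasal X mengatur perizinan': True, 'Pasal X mengatur pajak': False}
```

Typing the judge as a Protocol means any client object with a matching `complete` method can be swapped in without inheriting from a base class.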
Browse the structured hierarchy of Indonesian regulations. From Undang-Undang down to individual Pasal nodes, the system models cross-regulation edge relationships via NetworkX. This visualization helps developers explore the interconnections within the legal landscape.
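The hierarchy described above can be pictured as a directed graph with typed edges. The sketch below uses a plain adjacency dictionary in place of NetworkX so that it runs with the standard library alone. The node IDs, the IMPLEMENTS edge name, and the helper functions are hypothetical placeholders, not real regulations or library APIs.

```python
from collections import defaultdict, deque

# (source, target, edge_type) triples with placeholder IDs (assumptions).
edges = [
    ("uu_a", "uu_a_bab_1", "CONTAINS"),
    ("uu_a_bab_1", "uu_a_pasal_1", "CONTAINS"),
    ("uu_a_bab_1", "uu_a_pasal_2", "CONTAINS"),
    ("pp_b", "uu_a", "IMPLEMENTS"),  # a cross-regulation edge
]


def build_adjacency(triples):
    """Map each source node to its outgoing (target, edge_type) pairs."""
    adjacency = defaultdict(list)
    for source, target, edge_type in triples:
        adjacency[source].append((target, edge_type))
    return adjacency


def contained_nodes(adjacency, root):
    """Breadth-first walk that follows only CONTAINS edges below the root."""
    found, seen, queue = [], {root}, deque([root])
    while queue:
        node = queue.popleft()
        for target, edge_type in adjacency.get(node, []):
            if edge_type == "CONTAINS" and target not in seen:
                seen.add(target)
                found.append(target)
                queue.append(target)
    return found


adjacency = build_adjacency(edges)
print(contained_nodes(adjacency, "uu_a"))
# ['uu_a_bab_1', 'uu_a_pasal_1', 'uu_a_pasal_2']
```

Filtering by edge type keeps structural containment (UU to Bab to Pasal) separate from cross-regulation links, so each kind of relationship can be traversed on its own.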
Input a business scenario and receive a structured compliance analysis grounded in specific regulatory articles. This feature leverages the grounding verifier to ensure that every compliance claim is backed by a valid legal source.
Real-time monitoring of retrieval pipeline health, query volume, and confidence score distributions. The dashboard provides insights into how the RAG system is performing in production, helping you identify trends and potential issues.
| Layer | Technology |
|---|---|
| Vector Database | Qdrant |
| Dense Embeddings | NVIDIA NIM · Jina AI · sentence-transformers |
| Sparse Search | rank-bm25 |
| Reranking | BAAI/bge-reranker-v2-m3 |
| Knowledge Graph | NetworkX + Pydantic |
| LLM Backend | Kimi K2 via NVIDIA NIM |
| Frontend (Live Demo) | Next.js + Tailwind CSS |
| API (Live Demo) | FastAPI |
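The reranking layer takes (query, passage) pairs and re-scores them jointly. The sketch below isolates that step behind an injectable scoring function so it can run without downloading a model. The `rerank` helper and the toy `token_overlap_scores` scorer are hypothetical. With sentence-transformers installed, `CrossEncoder("BAAI/bge-reranker-v2-m3").predict` could be passed as `score_fn`, since `predict` accepts a list of text pairs and returns one score per pair.

```python
from typing import Callable, Sequence

ScoreFn = Callable[[list[tuple[str, str]]], Sequence[float]]


def rerank(query: str, passages: list[str], score_fn: ScoreFn, top_k: int = 3) -> list[tuple[str, float]]:
    """Score every (query, passage) pair and return the top_k passages by score."""
    pairs = [(query, passage) for passage in passages]
    scores = score_fn(pairs)
    ranked = sorted(zip(passages, scores), key=lambda item: item[1], reverse=True)
    return [(passage, float(score)) for passage, score in ranked[:top_k]]


def token_overlap_scores(pairs: list[tuple[str, str]]) -> list[float]:
    """Toy stand-in scorer: fraction of query tokens that also occur in the passage."""
    scores = []
    for query, passage in pairs:
        query_tokens = set(query.lower().split())
        passage_tokens = set(passage.lower().split())
        scores.append(len(query_tokens & passage_tokens) / len(query_tokens) if query_tokens else 0.0)
    return scores


# Placeholder passages, not real legal text.
passages = [
    "Pasal B mengatur pajak penghasilan",
    "Pasal A mengatur izin usaha mikro dan kecil",
    "Pasal C mengatur izin lingkungan",
]
top = rerank("izin usaha mikro", passages, token_overlap_scores, top_k=2)
print([passage for passage, _ in top])
# ['Pasal A mengatur izin usaha mikro dan kecil', 'Pasal C mengatur izin lingkungan']
```

Reranking only the top-K fused candidates keeps the expensive pairwise model off the full corpus.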
The library can be configured using environment variables for various backend services. These allow you to switch between local models and cloud providers without changing code.
| Variable | Description | Default |
|---|---|---|
| QDRANT_URL | The URL of your Qdrant server instance. | http://localhost:6333 |
| QDRANT_API_KEY | API key for authenticated Qdrant instances. | None |
| NVIDIA_API_KEY | Your NVIDIA NIM API key for cloud-based embeddings. | None |
| JINA_API_KEY | Your Jina AI API key for cloud-based embeddings. | None |
| USE_NVIDIA_EMBEDDINGS | Set to true to prioritize NVIDIA NIM over local models. | false |
| USE_JINA_EMBEDDINGS | Set to true to prioritize Jina AI over local models. | false |
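One way to picture how these variables could be resolved at startup is sketched below. This is not the library's configuration code: `resolve_settings` is a hypothetical helper, the case-insensitive match on "true" is an assumption, and so is the precedence of NVIDIA NIM over Jina AI when both flags are set, which the table does not specify.

```python
import os
from collections.abc import Mapping
from typing import Optional


def _flag(value: Optional[str]) -> bool:
    """Treat only the string 'true' (case-insensitive, an assumption) as enabled."""
    return (value or "").strip().lower() == "true"


def resolve_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Read backend settings from the environment, applying the documented defaults."""
    if _flag(env.get("USE_NVIDIA_EMBEDDINGS")):
        provider = "nvidia"  # assumed to win if both flags are set
    elif _flag(env.get("USE_JINA_EMBEDDINGS")):
        provider = "jina"
    else:
        provider = "local"  # local sentence-transformers models
    return {
        "qdrant_url": env.get("QDRANT_URL", "http://localhost:6333"),
        "qdrant_api_key": env.get("QDRANT_API_KEY"),
        "embedding_provider": provider,
    }


print(resolve_settings({"USE_JINA_EMBEDDINGS": "true"}))
# {'qdrant_url': 'http://localhost:6333', 'qdrant_api_key': None, 'embedding_provider': 'jina'}
```

Passing the environment as a mapping rather than reading os.environ directly makes the resolution logic easy to unit test.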
- Async Support: Native asyncio support for the HybridRetriever to improve concurrent performance.
- LangChain Integration: Official adapter in omnibus_rag.integrations.langchain for seamless chain usage.
- LlamaIndex Integration: Official adapter in omnibus_rag.integrations.llamaindex for index-based retrieval.
- Multi-language Support: Expanding the tokenizer and NLP components to support languages beyond Indonesian.
- PyPI Publication: Official release on the Python Package Index for easier installation.
- Streaming Eval: Support for evaluating streaming LLM responses in real-time.
- RAGAS Adapter: Compatibility layer for the RAGAS evaluation framework to leverage its metric suite.
To set up the library for local development or contribution, we recommend using a virtual environment.
# Clone the repository from GitHub
git clone https://github.com/vaskoyudha/omnibus-rag
cd omnibus-rag
# Install with development dependencies (ruff, pytest, etc.)
pip install -e ".[dev]"
# Run the full test suite with verbose output
pytest tests/ -v
Contributions are welcome. Please open an issue first to discuss what you'd like to change.
- Fork the repository
- Create a feature branch: git checkout -b feat/your-feature
- Ensure all tests pass: pytest tests/ -v
- Follow the existing code style (ruff, type annotations, docstrings)
- Open a pull request
Please do not include proprietary datasets, prompt templates, or model routing configurations in pull requests.
MIT, see LICENSE for the full license text.
The omnibus-rag project is built on top of several open-source libraries that make production RAG possible. We are grateful to the maintainers of these projects:
- Qdrant: High-performance vector search engine for production deployments.
- rank-bm25: Reliable implementation of the BM25 ranking algorithm.
- sentence-transformers: Framework for state-of-the-art sentence and text embeddings.
- NetworkX: Powerful tool for the creation, manipulation, and study of complex networks.
- Pydantic: Most widely used data validation and settings management library for Python.
Copyright (c) 2026 Vasco Yudha Nodyatama Sera



