Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
28 changes: 27 additions & 1 deletion backend/app/agents/legal_research/retrievers.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,33 @@
from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import SentenceTransformerEmbeddings

def get_hybrid_retriever(
documents: List[Document],
embedding_model: str = "all-MiniLM-L6-v2",
k: int = 5,
) -> EnsembleRetriever:
"""
Builds an in-memory hybrid retriever using dense + sparse retrieval.

This is primarily used for testing and local experimentation where a
list of LangChain `Document` objects is already available.
"""
embeddings = SentenceTransformerEmbeddings(model_name=embedding_model)

# Dense retriever via vector store
vectorstore = Chroma.from_documents(documents, embeddings)
vector_retriever = vectorstore.as_retriever(search_kwargs={"k": k})

# Sparse retriever via BM25
bm25_retriever = BM25Retriever.from_documents(documents)
bm25_retriever.k = k

# Fuse results from both retrievers
return EnsembleRetriever(
retrievers=[bm25_retriever, vector_retriever],
weights=[0.5, 0.5]
)


def get_persistent_retriever(persist_directory: str = "chroma_db", collection_name: str = "legal_judgments", k: int = 5):
"""
Expand All @@ -26,4 +53,3 @@ def get_persistent_retriever(persist_directory: str = "chroma_db", collection_na
)

return vectorstore.as_retriever(search_kwargs={"k": k})

214 changes: 214 additions & 0 deletions docs/rag_upgrade_proposal.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,214 @@
# RAG + Validation Upgrade Proposal (Best-Outcome Plan)

## Objective
Upgrade the current drafting pipeline from "prompt + evidence extraction + generation" to a legally grounded, auditable, and citation-aware system with the highest practical quality.

## Current Gap Summary
- Draft generation (`/documents/generation/generate`) does not retrieve legal context from vector/keyword stores before generation.
- Research RAG exists but is isolated from the drafting path.
- Validation is present at a high level but not yet a full legal rule engine with clause-level checks.
- Clause tagging/traceability is missing for compliance review and explainability.

## Target Architecture (Recommended)
1. User Prompt + Optional Evidence
2. Template & Clause Planner
3. Hybrid Retriever (Dense + BM25 + RRF)
4. Context Pack Builder (jurisdiction + recency + source quality filtering)
5. Clause Generator (LLM, per-clause)
6. Entity/Fact Extractor (from generated draft)
7. Legal Validation Engine (deterministic + LLM-assisted fallback)
8. Citation & Source Trace Attachment
9. Final Document + PDF/DOCX Export

## RAG vs Agentic RAG (Decision Clarification)
- **Recommended default for this product:** **RAG-assisted drafting** (non-agentic by default).
- **Why:** Most generation requests are linear and latency-sensitive; they benefit from retrieval grounding without full multi-agent orchestration overhead.
- **When to switch to agentic RAG:** only for complex matters (multi-document conflicts, citation-heavy petitions, contradictory evidence, or mandatory multi-step verification).
- **Implementation policy:**
- Start with one deterministic generation pipeline + retrieval + validation.
- Gate agentic mode behind a policy trigger (complexity score / user selection / template type).
- Preserve a fast-path non-agentic route for routine notices and standard templates.



## If You Make It Completely Agentic RAG (Trade-off Analysis)
**Short answer:** possible, but not recommended as the default for all requests.

### Benefits
- Better handling of complex, multi-step legal workflows (conflict resolution, cross-document reasoning, citation reconciliation).
- Natural tool orchestration (retrieval, validator calls, policy checks, re-drafting loops).
- Stronger explainability if each agent emits traces and intermediate decisions.

### Risks / Costs
- Higher latency and token cost due to planner + multiple agent/tool turns.
- Larger failure surface (routing errors, tool timeout chains, brittle planner behavior).
- Harder QA and determinism, especially for production SLAs.
- More operational complexity (state store, retries, observability, guardrails).

### Recommended policy (best outcome)
- Keep **default path = non-agentic RAG + deterministic validation**.
- Enable **agentic mode only** when complexity triggers are met, e.g.:
- >1 uploaded evidence document with conflicting facts,
- citation-heavy pleading templates,
- unresolved validator errors after one deterministic regeneration pass.
- Add a hard budget: max steps, max tokens, max wall-time; auto-fallback to deterministic RAG path.

### Minimum controls before full-agentic rollout
1. Step-level tracing and replay.
2. Strong tool contracts and schema validation between agents.
3. Circuit breakers and fallback on each tool edge.
4. Offline benchmark suite: legal accuracy, citation grounding, latency, cost.

## Required Changes (Priority Ordered)

### P0 — Integrate retrieval into draft generation route
**Why:** Largest quality gain; prevents unsupported legal statements.

- Add retrieval step inside `generate_document` before `assembly_engine.assemble_document`.
- Input query for retrieval: `case_facts.prompt`, extracted issues/statutes, and template metadata.
- Inject retrieved legal context into generation prompt under a strict section: `Grounded Legal Context`.
- Persist source IDs/URLs with the generated document metadata.

**Files to touch:**
- `backend/app/api/v1/endpoints/documents.py`
- `backend/app/services/llm_service.py`
- `backend/app/agents/document_generator/prompt_templates.py`
- `backend/app/models/models.py` (optional metadata field if needed)

### P0 — Implement actual hybrid retrieval (not config-only)
**Why:** Legal queries need both exact lexical matching (sections/citations) and semantic matching.

- Implement `get_hybrid_retriever(...)` in legal research layer.
- Compose:
- Dense retriever: Chroma + SentenceTransformer embeddings.
- Sparse retriever: BM25 over indexed chunks.
- Fusion: Reciprocal Rank Fusion (RRF) with tunable `k`.
- Add weighted reranking based on:
- jurisdiction match,
- court level,
- recency,
- source reliability.

**Files to touch:**
- `backend/app/agents/legal_research/retrievers.py`
- `backend/app/core/config.py`
- `backend/app/agents/legal_research/agent.py`

### P0 — Clause tagging and per-clause generation
**Why:** Improves controllability and validation precision.

- Add clause schema (`clause_type`, `required`, `jurisdiction_rules`, `citations_required`).
- Split template into clause objects and generate clause-by-clause.
- Store provenance: which sources informed each clause.

**Files to touch:**
- `backend/app/agents/document_generator/section_generator.py`
- `backend/app/agents/document_generator/assembly_engine.py`
- new: `backend/app/agents/document_generator/clause_schema.py`

### P0 — Legal validation engine
**Why:** Core to production trust and defensibility.

- Deterministic validators:
- required clauses present,
- date consistency,
- amount consistency,
- party-role consistency,
- jurisdiction-specific mandatory phrasing.
- Citation validators:
- every legal proposition requiring support has at least one attached source.
- source citation format and URL validity.
- Add confidence score + validation report in response.

**Files to touch:**
- new: `backend/app/agents/document_generator/legal_validator.py`
- `backend/app/agents/document_generator/consistency_checker.py`
- `backend/app/api/v1/endpoints/documents.py` (return validation payload)

### P1 — Evidence harmonization and conflict resolution
**Why:** Prevent contradictory draft facts from prompt vs uploaded evidence.

- Define fact precedence policy:
1) authenticated evidence extraction,
2) explicit user override,
3) inferred/retrieved context.
- Add conflict log for user review UI.

**Files to touch:**
- `backend/app/agents/document_processor/llm_extractor.py`
- `backend/app/api/v1/endpoints/documents.py`
- frontend review panel integration.

### P1 — Research + drafting unification
**Why:** Single "legal memory" for both answering and drafting.

- Expose one retrieval service used by both `/research/query` and generation pipeline.
- Keep retrieval telemetry and cache hot queries.

**Files to touch:**
- new: `backend/app/services/retrieval_service.py`
- `backend/app/api/v1/endpoints/research.py`
- `backend/app/api/v1/endpoints/documents.py`

### P1 — Output traceability and explainability
**Why:** Required for lawyer trust and auditability.

- Return `clause_trace`: clause -> source chunk IDs/URLs -> validation status.
- Add "show sources" in editor next to generated clause blocks.

**Files to touch:**
- backend response schemas
- `frontend/app/editor/page.tsx`

### P2 — Agentic orchestration (targeted)
**Why:** Useful for complex documents, not required for every request.

- Use orchestrator only when:
- multiple evidence docs,
- conflicting facts,
- citation-heavy petitions.
- Keep fast non-agentic path for simple notices.

**Files to touch:**
- `backend/app/agents/orchestrator/*`
- route-level policy switch.

## API Contract Additions
- `/documents/generation/generate` response should include:
- `content`
- `citations[]`
- `validation_report`
- `clause_trace[]`
- `confidence_score`

## Data Model Additions
- Add JSON fields (or linked tables) for:
- `retrieval_sources`
- `validation_report`
- `clause_trace`

## Evaluation Metrics (must-track)
1. Grounded citation precision@k
2. Missing mandatory clause rate
3. Factual inconsistency rate
4. Hallucinated legal reference rate
5. User correction rate per generated draft
6. End-to-end latency (P50/P95)

## Rollout Strategy
1. **Phase 1 (2 weeks):** Retrieval-in-generation + source attachment.
2. **Phase 2 (2–3 weeks):** Hybrid retriever + reranker + offline benchmarks.
3. **Phase 3 (2 weeks):** Clause tagging + validation engine + UI report.
4. **Phase 4 (optional):** Targeted agentic mode for complex workflows.

## Risk Controls
- Fallback to current generation path if retrieval fails.
- Strict timeout budget per stage.
- Validation failure should degrade to "draft with warnings" not silent pass.

## Definition of Done (Best-Outcome Standard)
- Drafts include traceable legal sources for critical claims.
- Mandatory clauses validated per selected template/jurisdiction.
- Evidence vs prompt conflicts surfaced to user.
- Hybrid retrieval measurably outperforms dense-only baseline.
- No major latency regression beyond agreed SLA.
39 changes: 39 additions & 0 deletions docs/upgrade_execution_plan.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,39 @@
# Upgrade Execution Plan (Kickoff)

## Status
- **Kickoff started**
- Scope: Phase 0 cleanup + interface alignment before major RAG feature changes.

## Phase 0 (Current) — Cleanup and Baseline Hardening

### 0.1 Interface alignment
- [x] Add missing `get_hybrid_retriever(...)` implementation in legal research retrievers module.
- [ ] Align hybrid retriever behavior and defaults with product requirements (weights, k, fusion policy).

### 0.2 Simulation and placeholder audit
- [ ] Replace or isolate simulated paths in orchestrator/document export routes.
- [ ] Identify dead/unused utilities and stale references.
- [ ] Produce `cleanup_audit.md` with keep/remove/replace decisions.

### 0.3 Test baseline
- [ ] Run focused retriever tests and fix regressions.
- [ ] Add contract tests for retriever output schema.
- [ ] Add smoke test for generation route with retrieval disabled/enabled flags.

## Phase 1 — Retrieval in Draft Generation
- [ ] Inject retrieval stage into `/documents/generation/generate`.
- [ ] Build grounded prompt block with source metadata.
- [ ] Return draft + sources in response.

## Phase 2 — Clause + Validation
- [ ] Clause-level generation and traceability.
- [ ] Deterministic legal validation report.
- [ ] Confidence scoring and citation checks.

## Phase 3 — Targeted Agentic Mode
- [ ] Policy-gated agentic escalation for complex cases only.
- [ ] Step budgets, circuit breakers, fallback to deterministic RAG.

## Immediate next implementation ticket
1. Verify hybrid retriever with unit tests.
2. Introduce retrieval service abstraction used by both research and generation paths.