JurisFind

AI-powered legal research and conversational analysis platform. Search across 46,000+ indexed Indian Supreme Court judgments using semantic similarity, chat with specific cases or your own uploaded PDFs, and ask general legal questions — all from a single persistent workspace backed by a LangGraph agent orchestrator.

Features

Semantic Search

Natural language search over 46,456 pre-indexed Indian Supreme Court judgments. Queries are embedded with sentence-transformers/all-mpnet-base-v2 and compared against a Qdrant vector database (1.1M+ vectors). Results are ranked by cosine similarity and can be narrowed with metadata filters for court, year, state, and case type.
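
The ranking step can be illustrated without any external service. The following is a minimal sketch, assuming tiny hand-written 3-dimensional vectors in place of the 768-dimensional all-mpnet-base-v2 embeddings and an in-memory list in place of Qdrant. The names `cosine_similarity`, `rank_cases`, and the sample case IDs are hypothetical and not taken from the JurisFind codebase.

```python
import math
from typing import Any, Dict, List, Optional, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_cases(
    query_vec: List[float],
    cases: List[Dict[str, Any]],
    year: Optional[int] = None,
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Apply an optional metadata filter, then sort by similarity (highest first)."""
    candidates = [c for c in cases if year is None or c["year"] == year]
    scored = [(c["case_id"], cosine_similarity(query_vec, c["vector"])) for c in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy stand-ins for indexed judgment vectors (hypothetical data).
CASES = [
    {"case_id": "case-a", "year": 1995, "vector": [1.0, 0.0, 0.0]},
    {"case_id": "case-b", "year": 2010, "vector": [0.6, 0.8, 0.0]},
    {"case_id": "case-c", "year": 2010, "vector": [0.0, 0.0, 1.0]},
]

results = rank_cases([1.0, 0.0, 0.0], CASES)
```

Passing `year=2010` restricts the candidates before scoring, which mirrors the idea of a metadata filter applied alongside the vector comparison.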

Agentic RAG Chat

Every chat request flows through a LangGraph state machine that classifies intent and routes to one of three execution paths: general legal knowledge, document-specific RAG, or full corpus search. Requests that the classifier's guardrail blocks receive a rejection message instead of being routed. A routed request makes exactly two LLM calls: one in the classifier and one in the answer node. Citations are built directly from Qdrant payload metadata, with no LLM extraction.
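
The classify-then-route flow can be sketched in plain Python. This is not the LangGraph graph itself: it is a dispatch table standing in for the conditional routing, and the keyword-based `classify` function is a stub for the first LLM call. All names (`ChatState`, `classify`, `ROUTES`, `run`) and the keyword rules are assumptions made for illustration.

```python
from typing import Callable, Dict, List, Optional, TypedDict


class ChatState(TypedDict):
    question: str
    document_ids: List[str]
    intent: str
    answer: str


def classify(state: ChatState) -> ChatState:
    """Stub for LLM call 1: a real classifier would ask the model for the intent."""
    text = state["question"].lower()
    if "ignore previous instructions" in text:
        intent = "blocked"          # guardrail branch
    elif state["document_ids"]:
        intent = "document_chat"    # a document is attached to the session
    elif "find cases" in text:
        intent = "corpus_search"
    else:
        intent = "general_chat"
    return {**state, "intent": intent}


def general_chat(state: ChatState) -> ChatState:
    return {**state, "answer": "answer from general legal knowledge"}


def document_chat(state: ChatState) -> ChatState:
    return {**state, "answer": "answer grounded in attached documents"}


def corpus_search(state: ChatState) -> ChatState:
    return {**state, "answer": "answer grounded in corpus search results"}


def blocked(state: ChatState) -> ChatState:
    return {**state, "answer": "This request cannot be processed."}


# One handler per intent; the classifier output selects exactly one of them.
ROUTES: Dict[str, Callable[[ChatState], ChatState]] = {
    "general_chat": general_chat,
    "document_chat": document_chat,
    "corpus_search": corpus_search,
    "blocked": blocked,
}


def run(question: str, document_ids: Optional[List[str]] = None) -> ChatState:
    state: ChatState = {
        "question": question,
        "document_ids": document_ids or [],
        "intent": "",
        "answer": "",
    }
    state = classify(state)
    return ROUTES[state["intent"]](state)
```

For example, `run("Summarise the holding", ["doc-1"])` takes the `document_chat` branch because a document ID is attached, while the same question without attachments falls through to `general_chat`.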

Document-Specific Chat

Clicking Analyze on a corpus search result or uploading a private PDF attaches it to a chat session. The agent automatically searches the correct vector store: Qdrant (filtered by document ID) for corpus cases, pgvector for private uploads. Both paths produce cited, grounded answers.
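
The store selection described above can be expressed as a small pure function. This is a sketch under stated assumptions: the `AttachedDocument` type, its `source` values, and the shape of the returned plan are hypothetical, chosen only to show the branching between the shared corpus and private uploads.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class AttachedDocument:
    document_id: str
    source: str                     # "corpus" or "private" (assumed labels)
    owner_id: Optional[str] = None  # only meaningful for private uploads


def retrieval_plan(doc: AttachedDocument) -> Dict[str, Any]:
    """Decide which vector store to query and which filter to apply."""
    if doc.source == "corpus":
        # Shared corpus: restrict the search to the chunks of one judgment.
        return {"store": "qdrant", "filter": {"document_id": doc.document_id}}
    if doc.source == "private":
        if doc.owner_id is None:
            raise ValueError("private documents must carry an owner_id")
        # Private upload: scope by owner as well as by document.
        return {
            "store": "pgvector",
            "filter": {"document_id": doc.document_id, "owner_id": doc.owner_id},
        }
    raise ValueError(f"unknown document source: {doc.source!r}")
```

Keeping the decision in one function makes it easy to check that a private document can never produce a plan that targets the shared store.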

Private PDF Upload

Users can upload their own confidential PDFs. A Celery worker asynchronously extracts text via PyMuPDF, chunks it with LangChain, generates 768-dim embeddings, and stores them in the private pgvector store isolated by owner_id. Private documents never interact with the shared Qdrant corpus.
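
To make the chunking step concrete, here is a minimal sketch of a fixed-size splitter with overlap. It is deliberately simpler than LangChain's RecursiveCharacterTextSplitter, which prefers to split on separators such as paragraph breaks; the function name `chunk_text`, the record layout in `build_records`, and the default sizes are assumptions for illustration only.

```python
from typing import Any, Dict, List


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> List[str]:
    """Split text into windows of chunk_size characters that overlap by `overlap`."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


def build_records(owner_id: str, document_id: str, text: str) -> List[Dict[str, Any]]:
    """Attach owner and position metadata to each chunk before embedding."""
    return [
        {"owner_id": owner_id, "document_id": document_id, "index": i, "text": chunk}
        for i, chunk in enumerate(chunk_text(text, chunk_size=4, overlap=2))
    ]
```

The overlap means a sentence that straddles a chunk boundary still appears intact in at least one chunk, at the cost of storing some text twice.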

Session Management

Conversations are persistent and multi-document. Each session stores ordered messages, attached documents, and citations in PostgreSQL. Sessions are strictly isolated by user_id and session_id: every API request re-validates the JWT and session ownership before any data is read or written.
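
The ownership check can be sketched as a guard that runs before any session data is touched. This example assumes the user ID has already been extracted from a verified JWT, and it uses an in-memory dictionary in place of PostgreSQL; `ChatSession` and `load_owned_session` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ChatSession:
    session_id: str
    user_id: str


def load_owned_session(
    store: Dict[str, ChatSession], session_id: str, requester_id: str
) -> ChatSession:
    """Return the session only if it exists and belongs to the requester."""
    session = store.get(session_id)
    # Missing and foreign sessions raise the same error, so a caller cannot
    # learn whether another user's session ID exists (an assumed design choice).
    if session is None or session.user_id != requester_id:
        raise PermissionError("session not found")
    return session


STORE = {"s1": ChatSession(session_id="s1", user_id="alice")}
```

Calling `load_owned_session(STORE, "s1", "alice")` returns the session, while the same call with a different requester or an unknown ID raises `PermissionError`.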

Architecture

User Request (POST /api/sessions/{id}/messages)
        |
        v
  FastAPI Endpoint
  - Verify JWT + session ownership
  - Persist user message to DB
  - Load chat history from DB
  - Collect attached document IDs
        |
        v
  LangGraph State Machine (juris_graph)
        |
   classifier_node  <-- LLM Call 1 (intent + guardrail)
        |
   -----+------------------------------------------
   |                  |                   |        |
   v                  v                   v        v
general_chat    document_chat      corpus_search  blocked
(LLM Call 2)   (Qdrant or         (Qdrant full   (rejection
                pgvector + LLM)    scan + LLM)    message)
        |
        v
  Citations built from Qdrant payload (no LLM)
        |
        v
  SSE stream to frontend → save to DB
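
The last two stages of the diagram, building citations from stored payload metadata and streaming the answer as Server-Sent Events, can be sketched as follows. The payload fields (`case_id`, `title`, `year`), the event types, and the function names are assumptions; only the SSE framing (a `data:` line followed by a blank line) is standard.

```python
import json
from typing import Any, Dict, Iterable, Iterator


def citation_from_payload(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Copy citation fields straight from stored metadata (no LLM involved)."""
    return {
        "case_id": payload.get("case_id"),
        "title": payload.get("title"),
        "year": payload.get("year"),
    }


def format_event(data: Dict[str, Any]) -> str:
    """Serialize one SSE frame: a 'data:' line terminated by a blank line."""
    return "data: " + json.dumps(data) + "\n\n"


def stream_answer(
    tokens: Iterable[str], payloads: Iterable[Dict[str, Any]]
) -> Iterator[str]:
    """Yield answer tokens first, then the citations, then a completion marker."""
    for token in tokens:
        yield format_event({"type": "token", "content": token})
    citations = [citation_from_payload(p) for p in payloads]
    yield format_event({"type": "citations", "items": citations})
    yield format_event({"type": "done"})
```

A generator like this can be handed to a streaming HTTP response with the `text/event-stream` media type, and the complete answer can be saved once the final frame has been sent.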

Infrastructure layers:

FastAPI handles HTTP, JWT auth, session ownership, and SSE streaming.
LangGraph orchestrates intent classification and retrieval routing.
Celery + RabbitMQ processes document uploads asynchronously.
PostgreSQL + pgvector stores relational data and private document vectors.
Qdrant stores the 46k-case shared corpus (1.1M vectors, metadata indexed).
Groq runs llama-3.3-70b-versatile for LLM inference.

Tech Stack

| Component | Technology |
| --- | --- |
| Frontend | React 18, Vite, TailwindCSS |
| Backend | FastAPI, Python 3.9+, SQLAlchemy, Alembic |
| Agent Orchestration | LangGraph, LangChain |
| Task Queue | Celery, RabbitMQ |
| Database | PostgreSQL 17, pgvector |
| Vector Store (Corpus) | Qdrant |
| LLM | Groq llama-3.3-70b-versatile |
| Embeddings | sentence-transformers/all-mpnet-base-v2 (768-dim) |
| PDF Processing | PyMuPDF, LangChain RecursiveCharacterTextSplitter |
| File Storage | Azure Blob Storage (or local fallback) |
| Deployment | Docker Compose, Azure VM, Azure Static Web Apps |

Project Structure

JurisFind/
├── backend/
│   ├── alembic/                  # Database migrations
│   ├── app/
│   │   ├── agents/               # LangGraph agent orchestrator
│   │   │   ├── graph.py          # Compiled StateGraph singleton
│   │   │   ├── state.py          # JurisFindState TypedDict
│   │   │   └── nodes/
│   │   │       ├── classifier.py     # Node 1: intent + guardrail
│   │   │       ├── general_chat.py   # Node 2A: general legal Q&A
│   │   │       ├── document_chat.py  # Node 2B: document RAG
│   │   │       ├── corpus_search.py  # Node 2C: full corpus search
│   │   │       ├── _embedder.py      # Shared embedding helper
│   │   │       └── _qdrant.py        # Shared Qdrant client
│   │   ├── ai/                   # Legacy agents (kept for reference)
│   │   ├── api/                  # FastAPI routers
│   │   │   ├── auth.py           # Register, login, profile
│   │   │   ├── sessions.py       # Chat sessions + SSE streaming
│   │   │   ├── documents.py      # PDF upload and status
│   │   │   ├── cases.py          # Corpus case management
│   │   │   ├── search.py         # Semantic search endpoints
│   │   │   └── dependencies/     # JWT validation
│   │   ├── db/
│   │   │   ├── models.py         # SQLAlchemy models
│   │   │   ├── crud/             # Repository functions
│   │   │   └── session.py        # DB session management
│   │   ├── schemas/              # Pydantic request/response models
│   │   ├── services/             # Embedding, retrieval, blob storage
│   │   ├── workers/              # Celery document processing task
│   │   └── main.py               # App factory
│   └── requirements.txt
├── frontend/
│   └── src/
│       ├── components/           # Reusable UI components
│       ├── context/              # React Auth Context
│       └── pages/                # Search, Assistant, Login pages
├── docs/                         # Technical documentation
└── docker-compose.yml

Quick Start

Prerequisites

Python 3.9+
Node.js 18+
Docker and Docker Compose
Groq API key from console.groq.com
Qdrant running locally on port 6333

Backend

# 1. Start infrastructure
docker compose up db rabbitmq -d

# 2. Set up Python environment
cd backend
python -m venv venv
# Windows: .\venv\Scripts\activate
# Linux/Mac: source venv/bin/activate
pip install -r requirements.txt

# 3. Configure environment
cp .env.example .env
# Edit .env — set GROQ_API_KEY, DATABASE_URL, RABBITMQ_URL

# 4. Run database migrations
alembic upgrade head

# 5. Start API server
uvicorn app.main:create_app --factory --host 0.0.0.0 --port 8000 --reload

# 6. Start Celery worker (separate terminal)
celery -A app.workers.celery_app worker --loglevel=info -Q jurisfind_documents

Frontend

cd frontend
npm install
npm run dev

Open http://localhost:5173.

Environment Variables

Backend (`backend/.env`)

| Variable | Description |
| --- | --- |
| `DATABASE_URL` | PostgreSQL connection string |
| `RABBITMQ_URL` | RabbitMQ connection string |
| `GROQ_API_KEY` | Required for LLM inference |
| `SECRET_KEY` | JWT signing key |
| `QDRANT_HOST` | Qdrant host (default: localhost) |
| `QDRANT_PORT` | Qdrant port (default: 6333) |
| `AZURE_STORAGE_CONNECTION_STRING` | Optional, enables Azure Blob Storage |
| `GROQ_MODEL` | LLM model name (default: llama-3.3-70b-versatile) |
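
As an illustration of how these variables and their defaults fit together, the sketch below loads them into a typed settings object. It is not the project's actual configuration code: the `Settings` class and `load_settings` function are hypothetical, and treating the four variables without a documented default as required is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    database_url: str
    rabbitmq_url: str
    groq_api_key: str
    secret_key: str
    qdrant_host: str
    qdrant_port: int
    groq_model: str
    azure_storage_connection_string: Optional[str]


# Assumed to be mandatory because no default is documented for them.
REQUIRED = ("DATABASE_URL", "RABBITMQ_URL", "GROQ_API_KEY", "SECRET_KEY")


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from a mapping, failing fast on missing required values."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return Settings(
        database_url=env["DATABASE_URL"],
        rabbitmq_url=env["RABBITMQ_URL"],
        groq_api_key=env["GROQ_API_KEY"],
        secret_key=env["SECRET_KEY"],
        qdrant_host=env.get("QDRANT_HOST", "localhost"),
        qdrant_port=int(env.get("QDRANT_PORT", "6333")),
        groq_model=env.get("GROQ_MODEL", "llama-3.3-70b-versatile"),
        # An unset or empty value means the local storage fallback applies.
        azure_storage_connection_string=env.get("AZURE_STORAGE_CONNECTION_STRING") or None,
    )
```

Accepting any mapping rather than reading `os.environ` directly keeps the loader easy to exercise with a plain dictionary.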

Frontend (`frontend/.env`)

| Variable | Description |
| --- | --- |
| `VITE_API_BASE_URL` | Backend URL (default: http://localhost:8000) |

Documentation

Detailed technical documentation is in the docs/ directory:

docs/01_overview.md — System overview and architecture diagram
docs/02_data_flows.md — End-to-end data flow for every user action
docs/03_agent.md — LangGraph agent design, nodes, and routing logic
docs/04_api_reference.md — All API endpoints with request/response details
docs/05_setup_and_installation.md — Local setup and deployment guide
