A self-hosted web platform for exploring ~751K Israeli Supreme Court documents. Provides full-text search, corpus statistics with visualizations, on-demand LLM-powered NER analysis, a REST API, and an admin backend.
Built as a deliverable of an Israel Innovation Authority research project (grants #78560, #78561), conducted by the Hebrew University of Jerusalem and Tel Aviv University.
- Full-text search across 733K+ documents using SQLite FTS5
- Faceted filtering by year, document type, legal division, judge, lawyer, party
- Document viewer with proper RTL Hebrew text rendering
- Case view with document timeline and participants (judges, lawyers, parties)
- Statistics dashboard — interactive charts (documents by year, type distribution, top judges/lawyers, technical ratio)
- NER analysis — on-demand LLM-powered Named Entity Recognition with inline highlighting
- REST API with API key authentication and rate limiting
- Admin backend — dashboard, API key management, data import, LLM configuration
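As a sketch of how a client might call the REST API from Python (stdlib only; the `X-API-Key` header name is an assumption, not confirmed by this README — check the Swagger UI at `/docs` for the actual auth scheme):

```python
import json
import urllib.parse
import urllib.request

API_BASE = "http://localhost:8000/api/v1"  # default local deployment

def build_search_url(q=None, year_from=None, year_to=None, page=1, per_page=20):
    """Build a /api/v1/documents search URL, dropping unset filters."""
    params = {"q": q, "year_from": year_from, "year_to": year_to,
              "page": page, "per_page": per_page}
    query = urllib.parse.urlencode({k: v for k, v in params.items() if v is not None})
    return f"{API_BASE}/documents?{query}"

def search_documents(api_key, **kwargs):
    """Fetch one page of search results as a dict."""
    req = urllib.request.Request(build_search_url(**kwargs),
                                 headers={"X-API-Key": api_key})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```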
- Backend: Python 3.11+, FastAPI, SQLAlchemy, SQLite + FTS5
- Frontend: React, TypeScript, Tailwind CSS, Recharts
- Package management: uv (Python), npm (frontend)
- LLM integration: LiteLLM (supports Claude, GPT-4, Gemini, Ollama, etc.)
- Deployment: Docker Compose
```bash
# Python dependencies
uv sync

# Frontend dependencies
cd frontend && npm install && cd ..
```

```bash
cp .env.example .env
```

Edit `.env` as needed. The defaults work for local development (except LLM features, which require an API key).
The dataset is a 1.5 GB Parquet file. If you have it locally at `docs/cases_all.parquet`:

```bash
uv run python backend/import_dataset.py --db data/court.db --source docs/cases_all.parquet
```

Or download directly from HuggingFace (~5 GB):

```bash
uv run python backend/import_dataset.py --db data/court.db
```

Import takes ~5 minutes and produces a ~5 GB SQLite database with FTS5 indexes.

```bash
cd frontend && npm run build && cd ..
uv run uvicorn backend.app.main:app --host 0.0.0.0 --port 8000
```

Open http://localhost:8000 in your browser.
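After import, the database can also be queried directly with SQLite's FTS5 `MATCH` syntax. A minimal sketch, assuming the import script creates an FTS table named `documents_fts` (the table and column layout are assumptions about the generated schema):

```python
import sqlite3

def fts_search(db_path, query, limit=5):
    """Run an FTS5 full-text query against the imported database,
    returning matching rowids ordered by FTS5's built-in bm25 ranking."""
    con = sqlite3.connect(db_path)
    try:
        rows = con.execute(
            "SELECT rowid FROM documents_fts "
            "WHERE documents_fts MATCH ? ORDER BY rank LIMIT ?",
            (query, limit),
        ).fetchall()
        return [r[0] for r in rows]
    finally:
        con.close()
```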
For development with hot reload, run backend and frontend separately:
```bash
# Terminal 1 — Backend (auto-reloads on Python changes)
uv run uvicorn backend.app.main:app --reload --port 8000

# Terminal 2 — Frontend dev server (hot reload, proxies API to backend)
cd frontend && npm run dev
```

Then open http://localhost:5173 (the Vite dev server).
```bash
cp .env.example .env
# Edit .env as needed

# Import dataset (one-time)
docker compose run --rm app uv run python backend/import_dataset.py --db /data/court.db

# Start the platform
docker compose up -d
```

The database is persisted in `./data/` via volume mount.
All configuration is via environment variables (or .env file):
| Variable | Default | Description |
|---|---|---|
| `DATABASE_PATH` | `./data/court.db` | Path to SQLite database |
| `ADMIN_USERNAME` | `admin` | Admin login username |
| `ADMIN_PASSWORD` | `changeme` | Admin login password |
| `LLM_MODEL` | `claude-sonnet-4-20250514` | LiteLLM model for NER |
| `LLM_API_KEY` | — | API key for the LLM provider |
| `LLM_MAX_TOKENS` | `4096` | Max output tokens for NER |
| `LLM_TEMPERATURE` | `0.0` | LLM temperature |
| `LLM_MAX_INPUT_TOKENS` | `8000` | Max document tokens sent to LLM |
| `NER_RATE_LIMIT_PER_HOUR` | `10` | NER requests per hour per IP |
| `API_RATE_LIMIT_PER_HOUR` | `100` | API requests per hour per key |
| `HOST` | `0.0.0.0` | Server bind host |
| `PORT` | `8000` | Server bind port |
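A minimal `.env` for local development might look like this (values mirror the defaults above; the API key line is a placeholder, needed only for LLM-powered NER):

```shell
DATABASE_PATH=./data/court.db
ADMIN_USERNAME=admin
ADMIN_PASSWORD=changeme
LLM_MODEL=claude-sonnet-4-20250514
# Set only if you need NER features
LLM_API_KEY=
PORT=8000
```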
Interactive API documentation is available at http://localhost:8000/docs (Swagger UI).
| Method | Endpoint | Description |
|---|---|---|
| GET | `/api/v1/documents` | Search/list documents (FTS5, filterable, paginated) |
| GET | `/api/v1/documents/{hash}` | Get document by SHA-256 hash |
| GET | `/api/v1/documents/{hash}/text` | Get raw document text |
| POST | `/api/v1/documents/{hash}/ner` | Run NER analysis |
| GET | `/api/v1/cases` | Search/list cases |
| GET | `/api/v1/cases/{case_desc}` | Get case with documents and participants |
| GET | `/api/v1/stats/overview` | Corpus-level statistics |
| GET | `/api/v1/stats/by-year` | Document counts by year |
| GET | `/api/v1/stats/judges` | Top judges by case count |
| GET | `/api/v1/stats/lawyers` | Top lawyers by appearance count |
| Parameter | Type | Description |
|---|---|---|
| `q` | string | Full-text search query |
| `year_from` / `year_to` | int | Year range filter |
| `type` | string | Document type (החלטה or פסק-דין) |
| `division` | string | Legal division |
| `judge` / `lawyer` / `party` | string | Name filter (partial match) |
| `technical` | bool | Technical documents only |
| `page` / `per_page` | int | Pagination (default: 1 / 20, max `per_page`: 100) |
| `sort` | string | Sort by: `date`, `year`, `pages`, `relevance` |
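The `page`/`per_page` parameters support a simple client-side loop for walking all matches. A sketch, assuming the search response carries `items` and `total` fields (field names are assumptions about the response schema, not confirmed here):

```python
def iter_documents(fetch_page, per_page=100):
    """Yield every matching document across pages.

    fetch_page(page, per_page) must return one page of results as a dict;
    the 'items' and 'total' keys are assumptions about the response schema.
    per_page defaults to 100, the documented maximum.
    """
    page = 1
    while True:
        data = fetch_page(page, per_page)
        items = data.get("items", [])
        yield from items
        if not items or page * per_page >= data.get("total", 0):
            return
        page += 1
```

Any callable that wraps the HTTP request (with auth headers and the desired filters baked in) can be passed as `fetch_page`.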
website/
├── backend/
│ ├── app/
│ │ ├── main.py # FastAPI entry point
│ │ ├── config.py # Settings from env vars
│ │ ├── database.py # SQLite connection
│ │ ├── models.py # SQLAlchemy ORM models
│ │ ├── schemas.py # Pydantic request/response models
│ │ ├── auth.py # Admin session + API key auth
│ │ ├── routers/
│ │ │ ├── cases.py # /api/v1/cases
│ │ │ ├── documents.py # /api/v1/documents
│ │ │ ├── stats.py # /api/v1/stats/*
│ │ │ ├── ner.py # /api/v1/documents/{hash}/ner
│ │ │ └── admin.py # /admin/api/*
│ │ └── services/
│ │ ├── search.py # FTS5 search logic
│ │ └── ner_service.py # LiteLLM NER integration
│ └── import_dataset.py # ETL: Parquet → SQLite
├── frontend/
│ └── src/
│ ├── pages/ # Route-level components
│ ├── components/ # Reusable UI components
│ └── api/ # API client + types
├── data/ # SQLite database (created by import)
├── pyproject.toml
├── Dockerfile
└── docker-compose.yml
- Dataset: LevMuchnik/SupremeCourtOfIsrael on HuggingFace
- Legal-HeBERT: github.com/avichaychriqui/Legal-HeBERT
Funded by the Israel Innovation Authority (grants #78560, #78561) under the "Kamin" track for applied academic research. Conducted by the Hebrew University of Jerusalem (Prof. Lev Muchnik) and Tel Aviv University (Dr. Inbal Yahav Shenberger).