# CLAUDE.md — Contract Analytics & Benchmarking Engine

## Project Summary

A contract analytics tool that ingests executed contracts (PDF/Word), uses Claude to extract key commercial terms (liability caps, indemnity scope, payment terms, warranty periods, termination provisions, SLA commitments), and provides analytical views: term distributions, trends over time, counterparty comparisons, and deviation from playbook targets.
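The extraction flow can be sketched roughly as follows; the model name, prompt wording, and function names are illustrative assumptions, not the project's actual code:

```python
import json
import os

# Term keys mirror the list above; the exact names are illustrative.
TERMS = [
    "liability_cap", "indemnity_scope", "payment_terms",
    "warranty_period", "termination_provisions", "sla_commitments",
]

def build_extraction_prompt(contract_text: str) -> str:
    """Assemble the instruction sent to Claude for term extraction."""
    return (
        "Extract the following commercial terms from the contract below. "
        f"Respond with a JSON object whose keys are: {', '.join(TERMS)}. "
        "Use null for any term not present.\n\n"
        f"Contract:\n{contract_text}"
    )

def extract_terms(contract_text: str) -> dict:
    """Send one contract to the Anthropic API and parse the JSON reply."""
    import anthropic  # third-party SDK; imported lazily in this sketch

    client = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])
    message = client.messages.create(
        model="claude-sonnet-4-20250514",  # assumed model; pin via config.py
        max_tokens=1024,
        messages=[{"role": "user", "content": build_extraction_prompt(contract_text)}],
    )
    return json.loads(message.content[0].text)
```

Returning strict JSON keeps the downstream analytics layer independent of prompt wording.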

Target users: Non-technical legal professionals (in-house counsel, legal ops, paralegals).

## Standards Reference

This project follows the patterns in PORTFOLIO_STANDARDS.md. Key points:

- API-first architecture: FastAPI backend, Streamlit thin client
- Three-tier access: hosted URL, `./start.sh`, Docker
- README leads with instructions for non-technical users
- MIT License: Copyright (c) 2026 Noam Raz and Pleasant Secret Labs
- All sample data must be synthetic — never real company data

## Tech Stack

- Python: 3.11+
- Backend: FastAPI + Uvicorn
- Database: SQLAlchemy 2.0 (`mapped_column`, `DeclarativeBase`) with SQLite (designed for PostgreSQL migration)
- Validation: Pydantic v2
- LLM: Anthropic API (Claude) for contract term extraction
- Document parsing: python-docx (Word), PyMuPDF/pdfplumber (PDF)
- Frontend: Streamlit (calls the API via HTTP)
- Data visualization: Plotly
- Testing: pytest
- Package management: `pyproject.toml` (PEP 621)
- Deployment: Docker + Railway

## Architecture

```
Streamlit UI  →  FastAPI API  →  SQLAlchemy/SQLite
                     ↓
              Anthropic Claude API
              (term extraction)
```

- FastAPI handles all business logic; Streamlit is a thin HTTP client
- The API is independently usable (Swagger docs, curl, integrations)
- DB sessions via FastAPI dependency injection

## Project Structure (Target)

```
contract-analytics/
├── src/
│   └── contract_analytics/
│       ├── __init__.py
│       ├── main.py              # FastAPI app entry point
│       ├── config.py            # Settings (env vars, defaults)
│       ├── database.py          # SQLAlchemy engine, session, Base
│       ├── models/              # SQLAlchemy ORM models
│       ├── schemas/             # Pydantic request/response schemas
│       ├── api/                 # FastAPI routers
│       │   └── v1/
│       ├── services/            # Business logic layer
│       │   ├── extraction.py    # Claude-powered term extraction
│       │   ├── parsing.py       # PDF/Word document parsing
│       │   └── analytics.py     # Aggregation and analysis
│       └── utils/               # Shared utilities
├── ca_frontend/
│   └── app.py                   # Streamlit application
├── tests/
├── data/
│   └── sample/                  # Synthetic sample contracts
├── CLAUDE.md
├── PROJECT.md
├── PORTFOLIO_STANDARDS.md
├── README.md
├── LICENSE
├── pyproject.toml
├── start.sh
├── start.bat
├── Dockerfile
├── Dockerfile.railway
├── docker-compose.yml
├── railway.json
└── railway_start.sh
```
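The parsing layer (`services/parsing.py`) could dispatch on file extension along these lines; `extract_text` and the lazy imports are sketch-level assumptions rather than the project's actual code:

```python
from pathlib import Path

def extract_text(path: Path) -> str:
    """Return plain text from a PDF or DOCX contract file."""
    suffix = path.suffix.lower()
    if suffix == ".pdf":
        import fitz  # PyMuPDF; imported lazily so the module loads without it
        with fitz.open(path) as doc:
            return "\n".join(page.get_text() for page in doc)
    if suffix == ".docx":
        from docx import Document  # python-docx
        return "\n".join(p.text for p in Document(str(path)).paragraphs)
    raise ValueError(f"unsupported file type: {suffix or '(none)'}")
```

Keeping parsing behind one function lets the extraction service stay ignorant of which document library produced the text.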

## Coding Conventions

- Type hints on all function signatures
- Pydantic v2 for all request/response validation
- SQLAlchemy 2.0 style (`mapped_column`, `DeclarativeBase`)
- FastAPI dependency injection for DB sessions
- Keep modules small and focused
- Minimum complexity for the current task — don't over-engineer

## Dependency Classification

**Core** (`[project.dependencies]`) — required at runtime (installed in Docker via `pip install .`):

- fastapi, uvicorn, sqlalchemy, pydantic, anthropic
- python-docx, pymupdf (or pdfplumber)
- streamlit, requests, plotly
- python-multipart (file uploads)

**Dev** (`[project.optional-dependencies]`) — testing/linting only:

- pytest, pytest-asyncio, httpx, ruff

Rule: if Docker would crash without it, it's a core dependency.
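Under PEP 621 the split above might look like this in `pyproject.toml` (version pins are illustrative):

```toml
[project]
name = "contract-analytics"
version = "0.1.0"
requires-python = ">=3.11"
dependencies = [
    "fastapi",
    "uvicorn",
    "sqlalchemy>=2.0",
    "pydantic>=2",
    "anthropic",
    "python-docx",
    "pymupdf",
    "streamlit",
    "requests",
    "plotly",
    "python-multipart",
]

[project.optional-dependencies]
dev = ["pytest", "pytest-asyncio", "httpx", "ruff"]
```

Dev tools install with `pip install -e ".[dev]"` locally, while the Docker image runs plain `pip install .` and stays lean.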

## Security Considerations

- File uploads: Validate file types (PDF/DOCX only), enforce size limits, sanitize filenames, store uploads outside the web root
- API keys: Anthropic API key via environment variable only — never hardcoded or committed
- SQL injection: Use SQLAlchemy ORM/parameterized queries exclusively — no raw SQL string interpolation
- Input validation: Pydantic models validate all API inputs
- Path traversal: Sanitize any user-provided file paths; never pass raw user input to file system operations
- CORS: Configure restrictively for production
- Rate limiting: Consider for extraction endpoints (LLM API cost control)
- Data isolation: Uploaded contracts may contain sensitive data — ensure no cross-user data leakage in multi-tenant scenarios
- `.env` files: Never committed; `.gitignore` already excludes them
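The file-type, size-limit, and path-traversal checks can be combined in small helpers; the limit value and function names here are assumptions for the sketch:

```python
from pathlib import Path

ALLOWED_EXTENSIONS = {".pdf", ".docx"}
MAX_UPLOAD_BYTES = 20 * 1024 * 1024  # 20 MB, an assumed limit

def safe_upload_name(filename: str) -> str:
    """Strip directory components and reject unexpected file types."""
    name = Path(filename).name          # drops ../ and absolute prefixes
    suffix = Path(name).suffix.lower()
    if suffix not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported file type: {suffix or '(none)'}")
    return name

def check_size(data: bytes) -> bytes:
    """Reject payloads over the configured upload limit."""
    if len(data) > MAX_UPLOAD_BYTES:
        raise ValueError("file exceeds size limit")
    return data
```

An upload endpoint would run both helpers before writing anything to disk, so a crafted filename or oversized body fails fast with a 4xx response.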

## Commands

```bash
# Run API server (package installed first via `pip install -e .`)
uvicorn contract_analytics.main:app --reload --port 8000

# Run Streamlit frontend
streamlit run ca_frontend/app.py

# Run both (via startup script)
./start.sh

# Run tests
pytest

# Lint
ruff check .
```