A full-stack web application that extracts key clauses from legal contracts and analyzes them against predefined legal playbooks. The system supports PDF and image uploads, performs OCR for scanned files, detects important clause types, assigns confidence scores, and flags risky or missing terms.
Built as a collaborative academic project, with major contributions in backend integration, clause detection, and deployment.
-
Upload contract files in PDF, PNG, JPG, JPEG formats
-
Text extraction using:
- Native PDF parsing for text-based PDFs
- OCR for scanned PDFs and images
-
Automatic clause detection for common legal clauses such as:
- Confidentiality
- Termination
- Liability Cap
- Payment Terms
- IP Ownership
- Governing Law
- Jurisdiction
- Indemnity
-
Confidence scoring for detected clauses
-
Red-flag detection for risky language
-
Missing clause identification
-
Clean React interface for upload and results display
- React
- JavaScript
- HTML
- CSS
- Python
- Flask
- REST API
- Regex-based clause extraction
- PyPDF2
- Tesseract OCR
- pdf2image
- Pillow
NLP_PROJECT/ ├── backend/ │ ├── app.py │ ├── requirements.txt │ └── uploads/
├── frontend/ │ ├── package.json │ ├── public/ │ └── src/ │ ├── components/ │ ├── App.js │ └── App.css
├── README.md └── Project_Report.pdf
- Upload a contract document
- Extract raw text using parser / OCR
- Segment content into logical sections
- Detect clauses using NLP patterns
- Score confidence and compliance
- Display results in dashboard UI
- Played a major role in backend integration between the React frontend and Flask API
- Improved clause detection logic for identifying risky and missing legal terms
- Enhanced PDF / OCR text extraction workflow for better scanned document handling
- Refined frontend UI for upload flow and results presentation
- Contributed to testing, debugging, deployment, and final documentation
cd backend python -m venv venv source venv/bin/activate pip install -r requirements.txt python app.py
Runs on: http://localhost:5000
cd frontend npm install npm start
Runs on: http://localhost:3000
- Open the frontend in browser
- Upload a PDF or image contract
- Click analyze
- Review extracted clauses, risks, and scores
- Transformer / BERT-based clause classification
- Better legal risk scoring
- Export reports as PDF / CSV
- Multi-document comparison
- Admin dashboard
- ✅ End-to-end pipeline working
- ✅ Supports multiple file formats
- ✅ Ready for demonstration
- 🔜 Planned ML upgrades