Skip to content

SudNitro23/gliner-PII

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

8 Commits
 
 

Repository files navigation

GLiNER-PII Evaluation Platform

Local MVP for evaluating NVIDIA GLiNER-PII predictions against manually annotated PDF ground truth.

What This Repo Contains

  • backend/: FastAPI service for PDF extraction, model inference, taxonomy mapping, evaluation, and SQLite persistence.
  • frontend/: Next.js UI for uploading a PDF and ground-truth CSV, running an evaluation, and viewing metrics.
  • scripts/: local helper scripts.

This is intentionally local-first. Docker, AWS, S3, DynamoDB, and production deployment are future scope.

MVP Architecture

User
  -> Next.js frontend
  -> FastAPI backend
  -> PyMuPDF PDF text extraction
  -> GLiNER-PII prediction
  -> Nitro taxonomy mapping
  -> Ground truth CSV comparison
  -> SQLite evaluation record
  -> Next.js results view

The frontend is responsible for selecting a PDF, selecting a ground-truth CSV, choosing a model threshold, and displaying evaluation output.

The backend owns the evaluation pipeline. It extracts text from the PDF, runs GLiNER predictions, maps model labels into Nitro's PII taxonomy, loads the manually annotated CSV, computes metrics, and stores the run locally.

Project Flow

  1. Start the FastAPI service.
  2. Start the Next.js app.
  3. Upload one source PDF and one ground-truth CSV.
  4. The API extracts PDF text and runs the PII detector.
  5. Predictions are normalized into Nitro taxonomy labels.
  6. Predictions are compared with the CSV annotations.
  7. Precision, recall, F1, TP, FP, and FN results are returned to the UI.

Expected Ground Truth CSV

The evaluator accepts flexible column names, but the simplest format is:

text,label
"Jane Smith",PERSON_NAME
"jane@example.com",EMAIL_ADDRESS

Supported text columns include text, entity_text, value, span, and pii_text. Supported label columns include label, entity_type, type, taxonomy, and pii_type.

Local Setup

Copy the example environment file:

cp .env.example .env

Set GLINER_MODEL_PATH to the local Hugging Face model checkout. If this repo sits next to the model folder, that path will usually be something like:

/Users/you/projects/gliner-PII

Install and run the backend:

cd backend
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
uvicorn app.main:app --reload --port 8000

Install and run the frontend:

cd frontend
npm install
npm run dev

Then open:

http://localhost:3000

Current MVP Behavior

The backend will use the configured GLiNER model when the gliner package and model path are available. If the model cannot be loaded, the service falls back to a small regex-based development detector for emails, phone numbers, and SSN-like values so the UI and evaluation flow can still be tested.

Evaluation is currently exact normalized text + label matching. That keeps the MVP transparent and easy to inspect before adding span overlap, page-aware matching, or fuzzy matching.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors