🏥 Hospital Audit Trained Model — Phi-3 LoRA Fine-tune

A fine-tuned clinical AI auditing model built on Microsoft Phi-3-mini-4k-instruct using LoRA (Low-Rank Adaptation) with 8-bit quantization. Given a JSON report of a clinical ML model's performance metrics, it classifies that model's health and generates a human-readable explanation.

Overview

Clinical ML models deployed in hospital settings need continuous monitoring. This project automates that auditing process by fine-tuning Phi-3-mini to act as a clinical AI auditor — it reads a model's performance metrics (AUC, ECE, drift, label shift, etc.) and returns a structured audit verdict with a category and explanation.

Example output:

Category: Calibration Failure
Explanation: High calibration error (ECE 0.2781) despite reasonable discrimination (AUC 0.863).
             Recommend recalibration (Platt scaling / isotonic) and threshold review.

Project Structure

Hospital-Audit-Trained-Model/
│
├── data/
│   ├── raw/
│   │   └── audit_dataset_v2_5000.json       # Raw dataset (5000 samples)
│   ├── processed/
│   │   ├── train.jsonl                       # 80% training split
│   │   └── test.jsonl                        # 20% test split
│   └── templates/
│       ├── prompt_template.txt               # Prompt format template
│       └── response_template.txt             # Response format template
│
├── training/
│   ├── dataset_builder.py                    # Builds train/test JSONL from raw data
│   ├── lora_config.py                        # LoRA hyperparameter configuration
│   └── train.py                              # Main training script
│
├── inference/
│   ├── run_inference.py                      # Single sample inference
│   └── Metrics_Test.py                       # Batch evaluation on test set
│
├── models/
│   └── phi3-auditor-lora-8bit/              # Saved LoRA adapter + tokenizer
│       ├── adapter_config.json
│       ├── adapter_model.safetensors
│       ├── tokenizer files...
│       ├── checkpoint-500/
│       └── checkpoint-675/
│
└── merge_lora.py                             # Merges LoRA adapter into base model

How It Works

Input: A JSON object containing clinical ML model metrics (AUC, accuracy, precision, recall, F1, ECE, Brier score, drift, missing rate, label shift, etc.)
Prompt: The metrics are injected into a structured prompt with a system instruction for the auditor role.
Output: The model returns a Category: line with the health classification and an Explanation: line with the justification and actionable recommendations.
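Because every verdict uses this two-field layout, it can be split back into structured data. A minimal sketch with regular expressions, assuming the category fits on one line and everything after Explanation: belongs to the explanation; parse_audit_output is a hypothetical helper, not a function shipped in inference/.

```python
import re

# "Category: <label>" on its own line.
_CATEGORY_RE = re.compile(r"^\s*Category:\s*(.+?)\s*$", re.MULTILINE)
# Everything after "Explanation:" up to the end of the text.
_EXPLANATION_RE = re.compile(r"Explanation:\s*(.*)", re.DOTALL)


def parse_audit_output(text: str) -> dict:
    """Split a generated audit verdict into its category and explanation.

    A field the model did not produce is returned as None.
    """
    category = _CATEGORY_RE.search(text)
    explanation = _EXPLANATION_RE.search(text)
    return {
        "category": category.group(1) if category else None,
        # Collapse wrapped, indented continuation lines into single spaces.
        "explanation": " ".join(explanation.group(1).split()) if explanation else None,
    }
```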

Prompt format:

<|system|>
You are an AI auditor analyzing clinical model performance reports.
<|user|>
Instruction: Analyze the clinical model report and classify its health.

Report:
{ ...metrics JSON... }
<|assistant|>
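A minimal sketch of filling this template from a metrics dictionary. The helper name build_prompt and the json.dumps(indent=2) serialization are assumptions; data/templates/prompt_template.txt is the authority on the exact whitespace the model saw during training.

```python
import json

SYSTEM_PROMPT = "You are an AI auditor analyzing clinical model performance reports."
INSTRUCTION = "Analyze the clinical model report and classify its health."


def build_prompt(metrics: dict) -> str:
    """Render a metrics dict into the auditor prompt format."""
    # Assumption: metrics are serialized with indent=2; dataset_builder.py may differ.
    report = json.dumps(metrics, indent=2)
    return (
        f"<|system|>\n{SYSTEM_PROMPT}\n"
        f"<|user|>\nInstruction: {INSTRUCTION}\n\n"
        f"Report:\n{report}\n"
        "<|assistant|>\n"  # generation starts right after this marker
    )


print(build_prompt({"auc": 0.863, "ece": 0.278}))
```

Phi-3's stock chat template also closes each turn with an <|end|> token; the format above omits it, so inference prompts should follow whatever the fine-tuning data actually used rather than the stock template.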

Dataset

Source: audit_dataset_v2_5000.json — a synthetic dataset of 5,000 clinical model audit reports generated on 2025-11-17.
Fields per record: metrics (JSON object), audit_label (category string), explanation (natural language justification).
Split: 80/20 train/test using sklearn.model_selection.train_test_split with random_state=42.
Processed format: JSONL with three fields — instruction, input (metrics JSON), output (category + explanation).
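dataset_builder.py performs this conversion together with the 80/20 split; the sketch below covers only the record-to-JSONL step. The output layout (Category: and Explanation: on separate lines) and the json.dumps(indent=2) serialization of the metrics are assumptions based on the example output, and the helper names are hypothetical.

```python
import json
from pathlib import Path

INSTRUCTION = "Analyze the clinical model report and classify its health."


def to_training_record(raw: dict) -> dict:
    """Map one raw audit record onto the instruction/input/output schema."""
    return {
        "instruction": INSTRUCTION,
        # Metrics stored as a JSON string so the prompt sees them verbatim.
        "input": json.dumps(raw["metrics"], indent=2),
        # Assumed target layout, mirroring the model's verdict format.
        "output": f"Category: {raw['audit_label']}\nExplanation: {raw['explanation']}",
    }


def to_jsonl(records) -> str:
    """One JSON object per line; embedded newlines are escaped by json.dumps."""
    return "".join(json.dumps(r, ensure_ascii=False) + "\n" for r in records)


def write_jsonl(records, path) -> None:
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(to_jsonl(records), encoding="utf-8")
```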

To rebuild the processed dataset:

cd training
python dataset_builder.py

Model Details

Property	Value
Base Model	`microsoft/Phi-3-mini-4k-instruct`
Fine-tuning Method	LoRA (PEFT)
Training Quantization	8-bit (`BitsAndBytesConfig`)
Merged Model Precision	FP16 (F16 safetensors)
Model Size	~3.8B parameters
Model Weight Files	2 shards (`model-00001-of-00002.safetensors` + `model-00002-of-00002.safetensors`)
Total Repo Size	7.65 GB
PEFT Version	0.18.0
LoRA Rank (`r`)	16
LoRA Alpha	32
LoRA Dropout	0.05
Target Modules	`q_proj`, `k_proj`, `v_proj`, `o_proj`
Task Type	`CAUSAL_LM`
HuggingFace Model	PhantomAjusshi/phi3-auditor-merged
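One detail worth checking against adapter_config.json: Phi-3's modeling code (both the transformers implementation and Microsoft's modeling_phi3.py) fuses the query, key and value projections into a single qkv_proj layer, so q_proj, k_proj and v_proj have no counterpart in the base model. The sketch below reproduces the suffix rule PEFT uses when target_modules is a list, applied to Phi-3-style layer names; matched_lora_targets is a hypothetical helper.

```python
def matched_lora_targets(module_names, target_modules):
    """Return the target names that match at least one module.

    A module matches if its full name equals the target or ends with
    "." + target (PEFT's rule for list-valued target_modules).
    """
    matched = set()
    for name in module_names:
        for target in target_modules:
            if name == target or name.endswith("." + target):
                matched.add(target)
    return matched


# Linear-layer names in one Phi-3 decoder layer (fused QKV and gate/up projections).
phi3_layer = [
    "model.layers.0.self_attn.qkv_proj",
    "model.layers.0.self_attn.o_proj",
    "model.layers.0.mlp.gate_up_proj",
    "model.layers.0.mlp.down_proj",
]
print(sorted(matched_lora_targets(phi3_layer, ["q_proj", "k_proj", "v_proj", "o_proj"])))
# ['o_proj']
```

PEFT does not necessarily fail when only some targets match, so an adapter configured with the list above may end up adapting only o_proj; listing qkv_proj alongside o_proj would cover the fused attention projections.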

Training

Hyperparameters:

Parameter	Value
Epochs	3
Batch Size	4
Gradient Accumulation Steps	4
Effective Batch Size	16
Learning Rate	1e-4
Warmup Ratio	0.1
Max Sequence Length	512
Precision	FP16 (if CUDA available)
Checkpointing	Every 500 steps

Training loss progression:

Step	Epoch	Loss
50	0.22	1.6228
100	0.44	0.6565
150	0.67	0.4436
500	2.22	0.4109
675	3.00	~0.410
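The epoch column implies 225 optimizer steps per epoch (step 50 falls at epoch 0.22, step 675 at epoch 3.00). The sketch below works through how the Hugging Face Trainer turns the hyperparameters into step counts; it assumes a single GPU and the Trainer's default of keeping the last partial batch, and training_schedule is a hypothetical helper, not part of train.py.

```python
import math


def training_schedule(num_examples, per_device_batch=4, grad_accum=4,
                      epochs=3, warmup_ratio=0.1):
    """Approximate Hugging Face Trainer step counts on a single device."""
    # The dataloader keeps the last partial batch (drop_last=False by default).
    batches_per_epoch = math.ceil(num_examples / per_device_batch)
    # One optimizer step per grad_accum batches; a trailing remainder is ignored here.
    steps_per_epoch = max(batches_per_epoch // grad_accum, 1)
    total_steps = steps_per_epoch * epochs
    return {
        "effective_batch": per_device_batch * grad_accum,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        # warmup_ratio is converted to whole steps by rounding up.
        "warmup_steps": math.ceil(total_steps * warmup_ratio),
    }


print(training_schedule(3600))
# {'effective_batch': 16, 'steps_per_epoch': 225, 'total_steps': 675, 'warmup_steps': 68}
```

Under these assumptions roughly 3,600 examples per epoch reproduce the logged 675 steps, whereas the 4,000 records of an 80% split of 5,000 would give 750 steps, so the number of examples train.py actually consumes per epoch may be worth confirming.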

To train from scratch:

cd training
python train.py

Note: Requires a CUDA-enabled GPU with at least 8GB VRAM for 8-bit training.
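A rough way to sanity-check the memory figures in this README: the weights alone occupy parameters × bytes per parameter. The sketch assumes ~3.8B parameters for Phi-3-mini and decimal gigabytes; activations, LoRA gradients and optimizer state, and CUDA overhead are not included.

```python
def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Memory needed to hold the weights alone, in decimal gigabytes (1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


# Assumption: Phi-3-mini has roughly 3.8 billion parameters.
PHI3_MINI_PARAMS = 3.8e9

for label, nbytes in [("fp32", 4), ("fp16", 2), ("int8", 1)]:
    print(f"{label}: ~{weight_memory_gb(PHI3_MINI_PARAMS, nbytes):.1f} GB")
# fp32: ~15.2 GB
# fp16: ~7.6 GB
# int8: ~3.8 GB
```

The FP16 estimate lines up with the two merged shards (4.97 GB + 2.67 GB), and the 8-bit estimate shows how much of an 8 GB card remains for activations and training state.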

Inference

Run a single inference using the merged model from HuggingFace:

python inference/run_inference.py

The script loads the model directly from HuggingFace at PhantomAjusshi/phi3-auditor-merged. You can also use it programmatically:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

MODEL_PATH = "PhantomAjusshi/phi3-auditor-merged"

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,
    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
    device_map="auto" if torch.cuda.is_available() else None,
    trust_remote_code=True   # required for Phi-3 custom modeling code
)

report = """{
  "auc": 0.863,
  "accuracy": 0.83,
  "precision": 0.79,
  "recall": 0.69,
  "f1": 0.79,
  "ece": 0.278,
  "brier": 0.263,
  "drift": 0.03,
  "missing_rate": 0.003,
  "label_shift": 0.06,
  "pos_rate": 0.10,
  "data_integrity_issues": 0
}"""

# Mirrors the prompt format documented under How It Works.
prompt = (
    "<|system|>\nYou are an AI auditor analyzing clinical model performance reports.\n"
    "<|user|>\nInstruction: Analyze the clinical model report and classify its health.\n\n"
    f"Report:\n{report}\n"
    "<|assistant|>\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.inference_mode():
    outputs = model.generate(
        **inputs,
        max_new_tokens=400,
        do_sample=True,          # temperature/top_p are ignored without sampling
        temperature=0.7,
        top_p=0.9,
        repetition_penalty=1.2,
    )

# Decode only the newly generated tokens, not the echoed prompt.
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))

Note: trust_remote_code=True is required because the HuggingFace repo includes custom Phi-3 modeling files (modeling_phi3.py, configuration_phi3.py).

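Generation settings: max_new_tokens=400, temperature=0.7, top_p=0.9, repetition_penalty=1.2. Note that temperature and top_p only take effect when do_sample=True; otherwise generate() decodes greedily and ignores them.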
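To make these settings concrete, here is a toy pure-Python sketch of what they do to one step's next-token scores, applied in the order transformers uses: repetition penalty, then temperature, then top-p (nucleus) filtering. The logits are made up and the helpers are hypothetical illustrations rather than the library's implementation; in transformers the repetition penalty applies to every token already in the sequence, prompt included.

```python
import math


def apply_repetition_penalty(logits, seen_ids, penalty):
    """Make already-seen tokens less likely (divide positive, multiply negative logits)."""
    out = list(logits)
    for i in set(seen_ids):
        out[i] = out[i] * penalty if out[i] < 0 else out[i] / penalty
    return out


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the most likely tokens until their cumulative probability reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}  # renormalized over the nucleus


def next_token_distribution(logits, seen_ids, temperature=0.7, top_p=0.9,
                            repetition_penalty=1.2):
    penalized = apply_repetition_penalty(logits, seen_ids, repetition_penalty)
    scaled = [x / temperature for x in penalized]  # T < 1 sharpens the distribution
    return top_p_filter(softmax(scaled), top_p)


# Toy vocabulary of 4 tokens; token 0 already appears in the sequence.
dist = next_token_distribution([2.0, 1.0, 0.5, -1.0], seen_ids=[0])
print(sorted(dist))  # [0, 1, 2]: token 3 falls outside the 0.9 nucleus
```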

Evaluation

Run batch evaluation on the test set to get classification metrics:

python inference/Metrics_Test.py

This generates predictions for all records in data/processed/test.jsonl, computes weighted precision, recall, F1, and accuracy against ground-truth Category: labels, and saves full results to inference_results.json.
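The requirements list scikit-learn for these metrics. The dependency-free sketch below spells out what support-weighted averaging (scikit-learn's average="weighted") computes, assuming the predicted category has already been extracted from each generation's Category: line; weighted_classification_report is a hypothetical name, not the script's API.

```python
from collections import Counter


def weighted_classification_report(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall and F1.

    Per-class scores are averaged with weights proportional to each class's
    count in y_true; a 0/0 score is treated as 0.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    pairs = list(zip(y_true, y_pred))
    support = Counter(y_true)
    n = len(pairs)
    precision = recall = f1 = 0.0
    # Labels that only appear in predictions get weight 0 but still count as errors.
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and y == label for t, y in pairs)
        fp = sum(t != label and y == label for t, y in pairs)
        fn = sum(t == label and y != label for t, y in pairs)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        weight = support[label] / n
        precision += weight * prec
        recall += weight * rec
        f1 += weight * f
    accuracy = sum(t == y for t, y in pairs) / n
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

Weighted recall always equals accuracy under this definition, so precision and F1 carry the extra information.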

Merging LoRA Weights

To merge the LoRA adapter into the base model for standalone deployment (no PEFT dependency at inference time):

python merge_lora.py

Output is saved to models/phi3-auditor-merged/. The merged model is published on HuggingFace as PhantomAjusshi/phi3-auditor-merged and consists of:

model-00001-of-00002.safetensors (4.97 GB)
model-00002-of-00002.safetensors (2.67 GB)
modeling_phi3.py + configuration_phi3.py (custom Phi-3 code, required for trust_remote_code=True)
Tokenizer files (tokenizer.json, tokenizer.model, tokenizer_config.json, etc.)

Note: Large model files (.safetensors, .bin, .pt) are excluded from this repository via .gitignore. Use the HuggingFace hosted weights directly or retrain and merge locally.
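For reference, a minimal sketch of what a PEFT merge step typically looks like; merge_lora.py itself may differ. It assumes the adapter and tokenizer live in models/phi3-auditor-lora-8bit/ (per the project tree) and reloads the base model in FP16 rather than 8-bit, so the LoRA update is added to unquantized weights. The function name merge_lora_adapter is hypothetical.

```python
BASE_MODEL = "microsoft/Phi-3-mini-4k-instruct"
ADAPTER_DIR = "models/phi3-auditor-lora-8bit"
OUTPUT_DIR = "models/phi3-auditor-merged"


def merge_lora_adapter(base_model=BASE_MODEL, adapter_dir=ADAPTER_DIR,
                       output_dir=OUTPUT_DIR):
    """Fold a LoRA adapter into FP16 base weights and save a standalone model."""
    # Heavy dependencies are imported only when the merge actually runs.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Unquantized FP16 base weights, so merging does not re-quantize anything.
    base = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype=torch.float16)
    # Attach the trained adapter, then add its low-rank update into the base weights.
    model = PeftModel.from_pretrained(base, adapter_dir)
    merged = model.merge_and_unload()
    merged.save_pretrained(output_dir, safe_serialization=True)
    # The project tree stores the tokenizer alongside the adapter.
    AutoTokenizer.from_pretrained(adapter_dir).save_pretrained(output_dir)
    return output_dir
```

Calling merge_lora_adapter() downloads the base model weights and needs enough CPU or GPU memory to hold them in FP16; older transformers releases may also need trust_remote_code=True to load Phi-3.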

Requirements

pip install torch transformers peft bitsandbytes datasets scikit-learn accelerate

Package	Role
`transformers`	Base model loading, tokenizer, training
`peft`	LoRA adapter configuration and application
`bitsandbytes`	8-bit quantization
`datasets`	JSONL dataset loading
`scikit-learn`	Train/test split and evaluation metrics
`accelerate`	Device mapping and mixed precision
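Since scikit-learn is imported as sklearn, a quick pre-flight check has to map pip names to import names. A small standard-library sketch; the mapping mirrors the pip command above and missing_packages is a hypothetical helper.

```python
import importlib.util

# pip package name -> import name (they differ for scikit-learn).
REQUIRED = {
    "torch": "torch",
    "transformers": "transformers",
    "peft": "peft",
    "bitsandbytes": "bitsandbytes",
    "datasets": "datasets",
    "scikit-learn": "sklearn",
    "accelerate": "accelerate",
}


def missing_packages(required=REQUIRED):
    """Return the pip names whose import name cannot be located."""
    return [pip for pip, module in required.items()
            if importlib.util.find_spec(module) is None]


print(missing_packages() or "all requirements importable")
```

find_spec only locates each package without importing it, which avoids the CUDA setup that importing bitsandbytes can trigger.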

License

This project is for research and educational purposes. The base model is subject to Microsoft's Phi-3 license.
