TrustLens

Audit ML models beyond accuracy — calibration, fairness, latent health, and deployment verdicts.


Your model has 92% accuracy. It's still not safe for deployment.

Standard evaluation stops at accuracy. Accuracy measures what went right. TrustLens measures what can go wrong — in production, on underrepresented subgroups, and at high confidence.

Why Traditional Evaluation Fails

You train a model. The test set reports 92% Accuracy and a 0.95 ROC-AUC. By all traditional metrics, it is ready to ship.

But behind those numbers, silent failures are lurking:

Overconfidence: The model is "90% sure" about its predictions, but it's only right 60% of the time.
Subgroup Collapse: The aggregate accuracy is 92%, but for a specific demographic, performance drops to 40%.
Latent Bleed: In the embedding space, the model cannot distinguish between critical classes, leading to unpredictable edge-case behavior.
Confidently Wrong: The model's most severe mistakes are made with >99% confidence, bypassing human-in-the-loop safety nets.
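The first, second, and fourth failure modes above can be checked by hand with a few lines of NumPy. The following is a minimal sketch, not TrustLens code: the function names `confidence_gap`, `subgroup_accuracy`, and `confident_error_rate` are hypothetical, and the 0.99 threshold simply mirrors the ">99% confidence" example above.

```python
import numpy as np


def confidence_gap(y_true, y_pred, confidence):
    """Mean confidence minus accuracy. Positive values suggest overconfidence."""
    y_true, y_pred, confidence = map(np.asarray, (y_true, y_pred, confidence))
    accuracy = float(np.mean(y_true == y_pred))
    return float(np.mean(confidence)) - accuracy


def subgroup_accuracy(y_true, y_pred, groups):
    """Accuracy computed separately for each value of a sensitive feature."""
    y_true, y_pred, groups = map(np.asarray, (y_true, y_pred, groups))
    return {
        str(g): float(np.mean(y_true[groups == g] == y_pred[groups == g]))
        for g in np.unique(groups)
    }


def confident_error_rate(y_true, y_pred, confidence, threshold=0.99):
    """Share of all errors that were made with confidence above `threshold`."""
    y_true, y_pred, confidence = map(np.asarray, (y_true, y_pred, confidence))
    errors = y_true != y_pred
    if not errors.any():
        return 0.0
    return float(np.mean(confidence[errors] > threshold))
```

A model that reports 0.9 confidence everywhere but is right 60% of the time yields a gap of about 0.3, which is the "90% sure, 60% right" pattern described above.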

TrustLens vs. Traditional Metrics

Traditional Metrics	TrustLens Diagnostics	What It Tells You
Accuracy, F1, Precision	Calibration (ECE, Brier)	Does the model know when it's guessing?
Aggregate ROC-AUC	Fairness & Bias	Are minority groups experiencing higher failure rates?
Loss Curve	Latent Space Health	Are the internal embeddings stable and separated?
Manual Error Analysis	Failure Diagnostics	Are the errors concentrated at high confidence?
"Looks good to me"	Deployment Verdict	Is this model mathematically safe to deploy?

TrustLens surfaces all these hidden risks with a single, statistically grounded audit, outputting a machine-readable deployment verdict.
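For readers unfamiliar with the two calibration metrics named in the table, here is a rough sketch of how they are commonly computed for a binary classifier. It assumes equal-width confidence bins and a 0.5 decision threshold; TrustLens may bin or aggregate differently, and the function names are illustrative only.

```python
import numpy as np


def expected_calibration_error(y_true, y_prob, n_bins=10):
    """Binary ECE: weighted average of |accuracy - confidence| per confidence bin."""
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob, dtype=float)  # predicted P(class = 1)
    y_pred = (y_prob >= 0.5).astype(int)
    # Confidence is the probability assigned to the predicted class.
    confidence = np.where(y_pred == 1, y_prob, 1.0 - y_prob)
    correct = (y_pred == y_true).astype(float)

    # Equal-width bins over [0, 1]; each bin is (lower, upper].
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    bin_ids = np.clip(np.digitize(confidence, edges[1:-1], right=True), 0, n_bins - 1)

    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            ece += mask.mean() * abs(correct[mask].mean() - confidence[mask].mean())
    return float(ece)


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    return float(np.mean((y_prob - y_true) ** 2))
```

Both metrics are 0 for a perfectly calibrated, perfectly confident model and grow as stated confidence drifts away from observed correctness.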

Visual Evidence: TrustLens in Action

TrustLens diagnostics are powered by visual evidence. We don't just give you a score; we show you exactly why a model is failing.

The Deployment Verdict
What it is: The composite Trust Score.
Why it matters: Gives a CI/CD-ready grade.
Risk: Blocks shipping unsafe models.

Calibration
What it is: Reliability diagram.
Why it matters: Shows if the model is overconfident.
Risk: High-confidence wrong answers.

Subgroup Fairness Gaps
What it is: Error rates across demographics.
Why it matters: Uncovers hidden biases.
Risk: Regulatory failure and harm.

Latent Space Health
What it is: UMAP/t-SNE projection.
Why it matters: Visualizes class separability.
Risk: Feature collapse and instability.

Equalized Odds Violations
What it is: True vs False Positive Rates.
Why it matters: Ensures equitable outcomes.
Risk: Systemic discrimination.

Failure Analysis
What it is: Confidence distribution of errors.
Why it matters: Spots systemic failure modes.
Risk: Unpredictable production behavior.
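The equalized odds panel compares true and false positive rates across groups. As a hedged illustration of the underlying arithmetic (not the TrustLens implementation; `group_rates` and `equalized_odds_gap` are hypothetical names), the gap can be taken as the larger of the TPR spread and the FPR spread:

```python
import numpy as np


def group_rates(y_true, y_pred, groups):
    """True positive rate and false positive rate for each group (binary labels)."""
    y_true, y_pred, groups = map(np.asarray, (y_true, y_pred, groups))
    rates = {}
    for g in np.unique(groups):
        in_group = groups == g
        positives = in_group & (y_true == 1)
        negatives = in_group & (y_true == 0)
        # A rate is undefined (NaN) when a group has no positives or no negatives.
        tpr = float(np.mean(y_pred[positives] == 1)) if positives.any() else float("nan")
        fpr = float(np.mean(y_pred[negatives] == 1)) if negatives.any() else float("nan")
        rates[str(g)] = {"tpr": tpr, "fpr": fpr}
    return rates


def equalized_odds_gap(y_true, y_pred, groups):
    """Largest between-group spread in TPR or FPR; 0 means equalized odds holds."""
    rates = group_rates(y_true, y_pred, groups)
    tprs = [r["tpr"] for r in rates.values()]
    fprs = [r["fpr"] for r in rates.values()]
    tpr_gap = np.nanmax(tprs) - np.nanmin(tprs)
    fpr_gap = np.nanmax(fprs) - np.nanmin(fprs)
    return float(max(tpr_gap, fpr_gap))
```

Taking the maximum of the two spreads is one common convention; reporting both spreads separately is an equally reasonable choice.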

How TrustLens Works

TrustLens evaluates your model through four distinct diagnostic modules, combining the findings into a Trust Score (0–100).

Calibration Engine: Computes Expected Calibration Error (ECE) and Brier Score to detect confidence mismatch.
Fairness Engine: Evaluates Equalized Odds and Subgroup Performance gaps across sensitive features.
Representation Engine: Analyzes latent embedding separability (Silhouette, CKA) to ensure stable decision boundaries.
Decision Engine: Synthesizes the risks into a penalty-based Trust Score and a Ready / Blocked deployment verdict.
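To make the idea of a penalty-based score concrete, here is a toy sketch of how findings from the first three engines could be folded into a 0–100 score and a Ready / Blocked verdict. Everything numeric here is an assumption made for illustration: the weights, the 0.5 silhouette target, and the threshold of 70 are hypothetical and are not taken from TrustLens.

```python
from dataclasses import dataclass


@dataclass
class AuditFindings:
    ece: float           # Expected Calibration Error, in [0, 1]
    fairness_gap: float  # e.g. equalized odds gap, in [0, 1]
    silhouette: float    # embedding silhouette score, in [-1, 1]


def trust_score(findings, weights=(200.0, 100.0, 30.0)):
    """Start from 100 and subtract weighted penalties (hypothetical weights)."""
    w_cal, w_fair, w_rep = weights
    penalty = (
        w_cal * findings.ece
        + w_fair * findings.fairness_gap
        # Only penalize separability below an assumed target of 0.5.
        + w_rep * max(0.0, 0.5 - findings.silhouette)
    )
    return max(0.0, min(100.0, 100.0 - penalty))


def verdict(score, threshold=70.0):
    """Map a score to a deployment verdict using an assumed threshold."""
    return "Ready" if score >= threshold else "Blocked"
```

With these illustrative weights, a model with an ECE of 0.05 and a fairness gap of 0.1 scores about 80 and passes, while one with an ECE of 0.3 and a gap of 0.5 is clamped to 0 and blocked.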

The Prediction Resolver Architecture

You don't need to write boilerplate to extract probabilities. TrustLens features a Prediction Resolver Architecture that automatically detects your framework and standardizes the output.

We natively support:

scikit-learn (ClassifierMixin estimators)
XGBoost (XGBClassifier, Booster)
LightGBM (LGBMClassifier, Booster)
CatBoost (CatBoostClassifier)
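The general idea behind such a resolver can be sketched with duck typing. This is an illustrative simplification, not the TrustLens resolver: `resolve_probabilities` is a hypothetical name, and the fallback branch assumes that a low-level booster's `predict` already returns probabilities, which depends on the library and objective (a real XGBoost `Booster`, for instance, also expects a `DMatrix` rather than a raw array).

```python
import numpy as np


def resolve_probabilities(model, X):
    """Return class probabilities as an (n_samples, n_classes) array."""
    if hasattr(model, "predict_proba"):
        # scikit-learn style estimators, including the sklearn wrappers
        # shipped by XGBoost, LightGBM, and CatBoost.
        proba = np.asarray(model.predict_proba(X), dtype=float)
    elif hasattr(model, "predict"):
        # Assumption: predict() yields probabilities for this model type.
        proba = np.asarray(model.predict(X), dtype=float)
    else:
        raise TypeError(f"Cannot extract probabilities from {type(model).__name__}")

    if proba.ndim == 1:
        # Binary case: expand P(class = 1) into two columns.
        proba = np.column_stack([1.0 - proba, proba])
    return proba
```

Standardizing on a two-dimensional array means downstream calibration and fairness code never has to branch on the framework that produced the predictions.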

Scientific Validation

TrustLens is more than a visualization package—it is a statistically grounded diagnostic framework. We have systematically validated its behavior across 6 model architectures and multiple data corruption scenarios (noise, imbalance, bias).

Key Finding: TrustLens empirically decouples Accuracy from Trust, accurately flagging high-accuracy models that exhibit high reliability risks (the "Overconfidence Zone").
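One way to reproduce a "noise" corruption scenario on your own data is to flip a fixed fraction of binary labels before training and then compare audits. This is a sketch under stated assumptions: the benchmark's actual corruption procedure is not described here, and `flip_labels` is a hypothetical helper.

```python
import numpy as np


def flip_labels(y, noise_rate, seed=0):
    """Return a copy of binary labels y with a fraction `noise_rate` flipped."""
    rng = np.random.default_rng(seed)
    y_noisy = np.asarray(y).copy()
    n_flip = int(round(noise_rate * len(y_noisy)))
    # Choose distinct positions so exactly n_flip labels change.
    idx = rng.choice(len(y_noisy), size=n_flip, replace=False)
    y_noisy[idx] = 1 - y_noisy[idx]
    return y_noisy
```

Training on `flip_labels(y_train, 0.2)` and auditing on clean test labels gives a simple way to see whether the diagnostics move while accuracy stays comparatively high.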

View the Model Zoo Benchmark

Community & Project Evolution

TrustLens is an actively evolving framework driven by engineering discussions and RFCs (Requests for Comments). We treat evaluation as a first-class architectural problem.

Active Architectural Debates & Milestones:

RFC #145: Regression Trust Score — Proposing the scoring framework for regression models.
PR #147: Implements RFC #145 (Regression Trust Score) — Core engine execution for regression contexts.
PR #102: Centralize plotting style — Unifying visual identity across the framework.
PR #68: Fairness multi-feature support — Scaling bias detection across complex datasets.

The Evolution:

v0.1: MVP — Core metrics and visualizations.
v0.4: Framework-Agnostic Core — Native support for XGBoost, LightGBM, CatBoost.
v0.5 (Current): Regression Support, Model Zoo Benchmark, Multiclass Calibration.
v0.6: In Progress — Policy Profiles, TrustComparison, Deep Learning Backends.
v1.0: Planned — CI/CD enterprise integration and Web Dashboards.

Quickstart

Install TrustLens (use [full] for extended plotting and framework support):

pip install trustlens
pip install "trustlens[full]"

Run a one-line audit on a built-in dataset to see why high accuracy isn't the full story:

from trustlens import quick_analyze

quick_analyze(dataset="breast_cancer")

Or run a comprehensive audit on your own model:

from trustlens import analyze
from xgboost import XGBClassifier

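# X_train, y_train, X_test, y_test, and gender_test are assumed to come from your own train/test split
model = XGBClassifier().fit(X_train, y_train)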

# TrustLens auto-detects the XGBoost model and extracts probabilities
report = analyze(
    model=model,
    X=X_test,
    y_true=y_test,
    sensitive_features={"gender": gender_test}
)

# Render the rich HTML dashboard or visual plots
report.show()

# Save the report to disk so a CI/CD step can gate on it
report.save("trust_report/")
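A CI gate ultimately comes down to turning the audit result into a process exit code. The sketch below shows that step in isolation. It assumes a JSON summary containing a `trust_score` field and a minimum score of 70. Both are hypothetical and should be adapted to whatever the saved report actually contains.

```python
import json


def exit_code_from_summary(summary_json, minimum=70.0):
    """Return 0 if the audit passes the minimum score, 1 otherwise.

    `summary_json` is a JSON string with a hypothetical "trust_score" field.
    """
    summary = json.loads(summary_json)
    score = float(summary["trust_score"])
    return 0 if score >= minimum else 1
```

In a pipeline step, passing the returned value to `sys.exit` makes the job fail whenever the score falls below the chosen minimum.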

Deep Dive Documentation

The README is just the tip of the iceberg. Explore the full TrustLens documentation site for methodology, API references, and architectural deep-dives:

🌐 Documentation Home
🏛️ Architecture Guide
📖 API Reference

🤝 Contributing

TrustLens is an open ecosystem. We welcome contributions—whether it's new diagnostic plugins, better visualizers, or core engine improvements.

→ Contributing Guide · Open an Issue

A massive thank you to our contributors.

Citation

@software{trustlens2026,
  author = {Shahid Ul Islam},
  title  = {TrustLens: Audit ML models beyond accuracy},
  year   = {2026},
  url    = {https://github.com/Khanz9664/TrustLens}
}
