@dyra-12 dyra-12 commented Jan 26, 2026

Summary

This PR adds a new evaluation metric suite, human_ai_trust, to Hugging Face Evaluate.

The goal of this metric is to support human-centered evaluation of AI systems by operationalizing:

  • trust calibration
  • belief updating
  • uncertainty sensitivity
  • asymmetric harm from overconfident errors
  • explanation–confidence alignment

Unlike traditional metrics focused solely on predictive performance, this metric surfaces how users interpret, trust, and act on model outputs under uncertainty.


What's Included

This PR introduces:

  • A new metric module: human_ai_trust
  • Core metrics (a computational sketch follows this list):
    • Expected Trust Error (ETE) — mean trust-confidence mismatch
    • Trust Sensitivity Index (TSI) — correlation between confidence and trust
    • Belief Shift Magnitude (BSM) — mean posterior-prior difference
    • Overconfidence Penalty (OCP) — weighted cost of confident errors
    • Normalized Overconfidence Penalty — bounded [0,1] variant
    • Explanation-Confidence Alignment (ECA) — correlation between explanation complexity and model confidence
  • Full unit test coverage
  • Comprehensive documentation
  • An example notebook demonstrating real usage
  • A companion reference dataset (see the Companion Dataset section below)
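
To make the one-line definitions above concrete, here is a minimal NumPy sketch using the toy inputs from the usage example further down. It is my own illustration: the exact formulas in the module, in particular the weighting inside OCP and the normalization used for its bounded variant, may differ.

import numpy as np

# Toy inputs matching the usage example below.
predictions = np.array([1, 0, 1])
references = np.array([1, 1, 0])
confidences = np.array([0.9, 0.7, 0.8])
trust = np.array([0.85, 0.6, 0.75])
priors = np.array([0.3, 0.4, 0.5])
posteriors = np.array([0.6, 0.5, 0.7])
complexity = np.array([10, 20, 15])
correct = (predictions == references).astype(float)

# Expected Trust Error (ETE): mean trust-confidence mismatch.
ete = float(np.mean(np.abs(trust - confidences)))

# Trust Sensitivity Index (TSI): correlation between confidence and trust.
tsi = float(np.corrcoef(confidences, trust)[0, 1])

# Belief Shift Magnitude (BSM): mean posterior-prior difference.
bsm = float(np.mean(posteriors - priors))

# Overconfidence Penalty (OCP): confidence-weighted cost of confident errors.
ocp = float(np.mean(confidences * (1.0 - correct)))

# Explanation-Confidence Alignment (ECA): correlation between explanation
# complexity and confidence.
eca = float(np.corrcoef(complexity, confidences)[0, 1])

# The normalized OCP variant is omitted here because the exact [0, 1]
# normalization used by the module is not spelled out in this description.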

Motivation

Accuracy alone is insufficient for evaluating AI systems in high-stakes or vulnerable contexts.

From a human–AI interaction perspective:

  • a confident but wrong prediction is more damaging than a hesitant error
  • users may over-trust highly confident systems (automation bias)
  • users may ignore uncertainty signals
  • beliefs may shift even when the model is wrong
  • explanation style may distort trust

This metric suite provides theory-grounded, computational signals for evaluating these human-centered failure modes.


Design Philosophy

This metric is:

  • Human-centered: grounded in cognitive science and HCI
  • Descriptive, not causal: intended for evaluation, not causal inference
  • Modular and composable: fits Hugging Face's metric API
  • NaN-safe and edge-case robust
  • Domain-agnostic: applicable across classification, QA, NLI, and generation

Related Work

This metric complements existing evaluation approaches:

  • Traditional metrics (accuracy, F1, AUROC): measure predictive performance
  • Calibration metrics (ECE, Brier score): measure probability alignment
  • Human-AI trust metrics (this PR): measure human interpretation and decision-making under uncertainty

Key differences:

  • Focuses on human beliefs rather than just model probabilities
  • Captures asymmetric harm from overconfident errors
  • Evaluates explanation-confidence alignment
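
To make the first difference concrete: a calibration score such as the Brier score can be computed from model probabilities and outcomes alone, whereas a trust metric such as ETE additionally requires human trust ratings. A toy contrast (illustrative only, not code from this PR):

import numpy as np

confidences = np.array([0.9, 0.7, 0.8])    # model's stated probability of being correct
correct = np.array([1.0, 0.0, 0.0])        # whether each prediction was right
trust = np.array([0.85, 0.6, 0.75])        # human trust ratings

# Brier-style calibration: model probabilities vs. observed correctness.
brier = float(np.mean((confidences - correct) ** 2))

# Trust calibration (ETE-style): human trust vs. model confidence,
# regardless of whether the model happened to be right.
ete = float(np.mean(np.abs(trust - confidences)))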

Usage Example

import evaluate

metric = evaluate.load("human_ai_trust")

out = metric.compute(
    predictions=[1, 0, 1],
    references=[1, 1, 0],
    confidences=[0.9, 0.7, 0.8],
    human_trust_scores=[0.85, 0.6, 0.75],
    belief_priors=[0.3, 0.4, 0.5],
    belief_posteriors=[0.6, 0.5, 0.7],
    explanation_complexity=[10, 20, 15],
)

print(out)
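
Because the test suite covers missing optional inputs, the human-side arguments can presumably be omitted; which arguments are strictly required, and the exact keys of the returned dictionary, are defined by the module rather than shown here. A minimal call might look like:

# Minimal sketch; which arguments are optional is determined by the
# module's own input definitions, not by this example.
out = metric.compute(
    predictions=[1, 0, 1],
    references=[1, 1, 0],
    confidences=[0.9, 0.7, 0.8],
)
print(out)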

Testing

All metrics are fully unit-tested with 100% code coverage, including:

  • Zero-variance inputs
  • Missing optional inputs
  • Constant confidence values
  • All-correct and all-wrong predictions
  • Edge cases (empty arrays, single samples, etc.)

Run locally:

pytest metrics/human_ai_trust --cov=metrics/human_ai_trust
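
For illustration, a test in the spirit of the zero-variance case above might look like the following (paraphrased, not copied from the PR's test suite):

import evaluate

def test_constant_confidences_do_not_raise():
    metric = evaluate.load("human_ai_trust")
    # Zero-variance confidences make correlation-based metrics degenerate;
    # the module is expected to handle this without raising.
    out = metric.compute(
        predictions=[1, 0, 1],
        references=[1, 1, 0],
        confidences=[0.8, 0.8, 0.8],
        human_trust_scores=[0.5, 0.6, 0.7],
        belief_priors=[0.3, 0.4, 0.5],
        belief_posteriors=[0.6, 0.5, 0.7],
        explanation_complexity=[10, 20, 15],
    )
    # How degenerate cases are reported (NaN, None, or a fallback value) is
    # up to the implementation; here we only check that a result comes back.
    assert out is not None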

Companion Dataset

A small, theory-grounded demo dataset is available at:

https://huggingface.co/datasets/Dyra1204/human_ai_trust_demo

It demonstrates:

  • trust calibration
  • belief updating
  • uncertainty communication
  • explanation–confidence alignment
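
A loading sketch, assuming the dataset exposes columns with the same names as the compute() arguments above (the actual schema is documented on the dataset card):

from datasets import load_dataset
import evaluate

# Split and column names below are assumptions; check the dataset card.
ds = load_dataset("Dyra1204/human_ai_trust_demo", split="train")
metric = evaluate.load("human_ai_trust")

out = metric.compute(
    predictions=ds["predictions"],
    references=ds["references"],
    confidences=ds["confidences"],
    human_trust_scores=ds["human_trust_scores"],
    belief_priors=ds["belief_priors"],
    belief_posteriors=ds["belief_posteriors"],
    explanation_complexity=ds["explanation_complexity"],
)
print(out)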

Limitations

These metrics are intended as descriptive human-centered evaluation signals.

They do not:

  • replace task performance metrics
  • infer causal effects
  • evaluate explanation faithfulness
  • measure fairness or bias
  • substitute for real human-subjects data

Checklist

  • New metric module added
  • Full unit test coverage
  • Documentation (README.md with formulas and examples)
  • Example notebook
  • Companion dataset published
  • NaN-safe and edge-case handling
  • Follows Hugging Face Evaluate API conventions
  • Maintainer review

Feedback Welcome

I would very much welcome feedback on:

  • API design
  • metric naming
  • default behaviors
  • edge-case handling
  • documentation clarity

I'm happy to adjust the implementation to better align with Hugging Face Evaluate conventions.
