@dyra-12 dyra-12 commented Jan 26, 2026

Summary

This PR adds a new evaluation metric suite, human_ai_trust, to Hugging Face Evaluate.

The goal of this metric is to support human-centered evaluation of AI systems by operationalizing:

  • trust calibration
  • belief updating
  • uncertainty sensitivity
  • asymmetric harm from overconfident errors
  • explanation–confidence alignment

Unlike traditional metrics focused solely on predictive performance, this metric surfaces how users interpret, trust, and act on model outputs under uncertainty.


What's Included

This PR introduces:

  • A new metric module: human_ai_trust
  • Core metrics (a computational sketch follows this list):
    • Expected Trust Error (ETE) — mean trust-confidence mismatch
    • Trust Sensitivity Index (TSI) — correlation between confidence and trust
    • Belief Shift Magnitude (BSM) — mean posterior-prior difference
    • Overconfidence Penalty (OCP) — weighted cost of confident errors
    • Normalized Overconfidence Penalty — bounded [0,1] variant
    • Explanation-Confidence Alignment (ECA) — correlation between explanation complexity and model confidence
  • Full unit test coverage
  • Comprehensive documentation
  • An example notebook demonstrating real usage
  • A companion reference dataset (see the Companion Dataset section below)
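
To make the one-line definitions above concrete, here is a minimal NumPy sketch using the toy inputs from the usage example further down. It is my own illustration: the exact formulas in the module, in particular the weighting inside OCP and the normalization used for its bounded variant, may differ.

import numpy as np

# Toy inputs matching the usage example below.
predictions = np.array([1, 0, 1])
references = np.array([1, 1, 0])
confidences = np.array([0.9, 0.7, 0.8])
trust = np.array([0.85, 0.6, 0.75])
priors = np.array([0.3, 0.4, 0.5])
posteriors = np.array([0.6, 0.5, 0.7])
complexity = np.array([10, 20, 15])
correct = (predictions == references).astype(float)

# Expected Trust Error (ETE): mean trust-confidence mismatch.
ete = float(np.mean(np.abs(trust - confidences)))

# Trust Sensitivity Index (TSI): correlation between confidence and trust.
tsi = float(np.corrcoef(confidences, trust)[0, 1])

# Belief Shift Magnitude (BSM): mean posterior-prior difference.
bsm = float(np.mean(posteriors - priors))

# Overconfidence Penalty (OCP): confidence-weighted cost of confident errors.
ocp = float(np.mean(confidences * (1.0 - correct)))

# Explanation-Confidence Alignment (ECA): correlation between explanation
# complexity and confidence.
eca = float(np.corrcoef(complexity, confidences)[0, 1])

# The normalized OCP variant is omitted here because the exact [0, 1]
# normalization used by the module is not spelled out in this description.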

Motivation

Accuracy alone is insufficient for evaluating AI systems in high-stakes or vulnerable contexts.

From a human–AI interaction perspective:

  • a confident but wrong prediction is more damaging than a hesitant error
  • users may over-trust highly confident systems (automation bias)
  • users may ignore uncertainty signals
  • beliefs may shift even when the model is wrong
  • explanation style may distort trust

This metric suite provides theory-grounded, computational signals for evaluating these human-centered failure modes.


Design Philosophy

This metric is:

  • Human-centered: grounded in cognitive science and HCI
  • Descriptive, not causal: intended for evaluation, not causal inference
  • Modular and composable: fits Hugging Face's metric API
  • NaN-safe and edge-case robust
  • Domain-agnostic: applicable across classification, QA, NLI, and generation

Related Work

This metric complements existing evaluation approaches:

  • Traditional metrics (accuracy, F1, AUROC): measure predictive performance
  • Calibration metrics (ECE, Brier score): measure probability alignment
  • Human-AI trust metrics (this PR): measure human interpretation and decision-making under uncertainty

Key differences:

  • Focuses on human beliefs rather than just model probabilities
  • Captures asymmetric harm from overconfident errors
  • Evaluates explanation-confidence alignment
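
To make the first difference concrete: a calibration score such as the Brier score can be computed from model probabilities and outcomes alone, whereas a trust metric such as ETE additionally requires human trust ratings. A toy contrast (illustrative only, not code from this PR):

import numpy as np

confidences = np.array([0.9, 0.7, 0.8])    # model's stated probability of being correct
correct = np.array([1.0, 0.0, 0.0])        # whether each prediction was right
trust = np.array([0.85, 0.6, 0.75])        # human trust ratings

# Brier-style calibration: model probabilities vs. observed correctness.
brier = float(np.mean((confidences - correct) ** 2))

# Trust calibration (ETE-style): human trust vs. model confidence,
# regardless of whether the model happened to be right.
ete = float(np.mean(np.abs(trust - confidences)))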

Usage Example

import evaluate

metric = evaluate.load("human_ai_trust")

out = metric.compute(
    predictions=[1, 0, 1],
    references=[1, 1, 0],
    confidences=[0.9, 0.7, 0.8],
    human_trust_scores=[0.85, 0.6, 0.75],
    belief_priors=[0.3, 0.4, 0.5],
    belief_posteriors=[0.6, 0.5, 0.7],
    explanation_complexity=[10, 20, 15],
)

print(out)
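
Because the test suite covers missing optional inputs, the human-side arguments can presumably be omitted; which arguments are strictly required, and the exact keys of the returned dictionary, are defined by the module rather than shown here. A minimal call might look like:

# Minimal sketch; which arguments are optional is determined by the
# module's own input definitions, not by this example.
out = metric.compute(
    predictions=[1, 0, 1],
    references=[1, 1, 0],
    confidences=[0.9, 0.7, 0.8],
)
print(out)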

Testing

All metrics are fully unit-tested with 100% code coverage, including:

  • Zero-variance inputs
  • Missing optional inputs
  • Constant confidence values
  • All-correct and all-wrong predictions
  • Edge cases (empty arrays, single samples, etc.)

Run locally:

pytest metrics/human_ai_trust --cov=metrics/human_ai_trust
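
For illustration, a test in the spirit of the zero-variance case above might look like the following (paraphrased, not copied from the PR's test suite):

import evaluate

def test_constant_confidences_do_not_raise():
    metric = evaluate.load("human_ai_trust")
    # Zero-variance confidences make correlation-based metrics degenerate;
    # the module is expected to handle this without raising.
    out = metric.compute(
        predictions=[1, 0, 1],
        references=[1, 1, 0],
        confidences=[0.8, 0.8, 0.8],
        human_trust_scores=[0.5, 0.6, 0.7],
        belief_priors=[0.3, 0.4, 0.5],
        belief_posteriors=[0.6, 0.5, 0.7],
        explanation_complexity=[10, 20, 15],
    )
    # How degenerate cases are reported (NaN, None, or a fallback value) is
    # up to the implementation; here we only check that a result comes back.
    assert out is not None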

Companion Dataset

A small, theory-grounded demo dataset is available at:

https://huggingface.co/datasets/Dyra1204/human_ai_trust_demo

It demonstrates:

  • trust calibration
  • belief updating
  • uncertainty communication
  • explanation–confidence alignment
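
A loading sketch, assuming the dataset exposes columns with the same names as the compute() arguments above (the actual schema is documented on the dataset card):

from datasets import load_dataset
import evaluate

# Split and column names below are assumptions; check the dataset card.
ds = load_dataset("Dyra1204/human_ai_trust_demo", split="train")
metric = evaluate.load("human_ai_trust")

out = metric.compute(
    predictions=ds["predictions"],
    references=ds["references"],
    confidences=ds["confidences"],
    human_trust_scores=ds["human_trust_scores"],
    belief_priors=ds["belief_priors"],
    belief_posteriors=ds["belief_posteriors"],
    explanation_complexity=ds["explanation_complexity"],
)
print(out)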

Limitations

These metrics are intended as descriptive human-centered evaluation signals.

They do not:

  • replace task performance metrics
  • infer causal effects
  • evaluate explanation faithfulness
  • measure fairness or bias
  • substitute for real human-subjects data

Checklist

  • New metric module added
  • Full unit test coverage
  • Documentation (README.md with formulas and examples)
  • Example notebook
  • Companion dataset published
  • NaN-safe and edge-case handling
  • Follows Hugging Face Evaluate API conventions
  • Maintainer review

Feedback Welcome

I would very much welcome feedback on:

  • API design
  • metric naming
  • default behaviors
  • edge-case handling
  • documentation clarity

I'm happy to adjust the implementation to better align with Hugging Face Evaluate conventions.
