
alivaezii/ATLAS



Overview

ATLAS is a research framework for leakage-resilient machine learning evaluation.
It enforces strict information-flow constraints: validation data may inform model selection only, and test data cannot influence model development at all. This helps ensure reliable and reproducible performance estimates.

The framework formalizes a Split-Before-Fit protocol, provides automated leakage auditing through ALAV, and introduces a quantitative Leakage Risk Score (LRS) for evaluation governance.

Repository Structure

ATLAS
│
├── README.md
│
├── data
│   ├── synthetic
│   ├── realworld
│   ├── higgs
│   ├── higgs_negative_control
│   └── audit
│
├── experiments
│
└── figures

Directory Description

  • data/: Contains all experiment outputs used in the manuscript, including synthetic experiments, real-world datasets, HIGGS benchmark results, and protocol audit artifacts.

  • data/synthetic/: Results from controlled synthetic experiments that evaluate leakage behavior under different protocol conditions.

  • data/realworld/: Benchmark results on multiple real-world datasets, showing leakage pressure in practical settings.

  • data/higgs/: Large-scale experiments on the HIGGS dataset, used to evaluate robustness under realistic machine learning pipelines.

  • data/higgs_negative_control/: Negative-control (label-shuffle) experiments verifying that the measured optimism gaps are not statistical artifacts.

  • data/audit/: ATLAS protocol audit logs, reproducibility metadata, and leakage risk diagnostics.

  • experiments/: Python scripts used to run the experiments and reproduce the results reported in the paper.

  • figures/: Figures included in the manuscript.

Key Components

1. Split‑Before‑Fit Protocol

Evaluation pipelines must follow:

  1. Define train / validation / test splits before any modeling
  2. Fit all operators (e.g., preprocessing steps and the model) on the training split only
  3. Use the validation split for model selection
  4. Use the test set exactly once, for final reporting
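The four steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the ATLAS implementation: the helper names (`make_splits`, `Standardizer`, `split_before_fit`), the 60/20/20 split, and the use of a single decision threshold as the "model" are hypothetical choices made for brevity.

```python
import random
import statistics


def make_splits(n, val_frac=0.2, test_frac=0.2, seed=0):
    """Step 1: fix disjoint train/validation/test indices before any modeling."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test, n_val = int(n * test_frac), int(n * val_frac)
    test = idx[:n_test]
    val = idx[n_test:n_test + n_val]
    train = idx[n_test + n_val:]
    return train, val, test


class Standardizer:
    """A preprocessing operator; fit() must only ever see training rows."""

    def fit(self, values):
        self.mean = statistics.fmean(values)
        self.std = statistics.pstdev(values) or 1.0  # guard against zero variance
        return self

    def transform(self, values):
        return [(v - self.mean) / self.std for v in values]


def split_before_fit(xs, ys, thresholds, seed=0):
    """Run the protocol on 1-D features xs with boolean labels ys."""
    train, val, test = make_splits(len(xs), seed=seed)

    # Step 2: fit the operator on the training split only.
    scaler = Standardizer().fit([xs[i] for i in train])

    def accuracy(rows, threshold):
        z = scaler.transform([xs[i] for i in rows])
        hits = sum((zi > threshold) == ys[i] for zi, i in zip(z, rows))
        return hits / len(rows)

    # Step 3: choose the hyperparameter (a decision threshold) on validation.
    best = max(thresholds, key=lambda t: accuracy(val, t))

    # Step 4: touch the test split exactly once, for the final report.
    return best, accuracy(test, best)
```

The property that matters is that `Standardizer.fit` only receives training rows, so validation and test statistics never shape the preprocessing.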

2. ALAV: Automated Leakage Auditing Verifier

ALAV automatically audits pipeline artifacts and detects protocol violations.

Checks include:

  • split overlap detection
  • preprocessing scope verification
  • test‑reuse detection
  • duplicate leakage detection
  • temporal/group leakage checks
  • cache contamination checks

Output status:

PASS / WARN / FAIL
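Three of the ALAV checks listed above (split overlap, preprocessing scope, test reuse) can be approximated as follows. This is a hypothetical sketch: `PipelineRecord`, its fields, and the severity mapping (overlap and out-of-scope fitting give FAIL, repeated test evaluation gives WARN) are assumptions for illustration, not the verifier's actual rules or artifact format.

```python
from dataclasses import dataclass, field


@dataclass
class PipelineRecord:
    """Hypothetical audit record; field names are illustrative, not the ALAV schema."""
    train_ids: set
    val_ids: set
    test_ids: set
    # (operator name, split it was fit on), e.g. ("scaler", "train")
    operator_fits: list = field(default_factory=list)
    test_evaluations: int = 0


def audit(record):
    findings = []
    # Split overlap detection: the three id sets must be pairwise disjoint.
    pairs = [("train", record.train_ids, "val", record.val_ids),
             ("train", record.train_ids, "test", record.test_ids),
             ("val", record.val_ids, "test", record.test_ids)]
    for a, ids_a, b, ids_b in pairs:
        shared = ids_a & ids_b
        if shared:
            findings.append(("FAIL", f"{len(shared)} ids shared by {a} and {b}"))
    # Preprocessing scope verification: every operator must be fit on train only.
    for name, scope in record.operator_fits:
        if scope != "train":
            findings.append(("FAIL", f"operator {name!r} fit on {scope!r}"))
    # Test-reuse detection: more than one evaluation on the test split.
    if record.test_evaluations > 1:
        findings.append(("WARN", f"test split evaluated {record.test_evaluations} times"))
    # Overall status is the most severe finding (assumed ordering FAIL > WARN > PASS).
    statuses = {status for status, _ in findings}
    overall = "FAIL" if "FAIL" in statuses else "WARN" if "WARN" in statuses else "PASS"
    return overall, findings
```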

3. Leakage Risk Score (LRS)

ATLAS quantifies evaluation risk using a Leakage Risk Score (0 to 100).

Risk levels:

| Score | Risk level |
|-------|------------|
| 0-19 | Low |
| 20-39 | Medium |
| 40-69 | High |
| 70-100 | Critical |
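The risk bands translate directly into a lookup. The sketch below assumes fractional scores are possible and treats each band as half-open (so 19.5 counts as Low); the function name `risk_level` is hypothetical.

```python
def risk_level(score):
    """Map a Leakage Risk Score in [0, 100] to its risk band."""
    if not 0 <= score <= 100:
        raise ValueError("LRS must lie in [0, 100]")
    if score < 20:
        return "Low"       # 0-19
    if score < 40:
        return "Medium"    # 20-39
    if score < 70:
        return "High"      # 40-69
    return "Critical"      # 70-100
```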

The LRS is computed from three surrogate indicators:

  • Duplicate Overlap Rate (DOR)
  • Preprocessing Leakage Indicator (PLI)
  • Test‑Reuse Optimism Proxy (TOP)
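How each indicator is computed, and how the three are combined into the LRS, is not specified here. As one plausible reading of the Duplicate Overlap Rate, the sketch below measures the share of evaluation rows that have an exact copy in the training rows. The definition, the row format (dicts of feature values), and the function names are assumptions.

```python
import hashlib
import json


def row_fingerprint(row):
    """Canonical SHA-256 of a row (dict of feature values); exact duplicates only."""
    canonical = json.dumps(row, sort_keys=True)  # key order does not matter
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def duplicate_overlap_rate(train_rows, eval_rows):
    """Hypothetical DOR: share of evaluation rows with an exact copy in training."""
    if not eval_rows:
        return 0.0
    train_hashes = {row_fingerprint(r) for r in train_rows}
    dupes = sum(row_fingerprint(r) in train_hashes for r in eval_rows)
    return dupes / len(eval_rows)
```

Exact hashing misses near-duplicates (rounded values, reordered text); a real detector may need normalization first.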

Conceptual Pipeline

Data → Split → Train → Select → Evaluate

The evaluation stage is protected by the ATLAS trust layer, which prevents information from the test data from leaking into earlier stages.
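One way to picture the trust layer is as a guard that lets pipeline stages run only in the order shown and refuses any further work once the test set has been used. The `StageGuard` class below is a hypothetical sketch of that idea, not part of the ATLAS codebase. It assumes repeated train/select rounds (e.g. hyperparameter tuning) are allowed before evaluation.

```python
class StageGuard:
    """Hypothetical trust-layer guard for Data -> Split -> Train -> Select -> Evaluate."""

    ORDER = ["split", "train", "select", "evaluate"]

    def __init__(self):
        self.done = []  # stages entered so far, in order

    def enter(self, stage):
        if stage not in self.ORDER:
            raise ValueError(f"unknown stage {stage!r}")
        if "evaluate" in self.done:
            # The test set has been seen; any further stage could leak test information.
            raise RuntimeError("test set already used; no further stages allowed")
        if stage == "split" and "split" in self.done:
            raise RuntimeError("splits are already fixed")
        earlier = self.ORDER[: self.ORDER.index(stage)]
        missing = [s for s in earlier if s not in self.done]
        if missing:
            raise RuntimeError(f"cannot run {stage!r} before {missing}")
        self.done.append(stage)
```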

Example Usage

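```python
from atlas import Protocol, Auditor

# Illustrative usage. `data` (the full dataset) and `model` (an unfitted model)
# are assumed to exist; `train_data`, `validation_data` and `test_data` are the
# partitions produced by protocol.split().
protocol = Protocol()
protocol.split(data)                           # 1. fix the splits before modeling

model = protocol.train(model, train_data)      # 2. fit on the training split only
protocol.select(model, validation_data)        # 3. model selection on validation

results = protocol.evaluate(model, test_data)  # 4. single, final use of the test set

Auditor.run(protocol)                          # audit the recorded pipeline (ALAV)
```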

Reproducibility Artifacts

ATLAS produces machine‑auditable artifacts such as:

data/audit/split_manifest.json
data/audit/operator_log.csv
data/audit/duplicate_report.csv
data/audit/alav_report.json

These allow independent verification of evaluation integrity.
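To make a split independently verifiable, a manifest can store a digest of each split's row ids. The sketch below is an assumption about what such a manifest might contain. It does not reproduce the actual schema of `data/audit/split_manifest.json`, and `build_manifest` / `verify_manifest` are hypothetical names.

```python
import hashlib
import json


def ids_digest(ids):
    """Order-independent SHA-256 digest of a collection of integer row ids."""
    payload = ",".join(str(i) for i in sorted(ids))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_manifest(splits, seed):
    """splits: mapping like {"train": [...], "validation": [...], "test": [...]}."""
    return {
        "seed": seed,
        "splits": {name: {"n": len(ids), "sha256": ids_digest(ids)}
                   for name, ids in splits.items()},
    }


def verify_manifest(manifest, splits):
    """Recompute digests from the claimed splits and compare with the manifest."""
    stored = manifest["splits"]
    if set(stored) != set(splits):
        return False
    return all(
        stored[name]["n"] == len(ids) and stored[name]["sha256"] == ids_digest(ids)
        for name, ids in splits.items()
    )
```

For example, `json.dumps(build_manifest(splits, seed=0), indent=2)` yields a file that a reviewer can check with `verify_manifest` against the ids they reconstruct.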

About

ATLAS: Auditable Trust Layer for AI Systems. A Protocol Framework for Leakage-Resilient Machine Learning Evaluation.
