GitHub - alivaezii/ATLAS: ATLAS: Auditable Trust Layer for AI Systems, a Protocol Framework for Leakage-Resilient Machine Learning Evaluation

Overview

ATLAS is a research framework for leakage‑resilient machine learning evaluation.
It enforces strict information‑flow constraints so that validation and test data cannot influence model development, helping ensure reliable and reproducible performance estimates.

The framework formalizes a Split‑Before‑Fit protocol, provides automated leakage auditing, and introduces a quantitative Leakage Risk Score (LRS) for evaluation governance.

Repository Structure

ATLAS
│
├── README.md
│
├── data
│   ├── synthetic
│   ├── realworld
│   ├── higgs
│   ├── higgs_negative_control
│   └── audit
│
├── experiments
│
└── figures

Directory Description

data/ Contains all experiment outputs used in the manuscript, including synthetic experiments, real-world datasets, HIGGS benchmark results, and protocol audit artifacts.
data/synthetic/ Results from controlled synthetic experiments evaluating leakage behavior under different protocol conditions.
data/realworld/ Benchmark results on multiple real-world datasets demonstrating leakage pressure in practical settings.
data/higgs/ Large-scale experiments conducted on the HIGGS dataset used to evaluate robustness under realistic machine learning pipelines.
data/higgs_negative_control/ Negative-control experiments (label-shuffle) verifying that measured optimism gaps are not statistical artifacts.
data/audit/ ATLAS protocol audit logs, reproducibility metadata, and leakage risk diagnostics.
experiments/ Python scripts used to run the experiments and reproduce the results reported in the paper.
figures/ Figures included in the manuscript.

Key Components

1. Split‑Before‑Fit Protocol

Evaluation pipelines must follow these four rules, in order:

Define train / validation / test splits before modeling
Fit all operators on train only
Use validation for model selection
Use the test set only once for final reporting
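
The four rules above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the ATLAS implementation: the helper names (`split_indices`, `fit_scaler`, `transform`), the 60/20/20 split, and the toy one-feature dataset are all hypothetical. The point is only that the standardizer's statistics come from the training partition and are then reused unchanged on validation and test data.

```python
import random
import statistics

def split_indices(n, seed=0, val_frac=0.2, test_frac=0.2):
    """Shuffle row indices once and cut them into train/validation/test."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    test = idx[:n_test]
    val = idx[n_test:n_test + n_val]
    train = idx[n_test + n_val:]
    return train, val, test

def fit_scaler(values):
    """Fit a standardizer (mean, std) on the given values only."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values) or 1.0  # guard against zero variance
    return mean, std

def transform(values, scaler):
    """Apply an already-fitted standardizer without refitting it."""
    mean, std = scaler
    return [(v - mean) / std for v in values]

# Rule 1: define the splits before any modeling.
x = [float(i) for i in range(100)]  # toy one-feature dataset (hypothetical)
train_idx, val_idx, test_idx = split_indices(len(x), seed=42)

# Rule 2: fit the operator on the training partition only.
scaler = fit_scaler([x[i] for i in train_idx])

# Rules 3 and 4: validation and test data reuse the train statistics.
x_train = transform([x[i] for i in train_idx], scaler)
x_val = transform([x[i] for i in val_idx], scaler)
x_test = transform([x[i] for i in test_idx], scaler)
```

Fitting the scaler on all of `x` instead would let validation and test values shape the mean and standard deviation, which is the kind of preprocessing leakage the protocol is meant to rule out.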

2. ALAV: Automated Leakage Auditing Verifier

ALAV automatically audits pipeline artifacts and detects protocol violations.

Checks include:

split overlap detection
preprocessing scope verification
test‑reuse detection
duplicate leakage detection
temporal/group leakage checks
cache contamination checks

Output status:

PASS / WARN / FAIL
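
As an illustration of the first check, split overlap detection, here is a minimal sketch. It assumes each split is available as a collection of sample IDs; the function name `check_split_overlap` and the input layout are hypothetical and are not taken from ALAV itself. The sketch reports only PASS or FAIL, since the source does not say when ALAV issues WARN.

```python
from itertools import combinations

def check_split_overlap(splits):
    """Return (status, overlaps) for a mapping of split name -> sample IDs.

    Any ID shared between two splits counts as a violation (assumed rule).
    """
    overlaps = {}
    for (name_a, ids_a), (name_b, ids_b) in combinations(splits.items(), 2):
        shared = set(ids_a) & set(ids_b)
        if shared:
            overlaps[(name_a, name_b)] = sorted(shared)
    status = "FAIL" if overlaps else "PASS"
    return status, overlaps

clean = {"train": [1, 2, 3], "validation": [4, 5], "test": [6, 7]}
leaky = {"train": [1, 2, 3], "validation": [4, 5], "test": [3, 6]}

print(check_split_overlap(clean))  # ('PASS', {})
print(check_split_overlap(leaky))  # ('FAIL', {('train', 'test'): [3]})
```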

3. Leakage Risk Score (LRS)

ATLAS quantifies evaluation risk using a Leakage Risk Score on a scale from 0 to 100.

Risk levels:

Score	Interpretation
0-19	Low
20-39	Medium
40-69	High
70-100	Critical
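
The banding in the table can be expressed as a small lookup. This is a sketch: the function name `risk_level` is hypothetical, and treating each band's upper bound as exclusive for non-integer scores (for example, 19.5 counts as Low) is an assumption, since the table lists integer ranges only.

```python
def risk_level(score):
    """Map a Leakage Risk Score in [0, 100] to its risk band."""
    if not 0 <= score <= 100:
        raise ValueError(f"LRS must lie in [0, 100], got {score}")
    if score < 20:
        return "Low"
    if score < 40:
        return "Medium"
    if score < 70:
        return "High"
    return "Critical"

for s in (0, 19, 20, 39, 40, 69, 70, 100):
    print(s, risk_level(s))
```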

The score is computed from three surrogate indicators:

Duplicate Overlap Rate (DOR)
Preprocessing Leakage Indicator (PLI)
Test‑Reuse Optimism Proxy (TOP)
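
This README does not define the indicators formally, so the following is only one plausible reading of the Duplicate Overlap Rate: the fraction of test rows that are exact duplicates of some training row. The definition, the function names, and the fingerprinting scheme are all assumptions made for illustration. How DOR, PLI, and TOP are combined into the final score is not stated here and is not sketched.

```python
import hashlib

def row_fingerprint(row):
    """Hash a row's values so exact duplicates share a fingerprint."""
    payload = "\x1f".join(repr(v) for v in row)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def duplicate_overlap_rate(train_rows, test_rows):
    """Fraction of test rows whose fingerprint also occurs in train.

    Assumed definition; returns 0.0 for an empty test set.
    """
    if not test_rows:
        return 0.0
    train_fp = {row_fingerprint(r) for r in train_rows}
    duplicates = sum(row_fingerprint(r) in train_fp for r in test_rows)
    return duplicates / len(test_rows)

train_rows = [(1.0, "a"), (2.0, "b"), (3.0, "c")]
test_rows = [(2.0, "b"), (4.0, "d")]
dor = duplicate_overlap_rate(train_rows, test_rows)
print(dor)  # 0.5: one of the two test rows also appears in train
```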

Conceptual Pipeline

Data → Split → Train → Select → Evaluate

The evaluation stage is protected by the ATLAS trust layer, so that test data cannot leak back into training or model selection.
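
One way to protect the evaluation stage is to wrap the test data in an object that can be scored only once. The sketch below is hypothetical: `SingleUseTestSet` and `HeldOutReuseError` are illustrative names, not ATLAS classes, and a real trust layer would presumably also log the access for later auditing.

```python
class HeldOutReuseError(RuntimeError):
    """Raised when the held-out test data is evaluated more than once."""

class SingleUseTestSet:
    """Wrap test data so that it can be scored exactly one time."""

    def __init__(self, data):
        self._data = data
        self._used = False

    def evaluate(self, score_fn):
        # Refuse a second pass: repeated test evaluation invites tuning
        # against the test set.
        if self._used:
            raise HeldOutReuseError("test set was already used for final reporting")
        self._used = True
        return score_fn(self._data)

# Toy example: each entry is 1 if the prediction was correct, else 0.
guarded = SingleUseTestSet([1, 0, 1, 1])
accuracy = guarded.evaluate(lambda hits: sum(hits) / len(hits))
print(accuracy)  # 0.75

try:
    guarded.evaluate(lambda hits: sum(hits) / len(hits))
except HeldOutReuseError as err:
    print("blocked:", err)
```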

Example Usage

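from atlas import Protocol, Auditor

# `data`, `model`, and the three split variables are placeholders
# supplied by the caller.
protocol = Protocol()
protocol.split(data)  # define train / validation / test splits first

# Fit on the training split only, then select on validation.
model = protocol.train(model, train_data)
protocol.select(model, validation_data)

# Touch the test split once, for final reporting.
results = protocol.evaluate(model, test_data)

# Audit the recorded pipeline for protocol violations.
Auditor.run(protocol)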

Reproducibility Artifacts

ATLAS produces machine‑auditable artifacts such as:

data/audit/split_manifest.json
data/audit/operator_log.csv
data/audit/duplicate_report.csv
data/audit/alav_report.json

These allow independent verification of evaluation integrity.
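
To show how such an artifact can support independent verification, here is a sketch that builds a split manifest with a SHA-256 digest per split. The schema (`seed`, `n`, `ids`, `sha256`) and the function name are assumptions for illustration; the actual layout of `data/audit/split_manifest.json` is not described in this README.

```python
import hashlib
import json

def build_split_manifest(splits, seed):
    """Record each split's sample IDs together with a digest of those IDs."""
    manifest = {"seed": seed, "splits": {}}
    for name, ids in splits.items():
        ordered = sorted(ids)  # canonical order makes the digest reproducible
        digest = hashlib.sha256(json.dumps(ordered).encode("utf-8")).hexdigest()
        manifest["splits"][name] = {
            "n": len(ordered),
            "ids": ordered,
            "sha256": digest,
        }
    return manifest

manifest = build_split_manifest(
    {"train": [3, 1, 2], "validation": [4, 5], "test": [6, 7]},
    seed=42,
)
manifest_json = json.dumps(manifest, indent=2, sort_keys=True)
print(manifest_json)
```

A reviewer who recomputes the digests from the recorded IDs and gets the same values can confirm that the splits were not altered after the manifest was written.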
