CLiP — Constant-Time File Lifetime & Lifecycle Prediction from File Creation Call Stacks

CLiP (Constant-time Lifecycle Prediction) predicts a file’s lifetime (scalar) or lifecycle (coarse I/O activity histogram over time) as soon as the file is created or first opened, using only the file creation/first-open call stack as context.

This repository contains the analysis notebooks used to reproduce/illustrate the evaluation results described in the paper “Context Matters: Constant-Time File Lifecycle Prediction from File Creation Call Stacks”.
The core idea is simple: the creation context (call stack) acts as a semantic signature of the file’s role in the application, enabling accurate predictions with constant-time inference via a compact per-application lookup table.

Key Idea

CLiP follows a two-phase workflow:

1) Offline Learning (Training)

During training runs, an LD_PRELOAD tracer intercepts POSIX I/O calls to:

- capture the call stack at the first open()/create (the creation context),
- track subsequent I/O to extract the ground truth:
  - Lifetime: time from first open/create to last access,
  - Lifecycle histogram: number of I/O events in coarse time bins after first open.
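
As a rough illustration of the two ground-truth quantities above, the sketch below derives them from the list of I/O timestamps recorded for one file. The function name, the fixed bin width, and the choice to clamp late events into the last bin are assumptions made for this example; they are not taken from the tracer described in the paper.

```python
from typing import List, Tuple


def lifetime_and_histogram(
    event_times: List[float], bin_width: float, num_bins: int
) -> Tuple[float, List[int]]:
    """Derive lifetime and a binned I/O histogram for one file.

    event_times: timestamps (seconds) of every traced I/O event on the file,
    including the first open/create (assumed to be the earliest event).
    """
    if not event_times:
        raise ValueError("at least the first open/create event is required")
    t0 = min(event_times)               # first open/create
    lifetime = max(event_times) - t0    # time from first open to last access
    hist = [0] * num_bins
    for t in event_times:
        # Assumption: events past the last bin edge are counted in the last bin.
        idx = min(int((t - t0) // bin_width), num_bins - 1)
        hist[idx] += 1
    return lifetime, hist


# Illustrative timestamps only, not measured data.
events = [10.0, 10.5, 12.0, 25.0, 31.0]
lifetime, hist = lifetime_and_histogram(events, bin_width=10.0, num_bins=3)
print(lifetime, hist)
```

With these made-up timestamps, three events land in the first 10-second bin and one in each of the next two.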

Per application, CLiP aggregates these observations into compact lookup tables:

- Lifetime table: context -> median lifetime,
- Lifecycle table: context -> mean histogram (bin-wise).
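
The aggregation step can be sketched in a few lines of Python. This is a minimal illustration, assuming each training observation is a `(context, lifetime, histogram)` tuple and that all histograms share the same number of bins; `build_tables` and the context names are hypothetical and do not come from this repository.

```python
from collections import defaultdict
from statistics import median


def build_tables(observations):
    """Aggregate per-file observations into the two lookup tables.

    observations: iterable of (context, lifetime, histogram) tuples, where
    context is any hashable identifier of the creation call stack.
    """
    lifetimes = defaultdict(list)
    histograms = defaultdict(list)
    for context, lifetime, hist in observations:
        lifetimes[context].append(lifetime)
        histograms[context].append(hist)

    # Lifetime table: context -> median lifetime.
    lifetime_table = {c: median(v) for c, v in lifetimes.items()}
    # Lifecycle table: context -> bin-wise mean histogram.
    lifecycle_table = {
        c: [sum(col) / len(col) for col in zip(*hs)]
        for c, hs in histograms.items()
    }
    return lifetime_table, lifecycle_table


# Illustrative observations only, not measured data.
obs = [
    ("ctx_checkpoint", 20.0, [4, 0, 1]),
    ("ctx_checkpoint", 22.0, [2, 2, 1]),
    ("ctx_checkpoint", 60.0, [3, 1, 1]),
    ("ctx_log", 300.0, [1, 1, 1]),
]
lifetime_table, lifecycle_table = build_tables(obs)
print(lifetime_table)
print(lifecycle_table)
```

In this toy input the 60-second outlier does not shift the median for `ctx_checkpoint`, which stays at 22 seconds.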

2) Online Prediction (Inference)

At runtime, CLiP only needs to instrument create/open calls:

- hash the current creation call stack,
- do a constant-time table lookup to return the predicted lifetime or lifecycle.

If a context was never observed during training, CLiP falls back to a global default learned from training data.
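
The inference path can be approximated as a hash followed by a dictionary lookup with a fallback. The sketch below is an assumption-laden illustration: it represents a call stack as a list of frame names, hashes it with SHA-1 from Python's `hashlib`, and takes the global default as a plain parameter. How CLiP actually encodes stacks and derives its default is described in the paper, not here.

```python
import hashlib
from typing import Dict, Sequence


def context_key(frames: Sequence[str]) -> str:
    """Hash a call stack (outermost to innermost frame names) into a table key."""
    joined = "\n".join(frames).encode("utf-8")
    return hashlib.sha1(joined).hexdigest()


def predict_lifetime(
    table: Dict[str, float], frames: Sequence[str], default: float
) -> float:
    """Return the stored lifetime for this context, or the default if unseen."""
    return table.get(context_key(frames), default)


# Hypothetical frame names and values, for illustration only.
seen_stack = ["main", "write_checkpoint", "fopen"]
table = {context_key(seen_stack): 22.0}
GLOBAL_DEFAULT = 45.0  # assumed fallback value

known = predict_lifetime(table, seen_stack, GLOBAL_DEFAULT)
unknown = predict_lifetime(table, ["main", "dump_log", "fopen"], GLOBAL_DEFAULT)
print(known, unknown)
```

The dictionary lookup is constant time on average; computing the key is linear in the stack depth, which is bounded and small in practice.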

What You’ll Find in This Repo

Lifetime Prediction.ipynb — Lifetime prediction analysis (Predicted/Real ratio, PDF/CDF plots, accuracy within ±5% / ±10%, etc.).
Lifetime Prediction-ML.ipynb — Additional/alternative modeling and analysis notebook (baseline exploration).

Note: This repo focuses on analysis and reproducibility notebooks. The paper details the tracing + lookup-table method and the evaluation protocol.

Evaluation Summary

CLiP is evaluated on three representative HPC workloads: Incompact3d, LAMMPS, and NAMD.
For lifetime prediction, the paper reports strong concentration of the Predicted/Real ratio around 1. For example, LAMMPS reaches ~95% of files within ±5% relative error and ~99–100% within ±10%.
For lifecycle prediction (binned I/O counts after first open), accuracy depends on application phases and is sensitive to bin boundaries (small time shifts can move I/O between adjacent bins).
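
To make the lifetime accuracy metric concrete, the following sketch computes the share of files whose Predicted/Real ratio falls within a given tolerance of 1. The helper name and the sample values are invented for illustration; they are not results from the paper or from the notebooks.

```python
from typing import Sequence


def fraction_within(
    predicted: Sequence[float], real: Sequence[float], tolerance: float
) -> float:
    """Fraction of files with |Predicted/Real - 1| <= tolerance.

    Assumes every real lifetime is strictly positive.
    """
    if len(predicted) != len(real) or not real:
        raise ValueError("inputs must be non-empty and of equal length")
    ratios = [p / r for p, r in zip(predicted, real)]
    hits = sum(1 for x in ratios if abs(x - 1.0) <= tolerance)
    return hits / len(ratios)


# Made-up lifetimes in seconds, for illustration only.
predicted = [10.0, 20.0, 30.0, 40.0]
real = [10.2, 21.5, 30.0, 50.0]

within_5 = fraction_within(predicted, real, 0.05)
within_10 = fraction_within(predicted, real, 0.10)
print(within_5, within_10)
```

For this toy input, two of the four files fall within ±5% and three within ±10%.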

How to Run

Prerequisites

Python 3.9+ (recommended)
Jupyter / JupyterLab
Typical scientific stack: numpy, pandas, matplotlib (and optionally scipy)

Example:

pip install numpy pandas matplotlib jupyter
jupyter lab
