
hocinemahni/CLip


CLiP — Constant-Time File Lifetime & Lifecycle Prediction from File Creation Call Stacks

CLiP (Constant-time Lifecycle Prediction) predicts a file’s lifetime (scalar) or lifecycle (coarse I/O activity histogram over time) as soon as the file is created or first opened, using only the file creation/first-open call stack as context.

This repository contains the analysis notebooks used to reproduce and illustrate the evaluation results described in the paper “Context Matters: Constant-Time File Lifecycle Prediction from File Creation Call Stacks”.
The core idea is simple: the creation context (call stack) acts as a semantic signature of the file’s role in the application, enabling accurate predictions with constant-time inference via a compact per-application lookup table.


Key Idea

CLiP follows a two-phase workflow:

1) Offline Learning (Training)

During training runs, an LD_PRELOAD tracer intercepts POSIX I/O calls to:

  • capture the call stack at the first open()/creat() call (the creation context),
  • track subsequent I/O to extract the ground truth:
    • Lifetime: time from the first open/create to the last access,
    • Lifecycle histogram: number of I/O events in coarse time bins after the first open.
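As an illustration of the two ground-truth quantities, the sketch below computes them from a per-file list of event timestamps. It is not the repository's code: the `FileTrace` record, the fixed bin width, and the choice to clamp late events into the last bin are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical trace record for one file: timestamp (seconds) of the first
# open/create and timestamps of every subsequent I/O event on that file.
@dataclass
class FileTrace:
    first_open: float
    io_times: List[float]

def lifetime(trace: FileTrace) -> float:
    """Time from first open/create to last access (0.0 if never accessed again)."""
    last_access = max(trace.io_times, default=trace.first_open)
    return last_access - trace.first_open

def lifecycle_histogram(trace: FileTrace, bin_width: float, num_bins: int) -> List[int]:
    """Count I/O events in fixed-width time bins measured from the first open.

    Assumption: events beyond the last bin are clamped into it.
    """
    counts = [0] * num_bins
    for t in trace.io_times:
        offset = t - trace.first_open
        if offset < 0:
            continue  # ignore events that precede the first open
        idx = min(int(offset // bin_width), num_bins - 1)
        counts[idx] += 1
    return counts

trace = FileTrace(first_open=100.0, io_times=[100.5, 101.0, 104.2, 112.0])
print(lifetime(trace))                     # 12.0
print(lifecycle_histogram(trace, 5.0, 4))  # [3, 0, 1, 0]
```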

Per application, CLiP aggregates these observations into compact lookup tables:

  • Lifetime table: context -> median lifetime,
  • Lifecycle table: context -> mean histogram (bin-wise).
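A minimal sketch of this aggregation step, assuming training observations arrive as `(context, lifetime, histogram)` tuples and that all histograms share the same number of bins. The function name `build_tables` and the context labels are hypothetical.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, Hashable, List, Tuple

def build_tables(observations: List[Tuple[Hashable, float, List[int]]]):
    """Aggregate (context, lifetime, histogram) observations into lookup tables.

    Returns (lifetime_table, lifecycle_table):
      lifetime_table:  context -> median lifetime
      lifecycle_table: context -> bin-wise mean histogram
    """
    lifetimes: Dict[Hashable, List[float]] = defaultdict(list)
    histograms: Dict[Hashable, List[List[int]]] = defaultdict(list)
    for ctx, life, hist in observations:
        lifetimes[ctx].append(life)
        histograms[ctx].append(hist)

    lifetime_table = {ctx: median(vals) for ctx, vals in lifetimes.items()}
    # zip(*hists) walks the histograms column by column, i.e. bin by bin.
    lifecycle_table = {
        ctx: [sum(col) / len(col) for col in zip(*hists)]
        for ctx, hists in histograms.items()
    }
    return lifetime_table, lifecycle_table

obs = [
    ("ctxA", 10.0, [4, 0, 1]),
    ("ctxA", 12.0, [2, 2, 1]),
    ("ctxA", 30.0, [3, 1, 1]),
    ("ctxB", 2.0, [1, 0, 0]),
]
life_t, cycle_t = build_tables(obs)
print(life_t)   # {'ctxA': 12.0, 'ctxB': 2.0}
print(cycle_t)  # {'ctxA': [3.0, 1.0, 1.0], 'ctxB': [1.0, 0.0, 0.0]}
```

The median keeps a single outlier run (the 30.0 s observation above) from dragging the lifetime prediction, while the bin-wise mean preserves the overall shape of the I/O activity.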

2) Online Prediction (Inference)

At runtime, CLiP only needs to instrument create/open calls:

  • hash the current creation call stack,
  • perform a constant-time table lookup to return the predicted lifetime or lifecycle.

If a context was never observed during training, CLiP falls back to a global default learned from training data.
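The lookup path can be sketched as follows. This is an illustration under assumptions, not the paper's implementation: the call stack is modeled as a list of frame names, SHA-1 is an arbitrary choice of hash, and the global default is taken to be the median of all training lifetimes. Frame names such as `write_checkpoint` are hypothetical.

```python
import hashlib
from statistics import median
from typing import Dict, Sequence

def context_key(call_stack: Sequence[str]) -> str:
    """Hash a creation call stack (sequence of frame identifiers) into a fixed-size key."""
    h = hashlib.sha1()
    for frame in call_stack:
        h.update(frame.encode("utf-8"))
        h.update(b"\x00")  # frame separator so ["ab", "c"] and ["a", "bc"] hash differently
    return h.hexdigest()

class LifetimePredictor:
    def __init__(self, table: Dict[str, float], default: float):
        self.table = table      # context key -> predicted lifetime (from training)
        self.default = default  # global fallback for contexts never seen in training

    def predict(self, call_stack: Sequence[str]) -> float:
        # One hash plus one dict lookup: the cost does not grow with the number of known contexts.
        return self.table.get(context_key(call_stack), self.default)

# Hypothetical frame names; real stacks would come from the tracer.
stack_a = ["main", "write_checkpoint", "fopen"]
train_lifetimes = [10.0, 12.0, 30.0, 2.0]
predictor = LifetimePredictor({context_key(stack_a): 12.0}, default=median(train_lifetimes))
print(predictor.predict(stack_a))                       # 12.0
print(predictor.predict(["main", "log_init", "open"]))  # 11.0 (unseen context, fallback)
```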

What You’ll Find in This Repo

  • Lifetime Prediction.ipynb — Lifetime prediction analysis (Predicted/Real ratio, PDF/CDF plots, accuracy within ±5% / ±10%, etc.).
  • Lifetime Prediction-ML.ipynb — Additional/alternative modeling and analysis notebook (baseline exploration).
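For reference, the tolerance-based accuracy metric can be read as the fraction of files whose Predicted/Real ratio lies within ±tol of 1. The sketch below assumes that definition (the notebooks may compute it differently) and, as a further assumption, skips files whose real lifetime is zero. The sample values are made up.

```python
from typing import List, Sequence

def ratios(predicted: Sequence[float], real: Sequence[float]) -> List[float]:
    """Predicted/Real ratio per file; files with a zero real lifetime are skipped."""
    return [p / r for p, r in zip(predicted, real) if r > 0]

def accuracy_within(predicted: Sequence[float], real: Sequence[float], tol: float) -> float:
    """Fraction of files with relative error |pred - real| / real <= tol."""
    rs = ratios(predicted, real)
    if not rs:
        return 0.0
    return sum(abs(x - 1.0) <= tol for x in rs) / len(rs)

pred = [10.0, 10.4, 9.2, 15.0]
real = [10.0, 10.0, 10.0, 10.0]
print(accuracy_within(pred, real, 0.05))  # 0.5
print(accuracy_within(pred, real, 0.10))  # 0.75
```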

Note: This repo focuses on analysis and reproducibility notebooks. The paper details the tracing + lookup-table method and the evaluation protocol.


Evaluation Summary

CLiP is evaluated on three representative HPC workloads: Incompact3d, LAMMPS, and NAMD.
For lifetime prediction, the paper reports that the Predicted/Real ratio is tightly concentrated around 1. For example, LAMMPS reaches ~95% of files within ±5% relative error and ~99–100% within ±10%.
For lifecycle prediction (binned I/O counts after first open), accuracy depends on application phases and is sensitive to bin boundaries (small time shifts can move I/O between adjacent bins).
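The bin-boundary effect is easy to reproduce with made-up numbers. In the sketch below (hypothetical offsets, 10 s bins), two runs differ only in that the last I/O event arrives 0.2 s later. The lifetimes differ by about 2%, yet the event crosses a bin edge, so a bin-by-bin comparison of the histograms disagrees in two bins.

```python
from typing import List, Sequence

def histogram(offsets: Sequence[float], bin_width: float, num_bins: int) -> List[int]:
    """Count events per fixed-width bin; offsets are seconds after the first open."""
    counts = [0] * num_bins
    for off in offsets:
        counts[min(int(off // bin_width), num_bins - 1)] += 1
    return counts

run_1 = [1.0, 2.0, 9.9]   # last event just before the 10 s boundary
run_2 = [1.0, 2.0, 10.1]  # same I/O, last event 0.2 s later
print(histogram(run_1, 10.0, 3))  # [3, 0, 0]
print(histogram(run_2, 10.0, 3))  # [2, 1, 0]
```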


How to Run

Prerequisites

  • Python 3.9+ (recommended)
  • Jupyter / JupyterLab
  • Typical scientific stack: numpy, pandas, matplotlib (and optionally scipy)

Example:

pip install numpy pandas matplotlib jupyter
jupyter lab

About

Constant-time prediction of file lifetime and I/O lifecycle at create/open time using file creation call stacks (CLiP) — HPC storage optimization.
