
RAMPART: Top-k Feature Importance Ranking

TMLR 2025

RAMPART (Ranked Attributions with MiniPatches And Recursive Trimming) is an efficient method for identifying the most important features in high-dimensional datasets.

This repository contains the official implementation from:

Top-k Feature Importance Ranking
Eric Chen, Tiffany Tang, Genevera I. Allen
Transactions on Machine Learning Research (TMLR), 2025
OpenReview: https://openreview.net/forum?id=2OSHpccsaV

Overview

Traditional feature importance methods rank all features, which is computationally expensive and often unnecessary when you only need the top few. RAMPART uses:

  1. Minipatch Ensembling (RAMP): Aggregates feature rankings across random subsamples of observations and features
  2. Recursive Trimming (RAMPART): Iteratively eliminates bottom-ranked features using sequential halving

This approach is:

  • Efficient: Focuses computation on promising features
  • Scalable: Works with thousands of features
  • Flexible: Compatible with any feature importance model
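
To make the minipatch idea concrete, here is a self-contained, numpy-only sketch of rank aggregation over random minipatches, using least-squares coefficient magnitudes as a stand-in importance score (in the spirit of the package's LinearModel). This is an illustration of the idea, not the package's implementation:

```python
import numpy as np

def minipatch_rankings(X, y, n_minipatches=200, n_obs=50, n_features=10, seed=0):
    """Average each feature's within-minipatch importance rank.

    Importance here is the magnitude of a least-squares coefficient;
    a lower average rank means a more important feature.
    """
    rng = np.random.default_rng(seed)
    N, M = X.shape
    rank_sums = np.zeros(M)
    counts = np.zeros(M)
    for _ in range(n_minipatches):
        # random minipatch: subsample of observations and of features
        obs = rng.choice(N, size=n_obs, replace=False)
        feats = rng.choice(M, size=n_features, replace=False)
        coef, *_ = np.linalg.lstsq(X[np.ix_(obs, feats)], y[obs], rcond=None)
        order = np.argsort(-np.abs(coef))     # order[0] = best feature in patch
        ranks = np.empty(n_features)
        ranks[order] = np.arange(n_features)  # rank 0 = most important
        rank_sums[feats] += ranks
        counts[feats] += 1
    # guard against division by zero for features never sampled
    return rank_sums / np.maximum(counts, 1)
```

With enough minipatches, signal features accumulate low average ranks while noise features hover near the middle of the within-patch ranking.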

Quick Start

from rampart import ramp, rampart
from rampart.models import RandomForestModel
import numpy as np

# Generate example data
X = np.random.randn(200, 100)
beta = np.zeros(100)
beta[:5] = [5, 4, 3, 2, 1]  # 5 signal features
y = X @ beta + np.random.randn(200)

# Find top-5 features using RAMPART
rankings = rampart(
    X, y,
    k=5,
    model_cls=RandomForestModel,
    n_minipatches=1000,
    n_obs=50,
    n_features=10
)

# Get indices of top-5 features
top_5 = np.argsort(rankings)[:5]
print(f"Top-5 features: {top_5}")

Installation

# Clone the repository
git clone https://github.com/DataSlingers/TopK.git
cd TopK

# Install dependencies
pip install -r requirements.txt

Repository Structure

TopK/
├── rampart/                  # Main package
│   ├── algorithms.py         # RAMP and RAMPART implementations
│   ├── models.py             # Regression models
│   └── classifiers.py        # Classification models
├── examples/                 # Example notebooks
│   ├── quickstart.ipynb      # Basic usage
│   └── custom_data_example.ipynb
├── simulations/              # Paper simulations
│   ├── config.py             # Simulation parameters
│   ├── data_generation.py    # Synthetic data generation
│   ├── metrics.py            # Evaluation metrics (RBO, top-k accuracy)
│   ├── run_simulations.py    # Single experiment runner
│   └── run_batch.py          # Batch runner for all experiments
├── case_studies/             # Real data applications
│   ├── drug_response/        # CCLE drug response (Section 4.2.1)
│   └── breast_cancer/        # TCGA cancer subtyping (Section 4.2.2)
└── figures/                  # Paper figures
    ├── Figure3.ipynb         # Theory validation
    └── plot_results.py       # Generate result plots
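
metrics.py evaluates rankings with rank-biased overlap (RBO; Webber et al., 2010), among other metrics. As a reference point, a minimal truncated-RBO function can be sketched as follows (not the repo's implementation; the truncation slightly underestimates the full infinite-sum RBO):

```python
def rbo(list1, list2, p=0.9):
    """Truncated rank-biased overlap between two ranked lists.

    At each depth d, the agreement is the size of the overlap of the
    two depth-d prefixes divided by d; depths are weighted by p**(d-1).
    """
    depth = min(len(list1), len(list2))
    seen1, seen2 = set(), set()
    score = 0.0
    for d in range(1, depth + 1):
        seen1.add(list1[d - 1])
        seen2.add(list2[d - 1])
        agreement = len(seen1 & seen2) / d
        score += p ** (d - 1) * agreement
    return (1 - p) * score
```

Smaller p puts more weight on agreement at the very top of the two rankings, which matches the top-k focus of the paper.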

Methods

RAMP (Algorithm 1)

Ranks all features by averaging importance rankings across random minipatches:

from rampart import ramp

rankings = ramp(
    X, y,
    model_cls=RandomForestModel,
    n_minipatches=10000,   # Number of minipatches (T)
    n_obs=100,             # Observations per minipatch (n)
    n_features=10          # Features per minipatch (m)
)

RAMPART (Algorithm 2)

Efficiently identifies top-k features using sequential halving:

from rampart import rampart

rankings = rampart(
    X, y,
    k=10,                  # Number of top features to find
    model_cls=RandomForestModel,
    n_minipatches=2000,    # Minipatches per iteration (B)
    n_obs=100,
    n_features=10
)
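
To make the recursive-trimming idea concrete, here is a hypothetical, numpy-only sketch of a sequential-halving loop: score the surviving features on a batch of minipatches, keep the better-ranked half, and repeat until only k remain. As above, least-squares coefficient magnitudes stand in for the importance score; the actual rampart() implementation differs:

```python
import numpy as np

def sequential_halving_topk(X, y, k, n_minipatches=150, n_obs=50, m=10, seed=0):
    """Illustrative sequential halving: trim the bottom-ranked half of the
    surviving features each round until k features remain."""
    rng = np.random.default_rng(seed)
    N, M = X.shape
    survivors = np.arange(M)
    while len(survivors) > k:
        rank_sums = np.zeros(len(survivors))
        counts = np.zeros(len(survivors))
        for _ in range(n_minipatches):
            obs = rng.choice(N, size=n_obs, replace=False)
            idx = rng.choice(len(survivors), size=min(m, len(survivors)),
                             replace=False)
            feats = survivors[idx]
            coef, *_ = np.linalg.lstsq(X[np.ix_(obs, feats)], y[obs], rcond=None)
            order = np.argsort(-np.abs(coef))
            ranks = np.empty(len(idx))
            ranks[order] = np.arange(len(idx))  # rank 0 = most important
            rank_sums[idx] += ranks
            counts[idx] += 1
        avg = rank_sums / np.maximum(counts, 1)
        # keep the better-ranked half, but never fewer than k features
        keep = max(k, len(survivors) // 2)
        survivors = survivors[np.argsort(avg)[:keep]]
    return survivors
```

Because each round discards half of the candidates, later rounds concentrate the minipatch budget on the features that still matter, which is the source of RAMPART's efficiency gain over ranking everything.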

Available Models

Regression

  • LinearModel: Linear regression (coefficient-based importance)
  • DecisionTreeModel: Decision tree (impurity-based importance)
  • RandomForestModel: Random forest (mean impurity decrease)
  • KernelRidgePermutation: Kernel ridge regression (permutation importance)

Classification

  • LogisticModel: Logistic regression
  • DecisionTreeClassifier: Decision tree classifier
  • RandomForestClassifier: Random forest classifier
  • KernelSVMPermutation: Kernel SVM (permutation importance)
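
Since RAMPART is compatible with any feature importance model, you can also supply your own. The exact interface model_cls must satisfy is defined in rampart/models.py; the class below is only a hypothetical scikit-learn-style sketch of what such a model might look like, with ridge-regularized least squares as the importance source:

```python
import numpy as np

class RidgeModel:
    """Hypothetical custom model; the real interface expected by
    model_cls lives in rampart/models.py and may differ."""

    def __init__(self, alpha=0.1):
        self.alpha = alpha

    def fit(self, X, y):
        # closed-form ridge regression: (X'X + alpha*I)^-1 X'y
        M = X.shape[1]
        self.coef_ = np.linalg.solve(X.T @ X + self.alpha * np.eye(M), X.T @ y)
        return self

    def feature_importances(self):
        # coefficient magnitude as the importance score
        return np.abs(self.coef_)
```

Check examples/custom_data_example.ipynb and rampart/models.py for the method names and signatures the package actually requires before adapting this pattern.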

Reproducing Paper Results

Simulations (Section 4.1)

cd simulations

# Run single experiment
python run_simulations.py --task regression --covariance IID --algorithm rampart

# Run all experiments (100 seeds)
python run_batch.py --seeds 100 --parallel 4

# Generate plots
cd ../figures
python plot_results.py --results-dir ../simulations/results

Case Studies (Section 4.2)

See notebooks in case_studies/:

  • Drug response prediction (CCLE dataset)
  • Breast cancer subtype classification (TCGA dataset)

Parameters Guide

Parameter       Description                  Recommended
k               Top features to find         Based on domain knowledge
n_minipatches   Minipatches per iteration    1000-4000 (higher = more accurate)
n_obs           Observations per minipatch   N/4 to N/2
n_features      Features per minipatch       10-20 (should be > k)
model_cls       Base model class             RandomForestModel (default)
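
As an illustration, the guidelines above might translate as follows for a dataset with N = 400 observations (values are illustrative choices, not from the paper):

```python
# Hypothetical parameter choices following the guide, for N = 400 observations
N = 400
params = dict(
    k=10,                # top features to find (set from domain knowledge)
    n_minipatches=2000,  # 1000-4000; higher = more accurate
    n_obs=N // 2,        # between N/4 and N/2
    n_features=20,       # 10-20, and should exceed k
)
```

These would then be passed as keyword arguments to rampart() along with a model_cls.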

Citation

@article{chen2025topk,
  title={Top-k Feature Importance Ranking},
  author={Chen, Eric and Tang, Tiffany and Allen, Genevera I.},
  journal={Transactions on Machine Learning Research},
  year={2025},
  url={https://openreview.net/forum?id=2OSHpccsaV}
}
