RAMPART (Ranked Attributions with MiniPatches And Recursive Trimming) is an efficient method for identifying the most important features in high-dimensional datasets.
This repository contains the official implementation from:
> **Top-k Feature Importance Ranking**
> Eric Chen, Tiffany Tang, Genevera I. Allen
> *Transactions on Machine Learning Research (TMLR), 2025* · [OpenReview](https://openreview.net/forum?id=2OSHpccsaV)
Traditional feature importance methods rank all features, which is computationally expensive and often unnecessary when you only need the top few. RAMPART uses:
- Minipatch Ensembling (RAMP): Aggregates feature rankings across random subsamples of observations and features
- Recursive Trimming (RAMPART): Iteratively eliminates bottom-ranked features using sequential halving
This approach is:
- Efficient: Focuses computation on promising features
- Scalable: Works with thousands of features
- Flexible: Compatible with any feature importance model
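To make the recursive-trimming idea concrete, here is a minimal sketch of sequential halving in plain NumPy. This is not the repository's implementation: each round rescores the surviving features with a noisy oracle (`scores_fn`, a hypothetical stand-in) rather than refitting models on minipatches, and all names are illustrative.

```python
import numpy as np

def sequential_halving_topk(scores_fn, n_features, k):
    """Toy sequential halving: rescore the surviving features each round
    and drop the bottom half until only k candidates remain."""
    alive = np.arange(n_features)
    while alive.size > k:
        scores = scores_fn(alive)           # noisy importance estimates
        keep = max(k, alive.size // 2)      # halve the pool, never below k
        alive = alive[np.argsort(scores)[::-1][:keep]]
    return np.sort(alive)

# Synthetic scores: features 0-4 carry signal, the other 95 are noise
true_importance = np.zeros(100)
true_importance[:5] = [5, 4, 3, 2, 1]
rng = np.random.default_rng(0)
noisy = lambda idx: true_importance[idx] + 0.1 * rng.standard_normal(idx.size)

print(sequential_halving_topk(noisy, 100, 5))  # → [0 1 2 3 4]
```

Because low-ranked features are discarded early, later rounds concentrate their score evaluations on the remaining contenders, which is where RAMPART's efficiency comes from.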
## Quick Start

```python
from rampart import ramp, rampart
from rampart.models import RandomForestModel
import numpy as np

# Generate example data
X = np.random.randn(200, 100)
beta = np.zeros(100)
beta[:5] = [5, 4, 3, 2, 1]  # 5 signal features
y = X @ beta + np.random.randn(200)

# Find top-5 features using RAMPART
rankings = rampart(
    X, y,
    k=5,
    model_cls=RandomForestModel,
    n_minipatches=1000,
    n_obs=50,
    n_features=10,
)

# Get indices of top-5 features
top_5 = np.argsort(rankings)[:5]
print(f"Top-5 features: {top_5}")
```

## Installation

```bash
# Clone the repository
git clone https://github.com/DataSlingers/TopK.git
cd TopK

# Install dependencies
pip install -r requirements.txt
```

## Repository Structure

```
TopK/
├── rampart/                     # Main package
│   ├── algorithms.py            # RAMP and RAMPART implementations
│   ├── models.py                # Regression models
│   └── classifiers.py           # Classification models
├── examples/                    # Example notebooks
│   ├── quickstart.ipynb         # Basic usage
│   └── custom_data_example.ipynb
├── simulations/                 # Paper simulations
│   ├── config.py                # Simulation parameters
│   ├── data_generation.py       # Synthetic data generation
│   ├── metrics.py               # Evaluation metrics (RBO, top-k accuracy)
│   ├── run_simulations.py       # Single experiment runner
│   └── run_batch.py             # Batch runner for all experiments
├── case_studies/                # Real data applications
│   ├── drug_response/           # CCLE drug response (Section 4.2.1)
│   └── breast_cancer/           # TCGA cancer subtyping (Section 4.2.2)
└── figures/                     # Paper figures
    ├── Figure3.ipynb            # Theory validation
    └── plot_results.py          # Generate result plots
```
## RAMP: Full Feature Ranking

Ranks all features by averaging importance rankings across random minipatches:

```python
from rampart import ramp

rankings = ramp(
    X, y,
    model_cls=RandomForestModel,
    n_minipatches=10000,  # Number of minipatches (T)
    n_obs=100,            # Observations per minipatch (n)
    n_features=10,        # Features per minipatch (m)
)
```

## RAMPART: Top-k Ranking

Efficiently identifies the top k features using sequential halving:
```python
from rampart import rampart

rankings = rampart(
    X, y,
    k=10,                 # Number of top features to find
    model_cls=RandomForestModel,
    n_minipatches=2000,   # Minipatches per iteration (B)
    n_obs=100,
    n_features=10,
)
```

## Available Models

Regression (`rampart.models`):
- `LinearModel`: Linear regression (coefficient-based importance)
- `DecisionTreeModel`: Decision tree (impurity-based importance)
- `RandomForestModel`: Random forest (mean impurity decrease)
- `KernelRidgePermutation`: Kernel ridge regression (permutation importance)

Classification (`rampart.classifiers`):
- `LogisticModel`: Logistic regression
- `DecisionTreeClassifier`: Decision tree classifier
- `RandomForestClassifier`: Random forest classifier
- `KernelSVMPermutation`: Kernel SVM (permutation importance)
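Since any model exposing an importance score can plug in, a custom model class is easy to write. The class below is a hypothetical example in NumPy only: the exact interface expected by `model_cls` is an assumption here (check `rampart/models.py` for the actual contract). It is a ridge-regularized linear model whose importances are absolute coefficients.

```python
import numpy as np

class RidgeImportanceModel:
    """Hypothetical custom model: closed-form ridge regression with
    |coefficient| used as the per-feature importance. The fit /
    feature_importances_ interface is an assumed contract, not the
    package's documented one."""

    def __init__(self, alpha=1e-2):
        self.alpha = alpha

    def fit(self, X, y):
        p = X.shape[1]
        # closed-form ridge solution: (X'X + alpha*I)^-1 X'y
        self.coef_ = np.linalg.solve(X.T @ X + self.alpha * np.eye(p), X.T @ y)
        self.feature_importances_ = np.abs(self.coef_)
        return self

# Sanity check on synthetic data where only feature 2 matters
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 8))
y = 4 * X[:, 2] + rng.standard_normal(100)
model = RidgeImportanceModel().fit(X, y)
print(np.argmax(model.feature_importances_))  # → 2
```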
## Reproducing Paper Results

```bash
cd simulations

# Run single experiment
python run_simulations.py --task regression --covariance IID --algorithm rampart

# Run all experiments (100 seeds)
python run_batch.py --seeds 100 --parallel 4

# Generate plots
cd ../figures
python plot_results.py --results-dir ../simulations/results
```

## Case Studies

See notebooks in `case_studies/`:
- Drug response prediction (CCLE dataset)
- Breast cancer subtype classification (TCGA dataset)
## Parameter Guidelines

| Parameter | Description | Recommended |
|---|---|---|
| `k` | Top features to find | Based on domain knowledge |
| `n_minipatches` | Minipatches per iteration | 1000-4000 (higher = more accurate) |
| `n_obs` | Observations per minipatch | N/4 to N/2 |
| `n_features` | Features per minipatch | 10-20 (should be > k) |
| `model_cls` | Base model class | `RandomForestModel` (default) |
## Citation

```bibtex
@article{chen2025topk,
  title={Top-k Feature Importance Ranking},
  author={Chen, Eric and Tang, Tiffany and Allen, Genevera I.},
  journal={Transactions on Machine Learning Research},
  year={2025},
  url={https://openreview.net/forum?id=2OSHpccsaV}
}
```