Phil is a representation-guided imputation library for missing tabular data.
It generates multiple imputations using a configurable strategy grid, computes Euler Characteristic Transform (ECT) descriptors over each imputed dataset, and selects the most representative imputation from the candidate set.
pip install phil

Phil requires the trailed backend for ECT computation. Install it from the
KRV research index or provide a compatible local build.
- Impute: runs a grid of imputation strategies (sklearn estimators or custom) over the input dataframe, producing a set of candidate datasets
- Describe: computes an ECT descriptor for each candidate via the trailed backend
- Select: picks the candidate closest to the mean descriptor (the most representative imputation)
- Transform: exposes the fitted pipeline for inference on new data
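The Describe and Select steps can be pictured with a toy example. The sketch below is not Phil's or trailed's implementation. It assumes a simplified, vertex-only Euler characteristic curve for 2-D point clouds: with no edges or higher simplices, the Euler characteristic of a sublevel set is just the number of points it contains. The helper names `toy_ect` and `most_representative`, and the sample points, are hypothetical.

```python
import math


def toy_ect(points, num_thetas=8, resolution=5, radius=1.0):
    """Vertex-only ECT of a 2-D point cloud, flattened into one list.

    For each direction, count the points whose projection onto that
    direction is at or below each threshold in [-radius, radius].
    Requires resolution >= 2.
    """
    descriptor = []
    for k in range(num_thetas):
        theta = 2.0 * math.pi * k / num_thetas
        dx, dy = math.cos(theta), math.sin(theta)
        heights = [x * dx + y * dy for x, y in points]
        for j in range(resolution):
            threshold = -radius + 2.0 * radius * j / (resolution - 1)
            descriptor.append(sum(h <= threshold for h in heights))
    return descriptor


def most_representative(descriptors):
    """Return (index, distances): the descriptor nearest the element-wise mean."""
    n = len(descriptors)
    mean = [sum(column) / n for column in zip(*descriptors)]
    distances = [math.dist(d, mean) for d in descriptors]
    best = min(range(n), key=distances.__getitem__)
    return best, distances


# Three hypothetical imputations of the same data: only the last point's
# second coordinate (the "missing" value) differs between candidates.
candidates = [
    [(0.2, 0.1), (-0.3, 0.4), (0.5, -0.8)],
    [(0.2, 0.1), (-0.3, 0.4), (0.5, 0.0)],
    [(0.2, 0.1), (-0.3, 0.4), (0.5, 0.1)],
]
descriptors = [toy_ect(points) for points in candidates]
best, distances = most_representative(descriptors)
print("selected candidate:", best)
print("distances to mean descriptor:", [round(d, 3) for d in distances])
```

Ties are broken in favour of the earliest candidate, which is a choice made for this sketch only.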
import pandas as pd
from phil import Phil
df = pd.read_csv("data_with_missing.csv")
phil = Phil(samples=30, random_state=42)
imputed_df = phil.fit(df)
# Apply the same fitted pipeline to new data
new_df = phil.transform(new_data)

from sklearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestClassifier
from phil import PhilTransformer
pipe = Pipeline([
    ("imputer", PhilTransformer(samples=20, random_state=0)),
    ("model", RandomForestClassifier()),
])
pipe.fit(X_train, y_train)

Phil ships a FastMCP-based MCP server that lets Claude, Cursor, Gemini CLI, and other MCP-capable agents run imputation sweeps on your pandas or polars dataframes without writing Python.
Install the mcp extra and launch the server with uv tool run or pipx:
pip install "philler[mcp]"
phil-mcp # persistent install
# or, ephemeral via uv:
uv tool run --from "philler[mcp]" phil-mcp

Example Claude Desktop config (~/Library/Application Support/Claude/claude_desktop_config.json):
{
"mcpServers": {
"phil": {
"command": "uv",
"args": ["tool", "run", "--from", "philler[mcp]", "phil-mcp"]
}
}
}

The server exposes tools for the full sweep workflow: ingest_dataset,
characterize_dataset, list_grids, create_config, validate_config,
run_imputation_sweep, diagnose_sweep, export_imputed_data, and more.
Polars users can write their dataframe to Parquet and ingest it by file path.
See the MCP guide for setup tabs, the full tool table, and an example dialog.
For local end-to-end testing with medical missing-data examples, use
demos/medical.
Phil ships with named grids accessible via GridGallery:
| Name | Methods |
|---|---|
| default | BayesianRidge, DecisionTree, RandomForest, GradientBoosting |
| sampling | DistributionImputer (empirical sampling) |
| finance | IterativeImputer, KNNImputer, SimpleImputer |
| healthcare | KNNImputer, SimpleImputer, IterativeImputer |
| marketing | SimpleImputer, KNNImputer, IterativeImputer |
| engineering | SimpleImputer, KNNImputer, IterativeImputer |
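As a rough picture of what sweeping a grid means, the sketch below expands a dictionary of parameter lists into every combination and produces one candidate per combination. It uses only the standard library and is not Phil's code: `expand_grid` and `impute_column` are hypothetical helpers, the mean and median fills stand in for the scikit-learn imputers named in the table, and the assumption that a sweep yields one candidate dataset per parameter combination is ours.

```python
from itertools import product
from statistics import mean, median


def expand_grid(grid):
    """Yield one parameter dict per combination of the listed values."""
    keys = sorted(grid)
    for values in product(*(grid[key] for key in keys)):
        yield dict(zip(keys, values))


def impute_column(column, strategy):
    """Fill None entries with the mean or median of the observed values."""
    observed = [value for value in column if value is not None]
    fill = {"mean": mean, "median": median}[strategy](observed)
    return [fill if value is None else value for value in column]


# One column with a single missing entry, swept over two strategies.
column = [1.0, None, 2.0, 9.0]
grid = {"strategy": ["mean", "median"]}
candidates = [impute_column(column, **params) for params in expand_grid(grid)]

for params, candidate in zip(expand_grid(grid), candidates):
    print(params, candidate)
```

scikit-learn's `ParameterGrid` performs the same kind of Cartesian expansion for estimator parameters.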
Pass a grid name or an ImputationConfig directly:
from phil import Phil, ImputationConfig
from sklearn.model_selection import ParameterGrid
config = ImputationConfig(
methods=["KNNImputer"],
modules=["sklearn.impute"],
grids=[ParameterGrid({"n_neighbors": [3, 5, 7]})],
)
phil = Phil(param_grid=config)

ECT is configured via ECTConfig:
from phil import Phil, ECTConfig
ect_config = ECTConfig(
num_thetas=64,
radius=1.0,
resolution=100,
scale=500,
normalize=True,
seed=42,
)
phil = Phil(config=ect_config)

uv sync --all-extras
uv run pytest -v
uv run black phil/ tests/

Project documentation lives under docs/source with unified API and guide pages.
Build locally with uv run sphinx-build -M html docs/source docs/build.