Hierarchical tree-based regressor for robust predictions (e.g., parcel sale prices) using path-weighted Wilson means (95% trimmed means for outlier resistance).
- MODEL_SPEC.md: High-level method.
- SPEC.md: Detailed implementation specs.
- Scikit-learn compatible: Inherits `BaseEstimator`/`RegressorMixin`; works with `Pipeline`, `GridSearchCV`, `cross_val_score`, pickling.
- Automatic feature handling: Categorical (one-vs-rest splits), numeric (binary search breakpoints), NaNs/missing values.
- Robust statistics: Wilson means prevent outlier swings.
- Ensemble support: `LayeredCompBaggingModel` for reduced variance and automatic `weight_falloff` optimization.
- Configurable weighting: `weight_falloff` balances local accuracy vs. market normativity.
- Explainable: `explain_value(row)` shows path, weights, means.
- Serializable: `to_json()`, `to_dict()`.
- Parallel: `n_jobs` support.
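
To make the "Robust statistics" point concrete, here is a minimal sketch of a trimmed mean in plain Python. It assumes that "95% trimmed" means keeping the central 95% of the sorted values (dropping 2.5% from each tail); the function name `trimmed_mean` and its `keep` parameter are illustrative only and are not part of the package's API. SPEC.md remains the reference for the exact definition.

```python
import math
from statistics import mean


def trimmed_mean(values, keep=0.95):
    """Mean of the central `keep` fraction of the sorted values.

    Assumption: the trimmed share is split evenly between both tails.
    """
    if not values:
        raise ValueError("values must be non-empty")
    ordered = sorted(values)
    # Observations to drop from each tail, rounded down.
    # round() guards against floating-point noise before flooring.
    cut = math.floor(round(len(ordered) * (1.0 - keep) / 2.0, 9))
    return mean(ordered[cut:len(ordered) - cut])


# 39 typical sale prices plus one extreme outlier.
prices = [500_000.0] * 39 + [50_000_000.0]
plain = mean(prices)
robust = trimmed_mean(prices)
print(f"plain mean:   {plain:,.0f}")   # 1,737,500
print(f"trimmed mean: {robust:,.0f}")  # 500,000
```

A single extreme sale more than triples the plain mean, while the trimmed mean stays at the typical price.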
- Categorical: Missing values are treated as a distinct "NaN" category.
- Numeric: Missing values are excluded from splits (robust; per SPEC.md).
- Target `y`: Must be finite (raises `ValueError`).
- Strict checks: Use `Pipeline([('imputer', SimpleImputer()), ('model', LayeredCompModel())])`.
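
The two feature-side rules above can be illustrated with a small plain-Python sketch. This is not the package's code: the helpers `is_missing`, `group_by_category`, and `numeric_split_candidates` are hypothetical names, and the sketch only shows the idea of giving missing categorical values their own bucket while leaving missing numeric values out of the candidate split points.

```python
import math
from collections import defaultdict


def is_missing(value):
    """True for None and float NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def group_by_category(rows, column):
    """Group row indices by category, with missing values in a "NaN" bucket."""
    groups = defaultdict(list)
    for index, row in enumerate(rows):
        key = "NaN" if is_missing(row[column]) else row[column]
        groups[key].append(index)
    return dict(groups)


def numeric_split_candidates(rows, column):
    """Sorted distinct values of a numeric column, ignoring missing entries."""
    return sorted({row[column] for row in rows if not is_missing(row[column])})


rows = [
    {"neighborhood": "North", "size_sqft": 1800.0},
    {"neighborhood": None, "size_sqft": 2200.0},
    {"neighborhood": "South", "size_sqft": float("nan")},
    {"neighborhood": "North", "size_sqft": 2200.0},
]
category_groups = group_by_category(rows, "neighborhood")
candidates = numeric_split_candidates(rows, "size_sqft")
print(category_groups)  # {'North': [0, 3], 'NaN': [1], 'South': [2]}
print(candidates)       # [1800.0, 2200.0]
```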
pip install layeredcompmodel

For development:
git clone https://github.com/JohnKossa/layeredcompmodel.git
cd layeredcompmodel
pip install -e .[dev]

import pandas as pd
import numpy as np
from layeredcompmodel import LayeredCompModel
# Synthetic real-estate-like data
rng = np.random.default_rng(42)
n_samples = 100
data = {
    'neighborhood': rng.choice(['North', 'South', 'East'], n_samples),
    'size_sqft': rng.normal(2000, 500, n_samples),
    'price': rng.normal(500000, 100000, n_samples) + 100 * rng.normal(0, 1, n_samples) * (rng.normal(0, 1, n_samples) * 2000)
}
df = pd.DataFrame(data)
X = df[['neighborhood', 'size_sqft']]
y = df['price']
# Train
model = LayeredCompModel(weight_falloff=0.8, n_jobs=1)
model.fit(X, y)
# Predict
predictions = model.predict(X)
print(f"Predictions shape: {predictions.shape}")
print(f"MAE: {np.mean(np.abs(predictions - y)):.0f}")
# Explain single prediction
explanation = model.explain_value(X.iloc[0:1].squeeze())
print(explanation)

- `fit(X, y)`: Build tree from features `X` (DataFrame), target `y` (Series).
- `predict(X)`: Predict using path-weighted means.
- `explain_value(row)`: Dict with path nodes, depths, weights, wilson_means.
- `to_json(indent=4)`: JSON tree dump.
- `tree_`: Root `CompNode`.

LayeredCompBaggingModel(tree_count=10, sample_pct=0.8, random_state=None, split_metric='mae', n_jobs=1)
- `fit(X, y)`: Build bagging ensemble. Automatically optimizes `weight_falloff` for each tree using an internal split.
- `predict(X)`: Return the average prediction of all trees.
- `estimators_`: List of fitted `LayeredCompModel` instances.
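
As a rough illustration of the bagging idea described above (fit several estimators on random subsets, then average their predictions), here is a self-contained sketch. It is not the package's implementation: `MeanRegressor` is a trivial stand-in for a single tree, `SimpleBagging` is a hypothetical wrapper, drawing rows without replacement is an assumption, and the per-tree `weight_falloff` optimization is left out entirely.

```python
import random
from statistics import mean


class MeanRegressor:
    """Trivial stand-in for a single tree: always predicts the training mean."""

    def fit(self, X, y):
        self.mean_ = mean(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


class SimpleBagging:
    """Hypothetical bagging wrapper; not the package's implementation."""

    def __init__(self, tree_count=10, sample_pct=0.8, random_state=None):
        self.tree_count = tree_count
        self.sample_pct = sample_pct
        self.random_state = random_state

    def fit(self, X, y):
        rng = random.Random(self.random_state)
        sample_size = max(1, int(len(X) * self.sample_pct))
        self.estimators_ = []
        for _ in range(self.tree_count):
            # Assumption: rows are drawn without replacement for each estimator.
            chosen = rng.sample(range(len(X)), sample_size)
            member = MeanRegressor().fit([X[i] for i in chosen], [y[i] for i in chosen])
            self.estimators_.append(member)
        return self

    def predict(self, X):
        # Average the per-estimator predictions row by row.
        per_model = [est.predict(X) for est in self.estimators_]
        return [mean(column) for column in zip(*per_model)]


X = [[float(i)] for i in range(50)]
y = [100.0 + 2.0 * i for i in range(50)]
ensemble = SimpleBagging(tree_count=10, sample_pct=0.8, random_state=0).fit(X, y)
predictions = ensemble.predict(X[:3])
print(len(ensemble.estimators_), predictions)
```

Averaging over estimators trained on different subsets is what reduces variance; with a real tree as the base learner, each member would also disagree on individual rows rather than only on the overall level.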
See docs (TBD).
- `examples/quickstart.py`: Basic usage of `LayeredCompModel`.
- `examples/bagging_quickstart.py`: Usage of `LayeredCompBaggingModel` for better robustness.
Run the quickstart:
python examples/quickstart.py

Expected output:
Predictions shape: (100,)
MAE: 126914
{'final_prediction': 530354.0426294187, 'weight_falloff': 0.8, 'path': [{'depth': 0, 'wilson_mean': 476353.91361128056, 'count': 100, 'is_leaf': False, 'filter_col': 'size_sqft', 'filter_val': 2101.366485546922}, {'depth': 1, 'wilson_mean': 553953.0606894617, 'count': 42, 'is_leaf': False, 'filter_col': 'neighborhood', 'filter_val': 'North'}, {'depth': 2, 'wilson_mean': 525096.3185716979, 'count': 13, 'is_leaf': True}], 'calculation': '0.199*476354 + 0.512*553953 + 0.289*525096 = 530354'}
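
The `calculation` field in the output above shows how the final prediction combines the Wilson means along the path. The following sketch re-does that arithmetic from the printed values. It assumes the string format seen in this sample output, which is not a documented contract, and it does not reproduce how the weights themselves are derived from `weight_falloff`.

```python
import re

# Values copied from the expected output above.
explanation = {
    "final_prediction": 530354.0426294187,
    "calculation": "0.199*476354 + 0.512*553953 + 0.289*525096 = 530354",
}

# Extract (weight, wilson_mean) pairs from the human-readable calculation string.
pattern = r"(\d+(?:\.\d+)?)\*(\d+(?:\.\d+)?)"
pairs = [
    (float(weight), float(node_mean))
    for weight, node_mean in re.findall(pattern, explanation["calculation"])
]
weights = [weight for weight, _ in pairs]
recomputed = sum(weight * node_mean for weight, node_mean in pairs)
relative_gap = abs(recomputed - explanation["final_prediction"]) / explanation["final_prediction"]

print(f"weights sum to {sum(weights):.3f}")
print(f"recomputed from rounded weights: {recomputed:.0f}")
print(f"relative gap to final_prediction: {relative_gap:.4%}")
```

The recomputed value differs slightly from `final_prediction` because the weights in the string are rounded to three decimals.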
pytest tests/ --cov=layeredcompmodel
black src/
mypy src/

CI/CD, Sphinx docs: planned.
Kossa, J. (2026). LayeredCompModel. GitHub. https://github.com/JohnKossa/layeredcompmodel