LayeredCompModel

Hierarchical tree-based regressor for robust predictions (e.g., parcel sale prices) using path-weighted Wilson means (95% trimmed means for outlier resistance).

Features

  • Scikit-learn compatible: Inherits BaseEstimator/RegressorMixin; works with Pipeline, GridSearchCV, cross_val_score, pickling.
  • Automatic feature handling: Categorical (one-vs-rest splits), numeric (binary search breakpoints), NaNs/missing values.
  • Robust statistics: Wilson means prevent outlier swings.
  • Ensemble Support: LayeredCompBaggingModel for reduced variance and automatic weight_falloff optimization.
  • Configurable weighting: weight_falloff balances local accuracy vs. market normativity.
  • Explainable: explain_value(row) shows path, weights, means.
  • Serializable: to_json(), to_dict().
  • Parallel: n_jobs support.
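
To illustrate why a trimmed mean resists outliers, here is a minimal standalone sketch. It is not the library's implementation: the helper name trimmed_mean is hypothetical, and it assumes "95% trimmed" means keeping the central 95% of the sorted values (dropping 2.5% from each tail, rounded down).

```python
def trimmed_mean(values, keep=0.95):
    """Mean of the central `keep` fraction of the sorted values (assumed definition)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("need at least one value")
    # Number of points dropped from each tail, rounded down.
    cut = int(n * (1.0 - keep) / 2.0)
    kept = ordered[cut:n - cut]
    return sum(kept) / len(kept)


# 39 ordinary sale prices plus one extreme outlier.
prices = [200_000 + 1_000 * i for i in range(39)] + [5_000_000]

plain = sum(prices) / len(prices)
robust = trimmed_mean(prices)

print(f"plain mean:   {plain:,.0f}")
print(f"trimmed mean: {robust:,.0f}")
```

With 40 values, one point is dropped from each tail, so the single 5,000,000 sale no longer pulls the estimate upward.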

NaN Handling

  • Categorical: Treated as distinct "NaN" category.
  • Numeric: Excluded from splits (robust; per SPEC.md).
  • Target y: Must be finite (raises ValueError).
  • Strict checks: To guarantee NaN-free input instead of relying on the handling above, impute first with Pipeline([('imputer', SimpleImputer()), ('model', LayeredCompModel())]).
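
The three rules above can be sketched in plain Python. This is only an illustration of the described behavior, not code from the package; every function name here (is_missing, category_key, splittable, check_target) is hypothetical.

```python
import math


def is_missing(value):
    """True for None and for float NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def category_key(value):
    # Categorical rule: missing values form their own "NaN" category.
    return "NaN" if is_missing(value) else value


def splittable(values):
    # Numeric rule: missing values take no part in choosing a breakpoint.
    return [v for v in values if not is_missing(v)]


def check_target(y):
    # Target rule: every y must be a finite number.
    for v in y:
        if is_missing(v) or not math.isfinite(v):
            raise ValueError("target y must be finite")


nan = float("nan")
print([category_key(v) for v in ["North", nan, "East"]])
print(splittable([1500.0, nan, 2300.0]))
```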

Installation

pip install layeredcompmodel

For development:

git clone https://github.com/JohnKossa/layeredcompmodel.git
cd layeredcompmodel
pip install -e ".[dev]"

Quickstart

import pandas as pd
import numpy as np
from layeredcompmodel import LayeredCompModel

# Synthetic real-estate-like data
rng = np.random.default_rng(42)
n_samples = 100
data = {
    'neighborhood': rng.choice(['North', 'South', 'East'], n_samples),
    'size_sqft': rng.normal(2000, 500, n_samples),
    'price': rng.normal(500000, 100000, n_samples) + 100 * rng.normal(0, 1, n_samples) * (rng.normal(0, 1, n_samples) * 2000)
}
df = pd.DataFrame(data)
X = df[['neighborhood', 'size_sqft']]
y = df['price']

# Train
model = LayeredCompModel(weight_falloff=0.8, n_jobs=1)
model.fit(X, y)

# Predict
predictions = model.predict(X)
print(f"Predictions shape: {predictions.shape}")
print(f"MAE: {np.mean(np.abs(predictions - y)):.0f}")

# Explain single prediction
explanation = model.explain_value(X.iloc[0:1].squeeze())
print(explanation)

API Reference

LayeredCompModel(weight_falloff=0.5, split_metric='mae', n_jobs=1)

  • fit(X, y): Build tree from features X (DataFrame), target y (Series).
  • predict(X): Predict using path-weighted means.
  • explain_value(row): Dict with path nodes, depths, weights, wilson_means.
  • to_json(indent=4): JSON tree dump.
  • tree_: Root CompNode.
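
As a rough picture of what scoring a numeric split with split_metric='mae' might involve, the sketch below rates each candidate breakpoint by the size-weighted mean absolute error of predicting each side with its own mean. This is an assumption about the metric, not the library's code: the function names are hypothetical, a plain mean stands in for the Wilson mean, and an exhaustive scan replaces the binary search mentioned under Features.

```python
def mae_around_mean(ys):
    """Mean absolute deviation of ys from their own mean (0.0 if empty)."""
    if not ys:
        return 0.0
    center = sum(ys) / len(ys)
    return sum(abs(y - center) for y in ys) / len(ys)


def split_mae(xs, ys, threshold):
    """Size-weighted MAE after splitting rows at x <= threshold."""
    left = [y for x, y in zip(xs, ys) if x <= threshold]
    right = [y for x, y in zip(xs, ys) if x > threshold]
    total = len(left) * mae_around_mean(left) + len(right) * mae_around_mean(right)
    return total / len(ys)


def best_threshold(xs, ys):
    """Scan every candidate breakpoint; None if x has a single distinct value."""
    # The largest x is skipped because it would leave the right side empty.
    candidates = sorted(set(xs))[:-1]
    if not candidates:
        return None
    return min(candidates, key=lambda t: split_mae(xs, ys, t))


size_sqft = [1000, 1200, 1400, 2600, 2800, 3000]
price_k = [100, 110, 120, 300, 310, 320]
print(best_threshold(size_sqft, price_k))
```

On this toy data the scan separates the three small, cheap parcels from the three large, expensive ones.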

LayeredCompBaggingModel(tree_count=10, sample_pct=0.8, random_state=None, split_metric='mae', n_jobs=1)

  • fit(X, y): Build bagging ensemble. Automatically optimizes weight_falloff for each tree using an internal split.
  • predict(X): Return the average prediction of all trees.
  • estimators_: List of fitted LayeredCompModel instances.
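
The general bagging idea behind tree_count and sample_pct can be sketched without the library. The code below is a generic illustration under stated assumptions: rows are sampled without replacement (the README does not say which scheme is used), a trivial mean predictor stands in for LayeredCompModel, the per-tree weight_falloff optimization is left out, and the names fit_mean, fit_bagging, and predict_bagging are hypothetical.

```python
import random


def fit_mean(rows, ys):
    """Stand-in base learner: always predicts the mean of its training targets."""
    mean = sum(ys) / len(ys)
    return lambda row: mean


def fit_bagging(rows, ys, fit_base, tree_count=10, sample_pct=0.8, random_state=None):
    """Fit tree_count base learners, each on a random sample_pct share of the rows."""
    rng = random.Random(random_state)
    n = len(rows)
    k = max(1, int(n * sample_pct))
    estimators = []
    for _ in range(tree_count):
        # Assumption: subsample without replacement.
        idx = rng.sample(range(n), k)
        estimators.append(fit_base([rows[i] for i in idx], [ys[i] for i in idx]))
    return estimators


def predict_bagging(estimators, row):
    """Average the predictions of all base learners."""
    return sum(est(row) for est in estimators) / len(estimators)


rows = [{"size_sqft": 1000 + 100 * i} for i in range(10)]
ys = [100.0 + 10.0 * i for i in range(10)]

estimators = fit_bagging(rows, ys, fit_mean, tree_count=10, sample_pct=0.8, random_state=0)
print(len(estimators), predict_bagging(estimators, rows[0]))
```

Averaging over learners trained on different subsamples is what reduces variance; a fixed random_state makes the subsamples reproducible.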

See docs (TBD).

Examples

Run the quickstart:

python examples/quickstart.py

Expected output:

Predictions shape: (100,)
MAE: 126914
{'final_prediction': 530354.0426294187, 'weight_falloff': 0.8, 'path': [{'depth': 0, 'wilson_mean': 476353.91361128056, 'count': 100, 'is_leaf': False, 'filter_col': 'size_sqft', 'filter_val': 2101.366485546922}, {'depth': 1, 'wilson_mean': 553953.0606894617, 'count': 42, 'is_leaf': False, 'filter_col': 'neighborhood', 'filter_val': 'North'}, {'depth': 2, 'wilson_mean': 525096.3185716979, 'count': 13, 'is_leaf': True}], 'calculation': '0.199*476354 + 0.512*553953 + 0.289*525096 = 530354'}
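
The 'calculation' string in the output above can be checked by hand. The sketch below copies the three rounded weights and the three wilson_mean values from that output and recombines them; it does not show how the library derives the weights from weight_falloff.

```python
# (weight, wilson_mean) pairs copied from the expected output, root to leaf.
path = [
    (0.199, 476353.91361128056),
    (0.512, 553953.0606894617),
    (0.289, 525096.3185716979),
]
final_prediction = 530354.0426294187

weight_sum = sum(w for w, _ in path)
recomputed = sum(w * m for w, m in path)

print(f"weights sum to {weight_sum:.3f}")
print(f"recomputed:  {recomputed:,.0f}")
print(f"reported:    {final_prediction:,.0f}")
print(f"difference:  {abs(recomputed - final_prediction):,.0f}")
```

The weights sum to 1, so the prediction is a weighted average of the means along the path. The recomputed value differs from final_prediction by a few hundred, presumably because the weights are printed to only three decimals.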

Development & Testing

pytest tests/ --cov=layeredcompmodel
black src/
mypy src/

CI/CD and Sphinx docs are planned.

Citing

Kossa, J. (2026). LayeredCompModel. GitHub. https://github.com/JohnKossa/layeredcompmodel

License

MIT

About

A prediction model based on stacking comparables with increasing levels of specificity.
