Spaghetti is an always-on tabular prediction stack designed to remain usable under heavy missingness (arbitrary NaNs; up to ~70% missing per example) while optionally providing calibrated uncertainty via split conformal.
Install the project dependencies and developer tools:

```bash
uv sync
uv add --dev black isort pre-commit pytest
```

Format the codebase and run tests:

```bash
uv run black .
uv run isort --profile black .
uv run pytest -q
```

Run a quick training session (example):

```bash
uv run spaghetti --experiment-config spaghetti.yaml --data experiments/data/adult_bench_m20.csv --target-cls class
```

See CONTRIBUTING.md for contribution guidelines and developer tooling.
This repo has been migrated away from “pass a giant config dict everywhere” toward explicit, IDE-friendly dataclass configs for model/training/difficulty. The CLI still accepts layered YAML (`pesto.yaml` + `spaghetti.yaml`) for convenience and converts to dataclasses immediately.
The model is a single trunk with optional heads:
RankColumns → RouterFreeRun → FusionGate → FusedEncoder → {regression, multiclass}
and an auxiliary reconstruction circuit that feeds difficulty and robustness:
Router reconstruction (numeric) + categorical reconstruction (CE) → `route_error` → difficulty
Key modules:

- Column ranking: `spaghetti.model.pipeline.ColumnRanker`
- Masked reconstruction router (non-autoregressive): `spaghetti.model.pipeline.RouterFreeRun`
- Fusion gate (mask-constrained): `spaghetti.model.pipeline.FusionGate`
- Encoder: `spaghetti.model.encoders.FusedEncoder`
- Wrapper: `spaghetti.model.model.SpaghettiModel`
Problem: difficulty was previously computed from numeric reconstruction error only, making it blind to categorical corruption.
Implementation:

- `SpaghettiModel` now encodes `X_cat` as additional feature tokens (per-column embedding + positional embedding) and concatenates them to the fused numeric tokens before the encoder.
- A simple categorical reconstruction head (`CategoricalReconstructionHead`) predicts per-categorical-feature logits from the pooled representation.
- Difficulty’s `route_error` now includes:
  - numeric reconstruction error on observed numeric entries, plus
  - categorical negative log-likelihood on observed categories.
Conventions:

- Categorical missing values are represented as `-1`.
- Observed categories are assumed to be in `[0, K-1]` per feature.
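As a rough illustration of the categorical contribution to `route_error`, here is a per-example NLL over observed categories following the conventions above; `categorical_route_error` and its arguments are hypothetical names, not the actual `CategoricalReconstructionHead` API:

```python
import torch
import torch.nn.functional as F

def categorical_route_error(logits_per_feat, x_cat):
    """Per-example mean NLL over observed categories (sketch).

    logits_per_feat: list of C tensors, each (B, K_c): per-feature logits.
    x_cat: (B, C) int64 with -1 marking missing categories.
    """
    B = x_cat.shape[0]
    total, count = torch.zeros(B), torch.zeros(B)
    for c, logits in enumerate(logits_per_feat):
        target = x_cat[:, c]
        observed = target >= 0  # -1 == missing, per the convention above
        if observed.any():
            nll = F.cross_entropy(logits[observed], target[observed], reduction="none")
            total[observed] += nll
            count[observed] += 1
    return total / count.clamp(min=1)  # rows with no observed categories score 0
```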
Conformal binning must not use the label
What’s safe here:

- Difficulty is computed from `(X, mask)` and reconstruction outputs only — it never reads `y_reg`/`y_cls`.
Hardening applied:

- Training now prevents label losses (regression/multiclass) from updating the router by running a second forward pass with `detach_router=True`.
- The router is trained only through reconstruction-style losses (numeric reconstruction and categorical reconstruction), which are unsupervised w.r.t. $Y$.
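A minimal, self-contained sketch of the two-pass pattern (the toy `router`/`head` modules are hypothetical stand-ins, not Spaghetti's API):

```python
import torch

# Toy stand-ins (hypothetical): a "router" and a label head sharing its output.
router = torch.nn.Linear(4, 4)
head = torch.nn.Linear(4, 1)
x, y = torch.randn(8, 4), torch.randn(8, 1)

def forward(x, detach_router=False):
    routed = router(x)
    if detach_router:
        routed = routed.detach()  # blocks label gradients from reaching the router
    return routed

# Pass 1: reconstruction-style loss trains the router (unsupervised w.r.t. Y).
recon_loss = (forward(x) - x).pow(2).mean()

# Pass 2: the label loss sees a detached router, so it updates only the head.
label_loss = (head(forward(x, detach_router=True)) - y).pow(2).mean()

(recon_loss + label_loss).backward()
```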
RouterFreeRun now uses a parallel masked-prediction approach (BERT-style). This allows the model to learn feature correlations without the latency or order-sensitivity of autoregressive models.
Status:
- Implemented and validated.
- Replaces the previous sequential router for better stability and performance.
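The masked-prediction idea in miniature (a hedged sketch; `predictor` is a hypothetical stand-in for the router, which in reality operates on feature tokens):

```python
import torch

def masked_prediction_step(x, predictor, p=0.15):
    """One BERT-style denoising step: hide a random subset of entries,
    predict all features in a single parallel forward pass, and score
    the prediction only on the hidden entries."""
    aug_mask = torch.rand_like(x) < p           # entries to hide
    x_corrupted = x.masked_fill(aug_mask, 0.0)  # no ordering, unlike autoregressive
    x_hat = predictor(x_corrupted)              # all features predicted at once
    return ((x_hat - x)[aug_mask] ** 2).mean()

loss = masked_prediction_step(torch.randn(8, 16), torch.nn.Linear(16, 16))
```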
The FusionGate now enforces a safety constraint: its fusion is mask-constrained.
The `offline_stress_tests` harness supports synthetic missingness regimes. In addition to MCAR/block/feature-biased masking, it now supports NMAR:

- `type: nmar`: masks values preferentially above/below a per-batch quantile threshold.
Example regime config:

```yaml
evaluation:
  offline_stress_tests:
    enabled: true
    regimes:
      - { name: "nmar_high", type: "nmar", p: 0.5, quantile: 0.8, direction: "high" }
```

Note: conformal coverage under NMAR is only expected if calibration data reflects the same NMAR mechanism.
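A sketch of this NMAR mechanism under the stated convention (mask `1` = observed); `nmar_mask` is illustrative and may differ from the actual `spaghetti.evaluation` code:

```python
import numpy as np

def nmar_mask(X, p=0.5, quantile=0.8, direction="high", rng=None):
    """Mask values above (or below) a per-batch, per-feature quantile,
    each with probability p. Returns a 0/1 mask with 1 = observed."""
    rng = rng or np.random.default_rng(0)
    thresh = np.nanquantile(X, quantile, axis=0)   # per-feature threshold
    extreme = X > thresh if direction == "high" else X < thresh
    drop = extreme & (rng.random(X.shape) < p)     # value-dependent missingness
    return (~drop).astype(np.float32)
```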
- `.npz`:
  - Required: `X` (float32/float64), shape `(N, F)`.
  - Optional: `X_cat` (int64), shape `(N, C)` with `-1` for missing.
  - Optional: `y_reg` (float), `y_cls` (int).
- Table files via Polars (`.csv`, `.parquet`, `.ipc`) are supported for single-row data.
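For example, a conforming `.npz` can be produced with NumPy (values are synthetic; shapes follow the spec above):

```python
import numpy as np

N, F, C = 1000, 12, 3
X = np.random.randn(N, F).astype(np.float32)
X[np.random.rand(N, F) < 0.3] = np.nan            # natural missingness as NaN
X_cat = np.random.randint(0, 5, size=(N, C)).astype(np.int64)
X_cat[np.random.rand(N, C) < 0.3] = -1            # -1 marks missing categories
y_cls = np.random.randint(0, 4, size=N).astype(np.int64)

np.savez("dataset.npz", X=X, X_cat=X_cat, y_cls=y_cls)
```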
Install:

```bash
uv sync
```

Run training and evaluation:

```bash
uv run spaghetti --experiment-config spaghetti.yaml --data path/to/data.csv --target-reg target_col
```

Generate diagnosis plots:

```bash
uv run python experiments/diagnosis_plots.py
```
- `spaghetti/`: Core library.
  - `model/`: Architecture (Trunk, Router, Fusion, Heads).
  - `trainer.py`: Training loop with history tracking.
  - `conformal.py`: Split conformal calibration.
  - `evaluation.py`: Stress tests and feature-wise metrics.
- `experiments/`: Benchmark scripts and diagnosis tools.
- `paper/`: Technical report and generated plots.
- `notebooks/`: Exploratory analysis and clean benchmarks.

Key modules:

- `spaghetti/model/` — model + pipeline
- `spaghetti/trainer.py` — training loop + conformal calibration
- `spaghetti/evaluation.py` — offline stress tests (now includes NMAR)
- `spaghetti/masking.py` — training-time augmentation (numeric + categorical)
- `spaghetti/conformal.py` — split conformal utilities
Spaghetti is a config-driven, always-on tabular prediction system designed for heavy missingness (arbitrary NaNs; up to ~70% missing per row at inference/evaluation via mask-aware routing/reconstruction). It supports:
- Regression (point prediction; optional conformal interval)
- Multiclass classification (top-1; optional conformal prediction set)
- A “route” reconstruction head used for robustness and difficulty scoring
The implementation is intentionally modular and is driven by two YAML configs that are merged “defaults first, experiment overrides” (Hydra/OmegaConf style).
Many tabular pipelines either:
- require extensive imputation and fail under heavy missingness, or
- provide model confidence that is not calibrated as a coverage guarantee.
Spaghetti addresses this by:
- Making missingness explicit via a mask-aware model input.
- Training with denoising-style additional masking (kept moderate by default) so the model remains robust when features are missing at test time.
- Optionally using split conformal calibration to convert model errors into uncertainty sets/intervals with empirical coverage (under standard conformal assumptions).
Spaghetti loads and deep-merges two YAML files:
- `pesto.yaml` — deterministic defaults (“pin down ambiguous behavior”)
- `spaghetti.yaml` — experiment-level configuration (architecture choice, losses, conformal binning, etc.)
Conceptually, `pesto.yaml` is the implementation-defaults layer: it exists to make “general” behavior deterministic (preprocessing/schema inference defaults, conformal conventions, tie-breaking, etc.), while `spaghetti.yaml` is meant to express high-level experimental choices.
Merge behavior:
- Mappings (dicts) merge recursively.
- Non-mappings (scalars, lists) are replaced by the experiment value.
Entry points:

- Loader: `spaghetti.config.load_layered_config()`
- Merge implementation: `spaghetti.config.deep_merge()`
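The rule is small enough to sketch; a functionally equivalent merge, assuming only the behavior described above (not necessarily the exact `deep_merge()` source):

```python
def deep_merge(defaults: dict, override: dict) -> dict:
    """Recursively merge mappings; non-mappings are replaced by the
    experiment (override) value."""
    merged = dict(defaults)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value  # scalars and lists replace wholesale
    return merged

assert deep_merge({"a": {"x": 1, "y": 2}, "b": [1]}, {"a": {"y": 3}, "b": [2]}) \
    == {"a": {"x": 1, "y": 3}, "b": [2]}
```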
Each sample is represented as:

- `x_filled`: numeric features after a deterministic fill strategy (e.g., zero/mean/learned token)
- `mask`: binary tensor with `1` = observed, `0` = missing

The mask is always passed to the encoder.
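With a zero-fill strategy, both tensors can be derived directly from a NaN-bearing array (illustrative only):

```python
import numpy as np

X = np.array([[1.0, np.nan], [np.nan, 4.0]])
mask = (~np.isnan(X)).astype(np.float32)   # 1 = observed, 0 = missing
x_filled = np.nan_to_num(X, nan=0.0)       # deterministic zero fill
```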
Spaghetti supports two input modes:

- Single-row: `x_filled: (B, F)`, `mask: (B, F)`
- Row-set / subtable: `x_filled: (B, S, F)`, `mask: (B, S, F)`
- Regression: `y_reg` with shape `(B,)`
- Multiclass: `y_cls` as integer ids `0..K-1`, shape `(B,)`
- NPZ (recommended for row-set support)
  - Required: `X`
  - Optional: `X_cat`, `y_reg`, `y_cls`
- Table files via Polars (single-row only)
  - `.csv`, `.parquet`/`.pq`, `.ipc`/`.feather`/`.arrow`
  - You specify target columns via CLI flags.
“General tabular” often breaks on schema inference; here are the defaults as implemented today:

- If `X_cat` is absent, the model treats everything in `X` as numeric.
- For table inputs (Polars loader):
  - Categorical columns default to Polars dtypes `Categorical`, `Enum`, `Utf8` (strings), and `Boolean`.
  - Numeric columns are cast to float (with missing values preserved as `NaN`).
  - Date/Datetime columns are converted deterministically to epoch-based numeric values.
- Text hashing and richer datetime feature extraction described in `pesto.yaml` are not implemented yet (see “Known gaps”).
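A sketch of that dtype-based split, assuming a recent Polars version (`split_schema` is a hypothetical helper; the real `polars_io` loader may differ):

```python
import polars as pl

CATEGORICAL_DTYPES = (pl.Categorical, pl.Enum, pl.Utf8, pl.Boolean)

def split_schema(df: pl.DataFrame):
    cat_cols, num_cols = [], []
    for name, dtype in df.schema.items():
        if isinstance(dtype, CATEGORICAL_DTYPES):
            cat_cols.append(name)
        elif dtype.is_numeric():
            num_cols.append(name)  # cast to float later; NaN preserved
        elif dtype.is_temporal():
            num_cols.append(name)  # converted to epoch-based numeric values
    return cat_cols, num_cols
```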
Spaghetti is a shared encoder (“trunk”) with up to three heads:

- Regression head → predicts `ŷ` (scalar)
- Multiclass head → predicts logits for `K` classes
- Route (reconstruction) head → predicts reconstructed feature values (used for robustness + difficulty)
Note: in the current v1 implementation, the route/reconstruction head reconstructs numeric features only. If you provide categoricals (`X_cat` / `--categorical-cols`), they are used by the encoder (Case B) but are not reconstructed; route error and difficulty are therefore computed from numeric features.
Implementation:

- Model wrapper: `spaghetti.model.model.SpaghettiModel`
- Heads: `spaghetti.model.heads.*`
The encoder is selected by rules in `spaghetti.yaml:model.architecture_selector`.
Best for moderate-width mostly-numeric features.
Input representation is a concatenation of:

- `x_filled`
- `mask`
- optional per-column id embeddings

Implementation: `spaghetti.model.encoders.MLPEncoder`
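Schematically, the Case A input is a concatenation along the feature axis (a sketch with hypothetical shapes):

```python
import torch

B, F = 32, 10
x_filled = torch.randn(B, F)
mask = (torch.rand(B, F) > 0.3).float()          # 1 = observed
mlp_input = torch.cat([x_filled, mask], dim=-1)  # (B, 2F); column-id embeddings optional
```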
Adds categorical embeddings for `X_cat` and fuses numeric + categorical features via an MLP.

Implementation: `spaghetti.model.encoders.MixedTabularEncoder`
Treats each feature as a token, built from:
- numeric value embedding
- mask embedding
- optional column-id embedding
Then applies a Transformer encoder across features and pools (CLS or mean).
Implementation: `spaghetti.model.encoders.FeatureTokenTransformerEncoder`
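A minimal version of that tokenization (the `FeatureTokens` module below is hypothetical, not the real encoder's code):

```python
import torch
import torch.nn as nn

class FeatureTokens(nn.Module):
    """Build one d-dim token per feature from value, mask, and column id."""
    def __init__(self, n_features: int, d: int = 32):
        super().__init__()
        self.value_proj = nn.Linear(1, d)            # numeric value embedding
        self.mask_emb = nn.Embedding(2, d)           # observed vs. missing
        self.col_emb = nn.Embedding(n_features, d)   # optional column-id embedding

    def forward(self, x_filled, mask):               # both (B, F)
        cols = torch.arange(x_filled.shape[1], device=x_filled.device)
        return (self.value_proj(x_filled.unsqueeze(-1))
                + self.mask_emb(mask.long())
                + self.col_emb(cols))                # (B, F, d)

tokens = FeatureTokens(n_features=10)(torch.randn(4, 10), torch.ones(4, 10))
# tokens would then go through nn.TransformerEncoder and CLS/mean pooling.
```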
For permutation-invariant sets of rows (subtables), this uses:
- a row encoder (Case A/B/C)
- a set aggregator (currently DeepSets pooling)
Implementation: `spaghetti.model.encoders.SetEncoder` + `DeepSetsAggregator`
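DeepSets pooling in its simplest mask-aware form (a sketch; the real `DeepSetsAggregator` likely wraps pooling with MLPs before/after):

```python
import torch

def deepsets_pool(row_embs, row_mask):
    """Permutation-invariant mean over valid rows.

    row_embs: (B, S, d) per-row embeddings from the row encoder.
    row_mask: (B, S) with 1 for real rows, 0 for padding.
    """
    w = row_mask.unsqueeze(-1)                             # (B, S, 1)
    return (row_embs * w).sum(1) / w.sum(1).clamp(min=1)   # (B, d)
```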
Builder/selector:

- `spaghetti.model.builder.select_encoder_case()`
- `spaghetti.model.builder.build_model()`
In addition to natural missingness in the input, Spaghetti optionally applies additional random masking during training. This makes the model robust to heavier corruption and supports “always-on” inference.
Masking patterns implemented:
- MCAR (random independent masking)
- Block masking (contiguous feature blocks)
- Feature-biased masking (optional weights)
Implementation: `spaghetti.masking.apply_random_masking()`
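The MCAR and block patterns, roughly (a sketch in the spirit of `apply_random_masking()`, not its exact code; assumes a float 0/1 mask and `block_len <= F`):

```python
import torch

def mcar_mask(mask, p):
    """Independently hide each currently-observed entry with probability p."""
    return mask * (torch.rand_like(mask) >= p).float()

def block_mask(mask, block_len):
    """Hide one contiguous block of features per row."""
    B, F = mask.shape
    start = torch.randint(0, F - block_len + 1, (B, 1))
    cols = torch.arange(F).expand(B, F)
    blocked = (cols >= start) & (cols < start + block_len)
    return mask * (~blocked).float()
```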
- Regression: MSE or Huber
- Multiclass: cross entropy (optional label smoothing)
- Route loss: masked L1/L2 computed only on additionally masked entries
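The route-loss restriction to additionally masked entries, as a sketch (names are hypothetical; shown for the L1 variant):

```python
import torch

def route_loss(x_true, x_hat, was_observed, aug_hidden):
    """Masked L1 on entries that were observed in the raw data but hidden
    by the training-time augmentation (the denoising targets)."""
    target = was_observed.bool() & aug_hidden.bool()
    return (x_hat - x_true).abs()[target].mean()
```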
Training loop: `spaghetti.trainer.train()`
For coverage claims, Spaghetti keeps conformal calibration clean by using four splits:
- train: fit model parameters
- val: early stopping / model selection
- calib: conformal thresholds (not used for fitting/selection)
- test: final evaluation
In the current implementation, train is automatically split into train/val inside the data loader construction; calib remains reserved for conformal calibration.
A scalar difficulty is computed per example from:

- missing rate: $1 - \mathrm{mean}(\text{mask})$
- route self-consistency error: reconstruction error on observed entries
The route error is normalized using calibration statistics (robust z-score by default) and then combined linearly.
Implementation invariants (to avoid leakage and preserve the “difficulty is observable at test time” property):
- Route self-consistency error is computed only on observed entries.
- Route loss is computed only on additionally masked entries from the training-time augmentation (denoising objective).
Implementation: `spaghetti.difficulty.DifficultyScorer`
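A sketch of that combination, assuming the robust z-score uses a calibration median/MAD and that the linear weights are configurable (all names here are hypothetical):

```python
import numpy as np

def difficulty(mask, route_err, calib_med, calib_mad, w_miss=0.5, w_route=0.5):
    """Per-example difficulty: missing rate + robust-z route error.

    mask: (N, F) 0/1 array; route_err: (N,) per-example route error.
    """
    miss_rate = 1.0 - mask.mean(axis=1)                 # observable at test time
    z = (route_err - calib_med) / max(calib_mad, 1e-8)  # robust z-score
    return w_miss * miss_rate + w_route * z
```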
Spaghetti uses split conformal:
- Train on the train split.
- Run the trained model on the calibration split.
- Convert calibration nonconformity scores into quantile thresholds.
Split conformal produces prediction intervals/sets calibrated on a held-out calibration set that is not used to fit or select the model, targeting marginal coverage under the usual exchangeability assumption. Difficulty binning is used for stability/adaptivity and should be validated per bin.
Yes. Conformal calibration is now a separate optional circuit.
To enable it at all, set `conformal.enabled: true`.

Then you can optionally disable it per head in `spaghetti.yaml`:

- `conformal.regression.enabled: false` (no conformal intervals)
- `conformal.multiclass.enabled: false` (no conformal prediction sets)
When disabled, Spaghetti still trains and produces point predictions/logits, but it will skip fitting conformal calibrators. The “offline stress tests” section below is currently conformal-metrics-focused (coverage and set/interval size), so it will also be omitted when conformal is fully disabled.
Minimal example:

```yaml
conformal:
  enabled: false
  regression:
    enabled: false
  multiclass:
    enabled: false
```

Calibration points are grouped into a small number of difficulty bins (quantile bins). Each bin has its own threshold, improving stability when calibration sizes are modest.
Implementation: `spaghetti.conformal.make_quantile_bins()`
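Quantile binning in its simplest form (a sketch consistent with the description above; exact edge handling in `make_quantile_bins()` may differ):

```python
import numpy as np

def make_quantile_bins(difficulty, n_bins=4):
    """Bin edges at difficulty quantiles; returns edges and bin ids."""
    edges = np.quantile(difficulty, np.linspace(0, 1, n_bins + 1)[1:-1])
    return edges, np.digitize(difficulty, edges)  # ids in 0..n_bins-1

edges, bins = make_quantile_bins(np.random.rand(100), n_bins=4)
```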
Nonconformity score: the absolute residual $s = |y - \hat{y}|$.
Output interval (symmetric):

- `lo = ŷ - q(bin(d))`
- `hi = ŷ + q(bin(d))`
Implementation: `spaghetti.conformal.RegressionConformal`
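Putting binning and calibration together for regression (a sketch; it assumes every bin is non-empty and uses the standard finite-sample quantile level):

```python
import numpy as np

def per_bin_thresholds(scores, bins, n_bins, alpha=0.1):
    """scores: |y - ŷ| on the calib split; one conformal quantile per bin."""
    q = np.empty(n_bins)
    for b in range(n_bins):
        s = scores[bins == b]                        # assumes non-empty bins
        level = min(np.ceil((len(s) + 1) * (1 - alpha)) / len(s), 1.0)
        q[b] = np.quantile(s, level)                 # finite-sample-corrected quantile
    return q

# Interval for a test point with prediction ŷ and difficulty bin b:
#   [ŷ - q[b], ŷ + q[b]]
```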
APS builds the prediction set by sorting classes by predicted probability and including the top classes until the cumulative probability exceeds a calibrated threshold (with deterministic tie-breaking for equal probabilities).
Implementation: `spaghetti.conformal.APSConformal`
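The set construction, schematically (a sketch of the standard APS rule; the real `APSConformal` calibrates the threshold `tau` on the calib split):

```python
import numpy as np

def aps_set(probs, tau):
    """Include top classes until cumulative probability exceeds tau."""
    order = np.argsort(-probs, kind="stable")            # deterministic tie-breaking
    cum = np.cumsum(probs[order])
    k = int(np.searchsorted(cum, tau, side="right")) + 1 # smallest prefix with cum > tau
    return order[:k]

print(aps_set(np.array([0.5, 0.3, 0.15, 0.05]), tau=0.9))  # -> [0 1 2]
```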
You can evaluate coverage and set/interval sizes under synthetic missingness regimes (e.g., MCAR 10/30/50/70, block missingness, feature-biased missingness). These are configured under `evaluation.offline_stress_tests`.
Note: the current stress-test implementation reports conformal metrics (coverage, interval width, prediction set size). If conformal is disabled for all heads, the stress-test report is omitted.
Implementation: `spaghetti.evaluation.offline_stress_tests()`
```bash
uv sync
```

```bash
uv run spaghetti \
  --defaults-config pesto.yaml \
  --experiment-config spaghetti.yaml \
  --data path/to/dataset.npz
```

If your `.npz` includes `X_cat`, provide cardinalities:

```bash
uv run spaghetti \
  --data path/to/dataset.npz \
  --cat-cardinalities 12,7,100
```

```bash
uv run spaghetti \
  --data path/to/data.parquet \
  --target-reg target_value \
  --target-cls target_class
```

Optionally specify categorical columns:

```bash
uv run spaghetti \
  --data path/to/data.csv \
  --target-cls label \
  --categorical-cols country,device_type
```

Repo layout:

- `spaghetti.yaml` — experiment-level config
- `pesto.yaml` — deterministic defaults
- `spaghetti/` — implementation
  - `cli.py` — CLI entrypoint
  - `config.py` — layered YAML merge
  - `data.py` — dataset + splits + loaders
  - `polars_io.py` — Polars loaders for CSV/Parquet/IPC
  - `masking.py` — training-time masking augmentation
  - `model/` — encoders, heads, model wrapper, builder
  - `difficulty.py` — difficulty computation
  - `conformal.py` — split conformal for regression + multiclass (APS)
  - `trainer.py` — training loop + early stopping + calibration
  - `evaluation.py` — stress-test evaluation
- The Polars loader currently supports single-row data only (row-set inputs should use `.npz`).
- For row-set `.npz` data, the implementation expects a fixed set size `S`. If your samples have variable `S`, pad to a common `S_max` and set `mask=0` (and `x_filled=0`) for padded entries.
- The set encoder uses DeepSets pooling; Set Transformer aggregation is not implemented.
- `pesto.yaml` describes richer “implementation defaults” (schema inference, datetime/text handling, mixed-type reconstruction losses, conformal quantile conventions). The current route head reconstructs numeric features only and uses masked L1/L2; mixed-type route losses are planned but not implemented.