
Lycoriolis/Spaghetti


Spaghetti

CI License: MIT

Spaghetti is an always-on tabular prediction stack designed to remain usable under heavy missingness (arbitrary NaNs; up to ~70% missing per example) while optionally providing calibrated uncertainty via split conformal.

Quick Start

Install the project dependencies and developer tools:

uv sync
uv add --dev black isort pre-commit pytest

Format the codebase and run tests:

uv run black .
uv run isort --profile black .
uv run pytest -q

Run a quick training session (example):

uv run spaghetti --experiment-config spaghetti.yaml --data experiments/data/adult_bench_m20.csv --target-cls class

Contributing

See CONTRIBUTING.md for contribution guidelines and developer tooling.

This repo has been migrated away from “pass a giant config dict everywhere” toward explicit, IDE-friendly dataclass configs for model/training/difficulty. The CLI still accepts layered YAML (pesto.yaml + spaghetti.yaml) for convenience and converts to dataclasses immediately.

Pipeline (what actually runs)

The model is a single trunk with optional heads:

RankColumns → RouterFreeRun → FusionGate → FusedEncoder → {regression, multiclass}

and an auxiliary reconstruction circuit that feeds difficulty and robustness:

Router reconstruction (numeric) + categorical reconstruction (CE) → route_error → difficulty

Key modules:

  • Column ranking: spaghetti.model.pipeline.ColumnRanker
  • Masked reconstruction router: spaghetti.model.pipeline.RouterFreeRun (non-autoregressive)
  • Fusion gate: spaghetti.model.pipeline.FusionGate (mask-constrained)
  • Encoder: spaghetti.model.encoders.FusedEncoder
  • Wrapper: spaghetti.model.model.SpaghettiModel

A. Fix the categorical gap (done)

Problem: difficulty was previously computed from numeric reconstruction error only, making it blind to categorical corruption.

Implementation:

  • SpaghettiModel now encodes X_cat as additional feature tokens (per-column embedding + positional embedding) and concatenates them to the fused numeric tokens before the encoder.
  • A simple categorical reconstruction head (CategoricalReconstructionHead) predicts per-categorical-feature logits from the pooled representation.
  • Difficulty’s route_error now includes:
    • numeric reconstruction error on observed numeric entries, plus
    • categorical negative log-likelihood on observed categories.

Conventions:

  • Categorical missing values are represented as -1.
  • Observed categories are assumed to be in [0, K-1] per feature.
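Under these conventions, the categorical contribution to route_error can be sketched as follows. This is an illustrative NumPy helper, not the actual CategoricalReconstructionHead API; the real head operates on PyTorch tensors, but the masking logic is the same: entries equal to -1 are skipped, and the NLL is averaged over observed categorical features only.

```python
import numpy as np

def categorical_route_error(logits_per_feature, x_cat):
    """Per-example categorical NLL, averaged over observed features only.

    logits_per_feature: list of (B, K_c) arrays, one per categorical column.
    x_cat: (B, C) int array with -1 marking missing entries.
    """
    B, C = x_cat.shape
    total = np.zeros(B)
    count = np.zeros(B)
    for c, logits in enumerate(logits_per_feature):
        observed = x_cat[:, c] >= 0                   # -1 means missing
        if not observed.any():
            continue
        z = logits[observed]
        z = z - z.max(axis=1, keepdims=True)          # stable log-softmax
        log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
        total[observed] -= log_probs[np.arange(observed.sum()), x_cat[observed, c]]
        count[observed] += 1.0
    return total / np.maximum(count, 1.0)             # fully-missing rows -> 0
```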

B. Validate difficulty independence (audited + hardened)

Conformal binning must not use the label $Y$ of the same example.

What’s safe here:

  • Difficulty is computed from (X, mask) and reconstruction outputs only — it never reads y_reg/y_cls.

Hardening applied:

  • Training now prevents label losses (regression/multiclass) from updating the router by running a second forward pass with detach_router=True.
  • The router is trained only through reconstruction-style losses (numeric reconstruction and categorical reconstruction), which are unsupervised w.r.t. $Y$.

C. Non-Autoregressive Masked Router (implemented)

RouterFreeRun now uses a parallel masked-prediction approach (BERT-style). This allows the model to learn feature correlations without the latency or order-sensitivity of autoregressive models.

Status:

  • Implemented and validated.
  • Replaces the previous sequential router for better stability and performance.

D. Safe Fusion Gate (implemented)

The FusionGate now enforces a safety constraint $g = (1 - m) \cdot \sigma(\text{MLP}(\cdot))$, where $m$ is the observed mask. This ensures that observed values are never overwritten by model hallucinations, while missing values are filled with a learned mixture of the router's prediction and a default token.
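The constraint can be sketched in a few lines. Here gate_logits stands in for the MLP output, and the function name and exact inputs are assumptions for illustration; the point is that multiplying by $(1 - m)$ forces the gate to zero wherever a value is observed.

```python
import numpy as np

def mask_constrained_fuse(x_filled, mask, router_pred, gate_logits):
    """Sketch of the safety constraint g = (1 - m) * sigmoid(MLP(.)).

    x_filled, mask, router_pred, gate_logits: (B, F) arrays; mask has 1=observed.
    """
    g = (1.0 - mask) / (1.0 + np.exp(-gate_logits))   # g == 0 wherever observed
    return (1.0 - g) * x_filled + g * router_pred
```

Because g is identically zero on observed entries, the encoder always sees the true observed values; only missing entries receive a learned blend of the router's prediction.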

The offline_stress_tests harness supports synthetic missingness regimes. In addition to MCAR/block/feature-biased masking, it now supports NMAR:

  • type: "nmar" masks values preferentially above/below a per-batch quantile threshold.

Example regime config:

evaluation:
  offline_stress_tests:
    enabled: true
    regimes:
      - { name: "nmar_high", type: "nmar", p: 0.5, quantile: 0.8, direction: "high" }

Note: conformal coverage under NMAR is only expected if calibration data reflects the same NMAR mechanism.
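A minimal sketch of such an NMAR masker, mirroring the regime fields (p, quantile, direction). The function name is illustrative, not the harness's actual API:

```python
import numpy as np

def nmar_mask(X, p=0.5, quantile=0.8, direction="high", rng=None):
    """Hide values preferentially above/below a per-batch quantile threshold.

    Returns a mask with 1=keep, 0=hidden. Values on the "at risk" side of the
    per-feature quantile are dropped with probability p.
    """
    rng = np.random.default_rng(rng)
    thresh = np.quantile(X, quantile, axis=0)         # per-feature threshold
    at_risk = X > thresh if direction == "high" else X < thresh
    drop = at_risk & (rng.random(X.shape) < p)
    return (~drop).astype(np.float32)
```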

Data formats

  • .npz:

    • Required: X (float32/float64), shape (N,F).
    • Optional: X_cat (int64), shape (N,C) with -1 for missing.
    • Optional: y_reg (float), y_cls (int).
  • Table files via Polars (.csv, .parquet, .ipc) are supported for single-row data.
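A minimal .npz matching these conventions could be produced like this (array contents are synthetic; shapes and dtypes follow the spec above):

```python
import numpy as np

rng = np.random.default_rng(0)
N, F, C = 100, 8, 3

X = rng.standard_normal((N, F)).astype(np.float32)
X[rng.random((N, F)) < 0.3] = np.nan                  # arbitrary NaNs are allowed

X_cat = rng.integers(0, 5, size=(N, C)).astype(np.int64)
X_cat[rng.random((N, C)) < 0.3] = -1                  # -1 marks missing categories

y_reg = rng.standard_normal(N).astype(np.float32)

np.savez("dataset.npz", X=X, X_cat=X_cat, y_reg=y_reg)
```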

Running

Install:

uv sync

Run training and evaluation:

uv run spaghetti --experiment-config spaghetti.yaml --data path/to/data.csv --target-reg target_col

Generate diagnosis plots:

uv run python experiments/diagnosis_plots.py

Repository map

  • spaghetti/: Core library.

    • model/: Architecture (Trunk, Router, Fusion, Heads) and the pipeline.
    • trainer.py: Training loop with history tracking and conformal calibration.
    • conformal.py: Split conformal utilities.
    • evaluation.py: Offline stress tests (now includes NMAR) and feature-wise metrics.
    • masking.py: Training-time masking augmentation (numeric + categorical).
  • experiments/: Benchmark scripts and diagnosis tools.

  • paper/: Technical report and generated plots.

  • notebooks/: Exploratory analysis and clean benchmarks.

Spaghetti is a config-driven, always-on tabular prediction system designed for heavy missingness (arbitrary NaNs; up to ~70% missing per row at inference/evaluation via mask-aware routing/reconstruction). It supports:

  • Regression (point prediction; optional conformal interval)
  • Multiclass classification (top-1; optional conformal prediction set)
  • A “route” reconstruction head used for robustness and difficulty scoring

The implementation is intentionally modular and is driven by two YAML configs that are merged “defaults first, experiment overrides” (Hydra/OmegaConf style).


What problem this solves

Many tabular pipelines either:

  • require extensive imputation and fail under heavy missingness, or
  • provide model confidence that is not calibrated as a coverage guarantee.

Spaghetti addresses this by:

  1. Making missingness explicit via a mask-aware model input.
  2. Training with denoising-style additional masking (kept moderate by default) so the model remains robust when features are missing at test time.
  3. Optionally using split conformal calibration to convert model errors into uncertainty sets/intervals with empirical coverage (under standard conformal assumptions).

Layered configuration

Spaghetti loads and deep-merges two YAML files:

  1. pesto.yaml — deterministic defaults (“pin down ambiguous behavior”)
  2. spaghetti.yaml — experiment-level configuration (architecture choice, losses, conformal binning, etc.)

Conceptually, pesto.yaml is the implementation-defaults layer: it exists to make “general” behavior deterministic (preprocessing/schema inference defaults, conformal conventions, tie-breaking, etc.), while spaghetti.yaml is meant to express high-level experimental choices.

Merge behavior:

  • Mappings (dicts) merge recursively.
  • Non-mappings (scalars, lists) are replaced by the experiment value.
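The merge rule above can be sketched as a simplified stand-in for spaghetti.config.deep_merge:

```python
def deep_merge(defaults, overrides):
    """Recursively merge two config mappings.

    Dicts merge key-by-key; scalars and lists from `overrides`
    replace the default value wholesale.
    """
    merged = dict(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value                       # non-mappings are replaced
    return merged
```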

Entry points:

  • Loader: spaghetti.config.load_layered_config()
  • Merge implementation: spaghetti.config.deep_merge()

Data model

Core tensors

Each sample is represented as:

  • x_filled: numeric features after a deterministic fill strategy (e.g., zero/mean/learned token)
  • mask: binary tensor with 1=observed, 0=missing

The mask is always passed to the encoder.
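Building this pair from a raw numeric array with NaNs can be sketched as follows; fill_value stands in for whichever fill strategy is configured (zero/mean/learned token):

```python
import numpy as np

def to_filled_and_mask(X, fill_value=0.0):
    """Build the (x_filled, mask) pair from a raw numeric array with NaNs."""
    mask = (~np.isnan(X)).astype(np.float32)          # 1=observed, 0=missing
    x_filled = np.where(np.isnan(X), fill_value, X)
    return x_filled, mask
```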

Input shapes

Spaghetti supports two input modes:

  • Single-row: x_filled: (B, F), mask: (B, F)
  • Row-set / subtable: x_filled: (B, S, F), mask: (B, S, F)

Targets

  • Regression: y_reg as shape (B,)
  • Multiclass: y_cls as integer ids 0..K-1, shape (B,)

File formats

  1. NPZ (recommended for row-set support)

    • Required: X
    • Optional: X_cat, y_reg, y_cls
  2. Table files via Polars (single-row only)

    • .csv, .parquet/.pq, .ipc/.feather/.arrow
    • You specify target columns via CLI flags.

Auto inference rules (current implementation)

“General tabular” often breaks on schema inference; here are the defaults as implemented today:

  • If X_cat is absent, the model treats everything in X as numeric.
  • For table inputs (Polars loader):
    • Categorical columns default to Polars dtypes Categorical, Enum, Utf8 (strings), and Boolean.
    • Numeric columns are cast to float (with missing values preserved as NaN).
    • Date/Datetime columns are converted deterministically to epoch-based numeric values.
  • Text hashing and richer datetime feature extraction described in pesto.yaml are not implemented yet (see “Known gaps”).

Model architecture

Spaghetti is a shared encoder (“trunk”) with up to three heads:

  1. Regression head → predicts ŷ (scalar)
  2. Multiclass head → predicts logits for K classes
  3. Route (reconstruction) head → predicts reconstructed feature values (used for robustness + difficulty)

Note: in the current v1 implementation, the route/reconstruction head reconstructs numeric features only. If you provide categoricals (X_cat / --categorical-cols), they are used by the encoder (Case B) but are not reconstructed; route error and difficulty are therefore computed from numeric features.

Implementation:

  • Model wrapper: spaghetti.model.model.SpaghettiModel
  • Heads: spaghetti.model.heads.*

Encoder cases (A–D)

The encoder is selected by rules in spaghetti.yaml:model.architecture_selector.

Case A — MLP baseline

Best for moderate-width mostly-numeric features.

Input representation is a concatenation of:

  • x_filled
  • mask
  • optional per-column id embeddings

Implementation: spaghetti.model.encoders.MLPEncoder

Case B — Mixed numeric + categorical

Adds categorical embeddings for X_cat and fuses numeric + categorical features via an MLP.

Implementation: spaghetti.model.encoders.MixedTabularEncoder

Case C — Feature-token Transformer

Treats each feature as a token, built from:

  • numeric value embedding
  • mask embedding
  • optional column-id embedding

Then applies a Transformer encoder across features and pools (CLS or mean).

Implementation: spaghetti.model.encoders.FeatureTokenTransformerEncoder

Case D — Row-set / subtable encoder

For permutation-invariant sets of rows (subtables), this uses:

  • a row encoder (Case A/B/C)
  • a set aggregator (currently DeepSets pooling)

Implementation: spaghetti.model.encoders.SetEncoder + DeepSetsAggregator

Builder/selector:

  • spaghetti.model.builder.select_encoder_case()
  • spaghetti.model.builder.build_model()

Training

Denoising-style augmentation

In addition to natural missingness in the input, Spaghetti optionally applies additional random masking during training. This makes the model robust to heavier corruption and supports “always-on” inference.

Masking patterns implemented:

  • MCAR (random independent masking)
  • Block masking (contiguous feature blocks)
  • Feature-biased masking (optional weights)

Implementation: spaghetti.masking.apply_random_masking()
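As an illustration, block masking applied on top of an existing mask might look like this. The function is a simplified stand-in for spaghetti.masking.apply_random_masking, and block_len/p are hypothetical parameter names:

```python
import numpy as np

def apply_block_masking(mask, block_len=4, p=0.3, rng=None):
    """For each row, hide one contiguous run of block_len features with prob p.

    mask: (B, F) array with 1=observed; returns a new mask, never unmasking.
    """
    rng = np.random.default_rng(rng)
    mask = mask.copy()
    B, F = mask.shape
    for i in range(B):
        if rng.random() < p:
            start = rng.integers(0, max(F - block_len, 0) + 1)
            mask[i, start:start + block_len] = 0.0
    return mask
```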

Losses

  • Regression: MSE or Huber
  • Multiclass: cross entropy (optional label smoothing)
  • Route loss: masked L1/L2 computed only on additionally masked entries

Training loop: spaghetti.trainer.train()

Early stopping

For coverage claims, Spaghetti keeps conformal calibration clean: it uses four splits.

  • train: fit model parameters
  • val: early stopping / model selection
  • calib: conformal thresholds (not used for fitting/selection)
  • test: final evaluation

In the current implementation, train is automatically split into train/val inside the data loader construction; calib remains reserved for conformal calibration.
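An index-level sketch of such a four-way split; the fractions shown are illustrative defaults, not the project's configured values:

```python
import numpy as np

def four_way_split(n, frac=(0.6, 0.1, 0.15, 0.15), seed=0):
    """Shuffle indices 0..n-1 and cut into train/val/calib/test partitions."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    a = int(frac[0] * n)
    b = a + int(frac[1] * n)
    c = b + int(frac[2] * n)
    return idx[:a], idx[a:b], idx[b:c], idx[c:]
```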


Difficulty score

A scalar difficulty is computed per example from:

  • missing rate: $1 - \mathrm{mean}(mask)$
  • route self-consistency error: reconstruction error on observed entries

The route error is normalized using calibration statistics (robust z-score by default) and then combined linearly.
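A sketch of this combination; the weights and helper name are illustrative, not the DifficultyScorer API. The route error is z-scored against calibration statistics using a median/MAD robust estimate:

```python
import numpy as np

def difficulty(mask, route_error, calib_route_error, w_missing=0.5, w_route=0.5):
    """Per-example difficulty: missing rate + robust z-scored route error.

    mask: (B, F) with 1=observed; route_error: (B,) per-example reconstruction
    error; calib_route_error: (N,) route errors from the calibration split.
    """
    missing_rate = 1.0 - mask.mean(axis=1)
    med = np.median(calib_route_error)
    mad = np.median(np.abs(calib_route_error - med)) + 1e-8
    z = (route_error - med) / mad                     # robust z-score
    return w_missing * missing_rate + w_route * z
```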

Implementation invariants (to avoid leakage and preserve the “difficulty is observable at test time” property):

  • Route self-consistency error is computed only on observed entries.
  • Route loss is computed only on additionally masked entries from the training-time augmentation (denoising objective).

Implementation: spaghetti.difficulty.DifficultyScorer


Conformal prediction (split conformal)

Spaghetti uses split conformal:

  1. Train on the train split.
  2. Run the trained model on the calibration split.
  3. Convert calibration nonconformity scores into quantile thresholds.

Split conformal produces prediction intervals/sets calibrated on a held-out calibration set that is not used to fit or select the model, targeting marginal coverage under the usual exchangeability assumption. Difficulty binning is used for stability/adaptivity and should be validated per bin.

Is conformal optional?

Yes. Conformal calibration is now a separate optional circuit.

To enable it at all, set:

  • conformal.enabled: true

Then you can optionally disable it per head in spaghetti.yaml:

  • conformal.regression.enabled: false (no conformal intervals)
  • conformal.multiclass.enabled: false (no conformal prediction sets)

When disabled, Spaghetti still trains and produces point predictions/logits, but it will skip fitting conformal calibrators. The “offline stress tests” section below is currently conformal-metrics-focused (coverage and set/interval size), so it will also be omitted when conformal is fully disabled.

Minimal example:

conformal:
  enabled: false
  regression:
    enabled: false
  multiclass:
    enabled: false

Difficulty binning

Calibration points are grouped into a small number of difficulty bins (quantile bins). Each bin has its own threshold, improving stability when calibration sizes are modest.

Implementation: spaghetti.conformal.make_quantile_bins()

Regression intervals

Nonconformity score: $|y - \hat{y}|$.

Output interval (symmetric):

  • lo = ŷ - q(bin(d))
  • hi = ŷ + q(bin(d))

Implementation: spaghetti.conformal.RegressionConformal
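The calibration step can be sketched with the standard finite-sample-corrected quantile. This is an illustrative stand-in for RegressionConformal, shown without difficulty binning (per-bin calibration applies the same rule within each bin):

```python
import numpy as np

def conformal_halfwidth(y_calib, yhat_calib, alpha=0.1):
    """Split-conformal half-width q for symmetric intervals [ŷ - q, ŷ + q].

    Takes the ceil((n+1)(1-alpha))/n quantile of |y - ŷ| on calibration data,
    which gives 1 - alpha marginal coverage under exchangeability.
    """
    scores = np.abs(y_calib - yhat_calib)
    n = len(scores)
    level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    return np.quantile(scores, level, method="higher")
```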

Multiclass prediction sets (APS)

APS builds the prediction set by sorting classes by predicted probability and including the top classes until the cumulative probability exceeds a calibrated threshold (with deterministic tie-breaking for equal probabilities).

Implementation: spaghetti.conformal.APSConformal
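The set construction can be sketched as follows (an illustrative stand-in for APSConformal; threshold is the calibrated cumulative-probability cutoff, and ties are broken by class index):

```python
import numpy as np

def aps_set(probs, threshold):
    """Include classes in descending probability order until the cumulative
    probability exceeds the calibrated threshold."""
    order = np.lexsort((np.arange(len(probs)), -probs))  # prob desc, index asc
    cum = np.cumsum(probs[order])
    k = int(np.searchsorted(cum, threshold) + 1)         # first k with cum > threshold
    return set(order[:k].tolist())
```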


Offline stress tests

You can evaluate coverage and set/interval sizes under synthetic missingness regimes (e.g., MCAR 10/30/50/70, block missingness, feature-biased missingness). These are configured under evaluation.offline_stress_tests.

Note: the current stress-test implementation reports conformal metrics (coverage, interval width, prediction set size). If conformal is disabled for all heads, the stress-test report is omitted.

Implementation: spaghetti.evaluation.offline_stress_tests()


Running the system

Install

uv sync

Run with .npz

uv run spaghetti \
  --defaults-config pesto.yaml \
  --experiment-config spaghetti.yaml \
  --data path/to/dataset.npz

If your .npz includes X_cat, provide cardinalities:

uv run spaghetti \
  --data path/to/dataset.npz \
  --cat-cardinalities 12,7,100

Run with CSV/Parquet (Polars)

uv run spaghetti \
  --data path/to/data.parquet \
  --target-reg target_value \
  --target-cls target_class

Optionally specify categorical columns:

uv run spaghetti \
  --data path/to/data.csv \
  --target-cls label \
  --categorical-cols country,device_type

Repository structure

  • spaghetti.yaml — experiment-level config
  • pesto.yaml — deterministic defaults
  • spaghetti/ — implementation
    • cli.py — CLI entrypoint
    • config.py — layered YAML merge
    • data.py — dataset + splits + loaders
    • polars_io.py — Polars loaders for CSV/Parquet/IPC
    • masking.py — training-time masking augmentation
    • model/ — encoders, heads, model wrapper, builder
    • difficulty.py — difficulty computation
    • conformal.py — split conformal for regression + multiclass (APS)
    • trainer.py — training loop + early stopping + calibration
    • evaluation.py — stress-test evaluation

Known gaps (by design, for now)

  • The Polars loader currently supports single-row data only (row-set inputs should use .npz).
  • For row-set .npz data, the implementation expects a fixed set size S. If your samples have variable S, pad to a common S_max and set mask=0 (and x_filled=0) for padded entries.
  • The set encoder uses DeepSets pooling; Set Transformer aggregation is not implemented.
  • pesto.yaml describes richer “implementation defaults” (schema inference, datetime/text handling, mixed-type reconstruction losses, conformal quantile conventions). The current route head reconstructs numeric features only and uses masked L1/L2; mixed-type route losses are planned but not implemented.
