[RFC 0002] Support for cross-version deserialization#324

Merged
andrewherren merged 9 commits into main from cross_version_deserialization
Mar 20, 2026

@andrewherren
Collaborator

Implements RFC 0002

Implements RFC 0002 sub-issues #317 and #318:

- #317: Every to_json() / saveBARTModelToJson() / saveBCFModelToJson() call
  now writes a top-level "stochtree_version" string field so that JSONs
  serialized going forward carry an explicit version stamp.

- #318: Two new helpers in both R and Python for use by the forthcoming
  from_json() guards (#319, #320):
  - Python: _get_stochtree_version() and _infer_stochtree_version(json_string)
    in stochtree/utils.py
  - R: getStochtreeVersion() and inferStochtreeJsonVersion(json_object)
    in R/utils.R

  The inference helper fingerprints a JSON by field presence and returns a
  version bracket string (e.g. "<0.4.1") for use in warning messages when
  deserializing legacy JSONs without a stamp.
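
A minimal sketch of the stamp-and-infer idea described above, not the code merged in this PR. The helper names, the CURRENT_VERSION constant, and the fingerprint table (which field maps to which version) are hypothetical placeholders; only the "stochtree_version" field name and the "<0.4.1" example bracket come from the description.

```python
import json

# Hypothetical stand-in for the installed package version.
CURRENT_VERSION = "0.4.1"

# Hypothetical fingerprint table, oldest first: (field, version assumed to
# have introduced it). These pairings are illustrative, not the real history.
FINGERPRINTS = [("num_chains", "0.2.0"), ("outcome_model", "0.3.0")]


def stamp_version(payload: dict) -> str:
    """Serialize payload with a top-level "stochtree_version" string field."""
    stamped = dict(payload)
    stamped["stochtree_version"] = CURRENT_VERSION
    return json.dumps(stamped)


def infer_version_bracket(json_string: str) -> str:
    """Return the stamp if present, else a bracket guessed from field presence."""
    obj = json.loads(json_string)
    if "stochtree_version" in obj:
        return obj["stochtree_version"]
    # The oldest missing field gives the tightest upper bound on the version.
    for field, introduced_in in FINGERPRINTS:
        if field not in obj:
            return "<" + introduced_in
    # All known fields present but no stamp: older than the first stamped release.
    return "<" + CURRENT_VERSION
```

The bracket string is only meant for warning messages, so a coarse upper bound is sufficient.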

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
andrewherren and others added 8 commits March 19, 2026 20:41
…319)

Both from_json() and from_json_string_list() in Python (stochtree/bart.py)
and both createBARTModelFromJson() and createBARTModelFromCombinedJson() in
R (R/bart.R) now check for each optional field before reading it, falling
back to a safe default and emitting a descriptive warning that includes the
inferred legacy version bracket from inferStochtreeJsonVersion() /
_infer_stochtree_version().

Fields guarded with defaults:
  - has_rfx_basis / num_rfx_basis → False / 1
  - num_chains → 1
  - keep_every → 1
  - probit_outcome_model → False
  - outcome_model.outcome / outcome_model.link → "continuous" / "identity"
  - rfx_model_spec → "" (warns only when has_rfx=True)
  - covariate_preprocessor / preprocessor_metadata → None / NULL (warns always)

Hard errors are preserved for genuinely unrecoverable fields (forest
structures, outcome_scale, outcome_mean).
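
The guard pattern above could look roughly like the following sketch. read_optional and read_required are hypothetical helpers written for illustration; the actual checks in stochtree/bart.py may be structured differently.

```python
import warnings

_MISSING = object()  # sentinel so that a stored None is not mistaken for "absent"


def read_optional(obj: dict, field: str, default, version_bracket: str):
    """Return obj[field], or default plus a warning when the field is absent."""
    value = obj.get(field, _MISSING)
    if value is _MISSING:
        warnings.warn(
            f"Field '{field}' not found; this JSON appears to come from "
            f"stochtree {version_bracket}. Falling back to {default!r}.",
            UserWarning,
            stacklevel=2,
        )
        return default
    return value


def read_required(obj: dict, field: str):
    """Raise rather than guess when an unrecoverable field is absent."""
    if field not in obj:
        raise KeyError(f"Required field '{field}' is missing; cannot deserialize.")
    return obj[field]


# Usage on a toy legacy payload that predates num_chains and keep_every.
legacy = {"outcome_scale": 1.0, "outcome_mean": 0.0}
num_chains = read_optional(legacy, "num_chains", 1, "<0.4.1")
keep_every = read_optional(legacy, "keep_every", 1, "<0.4.1")
outcome_scale = read_required(legacy, "outcome_scale")
```

Keeping the two helpers separate makes the split between recoverable and unrecoverable fields visible at each call site.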

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Guard all optional fields in BCF deserialization (both from_json and
from_json_string_list in Python; createBCFModelFromJson and
createBCFModelFromCombinedJson in R) with presence checks, safe defaults,
and descriptive warnings that include the inferred legacy version bracket.

Fields guarded: has_rfx_basis/num_rfx_basis, multivariate_treatment,
num_chains, keep_every, sample_tau_0, internal_propensity_model,
probit_outcome_model, outcome_model subfolder, rfx_model_spec, and
covariate_preprocessor/preprocessor_metadata.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ializer

createBARTModelFromCombinedJsonString was referencing has_field() and .ver
without defining them, causing R CMD check errors. Also:
- Added the missing guard for preprocessor_metadata in createBARTModelFromCombinedJson
  (was using json_object loop variable instead of json_object_default)
- Made createBARTModelFromCombinedJsonString fully symmetric with
  createBARTModelFromCombinedJson by adding guards for has_rfx_basis,
  num_covariates, num_chains, keep_every, probit_outcome_model,
  outcome_model, and rfx_model_spec

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…deserializer

createBCFModelFromCombinedJsonString was missing the .ver/.has_field/
has_subfolder_field helpers entirely, causing errors on any optional-field
guard. Also:
- Guard internal_propensity_model in the initial string-to-object loop
  using json_contains_field_cpp directly (before json_object_default exists)
- Add guards for all optional model_params fields to match
  createBCFModelFromCombinedJson: has_rfx_basis/num_rfx_basis, num_chains,
  keep_every, multivariate_treatment, sample_tau_0, internal_propensity_model,
  probit_outcome_model, outcome_model subfolder, rfx_model_spec
- Guard preprocessor_metadata at the end of the function

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The existing BCF serialization tests only covered createBCFModelFromJsonString.
Add a test covering all five paths: createBCFModelFromJson (in-memory object),
createBCFModelFromJsonString (string), createBCFModelFromJsonFile (file),
createBCFModelFromCombinedJson (list of objects), and
createBCFModelFromCombinedJsonString (list of strings). The combined-string
path would have caught the missing has_field/.ver helpers fixed in the
previous commit.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Standardize R BCF JSON field names to match Python (canonical):
- Write: initial_sigma2 -> sigma2_init
- Write: b_1_samples -> b1_samples, b_0_samples -> b0_samples

Read side accepts both old and new names in the deserialization functions
(createBCFModelFromJson, createBCFModelFromCombinedJson,
createBCFModelFromCombinedJsonString), with deprecation warnings for legacy
field names. The R in-memory object fields ($b_0_samples, $b_1_samples,
$model_params$initial_sigma2) are unchanged.

Note: b0/b1 presence check uses has_subfolder_field("parameters", "b1_samples")
since these fields live in the "parameters" subfolder, not at the top level.
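
As an illustration of the read-side aliasing, here is a small Python sketch (the change itself is in R). read_renamed is a hypothetical helper; the field names and the "parameters" subfolder are taken from this commit message.

```python
import warnings


def read_renamed(section: dict, new_name: str, old_name: str):
    """Prefer the canonical name; accept the legacy name with a deprecation warning."""
    if new_name in section:
        return section[new_name]
    if old_name in section:
        warnings.warn(
            f"JSON field '{old_name}' is deprecated; newer files use '{new_name}'.",
            DeprecationWarning,
            stacklevel=2,
        )
        return section[old_name]
    raise KeyError(f"Neither '{new_name}' nor '{old_name}' is present.")


# b0/b1 live under the "parameters" subfolder, so the lookup is done there.
legacy_json = {"parameters": {"b_1_samples": [0.5, 0.6], "b_0_samples": [-0.5, -0.4]}}
b1 = read_renamed(legacy_json["parameters"], "b1_samples", "b_1_samples")
b0 = read_renamed(legacy_json["parameters"], "b0_samples", "b_0_samples")
```

Checking the canonical name first means a file that somehow carries both names resolves to the new one without a warning.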

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Two new tests in test-serialization.R:
- Verify that freshly serialized BCF JSON uses canonical names (sigma2_init,
  b1_samples, b0_samples) and not the old names
- Verify that legacy JSON with old names (initial_sigma2, b_1_samples,
  b_0_samples) still deserializes correctly and emits deprecation warnings

The legacy test works by serializing a model, substituting old names with
gsub, then loading the patched JSON string and asserting on warnings and
prediction equality.
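
The same test strategy, sketched in Python with a toy loader in place of a real BCF model. The load function and the flat JSON layout are simplifying assumptions (the real b0/b1 fields sit in a "parameters" subfolder), and str.replace plays the role of gsub.

```python
import json
import warnings

# Canonical name -> legacy name, as listed in the rename commit.
RENAMES = {
    "sigma2_init": "initial_sigma2",
    "b1_samples": "b_1_samples",
    "b0_samples": "b_0_samples",
}


def load(json_string: str) -> dict:
    """Toy loader: maps legacy names to canonical ones, warning on each."""
    raw = json.loads(json_string)
    out = {}
    for new, old in RENAMES.items():
        if new in raw:
            out[new] = raw[new]
        elif old in raw:
            warnings.warn(f"'{old}' is deprecated; use '{new}'", DeprecationWarning)
            out[new] = raw[old]
    return out


def test_legacy_names_round_trip():
    fresh = json.dumps({"sigma2_init": 1.5, "b1_samples": [0.5], "b0_samples": [-0.5]})
    # Patch the serialized string so it looks like a legacy file.
    legacy = fresh
    for new, old in RENAMES.items():
        legacy = legacy.replace(f'"{new}"', f'"{old}"')
    assert "sigma2_init" not in legacy
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        restored = load(legacy)
    # One deprecation warning per legacy name, and identical restored values.
    assert len(caught) == 3
    assert restored == load(fresh)
```

Patching the serialized string avoids keeping a separate legacy fixture in sync with the model code.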

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add minimal fixture JSONs (~14–22 KB each) for BART and BCF in both
  R and Python test directories; untrack *.json globally except for
  fixtures paths in .gitignore
- Add test/R/testthat/test-serialization-compat.R (26 tests): snapshot
  load + predict tests, and backward-compat tests for missing optional
  fields (outcome_model, multivariate_treatment, internal_propensity_model,
  rfx_model_spec, preprocessor_metadata, num_chains/keep_every, has_rfx_basis)
- Add test/python/test_serialization_compat.py (15 tests): same coverage
  for Python BARTModel and BCFModel
- All R compat tests have skip_on_cran(); cran-bootstrap.R excludes both
  the fixture JSON files and test-serialization-compat.R from the tarball
- Add any::jsonlite to extra-packages in r-test.yml, r-devel-check.yml,
  r-python-slow-api-test.yml, and r-valgrind-check.yml

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@andrewherren andrewherren merged commit 8c07433 into main Mar 20, 2026
15 checks passed