[RFC 0002] Support for cross-version deserialization #324
Merged
andrewherren merged 9 commits into main on Mar 20, 2026
Implements RFC 0002 sub-issues #317 and #318:
- #317: Every to_json() / saveBARTModelToJson() / saveBCFModelToJson() call now writes a top-level "stochtree_version" string field, so that JSONs serialized going forward carry an explicit version stamp.
- #318: Two new helpers in both R and Python for use by the forthcoming from_json() guards (#319, #320):
  - Python: _get_stochtree_version() and _infer_stochtree_version(json_string) in stochtree/utils.py
  - R: getStochtreeVersion() and inferStochtreeJsonVersion(json_object) in R/utils.R
The inference helper fingerprints a JSON by field presence and returns a version bracket string (e.g. "<0.4.1") for use in warning messages when deserializing legacy JSONs without a stamp.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
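The stamping and fingerprinting idea can be sketched roughly as follows. This is a minimal illustration and not the stochtree implementation: `CURRENT_VERSION`, `stamp_version`, `infer_version`, and the contents of `FINGERPRINTS` (including the `"<0.2.0"` bracket and which field maps to which bracket) are assumptions made up for the example. Only the `"stochtree_version"` field name and the `"<0.4.1"` bracket format come from the commit message.

```python
import json

# Hypothetical stand-in for the installed package version.
CURRENT_VERSION = "0.4.1"

# Assumed fingerprint table, ordered oldest-first: if a field is absent,
# the JSON is taken to predate the paired version. Illustrative only.
FINGERPRINTS = [
    ("num_chains", "<0.2.0"),
    ("outcome_model", "<0.4.1"),
]


def stamp_version(payload: dict, version: str = CURRENT_VERSION) -> str:
    """Serialize a payload with a top-level "stochtree_version" field."""
    stamped = dict(payload)  # copy so the caller's dict is not mutated
    stamped["stochtree_version"] = version
    return json.dumps(stamped)


def infer_version(json_string: str) -> str:
    """Return the explicit stamp if present, else a version bracket."""
    obj = json.loads(json_string)
    if "stochtree_version" in obj:
        return obj["stochtree_version"]
    # No stamp: fall back to fingerprinting by field presence.
    for field, bracket in FINGERPRINTS:
        if field not in obj:
            return bracket
    return "unknown"
```

Ordering the table oldest-first means the first missing field decides the bracket, so a JSON that lacks several fields is attributed to the oldest matching range.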
Mar 19, 2026
…319)
Both from_json() and from_json_string_list() in Python (stochtree/bart.py) and both createBARTModelFromJson() and createBARTModelFromCombinedJson() in R (R/bart.R) now check for each optional field before reading it, falling back to a safe default and emitting a descriptive warning that includes the inferred legacy version bracket from inferStochtreeJsonVersion() / _infer_stochtree_version().
Fields guarded with defaults:
- has_rfx_basis / num_rfx_basis → False / 1
- num_chains → 1
- keep_every → 1
- probit_outcome_model → False
- outcome_model.outcome / outcome_model.link → "continuous" / "identity"
- rfx_model_spec → "" (warns only when has_rfx=True)
- covariate_preprocessor / preprocessor_metadata → None / NULL (warns always)
Hard errors are preserved for genuinely unrecoverable fields (forest structures, outcome_scale, outcome_mean).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
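The guard pattern described above (presence check, safe default, descriptive warning, hard error for unrecoverable fields) might look something like the sketch below. `read_optional` and `read_required` are hypothetical helper names, the input is a plain dict rather than stochtree's JSON wrapper, and the warning wording is invented for the example.

```python
import warnings

_MISSING = object()  # sentinel, so a stored None is not treated as absent


def read_optional(params: dict, field: str, default, version_bracket: str):
    """Return params[field], or `default` with a warning if it is absent."""
    value = params.get(field, _MISSING)
    if value is _MISSING:
        warnings.warn(
            f"Field '{field}' not found in JSON (inferred stochtree version "
            f"{version_bracket}); defaulting to {default!r}.",
            UserWarning,
            stacklevel=2,
        )
        return default
    return value


def read_required(params: dict, field: str):
    """Raise for fields that have no safe default."""
    if field not in params:
        raise KeyError(f"Required field '{field}' is missing from the JSON.")
    return params[field]


# Usage: a legacy payload that lacks num_chains still loads, with a warning.
legacy_params = {"outcome_scale": 1.0, "outcome_mean": 0.0}
num_chains = read_optional(legacy_params, "num_chains", 1, "<0.4.1")
outcome_scale = read_required(legacy_params, "outcome_scale")
```

Using a sentinel instead of `None` as the "absent" marker matters here because some guarded fields legitimately default to `None`.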
Guard all optional fields in BCF deserialization (both from_json and from_json_string_list in Python; createBCFModelFromJson and createBCFModelFromCombinedJson in R) with presence checks, safe defaults, and descriptive warnings that include the inferred legacy version bracket.
Fields guarded: has_rfx_basis/num_rfx_basis, multivariate_treatment, num_chains, keep_every, sample_tau_0, internal_propensity_model, probit_outcome_model, outcome_model subfolder, rfx_model_spec, and covariate_preprocessor/preprocessor_metadata.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ializer
createBARTModelFromCombinedJsonString was referencing has_field() and .ver without defining them, causing R CMD check errors.
Also:
- Added the missing guard for preprocessor_metadata in createBARTModelFromCombinedJson (was using json_object loop variable instead of json_object_default)
- Made createBARTModelFromCombinedJsonString fully symmetric with createBARTModelFromCombinedJson by adding guards for has_rfx_basis, num_covariates, num_chains, keep_every, probit_outcome_model, outcome_model, and rfx_model_spec
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…deserializer
createBCFModelFromCombinedJsonString was missing the .ver/.has_field/has_subfolder_field helpers entirely, causing errors on any optional-field guard.
Also:
- Guard internal_propensity_model in the initial string-to-object loop using json_contains_field_cpp directly (before json_object_default exists)
- Add guards for all optional model_params fields to match createBCFModelFromCombinedJson: has_rfx_basis/num_rfx_basis, num_chains, keep_every, multivariate_treatment, sample_tau_0, internal_propensity_model, probit_outcome_model, outcome_model subfolder, rfx_model_spec
- Guard preprocessor_metadata at the end of the function
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The existing BCF serialization tests only covered createBCFModelFromJsonString. Add a test covering all five paths:
- createBCFModelFromJson (in-memory object)
- createBCFModelFromJsonString (string)
- createBCFModelFromJsonFile (file)
- createBCFModelFromCombinedJson (list of objects)
- createBCFModelFromCombinedJsonString (list of strings)
The combined-string path would have caught the missing has_field/.ver helpers fixed in the previous commit.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Standardize R BCF JSON field names to match Python (canonical):
- Write: initial_sigma2 -> sigma2_init
- Write: b_1_samples -> b1_samples, b_0_samples -> b0_samples
The read side accepts both old and new names in the deserialization
functions (createBCFModelFromJson, createBCFModelFromCombinedJson,
createBCFModelFromCombinedJsonString), with deprecation warnings for legacy
field names. The R in-memory object fields ($b_0_samples, $b_1_samples,
$model_params$initial_sigma2) are unchanged.
Note: b0/b1 presence check uses has_subfolder_field("parameters", "b1_samples")
since these fields live in the "parameters" subfolder, not at the top level.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
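A read side that accepts both canonical and legacy names could be sketched as below. The rename table is taken from the commit message; `read_renamed` is a hypothetical helper, and plain dicts stand in for the actual JSON object, so this only illustrates the fallback-with-deprecation-warning pattern.

```python
import warnings

# Canonical name -> legacy name, as listed in the commit message.
LEGACY_NAMES = {
    "sigma2_init": "initial_sigma2",
    "b1_samples": "b_1_samples",
    "b0_samples": "b_0_samples",
}


def read_renamed(container: dict, canonical: str):
    """Read a field by canonical name, falling back to its legacy name."""
    if canonical in container:
        return container[canonical]
    legacy = LEGACY_NAMES[canonical]
    if legacy in container:
        warnings.warn(
            f"JSON field '{legacy}' is deprecated; use '{canonical}'.",
            DeprecationWarning,
            stacklevel=2,
        )
        return container[legacy]
    raise KeyError(f"Neither '{canonical}' nor '{legacy}' was found.")
```

Because b0/b1 live in the "parameters" subfolder, the caller passes that sub-dict rather than the top-level object, e.g. `read_renamed(obj["parameters"], "b1_samples")`.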
Two new tests in test-serialization.R:
- Verify that freshly serialized BCF JSON uses canonical names (sigma2_init, b1_samples, b0_samples) and not the old names
- Verify that legacy JSON with old names (initial_sigma2, b_1_samples, b_0_samples) still deserializes correctly and emits deprecation warnings
The legacy test works by serializing a model, substituting old names with gsub, then loading the patched JSON string and asserting on warnings and prediction equality.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
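The substitution-based legacy test can be mimicked in Python with `str.replace` in place of R's gsub. The sketch below is self-contained and does not touch stochtree: `toy_load` is a hypothetical loader that only normalizes key names, and equality of the restored dicts stands in for the prediction-equality assertion of the real test.

```python
import json
import warnings

# Canonical -> legacy field names, taken from the commit message.
RENAMES = {
    "sigma2_init": "initial_sigma2",
    "b1_samples": "b_1_samples",
    "b0_samples": "b_0_samples",
}


def to_legacy_json(json_string: str) -> str:
    """Rewrite canonical keys to legacy ones by plain string substitution."""
    for new, old in RENAMES.items():
        # Match the quoted key so substrings of other names are untouched.
        json_string = json_string.replace(f'"{new}"', f'"{old}"')
    return json_string


def toy_load(json_string: str) -> dict:
    """Hypothetical loader: maps legacy keys back and warns about each one."""
    legacy_to_new = {old: new for new, old in RENAMES.items()}

    def normalize(node):
        if not isinstance(node, dict):
            return node
        out = {}
        for key, value in node.items():
            if key in legacy_to_new:
                warnings.warn(
                    f"'{key}' is deprecated; use '{legacy_to_new[key]}'.",
                    DeprecationWarning,
                )
                key = legacy_to_new[key]
            out[key] = normalize(value)
        return out

    return normalize(json.loads(json_string))


def check_legacy_roundtrip() -> int:
    """Patch a fresh JSON to legacy names, reload it, return warning count."""
    fresh = json.dumps({
        "sigma2_init": 1.5,
        "parameters": {"b1_samples": [1.0, 2.0], "b0_samples": [0.0, 0.5]},
    })
    legacy = to_legacy_json(fresh)
    assert '"initial_sigma2"' in legacy and '"sigma2_init"' not in legacy
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        restored = toy_load(legacy)
    assert restored == toy_load(fresh)  # same content after normalization
    return len(caught)
```

Substituting on the quoted key is a simplification: it would also rewrite a string value that happens to equal a field name, which is acceptable for a test fixture but not for general-purpose migration.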
- Add minimal fixture JSONs (~14–22 KB each) for BART and BCF in both R and Python test directories; untrack *.json globally in .gitignore, except for the fixture paths
- Add test/R/testthat/test-serialization-compat.R (26 tests): snapshot load + predict tests, and backward-compat tests for missing optional fields (outcome_model, multivariate_treatment, internal_propensity_model, rfx_model_spec, preprocessor_metadata, num_chains/keep_every, has_rfx_basis)
- Add test/python/test_serialization_compat.py (15 tests): same coverage for Python BARTModel and BCFModel
- All R compat tests have skip_on_cran(); cran-bootstrap.R excludes both the fixture JSON files and test-serialization-compat.R from the tarball
- Add any::jsonlite to extra-packages in r-test.yml, r-devel-check.yml, r-python-slow-api-test.yml, and r-valgrind-check.yml
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Implements RFC 0002