feat: Parse results from JSON #291

Draft

mhovd wants to merge 3 commits into main from parse-json

Conversation

mhovd (Collaborator) commented Feb 8, 2026

Work in progress for parsing the new JSON format.

Copilot summary of the PR below.


This pull request refactors the R6 class constructors and data loading logic for several Pmetrics output classes (PM_cov, PM_cycle, PM_final, PM_op) to support direct loading from a parsed JSON object (typically from result.json) rather than from CSV files. The changes improve integration with the Rust NPAG implementation, streamline data access, and reduce reliance on intermediate files. The PM_parse function is also updated to clarify JSON usage and allow custom result file names.

Refactoring for JSON-based data loading:

Constructor and interface changes

  • All relevant R6 classes (PM_cov, PM_cycle, PM_final, PM_op) now accept a json parameter instead of a file path, allowing objects to be constructed directly from parsed JSON data. [1] [2] [3] [4]
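
For illustration, a hedged sketch of how the new interface might be called; the exact constructor arguments in Pmetrics may differ from what is shown here:

    # Illustrative only; actual Pmetrics constructor signatures may differ
    result_json <- jsonlite::fromJSON("result.json")
    op <- PM_op$new(json = result_json)    # previously constructed from a file path
    cov <- PM_cov$new(json = result_json)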

Internal data loading logic

  • The private$make methods in each class are updated to extract required data directly from the JSON structure (e.g., posterior, covariates, cycles, predictions, settings), replacing previous CSV file reading logic. This includes handling of covariate segments, cycle statistics, and prediction results. [1] [2] [3] [4]
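
A minimal sketch of the pattern this describes, assuming jsonlite-style parsing in which a JSON array of objects becomes a data frame; the class and field names below (ExampleResult, json$predictions) are illustrative rather than the actual Pmetrics internals:

    # Illustrative class, not the actual Pmetrics implementation
    library(R6)

    ExampleResult <- R6Class("ExampleResult",
      public = list(
        data = NULL,
        initialize = function(json = NULL) {
          private$make(json)
        }
      ),
      private = list(
        make = function(json) {
          if (is.null(json) || is.null(json$predictions)) {
            stop("Required 'predictions' data is missing from the JSON.")
          }
          # fromJSON() typically simplifies an array of objects to a data frame,
          # so the parsed structure can be stored directly as $data
          self$data <- as.data.frame(json$predictions)
        }
      )
    )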

Error handling and warnings

  • Error messages are updated to reference missing JSON data rather than missing files, improving clarity for users working with JSON output. [1] [2] [3] [4]

PM_parse improvements

  • The PM_parse function documentation and interface are updated to clarify that it reads a single JSON file, and a new result_file argument allows customization of the file name. [1] [2]
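
A hedged usage sketch based on this description; the default file name and any additional arguments of PM_parse() may differ in the actual package:

    # Hypothetical calls; argument defaults may differ in the package
    res <- PM_parse()                               # default result file name
    res <- PM_parse(result_file = "my_run.json")    # custom result file name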

Copilot AI review requested due to automatic review settings February 8, 2026 13:34

Copilot AI (Contributor) left a comment

Pull request overview

This PR is a work-in-progress refactor to support parsing Rust NPAG output directly from a parsed result.json object, reducing reliance on intermediate CSV/JSON files and aligning Pmetrics’ R6 result objects with the new backend output format.

Changes:

  • Update PM_parse() to read a single result JSON file (configurable name) and pass the parsed JSON into component constructors.
  • Refactor several result R6 classes to build their $data directly from the parsed JSON structure instead of reading per-component CSV files.
  • Update .Rd documentation to describe the new JSON-based inputs and result_file option.

Reviewed changes

Copilot reviewed 14 out of 16 changed files in this pull request and generated 8 comments.

Summary per file:

  • R/PM_parse.R: Reads result.json and constructs a PM_result using JSON-backed component constructors.
  • R/PM_op.R: Switches observed-vs-predicted construction to use JSON predictions/settings.
  • R/PM_cycle.R: Switches cycle log construction to use JSON cyclelog/predictions/settings.
  • R/PM_final.R: Switches final-cycle/posterior construction to use JSON theta/posterior/settings.
  • R/PM_cov.R: Switches covariate + posterior join construction to use JSON subject covariate segments and posterior matrix.
  • R/PM_pop.R: Switches population predictions construction to use JSON predictions.
  • R/PM_post.R: Switches posterior predictions construction to use JSON predictions.
  • man/PM_parse.Rd: Documents result_file and clarifies single-file JSON parsing behavior.
  • man/PM_op.Rd: Documents json input and deprecates path for this constructor.
  • man/PM_cycle.Rd: Documents json input and deprecates path for this constructor.
  • man/PM_final.Rd: Documents json input and deprecates path for this constructor.
  • man/PM_cov.Rd: Documents json input and deprecates path for this constructor.
  • man/PM_pop.Rd: Documents json input and deprecates path for this constructor.
  • man/PM_post.Rd: Documents json input and deprecates path for this constructor.

cli::cli_abort(c("x" = "The result file {.file {result_path}} does not exist."))
}
result_json <- jsonlite::fromJSON(
suppressWarnings(readLines(result_path, warn = FALSE)),

Copilot AI Feb 8, 2026

PM_parse() reads result.json via readLines() and passes the character vector to jsonlite::fromJSON(). This duplicates the file in memory and can be fragile for large/pretty-printed JSON. Prefer calling jsonlite::fromJSON(result_path, ...) directly (or use a connection/streaming parser) to avoid unnecessary memory overhead and parsing edge cases.

Suggested change
suppressWarnings(readLines(result_path, warn = FALSE)),
result_path,

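A sketch of the suggested approach, assuming the file holds a single JSON document; jsonlite::fromJSON() accepts a file path directly, so the intermediate character vector from readLines() is unnecessary:

    # Sketch: pass the path straight to fromJSON() instead of readLines() output
    if (!file.exists(result_path)) {
      cli::cli_abort(c("x" = "The result file {.file {result_path}} does not exist."))
    }
    result_json <- jsonlite::fromJSON(result_path, simplifyVector = TRUE)
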
Comment on lines +97 to +98
data = fit_object$data,
model = fit_object$model,

Copilot AI Feb 8, 2026

When the fit file is missing, fit_object is set to NULL, but core still unconditionally accesses fit_object$data and fit_object$model, which will error ($ operator is invalid for atomic vectors/NULL). PM_parse() should either require fit (and abort with a clear message) or gracefully set data/model to NULL when fit_object is NULL.

Suggested change
data = fit_object$data,
model = fit_object$model,
data = if (!is.null(fit_object)) fit_object$data else NULL,
model = if (!is.null(fit_object)) fit_object$model else NULL,

backend = "rust",
algorithm = "NPAG",
numeqt = 1,
converge = cycle$data$converged,

Copilot AI Feb 8, 2026

core$converge is populated from cycle$data$converged, but PM_cycle objects don’t appear to create a converged field (they expose status). This will make converge always NULL and can break downstream logic expecting a boolean. Consider setting converge based on cycle$data$status == "Converged" (or add an explicit converged field to PM_cycle$data).

Suggested change
converge = cycle$data$converged,
converge = {
  if (!is.null(cycle) && !is.null(cycle$data)) {
    if (!is.null(cycle$data$converged)) {
      cycle$data$converged
    } else if (!is.null(cycle$data$status)) {
      cycle$data$status == "Converged"
    } else {
      NA
    }
  } else {
    NA
  }
},

if (!is.null(json) && !is.null(json$data$subjects)) {
subjects <- json$data$subjects
n_subj <- nrow(subjects)
cov_list <- vector("list", n_subj * 50L)

Copilot AI Feb 8, 2026

cov_list is preallocated to n_subj * 50L, but the nested loops can generate more than 50 rows per subject (multiple occasions × multiple segments × multiple unique times). If list_idx exceeds this size, cov_list[[list_idx]] <- ... will throw “subscript out of bounds”. Use a growable approach (e.g., append to a list, or dynamically extend the preallocated vector when needed).

Suggested change
cov_list <- vector("list", n_subj * 50L)
cov_list <- list()

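An illustrative, self-contained sketch of the "dynamically extend when needed" alternative mentioned in the comment; the row contents are placeholders and only the growth pattern matters:

    n_subj <- 3L
    cov_list <- vector("list", n_subj * 50L)   # initial preallocation (150 slots)
    list_idx <- 0L

    for (i in seq_len(200L)) {                 # more rows than were preallocated
      list_idx <- list_idx + 1L
      if (list_idx > length(cov_list)) {
        # grow geometrically instead of failing with "subscript out of bounds"
        length(cov_list) <- 2L * length(cov_list)
      }
      cov_list[[list_idx]] <- data.frame(id = i, wt = 70 + i)  # placeholder row
    }

    covs <- do.call(rbind, cov_list[seq_len(list_idx)])
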
Comment on lines +196 to +201
# Deduplicate: keep one row per unique covariate combination
# within each subject/block, retaining the earliest time
covs <- covs %>%
dplyr::group_by(id, block) %>%
dplyr::distinct(
dplyr::across(dplyr::all_of(cov_names)),

Copilot AI Feb 8, 2026

The deduplication step uses cov_names from the last occasion processed (distinct(across(all_of(cov_names)))). If covariate names differ across occasions/subjects, this will ignore some covariate columns and produce incorrect deduplication. Build the full set of covariate column names from covs (e.g., setdiff(names(covs), c('id','time','block'))) before calling distinct().

Suggested change
# Deduplicate: keep one row per unique covariate combination
# within each subject/block, retaining the earliest time
covs <- covs %>%
dplyr::group_by(id, block) %>%
dplyr::distinct(
dplyr::across(dplyr::all_of(cov_names)),
# Determine all covariate columns present (exclude id, time, block)
cov_colnames <- setdiff(names(covs), c("id", "time", "block"))
# Deduplicate: keep one row per unique covariate combination
# within each subject/block, retaining the earliest time
covs <- covs %>%
dplyr::group_by(id, block) %>%
dplyr::distinct(
dplyr::across(dplyr::all_of(cov_colnames)),

Comment on lines +205 to +207
if (!is.null(json) && !is.null(json$settings)) {
config <- json$settings
} else if (inherits(data, "PM_cycle") & !is.null(data$data)) {

Copilot AI Feb 8, 2026

config <- json$settings assigns errormodels$models from the JSON array. Later in this function, model types are derived via names(config$errormodels$models), but JSON arrays typically have no names, which will cause a length mismatch/error when building model_types. Derive model types by inspecting each model element (e.g., take names(m)[1] for list entries and "None" for string entries) instead of relying on names().

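A sketch of the derivation the comment suggests, assuming config$errormodels$models (from the snippet above) is a list whose elements are either one-element named lists or bare strings such as "None"; the actual JSON layout may differ:

    # Assumes each model entry is a one-element named list or a bare string
    models <- config$errormodels$models
    model_types <- vapply(models, function(m) {
      if (is.list(m) && length(names(m)) > 0) {
        names(m)[1]   # e.g. "additive" or "proportional"
      } else {
        "None"        # bare string entries carry no error-model parameters
      }
    }, character(1))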