Some bugs found by the bot#438
Draft
PascalIversen wants to merge 4 commits into
Draft
Conversation
…ures load_and_select_gene_features built indices_to_keep by iterating the source CSV columns, so feature values kept the source (CSV) column order while meta_info was relabeled to the gene-list order. When the two orders differ, every value is misaligned with its gene name. This is invisible within a single dataset (train and predict share the CSV column order) but silently corrupts cross-study prediction across datasets whose CSVs list genes in different orders (e.g. CTRPv2 -> GDSC for methylation and for DIPK's expression input). Build indices_to_keep by mapping each gene in ordered_genes to its source column index, so values are selected AND reordered to match ordered_genes / meta_info. Add a regression test asserting values stay aligned to their gene names when the gene-list order differs from the CSV order; the existing tests sorted colnames or only compared shape, so they missed this.
#2 cli_model_testing: _prep_data_for_final_prediction re-fetched the raw, unmasked split["validation_es"]/split["early_stopping"] after get_datasets_from_cv_split had already returned drug-masked copies. For single-drug models this re-introduced all drugs' rows into the single-drug train/early-stopping sets. Drop the redundant override and rely on the helper's return values (which already mask per drug and set es_dataset=None for non-early-stopping models), matching run_train_and_predict_cv. #3 curvecurator: _prepare_raw_data unconditionally used "replicate" in the duplicate check and groupby even though the no-replicate input shape is explicitly supported (pivot_columns=["dose"]), raising KeyError. Build the duplicate-key subset from pivot_columns so both shapes work.
…B reg_lambda, CLI subcommand detection #4 consolidate_single_drug_model_predictions rebound the out_path parameter inside the model loop (out_path = os.path.join(out_path, ...)), so a second single-drug model nested under the first (out_path/model1/model2). Use a local model_out_path for the write paths. #5 cross_study_prediction added early-stopping rows to the caller's train_dataset in place; that object is reused across cross-study datasets and by randomization/robustness tests. Copy train_dataset first (mirroring dataset). #6 SimpleNeuralNetwork checkpoint guard used `is not None`, but Lightning's best_model_path defaults to "" (never None), so the fallback was dead and a missing checkpoint hit torch.load(""). Use a truthiness check. #7 MultiViewXGBoost tuned reg_lambda in hyperparameters.yaml but never passed it to XGBRegressor (stayed at default 1). Forward reg_lambda. #8 _active_list_options treated any token matching a subcommand name as the subcommand, so an option value like `--run_id report` suppressed root list expansion. The pipeline is an invoke_without_command group, so only argv[0] can be the subcommand; inspect just that. Add a regression test.
…s, KNN dead hparam
evaluation.evaluate: only prediction NaNs were masked; NaN *responses* with valid
predictions corrupted the metric. Mask rows where either is NaN, NaN if <2 valid pairs.
FeatureDataset.add_meta_info: drop the dead `if self.meta_info is None` branch
(meta_info is a read-only property always initialized to {}; the assignment would
have raised). Update in place.
SimpleNeuralNetwork: forward used x.squeeze(), collapsing a batch-of-1 prediction
to a 0-d scalar; use squeeze(-1). DataLoader drop_last=True could empty the loader
for a training set smaller than one batch; guard with len(train_dataset) > batch_size.
KNNRegressor: remove the `variance` grid from hyperparameters.yaml -- build_model
never read it, so it only multiplied the tuning grid 8x with identical models.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Draft, to review.