
Add warning to prevent biased transfer when batch effects are missing #394

Open

bhumigaddam wants to merge 2 commits into amarquand:dev from bhumigaddam:improve-transfer-warning

Conversation

@bhumigaddam

While exploring the transfer functionality in PCNtoolkit, I noticed that it is possible to run a transfer when the new dataset contains fewer batch effects than the dataset used to train the original model. In such situations, the transfer step may still run, but the correction could potentially be biased because the model was trained with a larger set of batch effect levels.

To make this situation clearer to users, this change adds a warning when the transfer dataset contains fewer batch effects than the training dataset. The goal is simply to make users aware of the potential issue so they can interpret the results more carefully.

This change does not modify the underlying behavior of the model; it only adds a warning to improve transparency during transfer operations.

Happy to adjust this if a different warning message or placement would be preferred.

@contsili contsili changed the base branch from master to dev March 13, 2026 10:48
@contsili
Collaborator

Hi @bhumigaddam, why did you remove json and add FileLock in its place? We already import filelock in line 45.

Regarding the batch effect warning, I think it is a really good idea. It would help if you could provide an example script with which I can reproduce this warning and test it.

Regarding the warning: we use our own warning function, which lives in the output.py file. Please see how warnings are used in other locations, e.g. Output.warning(Warnings.BLABLA), where BLABLA is a constant inside output.py. That way we can reuse warnings and keep them all in one location.
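The pattern described above could be sketched as follows. This is a minimal, self-contained illustration of the idea, not PCNtoolkit's actual output.py; the constant name MISSING_BATCH_EFFECTS and the exact message are hypothetical placeholders for whatever the project settles on.

```python
import warnings
from enum import Enum


class Warnings(str, Enum):
    # Hypothetical constant; in PCNtoolkit the real constants live in output.py.
    MISSING_BATCH_EFFECTS = (
        "Transfer data contains fewer batch effect levels than the training "
        "data; the transferred correction may be biased."
    )


class Output:
    @staticmethod
    def warning(message: Warnings) -> None:
        # Route every warning through one place so messages stay consistent
        # and reusable across the codebase.
        warnings.warn(message.value, UserWarning, stacklevel=2)


# Call sites then refer to the shared constant rather than an inline string.
Output.warning(Warnings.MISSING_BATCH_EFFECTS)
```

Keeping the strings in one enum means a message can be changed in a single place and tested against by constant name instead of by literal text.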

@divye-joshi

divye-joshi commented Mar 13, 2026

@bhumigaddam Hi! I am also a new contributor.

You can check out the reproduction notebooks I created in issues #396, #378, and #383.
Maybe they will help you replicate your intended idea in the future.

You can also check out my PR #379, where I added a warning; it shows the correct way to create errors/warnings and edit output.py.

I hope this helps you!

@bhumigaddam
Author

> Hi @bhumigaddam, why did you remove json and add FileLock in its place? We already import filelock in line 45.
>
> Regarding the batch effect warning, I think it is a really good idea. It would help if you could provide an example script with which I can reproduce this warning and test it.
>
> Regarding the warning: we use our own warning function, which lives in the output.py file. Please see how warnings are used in other locations, e.g. Output.warning(Warnings.BLABLA), where BLABLA is a constant inside output.py. That way we can reuse warnings and keep them all in one location.

Hi @contsili, thanks for taking the time to review the PR and for the helpful suggestions!

Regarding the "json" → "FileLock" change, my intention was to address a file locking issue I encountered while testing on Windows. I thought explicitly using "FileLock" there might help prevent conflicts when accessing files. However, I see now that "filelock" is already imported earlier in the file, so I’ll revisit that part and adjust the implementation accordingly.

For the batch effect warning, the situation I had in mind is when a model is trained on data containing multiple batch effect levels (for example multiple sites), but the dataset used for transfer only contains a subset of those levels. In that case the transfer step still runs, but the correction could potentially be biased because some batch levels seen during training are missing.

A simplified example workflow would look like this:

```python
# Train a normative model on data containing multiple batch effects.
train_data = NormData(...)
model = NormativeModel(...)
model.fit(train_data)

# Transfer the model to a dataset with fewer batch effects.
transfer_data = NormData(...)
model.transfer(transfer_data)
```

In this scenario the transfer dataset contains fewer batch effect levels than the dataset used to train the model, which is where the warning would be triggered.
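The level comparison behind the proposed warning could be sketched like this. It is a standalone illustration under stated assumptions: the helper name check_batch_effects and the plain lists of site labels are hypothetical, standing in for wherever PCNtoolkit stores batch-effect levels on its data objects.

```python
import warnings


def check_batch_effects(train_levels, transfer_levels):
    """Warn when the transfer data is missing batch-effect levels
    (e.g. sites) that were present in the training data."""
    missing = set(train_levels) - set(transfer_levels)
    if missing:
        warnings.warn(
            "Transfer data is missing batch effect levels seen during "
            f"training: {sorted(missing)}. The transferred correction "
            "may be biased.",
            UserWarning,
            stacklevel=2,
        )
    return missing


# Training saw three sites; the transfer cohort covers only two of them,
# so the warning fires and reports the missing level.
check_batch_effects(
    train_levels=["site_A", "site_B", "site_C"],
    transfer_levels=["site_A", "site_B"],
)
```

A set difference in this direction only flags levels that training saw but the transfer data lacks; extra, previously unseen levels in the transfer data are a separate question and would need their own check.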

I’ll also update the implementation to use the project's internal warning mechanism ("Output.warning") as suggested so it aligns with how warnings are handled in "output.py".

Thanks again for the guidance!

@bhumigaddam
Author

Hi @divye-joshi, thanks for sharing these resources and pointing me to your notebooks and PR.

I’ll take a look at the issues (#396, #378, #383) and your PR #379 to better understand how warnings are implemented in the project.

Appreciate the help!

