Add warning to prevent biased transfer when batch effects are missing #394
bhumigaddam wants to merge 2 commits into amarquand:dev from
Conversation
Hi @bhumigaddam, why did you remove json and add FileLock in its place? We already import filelock on line 45. Regarding the batch effect warning, I think it is a really good idea. It would help if you could provide an example script so that I can reproduce and test this warning. Note that for warnings we use our own warning function, which lives in the output.py file. Please see how we use warnings in other locations, e.g.
@bhumigaddam Hi! I am also a new contributor. You can check out the reproduction notebooks I created in issues #396, #378, and #383. For the correct way to create errors/warnings and edit output.py, you can also check out my PR #379, where I added a warning. I hope this helps you!
Hi @contsili, thanks for taking the time to review the PR and for the helpful suggestions! Regarding the "json" → "FileLock" change, my intention was to address a file-locking issue I encountered while testing on Windows: I thought explicitly using "FileLock" there might help prevent conflicts when accessing files. However, I see now that "filelock" is already imported earlier in the file, so I'll revisit that part and adjust the implementation accordingly.

For the batch effect warning, the situation I had in mind is when a model is trained on data containing multiple batch effect levels (for example, multiple sites), but the dataset used for transfer contains only a subset of those levels. In that case the transfer step still runs, but the correction could potentially be biased because some batch levels seen during training are missing. A simplified example workflow would look like this:

```python
# train a normative model on data containing multiple batch effects
train_data = NormData(...)
# transfer the model to a dataset with fewer batch effects
transfer_data = NormData(...)
```

In this scenario the transfer dataset contains fewer batch effect levels than the dataset used to train the model, which is where the warning would be triggered. I'll also update the implementation to use the project's internal warning mechanism (`Output.warning`), as suggested, so it aligns with how warnings are handled in `output.py`. Thanks again for the guidance!
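The subset check described above can be sketched independently of PCNtoolkit. This is a minimal, self-contained illustration, not the toolkit's actual API: `find_missing_batch_levels` and the site labels are hypothetical names used only for this example.

```python
def find_missing_batch_levels(train_levels, transfer_levels):
    """Return batch-effect levels seen during training but absent from
    the transfer data; a non-empty result signals a possible bias."""
    return sorted(set(train_levels) - set(transfer_levels))

# Training covered three sites, but the transfer data covers only two.
train_sites = ["siteA", "siteB", "siteC", "siteA"]
transfer_sites = ["siteA", "siteB"]

print(find_missing_batch_levels(train_sites, transfer_sites))  # ['siteC']
```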
Hi @divye-joshi, thanks for sharing these resources and pointing me to your notebooks and PR. I'll take a look at the issues (#396, #378, #383) and your PR #379 to better understand how warnings are implemented in the project. Appreciate the help!
While exploring the transfer functionality in PCNtoolkit, I noticed that it is possible to run a transfer when the new dataset contains fewer batch effects than the dataset used to train the original model. In such situations, the transfer step may still run, but the correction could potentially be biased because the model was trained with a larger set of batch effect levels.
To make this situation clearer to users, this change adds a warning when the transfer dataset contains fewer batch effects than the training dataset. The goal is simply to make users aware of the potential issue so they can interpret the results more carefully.
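The check behind the proposed warning could look roughly like the sketch below. This is a standalone illustration under stated assumptions: the function name and its arguments are hypothetical, and Python's standard `warnings.warn` stands in for PCNtoolkit's own `Output.warning` mechanism mentioned in the discussion.

```python
import warnings


def warn_on_missing_batch_effects(train_levels, transfer_levels, batch_effect="site"):
    """Warn if the transfer data covers only a subset of the batch-effect
    levels seen during training, and return the missing levels.

    `warnings.warn` is a stand-in for PCNtoolkit's `Output.warning`; the
    names here are illustrative, not the toolkit's actual API.
    """
    missing = sorted(set(train_levels) - set(transfer_levels))
    if missing:
        warnings.warn(
            f"Transfer data is missing batch-effect levels for '{batch_effect}': "
            f"{missing}. The transferred correction may be biased.",
            UserWarning,
        )
    return missing
```

Returning the missing levels alongside the warning keeps the check easy to test and lets callers log or inspect exactly which levels are absent.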
This change does not modify the underlying behavior of the model; it only adds a warning to improve transparency during transfer operations.
Happy to adjust this if a different warning message or placement would be preferred.