
enh - add Y transformation keyword argument in NormativeModel to enforce positivity #372

Merged
contsili merged 26 commits into dev from log_positivity
Mar 5, 2026

Conversation

@contsili
Collaborator

@contsili contsili commented Feb 12, 2026

Closes #356

If Y < -1, then log(Y+1) can't be computed (ValueError), but I expect Y to always be >= -1, since the functionality I added in this PR is only for data that are always positive. If we want to be conservative, we could add a check like:

# Validate that values are > -1 (required for log1p)
min_val = float(np.min(data[var].values))
if min_val <= -1.0:
    raise ValueError(
        f"y_transform='log1p' requires all {var} values > -1 but your {var} are not"
    )

We can still consider adding a test and/or an example.
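As a sketch of how the 'log1p' option and the conservative check above could fit together (the function names here are mine for illustration, not the actual implementation):

```python
import numpy as np

def log1p_transform(y):
    """Forward transform log(1 + y); requires all y > -1 (hypothetical helper)."""
    y = np.asarray(y, dtype=float)
    min_val = float(np.min(y))
    if min_val <= -1.0:
        raise ValueError(
            f"y_transform='log1p' requires all values > -1, but the minimum is {min_val}"
        )
    return np.log1p(y)

def log1p_inverse(z):
    """Inverse transform exp(z) - 1; its range is (-1, inf) by construction."""
    return np.expm1(z)

y = np.array([0.5, 2.0, 10.0])
z = log1p_transform(y)
assert np.allclose(log1p_inverse(z), y)  # round trip recovers the data
```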

@AuguB
Collaborator

AuguB commented Feb 12, 2026

Will this solve the problem addressed in issue #365? As I see it, since the standardization happens after the log transform, this can and probably will still yield negative values, resulting in invalid input to WarpLog.

Additionally, why would we apply a log transform as a preparation for a log transform?
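The standardization concern can be seen with a tiny numpy sketch (the data values here are made up for illustration): z-scoring log-transformed positive data subtracts the mean, so roughly half the values come out negative.

```python
import numpy as np

# Strictly positive data, log-transformed and then z-scored ("standardize").
y = np.array([1.0, 2.0, 5.0, 10.0, 50.0])
z = np.log(y)
z_std = (z - z.mean()) / z.std()

# Values below the mean become negative after standardization, which
# would be invalid input for a subsequent log warp such as WarpLog.
print(z_std.min())
```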

Some ideas:

  • Explicitly disallow standardize + WarpLog; a min-max scaler can be used instead. Informative messages to the user would be helpful here.
  • Add a new scaler that only divides the data by its standard deviation, which amounts to scaling around y=0; positive data stays positive. Such a scaler could be applied automatically when all of the following hold:
    1. Y is strictly positive
    2. BLR with WarpLog as the regression model
    3. The selected outscaler is standardize
    Then we can safely overwrite the outscaler with the new scaler (e.g. scale_only_standardize).

I do agree that the option to add a log transform on Y, as it is implemented now, is a good idea, and it should be kept somehow. I just don't think it really solves the problem of using WarpLog + standardize. EDIT: I realize now that this PR is not intended to solve #365.
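The scale-only scaler from the second idea could be sketched as follows (the class name ScaleOnlyStandardizer is hypothetical, not existing PCNtoolkit API):

```python
import numpy as np

class ScaleOnlyStandardizer:
    """Hypothetical scaler: divide by the standard deviation only.

    Unlike z-scoring, it does not subtract the mean, so strictly
    positive data stays strictly positive after scaling.
    """

    def fit(self, y):
        self.std_ = float(np.std(y))
        return self

    def transform(self, y):
        return np.asarray(y, dtype=float) / self.std_

    def inverse_transform(self, z):
        return np.asarray(z, dtype=float) * self.std_

y = np.array([0.5, 1.0, 3.0, 7.0])
scaler = ScaleOnlyStandardizer().fit(y)
z = scaler.transform(y)
assert (z > 0).all()                          # positivity preserved
assert np.allclose(scaler.inverse_transform(z), y)
```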

@contsili
Collaborator Author

contsili commented Feb 13, 2026

As you mentioned at the end of your previous comment, this PR is not meant to solve #365, but #356. #356 was proposed by @amarquand. It is meant to prevent strictly positive response variables (like brain volumes or WM hypointensities) from having negative centiles (see, for example, fig. 6(e) in [1]).

For #365, for now we implemented something close to your first idea: see #366. However, your second idea is also a good one. We can open an issue to implement it.

@amarquand
Owner

Indeed. We have a user that wants to apply the shash model to strictly positive data, and get the centiles back in the original space. But @AuguB is right that the interaction with the inscaler and outscaler needs to be carefully considered.

Has this been tested?

@contsili
Collaborator Author

contsili commented Feb 13, 2026

I wrote an example script which I have locally.

I used inscaler and outscaler = 'standardize', dataset fcon1000:

1. BLR(heteroskedastic=True)
   Not enforced positivity: [image]
   Enforced positivity: [image]

2. HBR() (default parameters)
   Not enforced positivity: [image]
   Enforced positivity: [image]
@amarquand how do these results look?

@contsili
Collaborator Author

contsili commented Feb 24, 2026

I will add a test script for this feature, and then we are ready to merge it.

contsili added 8 commits March 2, 2026 16:49

  • That way it is clear what is tested:
    main: tests the main normative model functions
    helper: tests the helper functions
  • Add BLR model fixtures (instead of test model fixtures)
  • Adjust assertions: centiles and yhat should be > -1
  • test_normative_model is the same as test_normative_model_helper
@contsili
Collaborator Author

contsili commented Mar 4, 2026

After discussion with @amarquand: I will implement two keyword arguments

  1. y_transform = 'log'
  2. y_transform = 'log1p'

In the case of 'log' we expect all centiles >= 0, whereas with 'log1p' we expect all centiles >= -1.
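These bounds follow directly from the inverse transforms: exp() is strictly positive, and expm1() is strictly greater than -1. A minimal numpy check of that claim:

```python
import numpy as np

# Predictions anywhere in transformed space...
z = np.linspace(-10.0, 10.0, 101)

# 'log':   back-transformed centiles exp(z)   are always > 0
# 'log1p': back-transformed centiles expm1(z) are always > -1
assert (np.exp(z) > 0).all()
assert (np.expm1(z) > -1).all()
```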

@contsili
Collaborator Author

contsili commented Mar 5, 2026

From the tests, the centile plots below were created (model parameters: BLR with linear basis, homoskedastic noise, no warping):

y_transform = None: [image: centiles_response_var_0_from_arrays_harmonized]

y_transform = 'log': [image: centiles_response_var_0_from_arrays_harmonized]

y_transform = 'log1p': [image: centiles_response_var_0_from_arrays_harmonized]

We see that the fit is not good, as the data are non-linear and negatively skewed and we fit a very simple BLR.

An interesting finding is that with y_transform = 'log' all the centiles are positive, but the 95th is quite far from the others. I believe that happens because the model's predictions are made in log space and then converted back to the original scale using exp(), and exp() "inflates" the differences between centiles.

The reason why the 95th centile is more inflated in the log case vs. the log1p case is:
log1p(Y) maps this to [0, ...]
log(Y) maps this to [-inf, ...]

This wider range of log(Y) makes the std of log(Y) a lot higher than that of log1p(Y), leading to an inflated centile = exp(Z * std + mean).
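A quick numeric sketch of that inflation (the std values below are made up for illustration; 1.645 is the z-score of the 95th centile):

```python
import numpy as np

z95 = 1.645   # z-score of the 95th centile
mean = 1.0    # illustrative mean in transformed space

# exp() amplifies differences multiplicatively, so a larger std in the
# transformed space pushes the back-transformed upper centile up sharply.
for std in (0.5, 1.0, 2.0):
    print(std, np.exp(z95 * std + mean))
```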

@contsili contsili merged commit 4970689 into dev Mar 5, 2026
2 checks passed
@contsili contsili deleted the log_positivity branch March 5, 2026 16:51