Hello! Thank you for developing PRS-CSx; it is a very useful package.
I have a conceptual question about parameter tuning and weight estimation for the final PRS model. I want to confirm whether my understanding is correct. In a validation dataset, one selects the optimal global shrinkage parameter by identifying the ancestry-specific PRS that shows the best predictive performance, and then, in the same validation sample, fits a model such as
y = w1·PRS1 + w2·PRS2 + w3·PRS3
to estimate the linear combination weights. These selected parameters and weights are then applied to an independent test dataset to evaluate final performance.
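For concreteness, here is a minimal sketch of the procedure as I understand it. It assumes the ancestry-specific PRS for each candidate phi have already been computed for every individual (for example, by scoring the PRS-CSx posterior effect sizes with an external tool). It also assumes a continuous trait, with OLS and R² as the performance metric; a binary trait would typically use logistic regression and AUC or liability R² instead. The helper names (`fit_weights`, `select_phi_and_weights`, etc.), the phi grid, and the simulated data are all hypothetical and purely illustrative.

```python
import numpy as np


def r2(y, yhat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y - yhat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def fit_weights(prs, y):
    """OLS fit of y = w0 + w1*PRS1 + ... + wK*PRSK; returns [w0, w1, ..., wK]."""
    X = np.column_stack([np.ones(len(y)), prs])
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w


def predict(prs, w):
    """Apply fitted intercept and weights to an (n, K) PRS matrix."""
    return w[0] + prs @ w[1:]


def select_phi_and_weights(val_prs_by_phi, y_val):
    """Validation step (hypothetical helper).

    val_prs_by_phi maps each candidate phi to an (n_val, K) array whose columns
    are the ancestry-specific PRS computed with that phi. Picks the phi at which
    the best single ancestry-specific PRS has the highest validation R^2, then
    fits the combination weights at that phi on the SAME validation sample.
    """
    def best_single_r2(prs):
        # R^2 of each PRS column used alone (univariate OLS with intercept)
        return max(
            r2(y_val, predict(prs[:, [k]], fit_weights(prs[:, [k]], y_val)))
            for k in range(prs.shape[1])
        )

    best_phi = max(val_prs_by_phi, key=lambda phi: best_single_r2(val_prs_by_phi[phi]))
    w = fit_weights(val_prs_by_phi[best_phi], y_val)
    return best_phi, w


# --- Illustration on simulated data (not real results) ---
rng = np.random.default_rng(0)
phis = [1e-6, 1e-4, 1e-2, 1.0]  # example candidate grid (assumption)
K = 3  # three ancestry-specific PRS


def simulate(n):
    # Synthetic only: by construction, the PRS computed with phi=1e-2 carry the signal.
    prs = {phi: rng.normal(size=(n, K)) for phi in phis}
    y = prs[1e-2] @ np.array([0.5, 0.3, 0.2]) + rng.normal(size=n)
    return prs, y


val_prs, y_val = simulate(500)    # validation sample: tune phi AND fit weights
test_prs, y_test = simulate(500)  # independent test sample: evaluation only

best_phi, w = select_phi_and_weights(val_prs, y_val)
test_r2 = r2(y_test, predict(test_prs[best_phi], w))
print(best_phi, w.round(2), round(test_r2, 3))
```

Because `y_test` is used only in the final line, `test_r2` is not influenced by either the phi selection or the weight fitting. A variant would score each phi by the validation R² of the combined model rather than of the best single PRS; that change affects only the `key` used in `select_phi_and_weights`.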
Is this understanding correct? And is it statistically appropriate to use the same validation dataset both to tune the individual PRS parameters and to estimate the weights for the combined PRS, assuming that final evaluation is conducted in a fully independent test sample? Are there any potential concerns with this approach?