In `muscat:::.limma`, the call to `limma::lmFit` passes `weights = w`. I believe this tells limma to ignore the voom weights stored in `y` entirely and use `w` instead. I'm guessing this is not intended, since the resulting analysis could not accurately be described as "limma-voom": it is not using the voom weights at all.
I believe a more correct procedure would be:
- Compute the sample precision weights `w` before running `voom`
- Pass `weights = w` to `voom`
- Multiply `w` into the voom weights before running `lmFit`, and don't pass `weights = w` to `lmFit`
In code, this would look something like:

```r
w <- .n_cells(x)[k, colnames(x)]
if (method == "voom") {
  trend <- robust <- FALSE
  y <- suppressMessages(DGEList(y, remove.zeros = TRUE))
  y <- calcNormFactors(y)
  y <- voom(y, design, weights = w)
  # Obviously for release, do the usual package namespace thing instead of "dream::"
  y <- dream::applyQualityWeights(y, w)
  fit <- lmFit(y, design)
} else {
  # Non-voom case: still need to pass the weights to lmFit
  fit <- lmFit(y, design, weights = w)
}
```

Now, I said "more correct" rather than "correct" because this procedure has a potential issue: `voom` already partially accounts for sequencing depth in its computed weights. It converts each observation's logCPM back to the raw count scale before mapping it onto the mean-variance trend to compute its precision weight, so in the common case of a trend with a negative slope, samples with lower sequencing depth also get lower weights for all genes. (This occurs regardless of whether or not you provide the cell counts as prior weights.)

The problem comes when you multiply in the sample precision weights. These are equal to the cell counts, which are inherently highly correlated with the "sequencing depth" of a pseudobulk sample: samples with lower total counts have fewer cells and therefore lower weights. Multiplying the voom weights by the sample precision weights thus doubly down-weights low-cell-count, low-read-count samples, and conversely doubly up-weights high-count samples. The core issue, I believe, is that `voom` implicitly assumes that any prior weights passed to it are based on something unrelated to sequencing depth, such as differences in biological variability between groups, e.g. tumor vs. normal. This is how `voom` is used within `voomWithQualityWeights`.
I am not sure exactly how to address this issue. A more conservative procedure would be to use the voom weights as-is in `lmFit`, without multiplying `w` into them and without passing `w` to `lmFit`, on the assumption that the voom weights already adequately account for the differences in sequencing depth that result from differences in cell counts (incidentally, I believe this is what dreamlet currently does). There are also several other approaches I can think of that might be reasonable, and I can't say a priori which one is most correct in theory or which one performs best in practice.
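For reference, the conservative variant would reduce to something like the following sketch (assuming the same surrounding `.limma` context as the earlier snippet; the quality-weight multiplication step is simply dropped):

```r
# Conservative variant: let voom's own weights stand on their own.
w <- .n_cells(x)[k, colnames(x)]
y <- suppressMessages(DGEList(y, remove.zeros = TRUE))
y <- calcNormFactors(y)
y <- voom(y, design, weights = w)  # cell counts enter only as prior weights
fit <- lmFit(y, design)            # no weights argument: lmFit picks up y$weights
```

This trades the risk of double-counting depth for the risk of under-weighting cell-count information that the trend does not fully capture.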