In `muscat:::.limma`, the call to `limma::lmFit` passes `weights = w`. I believe this tells limma to ignore the voom weights stored in `y` entirely and use `w` instead. I'm guessing this is not intended, since the resulting analysis could not accurately be described as "limma-voom": it is not using the voom weights at all.
I believe a more correct procedure would be:
- Compute the sample precision weights `w` before running `voom`
- Pass `weights = w` to `voom`
- Multiply `w` into the voom weights before running `lmFit`, and don't pass `weights = w` to `lmFit`
In code, this would look something like:

```r
w <- .n_cells(x)[k, colnames(x)]
if (method == "voom") {
  trend <- robust <- FALSE
  y <- suppressMessages(DGEList(y, remove.zeros = TRUE))
  y <- calcNormFactors(y)
  y <- voom(y, design, weights = w)
  # Obviously for release, do the usual package namespace thing instead of "dream::"
  y <- dream::applyQualityWeights(y, w)
  fit <- lmFit(y, design)
} else {
  # Non-voom case: still need to pass the weights to lmFit
  fit <- lmFit(y, design, weights = w)
}
```

Now, I said "more correct" rather than "correct" because this procedure has a potential issue: `voom` already partially accounts for sequencing depth in its computed weights. It converts each observation's logCPM back to the raw count scale before mapping it onto the mean-variance trend to compute its precision weight, so in the common case of a trend with a negative slope, samples with lower sequencing depth also get lower weights for all genes. (This occurs regardless of whether or not you provide the cell counts as prior weights.)

The problem comes when you multiply in the sample precision weights. These are equal to the cell counts, which are inherently highly correlated with the "sequencing depth" of a pseudobulk sample: samples with lower total counts have fewer cells and therefore lower weights. Multiplying the voom weights by the sample precision weights thus doubly down-weights low-cell-count, low-read-count samples, and conversely doubly up-weights high-count samples. The core issue, I believe, is that `voom` implicitly assumes that any prior weights passed to it are based on something unrelated to sequencing depth, such as differences in biological variability between groups, e.g. tumor vs. normal. This is how `voom` is used within `voomWithQualityWeights`.
I am not sure exactly how to address this issue. A more conservative procedure would be to use the voom weights as-is in `lmFit`, without multiplying `w` into them and without passing `w` to `lmFit`, on the assumption that the voom weights already adequately account for the differences in sequencing depth that result from differences in cell counts (incidentally, I believe this is what dreamlet currently does). There are also several other approaches I can think of that might be reasonable, and I can't say a priori which one is most correct in theory or which one performs best in practice.
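For reference, the conservative variant would reduce to something like the following sketch (assuming the same surrounding `.limma` context as the earlier snippet; the quality-weight multiplication step is simply dropped):

```r
# Conservative variant: let voom's own weights stand on their own.
w <- .n_cells(x)[k, colnames(x)]
y <- suppressMessages(DGEList(y, remove.zeros = TRUE))
y <- calcNormFactors(y)
y <- voom(y, design, weights = w)  # cell counts enter only as prior weights
fit <- lmFit(y, design)            # no weights argument: lmFit picks up y$weights
```

This trades the risk of double-counting depth for the risk of under-weighting cell-count information that the trend does not fully capture.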