Skip to content

Add Poisson Log-normal likelihood#123

Open
eweine wants to merge 4 commits into
hrue:develfrom
eweine:add_pln
Open

Add Poisson Log-normal likelihood#123
eweine wants to merge 4 commits into
hrue:develfrom
eweine:add_pln

Conversation

@eweine

@eweine eweine commented May 25, 2026

Copy link
Copy Markdown

I'm working on a scientific project involving single cell rna sequencing with sparse data for which I am using a Poisson likelihood, with iid effects as well as other structured effects (e.g., spatial, but the full details are not necessary here). I prefer structured effects + iid with a Poisson likelihood to structured effects alone with a negative binomial likelihood because the former makes it much easier to compare the relative importance of the structured and iid effects.

However, I often notice that when fitting the model to genes with particularly sparse counts, I get 'vb.correction' is aborted warnings. In some simulations, I have found that in these cases the fits can be quite bad relative to mcmc.

It struck me that in these models, perhaps the most challenging aspect is estimating the posterior distribution over the iid effects, which can often have a long left tail. It serves to reason that integrating out these iid effects might then make posterior inference easier.

Below, I've implemented the Poisson log-normal distribution, using adaptive Gauss-Hermite quadrature to numerically evaluate the likelihood.

If you run the example below, you will see in the second case with small rates that the new Poisson log-normal likelihood recovers substantially more accurate parameter estimates than the Poisson + iid model, which throws a vb.correction warning.

## Test Poisson log-normal likelihood
## Compares:
##   Model 1: Poisson + per-observation iid random effect (reference)
##   Model 2: poissonlognormal likelihood across nquad = 10, 25, 50

library(INLA)

pln_summary <- function(m, label) {
    cat(sprintf("  %-20s intercept=%.4f  prec=%.4f  DIC=%.1f\n",
                label,
                m$summary.fixed[1, "mean"],
                inla.mmarginal(m$marginals.hyperpar[[1]]),
                m$dic$dic))
}

## ============================================================
## Large-rate data  (beta0=5, sigma=1, mean(y) ~ 231)
## ============================================================
cat("=== Large-rate data (beta0=5, sigma=1) ===\n")
set.seed(42)
n          <- 500
beta0_true <- 5.0
sigma_true <- 1.0
prec_true  <- 1 / sigma_true^2

u <- rnorm(n, 0, sigma_true)
y <- rpois(n, exp(beta0_true + u))
cat(sprintf("  n=%d  mean(y)=%.1f  fraction zero=%.3f\n\n",
            n, mean(y), mean(y == 0)))

cat("  True: intercept=5.0  prec=1.0\n\n")

## Poisson + iid (reference)
idx <- 1:n
m1 <- inla(y ~ 1 + f(idx, model = "iid"),
           family          = "poisson",
           data            = data.frame(y = y, idx = idx),
           control.compute = list(dic = TRUE))
pln_summary(m1, "Poisson+iid")

## poissonlognormal across nquad values
for (nq in c(10L, 15L, 25L, 50L)) {
    m <- inla(y ~ 1,
              family          = "poissonlognormal",
              data            = data.frame(y = y),
              control.family  = list(nquad = nq),
              control.compute = list(dic = TRUE))
    pln_summary(m, sprintf("PLN nquad=%d", nq))
}

## ============================================================
## Sparse data  (beta0=-3.5, sigma=2, ~89% zeros)
## ============================================================
cat("\n=== Sparse data (beta0=-3.5, sigma=2) ===\n")
set.seed(123)
n_s          <- 10000
beta0_sparse <- -3.5
sigma_sparse <- 2.0
prec_sparse  <- 1 / sigma_sparse^2

u_s <- rnorm(n_s, 0, sigma_sparse)
y_s <- rpois(n_s, exp(beta0_sparse + u_s))
cat(sprintf("  n=%d  mean(y)=%.4f  fraction zero=%.3f\n\n",
            n_s, mean(y_s), mean(y_s == 0)))

cat(sprintf("  True: intercept=%.1f  prec=%.4f\n\n", beta0_sparse, prec_sparse))

## Poisson + iid (reference)
idx_s <- 1:n_s
m1s <- inla(y ~ 1 + f(idx, model = "iid"),
            family          = "poisson",
            data            = data.frame(y = y_s, idx = idx_s),
            control.compute = list(dic = TRUE),
            safe            = TRUE)
pln_summary(m1s, "Poisson+iid")

## poissonlognormal across nquad values
for (nq in c(10L, 15L, 25L, 50L)) {
    m <- inla(y ~ 1,
              family          = "poissonlognormal",
              data            = data.frame(y = y_s),
              control.family  = list(nquad = nq),
              control.compute = list(dic = TRUE))
    pln_summary(m, sprintf("PLN nquad=%d", nq))
}

cat("\n=== Done ===\n")

@hrue

hrue commented May 25, 2026 via email

Copy link
Copy Markdown
Owner

@eweine

eweine commented May 26, 2026

Copy link
Copy Markdown
Author

No worries, you've more than made up for any bad past decisions by providing this great tool!

I'm going the cloglik route. I think I have it working!

@hrue

hrue commented May 26, 2026 via email

Copy link
Copy Markdown
Owner

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants