Add Poisson Log-normal likelihood #123
Open
eweine wants to merge 4 commits into
Owner
|
Thank you. Impressive. Sorry that you had to dig into that jungle of code to
define this. As you understand, a bad choice was made a long time ago... Sorry
about that.
Anyway, there are two ways forward as I see it.
- If you add your C code into 'cloglike', see inla.doc("cloglike"), then you can
load your own code directly into the inla-binary, using dynamic loading, without
the hassle of adding all the other stuff. In this way you can support your own
likelihood (or a latent model via 'cgeneric'). This can also be bundled into
your own R package, and there is also an option to compile your code into the
inla-binary at R-INLA build time. This is already done for projects like
'INLAspacetime', 'rSPDE' etc. In this way, you can 'own' your own code and
maintain it separately.
The added work for you is minor, as the body of your likelihood is the same and
already done; you only need to add the evaluation of the prior. Data comes in
a little differently, but it's minor. Please check it out and let me know what
you think.
- The main reason for the above suggestion is that INLA already has an
implementation of what you suggest, but in the general case. This is somewhat
old code, and needs a little more care to make it more robust, along the lines
you suggest: essentially, looking at the plain log-likelihood, mixing that
with the iid effect to be integrated, and then integrating that, correcting the
results for what should be integrated. I have used this strategy for some
cross-validation stuff, but it needs to be done also for the 'mix' feature.
Let me know and we can take it from there.
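As a rough illustration of what "evaluating the prior" can amount to for a precision hyperparameter, here is a small R sketch. It assumes a Gamma prior on the precision, expressed on the log-precision scale; the function name `log_prior_logprec` and the default shape and rate are illustrative assumptions. The actual 'cloglike' interface is C and is described in inla.doc("cloglike").

```r
## Log-density of theta = log(precision) when precision ~ Gamma(shape, rate).
## The "+ theta" term is the log-Jacobian of the change of variables
## precision = exp(theta).
## NOTE: the name and the default parameters are assumptions for this sketch.
log_prior_logprec <- function(theta, shape = 1, rate = 5e-5) {
  dgamma(exp(theta), shape = shape, rate = rate, log = TRUE) + theta
}

## Example: evaluate the log-prior at precision = 1 (theta = 0)
log_prior_logprec(0)
```

Because the Jacobian term is included, the function is a proper density in theta and integrates to one over the real line.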
Best
Havard
…On Sun, 2026-05-24 at 18:56 -0700, eweine wrote:
I'm working on a scientific project involving single-cell RNA sequencing with
sparse data, for which I am using a Poisson likelihood with iid effects as
well as other structured effects (e.g., spatial, but the full details are not
necessary here). I prefer structured effects + iid with a Poisson likelihood
to structured effects alone with a negative binomial likelihood because the
former makes it much easier to compare the relative importance of the
structured and iid effects.
However, I often notice that when fitting the model to genes with particularly
sparse counts, I get "'vb.correction' is aborted" warnings. In some simulations,
I have found that in these cases the fits can be quite bad relative to MCMC.
It struck me that in these models, perhaps the most challenging aspect is
estimating the posterior distribution over the iid effects, which can often
have a long left tail. It stands to reason that integrating out these iid
effects might then make posterior inference easier.
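To make "integrating out" concrete: with linear predictor \eta_i and iid effect u_i ~ N(0, \sigma^2), the marginal (Poisson log-normal) likelihood of one count is the integral below, and a plain Gauss-Hermite rule with nodes x_k and weights w_k approximates it by a finite sum. The symbols \eta_i, \sigma, x_k, w_k and K are notation assumed here for illustration and are not taken from the PR code.

```latex
p(y_i \mid \eta_i, \sigma)
  = \int_{-\infty}^{\infty}
      \frac{\exp\{ y_i(\eta_i + u) - e^{\eta_i + u} \}}{y_i!}\,
      \frac{1}{\sqrt{2\pi\sigma^2}}
      \exp\!\left( -\frac{u^2}{2\sigma^2} \right) du
  \approx \frac{1}{\sqrt{\pi}} \sum_{k=1}^{K} w_k\,
      \frac{\exp\{ y_i(\eta_i + \sqrt{2}\,\sigma x_k)
                   - e^{\eta_i + \sqrt{2}\,\sigma x_k} \}}{y_i!}
```

An adaptive rule uses the same nodes and weights but recenters them at the mode of the integrand and rescales them by its curvature there, which matters most when the integrand is far from the prior's shape.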
Below, I've implemented the Poisson log-normal distribution, using adaptive
Gauss-Hermite quadrature to numerically evaluate the likelihood.
If you run the example below, you will see in the second case with small rates
that the new Poisson log-normal likelihood recovers substantially more
accurate parameter estimates than the Poisson + iid model, which throws a
vb.correction warning.
    ## Test Poisson log-normal likelihood
    ## Compares:
    ##   Model 1: Poisson + per-observation iid random effect (reference)
    ##   Model 2: poissonlognormal likelihood across nquad = 10, 15, 25, 50
    library(INLA)

    ## Print intercept, precision (marginal mode) and DIC for a fitted model
    pln_summary <- function(m, label) {
      cat(sprintf(" %-20s intercept=%.4f prec=%.4f DIC=%.1f\n", label,
                  m$summary.fixed[1, "mean"],
                  inla.mmarginal(m$marginals.hyperpar[[1]]),
                  m$dic$dic))
    }

    ## ============================================================
    ## Large-rate data (beta0=5, sigma=1, mean(y) ~ 231)
    ## ============================================================
    cat("=== Large-rate data (beta0=5, sigma=1) ===\n")
    set.seed(42)
    n <- 500
    beta0_true <- 5.0
    sigma_true <- 1.0
    prec_true <- 1 / sigma_true^2
    u <- rnorm(n, 0, sigma_true)
    y <- rpois(n, exp(beta0_true + u))
    cat(sprintf(" n=%d mean(y)=%.1f fraction zero=%.3f\n\n",
                n, mean(y), mean(y == 0)))
    cat(" True: intercept=5.0 prec=1.0\n\n")

    ## Poisson + iid (reference)
    idx <- 1:n
    m1 <- inla(y ~ 1 + f(idx, model = "iid"),
               family = "poisson",
               data = data.frame(y = y, idx = idx),
               control.compute = list(dic = TRUE))
    pln_summary(m1, "Poisson+iid")

    ## poissonlognormal across nquad values
    for (nq in c(10L, 15L, 25L, 50L)) {
      m <- inla(y ~ 1,
                family = "poissonlognormal",
                data = data.frame(y = y),
                control.family = list(nquad = nq),
                control.compute = list(dic = TRUE))
      pln_summary(m, sprintf("PLN nquad=%d", nq))
    }

    ## ============================================================
    ## Sparse data (beta0=-3.5, sigma=2, ~89% zeros)
    ## ============================================================
    cat("\n=== Sparse data (beta0=-3.5, sigma=2) ===\n")
    set.seed(123)
    n_s <- 10000
    beta0_sparse <- -3.5
    sigma_sparse <- 2.0
    prec_sparse <- 1 / sigma_sparse^2
    u_s <- rnorm(n_s, 0, sigma_sparse)
    y_s <- rpois(n_s, exp(beta0_sparse + u_s))
    cat(sprintf(" n=%d mean(y)=%.4f fraction zero=%.3f\n\n",
                n_s, mean(y_s), mean(y_s == 0)))
    cat(sprintf(" True: intercept=%.1f prec=%.4f\n\n",
                beta0_sparse, prec_sparse))

    ## Poisson + iid (reference)
    idx_s <- 1:n_s
    m1s <- inla(y ~ 1 + f(idx, model = "iid"),
                family = "poisson",
                data = data.frame(y = y_s, idx = idx_s),
                control.compute = list(dic = TRUE),
                safe = TRUE)
    pln_summary(m1s, "Poisson+iid")

    ## poissonlognormal across nquad values
    for (nq in c(10L, 15L, 25L, 50L)) {
      m <- inla(y ~ 1,
                family = "poissonlognormal",
                data = data.frame(y = y_s),
                control.family = list(nquad = nq),
                control.compute = list(dic = TRUE))
      pln_summary(m, sprintf("PLN nquad=%d", nq))
    }

    cat("\n=== Done ===\n")
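For reference, the quantity that a Poisson log-normal likelihood has to evaluate can be sketched in a few lines of base R. This is an illustrative sketch of adaptive Gauss-Hermite quadrature, not the C code added in this PR. The names `gauss_hermite` and `pln_loglik`, the Newton mode search, and its starting value are assumptions made for the example.

```r
## Gauss-Hermite nodes and weights for the weight function exp(-x^2),
## computed with the Golub-Welsch eigenvalue method (base R only).
gauss_hermite <- function(n) {
  k <- seq_len(n - 1)
  J <- matrix(0, n, n)
  J[cbind(k, k + 1)] <- sqrt(k / 2)
  J[cbind(k + 1, k)] <- sqrt(k / 2)
  e <- eigen(J, symmetric = TRUE)
  ord <- order(e$values)
  list(x = e$values[ord], w = sqrt(pi) * e$vectors[1, ord]^2)
}

## Marginal log-likelihood log p(y_i | eta_i, sigma), with the iid effect
## u ~ N(0, sigma^2) integrated out by adaptive Gauss-Hermite quadrature.
## (Hypothetical helper; eta is the linear predictor without the iid term.)
pln_loglik <- function(y, eta, sigma, nquad = 15L) {
  gh <- gauss_hermite(nquad)
  one_obs <- function(yi, etai) {
    ## Log joint density of (yi, u), viewed as a function of u
    h <- function(u) {
      dpois(yi, exp(etai + u), log = TRUE) + dnorm(u, 0, sigma, log = TRUE)
    }
    ## Newton search for the mode of h (h is strictly concave in u);
    ## the starting value is an assumption chosen to keep the steps bounded
    u_hat <- log(yi + 0.5) - etai
    for (iter in 1:100) {
      mu <- exp(etai + u_hat)
      step <- (yi - mu - u_hat / sigma^2) / (mu + 1 / sigma^2)
      u_hat <- u_hat + step
      if (abs(step) < 1e-10) break
    }
    ## Scale taken from the curvature of h at the mode
    s <- 1 / sqrt(exp(etai + u_hat) + 1 / sigma^2)
    u <- u_hat + sqrt(2) * s * gh$x
    ## log of sum_k w_k * exp(x_k^2) * exp(h(u_k)), via log-sum-exp
    terms <- log(gh$w) + gh$x^2 + h(u)
    m <- max(terms)
    log(sqrt(2) * s) + m + log(sum(exp(terms - m)))
  }
  mapply(one_obs, y, eta)
}

## Example: log-likelihood contributions for three counts
pln_loglik(y = c(0, 1, 3), eta = c(-1, 0, 1), sigma = 1, nquad = 25L)
```

Centering the nodes at the mode and scaling by the curvature is what makes the rule adaptive. A plain rule would keep the nodes at sqrt(2) * sigma * x_k regardless of the observed count.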
Commit Summary
* 7fd342c added Poisson log-normal likelihood with adaptive gauss hermite quadrature
* a8ea85c added tests
* 008557a made quadrature more efficient
* 52f48cf reduced the default number of quadrature points to 15
File Changes
(10 files)
* A inlaprog/inla (0)
* M inlaprog/src/inla-likelihood.c (134)
* M inlaprog/src/inla-parse.c (54)
* M inlaprog/src/inla.c (10)
* M inlaprog/src/inla.h (8)
* M rinla/R/create.data.file.R (1)
* M rinla/R/models.R (23)
* M rinla/R/sections.R (8)
* M rinla/R/set.default.arguments.R (4)
* A rinla/tests/testthat/test-poissonlognormal.R (82)
Patch Links:
* https://github.com/hrue/r-inla/pull/123.patch
* https://github.com/hrue/r-inla/pull/123.diff
--
Håvard Rue
Professor of Statistics
Chair of the Statistics Program
CEMSE Division
King Abdullah University of Science and Technology
Thuwal 23955-6900
Kingdom of Saudi Arabia
***@***.***
Office: +966 (0)12 808 0640
Mobile: +966 (0)54 470 0421
Research group: bayescomp.kaust.edu.sa
R-INLA project: www.r-inla.org
Zoom: kaust.zoom.us/my/haavard.rue
--
This message and its contents, including attachments are intended solely
for the original recipient. If you are not the intended recipient or have
received this message in error, please notify me immediately and delete
this message from your computer system. Any unauthorized use or
distribution is prohibited. Please consider the environment before printing
this email.
|
Author
|
No worries, you've more than made up for any bad past decisions by providing this great tool! I'm going the 'cloglike' route. I think I have it working!
Owner
|
Thx. The 'cloglike' route makes it much easier: there is a unified interface,
and no need to dig into the messy part of the code.
Please let me know how this goes. If this goes into a package on CRAN, we can
discuss whether it should be compiled automatically in the R-INLA package.
…On Mon, 2026-05-25 at 17:45 -0700, eweine wrote:
eweine left a comment (hrue/r-inla#123)
No worries, you've more than made up for any bad past decisions by providing
this great tool!
I'm going the 'cloglike' route. I think I have it working!
|