Skip to content

Incorrect log likelihood scores #51

@danielcgingerich

Description

@danielcgingerich

I am checking atSNP against motifmatchr to ensure my analysis is correct.

The first weird thing I am seeing is all negative scores.

range(results$log_lik_ref)
[1] -83.298867  -0.225367

I checked against the motif scores from motifmatchr and they are way off.

# a data frame of motifmatchr positions with corresponding snp overlaps: 
head(snp_motif_pairs) 
          snp   snp_position    motif           motif_position motif_score
1  rs557439872  chr6-33592353 MA0030.1   chr6-33592343-33592356    8.033355
2 rs1212546096 chr7-143882185 MA0030.1 chr7-143882185-143882198    8.517413
3  rs977166267 chr7-143882189 MA0030.1 chr7-143882185-143882198    8.517413
4 rs1365257832 chr7-143882193 MA0030.1 chr7-143882185-143882198    8.517413
5 rs1284885174 chr15-59169011 MA0030.1  chr15-59169011-59169024    8.997481
6 rs1487076959 chr15-59169021 MA0030.1  chr15-59169011-59169024    8.997481

n <- sample(1:nrow(snp_motif_pairs), 1)

test <- snp_motif_pairs[n, ]
ix <- atSNP_results$snpid == test$snp & atSNP_results$motif == test$motif
results[ix,]$log_lik_ref
[1] -14.18652
test$motif_score
[1] 6.6079

I used natural log likelihood ratio for PWMs input to motifmatchR, because atSNP uses natural log. I used PPM with pseudocount =1 for atSNP, as I am aware it does the log transformation automatically. As far as I know, the resulting PWMs should be the same for both algorithms.

my code for atSNP is here:

pwms <- getMatrixSet(JASPAR2020, opts = list(species = 9606, all_versions = F)) # returns raw frequency pfms
# convert to ppm
pwms <- lapply(pwms, 
               function(x){
                 mtx <- x@profileMatrix
                 mtx <- mtx + 0.25 # pseudocount
                 mtx <- mtx %*% diag(1/colSums(mtx))
                 mtx <- t(mtx)
                 return(mtx)
               })

snp_info <- LoadSNPData(snpids = rsid,    # a vector of refSNP ID's
                        genome.lib ="BSgenome.Hsapiens.UCSC.hg38", 
                        snp.lib = "SNPlocs.Hsapiens.dbSNP151.GRCh38", 
                        half.window.size = 25, default.par = FALSE, mutation = FALSE)

atsnp.scores <- ComputeMotifScore(pwms, snp_info, ncores = 1)

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions