I am checking atSNP against motifmatchr to ensure my analysis is correct.
The first weird thing I am seeing is all negative scores.
range(results$log_lik_ref)
[1] -83.298867 -0.225367
I checked against the motif scores from motifmatchr and they are way off.
# a data frame of motifmatchr positions with corresponding snp overlaps:
head(snp_motif_pairs)
snp snp_position motif motif_position motif_score
1 rs557439872 chr6-33592353 MA0030.1 chr6-33592343-33592356 8.033355
2 rs1212546096 chr7-143882185 MA0030.1 chr7-143882185-143882198 8.517413
3 rs977166267 chr7-143882189 MA0030.1 chr7-143882185-143882198 8.517413
4 rs1365257832 chr7-143882193 MA0030.1 chr7-143882185-143882198 8.517413
5 rs1284885174 chr15-59169011 MA0030.1 chr15-59169011-59169024 8.997481
6 rs1487076959 chr15-59169021 MA0030.1 chr15-59169011-59169024 8.997481
n <- sample(1:nrow(snp_motif_pairs), 1)
test <- snp_motif_pairs[n, ]
ix <- atSNP_results$snpid == test$snp & atSNP_results$motif == test$motif
results[ix,]$log_lik_ref
[1] -14.18652
test$motif_score
[1] 6.6079
I used natural log likelihood ratio for PWMs input to motifmatchR, because atSNP uses natural log. I used PPM with pseudocount =1 for atSNP, as I am aware it does the log transformation automatically. As far as I know, the resulting PWMs should be the same for both algorithms.
my code for atSNP is here:
pwms <- getMatrixSet(JASPAR2020, opts = list(species = 9606, all_versions = F)) # returns raw frequency pfms
# convert to ppm
pwms <- lapply(pwms,
function(x){
mtx <- x@profileMatrix
mtx <- mtx + 0.25 # pseudocount
mtx <- mtx %*% diag(1/colSums(mtx))
mtx <- t(mtx)
return(mtx)
})
snp_info <- LoadSNPData(snpids = rsid, # a vector of refSNP ID's
genome.lib ="BSgenome.Hsapiens.UCSC.hg38",
snp.lib = "SNPlocs.Hsapiens.dbSNP151.GRCh38",
half.window.size = 25, default.par = FALSE, mutation = FALSE)
atsnp.scores <- ComputeMotifScore(pwms, snp_info, ncores = 1)
I am checking atSNP against motifmatchr to ensure my analysis is correct.
The first weird thing I am seeing is all negative scores.
I checked against the motif scores from motifmatchr and they are way off.
I used natural log likelihood ratio for PWMs input to motifmatchR, because atSNP uses natural log. I used PPM with pseudocount =1 for atSNP, as I am aware it does the log transformation automatically. As far as I know, the resulting PWMs should be the same for both algorithms.
my code for atSNP is here: