GeneWideStats
Our data consist of m genetic markers (e.g. within a gene or biological pathway) and their p-values obtained from a one-sided test of significance. We wish to calculate a single statistic for the group of genetic markers.
The simplest strategy is to correct the minimum p-value within the region with an estimate of the number of effective tests using Sidak’s correction (Sidak, 1971),
\[ p_{corrected} = 1 - \left( 1 - p_{raw} \right)^{k} \]
where \( p_{raw} \) is the minimum p-value and \( k \) is the number of effective tests (Gao et al., 2008).
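As a minimal sketch of the correction above, the following Python function evaluates Sidak's formula for a given minimum p-value and number of effective tests. The function name `sidak_corrected` is hypothetical and not part of GeneWideStats, and the estimate of k (Gao et al., 2008) is assumed to be computed elsewhere.

```python
import math


def sidak_corrected(p_raw: float, k: float) -> float:
    """Return 1 - (1 - p_raw)**k for the minimum p-value p_raw and k effective tests."""
    if not 0.0 <= p_raw <= 1.0:
        raise ValueError("p_raw must lie in [0, 1]")
    if k <= 0:
        raise ValueError("k must be positive")
    if p_raw == 1.0:
        return 1.0
    # log1p/expm1 keep precision when p_raw is very small,
    # where (1 - p_raw)**k would round to 1.
    return -math.expm1(k * math.log1p(-p_raw))


# A minimum p-value of 0.01 with 5 effective tests.
print(sidak_corrected(0.01, 5))
```

With k = 1 the function returns the raw p-value unchanged, which is a quick sanity check on any estimate of k.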
To obtain a combined test statistic \( T \) that takes the correlation among the genetic markers into account, we use the method originally derived by Brown (1975), with the modifications proposed by Kost and McDermott (2002) and Makambi (2003). This leads to the chi-square test \( T = 0.5 \cdot \upsilon \cdot M_{F,m} \) with \( \upsilon \) degrees of freedom, where \( M_{F,m} = -2\sum_{i=1}^{m} w_{i} \ln \left( p_{i} \right) \) is the weighted version of Fisher’s method, \( p_{i} \) is the p-value of the ith marker, and \( w_{i} \) are weights greater than zero that sum to one. The degrees of freedom are \( \upsilon = \frac{8}{var\left( M_{F,m} \right)} \) with
\[ var\left( M_{F,m} \right) = 4\sum_{i=1}^{m} w_{i}^{2} + \sum_{i=1}^{m} \sum_{j=1,\, j \neq i}^{m} w_{i} w_{j} \left( 3.263|\rho_{ij}| + 0.71|\rho_{ij}|^{2} + 0.027|\rho_{ij}|^{3} \right) \]
where \( \rho_{ij} \) is the correlation between the ith and jth p-values.
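The combined test can be sketched in a few lines of Python. The function below is an illustration under stated assumptions, not the package's implementation: the name `brown_combined` is hypothetical, the correlation matrix is passed as a nested list, and the weights are rescaled to sum to one as the text requires. It returns the statistic T and its degrees of freedom. Turning T into a p-value needs a chi-square tail function for non-integer degrees of freedom, which the Python standard library does not provide, so that step is left out.

```python
import math


def brown_combined(p_values, weights, corr):
    """Return (T, dof) for the weighted Fisher statistic with correlated p-values."""
    m = len(p_values)
    if len(weights) != m or len(corr) != m:
        raise ValueError("p_values, weights and corr must have matching sizes")
    if any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    total = sum(weights)
    w = [wi / total for wi in weights]  # weights must sum to one

    # Weighted Fisher statistic M_{F,m}.
    m_f = -2.0 * sum(wi * math.log(pi) for wi, pi in zip(w, p_values))

    # Variance: independent part plus the polynomial covariance approximation.
    var = 4.0 * sum(wi * wi for wi in w)
    for i in range(m):
        for j in range(m):
            if i != j:
                r = abs(corr[i][j])
                var += w[i] * w[j] * (3.263 * r + 0.71 * r**2 + 0.027 * r**3)

    dof = 8.0 / var
    return 0.5 * dof * m_f, dof


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t_stat, dof = brown_combined([0.01, 0.2, 0.5], [1.0, 1.0, 1.0], identity)
print(t_stat, dof)
```

For uncorrelated markers with equal weights the sketch reduces to the classical Fisher method: the degrees of freedom equal 2m and T equals minus twice the sum of the log p-values.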
We can also calculate a fixed-effects estimate
\[ Z_{fix} = \frac{\frac{\sum_{i=1}^{m} w_{i} z_{i}}{\sum_{i=1}^{m} w_{i}}}{\sqrt{V_{fix}}} \]
(Huedo-Medina et al., 2006), where \( z_{i} \) are the p-values transformed to z-scores using the inverse cumulative distribution function (c.d.f.) of the standard normal distribution, and \( V_{fix} \) is the variance of the test. Using the approximation of the multivariate normal distribution, \( V_{fix} = \sum_{i=1}^{m} \sum_{j=1}^{m} w_{i} w_{j} \rho_{ij} \).
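A rough Python sketch of this statistic follows. Several details are assumptions rather than facts from the text: the function name `z_fixed` is hypothetical, the p-values are mapped to z-scores as the standard normal quantile of 1 - p (so that small p-values give large positive z-scores), and the weights are rescaled to sum to one.

```python
import math
from statistics import NormalDist


def z_fixed(p_values, weights, corr):
    """Weighted mean z-score divided by the square root of V_fix."""
    m = len(p_values)
    std_normal = NormalDist()
    # Assumption: one-sided p-values, converted with the upper-tail convention.
    z = [std_normal.inv_cdf(1.0 - p) for p in p_values]

    total = sum(weights)
    w = [wi / total for wi in weights]  # weights sum to one

    mean_z = sum(wi * zi for wi, zi in zip(w, z)) / sum(w)
    v_fix = sum(w[i] * w[j] * corr[i][j] for i in range(m) for j in range(m))
    return mean_z / math.sqrt(v_fix)


# Two markers with the same p-value, first independent, then perfectly correlated.
print(z_fixed([0.025, 0.025], [1.0, 1.0], [[1.0, 0.0], [0.0, 1.0]]))
print(z_fixed([0.025, 0.025], [1.0, 1.0], [[1.0, 1.0], [1.0, 1.0]]))
```

The second call returns the single-marker z-score, since two perfectly correlated markers carry no more evidence than one, while the independent case is larger by a factor of the square root of two.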
A random-effects estimate is given by
\[ Z_{random} = \frac{\frac{\sum_{i=1}^{m} w_{i}^{*} z_{i}}{\sum_{i=1}^{m} w_{i}^{*}}}{\sqrt{V_{random}}} \]
with variance \( V_{random} = \sum_{i=1}^{m} \sum_{j=1}^{m} w_{i}^{*} w_{j}^{*} \rho_{ij} \) and weights equal to \( w_{i}^{*} = \left( \tau^{2} + \frac{1}{w_{i}} \right)^{-1} \), which are adjusted with the statistical heterogeneity measure
\[ \tau^{2} = \max \left[ 0, \frac{Q' - \left( m-1 \right)}{\sum_{i=1}^{m} w_{i} - \frac{\sum_{i=1}^{m} w_{i}^{2}}{\sum_{i=1}^{m} w_{i}}} \right] \]
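The heterogeneity measure and the adjusted weights are straightforward to compute once a value for the re-scaled heterogeneity statistic is available. The sketch below takes that value as an input; the function names `tau_squared` and `random_effects_weights` are hypothetical, and returning zero when the denominator vanishes (for example with a single marker) is an assumption made here to avoid a division by zero.

```python
def tau_squared(q_prime, weights):
    """Heterogeneity measure tau^2, truncated at zero."""
    m = len(weights)
    w_sum = sum(weights)
    denom = w_sum - sum(w * w for w in weights) / w_sum
    if denom <= 0.0:
        # Assumption: no heterogeneity can be estimated, e.g. when m == 1.
        return 0.0
    return max(0.0, (q_prime - (m - 1)) / denom)


def random_effects_weights(weights, tau2):
    """Adjusted weights w_i* = 1 / (tau^2 + 1 / w_i)."""
    return [1.0 / (tau2 + 1.0 / w) for w in weights]


weights = [0.5, 0.5]
tau2 = tau_squared(3.0, weights)
print(tau2, random_effects_weights(weights, tau2))
```

When the heterogeneity statistic does not exceed m - 1, the measure is truncated to zero and the adjusted weights equal the original ones, so the random-effects estimate coincides with the fixed-effects one.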
In calculating \( \tau^{2} \) one would normally use Cochran's heterogeneity statistic
\[ Q = \sum_{i=1}^{m} w_{i} \left( \frac{\sum_{j=1}^{m} w_{j} z_{j}}{\sum_{j=1}^{m} w_{j}} - z_{i} \right)^{2} \]
which is approximately chi-square distributed with m-1 degrees of freedom (Huedo-Medina et al., 2006). To account for the correlation among the genetic markers, \( \tau^{2} \) is calculated using \( Q' \), which is Q re-scaled into a chi-square variable with m-1 degrees of freedom. This is achieved by calculating the tail probability of Q using the method described in 2.1; \( Q' \) is then the chi-square value corresponding to that probability under a chi-square distribution with m-1 degrees of freedom.
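Cochran's statistic itself can be sketched as follows. The function name `cochran_q` is hypothetical, and the z-scores are assumed to have been computed already. The re-scaling step is not shown, since it depends on the method referred to as 2.1 and on a chi-square quantile function that the Python standard library does not provide.

```python
def cochran_q(z, weights):
    """Weighted sum of squared deviations of the z-scores from their weighted mean."""
    if len(z) != len(weights):
        raise ValueError("z and weights must have the same length")
    w_sum = sum(weights)
    mean_z = sum(w * zi for w, zi in zip(weights, z)) / w_sum
    return sum(w * (mean_z - zi) ** 2 for w, zi in zip(weights, z))


# Two markers with z-scores 1 and 3 and equal weights: weighted mean 2, Q = 1.
print(cochran_q([1.0, 3.0], [0.5, 0.5]))
```

Identical z-scores give a statistic of zero, that is, no evidence of heterogeneity among the markers.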
- Brown, M.B. (1975) A method for combining non-independent, one-sided tests of significance. Biometrics. 31, 987-992.
- Gao, X. et al. (2008) A multiple testing correction method for genetic association studies using correlated single nucleotide polymorphisms. Genet. Epidemiol. 32, 361-369.
- Huedo-Medina, T.B. et al. (2006) Assessing heterogeneity in meta-analysis: Q statistic or I² index? Psychol. Methods. 11, 193-206.
- Kost, J.T. and McDermott, M.P. (2002) Combining dependent p-values. Stat. Probab. Lett. 60, 183-190.
- Makambi, K. (2003) Weighted inverse chi-square method for correlated significance tests. J. Appl. Stat. 30, 225-234.
- Sidak, Z. (1971) On probabilities of rectangles in multivariate Student distributions: their dependence on correlations. Ann. Math. Statist. 42, 169-175.