MetaME is an R-based pipeline for running a genome-wide association study (GWAS) meta-analysis of chronic fatigue syndrome (CFS) summary statistics. The workflow automates the download of five public summary statistics: DecodeME, Million Veteran Program (MVP), UK Biobank from the Neale Lab, UK Biobank from the European Bioinformatics Institute, and FinnGen. The files are then harmonised by column renaming, lift-over to GRCh38, and allele flipping (when necessary). Next, a meta-GWAS can be built with METAL, a widely used tool for GWAS meta-analysis, from a subset of these five GWASs that the user specifies in a YAML file. In particular, I present here a meta-GWAS of 21,561 cases (European ancestry) built from DecodeME, MVP, and UK Biobank. Because the summary statistics come from different regression models, the sample-size-based method was used to compute weighted Z-scores for the variants in the meta-GWAS. The summary statistics are generated with respect to both GRCh38 and GRCh37. The latter was submitted to FUMA, with 1000G Phase 3 EUR as the reference population, to select candidate genes and to perform tissue-level and cell-type analyses via the MAGMA-based regressions implemented in FUMA. Post-GWAS analysis revealed three risk loci, mapped to 10 candidate genes. Tissue-level regression associates ME/CFS with several brain regions, spanning the cerebellum, basal ganglia, hypothalamus, amygdala, and cortex. MAGMA gene-set analysis reveals a significant association with genes involved in glutamatergic synapses. Cell-type analysis on human brain scRNA-seq datasets identifies a significant association with excitatory neurons of the white matter. The same analysis on the mouse brain revealed significant associations with both excitatory and inhibitory neurons in the mesencephalon, the cerebellum, and the frontal/posterior cortex.
While this analysis cannot specify further the glutamatergic hypothesis of ME/CFS, a deficiency in glutamatergic signalling would offer a possible explanation for mental and physical fatigue.
The main results of FUMA output are reported here, and the reader can explore and download the complete output at this link: meta-GWAS analysis. The GRCh37 summary statistics generated by this pipeline can be downloaded HERE while the GRCh38 version is HERE.
META_main.R retrieves and saves in the \Data folder the following five summary statistics.
| Database | Symbol | cases | controls | Trait | Regression | Ancestry | Assembly | Reference | Summary Statistics |
|---|---|---|---|---|---|---|---|---|---|
| DecodeME | DME | 15579 | 259909 | CFS (CCC/IOM) | Logistic | EUR | GRCh38 | (Preprint_2025) | (GWAS-1) |
| MillionVeteranProgram | MVP | 3891 | 439202 | PheCode_798.1_CFS | Logistic | EUR | GRCh38 | (Verma_2024) | (GCST90479178) |
| UKBiobank (NealeLab) | UKBNL | 1659 | 359482 | self_reported CFS | Linear | EUR | GRCh37 | (NealeLab) | (20002_1482) |
| UKBiobank (EIB) | UKBEIB | 2092 | 482506 | self_reported CFS | Linear | EUR | GRCh37 | (Dönertaş_2021) | (GCST90038694) |
| FinnGen | FG | 283 | 463029 | Post-viral fatigue | Logistic | FIN | GRCh38 | (Kurki_2023) | (R12_G6_POSTVIRFAT) |
The pipeline filters the input GWAS, keeping only common variants (MAF above 0.01). For DecodeME, UKBNL, and UKBEIB, only variants with INFO above 0.9 were kept. For UKBNL, low-confidence variants were removed. For UKBNL, variants with p_HWE below 1e-06 were removed.
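The filters described above can be sketched in a few lines of base R. This is only an illustration, not the code used in META_main.R: the helper name `filter_sumstats` and the column names `FRQ`, `INFO`, and `P_HWE` are assumptions, and the toy rows are invented. The INFO and HWE filters are applied only when the corresponding column is present, mirroring the fact that they are used for some cohorts only.

```r
# Hypothetical helper: keep common, well-imputed variants that pass HWE.
# Column names (FRQ, INFO, P_HWE) are assumptions made for this sketch.
filter_sumstats <- function(df, maf_min = 0.01, info_min = 0.9, hwe_min = 1e-6) {
  maf <- pmin(df$FRQ, 1 - df$FRQ)              # minor allele frequency
  keep <- maf > maf_min                        # keep MAF above 0.01
  if ("INFO" %in% names(df))  keep <- keep & df$INFO > info_min
  if ("P_HWE" %in% names(df)) keep <- keep & df$P_HWE >= hwe_min
  df[keep, , drop = FALSE]
}

# Toy input (invented values): only rs_a passes all three filters.
toy <- data.frame(
  SNP   = c("rs_a", "rs_b", "rs_c", "rs_d"),
  FRQ   = c(0.30, 0.995, 0.20, 0.45),
  INFO  = c(0.95, 0.99, 0.80, 0.97),
  P_HWE = c(0.5, 0.5, 0.5, 1e-8),
  stringsAsFactors = FALSE
)
filtered <- filter_sumstats(toy)
print(filtered$SNP)
```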
The pipeline standardises the five summary statistics using the R package MungeSumstats (Murphy 2021), which munges allele columns and lifts coordinates from GRCh37 to GRCh38, if necessary. Munged sumstats are saved in \Munged. The column labels and their meaning are as follows:
| Column | Description |
|---|---|
| SNP | rs ID |
| CHR | chromosome number (GRCh38) |
| BP | base pair position (GRCh38) |
| A1 | reference allele (GRCh38) |
| A2 | effect allele |
| Z | Z-score |
| BETA | regression coefficient between trait and effect allele |
| SE | standard error |
| P | p-value |
| N | size (cases + controls) |
| N_CAS | number of cases |
| N_CON | number of controls |
I used the arguments bi_allelic_filter=F and flip_frq_as_biallelic=T to include the non-biallelic SNPs. Note that with this setting, if a non-biallelic SNP needs allele flipping, the frequency of the alternate allele is computed as 1-FRQ, which overestimates its true frequency because the remaining alleles are ignored. While this affects FRQ, SE, and BETA, it does not affect the analysis by METAL (based on p-values only) or by FUMA, which uses BETAs only for the direction of the effect.
Each sumstat can have additional columns (like OR, for logistic regression, or LOG10P). As an example, the rows of the munged DecodeME sumstat (\Munged\DME_GRCh38.tsv.gz) that contain the 5 most significant variants are shown below:
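A toy calculation may help to see why 1-FRQ overestimates the frequency at a non-biallelic site. The allele frequencies below are invented for illustration and do not come from any of the cohorts.

```r
# Toy tri-allelic site: frequencies are invented for illustration.
freqs <- c(A = 0.60, C = 0.30, T = 0.10)

# Suppose the record reports FRQ for allele A, and flipping makes C the effect allele.
frq_reported   <- freqs[["A"]]
frq_after_flip <- 1 - frq_reported   # what the bi-allelic assumption yields
frq_true       <- freqs[["C"]]       # actual frequency of allele C

# The overestimate equals the combined frequency of the ignored alleles (here, T).
overestimate <- frq_after_flip - frq_true
print(c(after_flip = frq_after_flip, true = frq_true, overestimate = overestimate))
```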
| SNP | CHR | BP | A1 | A2 | VARIANT_ID | FRQ | N | N_CAS | N_CON | BETA | SE | LOG10P | P | Z |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| rs6066909 | 20 | 48913376 | C | T | 20:48913376:C:T | 0.634113 | 275488 | 15579 | 259909 | 0.0904424 | 0.0133438 | 10.91380 | 1.219551e-11 | 6.777859 |
| rs6012555 | 20 | 48911205 | A | C | 20:48911205:A:C | 0.633910 | 275488 | 15579 | 259909 | 0.0901852 | 0.0133428 | 10.85740 | 1.388673e-11 | 6.759091 |
| rs6125539 | 20 | 49087273 | C | A | 20:49087273:C:A | 0.405455 | 275488 | 15579 | 259909 | -0.0807783 | 0.0130701 | 9.19416 | 6.394992e-10 | -6.180389 |
| rs6125576 | 20 | 49160382 | A | T | 20:49160382:A:T | 0.405720 | 275488 | 15579 | 259909 | -0.0806653 | 0.0130677 | 9.17348 | 6.706872e-10 | -6.172877 |
| rs4810909 | 20 | 49004835 | G | A | 20:49004835:G:A | 0.594451 | 275488 | 15579 | 259909 | 0.0804680 | 0.0130685 | 9.13114 | 7.393669e-10 | 6.157401 |
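As a quick consistency check on these columns, the relations Z = BETA/SE and P = 10^(-LOG10P) can be verified on the first row of the table. The numbers are copied from that row; the two-sided normal p-value is included only for comparison and is not necessarily how the cohort computed P.

```r
# Values copied from the first row of the table above (rs6066909).
beta   <- 0.0904424
se     <- 0.0133438
log10p <- 10.91380

z_from_beta <- beta / se                      # compare with the Z column
p_from_log  <- 10^(-log10p)                   # compare with the P column
p_from_z    <- 2 * pnorm(-abs(z_from_beta))   # two-sided normal approximation

print(c(Z = z_from_beta, P_from_LOG10P = p_from_log, P_from_Z = p_from_z))
```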
This pipeline uses METAL, a widely used tool for meta-analysis of genome-wide association studies (Willer 2010). META_Main_2.R generates the instruction script for METAL (see file metal_script.txt in this repository), builds the METAL command, and calls a local installation of the latest release (available here) through the R function system(). The source code was compiled in the WSL2 environment.
While DecodeME, MVP, and FinnGen rely on logistic regression for trait-variant marginal associations, the two UK Biobank GWASs used a linear regression model. In this case, Inverse Variance Weighted (IVW) meta-analysis is not feasible, since BETAs and SEs from different regression models are not directly comparable. We must rely on Z-scores instead, using a sample-size-weighted meta-analysis, as described in (Willer 2010). In METAL, this kind of analysis is selected with the command SCHEME SAMPLESIZE. I wrote an introduction to GWAS meta-analysis (Maccallini 2025).
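For a single variant, the sample-size-weighted scheme of (Willer 2010) combines per-study Z-scores with weights equal to the square root of each study's sample size. The sketch below is a minimal stand-alone version for intuition only: the function name `sample_size_meta` is hypothetical, the inputs are invented, and it omits what METAL does in practice (allele alignment across studies, overlap correction, genomic control).

```r
# Minimal sample-size-weighted meta-analysis for one variant.
# z: per-study Z-scores, already aligned to the same effect allele.
# n: per-study (effective) sample sizes.
sample_size_meta <- function(z, n) {
  stopifnot(length(z) == length(n), all(n > 0))
  w <- sqrt(n)                               # weights w_i = sqrt(N_i)
  z_meta <- sum(w * z) / sqrt(sum(w^2))      # weighted Z-score
  p_meta <- 2 * pnorm(-abs(z_meta))          # two-sided p-value
  list(z = z_meta, p = p_meta)
}

# Toy input: three studies with invented Z-scores and sample sizes.
res <- sample_size_meta(z = c(2.0, 1.5, -0.5), n = c(40000, 10000, 10000))
print(res)
```

Because the weights depend only on sample size, the combined statistic does not require BETAs and SEs to be on the same scale, which is why this scheme suits a mix of logistic and linear models.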
Also, given a possible overlap between DecodeME and UKBiobank healthy controls, we ask METAL to perform a correction using the instruction OVERLAP ON. This correction is based on the assumption that for small Z scores (the default cut-off is 1), the Z scores of different GWAS should be independent Gaussian random variables (Lin DY 2009).
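A METAL instruction script with these two settings could be assembled from R roughly as follows. This is a sketch and not the content of metal_script.txt: the helper `build_metal_script`, the weight column name `N_EFF`, the output prefix, and the second input file name are assumptions. The column labels SNP, A1, A2, BETA, and P follow the munged format described earlier, with A2 as the effect allele.

```r
# Hypothetical helper that assembles METAL commands as a character vector.
build_metal_script <- function(files, out_prefix = "Output/meta") {
  c(
    "SCHEME SAMPLESIZE",          # sample-size-weighted Z-score scheme
    "OVERLAP ON",                 # correct for overlapping samples
    "MARKER SNP",
    "ALLELE A2 A1",               # effect allele first, then the other allele
    "EFFECT BETA",                # used for the direction of effect
    "PVALUE P",
    "WEIGHT N_EFF",               # assumed name of the effective-sample-size column
    paste("PROCESS", files),      # one PROCESS line per input file
    paste("OUTFILE", out_prefix, ".tbl"),
    "ANALYZE"
  )
}

# File names are illustrative; only the DecodeME name appears in this README.
script <- build_metal_script(c("Munged/DME_GRCh38.tsv.gz", "Munged/MVP_GRCh38.tsv.gz"))
script_path <- file.path(tempdir(), "metal_script_example.txt")
writeLines(script, script_path)

# Calling METAL is left commented out: it assumes a `metal` binary on the PATH.
# system(paste("metal", shQuote(script_path)))
```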
For each GWAS, this pipeline passes METAL the effective sample size as weight, defined as follows:

$$N_{\mathrm{eff}} = \frac{4}{\dfrac{1}{N_{\mathrm{cas}}} + \dfrac{1}{N_{\mathrm{con}}}}$$

Once METAL generates the results, META_Main_2.R estimates the standard error of each included SNP using the following formula:

$$\mathrm{SE} = \frac{1}{\sqrt{0.5 \, N \, \mathrm{FRQ} \, (1 - \mathrm{FRQ})}}$$

where N is the total effective size, after correction for overlapping samples (see paragraph below for justification of this formula). After that, the pipeline calculates BETA using the well-known relation between BETA, Z, and SE:

$$\mathrm{BETA} = Z \cdot \mathrm{SE}$$
Figure 1. The nine most significant variants in the meta-GWAS generated by METAL. In blue, the effect-allele frequency is reported; in red, the effective sample size, after correction for overlapping samples. Coordinates are with respect to GRCh38.
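The formula used to approximate the standard error comes from Eq. A.3 of (Vukcevic D, 2011), where the noncentrality parameter of the Wald trend test is:

$$\lambda = 2 \, N_{\mathrm{tot}} \, \phi (1 - \phi) \, f (1 - f) \, \beta^2$$

Here, $N_{\mathrm{tot}}$ is the total number of individuals (cases plus controls), $\phi$ is the proportion of cases, $f$ is the frequency of the effect allele, and $\beta$ is the log odds ratio. Substituting $\lambda = (\beta / \mathrm{SE})^2$ and solving for SE:

$$\mathrm{SE} = \frac{1}{\sqrt{2 \, N_{\mathrm{tot}} \, \phi (1 - \phi) \, f (1 - f)}}$$

In this pipeline the sample size is expressed as the effective sample size, $N = 4 \, N_{\mathrm{tot}} \, \phi (1 - \phi)$. Substituting:

$$\mathrm{SE} = \left( 0.5 \, N \, f (1 - f) \right)^{-1/2}$$

which corresponds to the code:

`SE = (0.5 * N * FRQ * (1 - FRQ))^(-0.5)`

where N is the overlap-corrected effective sample size from METAL.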
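The three quantities discussed in this section can be put together in a short R sketch. It takes the effective sample size of a case-control study to be 4/(1/N_cas + 1/N_con), which is an assumption of this example; the function names are hypothetical, and the Z-score and allele frequency are toy values. Only the DecodeME case and control counts are taken from the cohort table.

```r
# Effective sample size of a case-control study (assumed definition).
n_eff <- function(n_cas, n_con) 4 / (1 / n_cas + 1 / n_con)

# Approximate SE from the effective sample size and effect-allele frequency,
# following SE = (0.5 * N * FRQ * (1 - FRQ))^(-0.5).
approx_se <- function(n, frq) (0.5 * n * frq * (1 - frq))^(-0.5)

# BETA from the Z-score and the approximate SE.
approx_beta <- function(z, se) z * se

n_dme <- n_eff(15579, 259909)            # DecodeME counts from the cohort table
se    <- approx_se(n = n_dme, frq = 0.4) # toy allele frequency
beta  <- approx_beta(z = 5.5, se = se)   # toy Z-score

print(c(N_eff = n_dme, SE = se, BETA = beta))
```

In the pipeline itself N would be the overlap-corrected effective size reported by METAL for each variant rather than a single-cohort value.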
I submitted the GRCh37 summary statistics to FUMA (Functional Mapping and Annotation of GWAS), a web service that includes several modules for various stages of GWAS analysis. The SNP2GENE module is the first step: it identifies risk loci and performs gene mapping according to various criteria (positional mapping, eQTL-based and chromatin-based mapping) (Watanabe 2017). In SNP2GENE, I requested the 1000G Phase 3 EUR reference population, which is in GRCh37. I used GTEx v8 for gene mapping by eQTL, and included positional mapping. I requested MAGMA analysis, which is a necessary step for subsequently running the Cell Type module. I left the default values for the other parameters. Next, tissue-level analysis was performed using MAGMA gene-property analysis as implemented in FUMA, which regresses gene-level association Z-scores on tissue-specific gene expression levels (from GTEx and other datasets) to identify tissues in which genetic associations are overrepresented. The analysis is available at this link: meta-GWAS analysis.
I used the results from the SNP2GENE analysis as input to the Cell Type module (Watanabe 2019), using both the mouse brain and the human brain. For the mouse brain, I employed the datasets from DropViz, only level 2 (L2); the regions considered are Cerebellum (CB), Entopeduncular nucleus & subthalamic nucleus (EP/STN), Frontal cortex (FC), Globus pallidus externus & nucleus basalis (GP/NB), Hippocampus (HP), Posterior cortex (PC), Striatum (STR), Substantia nigra & ventral tegmental area (SN/VTA), and Thalamus (TH) (Saunders 2018). For the human brain, I used L2 of the Siletti datasets (Siletti 2023), from all the available regions of the adult brain. For the white matter, I used L2 of the dataset from (Seeker 2023). Level 2 allows us to distinguish between the main cell classes (neurons, glia, microglia, oligodendrocytes, etc.). I focused on the brain because MAGMA tissue expression analysis over the 53 tissues of GTEx v8 indicates significant associations with several anatomical regions of the central nervous system. I ran all three steps of the standard cell-type analysis (explained below) and requested Bonferroni correction for multiple comparisons. In particular, the first step tests the significance of the estimate of
for i=1, 2,..., N, where N is the total number of genes included in the analysis (usually around 18,000 genes),
The results of the meta-analysis described in this repository are collected in a HuggingFace repository and can be downloaded from the links below.
| Name | Description | Reference | Download |
|---|---|---|---|
| GWAS_METAL_DME_MVP_UKBEIB_GRCh37.tsv.gz | Meta-GWAS from DecodeME, MVP, UKBiobank (European Institute of Bioinformatics) | GRCh37 | (LINK) |
| GWAS_METAL_DME_MVP_UKBEIB_GRCh38.tsv.gz | Meta-GWAS from DecodeME, MVP, UKBiobank (European Institute of Bioinformatics) | GRCh38 | (LINK) |
The SNP2GENE module of FUMA identifies three risk loci, reported below (GRCh37). You can explore and download the results of the analysis of FUMA on my meta-GWAS at this link: meta-GWAS analysis.
| Genomic Locus | uniqID | rsID | chr | pos | P | start | end | nSNPs | nGWASSNPs | nIndSigSNPs | IndSigSNPs | nLeadSNPs | LeadSNPs |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2:64085114:A:G | rs10183479 | 2 | 64085114 | 4.016e-08 | 63911761 | 64287632 | 73 | 60 | 1 | rs10183479 | 1 | rs10183479 |
| 2 | 6:98537145:A:G | rs2503773 | 6 | 98537145 | 2.91e-09 | 98310291 | 98546547 | 282 | 252 | 2 | rs2503773;rs4363043 | 1 | rs2503773 |
| 3 | 20:47532999:A:G | rs4810894 | 20 | 47532999 | 1.453e-10 | 47511792 | 47914180 | 209 | 174 | 3 | rs4810894;rs3091574;rs1977121 | 1 | rs4810894 |
These chromosomal regions map to ten genes by positional criteria and eQTLs (GTEx v8):
| ensg | symbol | chr | start | end | strand | type | entrezID | pLI | ncRVIS | posMapSNPs | posMapMaxCADD | eqtlMapSNPs | eqtlMapminP | eqtlMapminQ | eqtlMap tissues (GTEx v8) | eqtlDirection | minGwasP | IndSigSNPs | GenomicLocus |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ENSG00000115507 | OTX1 | 2 | 63277192 | 63284971 | + | protein_coding | 5013 | 0.2198 | -1.7532 | 0 | 0 | 32 | 2.0e-05 | 6.68e-22 | Whole_Blood | − | 4.02e-08 | rs10183479 | 1 |
| ENSG00000143951 | WDPCP | 2 | 63348518 | 64054977 | − | protein_coding | 51057 | 2.3e-09 | -0.8846 | 6 | 14.37 | 1 | 2.58e-04 | 9.56e-08 | Fibroblasts | + | 9.17e-08 | rs10183479 | 1 |
| ENSG00000169764 | UGP2 | 2 | 64068074 | 64118696 | + | protein_coding | 7360 | 0.0010 | -0.3076 | 25 | 22.1 | 64 | 4.54e-18 | 6.06e-21 | Multiple tissues | − | 4.02e-08 | rs10183479 | 1 |
| ENSG00000143952 | VPS54 | 2 | 64119280 | 64246206 | − | protein_coding | 51542 | 0.9498 | NA | 46 | 22.1 | 56 | 3.29e-06 | 3.51e-03 | Muscle_Skeletal | − | 4.02e-08 | rs10183479 | 1 |
| ENSG00000124126 | PREX1 | 20 | 47240790 | 47444420 | − | protein_coding | 57580 | 1.0000 | 0.2866 | 0 | 0 | 178 | 1.70e-11 | 8.84e-20 | Muscle_Skeletal | + | 1.45e-10 | rs4810894;rs1977121;rs3091574 | 3 |
| ENSG00000124198 | ARFGEF2 | 20 | 47538427 | 47653230 | + | protein_coding | 10564 | 1.0000 | -0.7337 | 59 | 15.91 | 181 | 3.79e-11 | 8.97e-19 | Brain & peripheral tissues | − | 1.45e-10 | rs4810894;rs3091574;rs1977121 | 3 |
| ENSG00000124207 | CSE1L | 20 | 47662849 | 47713489 | + | protein_coding | 1434 | 0.99998 | 0.5269 | 48 | 15.91 | 181 | 1.15e-61 | 3.60e-53 | Broad multi-tissue | − | 1.45e-10 | rs4810894;rs3091574;rs1977121 | 3 |
| ENSG00000124214 | STAU1 | 20 | 47729878 | 47804904 | − | protein_coding | 6780 | 0.9964 | -0.3059 | 72 | 11.52 | 181 | 7.96e-14 | 1.95e-11 | Brain, blood, muscle | + | 1.45e-10 | rs4810894;rs3091574;rs1977121 | 3 |
| ENSG00000124228 | DDX27 | 20 | 47835884 | 47860614 | + | protein_coding | 55661 | 0.2502 | -0.3918 | 5 | 6.36 | 154 | 2.30e-07 | 3.72e-04 | Adipose, fibroblasts | + | 9.06e-10 | rs4810894;rs3091574;rs1977121 | 3 |
| ENSG00000124201 | ZNFX1 | 20 | 47854483 | 47894963 | − | protein_coding | 57169 | 0.99998 | -0.2881 | 8 | 14.09 | 163 | 1.28e-09 | 5.23e-14 | Blood, brain, thyroid | − | 1.45e-10 | rs4810894;rs1977121;rs3091574 | 3 |
MAGMA gene-set analysis identifies the Gene Ontology (cellular-component) term GLUTAMATERGIC_SYNAPSE as significant after Bonferroni correction. The term POSTSYNAPTIC_MEMBRANE narrowly misses the 0.05 threshold (Pbon = 0.053):
| Gene Set | N genes | Beta | Beta STD | SE | P | Pbon |
|---|---|---|---|---|---|---|
| GOCC_GLUTAMATERGIC_SYNAPSE | 387 | 0.20115 | 0.028462 | 0.043295 | 1.7048e-06 | 0.0289969432 |
| GOCC_POSTSYNAPTIC_MEMBRANE | 257 | 0.252 | 0.02916 | 0.055769 | 3.1319e-06 | 0.0532673552 |
| GOCC_POSTSYNAPTIC_DENSITY_MEMBRANE | 97 | 0.37883 | 0.027046 | 0.086776 | 6.3744e-06 | 0.1084094208 |
| GOMF_INORGANIC_MOLECULAR_ENTITY_TRANSMEMBRANE_TRANSPORTER_ACTIVITY | 624 | 0.1543 | 0.027545 | 0.035677 | 7.6765e-06 | 0.130546559 |
| REACTOME_NEURONAL_SYSTEM | 383 | 0.19456 | 0.02739 | 0.045112 | 8.1036e-06 | 0.137801718 |
| GOCC_SYNAPTIC_MEMBRANE | 363 | 0.19771 | 0.027111 | 0.046605 | 1.1125e-05 | 0.1891695 |
| GOBP_SYNAPTIC_SIGNALING | 714 | 0.1378 | 0.02625 | 0.032721 | 1.2758e-05 | 0.216924274 |
| GOCC_POSTSYNAPTIC_SPECIALIZATION | 342 | 0.19729 | 0.026275 | 0.047463 | 1.6221e-05 | 0.275789442 |
| GOCC_AMPA_GLUTAMATE_RECEPTOR_COMPLEX | 23 | 0.71988 | 0.025075 | 0.17531 | 2.0197e-05 | 0.343369197 |
| GOCC_SOMATODENDRITIC_COMPARTMENT | 791 | 0.12708 | 0.025427 | 0.031224 | 2.3609e-05 | 0.401353 |
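The Pbon column can be reproduced from the P column with base R. Dividing Pbon by P gives 17,009 in the first row and one less in each following row, which is the pattern of a step-down (Holm) variant of the Bonferroni correction over 17,009 tested gene sets. Both the number of sets and the Holm interpretation are inferred from the table values, not stated in the FUMA output shown here, so treat this as a sketch.

```r
# P-values copied from the first four rows of the gene-set table.
p_raw <- c(1.7048e-06, 3.1319e-06, 6.3744e-06, 7.6765e-06)

# Assumption: 17,009 gene sets were tested (inferred from Pbon / P in row one).
n_sets <- 17009

p_holm <- p.adjust(p_raw, method = "holm", n = n_sets)        # step-down correction
p_bonf <- p.adjust(p_raw, method = "bonferroni", n = n_sets)  # plain Bonferroni

print(data.frame(P = p_raw, holm = p_holm, bonferroni = p_bonf))
```

With either correction, only the first term falls below 0.05, so the conclusion in the text does not depend on which of the two was applied.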
MAGMA tissue analysis, based on a linear regression of the Z-scores assigned to all human genes from the GWAS analysis on tissue-specific gene-expression profiles, highlights several significant brain associations, spanning the basal ganglia, the cerebellum, and the cortex (Figure 2).
Figure 2. MAGMA tissue analysis showing significant associations across basal ganglia, cerebellum, and cortex.
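The idea behind this kind of gene-property regression can be illustrated on simulated data: gene-level Z-scores are regressed on expression in one tissue, conditioning on average expression across tissues, and a one-sided test asks whether the tissue coefficient is positive. Everything below is simulated and the variable names are hypothetical. The model run by MAGMA is richer than this sketch: it includes further gene-level covariates and accounts for correlation between genes.

```r
set.seed(1)
n_genes <- 2000

# Simulated inputs (no real data): expression in one tissue, average
# expression across tissues, and gene-level association Z-scores.
expr_tissue <- rnorm(n_genes)
expr_mean   <- 0.5 * expr_tissue + rnorm(n_genes)
gene_z      <- 0.2 * expr_tissue + rnorm(n_genes)

# Regress gene Z-scores on tissue expression, conditioning on average expression.
fit <- lm(gene_z ~ expr_tissue + expr_mean)
est <- summary(fit)$coefficients["expr_tissue", ]

# One-sided test for a positive association with the tissue.
p_one_sided <- pt(est[["t value"]], df = fit$df.residual, lower.tail = FALSE)
print(c(estimate = est[["Estimate"]], p_one_sided = p_one_sided))
```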
The results of cell-type analysis for DropViz level 2 (L2) scRNA-seq datasets, from step one to step three, are summarised in Figure 3. In step one (top-left), all the significant regressions are reported, after correction for multiple comparisons. In step two (centre), the independent signals within each dataset are selected. In step three (bottom-right), independence across datasets is assessed: stars indicate collinear covariates of the regression model, while the element in row i and column j is the proportional significance (PS) of cell type j conditional on cell type i.
Figure 3. Results of cell-type analysis, using the DropViz scRNA-seq datasets for the mouse brain. STR: striatum; GP: Globus Pallidus; SN: Substantia Nigra; CB: cerebellum; FC: Frontal Cortex; PC: Posterior Cortex. Top-left: all the significant regressions, after correction for multiple comparisons. Centre: only independent signals for each dataset. Bottom-right: collinear covariates are indicated by a star.
Below are the significant regressions from step 2. The marker (last column) is the gene most significantly overexpressed in the corresponding cell (column two), differentiating it from the other neurons from the same anatomical region. It helps identify the function of the corresponding cell type.
| Dataset | Cell-type | Link_DropViz | Region | Marker |
|---|---|---|---|---|
| DropViz_STR_level2 | Neuron_Gad1Gad2_Drd1-Cxcl14.10_5 | (LINK) | Striatum | Gad1 |
| DropViz_GP_level2 | Neuron_Gad1Gad2-Th_Adora2a-Th.3_9 | (LINK) | Globus Pallidus | Gad1 |
| DropViz_SN_level2 | Neuron_Th_Cbln1.4_2 | (LINK) | Substantia Nigra | Cbln1 |
| DropViz_CB_level2 | Neuron_Slc17a7_Gabra6.1_1 | (LINK) | Cerebellum | Slc17a7 |
| DropViz_FC_level2 | Neuron_Gad1Gad2_Synpr-Pcdh11x.1_6 | (LINK) | Frontal Cortex | Gad1 |
| DropViz_PC_level2 | Neuron_Sc17a7_Calb1-Lpl-Penk.2_7 | (LINK) | Posterior Cortex | Slc17a7 |
| DropViz_SN_level2 | Neuron_Slc17a6.2_1 | (LINK) | Substantia Nigra | Slc17a6 |
While Slc17a6, Slc17a7, and Cbln1 identify mainly excitatory neurons, Gad1 is a marker of inhibitory neurons.
The results of cell-type analysis for Siletti (all regions) and Seeker (white matter) level 2 (L2) scRNA-seq datasets, steps one and three, are summarised in Figure 4. There is a positive association with excitatory neurons of the white matter in both the young and old human brain.
Figure 4. Results of cell-type analysis, using the Siletti (all regions) and Seeker (white matter) scRNA-seq datasets for the human brain. Top-left: all the significant regressions, after correction for multiple comparisons. Bottom-right: collinear covariates would be indicated by a star; no collinearity was detected.
The present meta-GWAS on more than 21,500 ME/CFS cases shows a genetic profile that points to the brain, in particular to glutamatergic neurons and glutamatergic synapses. An increase in glutamatergic signalling would cause excitotoxicity, which is associated with neuronal death. I think this is not documented in ME/CFS and should be ruled out. Another possibility is that ME/CFS results from a reduction in glutamatergic signalling, as suggested by the following lines of evidence. A recent study on long COVID found an increased density of AMPA receptors (Fujimoto Y et al. 2025), which may be an adaptation to reduced glutamatergic signalling. In a survey on 150 ME/CFS patients, 66% reported being less able to tolerate alcohol compared to their pre-illness state (Lily C et al. 2019). This could be explained by the inhibitory effect of alcohol on NMDA receptors: under the hypothesis of deficient glutamatergic transmission, alcohol would be expected to exacerbate the disease. One of the few Mendelian ME/CFS cases reported so far involves a woman carrying a structural variant that leads to increased levels of GABAergic neurosteroids (Oakley J et al. 2023). Increased GABAergic tone may induce symptoms similar to reduced glutamatergic tone, given the interplay between the two systems. A GWAS on 1200 females with self-reported ME/CFS from the UK Biobank pointed to rs2017696, see (this table). This variant is associated with altered expression of SLC25A15 (ornithine transporter type I) in several tissues, the brain included. In particular, cases are associated with the reference allele, which is associated with reduced expression of SLC25A15 in the caudate, cingulate cortex, and other brain regions (see Figure 5). What happens if ornithine cannot enter the mitochondria in the brain? There is a local accumulation of ammonia, and the only way to remove it is to consume glutamate. 
Therefore, the available glutamate for neurotransmission is used for ammonia detoxification, and glutamatergic signalling is weakened. This may be one of dozens of possible mechanisms that lead to glutamatergic deficit.
While a deficit in glutamatergic transmission may explain baseline symptoms, post-exertional malaise (PEM) may be linked to the life cycle of glutamatergic synapses, a dynamic process that spans hours to days (Lisman J 2017).
If the glutamatergic hypothesis of ME/CFS holds, there are several viable therapies. This is a list of possible drugs.
| Drug | Mechanism | DrugBank |
|---|---|---|
| D-serine | NMDA co-agonist | (DB03929) |
| Aniracetam | AMPA positive allosteric modulator | (DB04599) |
| Sarcosine | Glycine reuptake inhibitor | (DB12519) |
| Ketamine | NMDA antagonist | (DB01221) |
| Esketamine | NMDA antagonist | (DB11823) |
Figure 5. Differential expression of SLC25A15 in three brain regions associated with rs2017696, according to the GTEx portal.
- `META_main_2.R`: orchestrates package installation, summary statistic munging, and harmonisation steps for each cohort.
- `META_func.R`: helper functions that download input datasets, build the project directory structure, and load configuration values.
- `META_config.yml`: user-editable configuration file for p-value, minor allele frequency, and imputation quality cut-offs as well as cohort selection flags.
- `LICENSE`: licensing information for the project.
- R (≥ 4.0 recommended)
- Access to Bioconductor and CRAN to install dependencies
- Internet connection to retrieve cohort summary statistics from OSF, GWAS Catalogue, and UK Biobank distribution endpoints
The main script installs the required packages automatically, including:
- `BiocManager`, `MungeSumstats`, `BSgenome.Hsapiens.NCBI.GRCh38`, `SNPlocs.Hsapiens.dbSNP155.GRCh38`, `SNPlocs.Hsapiens.dbSNP155.GRCh37`, and `BSgenome.Hsapiens.1000genomes.hs37d5`
- `httr`, `R.utils`, `data.table`, `gtexr`, `otargen`, `yaml`, `corrplot`
You may prefer to install these ahead of time to avoid repeated installations when running the pipeline on a cluster.
Edit META_config.yml to control filtering thresholds and which cohorts to include. The file exposes:
- `filters`: genome-wide significance thresholds for common and uncommon variants (`p_value_common`, `p_value_uncommon`), Hardy–Weinberg equilibrium cut-off (`hwe_p_value`), minimum allele frequency thresholds (`maf_common`, `maf_uncommon`), and an imputation INFO score cut-off (`info_cutoff`).
- `samples`: binary flags (1 = include, 0 = skip) for DecodeME, MVP, UK Biobank (Neale Lab), UK Biobank (EIB), and FinnGen cohorts.
Update these values before launching the analysis to ensure only the desired datasets are downloaded and processed.
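As an illustration of how such a file might look and be read from R, the sketch below parses an inline YAML string with the yaml package. The field names under `filters` are those listed above. The keys under `samples` (taken from the Symbol column of the cohort table), the two p-value thresholds, and `maf_uncommon` are placeholders chosen for this example and may differ from the actual META_config.yml.

```r
# Example configuration text; key names under `samples` and several values
# are assumptions made for this sketch.
example_config <- "
filters:
  p_value_common: 5.0e-8
  p_value_uncommon: 1.0e-8
  hwe_p_value: 1.0e-6
  maf_common: 0.01
  maf_uncommon: 0.001
  info_cutoff: 0.9
samples:
  DME: 1
  MVP: 1
  UKBNL: 0
  UKBEIB: 1
  FG: 0
"

# Parse only if the yaml package is available.
if (requireNamespace("yaml", quietly = TRUE)) {
  cfg <- yaml::yaml.load(example_config)
  # Cohorts flagged with 1 are the ones that would be included.
  selected <- names(cfg$samples)[unlist(cfg$samples) == 1]
  print(selected)
}
```

For a file on disk, `yaml::read_yaml("META_config.yml")` would return the same kind of nested list.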
1. Open an R session in the repository root (or ensure the working directory is set to the project).
2. Run the main script: `Rscript META_main.R`
3. The script creates required directories (`Data/`, `Munged/`) and downloads each cohort selected in `META_config.yml`.
4. Summary statistics are munged via `MungeSumstats` into harmonised `TSV.GZ` files stored under `Munged/`.
Depending on connectivity and download sizes, the first execution can take a while. Temporary outputs and downloaded data are preserved for reuse.
- Harmonised summary statistics per cohort saved in the `Munged/` directory (e.g., `Munged/DME_GRCh38.tsv.gz`).
- Downloaded raw summary statistics and metadata stored beneath `Data/` in cohort-specific subdirectories.
- Meta-GWAS summary statistics in both GRCh37 and GRCh38 stored beneath `Output/`.
These munged files are intended for subsequent meta-analysis steps, which are not included in this repository.
This project is released under the terms described in the included LICENSE file.


