Meta-analysis of three GWAS summary statistics including 21,500 ME/CFS cases

Abstract

MetaME is an R-based pipeline for running a genome-wide association study (GWAS) meta-analysis of summary statistics for myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS). The workflow automates the download of five public summary statistics: DecodeME, Million Veteran Program (MVP), UK Biobank from the Neale Lab, UK Biobank from the European Bioinformatics Institute, and FinnGen. The files are then harmonised by column renaming, lift-over to GRCh38, and allele flipping (when necessary). Next, a meta-GWAS can be built with METAL, a popular tool for GWAS meta-analysis, from a subset of these five GWASs specified by the user in a YAML file. In particular, I present here a meta-GWAS of 21,561 cases (European ancestry) built using DecodeME, MVP, and UK Biobank. Because the summary statistics come from different regression models, the sample-size-based method was employed to compute weighted Z scores for the variants in the meta-GWAS. The summary statistics are generated with respect to both GRCh38 and GRCh37. The latter was input to FUMA, using 1000G Phase 3 EUR as the reference population, to select candidate genes and perform tissue-level and cell-type analyses via the MAGMA regression implemented in FUMA. Post-GWAS analysis revealed three risk loci, mapped to 10 candidate genes. Tissue-level regression associates ME/CFS with several brain regions, spanning the cerebellum, basal ganglia, hypothalamus, amygdala, and cortex. MAGMA gene-set analysis reveals a significant association with genes involved in glutamatergic synapses. Cell-type analysis on human brain scRNA-seq datasets identifies a significant association with excitatory neurons of the white matter. The same analysis on the mouse brain revealed significant associations with both excitatory and inhibitory neurons in the mesencephalon, the cerebellum, and the frontal/posterior cortex. While this analysis cannot further specify the glutamatergic hypothesis of ME/CFS, a deficiency in glutamatergic signalling would offer a possible explanation for mental and physical fatigue.

The main results of FUMA output are reported here, and the reader can explore and download the complete output at this link: meta-GWAS analysis. The GRCh37 summary statistics generated by this pipeline can be downloaded HERE while the GRCh38 version is HERE.

Methods

Data sources

META_main_2.R downloads the following five summary statistics and saves them in the \Data folder.

Database	Symbol	cases	controls	Trait	Regression	Ancestry	Assembly	Reference	Summary Statistics
DecodeME	DME	15579	259909	CFS (CCC/IOM)	Logistic	EUR	GRCh38	(Preprint_2025)	(GWAS-1)
MillionVeteranProgram	MVP	3891	439202	PheCode_798.1_CFS	Logistic	EUR	GRCh38	(Verma_2024)	(GCST90479178)
UKBiobank (NealeLab)	UKBNL	1659	359482	self_reported CFS	Linear	EUR	GRCh37	(NealeLab)	(20002_1482)
UKBiobank (EBI)	UKBEIB	2092	482506	self_reported CFS	Linear	EUR	GRCh37	(Dönertaş_2021)	(GCST90038694)
FinnGen	FG	283	463029	Post-viral fatigue	Logistic	FIN	GRCh38	(Kurki_2023)	(R12_G6_POSTVIRFAT)

Filtering criteria

The pipeline filters each input GWAS, keeping only common variants (MAF above 0.01). For DecodeME, UKBNL, and UKBEIB, only variants with INFO above 0.9 are kept. For UKBNL, low-confidence variants and variants with p_HWE below 1e-06 are also removed.
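The filtering rules above can be sketched in a few lines of base R. This is only an illustration on toy data: the column names MAF, INFO, and P_HWE and the helper filter_sumstats are assumptions, not the names used by the pipeline.

```r
# Toy summary statistics; column names are hypothetical.
sumstats <- data.frame(
  SNP   = c("rs1", "rs2", "rs3", "rs4"),
  MAF   = c(0.20, 0.005, 0.30, 0.15),
  INFO  = c(0.95, 0.99, 0.80, 0.97),
  P_HWE = c(0.50, 0.40, 0.30, 1e-08),
  stringsAsFactors = FALSE
)

# Keep common, well-imputed variants; drop those with p_HWE below the cut-off.
filter_sumstats <- function(df, maf_min = 0.01, info_min = 0.9, hwe_min = 1e-06) {
  keep <- df$MAF > maf_min & df$INFO > info_min & df$P_HWE >= hwe_min
  df[keep, , drop = FALSE]
}

filtered <- filter_sumstats(sumstats)
print(filtered$SNP)
```

In the pipeline, the INFO and HWE filters apply only to the cohorts listed above; a cohort without those columns would need the corresponding condition skipped.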

Munging and Liftover

The pipeline standardises the five summary statistics using the R package MungeSumstats (Murphy 2021), which munges allele columns and lifts coordinates from GRCh37 to GRCh38, if necessary. Munged sumstats are saved in \Munged. The column labels and their meaning are as follows:

Column	Description
SNP	rs ID
CHR	chromosome number (GRCh38)
BP	base pair position (GRCh38)
A1	reference allele (GRCh38)
A2	effect allele
Z	Z-score
BETA	regression coefficient between trait and effect allele
SE	standard error
P	p-value
N	size (cases + controls)
N_CAS	number of cases
N_CON	number of controls
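As a rough sketch of the munging step, the wrapper below calls MungeSumstats::format_sumstats() with its path, ref_genome, convert_ref_genome, and save_path arguments. The wrapper name munge_to_grch38 and its parameters are hypothetical; the actual call in the pipeline may use additional arguments.

```r
# Hypothetical wrapper: standardise one summary-statistics file and lift it
# over to GRCh38 when the source build is GRCh37.
munge_to_grch38 <- function(raw_path, source_build, out_path) {
  if (!requireNamespace("MungeSumstats", quietly = TRUE)) {
    stop("MungeSumstats is required (BiocManager::install('MungeSumstats')).")
  }
  # Request a lift-over only when the source build differs from the target.
  target <- if (source_build == "GRCh38") NULL else "GRCh38"
  MungeSumstats::format_sumstats(
    path = raw_path,
    ref_genome = source_build,
    convert_ref_genome = target,
    save_path = out_path
  )
}
```

A call such as munge_to_grch38("Data/raw.tsv.gz", "GRCh37", "Munged/out_GRCh38.tsv.gz") would then write the harmonised file; both paths here are placeholders.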

I used the arguments bi_allelic_filter=F and flip_frq_as_biallelic=T to retain the non-biallelic SNPs. Note that with this setting, if a non-biallelic SNP needs allele flipping, the frequency of the alternate allele is computed as 1-FRQ, which is always larger than the true frequency because the remaining alleles are ignored. While this affects FRQ, SE, and BETA, it does not affect the analysis by METAL (based on p-values and effect direction only) or by FUMA, which uses BETAs only for the direction of the effect.
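A toy example may make the frequency bias concrete. The tri-allelic site and its allele frequencies below are invented for illustration.

```r
# Hypothetical tri-allelic site: the three allele frequencies sum to 1.
freq <- c(A = 0.60, G = 0.30, T = 0.10)

# Suppose FRQ is reported for A, and flipping makes G the effect allele.
frq_reported <- freq[["A"]]
frq_flipped  <- 1 - frq_reported  # value assigned to G by a bi-allelic flip
frq_true     <- freq[["G"]]

# The excess equals the combined frequency of the ignored alleles (here, T).
overestimate <- frq_flipped - frq_true
print(c(flipped = frq_flipped, true = frq_true, excess = overestimate))
```

For a truly bi-allelic SNP the excess is zero, so the approximation only matters at multi-allelic sites.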

Each sumstat can have additional columns (like OR, for logistic regression, or LOG10P). As an example, here are the rows of the munged DecodeME sumstat (\Munged\DME_GRCh38.tsv.gz) that contain the five most significant variants:

SNP	CHR	BP	A1	A2	VARIANT_ID	FRQ	N	N_CAS	N_CON	BETA	SE	LOG10P	P	Z
rs6066909	20	48913376	C	T	20:48913376:C:T	0.634113	275488	15579	259909	0.0904424	0.0133438	10.91380	1.219551e-11	6.777859
rs6012555	20	48911205	A	C	20:48911205:A:C	0.633910	275488	15579	259909	0.0901852	0.0133428	10.85740	1.388673e-11	6.759091
rs6125539	20	49087273	C	A	20:49087273:C:A	0.405455	275488	15579	259909	-0.0807783	0.0130701	9.19416	6.394992e-10	-6.180389
rs6125576	20	49160382	A	T	20:49160382:A:T	0.405720	275488	15579	259909	-0.0806653	0.0130677	9.17348	6.706872e-10	-6.172877
rs4810909	20	49004835	G	A	20:49004835:G:A	0.594451	275488	15579	259909	0.0804680	0.0130685	9.13114	7.393669e-10	6.157401
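The columns of a munged file are related to each other, which gives a quick consistency check. The sketch below copies two rows from the table above and recomputes Z as BETA/SE and P as 10^(-LOG10P); the object names are arbitrary.

```r
# Two rows copied from the munged DecodeME table above.
top <- data.frame(
  SNP    = c("rs6066909", "rs6125539"),
  BETA   = c(0.0904424, -0.0807783),
  SE     = c(0.0133438, 0.0130701),
  LOG10P = c(10.91380, 9.19416),
  P      = c(1.219551e-11, 6.394992e-10),
  Z      = c(6.777859, -6.180389),
  stringsAsFactors = FALSE
)

z_check <- top$BETA / top$SE    # Z-score recomputed from BETA and SE
p_check <- 10^(-top$LOG10P)     # p-value recomputed from LOG10P

print(cbind(top["SNP"], z_check, p_check))
```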

Meta-analysis

This pipeline utilises METAL, a widely used tool for meta-analysis of genome-wide association studies (Willer 2010). META_main_2.R generates the instruction script for METAL (see file metal_script.txt in this repository), builds the command line, and calls a local installation of the latest METAL release (available here) using the R function system(). The source code was compiled in the WSL2 environment.

While DecodeME, MVP, and FinnGen rely on logistic regression for trait-variant marginal associations, the two UK Biobank GWASs employed a linear regression model. In this situation, an inverse-variance-weighted (IVW) meta-analysis is not appropriate, since BETAs and SEs from different regression models are not on the same scale and are not directly comparable. We must rely on Z-scores instead, using a sample-size-weighted meta-analysis, as described in (Willer 2010). In METAL, this kind of analysis is selected with the command SCHEME SAMPLESIZE. I wrote an introduction to GWAS meta-analysis (Maccallini 2025).
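For one variant, the sample-size-weighted scheme of (Willer 2010) combines the per-study Z-scores as Z = sum(w_i * z_i) / sqrt(sum(w_i^2)), with w_i = sqrt(N_i). A minimal sketch follows; the function name and the input values are made up, and METAL's overlap correction is not reproduced here.

```r
# Sample-size-weighted combination of per-study Z-scores for a single variant.
weighted_z <- function(z, n) {
  w <- sqrt(n)                   # weight of each study
  sum(w * z) / sqrt(sum(w^2))    # combined Z-score
}

# Invented Z-scores and (effective) sample sizes for three studies.
z_meta <- weighted_z(z = c(5.0, 2.0, 1.0), n = c(60000, 15000, 8000))
p_meta <- 2 * pnorm(-abs(z_meta))  # two-sided p-value
print(c(z = z_meta, p = p_meta))
```

Because only Z-scores and weights enter the formula, studies analysed with linear and logistic models can be combined without putting their BETAs on a common scale.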

Also, given a possible overlap between DecodeME and UK Biobank healthy controls, we ask METAL to perform a correction using the instruction OVERLAP ON. This correction assumes that variants with small Z scores (the default cut-off is 1) are not associated with the trait, so that their Z scores should be independent Gaussian random variables across GWASs; any correlation observed among them is attributed to overlapping samples (Lin DY 2009).
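A METAL instruction script for this kind of analysis could be assembled in R roughly as follows. This is a sketch and not a copy of metal_script.txt: the helper build_metal_script, the weight column name N_EFF, the allele order, and two of the three file names are assumptions.

```r
# Hypothetical helper: returns the lines of a METAL instruction script.
build_metal_script <- function(files, out_prefix) {
  header <- c(
    "SCHEME SAMPLESIZE",  # sample-size-weighted Z-score meta-analysis
    "OVERLAP ON",         # correct for overlapping samples
    "MARKER SNP",
    "ALLELE A2 A1",       # assumed order: effect allele first
    "EFFECT BETA",        # only the sign is used under SCHEME SAMPLESIZE
    "PVALUE P",
    "WEIGHT N_EFF"        # hypothetical column holding the effective sample size
  )
  process <- paste("PROCESS", files)
  footer <- c(paste("OUTFILE", out_prefix, ".tbl"), "ANALYZE")
  c(header, process, footer)
}

files <- c("Munged/DME_GRCh38.tsv.gz",
           "Munged/MVP_GRCh38.tsv.gz",     # assumed file name
           "Munged/UKBEIB_GRCh38.tsv.gz")  # assumed file name
script <- build_metal_script(files, "Output/GWAS_METAL")
script_path <- tempfile(fileext = ".txt")
writeLines(script, script_path)
# The script could then be run with something like: system(paste("metal", script_path))
```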

For each GWAS, this pipeline passes METAL the effective sample size as weight, defined as follows:

$$ N_{eff}=\frac{4}{\frac{1}{N_{CAS}}+\frac{1}{N_{CON}}} $$
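As a worked example, the effective sample size can be computed from the case and control counts in the data sources table. The function name effective_n is an arbitrary choice.

```r
# Effective sample size of a case-control GWAS.
effective_n <- function(n_cas, n_con) 4 / (1 / n_cas + 1 / n_con)

# Case and control counts from the data sources table.
cohorts <- data.frame(
  symbol = c("DME", "MVP", "UKBEIB"),
  n_cas  = c(15579, 3891, 2092),
  n_con  = c(259909, 439202, 482506),
  stringsAsFactors = FALSE
)
cohorts$n_eff <- effective_n(cohorts$n_cas, cohorts$n_con)
print(cohorts)
```

Since N_eff is always below four times the number of cases, the weight of each study is driven mostly by its case count when controls are abundant.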

Once METAL generates the results, META_main_2.R estimates the standard error of each SNP included using the following formula:

$$ SE=\frac{1}{\sqrt{{0.5N(1-FRQ)FRQ}}} $$

where N is the total effective sample size, after correction for overlapping samples (see the section Derivation of SE below for a justification of this formula). The pipeline then calculates BETA using the relation between BETA, Z, and SE: $BETA=SE \cdot Z$. Finally, the output from METAL is munged by MungeSumstats (as described above), and the summary statistics of the meta-GWAS are generated with respect to both GRCh37 and GRCh38. A comparison between the Z scores across the input GWAS is plotted as a PDF for the nine SNPs with the lowest p-value (see below, and see among the files of this repository).

Figure 1. The nine most significant variants in the meta-GWAS generated by METAL. In blue, the effect-allele frequency is reported; in red, the effective sample size, after correction for overlapping samples. Coordinates are with respect to GRCh38.

Derivation of SE

The formula used for the approximation of the standard error comes from EQ. A.3 of (Vukcevic D, 2011), where the noncentrality parameter of the Wald trend test is:

$$\eta \approx 2N \cdot f(1-f) \cdot \phi(1-\phi) \cdot \beta^2$$

Here, $N$ is total sample size, $f$ is allele frequency, $\phi = N_{CAS}/N$ is the proportion of cases, and $\beta$ is the additive effect. Since the Wald statistic follows a $\chi^2_1$ distribution with noncentrality parameter $\eta = \beta^2 / \text{var}(\hat\beta)$, the asymptotic variance of $\hat\beta$ is:

$$\text{var}(\hat\beta) = \frac{1}{2N \cdot f(1-f) \cdot \phi(1-\phi)}$$

Substituting $\phi(1-\phi) = \frac{N_{CAS}}{N}\frac{N_{CON}}{N} = \frac{N_{CAS} N_{CON}}{N^2}$:

$$\text{var}(\hat\beta) = \frac{N}{2 \cdot f(1-f) \cdot N_{CAS} N_{CON}}$$

In this pipeline $N_{eff}$ is defined as twice the harmonic mean of the numbers of cases and controls:

$$N_{eff} = \frac{4}{\frac{1}{N_{CAS}} + \frac{1}{N_{CON}}} = \frac{4 N_{CAS} N_{CON}}{N}$$

so that $\frac{N_{CAS} N_{CON}}{N} = \frac{N_{eff}}{4}$. Substituting into the variance and taking the square root:

$$SE = \frac{1}{\sqrt{2 \cdot f(1-f) \cdot \frac{N_{eff}}{4}}} = \frac{1}{\sqrt{0.5 \cdot N_{eff} \cdot f(1-f)}}$$

which corresponds to the code:

SE = (0.5 * N * FRQ * (1 - FRQ))^(-0.5)

where N is the overlap-corrected effective sample size from METAL.
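The formula can be tried on a single variant. The sketch below uses the frequency and Z-score of rs6066909 from the DecodeME table and the DecodeME effective sample size purely as an illustration; in the pipeline, N is the overlap-corrected weight returned by METAL, and the function name approx_se is hypothetical.

```r
# Approximate standard error from effective sample size and allele frequency.
approx_se <- function(n_eff, frq) (0.5 * n_eff * frq * (1 - frq))^(-0.5)

n_eff <- 4 / (1 / 15579 + 1 / 259909)  # DecodeME cases and controls
frq   <- 0.634113                       # FRQ of rs6066909 in the DecodeME file
z     <- 6.777859                       # Z of rs6066909 in the DecodeME file

se   <- approx_se(n_eff, frq)
beta <- se * z                          # BETA = SE * Z
print(c(SE = se, BETA = beta))
```

The result can be compared with the SE reported for this SNP in the DecodeME file (0.0133438), which comes from the original regression rather than from this approximation.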

FUMA: gene-mapping and tissue analysis

I submitted the GRCh37 summary statistics to FUMA (Functional Mapping and Annotation of GWAS), a web service that includes several modules for various stages of GWAS analysis. The SNP2GENE module is the first step: it identifies risk loci and performs gene mapping according to various criteria (positional mapping, eQTL-based and chromatin-based mapping) (Watanabe 2017). In SNP2GENE, I requested the 1000G Phase 3 EUR reference population, which is in GRCh37. I used GTEx v8 for gene mapping by eQTL, and included positional mapping. I requested MAGMA analysis, which is a necessary step for subsequently running the Cell Type module. I left the default values for the other parameters. Next, tissue-level analysis was performed using MAGMA gene-property analysis as implemented in FUMA, which regresses gene-level association Z-scores on tissue-specific gene expression levels (from GTEx and other datasets) to identify tissues in which genetic associations are overrepresented. The analysis is available at this link: meta-GWAS analysis.

FUMA: cell-type analysis

I used the results from the SNP2GENE analysis as input to the Cell Type module (Watanabe 2019), using both the mouse brain and the human brain. For the mouse brain, I employed the DropViz datasets at level 2 (L2) only; the regions considered are Cerebellum (CB), Entopeduncular nucleus & subthalamic nucleus (EP/STN), Frontal cortex (FC), Globus pallidus externus & nucleus basalis (GP/NB), Hippocampus (HP), Posterior cortex (PC), Striatum (STR), Substantia nigra & ventral tegmental area (SN/VTA), and Thalamus (TH) (Saunders 2018). For the human brain, I used L2 of the Siletti datasets (Siletti 2023), from all the available regions of the adult brain. For the white matter, I used L2 of the dataset from (Seeker 2023). Level 2 allows us to distinguish between the main cell classes (neurons, glia, microglia, oligodendrocytes, etc.). I focused on the brain because MAGMA tissue expression analysis over the 53 tissues of GTEx v8 suggests a significant association with several anatomical regions of the central nervous system. I ran steps one, two, and three of the standard cell-type analysis, with Bonferroni correction for multiple comparisons. The first step tests the significance of the estimate of $B_{E}$ in the following linear model, which is fitted for each cell type (across all the selected datasets):

$$ Z_{gene_{i}}=B_{0}+E_{gene_i}^{c}B_{E}+{E}_{gene_{i}}B_{A}+G_i^{(1)}B_1+G_i^{(2)}B_2+\cdots+G_i^{(n)}B_n $$

for i = 1, 2, ..., N, where N is the total number of genes included in the analysis (usually around 18,000), $Z_{gene_{i}}$ is the Z-score computed for gene i by the SNP2GENE module, $E_{gene_i}^{c}$ is the expression of gene i in cell type c, $E_{gene_i}$ is the average expression of gene i across multiple cell types, and $G_i^{(j)}$, with j = 1, 2, ..., n, are confounders such as gene length and the correlation between genes, calculated from the LD matrices of the reference population (Watanabe 2019). In step one, p-values are calculated for each cell type from the statistical test on the estimate of $B_{E}$. These p-values are then corrected for multiple comparisons (Bonferroni) within each dataset. In step two, a conditional analysis is performed within each dataset, and independent signals are selected by forward selection. Step three detects independent associations across all the datasets employed.
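Step one can be imitated on simulated data with an ordinary least-squares fit. This is a deliberate simplification: MAGMA also accounts for correlation between genes, which lm() does not, and the one-sided test for a positive $B_{E}$ is my reading of (Watanabe 2019). All variables, effect sizes, and the two extra p-values are simulated or invented.

```r
set.seed(1)
n_genes <- 2000

# Simulated gene-level covariates (all hypothetical).
gene_length <- rnorm(n_genes)            # a standardised confounder G^(1)
e_avg  <- rnorm(n_genes)                 # average expression across cell types
e_cell <- 0.5 * e_avg + rnorm(n_genes)   # expression in cell type c
z_gene <- 0.2 * e_cell + 0.1 * e_avg + 0.3 * gene_length + rnorm(n_genes)

# Regress gene-level Z-scores on cell-type expression plus covariates.
fit   <- lm(z_gene ~ e_cell + e_avg + gene_length)
coefs <- summary(fit)$coefficients
b_e   <- coefs["e_cell", "Estimate"]
t_e   <- coefs["e_cell", "t value"]

# One-sided p-value for B_E > 0, then Bonferroni across three cell types
# (the other two p-values are placeholders).
p_one_sided <- pt(t_e, df = fit$df.residual, lower.tail = FALSE)
p_bonf <- p.adjust(c(p_one_sided, 0.2, 0.6), method = "bonferroni")
print(c(B_E = b_e, p = p_one_sided))
```

Including the average expression as a covariate is what makes $B_{E}$ measure specificity to cell type c rather than overall expression level.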

Results

Summary statistics

The results of the meta-analysis described in this repository are collected in a HuggingFace repository and can be downloaded from the links below.

Name	Description	Assembly	Download
GWAS_METAL_DME_MVP_UKBEIB_GRCh37.tsv.gz	Meta-GWAS from DecodeME, MVP, UKBiobank (European Bioinformatics Institute)	GRCh37	(LINK)
GWAS_METAL_DME_MVP_UKBEIB_GRCh38.tsv.gz	Meta-GWAS from DecodeME, MVP, UKBiobank (European Bioinformatics Institute)	GRCh38	(LINK)

Risk loci and candidate genes

The SNP2GENE module of FUMA identifies three risk loci, reported below (GRCh37). You can explore and download the results of the analysis of FUMA on my meta-GWAS at this link: meta-GWAS analysis.

Genomic Locus	uniqID	rsID	chr	pos	P	start	end	nSNPs	nGWASSNPs	nIndSigSNPs	IndSigSNPs	nLeadSNPs	LeadSNPs
1	2:64085114:A:G	rs10183479	2	64085114	4.016e-08	63911761	64287632	73	60	1	rs10183479	1	rs10183479
2	6:98537145:A:G	rs2503773	6	98537145	2.91e-09	98310291	98546547	282	252	2	rs2503773;rs4363043	1	rs2503773
3	20:47532999:A:G	rs4810894	20	47532999	1.453e-10	47511792	47914180	209	174	3	rs4810894;rs3091574;rs1977121	1	rs4810894

These chromosomal regions map to ten genes by positional criteria and eQTLs (GTEx v8):

ensg	symbol	chr	start	end	strand	type	entrezID	pLI	ncRVIS	posMapSNPs	posMapMaxCADD	eqtlMapSNPs	eqtlMapminP	eqtlMapminQ	eqtlMap tissues (GTEx v8)	eqtlDirection	minGwasP	IndSigSNPs	GenomicLocus
ENSG00000115507	OTX1	2	63277192	63284971	+	protein_coding	5013	0.2198	-1.7532	0	0	32	2.0e-05	6.68e-22	Whole_Blood	−	4.02e-08	rs10183479	1
ENSG00000143951	WDPCP	2	63348518	64054977	−	protein_coding	51057	2.3e-09	-0.8846	6	14.37	1	2.58e-04	9.56e-08	Fibroblasts	+	9.17e-08	rs10183479	1
ENSG00000169764	UGP2	2	64068074	64118696	+	protein_coding	7360	0.0010	-0.3076	25	22.1	64	4.54e-18	6.06e-21	Multiple tissues	−	4.02e-08	rs10183479	1
ENSG00000143952	VPS54	2	64119280	64246206	−	protein_coding	51542	0.9498	NA	46	22.1	56	3.29e-06	3.51e-03	Muscle_Skeletal	−	4.02e-08	rs10183479	1
ENSG00000124126	PREX1	20	47240790	47444420	−	protein_coding	57580	1.0000	0.2866	0	0	178	1.70e-11	8.84e-20	Muscle_Skeletal	+	1.45e-10	rs4810894;rs1977121;rs3091574	3
ENSG00000124198	ARFGEF2	20	47538427	47653230	+	protein_coding	10564	1.0000	-0.7337	59	15.91	181	3.79e-11	8.97e-19	Brain & peripheral tissues	−	1.45e-10	rs4810894;rs3091574;rs1977121	3
ENSG00000124207	CSE1L	20	47662849	47713489	+	protein_coding	1434	0.99998	0.5269	48	15.91	181	1.15e-61	3.60e-53	Broad multi-tissue	−	1.45e-10	rs4810894;rs3091574;rs1977121	3
ENSG00000124214	STAU1	20	47729878	47804904	−	protein_coding	6780	0.9964	-0.3059	72	11.52	181	7.96e-14	1.95e-11	Brain, blood, muscle	+	1.45e-10	rs4810894;rs3091574;rs1977121	3
ENSG00000124228	DDX27	20	47835884	47860614	+	protein_coding	55661	0.2502	-0.3918	5	6.36	154	2.30e-07	3.72e-04	Adipose, fibroblasts	+	9.06e-10	rs4810894;rs3091574;rs1977121	3
ENSG00000124201	ZNFX1	20	47854483	47894963	−	protein_coding	57169	0.99998	-0.2881	8	14.09	163	1.28e-09	5.23e-14	Blood, brain, thyroid	−	1.45e-10	rs4810894;rs1977121;rs3091574	3

Gene-set analysis

MAGMA gene-set analysis identifies the Gene Ontology (cellular-component level) term GLUTAMATERGIC_SYNAPSE as significant after Bonferroni correction (Pbon = 0.029). The term POSTSYNAPTIC_MEMBRANE narrowly misses significance (Pbon = 0.053):

Gene Set	N genes	Beta	Beta STD	SE	P	Pbon
GOCC_GLUTAMATERGIC_SYNAPSE	387	0.20115	0.028462	0.043295	1.7048e-06	0.0289969432
GOCC_POSTSYNAPTIC_MEMBRANE	257	0.252	0.02916	0.055769	3.1319e-06	0.0532673552
GOCC_POSTSYNAPTIC_DENSITY_MEMBRANE	97	0.37883	0.027046	0.086776	6.3744e-06	0.1084094208
GOMF_INORGANIC_MOLECULAR_ENTITY_TRANSMEMBRANE_TRANSPORTER_ACTIVITY	624	0.1543	0.027545	0.035677	7.6765e-06	0.130546559
REACTOME_NEURONAL_SYSTEM	383	0.19456	0.02739	0.045112	8.1036e-06	0.137801718
GOCC_SYNAPTIC_MEMBRANE	363	0.19771	0.027111	0.046605	1.1125e-05	0.1891695
GOBP_SYNAPTIC_SIGNALING	714	0.1378	0.02625	0.032721	1.2758e-05	0.216924274
GOCC_POSTSYNAPTIC_SPECIALIZATION	342	0.19729	0.026275	0.047463	1.6221e-05	0.275789442
GOCC_AMPA_GLUTAMATE_RECEPTOR_COMPLEX	23	0.71988	0.025075	0.17531	2.0197e-05	0.343369197
GOCC_SOMATODENDRITIC_COMPARTMENT	791	0.12708	0.025427	0.031224	2.3609e-05	0.401353

Tissue analysis

MAGMA tissue analysis, based on a linear regression of the gene-level Z-scores derived from the GWAS on tissue-specific gene-expression profiles, highlights several significant brain associations, spanning the basal ganglia, the cerebellum, and the cortex (Figure 2).

Figure 2. MAGMA tissue analysis showing significant associations across basal ganglia, cerebellum, and cortex.

Cell-type analysis

Mouse Brain

The results of cell-type analysis for DropViz level 2 (L2) scRNA-seq datasets, from step one to step three, are summarised in Figure 3. In step one (top-left), all the significant regressions are reported, after correction for multiple comparisons. In step two (centre), the independent signals for each dataset are selected. In step three (bottom-right), independence across datasets is tested: stars indicate the collinear covariates of the regression model, while the element in row i and column j indicates the proportional significance (PS) of cell type j conditional on cell type i.

Figure 3. Results of cell-type analysis, using the DropViz scRNA-seq datasets for the mouse brain. STR: striatum; GP: Globus Pallidus; SN: Substantia Nigra; CB: cerebellum; FC: Frontal Cortex; PC: Posterior Cortex. Top-left: all the significant regressions, after correction for multiple comparisons. Centre: only independent signals for each dataset. Bottom-right: step three, with collinear covariates indicated by a star.

Below are the significant regressions from step 2. The marker (last column) is the gene most significantly overexpressed in the corresponding cell (column two), differentiating it from the other neurons from the same anatomical region. It helps identify the function of the corresponding cell type.

Dataset	Cell-type	Link_DropViz	Region	Marker
DropViz_STR_level2	Neuron_Gad1Gad2_Drd1-Cxcl14.10_5	(LINK)	Striatum	Gad1
DropViz_GP_level2	Neuron_Gad1Gad2-Th_Adora2a-Th.3_9	(LINK)	Globus Pallidus	Gad1
DropViz_SN_level2	Neuron_Th_Cbln1.4_2	(LINK)	Substantia Nigra	Cbln1
DropViz_CB_level2	Neuron_Slc17a7_Gabra6.1_1	(LINK)	Cerebellum	Slc17a7
DropViz_FC_level2	Neuron_Gad1Gad2_Synpr-Pcdh11x.1_6	(LINK)	Frontal Cortex	Gad1
DropViz_PC_level2	Neuron_Sc17a7_Calb1-Lpl-Penk.2_7	(LINK)	Posterior Cortex	Slc17a7
DropViz_SN_level2	Neuron_Slc17a6.2_1	(LINK)	Substantia Nigra	Slc17a6

While Slc17a6, Slc17a7, and Cbln1 identify mainly excitatory neurons, Gad1 is a marker of inhibitory neurons.

Human brain

The results of cell-type analysis for Siletti (all regions) and Seeker (White Matter) level 2 (L2) scRNAseq datasets, step one and step three, are summarised in Figure 4. There is a positive regression with excitatory neurons of the white matter in both the young and old human brain.

Figure 4. Results of cell-type analysis, using the Siletti (all regions) and Seeker (White matter) scRNA-seq datasets for the human brain. Top-left: all the significant regressions, after correction for multiple comparisons. Bottom-right: step three, where collinear covariates would be indicated by a star; no collinearity was detected.

Discussion

The present meta-GWAS on more than 21,500 ME/CFS cases shows a genetic profile that points to the brain, in particular to glutamatergic neurons and glutamatergic synapses. An increase in glutamatergic signalling would cause excitotoxicity, which is associated with neuronal death. To my knowledge, this is not documented in ME/CFS, so this scenario seems unlikely. Another possibility is that ME/CFS results from a reduction in glutamatergic signalling, as suggested by the following lines of evidence.

A recent study on long COVID found an increased density of AMPA receptors (Fujimoto Y et al. 2025), which may be an adaptation to reduced glutamatergic signalling. In a survey of 150 ME/CFS patients, 66% reported being less able to tolerate alcohol compared to their pre-illness state (Lily C et al. 2019). This could be explained by the inhibitory effect of alcohol on NMDA receptors: under the hypothesis of deficient glutamatergic transmission, alcohol would be expected to exacerbate the disease. One of the few Mendelian ME/CFS cases reported so far involves a woman carrying a structural variant that leads to increased levels of GABAergic neurosteroids (Oakley J et al. 2023). Increased GABAergic tone may induce symptoms similar to reduced glutamatergic tone, given the interplay between the two systems.

A GWAS on 1200 females with self-reported ME/CFS from the UK Biobank pointed to rs2017696, see (this table). This variant is associated with altered expression of SLC25A15 (ornithine transporter type I) in several tissues, the brain included. In particular, cases are associated with the reference allele, which is associated with reduced expression of SLC25A15 in the caudate, cingulate cortex, and other brain regions (see Figure 5). What happens if ornithine cannot enter the mitochondria in the brain? There is a local accumulation of ammonia, and the only way to remove it is to consume glutamate. Therefore, the glutamate available for neurotransmission is used for ammonia detoxification, and glutamatergic signalling is weakened. This may be one of dozens of possible mechanisms that lead to a glutamatergic deficit.

While a deficit in glutamatergic transmission may explain baseline symptoms, post-exertional malaise (PEM) may be linked to the life cycle of glutamatergic synapses, a dynamic process that spans hours to days (Lisman J 2017).

If the glutamatergic hypothesis of ME/CFS holds, there are several viable therapies. This is a list of possible drugs.

Drug	Mechanism	DrugBank
D-serine	NMDA co-agonist	(DB03929)
Aniracetam	AMPA positive allosteric modulator	(DB04599)
Sarcosine	Glycine reuptake inhibitor	(DB12519)
Ketamine	NMDA antagonist	(DB01221)
Esketamine	NMDA antagonist	(DB11823)

Figure 5. Differential expression of SLC25A15 in three brain regions associated with rs2017696, according to the GTEx portal.

About the pipeline

Repository components

META_main_2.R – orchestrates package installation, summary statistic munging, and harmonisation steps for each cohort.
META_func.R – helper functions that download input datasets, build the project directory structure, and load configuration values.
META_config.yml – user-editable configuration file for p-value, minor allele frequency, and imputation quality cut-offs as well as cohort selection flags.
LICENSE – licensing information for the project.

Requirements

R (≥ 4.0 recommended)
Access to Bioconductor and CRAN to install dependencies
Internet connection to retrieve cohort summary statistics from OSF, GWAS Catalogue, and UK Biobank distribution endpoints

The main script installs the required packages automatically, including:

BiocManager, MungeSumstats, BSgenome.Hsapiens.NCBI.GRCh38, SNPlocs.Hsapiens.dbSNP155.GRCh38, SNPlocs.Hsapiens.dbSNP155.GRCh37, and BSgenome.Hsapiens.1000genomes.hs37d5
httr, R.utils, data.table, gtexr, otargen, yaml, corrplot

You may prefer to install these ahead of time to avoid repeated installations when running the pipeline on a cluster.

Configuration

Edit META_config.yml to control filtering thresholds and which cohorts to include. The file exposes:

filters: genome-wide significance thresholds for common and uncommon variants (p_value_common, p_value_uncommon), Hardy–Weinberg equilibrium cut-off (hwe_p_value), minimum allele frequency thresholds (maf_common, maf_uncommon), and an imputation INFO score cut-off (info_cutoff).
samples: binary flags (1 = include, 0 = skip) for DecodeME, MVP, UK Biobank (Neale Lab), UK Biobank (EBI), and FinnGen cohorts.

Update these values before launching the analysis to ensure only the desired datasets are downloaded and processed.
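The structure described above can be pictured as a nested R list, which is roughly what yaml::read_yaml("META_config.yml") would return. The key names under filters come from the description above; the key names under samples and several of the values (p_value_common, p_value_uncommon, maf_uncommon) are placeholders, not the contents of the real file.

```r
# Hypothetical in-memory picture of the configuration.
config <- list(
  filters = list(
    p_value_common   = 5e-08,  # placeholder value
    p_value_uncommon = 1e-08,  # placeholder value
    hwe_p_value      = 1e-06,
    maf_common       = 0.01,
    maf_uncommon     = 0.001,  # placeholder value
    info_cutoff      = 0.9
  ),
  samples = list(              # key names are assumptions (table symbols)
    DME = 1, MVP = 1, UKBNL = 0, UKBEIB = 1, FG = 0
  )
)

# Cohorts flagged with 1 are downloaded and processed.
selected <- names(config$samples)[unlist(config$samples) == 1]
print(selected)
```

With these flags, the selection corresponds to the three cohorts used for the meta-GWAS presented here.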

Running the pipeline

Open an R session in the repository root (or ensure the working directory is set to the project).
Run the main script:
```
Rscript META_main_2.R
```
The script creates required directories (Data/, Munged/) and downloads each cohort selected in META_config.yml.
Summary statistics are munged via MungeSumstats into harmonised TSV.GZ files stored under Munged/.

Depending on connectivity and download sizes, the first execution can take a while. Temporary outputs and downloaded data are preserved for reuse.

Outputs

Harmonised summary statistics per cohort saved in the Munged/ directory (e.g., Munged/DME_GRCh38.tsv.gz).
Downloaded raw summary statistics and metadata stored beneath Data/ in cohort-specific subdirectories.
Meta-GWAS summary statistics in both GRCh37 and GRCh38 stored beneath Output/.

The munged per-cohort files are the input to the meta-analysis step; METAL itself is not included in this repository and must be installed locally.

License

This project is released under the terms described in the included LICENSE file.
