📊 View the Interactive Case Study Website
The interactive tutorial demonstrates DIVAS analysis using pre-computed results:
- Multi-omics data integration using DIVAS
- Component visualization and interpretation
- GO enrichment analysis
- Multi-omics correlation networks (Circos plots)
Note: The tutorial focuses on DIVAS analysis and interpretation. For the complete data preprocessing pipeline (raw data download, quality control, cell type annotation, etc.), see the detailed instructions in the sections below.
Due to file size limitations, large data files are hosted on Zenodo: https://doi.org/10.5281/zenodo.17430294
Required download for tutorial:
divas_results_combined_6block_renamed.rds (419 MB) → place in scRNA_celltyist_analysis/DIVAS_run/DIVAS_Results/
Optional:
all_cells_metadata_complete.csv (57.8 MB) → place in scRNA_celltyist_analysis/celltype_annotation/
All other data files (Combined_*.csv, metadata.rds) are already included in this repository.
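Before opening the tutorial, it can help to confirm that the Zenodo downloads landed where the scripts expect them. The following is a minimal sketch and not part of the repository: the function name `missing_tutorial_files` is hypothetical, and the repository root is assumed to be passed as the first argument. The file names and target directories are the ones listed above.

```python
from pathlib import Path

# Paths are relative to the repository root, as listed above.
REQUIRED = [
    "scRNA_celltyist_analysis/DIVAS_run/DIVAS_Results/divas_results_combined_6block_renamed.rds",
]
OPTIONAL = [
    "scRNA_celltyist_analysis/celltype_annotation/all_cells_metadata_complete.csv",
]


def missing_tutorial_files(repo_root, include_optional=False):
    """Return the expected Zenodo files that are not present under repo_root."""
    root = Path(repo_root)
    wanted = REQUIRED + (OPTIONAL if include_optional else [])
    return [rel for rel in wanted if not (root / rel).is_file()]


if __name__ == "__main__":
    # Run from the repository root; prints nothing when every file is in place.
    for rel in missing_tutorial_files(".", include_optional=True):
        print(f"missing: {rel}")
```

The check only tests that each file exists; it does not verify file size or content against the Zenodo record.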
This repository contains a complete reproducible case study demonstrating the application of DIVAS (Data Integration via Analysis of Subspaces) to COVID-19 multi-omics data.
DIVAS is a novel statistical method for integrating multiple high-dimensional datasets. This case study applies DIVAS to COVID-19 patient data including:
- Single-cell RNA sequencing (scRNA-seq)
- Single-cell protein expression (CITE-seq)
- Bulk proteomics
- Metabolomics
Key features:
- Complete Workflow: From raw data processing to final DIVAS analysis
- Reproducible: All scripts with updated paths and dependencies
- Well-documented: Comprehensive README files in each directory
- Cell Type Analysis: Integration with CellTypist for automated annotation
- Multi-timepoint: Analysis of T1 and T2 timepoints (114 samples)
Prerequisites:
- R (>= 4.0) with DIVAS package
- Python (>= 3.8) with scanpy, pandas, celltypist
- Required R packages: devtools, CVXR
- Required Python packages: scanpy, celltypist, pandas, numpy
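The Python side of these prerequisites can be checked without importing the (fairly heavy) packages themselves. This is a minimal sketch using only the standard library; the helper names `missing_packages` and `python_is_recent_enough` are hypothetical, and the R packages (devtools, CVXR, DIVAS) are not covered by it.

```python
import sys
from importlib.util import find_spec

# Python packages and minimum version named in the prerequisites above.
PYTHON_PACKAGES = ["scanpy", "celltypist", "pandas", "numpy"]
MIN_PYTHON = (3, 8)


def missing_packages(names):
    """Return the names that cannot be located as importable top-level modules."""
    return [name for name in names if find_spec(name) is None]


def python_is_recent_enough(version_info=sys.version_info):
    """Compare the (major, minor) part of a version tuple against MIN_PYTHON."""
    return tuple(version_info[:2]) >= MIN_PYTHON


if __name__ == "__main__":
    if not python_is_recent_enough():
        print("Python 3.8 or newer is required")
    for name in missing_packages(PYTHON_PACKAGES):
        print(f"not installed: {name}")
```

`find_spec` only locates a module; it does not confirm that the installed version is compatible with the scripts in this repository.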
- Clone this repository:
  git clone https://github.com/ByronSyun/DIVAS_COVID19_CaseStudy.git
  cd DIVAS_COVID19_CaseStudy
- Install DIVAS package:
  library(devtools)
  install_github("ByronSyun/DIVAS_Develop/pkg", ref = "main")
- Install Python dependencies:
  pip install scanpy celltypist pandas numpy

This case study follows a 4-phase workflow. Each phase has detailed instructions in its respective directory:
cd data_acquisition
# See data_acquisition/README.md for detailed instructions
bash download_arrayexpress_data.sh

cd preprocessing
# See preprocessing/README.md for complete pipeline
# Processes raw data → 120-patient standardized datasets

cd multi_omics_integration
# See multi_omics_integration/README.md for analysis details
Rscript run_divas_analysis.R

cd scRNA_celltyist_analysis
# See scRNA_celltyist_analysis/README.md for complete workflow
# Includes: CellTypist annotation → T1+T2 preparation → DIVAS analysis

Note: Each directory contains a comprehensive README.md with detailed step-by-step instructions, troubleshooting guides, and expected outputs.
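Each `cd` above is relative to the repository root, so pasting the four phases into one shell session fails at the second `cd`. One way around this is to run each command with an explicit working directory. The sketch below assumes the four phase directories named above; `run_in_phase` is a hypothetical helper, and the commands passed to it should come from this README or from the per-directory README files.

```python
import subprocess
from pathlib import Path

# Phase directories in workflow order, as listed above.
PHASES = (
    "data_acquisition",
    "preprocessing",
    "multi_omics_integration",
    "scRNA_celltyist_analysis",
)


def run_in_phase(repo_root, phase, command, dry_run=True):
    """Run `command` inside one phase directory without changing the caller's cwd."""
    if phase not in PHASES:
        raise ValueError(f"unknown phase: {phase}")
    workdir = Path(repo_root).resolve() / phase
    print(f"[{phase}] {' '.join(command)}")
    if not dry_run:
        # check=True raises CalledProcessError if the command exits non-zero.
        subprocess.run(command, cwd=workdir, check=True)
    return workdir


if __name__ == "__main__":
    # Dry run: prints what would be executed, using the two commands given above.
    run_in_phase(".", "data_acquisition", ["bash", "download_arrayexpress_data.sh"])
    run_in_phase(".", "multi_omics_integration", ["Rscript", "run_divas_analysis.R"])
```

With `dry_run=True` (the default here) nothing is executed. The preprocessing and scRNA_celltyist_analysis phases have no single command in this README, so their steps are left to the corresponding README files.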
Note: The interactive tutorial (see link at top) uses pre-computed results from Zenodo and does not require running the preprocessing pipeline below.
If you want to reproduce the full preprocessing workflow from raw data:
Required input data (not included, must be downloaded separately):
- Raw scRNA-seq data (.h5ad format)
- Bulk proteomics data
- Metabolomics data
- Sample metadata
Generated data directories (empty in Git, created during preprocessing):
- preprocessing/processed_omics_120/
- preprocessing/processed_omics_all/
- scRNA_celltyist_analysis/celltype_annotation/annotated_data_majority_voting/
- multi_omics_integration/DIVAS_Results/
- scRNA_celltyist_analysis/DIVAS_run/divas_results/
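To see how far a preprocessing run has progressed, the generated directories can be inspected directly. This is a minimal sketch: `output_status` is a hypothetical helper, and it assumes that hidden placeholder files (for example a `.gitkeep`, which may or may not be present in this repository) should not count as output.

```python
from pathlib import Path

# Generated directories listed above, relative to the repository root.
GENERATED_DIRS = [
    "preprocessing/processed_omics_120",
    "preprocessing/processed_omics_all",
    "scRNA_celltyist_analysis/celltype_annotation/annotated_data_majority_voting",
    "multi_omics_integration/DIVAS_Results",
    "scRNA_celltyist_analysis/DIVAS_run/divas_results",
]


def output_status(repo_root):
    """Map each generated directory to 'missing', 'empty', or 'populated'."""
    status = {}
    for rel in GENERATED_DIRS:
        path = Path(repo_root) / rel
        if not path.is_dir():
            status[rel] = "missing"
        elif any(not p.name.startswith(".") for p in path.iterdir()):
            status[rel] = "populated"
        else:
            # Only hidden placeholder files (or nothing) inside.
            status[rel] = "empty"
    return status


if __name__ == "__main__":
    for rel, state in output_status(".").items():
        print(f"{state:>9}  {rel}")
```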
Multi-omics integration (multi_omics_integration/):
- Integrates scRNA-seq, sc-proteomics, bulk proteomics, and metabolomics
- 120 samples across two timepoints
- Identifies joint and individual variation patterns

Cell type analysis (scRNA_celltyist_analysis/):
- 114 samples (57 T1 + 57 T2)
- 4 cell types: CD4+ T, CD8+ T, CD14+ Monocytes, NK cells
- 2 bulk omics: proteomics, metabolomics
- 8,634 genes with comprehensive joint component analysis
If you use this code or methodology, please cite:
Prothero, J., ..., Marron J. S. (2024).
Data integration via analysis of subspaces (DIVAS).
This project is licensed under the GNU Affero General Public License v3.0 (AGPL-3) - see the LICENSE file for details.