OncoLearn

OncoLearn is a toolkit for cancer genomics analysis and biomarker discovery using RNA-seq data from The Cancer Genome Atlas (TCGA). It applies machine learning and statistical methods to cancer subtyping and to identifying potential diagnostic and prognostic markers.

Contributors

Aryan Sharan Guda (aryanshg@andrew.cmu.edu), Seungjin Han (seungjih@andrew.cmu.edu), Seohyun Lee (seohyun4@andrew.cmu.edu), Yosen Lin (yosenl@andrew.cmu.edu), Isha Parikh (parikh.i@northeastern.edu), Diya Patidar (dpatidar@andrew.cmu.edu), Arunannamalai Sujatha Bharath Raj (asujatha@andrew.cmu.edu), Andrew Scouten (yzb2@txstate.edu), Jeffrey Wang (jdw2@andrew.cmu.edu), Qiyu (Charlie) Yang (qiyuy@andrew.cmu.edu), Xinru Zhang (mayzxr2203@gmail.com), River Zhu (riverz@andrew.cmu.edu), Zhaoyi (Zoey) You (zhaoyiyou.zoey@gmail.com), Heena Dalal (dalalhina@gmail.com/heena.dalal@kcl.ac.uk)

Quickstart

  1. Install Docker Desktop from docker.com

  2. Clone and set up:

    git clone https://github.com/collaborativebioinformatics/OncoLearn.git
    cd OncoLearn
    git submodule update --init --recursive
    docker compose up -d
  3. Download sample data:

    docker compose exec dev bash ./scripts/data/download_tcga_brca.sh
  4. Start exploring with the Jupyter notebooks in notebooks/data/

For detailed setup options and local installation, see Getting Started.
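
To see which notebooks are available before opening Jupyter, the step above can be sketched in Python. This is a minimal illustration, not part of the OncoLearn package: the helper name `list_notebooks` is hypothetical, and it only assumes that notebooks live under `notebooks/data/` as stated in step 4.

```python
from pathlib import Path


def list_notebooks(notebook_dir="notebooks/data"):
    """Return sorted paths of all *.ipynb files under notebook_dir.

    Hypothetical helper; returns an empty list if the directory
    does not exist (for example, when run outside the repository).
    """
    root = Path(notebook_dir)
    if not root.is_dir():
        return []
    return sorted(root.rglob("*.ipynb"))


if __name__ == "__main__":
    for nb in list_notebooks():
        print(nb)
```

Run it from the repository root so that the relative path resolves correctly.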

Getting Started

Prerequisites

This project supports two installation methods:

Option A: Docker (Recommended)

  • Docker Desktop or Docker Engine
  • Docker Compose
  • VSCode with Dev Containers extension (optional but recommended)

Option B: Local Installation

  • Python 3.10+
  • R 4.0+
  • uv - Fast Python package installer and resolver
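
The Option B prerequisites can be checked before installing anything. The following is a minimal sketch, not part of the OncoLearn repository: the function name `check_prerequisites` is hypothetical, and it assumes that `uv` and `Rscript` are discoverable on `PATH`. It does not verify the R version (4.0+), only that R appears to be installed.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 10), tools=("uv", "Rscript")):
    """Return a dict mapping each prerequisite to True or False.

    Hypothetical helper: checks the running interpreter version and
    whether each command-line tool can be found on PATH.
    """
    results = {"python": sys.version_info[:2] >= tuple(min_python)}
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH
        results[tool] = shutil.which(tool) is not None
    return results


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```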

Option A: Docker Installation (Recommended)

Docker provides a consistent development environment and avoids most dependency and compatibility issues.

  1. Install Docker Desktop:

    • Download from docker.com
    • Or install Docker Engine on Linux
  2. Clone the repository:

    git clone https://github.com/collaborativebioinformatics/OncoLearn.git
    cd OncoLearn
    git submodule update --init --recursive
  3. Start the environment:

    # Build and start the container
    docker compose up -d
  4. Open in VSCode Dev Container (optional):

    • Install the Dev Containers extension
    • Press F1 → "Dev Containers: Reopen in Container"
    • VSCode will connect to the container with all extensions and tools configured
    • Jupyter notebooks (.ipynb files) will work natively in VSCode without a browser

Useful Docker Commands:

# Stop containers
docker compose down

# Rebuild after dependency changes
docker compose build

# Execute commands in container
docker compose exec dev bash

# Add new Python packages
docker compose exec dev uv add <package-name>

# View running containers
docker compose ps

Option B: Local Installation

  1. Install uv (if not already installed) by following the official uv installation instructions.

  2. Clone the repository:

    git clone https://github.com/collaborativebioinformatics/OncoLearn.git
    cd OncoLearn
    git submodule update --init --recursive
  3. Install Python dependencies:

    # Install base dependencies
    uv sync
    
    # Or install with PyTorch extras (choose one based on your hardware):
    uv sync --extra cpu          # CPU-only version
    uv sync --extra cu128        # CUDA 12.8
    uv sync --extra cu130        # CUDA 13.0
    uv sync --extra rocm         # AMD ROCm
  4. Install R dependencies with renv:

    # Install renv if not already installed
    install.packages("renv")
    
    # Restore R package dependencies
    renv::restore()
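
For step 3, the choice of PyTorch extra can be sketched as a small decision function. This is an illustration only: the helper `suggest_uv_extra` is hypothetical and not part of OncoLearn, and it assumes that finding `nvidia-smi` or `rocminfo` on `PATH` indicates an NVIDIA or AMD stack respectively. The extra names (`cpu`, `cu128`, `cu130`, `rocm`) are the ones listed in step 3. The sketch cannot tell which CUDA version your driver supports, so the CUDA extra is passed in explicitly.

```python
import shutil


def suggest_uv_extra(has_nvidia, has_rocm, cuda_extra="cu128"):
    """Pick one of the extras listed in this README.

    cuda_extra must be "cu128" or "cu130"; which one fits depends on
    the installed driver, which this sketch does not inspect.
    """
    if cuda_extra not in ("cu128", "cu130"):
        raise ValueError(f"unknown CUDA extra: {cuda_extra!r}")
    if has_nvidia:
        return cuda_extra
    if has_rocm:
        return "rocm"
    return "cpu"


if __name__ == "__main__":
    # Assumption: these vendor tools are on PATH when the hardware stack is installed.
    extra = suggest_uv_extra(
        has_nvidia=shutil.which("nvidia-smi") is not None,
        has_rocm=shutil.which("rocminfo") is not None,
    )
    print(f"uv sync --extra {extra}")
```

The script only prints the suggested command; review it and run it yourself.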

Recommended VSCode Extensions

For the best development experience, we recommend installing the following VSCode extensions:

  • Python (ms-python.python) - IntelliSense, debugging, and linting for Python
  • Ruff (charliermarsh.ruff) - Fast Python linter and formatter
  • autopep8 (ms-python.autopep8) - Python code formatter following PEP 8 style guide
  • R (REditorSupport.r) - R language support with syntax highlighting and code execution
  • Jupyter (ms-toolsai.jupyter) - Interactive Jupyter notebook support
  • Dev Containers (ms-vscode-remote.remote-containers) - For Docker development (if using Docker)

Documentation

Guides and further documentation are available in the docs/ folder.

Project Structure

  • data/ - Data storage directory (downloaded TCGA datasets)
  • docs/ - Project documentation and guides
  • notebooks/ - Jupyter notebooks for data exploration and analysis
  • scripts/ - Data download and preprocessing scripts
  • src/oncolearn/ - Core Python package for cancer genomics analysis
  • src/multimodal/ - Multimodal learning framework for integrating multi-omic data
  • configs/ - Configuration files for training and testing
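
A quick way to confirm that a checkout matches this layout is sketched below. It is a hypothetical helper, not part of the repository; the directory names are copied from the list above. Note that `data/` may legitimately be missing until datasets have been downloaded.

```python
from pathlib import Path

# Directory names taken from the Project Structure list.
EXPECTED_DIRS = (
    "data",
    "docs",
    "notebooks",
    "scripts",
    "src/oncolearn",
    "src/multimodal",
    "configs",
)


def missing_dirs(repo_root="."):
    """Return the expected directories that do not exist under repo_root."""
    root = Path(repo_root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    missing = missing_dirs()
    print("All expected directories present." if not missing else f"Missing: {missing}")
```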

Additional Resources

For more information on downloading and working with TCGA data, see the TCGA Data Download Guide.


License

This project is licensed under the MIT License - see the LICENSE file for details.

AI Disclosure

Artificial intelligence tools, including large language models (LLMs), were used during the development of this project to support writing, clarify technical concepts, and assist in generating code snippets. These tools served as an aid for idea refinement, debugging, and improving the readability of explanations and documentation. All AI-generated text and code were thoroughly reviewed, verified for correctness, and understood in full before being incorporated into this work. The responsibility for all final decisions, interpretations, and implementations remains solely with the contributors.
