biocypher/iggytop

IggyTop: Immunological Graph Yielding Top receptor-epitope pairings

This repository uses the BioCypher framework to harmonize databases that contain immunoreceptor-epitope matching information.

BioCypher is designed to facilitate the standardized integration of heterogeneous data sources through a regulated framework. The BioCypher framework implements a modular architecture where each data source is processed through dedicated transformation scripts called adapters. These adapters serve as the primary interface between raw data sources and the BioCypher knowledge graph infrastructure. This project provides adapters for the following databases:

These include both original sources that extract data directly from studies, such as McPAS-TCR, and sources that pool already collected data, such as TRAIT. A script is provided to build a knowledge graph with all these adapters. On a consumer laptop, building the full graph typically takes 20-30 minutes.

The final output is the IggyTop database, which integrates immunoreceptor-epitope matching information from all supported data sources into a unified list of AIRR cells.

Node and Edge Types

Nodes

  • tra sequence
  • trb sequence
  • igh sequence
  • igl sequence
  • epitope

Edges

  • alpha sequence to beta sequence association
  • heavy sequence to light sequence association
  • t cell receptor sequence to epitope association
  • b cell receptor sequence to epitope association
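
As a rough illustration of how these types could surface in an adapter, the sketch below turns one paired TCR record into BioCypher-style tuples. BioCypher adapters conventionally yield nodes as (id, label, properties) and edges as (id, source id, target id, label, properties). The record fields, the ID prefixes, the choice to link both chains to the epitope, and the helper name `record_to_graph` are hypothetical and are not taken from the IggyTop adapters.

```python
# Illustrative record; field names and sequences are assumptions for this sketch.
record = {
    "tra_cdr3": "CAVRDSNYQLIW",
    "trb_cdr3": "CASSIRSSYEQYF",
    "epitope": "GILGFVFTL",
}


def record_to_graph(rec):
    """Build node and edge tuples for one paired TCR record (hypothetical helper)."""
    tra_id = f"tra:{rec['tra_cdr3']}"
    trb_id = f"trb:{rec['trb_cdr3']}"
    epi_id = f"epitope:{rec['epitope']}"

    # Nodes follow the (id, label, properties) convention.
    nodes = [
        (tra_id, "tra sequence", {"cdr3": rec["tra_cdr3"]}),
        (trb_id, "trb sequence", {"cdr3": rec["trb_cdr3"]}),
        (epi_id, "epitope", {"sequence": rec["epitope"]}),
    ]

    # Edges follow the (id, source, target, label, properties) convention;
    # the edge id is left as None in this sketch.
    tcr_to_epitope = "t cell receptor sequence to epitope association"
    edges = [
        (None, tra_id, trb_id, "alpha sequence to beta sequence association", {}),
        (None, tra_id, epi_id, tcr_to_epitope, {}),
        (None, trb_id, epi_id, tcr_to_epitope, {}),
    ]
    return nodes, edges


nodes, edges = record_to_graph(record)
```

The B cell receptor types (igh sequence, igl sequence, and their associations) would follow the same pattern with different labels.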

Prerequisites

  • uv for dependency management
  • docker (optional) for containerization

Installation

  1. Clone the repository:

    git clone https://github.com/biocypher/iggytop.git
    cd iggytop
  2. Install dependencies using uv:

    # Core installation (includes dev dependencies)
    uv sync
    
    # Include documentation and Jupyter tools
    uv sync --group docs
  3. You are ready to go!

    uv run create_knowledge_graph.py

More information can be found in the documentation (see below).

Pipeline

  • create_knowledge_graph.py: the main script that orchestrates the pipeline. It brings together the BioCypher package and the data sources. It calls the io.create_knowledge_graph() function, which creates a knowledge graph from all available databases and saves it in AIRR format as a JSON file.

  • create_anndata.py: this script produces the harmonized, merged, and deduplicated data from all (or selected) available databases in AnnData format. It initializes the adapters but does not generate the knowledge graph. Its main purpose is to integrate the available data into Scirpy.

  • src/iggytop/adapters: contains the modules that define the adapter for each data source.

  • src/iggytop/config/schema_config.yaml: a configuration file that defines the schema of the knowledge graph. It is used by BioCypher to map the data source to the knowledge representation on the basis of ontology (see this part of the BioCypher tutorial).

  • src/iggytop/config/biocypher_config.yaml: a configuration file that defines some BioCypher parameters, such as the mode, the separators used, and other options. More on its use can be found in the Documentation.
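
The merge-and-deduplicate step mentioned for create_anndata.py can be pictured with a minimal sketch. It assumes each adapter yields plain dictionaries and that two entries count as duplicates when their receptor and epitope sequences match. The field names, the choice of key, the example values, and the function `merge_records` are hypothetical and do not reflect the actual IggyTop logic.

```python
def merge_records(*sources):
    """Merge (source_name, records) pairs, keeping one entry per unique key.

    Hypothetical helper: duplicates are detected by (trb_cdr3, epitope),
    and each merged entry remembers which sources reported it.
    """
    merged = {}
    for source_name, records in sources:
        for rec in records:
            key = (rec["trb_cdr3"], rec["epitope"])
            # Create the entry on first sight, then reuse it for duplicates.
            entry = merged.setdefault(key, {**rec, "sources": []})
            if source_name not in entry["sources"]:
                entry["sources"].append(source_name)
    return list(merged.values())


# Illustrative values only, not taken from the databases.
mcpas = [{"trb_cdr3": "CASSIRSSYEQYF", "epitope": "GILGFVFTL"}]
trait = [
    {"trb_cdr3": "CASSIRSSYEQYF", "epitope": "GILGFVFTL"},
    {"trb_cdr3": "CASSLAPGATNEKLFF", "epitope": "NLVPMVATV"},
]

merged = merge_records(("McPAS-TCR", mcpas), ("TRAIT", trait))
```

Keeping a list of contributing sources per entry is one way to preserve provenance after deduplication; whether IggyTop tracks provenance this way is not stated here.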

Documentation

This repository uses Sphinx for documentation.

Building the Documentation

To build the documentation, ensure you have the docs dependency group installed:

uv sync --group docs

Then, execute the following command:

uv run ./update_docs.sh

This will generate the documentation in the docs/build directory.

Hosting the Documentation Locally

To host the documentation locally, run:

uv run python3 -m http.server --directory docs/build 8000

You can then access the documentation in your browser at http://localhost:8000.

Note for docstrings: Sphinx's autodoc extension expects reStructuredText (reST) by default. The napoleon extension additionally parses Google-style section headers, so make sure to use them in docstrings (Args:, Returns:, Raises:).
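
For reference, a Google-style docstring of the kind napoleon parses might look like the following. The function itself is a made-up example and is not part of the IggyTop code base.

```python
def count_unique_epitopes(records):
    """Count the distinct epitope sequences in a list of records.

    Args:
        records: Dictionaries that each carry an ``epitope`` key.

    Returns:
        The number of distinct epitope sequences.

    Raises:
        KeyError: If a record has no ``epitope`` key.
    """
    return len({rec["epitope"] for rec in records})
```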

🐳 Docker

This repo also contains a docker compose workflow to create the example database using BioCypher and load it into a dockerised Neo4j instance automatically. To run it, simply execute

docker compose up -d --build

in the root directory of the project. The example instance contains only the TCR3d database, as it is small enough to visualize. For other database compositions, edit the create_knowledge_graph_docker.py script to your needs. The command starts a single (detached) docker container with a Neo4j instance that contains the knowledge graph built by BioCypher as the DB named docker, which you can connect to and browse at localhost:7474. Authentication is set to neo4j/neo4jpassword by default and can be modified in the docker_variables.env file.

Open http://localhost:7474 to access the Neo4j database. You can now run queries against the database. To get a visual representation of the TCR3d knowledge graph constructed by IggyTop, run the following Cypher query:

MATCH (n) RETURN n

The biocypher_docker_config.yaml file is used instead of the biocypher_config.yaml. Everything else is the same as in the local setup. The first container installs and runs the BioCypher pipeline, and the second container installs and runs Neo4j. The files created by BioCypher in the first container are copied and automatically imported into the DB in the second container.
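
The same kind of query can also be sent from a script. The sketch below builds a request for Neo4j's HTTP transaction endpoint using only the standard library. It assumes a Neo4j 4.x or 5.x server, where the endpoint is /db/<database>/tx/commit, and uses the default credentials quoted above. The function names and the database default of neo4j are assumptions, so adjust the database name to whatever your instance uses.

```python
import base64
import json
import urllib.request


def build_cypher_request(query, user="neo4j", password="neo4jpassword",
                         base_url="http://localhost:7474", database="neo4j"):
    """Build (but do not send) an HTTP request that runs one Cypher statement."""
    payload = json.dumps({"statements": [{"statement": query}]}).encode("utf-8")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"{base_url}/db/{database}/tx/commit",  # assumed Neo4j 4.x/5.x path
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


def run_cypher(query, **kwargs):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(build_cypher_request(query, **kwargs)) as resp:
        return json.load(resp)


# A LIMIT keeps the response small on larger graphs.
request = build_cypher_request("MATCH (n) RETURN n LIMIT 25")
```

Calling run_cypher("MATCH (n) RETURN n LIMIT 25") only works while the Neo4j container is running. For anything beyond quick checks, the official Neo4j Python driver is likely the better choice.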

Scirpy integration

This project helps generate AnnData versions of all the Scirpy reference databases supported by IggyTop. The AnnData objects are stored in the h5ad file format. To reproduce them, run the create_anndata script, selecting the databases of interest with the adapters_to_include variable:

uv run create_anndata.py

Note: this is a work in progress.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request or create an Issue if you discover any problems.

License

This project is licensed under the MIT License - see the LICENSE file for details.
