GraPhens is a phenotype graph construction library and training pipeline. It builds graphs from HPO phenotype sets and supports dataset generation and model training on a Keras+JAX stack, with fixed-shape NPZ graph data consumed through Keras data sequences.
```python
from graphens import GraPhens

# Create a graph from phenotypes
graphens = GraPhens()
graph = graphens.create_graph_from_phenotypes(["HP:0001250", "HP:0001251"])  # Seizure phenotypes

# Export to a format you need
graph_json = graphens.export_graph(graph, format="json")
```

```python
# Search for phenotypes by keyword
seizure_phenotypes = graphens.phenotype_lookup("seizure")
for phenotype in seizure_phenotypes[:3]:
    print(f"{phenotype.id} - {phenotype.name}")
```

```python
# Chain methods for clear, readable configuration
graphens = (GraPhens()
            .with_embedding_model("openai", "text-embedding-3-small")
            .with_augmentation(include_ancestors=True)
            .with_visualization(enabled=True))

# Create and visualize your graph
graph = graphens.create_graph_from_phenotypes(phenotype_ids)
graphens.visualize(graph=graph, phenotypes=graph.metadata["phenotypes"])
```

```python
# Load biomedical embeddings directly from file
graphens = GraPhens().with_lookup_embeddings("data/embeddings/hpo_biobert.pkl")

# Create graph with domain-specific embeddings
graph = graphens.create_graph_from_phenotypes(phenotype_ids)
```

```python
# NetworkX
nx_graph = graphens.export_graph(graph, format="networkx")

# JSON
graphens.export_graph(graph, format="json", output_path="graph.json")
```

```python
patient_data = {
    "patient_1": ["HP:0001250", "HP:0002066"],
    "patient_2": ["HP:0000407", "HP:0001263"],
    # ... more patients
}

# Create graphs for multiple patients
patient_graphs = graphens.create_graphs_from_multiple_patients(patient_data)
```

```bash
pip install graphens
```

Current training dataset path:
Simulation JSON -> NPZ shards -> Keras/JAX loader
- Dataset builder: `src/simulation/phenotype_simulation/create_hpo_dataset.py` — two-pass process: collect `max_nodes`/`max_edges` statistics, then write padded NPZ shards.
- NPZ shard writer: `src/simulation/phenotype_simulation/jax_npz_writer.py` — stores fixed-shape arrays and masks: `x`, `node_mask`, `edge_index`, `edge_mask`, `y`.
- Dataset loaders: `training/datasets/jax_npz_graph_dataset.py`, `training/datasets/keras_npz_sequence.py`
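The two-pass pad-and-shard flow described above can be sketched with plain NumPy. This is a schematic illustration of the idea, not the repo's actual implementation; only the array names (`x`, `node_mask`, `edge_index`, `edge_mask`, `y`) come from the documented schema, everything else is a toy assumption:

```python
import numpy as np

# Toy graphs of varying size: node features (n, feat) and COO edge indices (2, e).
graphs = [
    {"x": np.ones((3, 4), dtype=np.float32),
     "edge_index": np.array([[0, 1], [1, 2]], dtype=np.int32).T, "y": 0},
    {"x": np.ones((5, 4), dtype=np.float32),
     "edge_index": np.array([[0, 2], [2, 3], [3, 4]], dtype=np.int32).T, "y": 1},
]

# Pass 1: scan all graphs to find the padding targets.
max_nodes = max(g["x"].shape[0] for g in graphs)
max_edges = max(g["edge_index"].shape[1] for g in graphs)

# Pass 2: pad every graph to (max_nodes, max_edges) and record validity masks.
def pad_graph(g):
    n, e = g["x"].shape[0], g["edge_index"].shape[1]
    x = np.zeros((max_nodes, g["x"].shape[1]), dtype=np.float32)
    x[:n] = g["x"]
    node_mask = np.zeros(max_nodes, dtype=bool)
    node_mask[:n] = True
    edge_index = np.zeros((2, max_edges), dtype=np.int32)
    edge_index[:, :e] = g["edge_index"]
    edge_mask = np.zeros(max_edges, dtype=bool)
    edge_mask[:e] = True
    return x, node_mask, edge_index, edge_mask

padded = [pad_graph(g) for g in graphs]
shard = {
    "x": np.stack([p[0] for p in padded]),           # (num_graphs, max_nodes, feat)
    "node_mask": np.stack([p[1] for p in padded]),   # (num_graphs, max_nodes)
    "edge_index": np.stack([p[2] for p in padded]),  # (num_graphs, 2, max_edges)
    "edge_mask": np.stack([p[3] for p in padded]),   # (num_graphs, max_edges)
    "y": np.array([g["y"] for g in graphs]),
}
# np.savez("shard_000.npz", **shard) would persist the padded shard to disk.
```

Because every graph shares the same padded shape after pass 2, all shards stack into dense arrays with no ragged dimensions.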
Example command:

```bash
python -m src.simulation.phenotype_simulation.create_hpo_dataset --input <simulated_json> --output-dir <dataset_dir> --shard-size 2048 --create-splits
```
Validate runtime stack:

```bash
KERAS_BACKEND=jax python scripts/validate_jax_stack.py
```
See docs/dataset_keras_jax.md for schema, workflow, and dependencies.
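On the loader side, once shards are fixed-shape, batching reduces to slicing the leading (graph) axis. A minimal NumPy-only illustration of that logic — the actual `Sequence` implementations live in the loader files listed above, and all sizes here are made-up toy values:

```python
import numpy as np

# An already-padded shard in the documented schema: 4 graphs, max_nodes=6, max_edges=8.
num_graphs, max_nodes, max_edges, feat = 4, 6, 8, 4
shard = {
    "x": np.zeros((num_graphs, max_nodes, feat), dtype=np.float32),
    "node_mask": np.ones((num_graphs, max_nodes), dtype=bool),
    "edge_index": np.zeros((num_graphs, 2, max_edges), dtype=np.int32),
    "edge_mask": np.ones((num_graphs, max_edges), dtype=bool),
    "y": np.arange(num_graphs, dtype=np.int32),
}

def iter_batches(shard, batch_size):
    """Yield fixed-shape batches by slicing the leading graph axis."""
    n = shard["y"].shape[0]
    for start in range(0, n, batch_size):
        yield {k: v[start:start + batch_size] for k, v in shard.items()}

batches = list(iter_batches(shard, batch_size=2))
# All batches share node/edge dimensions, so a compiled train step can be reused.
```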
- Graph samples are pre-sharded as fixed-shape NPZ tensors and consumed via Keras `Sequence` for `model.fit`.
- Batches use static padded tensors with explicit masks (`node_mask`, `edge_mask`) so JAX/XLA can compile stable programs.
- Set backend to JAX: `export KERAS_BACKEND=jax`.
- Use larger `--shard-size` and training `batch_size` when memory allows to improve device utilization.
- Keep graph tensors fixed-shape (already done via NPZ + masks) so XLA can compile stable TPU programs.
- Avoid frequent shape changes between runs; keep `max_nodes`, `max_edges`, and model config stable for better compile reuse.
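The masks matter because padded entries must not contribute to any aggregate. As an illustration only (not the repo's model code), here is a NumPy sketch of mask-aware mean pooling over padded nodes:

```python
import numpy as np

# Batch of 2 graphs padded to max_nodes=4, with 3 features per node.
x = np.arange(24, dtype=np.float32).reshape(2, 4, 3)
node_mask = np.array([[1, 1, 0, 0],    # graph 0 has 2 real nodes
                      [1, 1, 1, 0]],   # graph 1 has 3 real nodes
                     dtype=np.float32)

# Zero out padded nodes, then divide by the true node count per graph.
masked_sum = (x * node_mask[:, :, None]).sum(axis=1)
counts = node_mask.sum(axis=1, keepdims=True)
graph_mean = masked_sum / counts  # (2, 3); padding never leaks into the result
```

The output shape is independent of how many real nodes each graph has, which is exactly what lets XLA compile one program and reuse it across batches.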
GraPhens prioritizes clear APIs, reproducible preprocessing, and explicit training data contracts across the graph construction and model training pipeline.
```python
# Examples in the repo show you everything you need
from examples import quick_start, custom_embeddings, visualization_demo

# Methods have helpful docstrings
help(GraPhens.create_graph_from_phenotypes)
```

```python
# Custom embedding models
graphens.with_embedding_model("tfidf", max_features=512)

# Save configurations for reproducibility
graphens.save_config("my_config.json")
loaded_graphens = GraPhens().with_config_from_file("my_config.json")
```

See the examples directory for end-to-end usage examples.