A challenge for predicting 3D bioprinting parameters using Graph Neural Networks.
Bioprinting is an additive manufacturing process that functions similarly to 3D printing but uses "bio-inks" (materials combined with living cells). Instead of printing plastic or metal, we print tissue-like structures layer-by-layer. This technology is at the forefront of regenerative medicine, aiming to create functional organs, skin grafts, and disease models for drug testing without animal subjects.
The most common method is Extrusion-based Bioprinting, where a syringe-like printhead pushes bio-ink through a needle. Success depends on the perfect balance between material viscosity, cell viability, and the mechanical parameters of the printer.
If you've ever cooked a complex dish, you already understand the core problem in bioprinting.
You start with ingredients (biomaterials like Gelatin, Alginate, or Fibrinogen) in specific proportions. You choose how to cook: the heat level, the pressure applied to the "piping bag," and the speed of your hand. If you get it right, the structure holds its shape. If you don't, it's a mess: either too runny, too stiff, or the "cells" (the biological garnish) simply don't survive.
Currently, these "recipes" are scattered across thousands of research papers. This challenge is about learning the recipe logic behind bioprinting using the power of Graph Machine Learning.
Predict three continuous targets from bioink formulation graphs:
- Pressure (kPa): Extrusion force
- Temperature (°C): Printing temperature
- Speed (mm/s): Print head velocity
Each formulation $i$ is a graph $G_i = (V_i, E_i, X_i)$:
- $V_i$: Biomaterials in formulation $i$
- $E_i$: Fully connected edges between all materials
- $X_i \in \mathbb{R}^{n_i \times D}$: Node feature matrix, with $n_i$ materials and feature dimension $D \approx 31$
Target: $y_i \in \mathbb{R}^3$ (pressure, temperature, speed).
For a formulation with $n_i$ materials, the adjacency matrix $A \in \{0, 1\}^{n_i \times n_i}$ has the following properties:
- Binary connectivity: $A_{ij} = 1$ for all node pairs $i, j$ (fully connected clique).
- Topology: Represents a mixture where all components potentially interact.
- Note: While the provided $A$ is binary, participants are encouraged to explore weighted adjacency strategies (e.g., based on concentration differences) that are closer to the physical reality of mixture interactions.
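As one possible starting point for the weighted adjacency idea, the sketch below turns concentration differences into edge weights. It assumes the concentration is stored in the last column of the node feature matrix (the column order is not stated here, so check scripts/build_graph.py), and the exponential kernel, the function name, and the `scale` parameter are all hypothetical choices for illustration.

```python
import numpy as np


def concentration_weighted_adjacency(X, conc_col=-1, scale=1.0):
    """Return a dense weighted adjacency built from per-node concentrations.

    Assumption: column `conc_col` of X holds the normalized concentration.
    Hypothetical kernel: w_ij = exp(-|c_i - c_j| / scale), so materials with
    similar concentrations get weights close to 1.
    """
    c = np.asarray(X, dtype=float)[:, conc_col]
    diff = np.abs(c[:, None] - c[None, :])  # pairwise |c_i - c_j|
    return np.exp(-diff / scale)


# Toy formulation: 3 materials, 3 one-hot columns plus concentration.
X_toy = np.array([
    [1.0, 0.0, 0.0, 0.10],
    [0.0, 1.0, 0.0, 0.10],
    [0.0, 0.0, 1.0, 0.60],
])
W = concentration_weighted_adjacency(X_toy)
```

The result is symmetric with ones on the diagonal, so it can replace the binary matrix wherever a dense edge-weight matrix is accepted. Other weightings (for example, products of concentrations) may suit the chemistry better.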
Files: data/public/train_graphs/graph_{id}_A.npy
Each node corresponds to one biomaterial in the formulation.
| Feature | Description | Dim |
|---|---|---|
| Material Identity | One-Hot Encoding of material type | ~30 |
| Concentration | Normalized concentration in formulation | 1 |
Files: data/public/train_graphs/graph_{id}_X.npy
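To make the feature layout concrete, here is a minimal sketch of how a node feature matrix of this kind could be assembled. It is not the code in scripts/build_graph.py: the function name, the toy vocabulary, and the choice to put concentration in the last column are assumptions made for illustration.

```python
import numpy as np


def build_node_features(components, vocabulary):
    """Stack [one-hot material identity | concentration] rows, one per material.

    Assumptions (hypothetical): `vocabulary` is an ordered list of material
    names, and concentration is appended as the last column.
    """
    index = {name: k for k, name in enumerate(vocabulary)}
    X = np.zeros((len(components), len(vocabulary) + 1))
    for row, (name, concentration) in enumerate(components):
        X[row, index[name]] = 1.0      # material identity
        X[row, -1] = concentration     # normalized concentration
    return X


vocab = ["Alginate", "Gelatin", "Collagen"]  # toy vocabulary, not the real list
X = build_node_features([("Alginate", 0.25), ("Gelatin", 0.75)], vocab)
# X has shape (2, 4): two materials, three one-hot columns plus concentration
```

With the full vocabulary of about 30 materials, the same layout gives the $D \approx 31$ columns described in the table.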
Graph-level regression targets:
- Pressure (kPa)
- Temperature (°C)
- Speed (mm/s)
Files: data/public/train_graphs/graph_{id}_y.npy (Train only)
The processed graph dataset (.npy matrices) is already generated and available in:
- data/public/train_graphs/
- data/public/test_graphs/
For transparency, the generation script is included as scripts/build_graph.py.
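A small loader for the three per-graph files might look like the sketch below. It relies only on the graph_{id}_A.npy, graph_{id}_X.npy, and graph_{id}_y.npy naming pattern given above; the function name `load_graph` is hypothetical, and the exact array shapes (in particular the shape of y) are not specified here, so inspect them after loading.

```python
from pathlib import Path

import numpy as np


def load_graph(graph_dir, graph_id, with_target=True):
    """Load one formulation graph from its graph_{id}_*.npy files.

    Returns (A, X, y); y is None when with_target is False (test graphs).
    """
    graph_dir = Path(graph_dir)
    A = np.load(graph_dir / f"graph_{graph_id}_A.npy")
    X = np.load(graph_dir / f"graph_{graph_id}_X.npy")
    y = np.load(graph_dir / f"graph_{graph_id}_y.npy") if with_target else None
    return A, X, y
```

For training graphs, call it with the train_graphs directory and a training id; for test graphs, pass `with_target=False` because no y file is provided. The baselines/gnn_utils.py loader in the repository is the reference for converting these arrays to PyG objects.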
- 423 formulations from peer-reviewed publications
- 30 biomaterials (appearing ≥5 times each)
- 303 training / 120 test samples (70/30 stratified group split)
- Real-world scientific data with natural complexity
NMAE = (1/3) × [MAE_pressure/1496 + MAE_temperature/228 + MAE_speed/90]
Lower is better. Range: 0.0 (perfect) to 1.0+ (poor).
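For local evaluation, the formula can be written out as the short sketch below. It is an independent, dependency-free reimplementation based only on the formula above; the official scoring lives in competition/metrics.py and may handle details (such as missing values) differently. The function name and the row layout (pressure, temperature, speed) are assumptions.

```python
# Normalizers taken from the NMAE formula above.
NORMALIZERS = (1496.0, 228.0, 90.0)  # pressure, temperature, speed


def nmae(y_true, y_pred):
    """Mean of the three per-target MAEs, each divided by its normalizer.

    y_true, y_pred: sequences of (pressure, temperature, speed) rows.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    n = len(y_true)
    score = 0.0
    for k, norm in enumerate(NORMALIZERS):
        mae = sum(abs(t[k] - p[k]) for t, p in zip(y_true, y_pred)) / n
        score += mae / norm
    return score / len(NORMALIZERS)


truth = [(100.0, 25.0, 5.0), (800.0, 150.0, 1.0)]
preds = [(249.6, 25.0, 5.0), (800.0, 150.0, 1.0)]
print(nmae(truth, preds))  # about 0.0167: only pressure is off, by 74.8 kPa on average
```

The normalizers appear to correspond to the observed target ranges (for example, 1500 - 4 = 1496 kPa for pressure), so each term is roughly the error as a fraction of that target's range.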
- Random Forest: NMAE = 0.060
git clone <this-repo>
cd bioink-gnn-challenge
pip install -r requirements.txt
Graph data (ready to use) is in data/public/:
- train_graphs/: .npy files: graph_{id}_A.npy, graph_{id}_X.npy, graph_{id}_y.npy
- test_graphs/: .npy files: graph_{id}_A.npy, graph_{id}_X.npy
- node_vocabulary.txt: Material index mapping
- train.csv: Original CSV (for reference)
- test_nodes.csv: Test IDs
- sample_submission.csv: Example submission format
Train on train.csv. Since there is no official validation set, you should create your own split (e.g., 80/20) from the training data to evaluate your model locally.
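A reproducible hold-out split can be as simple as the sketch below. It is a plain random split with a fixed seed; the official train/test split is described as stratified and grouped, which this sketch does not attempt to reproduce, so local scores may be optimistic. The function name and the placeholder ids are hypothetical.

```python
import random


def split_ids(graph_ids, val_fraction=0.2, seed=0):
    """Shuffle ids with a fixed seed and hold out `val_fraction` for validation."""
    ids = sorted(graph_ids)
    random.Random(seed).shuffle(ids)
    n_val = max(1, round(len(ids) * val_fraction))
    return ids[n_val:], ids[:n_val]


# Placeholder ids for illustration; substitute the real training graph ids.
train_ids, val_ids = split_ids(range(303))
```

Keeping the seed fixed makes results comparable across experiments. If several formulations come from the same publication, grouping them on one side of the split is likely to give a more honest estimate.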
Create predictions.csv for test set:
id,pressure,temperature,speed
340,150.5,25.0,5.0
341,800.0,155.0,1.2
...
399,45.0,23.0,8.5
Since PRs are public, you must encrypt your submission to keep your predictions private.
- Encrypt your CSV:
  python scripts/encrypt_submission.py predictions.csv --team YourTeamName
  # Output: submission.enc (This file is safe to share)
- Upload to GitHub: Create a folder structure with your encrypted file:
  submissions/inbox/<YourTeamName>/
  └── submission.enc
- Open Pull Request: Target the master branch. The bot will decrypt it securely, score it, and close the PR.
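Because only one submission is allowed, it may be worth sanity-checking predictions.csv before encrypting it. The sketch below checks the header, column count, duplicate ids, and non-numeric or non-finite values against the format shown above. It is an independent illustration, not the repository's competition/validation.py, and the function name is hypothetical.

```python
import csv
import io
import math

EXPECTED_HEADER = ["id", "pressure", "temperature", "speed"]


def check_predictions(csv_text):
    """Return the number of data rows, raising ValueError on format problems."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, None)
    if header != EXPECTED_HEADER:
        raise ValueError(f"unexpected header: {header}")
    seen = set()
    for line_no, row in enumerate(reader, start=2):
        if len(row) != 4:
            raise ValueError(f"line {line_no}: expected 4 columns, got {len(row)}")
        sample_id = row[0]
        if sample_id in seen:
            raise ValueError(f"line {line_no}: duplicate id {sample_id}")
        seen.add(sample_id)
        for name, value in zip(EXPECTED_HEADER[1:], row[1:]):
            # float() raises ValueError for non-numeric text
            if not math.isfinite(float(value)):
                raise ValueError(f"line {line_no}: non-finite {name}")
    return len(seen)


sample = "id,pressure,temperature,speed\n340,150.5,25.0,5.0\n341,800.0,155.0,1.2\n"
print(check_predictions(sample))  # 2
```

To check a real file, read it with `open("predictions.csv", newline="").read()` and pass the text in. Comparing the returned row count with the number of test ids is a useful extra check.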
Submission Policy (Strict)
- 🚨 One Submission Only: Each participant (GitHub user) is allowed exactly ONE submission.
- Privacy: Your submission.enc is decrypted only by the scoring server. The plaintext CSV is never stored in the repo.
- Format: Submit only submission.enc. Do NOT upload predictions.csv.
View the leaderboard:
- Static: leaderboard/leaderboard.md
- Interactive: Enable GitHub Pages → /docs/leaderboard.html
Rankings are by NMAE (ascending) - lower is better.
30 common biomaterials across categories:
- Alginates: Alginate, Alginate Methacrylated, Alginate Dialdehyde
- Gelatins: Gelatin, Gelatin Methacrylated (GelMA)
- Polymers: PCL, PLGA, PEG derivatives
- Natural: Collagen, Chitosan, Hyaluronic Acid
- Ceramics: Hydroxyapatite, β-TCP, Bioactive Glass
| Target | Min | Max | Distribution |
|---|---|---|---|
| Pressure | 4 kPa | 1500 kPa | Log-distributed, bimodal |
| Temperature | 2°C | 230°C | Bimodal (room temp vs melt) |
| Speed | 0.02 mm/s | 90 mm/s | Many near-zero values |
- Ranges converted to means: "70-80 kPa" → 75.0 kPa
- Unit standardization: All pressure in kPa, temp in °C, speed in mm/s
- Stratified split: By temperature regime (hydrogel vs thermoplastic)
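The range-to-mean rule above can be illustrated with a short parser. This is a sketch of the stated rule only ("70-80 kPa" becomes 75.0); the actual preprocessing in competition/data_utils.py may differ, and the function name and regular expressions are hypothetical. Units are ignored rather than converted, and negative values are not handled.

```python
import re

_NUMBER = r"\d+(?:\.\d+)?"
_RANGE = re.compile(rf"^\s*({_NUMBER})\s*-\s*({_NUMBER})")
_SINGLE = re.compile(rf"^\s*({_NUMBER})")


def range_to_mean(text):
    """Map '70-80 kPa' to 75.0 and '75 kPa' to 75.0 (units are ignored)."""
    match = _RANGE.match(text)
    if match:
        low, high = float(match.group(1)), float(match.group(2))
        return (low + high) / 2.0
    match = _SINGLE.match(text)
    if match:
        return float(match.group(1))
    raise ValueError(f"no numeric value found in {text!r}")


print(range_to_mean("70-80 kPa"))  # 75.0
```

Replacing a range with its midpoint discards the spread reported in the source paper, which is one reason the targets carry some inherent noise.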
bioink-gnn-challenge/
├── README.md                      # This file
├── requirements.txt               # Python dependencies
├── .gitignore                     # Excludes private data
│
├── data/
│   ├── public/                    # Visible to participants
│   │   ├── train.csv
│   │   ├── test_features.csv
│   │   ├── test_nodes.csv
│   │   ├── train_graphs/          # A, X, y matrices (npy)
│   │   ├── test_graphs/           # A, X matrices (npy)
│   │   └── node_vocabulary.txt    # Material list
│
├── scripts/
│   └── build_graph.py             # Script used to generate graphs
│
├── competition/                   # Evaluation code
│   ├── data_utils.py              # Parsing & preprocessing
│   ├── metrics.py                 # NMAE calculation
│   ├── validation.py              # Format checking
│   ├── evaluate.py                # Scoring script
│   └── render_leaderboard.py      # Generate markdown
│
├── baselines/                     # Reference implementations
│   ├── README.md
│   ├── gnn_utils.py               # Graph data loader (npy → PyG)
│   ├── mlp_baseline.py            # MLP (ignores graph structure)
│   ├── gcn_baseline.py            # Graph Convolutional Network
│   ├── gat_baseline.py            # Graph Attention Network
│   └── random_forest_baseline.py  # Tabular baseline
│
├── submissions/
│   └── inbox/                     # PR submissions go here
│
├── leaderboard/
│   ├── leaderboard.csv            # Authoritative scores
│   └── leaderboard.md             # Auto-generated table
│
├── docs/                          # GitHub Pages
│   ├── leaderboard.html
│   ├── leaderboard.css
│   └── leaderboard.js
│
└── .github/workflows/
    ├── score_submission.yml       # Auto-score PRs
    └── update_leaderboard.yml     # Update on merge
The raw data used in this challenge is available at https://cect.umd.edu/database
MIT License - See LICENSE for details.
- Issues: Use GitHub Issues for bugs/questions
- Discussions: Use GitHub Discussions for general chat
- Email: [vineet10338@gmail.com] for private inquiries
Good luck!

