VinitSingroha/Mix2Print

Mix2Print: Learning Material Interaction Physics for identifying parameters of 3D Bioprinting

A challenge for predicting 3D bioprinting parameters using Graph Neural Networks.

License: MIT


🧪 What is Bioprinting?

Bioprinting is an additive manufacturing process that functions similarly to 3D printing but uses "bio-inks": materials combined with living cells. Instead of printing plastic or metal, we print tissue-like structures layer by layer. This technology is at the forefront of regenerative medicine, aiming to create functional organs, skin grafts, and disease models for drug testing without animal subjects.

The most common method is Extrusion-based Bioprinting, where a syringe-like printhead pushes bio-ink through a needle. Success depends on the perfect balance between material viscosity, cell viability, and the mechanical parameters of the printer.

[Figure: Bioprinting Flow]

🍳 Think of Bioprinting Like Cooking (Seriously)

If you’ve ever cooked a complex dish, you already understand the core problem in bioprinting.

You start with ingredients (biomaterials like Gelatin, Alginate, or Fibrinogen) in specific proportions. You choose how to cook: the heat level, the pressure applied to the "piping bag," and the speed of your hand. If you get it right, the structure holds its shape. If you don't, it’s a messβ€”either too runny, too stiff, or the "cells" (the biological garnish) simply don't survive.

Currently, these "recipes" are scattered across thousands of research papers. This challenge is about learning the recipe logic behind bioprinting using the power of Graph Machine Learning.


📋 Challenge Overview

Task

Predict three continuous targets from bioink formulation graphs:

  • Pressure (kPa): Extrusion force
  • Temperature (°C): Printing temperature
  • Speed (mm/s): Print head velocity

📐 Graph Specification

Graph Definition

Each formulation is a graph $G_i = (V_i, E_i, X_i)$ where:

  • $V_i$: Biomaterials in formulation $i$
  • $E_i$: Fully connected edges between all materials
  • $X_i \in R^{n_i \times D}$: Node feature matrix (Dimension $D \approx 31$)

Target $y_i \in R^3$: (pressure, temperature, speed)
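Because each formulation has a different number of materials $n_i$ while the target is always a 3-vector, a model needs a readout that does not depend on node count or node order. The following is a minimal NumPy sketch of that idea: one mean-aggregation message-passing step followed by mean pooling. The function name `graph_forward`, the hidden size, and the random weights are assumptions for illustration only; this is not one of the repository baselines.

```python
import numpy as np

def graph_forward(A, X, W_hidden, W_out):
    """One mean-aggregation layer, then mean pooling to a graph-level output.

    Assumes every node has at least one neighbour (true for the fully
    connected graphs described above), so the degree is never zero.
    """
    deg = A.sum(axis=1, keepdims=True)         # (n, 1) node degrees
    messages = (A @ X) / deg                   # average features over neighbours
    H = np.maximum(messages @ W_hidden, 0.0)   # linear map + ReLU
    g = H.mean(axis=0)                         # permutation-invariant readout
    return g @ W_out                           # (pressure, temperature, speed)

rng = np.random.default_rng(0)
D, hidden = 31, 16                             # hidden size is an arbitrary choice
W_hidden = rng.normal(size=(D, hidden))
W_out = rng.normal(size=(hidden, 3))

# Graphs with different node counts map to outputs of the same shape.
for n in (2, 4):
    A = np.ones((n, n))
    X = rng.random((n, D))
    print(n, graph_forward(A, X, W_hidden, W_out).shape)
```

With an untrained model the numbers are meaningless; the point is only that the output shape is `(3,)` regardless of $n_i$.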

Graph Data Structure

1️⃣ Adjacency Matrix (Mandatory)

For formulation $i$ with $n_i$ materials: $A_i \in R^{n_i \times n_i}$

  • Binary connectivity: $(A_i)_{jk} = 1$ for all node pairs $j, k$ (Fully connected clique).
  • Topology: Represents a mixture where all components potentially interact.
  • Note: While the provided $A$ is binary, participants are encouraged to explore weighted adjacency strategies (e.g., based on concentration differences) closer to the physical reality of mixture interactions.

Files: data/public/train_graphs/graph_{id}_A.npy
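As one possible way to follow the note on weighted adjacency, the sketch below turns per-node concentrations into edge weights with a Gaussian kernel on the concentration difference. The kernel, the `sigma` parameter, and the function name `weighted_adjacency` are illustrative assumptions, not something the challenge prescribes.

```python
import numpy as np

def weighted_adjacency(concentrations, sigma=0.5):
    """Dense weighted adjacency for a fully connected mixture graph.

    Edge weight = exp(-(c_j - c_k)^2 / (2 * sigma^2)), so materials with
    similar concentrations are connected more strongly. Both the kernel
    and `sigma` are illustrative choices.
    """
    c = np.asarray(concentrations, dtype=float)
    diff = c[:, None] - c[None, :]             # pairwise concentration differences
    return np.exp(-(diff ** 2) / (2.0 * sigma ** 2))

# Hypothetical 3-material formulation with normalized concentrations.
W = weighted_adjacency([0.1, 0.1, 0.8])
print(np.round(W, 3))
```

The result is symmetric with ones on the diagonal, so it can stand in for the binary $A$ wherever a dense matrix is expected.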

2️⃣ Node Feature Matrix X

Each node corresponds to one biomaterial in the formulation. $X_i$ shape: $(n_i \times D)$ where $D = N_{materials} + 1$.

| Feature | Description | Dim |
|---|---|---|
| Material Identity | One-Hot Encoding of material type | ~30 |
| Concentration | Normalized concentration in formulation | 1 |

Files: data/public/train_graphs/graph_{id}_X.npy
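To make the layout of $X$ concrete, here is a small sketch that builds such a matrix from material indices and concentrations. It assumes the concentration sits in the last column and that indices follow node_vocabulary.txt; neither is confirmed here, so check scripts/build_graph.py before relying on the column order. `build_node_features` is a hypothetical helper name.

```python
import numpy as np

def build_node_features(material_indices, concentrations, n_materials=30):
    """Stack a one-hot material identity with a concentration column."""
    n = len(material_indices)
    X = np.zeros((n, n_materials + 1), dtype=np.float32)
    X[np.arange(n), material_indices] = 1.0    # one-hot material identity
    X[:, -1] = concentrations                  # ASSUMED: concentration is the last column
    return X

# Hypothetical two-material formulation (vocabulary indices 0 and 4).
X = build_node_features([0, 4], [0.25, 0.75])
print(X.shape)
```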

3️⃣ Targets

Graph-level regression targets:

  • Pressure (kPa)
  • Temperature (°C)
  • Speed (mm/s)

Files: data/public/train_graphs/graph_{id}_y.npy (Train only)
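The three files for one formulation can be read with `numpy.load`. The sketch below follows the file naming above; the helper name `load_graph` is hypothetical, and `graph_id` is inserted into the file name verbatim because it is not stated whether ids are zero-padded.

```python
from pathlib import Path

import numpy as np

def load_graph(graph_dir, graph_id, with_target=True):
    """Load (A, X, y) for one formulation; y is None when with_target=False."""
    graph_dir = Path(graph_dir)
    A = np.load(graph_dir / f"graph_{graph_id}_A.npy")
    X = np.load(graph_dir / f"graph_{graph_id}_X.npy")
    # Test graphs ship without a y file, so make the target optional.
    y = np.load(graph_dir / f"graph_{graph_id}_y.npy") if with_target else None
    n = X.shape[0]
    if A.shape != (n, n):
        raise ValueError(f"A has shape {A.shape}, expected {(n, n)}")
    return A, X, y
```

For training graphs this would be called as `load_graph("data/public/train_graphs", some_id)`, and for test graphs with `with_target=False`.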

πŸ“‚ Dataset Provided

The processed graph dataset (.npy matrices) is already generated and available in:

  • data/public/train_graphs/
  • data/public/test_graphs/

For transparency, the generation script is included as scripts/build_graph.py.

Dataset

  • 423 formulations from peer-reviewed publications
  • 30 biomaterials (appearing ≥5 times each)
  • 303 training / 120 test samples (70/30 stratified group split)
  • Real-world scientific data with natural complexity

Evaluation Metric

NMAE = (1/3) × [MAE_pressure/1496 + MAE_temperature/228 + MAE_speed/90]

Lower is better. Range: 0.0 (perfect) to 1.0+ (poor).
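For local validation, the formula can be reproduced in a few lines. This is a sketch written from the formula above, not a copy of competition/metrics.py, so treat the official script as authoritative. The normalizers appear to match the target ranges listed under Target Distributions (for example 1500 - 4 = 1496 for pressure).

```python
import numpy as np

# Normalizers from the formula above, in (pressure, temperature, speed) order.
NORMALIZERS = np.array([1496.0, 228.0, 90.0])

def nmae(y_true, y_pred):
    """Mean of the per-target MAEs, each divided by its normalizer."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mae = np.abs(y_true - y_pred).mean(axis=0)   # one MAE per target column
    return float((mae / NORMALIZERS).mean())

# Hypothetical values: a perfect prediction scores 0.0.
y_true = [[100.0, 25.0, 5.0], [800.0, 200.0, 1.0]]
print(nmae(y_true, y_true))
```

Being off by the full normalizer on every target gives exactly 1.0, which is why scores above 1.0 are possible for very poor predictions.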

Baseline Performance

  • Random Forest: NMAE = 0.060

πŸš€ Quick Start

1. Get the Data

git clone <this-repo>
cd bioink-gnn-challenge
pip install -r requirements.txt

Graph data (ready to use) is in data/public/:

  • train_graphs/ - .npy files: graph_{id}_A.npy, graph_{id}_X.npy, graph_{id}_y.npy
  • test_graphs/ - .npy files: graph_{id}_A.npy, graph_{id}_X.npy
  • node_vocabulary.txt - Material index mapping
  • train.csv - Original CSV (for reference)
  • test_nodes.csv - Test IDs
  • sample_submission.csv - Example submission format

2. Train Your Model

Train on the training data (train.csv, or the corresponding graphs in train_graphs/). Since there is no official validation set, create your own split (e.g., 80/20) from the training data to evaluate your model locally.
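A reproducible local split can be made with the standard library alone. The sketch below shuffles ids with a fixed seed and holds out 20%; the id range `range(303)` and the helper name `split_ids` are placeholders, so substitute the real training ids. Note that the official train/test split is described as a stratified group split, so a plain random split like this one may give optimistic validation scores.

```python
import random

def split_ids(graph_ids, val_fraction=0.2, seed=42):
    """Return (train_ids, val_ids) from a seeded shuffle of the given ids."""
    ids = sorted(graph_ids)
    random.Random(seed).shuffle(ids)           # local RNG, global state untouched
    n_val = max(1, round(len(ids) * val_fraction))
    return ids[n_val:], ids[:n_val]

# Placeholder ids; replace with the ids actually present in train_graphs/.
train_ids, val_ids = split_ids(range(303))
print(len(train_ids), len(val_ids))
```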

3. Generate Predictions

Create predictions.csv for the test set:

id,pressure,temperature,speed
340,150.5,25.0,5.0
341,800.0,155.0,1.2
...
399,45.0,23.0,8.5
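A file in this format can be written with Python's `csv` module. In the sketch below, `write_predictions` is a hypothetical helper and the values are the example rows from above; it writes to an in-memory buffer so the output can be inspected before saving.

```python
import csv
import io

COLUMNS = ["id", "pressure", "temperature", "speed"]

def write_predictions(handle, predictions):
    """Write {graph_id: (pressure, temperature, speed)} as CSV rows sorted by id."""
    writer = csv.writer(handle)
    writer.writerow(COLUMNS)
    for graph_id in sorted(predictions):
        pressure, temperature, speed = predictions[graph_id]
        writer.writerow([graph_id, pressure, temperature, speed])

buffer = io.StringIO()
write_predictions(buffer, {341: (800.0, 155.0, 1.2), 340: (150.5, 25.0, 5.0)})
print(buffer.getvalue())
```

To produce the actual file, pass `open("predictions.csv", "w", newline="")` as the handle instead of the buffer.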

4. Submit (Secure)

Since PRs are public, you must encrypt your submission to keep your predictions private.

  1. Encrypt your CSV:

    python scripts/encrypt_submission.py predictions.csv --team YourTeamName
    # Output: submission.enc (This file is safe to share)
  2. Upload to GitHub: Create a folder structure with your encrypted file:

    submissions/inbox/<YourTeamName>/
    └── submission.enc
    
  3. Open Pull Request: Target the master branch. The bot will decrypt it securely, score it, and close the PR.

Submission Policy (Strict)

  • 🚨 One Submission Only: Each participant (GitHub user) is allowed exactly ONE submission.
  • Privacy: Your submission.enc is decrypted only by the scoring server. The plaintext CSV is never stored in the repo.
  • Format: Submit only submission.enc. Do NOT upload predictions.csv.

📊 Leaderboard

View the leaderboard:

Rankings are by NMAE (ascending) - lower is better.


🔬 Data Details

Bioink Components

30 common biomaterials across categories:

  • Alginates: Alginate, Alginate Methacrylated, Alginate Dialdehyde
  • Gelatins: Gelatin, Gelatin Methacrylated (GelMA)
  • Polymers: PCL, PLGA, PEG derivatives
  • Natural: Collagen, Chitosan, Hyaluronic Acid
  • Ceramics: Hydroxyapatite, β-TCP, Bioactive Glass

Target Distributions

| Target | Min | Max | Distribution |
|---|---|---|---|
| Pressure | 4 kPa | 1500 kPa | Log-distributed, bimodal |
| Temperature | 2°C | 230°C | Bimodal (room temp vs melt) |
| Speed | 0.02 mm/s | 90 mm/s | Many near-zero values |

Data Preprocessing

  • Ranges converted to means: "70-80 kPa" → 75.0 kPa
  • Unit standardization: All pressure in kPa, temp in °C, speed in mm/s
  • Stratified split: By temperature regime (hydrogel vs thermoplastic)
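The range-to-mean step can be illustrated with a small parser. This is a sketch of the idea only, not the code in competition/data_utils.py: the helper name `parse_value` and the regular expressions are assumptions, and negative values or unit conversion are deliberately not handled.

```python
import re

_NUMBER = r"\d+(?:\.\d+)?"
_RANGE = re.compile(rf"^\s*({_NUMBER})\s*-\s*({_NUMBER})")
_SINGLE = re.compile(rf"^\s*({_NUMBER})")

def parse_value(text):
    """Return the midpoint of 'low-high', or the number itself for a single value."""
    match = _RANGE.match(text)
    if match:
        low, high = float(match.group(1)), float(match.group(2))
        return (low + high) / 2.0
    match = _SINGLE.match(text)
    if match:
        return float(match.group(1))
    raise ValueError(f"no numeric value found in {text!r}")

print(parse_value("70-80 kPa"))  # 75.0, matching the example above
```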

🏗️ Repository Structure

bioink-gnn-challenge/
├── README.md                    # This file
├── requirements.txt             # Python dependencies
├── .gitignore                   # Excludes private data
│
├── data/
│   ├── public/                  # Visible to participants
│   │   ├── train.csv
│   │   ├── test_features.csv
│   │   ├── test_nodes.csv
│   │   ├── train_graphs/        # A, X, y matrices (npy)
│   │   ├── test_graphs/         # A, X matrices (npy)
│   │   └── node_vocabulary.txt  # Material list
│
├── scripts/
│   └── build_graph.py          # Script used to generate graphs
│
├── competition/                 # Evaluation code
│   ├── data_utils.py           # Parsing & preprocessing
│   ├── metrics.py              # NMAE calculation
│   ├── validation.py           # Format checking
│   ├── evaluate.py             # Scoring script
│   └── render_leaderboard.py   # Generate markdown
│
├── baselines/                   # Reference implementations
│   ├── README.md
│   ├── gnn_utils.py            # Graph data loader (npy → PyG)
│   ├── mlp_baseline.py         # MLP (ignores graph structure)
│   ├── gcn_baseline.py         # Graph Convolutional Network
│   ├── gat_baseline.py         # Graph Attention Network
│   └── random_forest_baseline.py # Tabular baseline
│
├── submissions/
│   └── inbox/                   # PR submissions go here
│
├── leaderboard/
│   ├── leaderboard.csv         # Authoritative scores
│   └── leaderboard.md          # Auto-generated table
│
├── docs/                        # GitHub Pages
│   ├── leaderboard.html
│   ├── leaderboard.css
│   └── leaderboard.js
│
└── .github/workflows/
    ├── score_submission.yml     # Auto-score PRs
    └── update_leaderboard.yml   # Update on merge

📖 Dataset link

The raw data used in this challenge is available at https://cect.umd.edu/database


📄 License

MIT License - See LICENSE for details.


🙋 Support

  • Issues: Use GitHub Issues for bugs/questions
  • Discussions: Use GitHub Discussions for general chat
  • Email: vineet10338@gmail.com for private inquiries

Good luck! 🚀
