\begin{algorithm}[H]
\caption{Computation of Ordinary Persistence Image (Ord PI)}
\label{alg:ord_pi}
\begin{algorithmic}[1]
\REQUIRE Persistence Diagram $D = \{ (b_i, d_i) \}_{i \in I}$, Weight function $w: \mathbb{R}^2 \to \mathbb{R}$, Gaussian kernel parameters $\sigma$, Grid resolution $R$
\ENSURE Persistence Image Vector $V_{PI}$
\STATE Define transformation
\STATE Define surface function
\STATE
\RETURN Flatten(
\end{algorithmic}
\end{algorithm}
\begin{algorithm}[H]
\caption{Computation of Sixpack Features}
\label{alg:sixpack}
\begin{algorithmic}[1]
\REQUIRE Point set $P = \{ (x_k, y_k, c_k) \}_{k=1}^N$ (spatial coordinates and scalar value $c_k$), Grid resolution $R$, Correlation range $r_{max}$
\ENSURE Feature Vector
\STATE Let
\STATE
\STATE Calculate Nearest Neighbor Distance Distribution (NND) for centroids of clusters in
\STATE
\RETURN
\end{algorithmic}
\end{algorithm}
\begin{algorithm}[H]
\caption{Computation of Mixup Barcodes via Canonical Matching (Wagner et al. 2024)}
\label{alg:mixup_barcode}
\begin{algorithmic}[1]
\REQUIRE Point clouds
\STATE Construct Vietoris-Rips filtrations
\STATE
\FOR{each bar $\gamma = [b_{im}, d_{im}] \in B_{im}$}
\STATE
\ENDFOR
\RETURN
\end{algorithmic}
\end{algorithm}
The experimental framework is organized into three main stages, implemented across separate notebooks:
Notebook: Original Notebooks/Compute OP Features.ipynb
This stage calculates statistical mechanics descriptors based on the spatial distribution of cell types.
- Input: Raw simulation data (Pos_*.dat, Types_*.dat)
- Methodology:
  - Angular Distribution Order Parameters (OPs):
    - $\Theta(\theta)$: Overall angular distribution
    - $\Theta_B(\theta), \Theta_O(\theta)$: Type-specific angular distributions (Blue/Orange)
    - Measures the directional alignment of neighbors.
  - Radial Distribution Function (RDF):
    - $R(r)$: Overall pair correlation function
    - $R_B(r), R_O(r)$: Type-specific pair correlations
    - Measures the probability of finding a neighbor at distance $r$.
- Output: Vector of concatenated OP values.
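
The angular descriptor above can be illustrated with a minimal sketch. It assumes that $\Theta(\theta)$ is estimated as a normalized histogram of bond angles between each cell and its neighbors within a cutoff radius. The function name, the `cutoff` parameter, and the bin count are hypothetical, and the notebook's actual definition may differ.

```python
import math

def angular_histogram(points, cutoff, n_bins=12):
    """Normalized histogram of bond angles in [0, 2*pi) between neighbors.

    Assumption: two cells are neighbors when they are closer than `cutoff`.
    """
    counts = [0] * n_bins
    for i, (xi, yi) in enumerate(points):
        for j, (xj, yj) in enumerate(points):
            if i == j:
                continue
            dx, dy = xj - xi, yj - yi
            if math.hypot(dx, dy) <= cutoff:
                theta = math.atan2(dy, dx) % (2 * math.pi)
                # min() guards against theta rounding up to exactly 2*pi
                k = min(int(theta / (2 * math.pi) * n_bins), n_bins - 1)
                counts[k] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * n_bins

# Two cells on a diagonal: one bond at 45 degrees, its reverse at 225 degrees.
pair = [(0.0, 0.0), (1.0, 1.0)]
hist = angular_histogram(pair, cutoff=2.0, n_bins=4)
```

A type-specific distribution such as $\Theta_B(\theta)$ would follow from passing only the positions of one cell type, again under the same assumed definition.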
Notebook: Original Notebooks/Compute TDA Features.ipynb
This stage computes topological descriptors using Persistent Homology.
- Input: Raw simulation data
- Methodology:
  - Vietoris-Rips Filtration: Constructed on point clouds of:
    - Green cells (Type 2)
    - Red cells (Type 1)
    - All cells combined
  - Ordinary Persistence Images (PIs):
    - Computes PIs for Homology dimensions $H_0, H_1, H_2$.
    - Parameters:
      - Weight Function: Linear Persistence ($w = p$)
      - Kernel: Gaussian ($\sigma=0.05$)
      - Resolution: Defined by pixel_size and max_eps
- Output: npy files containing stacked Persistence Image vectors for each dimension and sub-population.
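
To make the filtration step concrete, here is a minimal sketch of the $H_0$ part only. In a Vietoris-Rips filtration every point is born at scale 0, and a connected component dies at the edge length that first merges it into another component, so the finite $H_0$ deaths are the edge lengths of a minimum spanning tree. The sketch assumes the convention that an edge enters the filtration at its full length (some libraries use half the length). The function name is hypothetical, and the notebook presumably relies on a dedicated TDA library rather than this code.

```python
import math

def h0_persistence(points):
    """Finite H0 bars of a Vietoris-Rips filtration as (birth, death) pairs."""
    n = len(points)
    # All pairwise edges, sorted by length (Kruskal's algorithm).
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i in range(n)
        for j in range(i + 1, n)
    )
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    bars = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # this edge merges two components, so one of them dies
            parent[ri] = rj
            bars.append((0.0, length))
    return bars  # n - 1 finite bars; the single infinite bar is omitted

cloud = [(0.0, 0.0), (1.0, 0.0), (4.0, 0.0)]
bars = h0_persistence(cloud)
```

Higher dimensions ($H_1$, $H_2$) need a full boundary-matrix reduction and are not covered by this shortcut.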
Notebook: Original Notebooks/Embedding and Classification.ipynb
This stage combines features and evaluates their discriminative power.
- Input: Extracted OP vectors and TDA vectors.
- Methodology:
  - Dimensionality Reduction:
    - PCA (Principal Component Analysis)
    - t-SNE (t-Distributed Stochastic Neighbor Embedding) for 2D visualization.
  - Classification:
    - Train classifiers (e.g., SVM, Random Forest) on the feature vectors.
    - Compare accuracy between OP features and TDA features.
The experimental framework is organized into three stages: order-parameter feature extraction, topological feature extraction (TDA), and comparative evaluation using supervised learning.
Notebook: Original Notebooks/Compute OP Features.ipynb
This stage quantifies spatial ordering using statistical mechanics descriptors derived from particle positions and types.
- Input: Raw simulation data (Pos_*.dat, Types_*.dat) representing cell centers and types (Red/Type 1, Green/Type 2).
- Methodology:
  - Angular Distribution Order Parameters ($\Theta$):
    - Captures the directional alignment of neighboring cells.
    - Computed for the overall population ($\Theta(\theta)$) and type-specific subpopulations ($\Theta_B(\theta)$ for Blue/Green, $\Theta_O(\theta)$ for Orange/Red).
  - Radial Distribution Function ($R(r)$):
    - Measures the probability of finding a particle at a distance $r$ from a reference particle, normalized by ideal gas density.
    - Homotypic RDFs: $R_B(r)$ (Green-Green), $R_O(r)$ (Red-Red).
    - Heterotypic RDFs: $R(r)$ (All-All), potentially $R_{BO}(r)$ (Cross-correlations).
- Output: A high-dimensional feature vector concatenated from discretized $\Theta(\theta)$ and $R(r)$ distributions.
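
The ideal-gas normalization of $R(r)$ can be sketched as follows. This minimal 2D example assumes a known domain area and applies no periodic boundaries or edge correction. The function name and the `area` parameter are hypothetical and are not taken from the notebook.

```python
import math

def radial_distribution(points, r_max, n_bins, area):
    """2D pair correlation g(r), normalized by the ideal-gas expectation.

    Assumption: no edge correction, so values are biased low at large r
    for cells near the domain border.
    """
    n = len(points)
    dr = r_max / n_bins
    counts = [0] * n_bins
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(points[i], points[j])
            if d < r_max:
                k = min(int(d / dr), n_bins - 1)
                counts[k] += 2  # count the pair once from each end
    density = n / area
    g = []
    for k, c in enumerate(counts):
        r_in, r_out = k * dr, (k + 1) * dr
        shell_area = math.pi * (r_out ** 2 - r_in ** 2)
        # Expected neighbor count in this shell for an ideal gas, summed over n particles.
        g.append(c / (n * density * shell_area))
    return g

two_cells = [(0.0, 0.0), (1.5, 0.0)]
g = radial_distribution(two_cells, r_max=2.0, n_bins=2, area=10.0)
```

Under this sketch a homotypic curve such as $R_B(r)$ would come from passing only the positions of one cell type. A cross-correlation would need a two-population variant, which is not shown.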
Notebook: Original Notebooks/Compute TDA Features.ipynb
This stage extracts robust topological features using Persistent Homology and vectorizes them into Persistence Images (PIs).
- Input: Point clouds of Green cells, Red cells, and All cells.
- Methodology:
  - Vietoris-Rips Filtration:
    - Constructed independently for three point cloud sets: Green (Type 2), Red (Type 1), and Combined.
  - Persistent Homology Calculation:
    - Computes persistence diagrams $D_k$ for homology dimensions $k \in \{0, 1, 2\}$.
    - $H_0$: Connected components.
    - $H_1$: Loops/Cycles.
    - $H_2$: Voids/Cavities.
  - Persistence Image (PI) Vectorization:
    - Converts persistence diagrams into vector-space compatible images.
    - Metric Settings:
      - Weight Function: Linear persistence weighting ($w(b, p) = p$) to emphasize high-persistence features.
      - Kernel: Gaussian kernel with $\sigma = 0.05$.
      - Resolution: Pixel size $p_{res} = 0.1$.
    - Filtration Ranges:
      - Dimension 0: Birth $[0, 1]$, Persistence $[0, 10]$.
      - Dimension 1 & 2: Birth $[0, 10]$, Persistence $[0, 5]$ (Max $\epsilon = 10$).
- Output: Stacked vectors of Persistence Images for $H_0, H_1, H_2$ across all subpopulations.
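
The vectorization settings listed above can be illustrated with a minimal pure-Python sketch. It maps each (birth, death) pair to (birth, persistence), weights it by $w(b, p) = p$, and integrates an isotropic Gaussian with $\sigma = 0.05$ over pixels of size 0.1. The function names are hypothetical, and the notebook most likely uses a library implementation whose conventions (for example, kernel normalization) may differ.

```python
import math

def gauss_cdf(x, mu, sigma):
    """Cumulative distribution of a 1D Gaussian."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def persistence_image(diagram, birth_range, pers_range, pixel_size=0.1, sigma=0.05):
    """Flattened persistence image with linear weight w(b, p) = p.

    The Gaussian mass is integrated exactly over each pixel; an isotropic
    Gaussian factorizes into a product of two 1D integrals.
    """
    n_cols = round((birth_range[1] - birth_range[0]) / pixel_size)
    n_rows = round((pers_range[1] - pers_range[0]) / pixel_size)
    image = [[0.0] * n_cols for _ in range(n_rows)]
    for b, d in diagram:
        p = d - b  # birth-persistence coordinates
        for row in range(n_rows):
            y0 = pers_range[0] + row * pixel_size
            mass_y = gauss_cdf(y0 + pixel_size, p, sigma) - gauss_cdf(y0, p, sigma)
            for col in range(n_cols):
                x0 = birth_range[0] + col * pixel_size
                mass_x = gauss_cdf(x0 + pixel_size, b, sigma) - gauss_cdf(x0, b, sigma)
                image[row][col] += p * mass_x * mass_y
    return [v for row in image for v in row]  # row-major flattening

diagram = [(0.25, 0.75)]  # a single feature with persistence 0.5
vec = persistence_image(diagram, birth_range=(0.0, 1.0), pers_range=(0.0, 1.0))
```

With the ranges stated for dimension 0 (birth $[0, 1]$, persistence $[0, 10]$) the same call would produce a vector of length $10 \times 100$.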
Notebook: Sixpack_Chroma_비교평가.ipynb (and Embedding and Classification.ipynb)
This phase evaluates the discriminative power of the extracted features using several classifiers under stratified cross-validation.
- Dimensionality Reduction:
  - PCA (Principal Component Analysis): Used for initial feature compression and visualization.
  - PHATE: Applied for manifold visualization to reveal intrinsic data geometry.
- Classification Models: A suite of classifiers is trained to benchmark feature performance:
  - Support Vector Machines (SVM):
    - RBF Kernel: Tested with $C \in \{0.5, 1.0, 2.0\}$ and gamma='scale'.
    - Linear Kernel: Tested with $C=1.0$.
  - Random Forest:
    - Ensemble of 100 decision trees (n_estimators=100, random_state=42).
- Validation Protocol:
  - Stratified k-Fold Cross-Validation: $k=5$ splits with shuffling (random_state=42) to ensure class balance in every fold.
- Performance Metrics:
  - Strict Accuracy: Standard classification accuracy.
  - Soft Accuracy: Custom metric that accepts adjacent phases as correct predictions (accounting for continuous phase transitions in the simulation space).
  - F1-Score: Weighted average F1-score to handle potential class imbalances.
  - Reporting: Mean and Standard Deviation of F1-scores across folds to assess model stability.
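
The difference between strict and soft accuracy can be sketched as below. The adjacency map is a hypothetical three-phase chain used only for illustration. Which phases count as adjacent in the actual simulation space is defined in the notebook and is not reproduced here.

```python
def strict_accuracy(y_true, y_pred):
    """Fraction of predictions that match the true phase exactly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def soft_accuracy(y_true, y_pred, adjacent):
    """Fraction of predictions equal to, or adjacent to, the true phase.

    `adjacent` maps each phase label to the set of labels treated as neighbors.
    """
    hits = sum(
        t == p or p in adjacent.get(t, ())
        for t, p in zip(y_true, y_pred)
    )
    return hits / len(y_true)

# Hypothetical phase chain A - B - C (A and C are not adjacent).
adjacent = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
y_true = ["A", "A", "B", "C"]
y_pred = ["A", "B", "B", "A"]

strict = strict_accuracy(y_true, y_pred)       # 2 of 4 exact matches
soft = soft_accuracy(y_true, y_pred, adjacent)  # 3 of 4 within one phase
```

Soft accuracy is never lower than strict accuracy, so the gap between the two indicates how many errors fall on neighboring phases.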