
1. Ordinary Persistence Image (Ord PI) Calculation

\begin{algorithm}[H]
\caption{Computation of Ordinary Persistence Image (Ord PI)}
\label{alg:ord_pi}
\begin{algorithmic}[1]
\REQUIRE Persistence Diagram $D = \{ (b_i, d_i) \}_{i \in I}$, Weight function $w: \mathbb{R}^2 \to \mathbb{R}$, Gaussian kernel parameter $\sigma$, Grid resolution $R$
\ENSURE Persistence Image Vector $V_{PI}$
\STATE Define transformation $T: \mathbb{R}^2 \to \mathbb{R}^2$ as $T(b, d) = (b, d-b)$
\STATE $P \leftarrow \emptyset$ \hfill \COMMENT{▷ Transformed points (birth, persistence)}
\FOR{each pair $(b_i, d_i) \in D$}
  \IF{$d_i < \infty$}
    \STATE $P \leftarrow P \cup \{ T(b_i, d_i) \}$
  \ENDIF
\ENDFOR
\STATE Define surface function $\rho: \mathbb{R}^2 \to \mathbb{R}$ initialized to $0$
\FOR{each point $u = (x_u, y_u) \in P$}
  \STATE $w_u \leftarrow w(x_u, y_u)$ \hfill \COMMENT{▷ Calculate weight (e.g., $w_u = y_u$)}
  \STATE $\rho(z) \leftarrow \rho(z) + w_u \cdot \frac{1}{2\pi\sigma^2} e^{-\frac{\|z-u\|^2}{2\sigma^2}}$ \hfill \COMMENT{▷ Sum of weighted Gaussian kernels}
\ENDFOR
\STATE $V_{PI} \leftarrow \text{Array of size } R \times R$
\STATE Define grid pixels $\{ \pi_{j,k} \}_{1 \le j,k \le R}$
\FOR{each pixel $\pi_{j,k}$}
  \STATE $V_{PI}[j, k] \leftarrow \iint_{\pi_{j,k}} \rho(z)\, dz$ \hfill \COMMENT{▷ Discretize by integration over pixel area}
\ENDFOR
\RETURN $\text{Flatten}(V_{PI})$
\end{algorithmic}
\end{algorithm}
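The algorithm above can be sketched in plain NumPy. The grid bounds and function name are illustrative assumptions (not taken from the original notebooks), and the per-pixel integral is approximated by evaluating the Gaussian surface at grid points times the pixel area:

```python
import numpy as np

def persistence_image(diagram, sigma=0.05, resolution=20,
                      birth_range=(0.0, 1.0), pers_range=(0.0, 1.0)):
    """Vectorize a persistence diagram into an R x R persistence image.

    diagram: iterable of (birth, death) pairs; infinite bars are dropped.
    Uses the linear-persistence weight w(b, p) = p suggested in the
    algorithm's comment.
    """
    diagram = np.asarray(diagram, dtype=float)
    # T(b, d) = (b, d - b): move to (birth, persistence) coordinates
    finite = diagram[np.isfinite(diagram[:, 1])]
    pts = np.column_stack([finite[:, 0], finite[:, 1] - finite[:, 0]])

    xs = np.linspace(*birth_range, resolution)
    ys = np.linspace(*pers_range, resolution)
    dx = (birth_range[1] - birth_range[0]) / resolution
    dy = (pers_range[1] - pers_range[0]) / resolution
    gx, gy = np.meshgrid(xs, ys, indexing="ij")

    img = np.zeros((resolution, resolution))
    for bx, py in pts:
        w = py                                   # linear persistence weight
        sq = (gx - bx) ** 2 + (gy - py) ** 2
        img += w * np.exp(-sq / (2 * sigma**2)) / (2 * np.pi * sigma**2)
    return (img * dx * dy).ravel()               # approximate pixel integrals
```

A production implementation would typically use a library vectorizer instead, but this makes the weight/kernel/discretization choices explicit.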


2. Sixpack Data Calculation

\begin{algorithm}[H]
\caption{Computation of Sixpack Features}
\label{alg:sixpack}
\begin{algorithmic}[1]
\REQUIRE Point set $P = \{ (x_k, y_k, c_k) \}_{k=1}^{N}$ (spatial coordinates and scalar value $c_k$), Threshold $c_{th}$, Grid resolution $R$, Correlation range $r_{max}$
\ENSURE Feature Vector $V_{Sixpack}$
\STATE Let $T = \{ (x, y) \mid c > c_{th} \}$ and $V = \{ (x, y) \mid c \le c_{th} \}$ \hfill \COMMENT{▷ Active/Passive phases based on threshold}
\STATE $F_{void} \leftarrow$ Area fraction of voids (connected components of $V$)
\STATE $N_{clusters} \leftarrow$ Number of connected components of $T$
\STATE $L_{interface} \leftarrow$ Total length of the boundary between $T$ and $V$
\STATE Compute the Nearest Neighbor Distance distribution (NND) for centroids of clusters in $T$
\STATE Compute the Pair Correlation Function $g(r)$ for $r \in (0, r_{max}]$ \hfill \COMMENT{▷ Probability of finding a point at distance $r$}
\STATE $V_{Sixpack} \leftarrow [F_{void}, N_{clusters}, L_{interface}, \text{Mean}(NND), \text{Var}(NND), \int g(r)\,dr]$ \hfill \COMMENT{▷ Combine scalar metrics}
\RETURN $V_{Sixpack}$
\end{algorithmic}
\end{algorithm}
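A minimal sketch of the first three scalar descriptors, assuming the thresholded active phase $T$ has been rasterized onto a boolean grid (the function name, 4-connectivity, and unit-length pixel edges are illustrative assumptions; NND and $g(r)$ are omitted for brevity):

```python
import numpy as np

def sixpack_scalars(active):
    """Return (void_fraction, n_clusters, interface_length) for a 2D
    boolean occupancy grid of the active phase T."""
    active = np.asarray(active, dtype=bool)
    rows, cols = active.shape
    void_fraction = 1.0 - active.mean()

    # Count connected components of the active phase via flood fill.
    seen = np.zeros_like(active)
    n_clusters = 0
    for i in range(rows):
        for j in range(cols):
            if active[i, j] and not seen[i, j]:
                n_clusters += 1
                stack = [(i, j)]
                seen[i, j] = True
                while stack:
                    r, c = stack.pop()
                    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        nr, nc = r + dr, c + dc
                        if (0 <= nr < rows and 0 <= nc < cols
                                and active[nr, nc] and not seen[nr, nc]):
                            seen[nr, nc] = True
                            stack.append((nr, nc))

    # Interface length: number of active/passive pixel edges.
    interface = 0
    interface += np.count_nonzero(active[:-1, :] != active[1:, :])
    interface += np.count_nonzero(active[:, :-1] != active[:, 1:])
    return void_fraction, n_clusters, interface
```

The remaining features (Mean/Var of NND, $\int g(r)\,dr$) operate on cluster centroids extracted from the same labeling.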


3. Mixup Barcode Calculation

\begin{algorithm}[H]
\caption{Computation of Mixup Barcodes via Canonical Matching (Wagner et al. 2024)}
\label{alg:mixup_barcode}
\begin{algorithmic}[1]
\REQUIRE Point clouds $A, B$, Filtration parameter $\epsilon$
\ENSURE Mixup Barcode $B_{mixup}$
\STATE Construct Vietoris--Rips filtrations $K_A$ and $K_{A \cup B}$ from $A$ and $A \cup B$
\STATE Apply an infinitesimal perturbation $\eta$ to all filtration values \hfill \COMMENT{▷ Ensure filtration values are unique}
\STATE $B_{dom} \leftarrow$ Persistent homology of $K_A$ (e.g., via standard matrix reduction) \hfill \COMMENT{▷ Domain persistence}
\STATE $B_{im} \leftarrow$ Persistent homology of the image persistence $\mathrm{im}(f: K_A \rightarrow K_{A \cup B})$ \hfill \COMMENT{▷ Image persistence}
\STATE $B_{mixup} \leftarrow \emptyset$ \hfill \COMMENT{▷ Initialize resulting barcode}
\FOR{each bar $\gamma = [b_{im}, d_{im}] \in B_{im}$}
  \STATE $\sigma \leftarrow$ the unique simplex in $K_A$ that creates the cycle at $b_{im}$
  \STATE Find the corresponding bar $\delta = [b_{dom}, d_{dom}] \in B_{dom}$ created by $\sigma$
  \STATE $B_{mixup} \leftarrow B_{mixup} \cup \{ (d_{dom}, d_{im}) \}$ \hfill \COMMENT{▷ Form pairs, or triples $(b_{im}, d_{im}, d_{dom})$}
\ENDFOR
\RETURN $B_{mixup}$
\end{algorithmic}
\end{algorithm}
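The matching loop itself is simple once both barcodes exist. The sketch below assumes (as an illustrative data layout, not the paper's API) that each bar is keyed by the id of the simplex that created its cycle, which the perturbation step makes unique; computing $B_{dom}$ and $B_{im}$ requires a persistent-homology library and is not shown. It emits the triple form $(b_{im}, d_{im}, d_{dom})$:

```python
def mixup_pairs(bars_dom, bars_im):
    """bars_dom, bars_im: dicts mapping creator-simplex id -> (birth, death).
    Returns one mixup triple (b_im, d_im, d_dom) per image bar."""
    triples = []
    for simplex, (b_im, d_im) in bars_im.items():
        # With unique (perturbed) filtration values, the creator simplex
        # identifies exactly one corresponding bar in the domain barcode.
        b_dom, d_dom = bars_dom[simplex]
        triples.append((b_im, d_im, d_dom))
    return triples
```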



4. Experimental Framework Structure and Methodology

The experimental framework is organized into three stages: Order Parameter (OP) feature extraction, topological feature extraction (TDA), and comparative evaluation using supervised learning.

Stage 1: Order Parameter (OP) Feature Extraction

Notebook: Original Notebooks/Compute OP Features.ipynb

This stage quantifies spatial ordering using statistical mechanics descriptors derived from particle positions and types.

  • Input: Raw simulation data (Pos_*.dat, Types_*.dat) representing cell centers and types (Red/Type 1, Green/Type 2).
  • Methodology:
    1. Angular Distribution Order Parameters ($\Theta$):
      • Captures the directional alignment of neighboring cells.
      • Computed for the overall population ($\Theta(\theta)$) and type-specific subpopulations ($\Theta_B(\theta)$ for Blue/Green, $\Theta_O(\theta)$ for Orange/Red).
    2. Radial Distribution Function ($R(r)$):
      • Measures the probability of finding a particle at a distance $r$ from a reference particle, normalized by ideal gas density.
      • Homotypic RDFs: $R_B(r)$ (Green-Green), $R_O(r)$ (Red-Red).
      • Heterotypic RDFs: $R(r)$ (All-All), potentially $R_{BO}(r)$ (Cross-correlations).
  • Output: A high-dimensional feature vector concatenated from discretized $\Theta(\theta)$ and $R(r)$ distributions.
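The RDF part of this stage can be sketched in NumPy. Bin edges, box area, and the normalization convention are assumptions for illustration (the notebook's exact conventions are not shown here); the same routine applied to each cell-type subset yields the homotypic variants:

```python
import numpy as np

def radial_distribution(points, r_max, n_bins, box_area):
    """Histogram of pairwise distances, normalized by the ideal-gas
    expectation so that g(r) -> 1 for an uncorrelated point set."""
    points = np.asarray(points, dtype=float)
    n = len(points)
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(-1))
    iu = np.triu_indices(n, k=1)                  # each pair counted once
    counts, edges = np.histogram(dist[iu], bins=n_bins, range=(0.0, r_max))
    # Expected pairs per annular shell for n uniformly distributed points
    shell_area = np.pi * (edges[1:] ** 2 - edges[:-1] ** 2)
    expected = 0.5 * n * (n - 1) * shell_area / box_area
    g = counts / expected
    return 0.5 * (edges[1:] + edges[:-1]), g      # bin centers, g(r)
```

The discretized $g(r)$ values (and the analogous angular histograms) are what get concatenated into the OP feature vector.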

Stage 2: Topological Feature Extraction

Notebook: Original Notebooks/Compute TDA Features.ipynb

This stage extracts robust topological features using Persistent Homology and vectorizes them into Persistence Images (PIs).

  • Input: Point clouds of Green cells, Red cells, and All cells.
  • Methodology:
    1. Vietoris-Rips Filtration:
      • Constructed independently for three point cloud sets: Green (Type 2), Red (Type 1), and Combined.
    2. Persistent Homology Calculation:
      • Computes persistence diagrams $D_k$ for homology dimensions $k \in \{0, 1, 2\}$.
      • $H_0$: Connected components.
      • $H_1$: Loops/Cycles.
      • $H_2$: Voids/Cavities.
    3. Persistence Image (PI) Vectorization:
      • Converts persistence diagrams into vector-space compatible images.
      • Metric Settings:
        • Weight Function: Linear persistence weighting ($w(b, p) = p$) to emphasize high-persistence features.
        • Kernel: Gaussian kernel with $\sigma = 0.05$.
        • Resolution: Pixel size $p_{res} = 0.1$.
      • Filtration Ranges:
        • Dimension 0: Birth $[0, 1]$, Persistence $[0, 10]$.
        • Dimensions 1 & 2: Birth $[0, 10]$, Persistence $[0, 5]$ (Max $\epsilon = 10$).
  • Output: Stacked vectors of Persistence Images for $H_0, H_1, H_2$ across all subpopulations.
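For intuition about the $H_0$ diagrams above: in a Vietoris-Rips filtration every component is born at $0$ and dies when a minimum-spanning-tree edge merges it, so the finite $H_0$ deaths are exactly the Euclidean MST edge lengths. A small Prim's-algorithm sketch (an illustrative special case; $H_1$/$H_2$ require a full persistent-homology library):

```python
import numpy as np

def h0_diagram(points):
    """Return the finite H0 bars (0, death) of the Rips filtration,
    i.e. the sorted MST edge lengths; the one infinite bar is omitted."""
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    dist = np.sqrt(((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1))
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    best = dist[0].copy()                 # cheapest link into the tree
    deaths = []
    for _ in range(n - 1):
        best[in_tree] = np.inf            # never re-add tree vertices
        j = int(np.argmin(best))
        deaths.append(best[j])            # merge event = bar death
        in_tree[j] = True
        best = np.minimum(best, dist[j])
    return np.column_stack([np.zeros(n - 1), np.sort(deaths)])
```

Each such diagram is then vectorized with the persistence-image settings listed above.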

Stage 3: Evaluation (Embedding & Classification)

Notebook: Sixpack_Chroma_비교평가.ipynb (and Embedding and Classification.ipynb)

This stage evaluates the discriminative power of the extracted features using several classifiers and a rigorous validation protocol.

  • Dimensionality Reduction:

    • PCA (Principal Component Analysis): Used for initial feature compression and visualization.
    • t-SNE (t-Distributed Stochastic Neighbor Embedding): Used for 2D embedding visualization.
    • PHATE: Applied for manifold visualization to reveal intrinsic data geometry.
  • Classification Models: A suite of classifiers is trained to benchmark feature performance:

    1. Support Vector Machines (SVM):
      • RBF Kernel: Tested with $C \in \{0.5, 1.0, 2.0\}$ and gamma='scale'.
      • Linear Kernel: Tested with $C=1.0$.
    2. Random Forest:
      • Ensemble of 100 decision trees (n_estimators=100, random_state=42).
  • Validation Protocol:

    • Stratified k-Fold Cross-Validation: $k=5$ splits with shuffling (random_state=42) to ensure class balance in every fold.
  • Performance Metrics:

    1. Strict Accuracy: Standard classification accuracy.
    2. Soft Accuracy: Custom metric that accepts adjacent phases as correct predictions (accounting for continuous phase transitions in the simulation space).
    3. F1-Score: Weighted average F1-score to handle potential class imbalances.
    4. Reporting: Mean and Standard Deviation of F1-scores across folds to assess model stability.
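The custom soft-accuracy metric described above is easy to state precisely. This sketch assumes phases are encoded as consecutive integers along the transition axis (an assumption about the label scheme, not confirmed by the notebooks):

```python
def soft_accuracy(y_true, y_pred, tolerance=1):
    """Fraction of predictions within `tolerance` phase indices of the
    true label, so adjacent phases count as correct."""
    hits = sum(abs(t - p) <= tolerance for t, p in zip(y_true, y_pred))
    return hits / len(y_true)
```

With `tolerance=0` this reduces to strict accuracy, which makes the two metrics directly comparable in the reported tables.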