Skip to content

jramirezgen/ELEMENT

Repository files navigation

⚗️ ELEMENT

Enzymatic-Level Equation-based Mechanistic Elementary-route Network Toolkit

Version MATLAB Platform License Organism-agnostic

Organism-agnostic MATLAB platform for mechanistic metabolic route discovery,
multi-omics ranking, and phenotype simulation from any genome-scale model.


What ELEMENT does

Given a Genome-Scale Metabolic Model (GEM) and a substrate→product phenotype (e.g. azo dye decolorization, alcohol production, antibiotic degradation), ELEMENT automatically:

  1. Discovers K mechanistic route hypotheses from the GEM using sparse MILP with guaranteed stoichiometric balance (S·v = 0)
  2. Simulates elementary enzymatic kinetics — no Quasi-Steady-State Approximation
  3. Fits a population-level bridge linking molecular kinetics to observable phenotype
  4. Ranks hypotheses with a multi-omics composite score (C1–C10 + C_dyn)
  5. Iterates with accumulative cuts to explore diverse route topologies
  6. Exports publication-quality figures, tables and a full audit report

Mathematical Framework

Core Equations

To make the governing mathematics visible from the start, ELEMENT v2 uses two complementary core equations:

1. Physiological-Molecular Equation (EFM)

This equation couples molecular route flux with relative biomass to produce the observable phenotype:

Authorship note: The EFM formulation documented in ELEMENT is presented as an original methodological contribution of the author within the ELEMENT program. Reuse, adaptation, or discussion of this formulation should cite ELEMENT.

$$ \begin{aligned} D(t) &= y_{max},\bigl(1 - e^{-H_{pop}(t)}\bigr),\\ H_{pop}(t) &= \int_{0}^{t} v(\tau),\dfrac{X(\tau)}{X_0},d\tau \end{aligned} $$

EFM variables and parameters:

  • $D(t)$: normalized observable phenotype at time $t$; for example, decolorization fraction, transformed substrate fraction, or accumulated product scaled to 0-1.
  • $y_{max}$: maximum reachable observable amplitude. It maps the integrated molecular-population signal to the experimental ceiling.
  • $H_{pop}(t)$: cumulative population-weighted molecular activity up to time $t$. Larger values indicate more integrated biological work.
  • $v(\tau)$: route-specific molecular flux or effective transformation rate predicted by the elementary ODE at time $\tau$.
  • $X(\tau)$: biomass or effective population signal at time $\tau$.
  • $X_0$: initial biomass used to normalize the population trajectory.
  • $X(\tau)/X_0$: relative biological capacity. Values above 1 indicate growth relative to the inoculum; values below 1 indicate loss of active biomass.
  • $t$: current evaluation time.
  • $\tau$: dummy integration variable used to accumulate biological activity from 0 to $t$.

2. Biological Transformation Equation (ETB)

ELEMENT also reuses the WF92 surrogate as a biological transformation equation for biomass and condition-specific substrate/product targets:

Authorship note: The ETB formulation as implemented and parameterized in WF92/ELEMENT is presented as an original methodological contribution of the author within the WF92 and ELEMENT program stack. Reuse, adaptation, or discussion of this formulation should cite ELEMENT.

$$ \begin{aligned} D_{ETB}(t) &= \min(1, \max(0, \dfrac{A,T}{B,I})),\\ \phi &= \dfrac{V_{max},X_{eff}}{K_s + X_{eff}},t,\\ A &= a_f,\max(\phi,0)^{bf_f},\\ T &= 1 - e^{-k_f t},\\ B &= 100,[1 + e^{-k_2(\phi/100 - t_{10f},C_0^{\alpha_f})}],\\ I &= 1 + \left(\dfrac{C_0}{K_{if}}\right)^b \end{aligned} $$

ETB variables and parameters:

  • $D_{ETB}(t)$: transformed fraction predicted by the WF92 surrogate, bounded between 0 and 1.
  • $\phi$: cumulative effective transformation drive before explicit delay and inhibition penalties are applied.
  • $A$: amplitude term that converts effective capacity into transformation potential.
  • $T$: temporal activation term. It starts near 0 and approaches 1 as the system leaves the lag region.
  • $B$: delay and shape penalty term. Larger values suppress early transformation and shift the rise of the curve.
  • $I$: inhibition term driven by the initial target concentration.
  • $V_{max}$: maximum nominal transformation capacity in the surrogate.
  • $X_{eff}$: effective biological capacity used by the surrogate. Depending on the use case, it can represent biomass, active biomass, or another calibrated activity proxy.
  • $K_s$: saturation constant for effective capacity. It controls how quickly the effect of $X_{eff}$ approaches saturation.
  • $a_f$: scale factor for the amplitude term.
  • $bf_f$: curvature exponent for the amplitude term. Higher values make $A$ respond more nonlinearly to $\phi$.
  • $k_f$: activation rate constant controlling how fast $T$ approaches 1.
  • $k_2$: steepness parameter for the denominator transition in $B$.
  • $t_{10f}$: delay coefficient that sets when transformation starts to rise appreciably.
  • $C_0$: initial substrate or target concentration.
  • $\alpha_f$: exponent controlling how strongly the initial concentration shifts the delay term.
  • $K_{if}$: inhibition constant. Larger values weaken the penalty imposed by $C_0$.
  • $b$: inhibition exponent controlling how sharply inhibition grows as $C_0/K_{if}$ increases.
  • $C_{amp}$: amplitude used when the measured observable increases instead of decreases.

Operational mapping:

$$ C(t)= \begin{cases} C_0,\bigl(1 - D_{ETB}(t)\bigr), & \text{decreasing observable},\\ C_0 + C_{amp},D_{ETB}(t), & \text{increasing observable}. \end{cases} $$

The full skill specification is available at .github/skills/element-ode-engine-v2/SKILL.md.

1 — Elementary Enzymatic Kinetics (Phase 2)

ELEMENT adopts the framework of Ullah et al. (2006) for three-step elementary kinetics without Quasi-Steady-State Approximation:

$$E + S ;\overset{k_{\text{bind}}}{\underset{k_{\text{rev}}}{\rightleftharpoons}}; ES \xrightarrow{;k_{\text{cat}};} E + P$$

The ODE system solved by ode15s for each reaction in the route:

$$\frac{d[E]}{dt} = -k_{\text{bind}}[E][S] + (k_{\text{rev}} + k_{\text{cat}})[ES]$$

$$\frac{d[S]}{dt} = -k_{\text{bind}}[E][S] + k_{\text{rev}}[ES]$$

$$\frac{d[ES]}{dt} = k_{\text{bind}}[E][S] - (k_{\text{rev}} + k_{\text{cat}})[ES]$$

$$\frac{d[P]}{dt} = k_{\text{cat}}[ES]$$

Default parameters (organism-agnostic, justified for unbiased route ranking):

Parameter Value Meaning
$k_{\text{bind}}$ 1000 mM⁻¹h⁻¹ Binding rate constant
$k_{\text{rev}}$ 150 h⁻¹ Reverse (dissociation) rate
$k_{\text{cat}}$ 150 h⁻¹ Catalytic rate
$E_0$ 3×10⁻⁴ mM Initial enzyme pool
$[bg]$ 3×10⁻³ mM Metabolite background pool

Design rationale: Uniform $k_{\text{cat}}$ ensures hypothesis ranking is driven by stoichiometric topology, not by uncertain enzyme-specific constants. Implicit $K_m = k_{\text{rev}}/k_{\text{bind}} = 0.15$ mM is physiologically reasonable. See METHODS.md for full justification.


2 — Population-Level Bridge (Phase 3)

The molecular ODE produces species concentrations $P_{\text{mol}}$. The observable phenotype $D(t)$ (e.g. % decolorization) is linked via a bridge equation based on cumulative biological capacity:

$$ \begin{aligned} D(t) &= y_{max},\bigl(1 - e^{-H_{pop}(t)}\bigr),\\ H_{pop}(t) &= \lambda_{mol},\int_{0}^{t} \dfrac{X(\tau)}{X_0},d\tau \end{aligned} $$

  • $y_{max} \in [0,1]$: maximum decolorization asymptote (single free parameter)
  • $X(t)/X_0$: normalized biomass (OD600 timeseries)
  • $\lambda_{mol}$: molecular flux scale factor (fitted by non-linear least squares)
  • CI 95% via GEKKO/bootstrapping

For condition-specific target reconstruction, ELEMENT v2 also exposes the Biological Transformation Equation (ETB) reused from WF92 to describe biomass or substrate/product transformation:

$$D_{ETB}(t) = \min(1, \max(0, \dfrac{A,T}{B,I}))$$

$$ C(t)= \begin{cases} C_0,\bigl(1 - D_{ETB}(t)\bigr), & \text{decreasing observable},\\ C_0 + C_{amp},D_{ETB}(t), & \text{increasing observable}. \end{cases} $$

The full v2 skill specification is available at .github/skills/element-ode-engine-v2/SKILL.md.

The bridge R² is reported as C2 in the composite score.


3 — Multi-Omics Expression Coupling (Phase 4)

For each gene $g$ in a candidate route, the expression score $\mathcal{E}(g)$ is computed as:

$$\mathcal{E}(g) = \text{sign}(\Delta) \cdot \dfrac{|\log_2\text{FC}(g)|}{\log_2\text{FC}_{\text{cap}}} \cdot w_{\text{src}}$$

where:

  • $\log_2\text{FC}(g)$: log₂ fold-change from RNA-seq, proteomics (iBAQ ratio), or RT-qPCR
  • $\log_2\text{FC}_{\text{cap}} = 2.0$: clamp to avoid outlier dominance
  • $w_{\text{src}}$: source reliability weight (RT-qPCR > proteomics > RNA-seq)

The expression component C3 for a route is:

$$C_3 = \frac{1}{|G|} \sum_{g \in G} \mathcal{E}(g) \cdot \mathbf{1}[\text{direction matches}]$$

Missing data sources degrade gracefully — C3 is computed from available evidence.


4 — Composite Scoring System

$$\text{Score}_{\text{total}} = \sum_{k=1}^{10} w_k , C_k + w_{\text{dyn}} , C_{\text{dyn}}$$

subject to $\sum w_k + w_{\text{dyn}} = 1$ and NaN components redistribute weight proportionally.

Component Description Default $w$
C1 ODE convergence & substrate consumption 0.10
C2 Bridge R² (phenotype fit quality) 0.20
C3 Multi-omics expression agreement 0.25
C4 FVA rigidity (route flux variance) 0.08
C5 Phenotype shape (sigmoidal coherence) 0.12
C6 FSEOF flux-sum enrichment 0.05
C7 Reporter metabolite enrichment 0.05
C8 OptMDF thermodynamic feasibility 0.05
C9 Carbon continuity in route 0.05
C10 pFBA parsimony 0.05
C_dyn NMR dynamics R² (substrate + product) 0.10

Quality gate: Score ≥ 0.65PASS


5 — Iterative Route Discovery (MILP + Accumulative Cuts)

Routes are discovered by solving a sparse MILP on the stoichiometric matrix:

$$\min_{v} ; \mathbf{1}^\top |v| \quad \text{s.t.} \quad S \cdot v = 0, ; v_{\text{sub}} \leq -\epsilon, ; v_{\text{prod}} \geq \epsilon$$

Diversity is enforced by accumulative integer cuts after each iteration $i$:

$$\sum_{j \in \mathcal{R}_i} y_j \leq |\mathcal{R}_i| - 1 \quad \forall i \in {0 \ldots n_{\text{prev}}}$$

This guarantees that no previously discovered route is repeated across the entire search history.


Architecture

ELEMENT/
├── run_element.m                  ← Universal launcher (start here)
├── inputs/
│   ├── element_config.json        ← Single configuration file (edit this)
│   ├── gem/                       ← Genome-scale metabolic model (SBML)
│   ├── omics/
│   │   ├── proteomics/            ← iBAQ quantitative proteomics TSV
│   │   ├── rnaseq/                ← DESeq2 differential expression TSV
│   │   └── rtqpcr/                ← RT-qPCR validation panel TSV
│   ├── nmr/                       ← NMR timeseries CSVs (substrate + product)
│   ├── specialty_reactions_db.json   ← Extended reactions database
│   ├── dye_molecules_metadata.json
│   ├── electron_carriers_db.json
│   └── species_xref.json
├── matlab/                        ← 64 core MATLAB scripts
│   ├── run_iterative_loop_element.m   ← Main loop
│   ├── config_reference_v2.m          ← Config parser + defaults
│   ├── discover_routes_v2.m           ← MILP route discovery
│   ├── build_S_elementary.m           ← Stoichiometric matrix builder (MECATE v2)
│   ├── solve_3step.m                  ← ODE elementary kinetics solver
│   ├── run_phase3_bridge_efm.m        ← Bridge fitting
│   ├── run_phase4_multiomics.m        ← Multi-omics coupling
│   ├── run_phase6_scoring.m           ← Score computation
│   ├── run_phase7_export_docs_lc6.m   ← Export figures & report
│   └── ...                            ← All 64 scripts included
├── outputs/                       ← Generated automatically
│   ├── iter_0/ ... iter_N/
│   └── best/                      ← Best scoring iteration
├── docs/
│   ├── METHODS.md                 ← Full mathematical methods
│   ├── SCORE_COMPONENTS.md        ← Scoring system documentation
│   └── INPUT_FORMAT.md            ← Data format specifications
├── INSTALL.md
├── CITATION.cff
└── LICENSE

Quick Start

Prerequisites

Tool Version Required
MATLAB R2021b+ ✅ Required
COBRA Toolbox Latest ✅ Required
IBM CPLEX 12.10+ ⚡ Recommended (5–10× speedup)
GLPK (via COBRA) ✅ Free fallback

Installation

# 1. Clone ELEMENT
git clone https://github.com/jramirezgen/ELEMENT.git
cd ELEMENT

# 2. Clone COBRA Toolbox (if not already installed)
git clone https://github.com/opencobra/cobratoolbox.git

# 3. In MATLAB, initialize COBRA once:
#    >> addpath(genpath('cobratoolbox')); initCobraToolbox(false)

Running the included example (S. xiamenensis LC6, Methyl Orange)

% The config already points to the included GEM and omics data
% Just run:
run_element

Expected output:

[ELEMENT] ELEMENT v1.0 — Shewanella xiamenensis LC6
[ELEMENT] Substrate: MO → Sulfanilic acid / catechol
[ELEMENT] MAX_ITER=10 | K_ROUTES=3 | PATIENCE=5 | TARGET_C3=0.55
[ELEMENT] ===== ITERATION 0 (no cuts) =====
...
[ELEMENT] ★★★ BEST ITER: 11 (C3=0.4414) ★★★
[ELEMENT] Final score: 0.7199  [PASS ≥ 0.65]
[ELEMENT] Results in: outputs/

Adapting to your organism

  1. Edit inputs/element_config.json — change project, gem.path, substrate_product_pair, and omics paths
  2. Place your GEM in inputs/gem/ (SBML Level 3, FBC v2)
  3. Place your omics in inputs/omics/{proteomics,rnaseq,rtqpcr}/ (see docs/INPUT_FORMAT.md)
  4. Run run_element

Minimal run (GEM only, no omics — scores C3, C4, C7, C8, C10 will be NaN):

{
  "project":              { "id": "my_organism", "organism": "My Organism" },
  "gem":                  { "path": "inputs/gem/my_model.xml" },
  "substrate_product_pair": { "substrate_met_id": "cpd_X[e0]", "product_met_id": "cpd_Y[c0]" },
  "omics":                { "gene_rxn_mapping_tsv": "", "proteomics_8h_tsv": "" },
  "experimental_data":    { "decol_tsv": "inputs/my_timeseries.tsv" }
}

Biological Validation — S. xiamenensis LC6

The included example represents the first validated application of ELEMENT:

Metric Value
Best iteration 11 / 16 executed
Score total 0.7199 ✅ PASS
Bridge R² 0.868
C3 expression agreement 0.441
C_dyn (NMR) 0.814
φ decolorization 99.7%
GEM 2472 reactions, 2226 species, 2112 genes
Runtime ~24 min (CPLEX, 16-core)

Key mechanistic finding: In Methyl Orange degradation, only AzoR/ARS1 is strongly up-regulated (RNA-seq log₂FC = +6.05; RT-qPCR = +6.89), while Mtr/CymA electron transport chain genes are DOWN-regulated — rejecting the EET hypothesis and supporting direct NADH-mediated azo reduction → catechol → β-ketoadipate → TCA.


Pipeline Overview

flowchart TD
    A[GEM + Config JSON] --> B[Phase 0: GEM validation]
    B --> C[Phase 1: MILP route discovery\nS·v=0, K hypotheses, accumulative cuts]
    C --> D[Phase 2: Elementary ODE\nE+S⇌ES→E+P via ode15s]
    D --> E[Phase 3: Bridge fitting\nD=ymax·1-exp-Hpop]
    E --> F[Phase 4: Multi-omics coupling\nRNA-seq + Proteomics + RT-qPCR]
    F --> G[Phase 6: Composite scoring\nC1-C10 + C_dyn]
    G --> H{Score ≥ 0.65?}
    H -- YES --> I[Phase 7: Export figures + report\nPASS ✅]
    H -- NO --> J[Add accumulative cut\nNext iteration]
    J --> C
Loading

Input Data Formats

See docs/INPUT_FORMAT.md for complete specifications.

Decolorization timeseries (exp_decol.tsv):

time_h  replicate  abs_pct
0       1          0.000
3       1          0.312
6       1          0.687

Proteomics (proteomics_Xh.tsv):

gene_id         iBAQ_condition   iBAQ_control
EJGNCN_00123    1.45e6           2.01e5

RNA-seq (rnaseq.tsv):

gene_id         log2FoldChange   padj
EJGNCN_00123    2.34             0.001

Citation

If you use ELEMENT in your research, please cite:

@software{ramirez2025element,
  author    = {Ramirez Quispe, Jefferson David and
               Ramirez Roca, Pablo Sergio and
               Reyes, Bryan Cesar and
               Susuki Pacheco, Asami and
               Huamantalla Huamán, Magerlyn and
               Herrera Quiñones, Josemaria and
               Beraun Gasco, Piero},
  title     = {{ELEMENT}: Enzymatic-Level Equation-based Mechanistic
               Elementary-route Network Toolkit},
  year      = {2025},
  version   = {1.0.0},
  publisher = {GitHub},
  url       = {https://github.com/jramirezgen/ELEMENT},
  institution = {Universidad Nacional Mayor de San Marcos (UNMSM),
                 Laboratorio de Microbiología Molecular y Biotecnología,
                 Grupo de Investigación: Genómica Funcional de
                 Microorganismos y Biorremediación}
}

See also CITATION.cff.


Authors

Jefferson David Ramirez Quispe, B.Sc.1, Pablo Sergio Ramirez Roca, Ph.D.1, Bryan Cesar Reyes, M.Sc.1, Asami Susuki Pacheco, B.Sc.1, Magerlyn Huamantalla Huamán, M.Sc.1, Josemaria Herrera Quiñones, B.Sc.1, and Piero Beraun Gasco, M.Sc.1

1 Laboratorio de Microbiología Molecular y Biotecnología, Grupo de Investigación en Genómica Funcional de Microorganismos y Biorremediación, Universidad Nacional Mayor de San Marcos (UNMSM), Lima, Perú


License

Apache License 2.0 — see LICENSE.


ELEMENT v1.0.0 · Universidad Nacional Mayor de San Marcos · 2025

About

Organism-agnostic MATLAB platform for mechanistic metabolic route discovery, multi-omics ranking and phenotype simulation from genome-scale models.

Topics

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors