AJOLOTE (Automated Gene-curation for ORganism Metabolic Enhancement) is a comprehensive Python framework for curating and extending genome-scale metabolic models (GEMs) generated by Gapseq, using multi-omic data integration and orthology-based expansion.
Developed at the Laboratorio de Microbiología Molecular y Biotecnología, UNMSM, Peru.
The axolotl (ajolote) represents biological transformation and adaptation, much as this framework transforms raw Gapseq outputs into production-ready, biologically grounded metabolic models.
Proven Performance (Shewanella xiamenensis LC6 - MAXIMALLY EXTENDED):
- 🧬 769 → 2,099 genes (Phase 2, +173% vs Gapseq)
- ⚛️ 2,314 → 3,815 reactions (Final, +65% vs Gapseq)
- 📊 15.3% → 41.8% genome coverage (+173%)
- ✅ 100% RT-qPCR genes (12/12 with real biochemistry)
- ✅ 93.75% terminal electron acceptors (15/16 from paper)
- ✅ 100% BES critical pathways (Serine bypass, Pentose phosphate, Threonine bypass, CymA, Menaquinone)
- ✅ 100% periplasm representation (104 reactions, electron transfer complete)
- ✅ 100% stoichiometric consistency (FBA optimal, validated)
- 🧪 100% maximally extended for Shewanella bioelectrochemistry applications
- ✅ Multi-omic Integration: Transcriptomics (RNA-seq) + Proteomics (DIA-MS)
- ✅ Intelligent Gene Extension: Expression-weighted selection from reference models
- ✅ Complete Gene Tracking: JSON-based tracking, zero genes lost
- ✅ Real Sequence Alignment: DIAMOND BLASTP with configurable thresholds
- ✅ ELEMENT Phase 0 Filtering: Family-specific parameters (id≥25-30%, cov≥35-50%)
- ✅ Triple Safety Verification: No dummy reactions, real metabolites, valid bounds
- ✅ Generalizable Framework: Works with ANY query + reference model pair
- ✅ Supplementary Integration: Literature-based reaction tables with full justification
- ✅ ID Mapping Resolution: Handles cross-database ID formats automatically
- 🎯 Philosophy Shift: "Not Correct, But Complete" — Discover pathways instead of enforcing them
- ✅ Multi-Compartment Flexibility: Metabolites unrestricted, exist in multiple locations simultaneously
- ✅ Bidirectional Transport: Complete freedom between compartments (no artificial restrictions)
- ✅ Multiple Reaction Pathways: Same process possible via different routes & enzymes
- ✅ FBA-Driven Discovery: Model selects optimal pathway(s) based on energetics, not assumptions
- ✅ Hypothesis Testing Ready: Test competing metabolic hypotheses using knockout & FVA analysis
- ✅ Fragment Circulation: Products can accumulate anywhere, supporting complex degradation networks
- ✅ Metabolite-Complete Reactions: Copy real stoichiometry, never placeholders
- ✅ MEMOTE-Ready: 100% stoichiometric consistency, <4 min validation
- ✅ Organism-Agnostic: Works with any organism + reference model
- ✅ Production Quality: Full documentation, reproducible methodology, reusable code
- ✅ Discovery-Ready: Framework for understanding unknown metabolic processes (Phase 10)
- Python 3.8+
- COBRApy ≥0.26.0
- Pandas ≥1.3.0
- NumPy ≥1.21.0
# Clone the repository
git clone https://github.com/jramirezgen/AJOLOTE.git
cd AJOLOTE
# Create virtual environment (recommended)
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
# Install AJOLOTE
pip install -e .
# Or with all dependencies
pip install -e ".[dev,alignment,solvers]"

# 1. Create working directory
mkdir my_organism_curation
cd my_organism_curation
# 2. Prepare your data:
# - gapseq_model.xml (output from Gapseq)
# - reference_model.xml (related organism, e.g., iLJ1162 for Shewanella)
# - transcriptomics.csv (gene expression matrix)
# - proteomics.tsv (protein abundance data, optional but recommended)
# 3. Copy config template
cp /path/to/AJOLOTE/config.yaml ./config_my_organism.yaml
nano config_my_organism.yaml # Edit for your organism
# 4. Run full curation pipeline
python -m ajolote.scripts.ajolote_pipeline \
--gapseq-model gapseq_model.xml \
--reference-model reference_model.xml \
--transcriptomics transcriptomics.csv \
--proteomics proteomics.tsv \
--config config_my_organism.yaml \
--output output/
# 5. Validate results
python -m ajolote.scripts.ajolote_analyze \
--model output/final_model.xml \
--metadata output/metadata.json

For a detailed walkthrough, see START_HERE.md.
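For scripted or batch runs, the pipeline command from step 4 can be assembled in Python. The sketch below is a minimal example under stated assumptions: the flag names are taken from the command shown above, while the helper names `build_pipeline_command` and `missing_inputs` are hypothetical, and the idea that `--proteomics` can simply be omitted is an assumption based on proteomics being described as optional.

```python
import shlex
import sys
from pathlib import Path
from typing import List, Optional


def build_pipeline_command(
    gapseq_model: str,
    reference_model: str,
    transcriptomics: str,
    config: str,
    output_dir: str,
    proteomics: Optional[str] = None,
) -> List[str]:
    """Assemble the ajolote_pipeline command line from the documented flags."""
    cmd = [
        sys.executable, "-m", "ajolote.scripts.ajolote_pipeline",
        "--gapseq-model", gapseq_model,
        "--reference-model", reference_model,
        "--transcriptomics", transcriptomics,
        "--config", config,
        "--output", output_dir,
    ]
    # Assumption: the flag can be left out when no proteomics file is available.
    if proteomics is not None:
        cmd += ["--proteomics", proteomics]
    return cmd


def missing_inputs(paths: List[str]) -> List[str]:
    """Return the input paths that do not exist on disk."""
    return [p for p in paths if not Path(p).is_file()]


inputs = ["gapseq_model.xml", "reference_model.xml",
          "transcriptomics.csv", "config_my_organism.yaml"]
command = build_pipeline_command(*inputs[:3], inputs[3], "output/",
                                 proteomics="proteomics.tsv")

absent = missing_inputs(inputs)
if absent:
    print("Missing input files:", ", ".join(absent))
# Print the command for inspection; pass it to subprocess.run(command, check=True)
# to launch the pipeline once all inputs are in place.
print(shlex.join(command))
```

Checking for missing files before launching avoids starting a 30-60 minute run that fails on the first load step.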
Raw Gapseq Model (Low coverage, placeholder reactions)
↓
Phase 0: Data Loading
├─ Load Gapseq draft model
├─ Load reference model (evolutionary neighbor)
└─ Load transcriptomics & proteomics
↓
Phase 1: Gene Extension
├─ Expression-weighted gene selection
├─ ELEMENT Phase 0 orthology mapping (id≥25%, cov≥35%)
└─ Track all gene assignments (JSON)
↓
Phase 2: Reaction Completion
├─ Map genes to reference reactions
├─ Copy REAL stoichiometry (not ATP/ADP/Pi)
└─ Fill metabolites with biological accuracy
↓
Phase 3: Validation
├─ MEMOTE consistency check (<4 min)
├─ FBA viability test
└─ Growth rate prediction
↓
Production-Ready Model
(High coverage, real biochemistry, MEMOTE-validated)
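The expression-weighted selection step in Phase 1 can be illustrated with a small sketch. The scoring rule here (log-scaled expression multiplied by fractional identity), the `min_expression` cutoff, and all gene names are assumptions made for illustration; the source does not specify the weighting AJOLOTE uses.

```python
import math
from typing import Dict, List, NamedTuple


class Candidate(NamedTuple):
    query_gene: str      # gene in the organism being curated
    reference_gene: str  # putative ortholog in the reference model
    identity: float      # percent identity of the alignment


def rank_candidates(
    candidates: List[Candidate],
    expression: Dict[str, float],
    min_expression: float = 1.0,
) -> List[Candidate]:
    """Keep expressed candidates and sort them by an expression-weighted score."""

    def score(c: Candidate) -> float:
        # Hypothetical weighting: log2(expression + 1) times fractional identity.
        return math.log2(expression.get(c.query_gene, 0.0) + 1.0) * (c.identity / 100.0)

    expressed = [
        c for c in candidates
        if expression.get(c.query_gene, 0.0) >= min_expression
    ]
    return sorted(expressed, key=score, reverse=True)


# Made-up candidates and expression values, for illustration only.
candidates = [
    Candidate("gene_A", "ref_gene_A", 82.0),
    Candidate("gene_B", "ref_gene_B", 45.0),
    Candidate("gene_C", "ref_gene_C", 91.0),
]
expression = {"gene_A": 15.0, "gene_B": 255.0, "gene_C": 0.0}

for c in rank_candidates(candidates, expression):
    print(c.query_gene, "->", c.reference_gene)
```

With these values, gene_C is dropped because it is not expressed, and gene_B outranks gene_A because its much higher expression outweighs its lower identity.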
- START_HERE.md - Quick start guide (5 min)
- docs/METHODOLOGY.md - Detailed curation strategy
- docs/BEST_PRACTICES.md - Recommendations for your organism
- CONTRIBUTING.md - Development guidelines
- examples/ - Complete case study (S. xiamenensis LC6)
- Data Integrity: Every gene assignment documented and tracked
- Biological Accuracy: Use real stoichiometry, not placeholders
- Reproducibility: Save scripts, parameters, and methodology
- Quality Assurance: MEMOTE validation, FBA viability, growth prediction
- Knowledge Preservation: Enable reuse on other organisms
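The data-integrity principle above (every gene assignment documented, no genes lost) can be sketched as a JSON tracking record plus a completeness check. The record layout and the function names `build_tracking` and `lost_genes` are hypothetical; AJOLOTE's actual tracking schema is not described in this README.

```python
import json
from typing import Dict, Iterable, List


def build_tracking(all_genes: Iterable[str],
                   assigned: Dict[str, List[str]]) -> Dict[str, dict]:
    """Create one record per gene so that unassigned genes remain visible."""
    tracking = {}
    for gene in all_genes:
        reactions = assigned.get(gene, [])
        tracking[gene] = {
            "status": "assigned" if reactions else "unassigned",
            "reactions": sorted(reactions),
        }
    return tracking


def lost_genes(all_genes: Iterable[str], tracking: Dict[str, dict]) -> List[str]:
    """Return genes that were supplied but have no tracking record."""
    return sorted(set(all_genes) - set(tracking))


# Made-up gene and reaction identifiers, for illustration only.
genes = ["gene_A", "gene_B", "gene_C"]
assigned = {"gene_A": ["rxn_2", "rxn_1"], "gene_C": ["rxn_3"]}

tracking = build_tracking(genes, assigned)
print("Lost genes:", lost_genes(genes, tracking))
print(json.dumps(tracking, indent=2, sort_keys=True))
```

Recording unassigned genes explicitly, instead of omitting them, is what makes a "zero genes lost" check possible.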
Tested successfully on Shewanella xiamenensis LC6:
| Parameter | General | Specific (azo-reductases) |
|---|---|---|
| Identity threshold | ≥25% | ≥30% |
| Coverage threshold | ≥35% | ≥50% |
| Alignment method | Smith-Waterman (parasail) | Smith-Waterman |
| Reference | Same genus | Same genus |
Rationale: Permissive thresholds retain more distant homologs and so capture a broader set of gene functions, while the identity and coverage floors still exclude spurious hits.
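The thresholds in the table above can be applied to alignment hits as in the following sketch. The threshold values come from the table; the dictionary layout, the field names (`pident`, `length`, `qlen`), and the definition of coverage as alignment length divided by query length are assumptions, since the README does not state how coverage is computed.

```python
from typing import Dict, List

# Values from the table above; the structure itself is illustrative.
THRESHOLDS = {
    "general": {"min_identity": 25.0, "min_coverage": 35.0},
    "azo_reductase": {"min_identity": 30.0, "min_coverage": 50.0},
}


def query_coverage(hit: Dict[str, float]) -> float:
    """Percent of the query covered by the alignment (assumed definition)."""
    return 100.0 * hit["length"] / hit["qlen"]


def filter_hits(hits: List[Dict[str, float]], family: str = "general") -> List[Dict[str, float]]:
    """Keep hits that meet both the identity and coverage thresholds."""
    limits = THRESHOLDS[family]
    return [
        h for h in hits
        if h["pident"] >= limits["min_identity"]
        and query_coverage(h) >= limits["min_coverage"]
    ]


# Made-up hits, for illustration only.
hits = [
    {"qseqid": "gene_A", "pident": 28.0, "length": 120, "qlen": 300},  # 40% coverage
    {"qseqid": "gene_B", "pident": 35.0, "length": 200, "qlen": 300},  # ~67% coverage
    {"qseqid": "gene_C", "pident": 60.0, "length": 50, "qlen": 300},   # ~17% coverage
]

print([h["qseqid"] for h in filter_hits(hits)])
print([h["qseqid"] for h in filter_hits(hits, family="azo_reductase")])
```

In this toy input, gene_A passes only the general thresholds, gene_B passes both sets, and gene_C fails both because of its short alignment despite high identity.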
- Gapseq model - Draft GEM from Gapseq (standard output)
- Reference model - Related organism (same genus preferred)
- Transcriptomics - RNA-seq counts (required)
- Proteomics - DIA-MS abundance (optional but recommended)
- Copy and edit config.yaml for your organism
- Prepare data files
- Run ajolote_pipeline.py
- Validate with ajolote_analyze.py
See docs/BEST_PRACTICES.md for:
- Parameter tuning
- Organism-specific considerations
- Troubleshooting guide
- Reference model selection
🚀 v1.1.0 FINAL - PHASES 1-9 COMPLETE & VALIDATED ✅✅✅
Last Update: 2026-05-29
All Tests Passing: ✅ Growth validated on ZZ medium (1.7806 h⁻¹)
Production Ready: ✅ MEMOTE validated, annotations enriched
Generalizable: ✅ Scripts and methodology applicable to any organism
Case Study: Shewanella xiamenensis LC6 (WF46 Complete)
- ✅ Gene extension: 769 → 2,099 genes (+173%)
- ✅ Reaction addition: 2,314 → 3,806 reactions (+65%)
- ✅ Genome coverage: 15.3% → 41.8% (+173%)
- ✅ All 12 RT-qPCR targets integrated with real biochemistry
- ✅ DIAMOND BLASTP with MR-1 reference (iLJ1162)
- ✅ 689 homologous reactions mapped + 168 energy reactions validated
- ✅ No energy cycles (stoichiometric consistency verified)
- ✅ Growth validated: 1.7806 h⁻¹ (LB medium)
- ✅ COMPLETE: Model configured for ZZ medium (5g/L yeast extract)
- ✅ 18 amino acids (Glu, Ala, Asp, Gly, Leu, Lys, Val, Ser, Ile, Thr, Pro, Arg, Phe, Tyr, His, Met, Cys, Trp)
- ✅ 6 vitamins (Niacin, Pantothenate, Pyridoxine, Riboflavin, Thiamin, Folate)
- ✅ 7 minerals (Mg, Ca, Zn, Fe2+, Fe3+, Cu, Mn)
- ✅ Growth rate: 1.7806 h⁻¹ (optimal)
- ✅ COMPLETE: MEMOTE snapshot executed
- ✅ Annotation enrichment: MetanetX, ModelSEED, eggNOG, UniProt integrated
- ✅ Gene annotations: 100% with locus tags
- ✅ Metabolite cross-references: InChI keys, PubChem, KEGG
- ✅ Stoichiometric validation: 91.9% balanced reactions
- ✅ Production model: LC6_FULLY_ANNOTATED.xml (3,806 reactions)
Model File: LC6_WITH_ENERGY_REACTIONS_INTEGRATED.xml (3,806 reactions)
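A figure such as the percentage of balanced reactions reported above can be computed by checking element conservation for each reaction. The sketch below is a simplified, standard-library illustration with a toy two-reaction model; it ignores charge, and the function names are hypothetical. In COBRApy, a comparable per-reaction check is available through `Reaction.check_mass_balance()`.

```python
import re
from collections import defaultdict
from typing import Dict


def parse_formula(formula: str) -> Dict[str, int]:
    """Count atoms per element in a simple formula such as 'C6H12O6'."""
    counts: Dict[str, int] = defaultdict(int)
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return dict(counts)


def element_imbalance(stoichiometry: Dict[str, float],
                      formulas: Dict[str, str]) -> Dict[str, float]:
    """Return the net atoms per element; an empty dict means balanced."""
    net: Dict[str, float] = defaultdict(float)
    for metabolite, coefficient in stoichiometry.items():
        for element, count in parse_formula(formulas[metabolite]).items():
            net[element] += coefficient * count
    return {el: value for el, value in net.items() if abs(value) > 1e-9}


def balanced_fraction(reactions: Dict[str, Dict[str, float]],
                      formulas: Dict[str, str]) -> float:
    """Fraction of reactions with no element imbalance."""
    balanced = sum(1 for s in reactions.values() if not element_imbalance(s, formulas))
    return balanced / len(reactions)


# Toy data: a hexokinase-like reaction and a deliberately incomplete one.
formulas = {
    "glc": "C6H12O6", "atp": "C10H12N5O13P3", "adp": "C10H12N5O10P2",
    "g6p": "C6H11O9P", "h": "H",
}
reactions = {
    "HEX1": {"glc": -1, "atp": -1, "g6p": 1, "adp": 1, "h": 1},
    "BROKEN": {"glc": -1, "g6p": 1},  # phosphate donor missing
}

print(element_imbalance(reactions["BROKEN"], formulas))
print(f"{100 * balanced_fraction(reactions, formulas):.1f}% balanced")  # 50.0% balanced
```

The imbalance dictionary shows which elements are gained or lost, which helps locate the missing metabolite when a copied reaction does not balance.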
Generalizable To: Any organism with:
- Gapseq-generated model
- Reference model from related species
- Multi-omic data (transcriptomics ± proteomics)
Complete working example included:
- examples/LC6_PRODUCTION_READY_ALL_12_RTQPCR_COMPLETE.xml - Production model
- examples/phase2a_*.py - Tested, reusable scripts
- examples/README.md - Step-by-step instructions
Adapt these scripts for your organism.
Q: Is AJOLOTE specific to Shewanella? A: No! While developed on S. xiamenensis LC6, AJOLOTE is organism-agnostic. Any organism + reference model + multi-omic data works.
Q: What reference models can I use? A: Any related organism (same genus preferred). Examples:
- For Shewanella: Use S. oneidensis iLJ1162
- For other genera: Use closest relative with published model
- More distant = less confident predictions (adjust parameters)
Q: What's the minimum data needed? A: At minimum, a Gapseq model, a reference model, and transcriptomics. Adding proteomics is recommended for validation.
Q: How long does curation take? A: About 30-60 minutes on a modern laptop, depending on genome size.
Q: Can I use it for other dyes/colorants? A: Yes! AJOLOTE is not limited to dyes. It works for any metabolic activity validated by:
- RNA-seq (gene expression)
- Proteomics (protein abundance)
- RT-qPCR targets
AJOLOTE is actively maintained. Contributing:
# Install dev dependencies
pip install -e ".[dev]"
# Run tests
pytest -v tests/
# Format code
black ajolote/
# Lint
flake8 ajolote/

See CONTRIBUTING.md for guidelines.
If you use AJOLOTE, please cite:
@software{ajolote2026,
title={AJOLOTE: Automated Gene-curation for ORganism Metabolic Enhancement},
author={Ramírez, Jefferson},
year={2026},
institution={Laboratorio de Microbiología Molecular y Biotecnología, UNMSM},
url={https://github.com/jramirezgen/AJOLOTE},
note={Tested on Shewanella xiamenensis with multi-omic integration}
}

Or use CITATION.cff.
Jefferson Ramírez
- Email: jefferson.ramirez1@unmsm.edu.pe / jramirezgen@gmail.com
- Institution: Laboratorio de Microbiología Molecular y Biotecnología
- University: Universidad Nacional Mayor de San Marcos (UNMSM), Peru
- COBRApy - Python FBA framework
- Gapseq - Automated GEM reconstruction
- MEMOTE - Model validation
- BioPython - Sequence analysis
MIT License - See LICENSE for details
- 📧 Email: jefferson.ramirez1@unmsm.edu.pe or jramirezgen@gmail.com
- 📍 Lab: Laboratorio de Microbiología Molecular y Biotecnología, UNMSM, Peru
- 🐛 Issues: GitHub Issues
- 💬 Discussion: GitHub Discussions
Developed at UNMSM's Laboratorio de Microbiología Molecular y Biotecnología with support from the transcriptomics, proteomics, and metabolomics research teams.
Built with passion for metabolic modeling. 🦎🧬
"Transforming Gapseq outputs into production-ready metabolic models through rigorous curation, multi-omic integration, and complete knowledge preservation."
© 2026 Jefferson Ramírez. MIT Licensed.