LLM-assisted curation system for media ingredient ontology mappings.
MediaIngredientMech provides a structured workflow for curating media ingredient ontology mappings with full audit trails. It manages 995 mapped and 136 unmapped ingredients aggregated from 10,657 media recipes in CultureMech.
Key Features:
- Ingredient-centric data model with LinkML schemas
- Interactive CLI for ontology mapping curation
- LLM assistance tracking in curation events
- Comprehensive validation (schema + ontology terms)
- Full audit trail for all curation actions
- YAML-based data storage with version control
# Clone repository
cd /Users/marcin/Documents/VIMSS/ontology/KG-Hub/KG-Microbe/MediaIngredientMech
# Install with dev dependencies
just install
# Generate LinkML dataclasses
just gen-schema
# Import from CultureMech (995 mapped + 136 unmapped = 1,131 total)
just import-data
# Validate imported data
just validate-all
# Create snapshot before curation
just snapshot
# Launch interactive curation CLI
just curate
# Generate progress report
just report
Data Sources:
- CultureMech/output/mapped_ingredients.yaml → 995 mapped ingredients
- CultureMech/output/unmapped_ingredients.yaml → 136 unmapped ingredients
Schema:
- IngredientRecord: Root class with mapping status, synonyms, curation history
- OntologyMapping: CHEBI/FOODON term mappings with quality ratings
- CurationEvent: Audit trail with LLM assistance tracking
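To make the relationship between the three classes concrete, here is a minimal sketch using plain Python dataclasses. It is not the generated LinkML code: every field name (`term_id`, `quality`, `llm_assisted`, and so on), the `MappingStatus` enum, and the `record_mapping` helper are assumptions for illustration; the real definitions live in the LinkML schema.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class MappingStatus(Enum):
    # Hypothetical status values; the real schema may define more.
    MAPPED = "MAPPED"
    UNMAPPED = "UNMAPPED"


@dataclass
class OntologyMapping:
    term_id: str      # CURIE such as "CHEBI:26710"
    term_label: str   # human-readable label for the term
    quality: str      # hypothetical quality rating label


@dataclass
class CurationEvent:
    timestamp: str    # ISO 8601, UTC
    curator: str
    action: str
    llm_assisted: bool = False


@dataclass
class IngredientRecord:
    name: str
    status: MappingStatus = MappingStatus.UNMAPPED
    synonyms: list[str] = field(default_factory=list)
    mapping: OntologyMapping | None = None
    curation_history: list[CurationEvent] = field(default_factory=list)


def record_mapping(record, mapping, curator, llm_assisted=False):
    """Attach a mapping and append a matching audit-trail event."""
    record.mapping = mapping
    record.status = MappingStatus.MAPPED
    record.curation_history.append(
        CurationEvent(
            timestamp=datetime.now(timezone.utc).isoformat(),
            curator=curator,
            action="map",
            llm_assisted=llm_assisted,
        )
    )
    return record
```

Keeping the event list on the record itself means each ingredient carries its own audit trail, which fits the ingredient-centric model described above.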
Workflow:
- Import data from CultureMech
- Curate unmapped ingredients (sorted by occurrence count)
- Validate ontology terms via OAK/OLS
- Record curation events with timestamps
- Export validated mappings back to CultureMech
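The curation and validation steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the dictionary keys (`name`, `occurrences`, `term_id`) are hypothetical, and the CURIE check is a purely syntactic stand-in for the real term lookup via OAK/OLS, which this sketch does not perform.

```python
import re

# Syntactic check only: prefix plus numeric local ID.
# It does not confirm that the term exists in CHEBI or FOODON.
CURIE_PATTERN = re.compile(r"^(CHEBI|FOODON):\d+$")


def curation_queue(ingredients):
    """Return unmapped ingredients, most frequently used first."""
    unmapped = [i for i in ingredients if i.get("term_id") is None]
    return sorted(unmapped, key=lambda i: i["occurrences"], reverse=True)


def is_well_formed_curie(term_id):
    """True if term_id looks like a CHEBI or FOODON CURIE."""
    return bool(CURIE_PATTERN.match(term_id))


if __name__ == "__main__":
    # Hypothetical sample data for demonstration.
    sample = [
        {"name": "yeast extract", "occurrences": 4200, "term_id": None},
        {"name": "NaCl", "occurrences": 9000, "term_id": "CHEBI:26710"},
        {"name": "trace element solution", "occurrences": 310, "term_id": None},
    ]
    for item in curation_queue(sample):
        print(item["name"], item["occurrences"])
```

Sorting by occurrence count means each curated ingredient resolves the largest possible number of recipe entries first.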
MediaIngredientMech/
├── src/mediaingredientmech/
│ ├── schema/ # LinkML schemas
│ ├── curation/ # Core curation logic
│ ├── validation/ # Schema & ontology validators
│ ├── export/ # Report generation
│ └── utils/ # YAML I/O, ontology client
├── data/
│ ├── curated/ # Working data (version controlled)
│ └── snapshots/ # Timestamped backups (excluded from git)
├── scripts/ # CLI tools
├── tests/ # Test suite
└── docs/ # Documentation
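The split between `data/curated/` and `data/snapshots/` suggests a simple copy-to-timestamped-directory backup. The sketch below shows one way that could work; it is an assumption about the approach, not the actual implementation behind `just snapshot`, and the `create_snapshot` function and its directory naming format are hypothetical.

```python
import shutil
from datetime import datetime, timezone
from pathlib import Path


def create_snapshot(curated_dir, snapshots_dir, now=None):
    """Copy curated_dir into a new UTC-timestamped folder under snapshots_dir."""
    now = now or datetime.now(timezone.utc)
    target = Path(snapshots_dir) / now.strftime("%Y%m%dT%H%M%SZ")
    # copytree creates missing parent directories and raises
    # FileExistsError if the target already exists, so an existing
    # snapshot is never silently overwritten.
    shutil.copytree(curated_dir, target)
    return target
```

For example, `create_snapshot("data/curated", "data/snapshots")` would produce a directory named after the current UTC time, such as `data/snapshots/20240101T120000Z`.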
- Curation Guide - Step-by-step curation workflow
- Schema Reference - Data model documentation
- Workflows - Common operations and integration
# Run tests with coverage
just test-cov
# Format code
just format
# Lint code
just lint
# Run all quality checks
just check
License: CC0-1.0 - Public Domain Dedication