Add query headers, parameterization, and dynamic species/lipid filters #27

Open
marvinm2 wants to merge 34 commits into master from pr/query-enrichment

marvinm2 (Collaborator) commented Mar 9, 2026

Summary

  • Header enrichment: Added title, category, and description headers to all .rq query files following standardized conventions
  • Header validation: Test suite and CI lint script to enforce required headers on all queries
  • Species parameterization: Dynamic {{species}} autocomplete replaces hardcoded organism values across Lipids, Collaborations, Metadata, Datadump, and DSMN queries
  • Pathway & protein ID parameterization: Added {{pathwayId}} and {{proteinId}} params to D. General, H. Chemistry, E. Literature, and J. Authors queries
  • Lipid class dropdown: Added {{lipidClass}} parameter to Lipids community queries
  • Housekeeping: Added .gitignore, removed .planning/ from tracking

Test plan

  • Verify all .rq files have required headers (title, category, description)
  • Run python scripts/lint_headers.py to confirm CI lint passes
  • Run pytest tests/ for header validation tests
  • Load queries in Snorql UI and verify parameterized dropdowns render
  • Test species autocomplete returns results from SPARQL endpoint
  • Confirm .ttl-to-.rq extraction still works: python scripts/transformDotTtlToDotSparql.py

marvinm2 added 30 commits March 6, 2026 18:56
- Create categories.json mapping 24 query directories to 11 categories
- Split datasources/ into dedicated "Data Sources" category
- Add pytest suite validating vocabulary coverage and structure
- Document field order, format rules, and SNORQL parser behavior
- Cover multi-line description format (repeated prefix, not bare continuation)
- Include TTL metadata mapping reference and 3 complete examples
- Add extract_header() to read leading comment block from .rq files
- Add process_ttl_file() for importable per-file processing
- Guard against empty SPARQL output (skip write, print warning)
- Wrap glob/loop in __main__ block for clean imports
- All 6 unit tests pass, zero regression on existing .rq files
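The commit above mentions reading the leading comment block from .rq files. As a rough illustration of that idea, and not the PR's actual extract_header() (whose signature and return type are not shown here), a minimal sketch might look like the following. The function name leading_comment_block and the sample query are hypothetical.

```python
def leading_comment_block(query_text: str) -> list[str]:
    """Return the run of '#' lines at the very top of a query, in order.

    Hypothetical helper: it works on a string rather than a file path,
    which is an assumption made to keep the sketch self-contained.
    """
    header = []
    for line in query_text.splitlines():
        if not line.startswith("#"):
            break  # the first blank or SPARQL line ends the header
        header.append(line)
    return header


# Illustrative query text, not taken from the repository.
sample = (
    "# title: Example Query\n"
    "# category: Metadata\n"
    "\n"
    "SELECT * WHERE { ?s ?p ?o } LIMIT 1\n"
)
header = leading_comment_block(sample)
print(header)
```

Stopping at the first non-comment line means inline comments further down the query body are never mistaken for header fields.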
- test_all_rq_have_title: parametrized over 90 files (RED - no titles yet)
- test_all_rq_have_valid_category: validates against categories.json (RED)
- test_titles_are_unique: ensures no duplicate titles across files
- test_header_field_order: title must appear before category
- test_blank_line_separator: blank line required after structured headers
- SUMMARY.md with execution results and deviation log
- STATE.md updated to phase 2 position
- ROADMAP.md and REQUIREMENTS.md progress updated
- Add # title: and # category: headers to all 29 .rq files
- Categories: Metadata (root, datacounts, species), Data Sources (datasources)
- Remove old-style comments from datasources files, preserving query body
- Titles derived from query purpose in title case
- Add headers to 7 Collaborations queries (AOP-Wiki, MetaNetX, MolMeDB, neXtProt, Rhea/IDSM)
- Add headers to 4 General queries (genes, interactions, metabolites, ontology)
- Add headers to 5 Literature queries (PubMed references, interaction refs)
- Add headers to 3 Data Export queries (CyTargetLinker, ontology dump, species dump)
- Add # title: and # category: headers to all 25 .rq files
- All files categorized as Communities per categories.json
- Disambiguate duplicate filenames (allPathways, allProteins) with community name
- Remove old-style comments from files with existing headers
- Add headers to 7 Curation queries (metabolite/pathway quality checks)
- Add headers to 2 Chemistry queries (IDSM similarity, SMILES)
- Add headers to 4 DSMN queries (directed metabolic reactions network)
- Add headers to 4 Authors queries (contributors, first authors)
- All 183 tests pass GREEN, zero duplicate titles across 90 files
- Add execution summary for 54 files enriched with title/category headers
- Update STATE.md with plan progress and decisions
- Update ROADMAP.md phase progress
- SUMMARY.md documents 36 files enriched across 8 directories
- All 90 .rq files now have title and category headers
- 183 tests pass GREEN, META-01 and META-02 complete
- Add test_all_rq_have_description parametrized across 90 .rq files
- Update test_header_field_order to enforce category-before-description ordering
- Add SUMMARY.md documenting test additions and CI verification
- Update STATE.md position to phase 3, plan 1 complete
- Update ROADMAP.md and REQUIREMENTS.md progress
…ts queries

- Add descriptions to 4 root files (authors, linksets, metadata, prefixes)
- Add descriptions to 13 datacounts files (averageX, countX, linkoutCounts)
- Each near-duplicate query specifies its unique entity type
…pecies queries

- Add descriptions to 6 datasource files specifying each external database
- Add descriptions to 6 species files specifying each entity type
- All 29 A. Metadata files now have description headers
- Add descriptions to 4 General, 5 Literature, 3 Data Export, 7 Curation queries
- CyTargetLinker query describes Cytoscape app context
- Curation queries explain specific data quality issues detected
- Literature queries differentiated by scope (all refs, interaction refs, specific interaction)
- Add descriptions to 2 Chemistry, 4 DSMN, 4 Authors queries
- IDSM similarity search describes federation with IDSM/ChEBI and notes performance impact
- DSMN queries contextualized within directed small molecules network workflow
- Authors queries differentiated by scope (single pathway, first authors, all contributors)
- Summary for 29 files enriched with description headers
- STATE.md updated to plan 2 of 4 in phase 3
- ROADMAP.md progress updated
- Add # description: headers to all 7 C. Collaborations .rq files
- All 7 are federated queries with SERVICE clauses
- Each description names the external service (AOP-Wiki, MetaNetX, MolMeDB, neXtProt, IDSM)
- Each notes potential performance impact from external endpoint dependency
- MolMeDB pair differentiated: one finds pathways for a compound, other checks pathway subset
- neXtProt pair differentiated: cellular location vs mitochondrial protein focus
- SUMMARY.md with 2 task commits documented
- STATE.md updated to plan 4 of 4 with progress at 89%
- ROADMAP.md updated with phase 3 plan progress
- Add SUMMARY.md for plan 03-03
- Update STATE.md with metrics and decisions
- Update ROADMAP.md with phase 3 progress
- Create scripts/lint_headers.py checking title, category, description
- Add lint step to GitHub Actions workflow after TTL extraction
- All 90 .rq files pass validation
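For readers who want a feel for what a header lint of this kind involves, the sketch below checks for the three required fields named above. It is not the contents of scripts/lint_headers.py; the function names, the header-parsing rule, and the return shapes are assumptions.

```python
from pathlib import Path

# The three header fields this PR requires on every query.
REQUIRED = ("title", "category", "description")


def missing_fields(query_text: str) -> list[str]:
    """List the required fields absent from the leading comment block."""
    present = set()
    for line in query_text.splitlines():
        if not line.startswith("#"):
            break  # header ends at the first non-comment line
        name, sep, _value = line.lstrip("#").partition(":")
        if sep:
            present.add(name.strip())
    return [field for field in REQUIRED if field not in present]


def lint_tree(root: Path) -> dict[str, list[str]]:
    """Map each offending .rq file under root to its missing fields."""
    problems = {}
    for path in sorted(root.rglob("*.rq")):
        missing = missing_fields(path.read_text(encoding="utf-8"))
        if missing:
            problems[str(path)] = missing
    return problems


print(missing_fields("# title: A\n# category: B\n\nSELECT * WHERE { ?s ?p ?o }"))
```

A CI step built on this could call lint_tree(Path(".")) and exit non-zero whenever the returned mapping is non-empty.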
…mistry queries

- Add # param: pathwayId headers to 5 query files
- Replace hardcoded WP IDs with {{pathwayId}} placeholders
- Remove #Replace inline hints from 3 D. General files
- Replace $species with {{species}} in Example 3
- Remove XSD type cast from placeholder
- Update example title from Phase 4 preview to Phase 4
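The {{pathwayId}} placeholders are filled in by the Snorql UI, whose implementation is not part of this PR. Purely to illustrate the substitution idea, the sketch below replaces {{name}} tokens in a query string. The function name, the sample query, and the identifier value WP0000 are hypothetical.

```python
import re

# Matches {{name}} tokens such as {{pathwayId}} or {{species}}.
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def fill_placeholders(query_text: str, values: dict[str, str]) -> str:
    """Replace each {{name}} with values[name]; fail loudly if one is missing."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value supplied for placeholder {name!r}")
        return values[name]

    return PLACEHOLDER.sub(substitute, query_text)


# Illustrative template; the placeholder sits in a plain string literal,
# with no XSD type cast attached.
template = 'SELECT ?pathway WHERE { ?pathway dcterms:identifier "{{pathwayId}}" . }'
filled = fill_placeholders(template, {"pathwayId": "WP0000"})
print(filled)
```

Raising on an unknown placeholder, rather than leaving it in place, is a design choice of this sketch: it keeps a half-filled query from being sent to the endpoint.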
…DSMN queries

- Add species enum param header with 38 organisms to 5 query files
- Replace hardcoded species names with {{species}} placeholder
- Remove #Replace inline hint from PWsforSpecies.rq
…ture and J. Authors queries

- Add # param: pathwayId headers to 4 query files
- Add # param: proteinId header to referencesForSpecificInteraction.rq
- Replace hardcoded WP/UniProt IDs with {{placeholder}} syntax
- Preserve #filter inline comments in referencesForInteraction.rq
- Add species enum param header with 38 organisms to 3 Lipids query files
- Replace hardcoded Homo sapiens with {{species}} placeholder
- Preserve #Filter inline hints per design decision
- Add 04-01-SUMMARY.md with execution results
- Update STATE.md position to Phase 4 Plan 1
- Update ROADMAP.md progress
marvinm2 added 4 commits March 8, 2026 12:56
- Add 04-03-SUMMARY.md with execution results
- Update STATE.md position and progress
- Update ROADMAP.md plan progress
- Mark PARAM-02 and PARAM-03 requirements complete
…on params

- Replace hardcoded 38-species enum with string type (8 queries)
  Species are now fetched dynamically from the endpoint
- Add lipidClass enum param to 2 Lipids queries (FA/GL/GP/SP/ST/PR/SL/PK)
- Add species param to 3 Collaboration queries (AOP-Wiki, MolMeDB)
  replacing hardcoded "Homo sapiens"
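The lipidClass parameter is an enum over the eight codes listed above. How the Snorql UI enforces that enum is not shown in this PR, so the following is only a sketch of the kind of check an enum implies. The function name and the case normalization are assumptions.

```python
# The eight lipid class codes named in the commit message.
LIPID_CLASSES = ("FA", "GL", "GP", "SP", "ST", "PR", "SL", "PK")


def check_lipid_class(value: str) -> str:
    """Return the normalized code, or raise if it is not an allowed class.

    Trimming whitespace and upper-casing are assumptions of this sketch,
    not documented behavior of the UI.
    """
    code = value.strip().upper()
    if code not in LIPID_CLASSES:
        allowed = ", ".join(LIPID_CLASSES)
        raise ValueError(f"unknown lipid class {value!r}; expected one of: {allowed}")
    return code


print(check_lipid_class(" gp "))
```

The species parameter needs no such check, because it is now a free string whose suggestions come from the endpoint.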
marvinm2 requested a review from DeniseSl22 on March 12, 2026 15:19