Add metadata-driven Croissant parsers for HGNC and BindingDB#386

Draft
cbizon wants to merge 6 commits into master from croissant_parser

cbizon commented Apr 2, 2026

Summary

  • add a metadata-driven parser framework that combines Croissant metadata with ORION parser specs
  • integrate HGNC as the default metadata-driven parser and add BindingDB as a metadata-driven Croissant loader
  • fix live BindingDB parsing issues so legacy and metadata-driven parsers match on the 202603 archive
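To make the idea of a metadata-driven parser concrete, here is a minimal sketch of a declarative spec turning tabular rows into edges. It is not the framework added in this PR: the spec layout, `EXAMPLE_SPEC`, `rows_to_edges`, `parse_tsv`, the column names, and the `EX.A:`/`EX.B:` prefixes are all hypothetical, and no Croissant metadata is read here.

```python
import csv
import io

# Hypothetical spec: every key and value below is illustrative only,
# not the ORION parser spec or Croissant schema used in this PR.
EXAMPLE_SPEC = {
    "subject": {"column": "subject_id", "prefix": "EX.A:"},
    "object": {"column": "object_id", "prefix": "EX.B:"},
    "predicate": "biolink:related_to",
}


def rows_to_edges(rows, spec):
    """Yield one edge dict per row that has both a subject and an object value."""
    for row in rows:
        subject_value = (row.get(spec["subject"]["column"]) or "").strip()
        object_value = (row.get(spec["object"]["column"]) or "").strip()
        if not subject_value or not object_value:
            continue  # skip rows missing either endpoint
        yield {
            "subject": spec["subject"]["prefix"] + subject_value,
            "predicate": spec["predicate"],
            "object": spec["object"]["prefix"] + object_value,
        }


def parse_tsv(text, spec):
    """Parse tab-separated text with a header row into a list of edges."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return list(rows_to_edges(reader, spec))
```

In a sketch like this the parsing code stays generic and only the spec changes per source, which is the trade-off being evaluated: anything beyond simple column mapping has to be expressed in the spec instead.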

Validation

  • uv run pytest tests/test_metadata_driven_parser.py
  • uv run pytest tests/test_metadata_driven_parser.py tests/test_source_metadata.py
  • ran legacy and metadata-driven HGNC on the live HGNC dataset; both produced 29,405 nodes and 34,075 edges
  • ran legacy and metadata-driven BindingDB on the same BindingDB_All_202603_tsv.zip archive; both produced 1,200,861 nodes and 1,892,606 edges with identical node and edge SHA-256 hashes
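For anyone wanting to repeat the hash comparison, a check along these lines would work. This is a sketch, not the script that produced the report: `sha256_of_file`, `compare_outputs`, and the `nodes.jsonl`/`edges.jsonl` file names are assumptions, and it only reports a match when the two parsers write byte-identical files.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_outputs(legacy_dir, metadata_dir, filenames=("nodes.jsonl", "edges.jsonl")):
    """Hash the same-named output files in two run directories and compare them.

    The default file names are assumed for illustration; pass the real ones.
    """
    report = {}
    for name in filenames:
        legacy_hash = sha256_of_file(Path(legacy_dir) / name)
        metadata_hash = sha256_of_file(Path(metadata_dir) / name)
        report[name] = {
            "legacy": legacy_hash,
            "metadata_driven": metadata_hash,
            "match": legacy_hash == metadata_hash,
        }
    return report
```

Calling `compare_outputs(legacy_run_dir, metadata_run_dir)` returns a dict that can be dumped to JSON as a small comparison report.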

Notes

  • BindingDB comparison report: /tmp/orion_parser_runs_20260402/bindingdb_compare_202603/report.json
  • HGNC run artifacts: /tmp/orion_parser_runs_20260402/hgnc_shared

github-actions bot added the "Biological Context QC" label (Require validation of biological context to ensure accuracy and consistency) on Apr 2, 2026

cbizon commented Apr 2, 2026

Check it out @EvanDietzMorris

I think the current metadata for going from data to graph highlights the concern that the metadata will need to be just as complicated as code. So I'm not 100% sure it's a win, but maybe that can be fixed.
