[BUG] fix load_from_rcsb to return MoleculeLoader instead of raw Structure #395
kunal14901 wants to merge 5 commits into gc-os-ai:main
Pull request overview
Fixes load_from_rcsb() to return a MoleculeLoader (consistent with other dataset loaders) so callers can use loader APIs like .to_df_seq() on downloaded RCSB structures.
Changes:
- Update `load_from_rcsb()` to return `MoleculeLoader` instead of a raw BioPython `Structure`.
- Add `.ent` → `.pdb` renaming so `MoleculeLoader` can recognize the downloaded file type.
- Update the online loader test to expect a `MoleculeLoader`.
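The `.ent` → `.pdb` rename in the second bullet can be sketched with `pathlib` alone. This is a minimal illustration, not the code from this PR: the helper name `ensure_pdb_extension` is hypothetical, and it assumes the downloaded file sits on local disk under a `.ent` suffix.

```python
from pathlib import Path


def ensure_pdb_extension(path):
    """Return a ``.pdb`` path for ``path``, renaming a ``.ent`` file if needed.

    Hypothetical helper for illustration; not part of pyaptamer.
    """
    path = Path(path)
    if path.suffix.lower() != ".ent":
        # Nothing to do for files that do not use the .ent suffix.
        return path
    target = path.with_suffix(".pdb")
    # Path.replace overwrites an existing target, so a fresh download wins
    # over a stale .pdb file left behind by an earlier call.
    path.replace(target)
    return target
```

Whether overwriting an existing `.pdb` file is the right behavior probably depends on how the `overwrite` flag of `load_from_rcsb()` is meant to work; the sketch simply picks one option.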
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `pyaptamer/datasets/_loaders/_online_databank.py` | Wrap RCSB-downloaded file in `MoleculeLoader` and rename `.ent` to `.pdb`. |
| `pyaptamer/datasets/tests/test_online_loader.py` | Adjust assertion to check `load_from_rcsb()` returns `MoleculeLoader`. |
    loader = load_from_rcsb(pdb_id)
    assert isinstance(loader, MoleculeLoader), (
        f"Expected a MoleculeLoader, got {type(loader)}"
    )
    from pathlib import Path

    from Bio.PDB import PDBList

    from pyaptamer.utils import pdb_to_struct
    from pyaptamer.data.loader import MoleculeLoader
satvshr left a comment
Thanks for the PR! Left some comments
    def load_from_rcsb(pdb_id, overwrite=False):
        """
        Download a PDB file from the RCSB Protein Data Bank and parse it into a `Structure`.
        Download a PDB file from the RCSB Protein Data Bank and load it as a MoleculeLoader.

Kindly use backticks here. Also, `MoleculeLoader` is not a loader; it is our in-memory data format, so please change every instance where you've described it as a loader.
    pdbl = PDBList()
    pdb_file_path = pdbl.retrieve_pdb_file(
        pdb_id, file_format="pdb", overwrite=overwrite
    pdb_file_path = Path(

Hey @satvshr, thanks for the review. I made all the changes and updated the test too.
    def load_from_rcsb(pdb_id, overwrite=False):
        """
        Download a PDB file from the RCSB Protein Data Bank and parse it into a `Structure`.

Let it be: "parse it into a `MoleculeLoader`". Make no other change.
    mol : MoleculeLoader
        A `MoleculeLoader` object for the downloaded structure.

Why `mol : MoleculeLoader`? Just keep it `MoleculeLoader`.
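Taken together, the two docstring comments above suggest wording along the following lines. This is only a sketch: the function name `load_from_rcsb_stub` is hypothetical, the body is a placeholder, and only the summary line and the return entry quoted in this thread are shown, since the rest of the real docstring is not visible here.

```python
def load_from_rcsb_stub(pdb_id, overwrite=False):
    """
    Download a PDB file from the RCSB Protein Data Bank and parse it into a
    `MoleculeLoader`.

    Returns
    -------
    MoleculeLoader
        A `MoleculeLoader` object for the downloaded structure.
    """
    # Documentation sketch only; the real function downloads and wraps the file.
    raise NotImplementedError("documentation sketch only")
```

The return entry gives the type alone, without a variable name, which is what the second comment asks for.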
Add a test to ensure the PDB file is downloaded to the desired path.
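One possible shape for such a test is sketched below. It is an assumption-heavy illustration rather than the test added in this PR: `fake_retrieve` is a hypothetical stand-in for the network download, `check_download_location` is a hypothetical helper, and the `pdb<id>.ent` file name mirrors what BioPython is understood to produce for `file_format="pdb"`.

```python
import tempfile
from pathlib import Path


def fake_retrieve(pdb_id, pdir):
    """Hypothetical stand-in for the network download; writes pdb<id>.ent."""
    pdir = Path(pdir)
    pdir.mkdir(parents=True, exist_ok=True)
    path = pdir / f"pdb{pdb_id.lower()}.ent"
    path.write_text("HEADER    PLACEHOLDER\n")
    return str(path)


def check_download_location(retrieve, pdb_id, pdir):
    """Assert that ``retrieve`` wrote a non-empty file directly inside ``pdir``."""
    returned = Path(retrieve(pdb_id, pdir))
    assert returned.exists(), f"{returned} was not created"
    assert returned.parent.resolve() == Path(pdir).resolve()
    assert returned.stat().st_size > 0
    return returned


# Run the check against the stand-in downloader in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    saved = check_download_location(fake_retrieve, "1GNH", tmp)
    saved_name = saved.name
```

In a real test the `retrieve` argument could wrap `PDBList().retrieve_pdb_file(pdb_id, pdir=..., file_format="pdb")`, which needs network access. The snippets in this thread show `load_from_rcsb(pdb_id, overwrite=False)` without a directory parameter, so how the "desired path" is chosen is left open here.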
@satvshr made some changes. Please approve it.

Hi @satvshr, thanks for the approval! Could you please merge it when you get a chance?
Reference Issues/PRs
#393
What does this implement/fix? Explain your changes.
`load_from_rcsb()` was returning a raw BioPython `Structure` object while every other loader (`load_1gnh`, `load_1brq`, `load_5nu7`, `load_pfoa`) returns `MoleculeLoader`. This meant you couldn't call `.to_df_seq()` on proteins loaded from RCSB. Changed it to return `MoleculeLoader` like the rest. Also handles the `.ent` to `.pdb` rename, since BioPython downloads with the `.ent` extension.
What should a reviewer concentrate their feedback on?
The `.ent` to `.pdb` rename; open to a better approach if there is one.
Did you add any tests for the change?
Updated `test_online_loader.py` to check for `MoleculeLoader` instead of `Structure`.
Any other comments?
None