Skip to content

[BUG] fix load_from_rcsb to return MoleculeLoader instead of raw Structure#395

Open
kunal14901 wants to merge 5 commits intogc-os-ai:mainfrom
kunal14901:fix/load-from-rcsb-return-moleculeloader
Open

[BUG] fix load_from_rcsb to return MoleculeLoader instead of raw Structure#395
kunal14901 wants to merge 5 commits intogc-os-ai:mainfrom
kunal14901:fix/load-from-rcsb-return-moleculeloader

Conversation

@kunal14901
Copy link
Copy Markdown

Reference Issues/PRs
#393

What does this implement/fix? Explain your changes.
load_from_rcsb() was returning a raw BioPython Structure object while every other loader (load_1gnh, load_1brq, load_5nu7, load_pfoa) returns MoleculeLoader. This meant you couldn't call .to_df_seq() on proteins loaded from RCSB. Changed it to return MoleculeLoader like the rest. Also handles the .ent to .pdb rename since BioPython downloads with .ent extension.

What should a reviewer concentrate their feedback on?
The .ent to .pdb rename — open to a better approach if there is one.

Did you add any tests for the change?
Updated test_online_loader.py to check for MoleculeLoader instead of Structure.

Any other comments?
None

  • The PR title starts with either [ENH], [MNT], [DOC], or [BUG]. [BUG] - bugfix, [MNT] - CI, test framework, [ENH] - adding or improving code, [DOC] - writing or improving documentation or docstrings.
  • Added/modified tests
  • Used pre-commit hooks when committing to ensure that code is compliant with hooks. Install hooks with pre-commit install.

Copilot AI review requested due to automatic review settings April 12, 2026 22:29
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes load_from_rcsb() to return a MoleculeLoader (consistent with other dataset loaders) so callers can use loader APIs like .to_df_seq() on downloaded RCSB structures.

Changes:

  • Update load_from_rcsb() to return MoleculeLoader instead of a raw BioPython Structure.
  • Add .ent.pdb renaming so MoleculeLoader can recognize the downloaded file type.
  • Update the online loader test to expect a MoleculeLoader.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
pyaptamer/datasets/_loaders/_online_databank.py Wrap RCSB-downloaded file in MoleculeLoader and rename .ent to .pdb.
pyaptamer/datasets/tests/test_online_loader.py Adjust assertion to check load_from_rcsb() returns MoleculeLoader.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread pyaptamer/datasets/_loaders/_online_databank.py
loader = load_from_rcsb(pdb_id)
assert isinstance(loader, MoleculeLoader), (
f"Expected a MoleculeLoader, got {type(loader)}"
)
Comment on lines +4 to +8
from pathlib import Path

from Bio.PDB import PDBList

from pyaptamer.utils import pdb_to_struct
from pyaptamer.data.loader import MoleculeLoader
Copy link
Copy Markdown
Collaborator

@satvshr satvshr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! Left some comments

def load_from_rcsb(pdb_id, overwrite=False):
"""
Download a PDB file from the RCSB Protein Data Bank and parse it into a `Structure`.
Download a PDB file from the RCSB Protein Data Bank and load it as a MoleculeLoader.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kindly use backticks here, also MoleculeLoader is not a loader, it is our in-memory data format, so please change all instances where you've mentioned it as a loader

pdbl = PDBList()
pdb_file_path = pdbl.retrieve_pdb_file(
pdb_id, file_format="pdb", overwrite=overwrite
pdb_file_path = Path(
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why add Path?

@kunal14901
Copy link
Copy Markdown
Author

Hey @satvshr, Thanks for the review; I made all the changes and updated the test too.
Now also checks that to_df_seq() returns a non-empty DataFrame.


def load_from_rcsb(pdb_id, overwrite=False):
"""
Download a PDB file from the RCSB Protein Data Bank and parse it into a `Structure`.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let it be: parse it into a MoleculeLoader, make no other change

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done; updated in 2ce22bf.

Comment on lines +24 to +25
mol : MoleculeLoader
A `MoleculeLoader` object for the downloaded structure.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why mol : MoleculeLoader? Just keep it MoleculeLoader

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@satvshr Fixed, thanks for catching that.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a test to ensure the pdb file is getting downloaded in the desired path

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @satvshr
Added in 0297336
The test now checks os.path.exists(loader.path) to verify that the file was downloaded.

@kunal14901
Copy link
Copy Markdown
Author

@satvshr made some changes. Please approve it.

Copy link
Copy Markdown
Collaborator

@satvshr satvshr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@kunal14901
Copy link
Copy Markdown
Author

Hi @satvshr, Thanks for the approval! Could you please merge it when you get a chance?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants