
[BUG] fix: rna2vec preserves short sequences as zero rows (closes #363) #375

Open
Blackphoenix-15 wants to merge 2 commits into gc-os-ai:main from Blackphoenix-15:fix/rna2vec-short-sequence-drop-363



@Blackphoenix-15

Reference Issues/PRs

Fixes #363

What does this implement/fix? Explain your changes.

Removed the if any(converted): guard in rna2vec, which silently dropped
sequences shorter than 3 characters. Short sequences now produce an all-zero
(fully padded) row instead, so x_apta, x_prot, and y in APIDataset stay aligned.
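A minimal sketch of the new behavior, for reasoning about it without checking out the branch. It is not the code in _rna.py: the overlapping 3-mer encoding (inferred from the 3-character threshold), the vocabulary and ID scheme (1 to 64, with 0 as padding), and the name rna2vec_sketch are all hypothetical. Only the absence of the guard and the truncate/pad step follow the description above.

```python
import itertools

import numpy as np

# Hypothetical vocabulary (assumption, not the project's): each of the 64 3-mers
# over A/C/G/U gets an ID from 1 to 64; 0 is reserved for padding.
_WORD_TO_ID = {
    "".join(kmer): i + 1
    for i, kmer in enumerate(itertools.product("ACGU", repeat=3))
}


def rna2vec_sketch(sequences, max_sequence_length):
    """Hypothetical stand-in for rna2vec: one fixed-length row per input sequence."""
    rows = []
    for seq in sequences:
        # Sequences shorter than 3 characters yield no 3-mers, so `converted` is empty.
        converted = [
            _WORD_TO_ID[seq[i : i + 3]]
            for i in range(len(seq) - 2)
            if seq[i : i + 3] in _WORD_TO_ID
        ]
        # No `if any(converted):` guard: always truncate, then right-pad with zeros.
        converted = converted[:max_sequence_length]
        converted += [0] * (max_sequence_length - len(converted))
        rows.append(converted)
    # reshape keeps the (n, max_sequence_length) shape even for an empty input list.
    return np.array(rows, dtype=np.int64).reshape(len(rows), max_sequence_length)


out = rna2vec_sketch(["AAAA", "AA", "CCCC"], max_sequence_length=5)
print(out.shape)  # (3, 5)
print(out[1])     # [0 0 0 0 0]
```

Under these assumptions the "AA" input keeps its position as an all-zero row, which matches the (3, 5) shape reported below.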

Before: rna2vec(["AAAA", "AA", "CCCC"], max_sequence_length=5) returned shape (2, 5)
After: rna2vec(["AAAA", "AA", "CCCC"], max_sequence_length=5) returns shape (3, 5)
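The shape difference matters because downstream code pairs encoded rows with labels by position. A toy illustration, in which the per-sequence 3-mer IDs and the labels in y are hypothetical stand-ins for the real encoder output and dataset targets:

```python
# Hypothetical per-sequence 3-mer IDs for ["AAAA", "AA", "CCCC"]; "AA" has no 3-mer.
converted = [[1, 1], [], [22, 22]]
y = [1, 0, 1]  # hypothetical labels, one per input sequence

# Old behavior: the `if any(converted):` guard skips the empty row.
old_rows = [row for row in converted if any(row)]
# New behavior: every sequence keeps a row (padding turns [] into all zeros).
new_rows = list(converted)

print(len(old_rows), len(new_rows), len(y))  # 2 3 3
# zip() stops at the shorter input, so the "CCCC" row gets the label meant for "AA".
print(list(zip(old_rows, y)))  # [([1, 1], 1), ([22, 22], 0)]
```

Nothing raises in the old case; the mismatch only surfaces as wrong labels, which is why the drop was silent.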

What should a reviewer concentrate their feedback on?

The removal of the if any(converted): block in _rna.py, and the truncate/pad
logic that was nested under it and is now dedented so it runs for every sequence.

Did you add any tests for the change?

None.

Any other comments?

PR checklist

  • The PR title starts with either [ENH], [MNT], [DOC], or [BUG]. [BUG] - bugfix, [MNT] - CI, test framework, [ENH] - adding or improving code, [DOC] - writing or improving documentation or docstrings.
  • Used pre-commit hooks when committing to ensure that code is compliant with hooks. Install hooks with pre-commit install.
    To run hooks independently of a commit, execute pre-commit run --all-files



Development

Successfully merging this pull request may close these issues.

[BUG] rna2vec drops short sequences, causing silent dataset misalignment
