
fix(rna2vec): always append padded sequence when all triplets are unknown #373

Open
maniktyagi04 wants to merge 1 commit into gc-os-ai:main from maniktyagi04:fix/rna2vec-silent-drop-unknown-sequences

maniktyagi04 commented Apr 11, 2026

Fixes #372

Summary

Remove the if any(converted): guard in rna2vec. The guard skipped any sequence whose converted indices were all zero, so a sequence made up entirely of unknown triplets was dropped from the output.

With the guard removed, the output always has the same number of rows as the input.

Before:

rna2vec(["NNN", "AAAC"], sequence_type="rna", max_sequence_length=4).shape
# (1, 4)  <- "NNN" was silently dropped

After:

rna2vec(["NNN", "AAAC"], sequence_type="rna", max_sequence_length=4).shape
# (2, 4)  <- all-zero row for "NNN", correct row for "AAAC"
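
For readers without the library source at hand, the effect of the guard can be reproduced with a toy encoder. This is only an illustrative sketch, not the actual rna2vec implementation: the vocabulary TOY_VOCAB, the function encode, and the keep_all_zero_rows flag are hypothetical names, and the only behaviour taken from this PR is that unknown triplets resolve to index 0 and that the old code appended a row only when any(converted) was true.

```python
import numpy as np

# Hypothetical stand-in vocabulary: index 0 is reserved for unknown
# triplets and padding. The library's real mapping is not shown in this PR.
TOY_VOCAB = {"AAA": 1, "AAC": 2}


def encode(sequences, max_len, keep_all_zero_rows):
    """Toy encoder: overlapping triplets -> indices, right-padded with 0."""
    rows = []
    for seq in sequences:
        triplets = [seq[i:i + 3] for i in range(len(seq) - 2)]
        converted = [TOY_VOCAB.get(t, 0) for t in triplets][:max_len]
        converted += [0] * (max_len - len(converted))
        # Old behaviour (keep_all_zero_rows=False): the row is appended only
        # when at least one index is non-zero, so all-unknown input vanishes.
        if keep_all_zero_rows or any(converted):
            rows.append(converted)
    return np.array(rows, dtype=np.int64).reshape(len(rows), max_len)


seqs = ["NNN", "AAAC"]
before = encode(seqs, max_len=4, keep_all_zero_rows=False)
after = encode(seqs, max_len=4, keep_all_zero_rows=True)
print(before.shape)  # (1, 4)
print(after.shape)   # (2, 4)
```

In this sketch the "NNN" input yields a single unknown triplet, which pads out to an all-zero row; with the guard in place that row never reaches the output.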

…re unknown

Remove the if any(converted) guard that silently dropped sequences
where every overlapping triplet resolved to index 0. Such sequences now
produce an all-zero row, preserving 1-to-1 alignment between the input
list and the output array.

Fixes gc-os-ai#372
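
One consequence of keeping the 1-to-1 alignment is that callers who do want to discard all-unknown sequences can do so explicitly, filtering any parallel arrays with the same mask. The sketch below assumes plain NumPy arrays; the matrix X and the labels array are made-up illustrative values, not output captured from the library.

```python
import numpy as np

# Hypothetical encoded output for two input sequences after the fix:
# row 0 is the all-zero row produced for an all-unknown sequence.
X = np.array([[0, 0, 0, 0],
              [1, 2, 0, 0]])
labels = np.array([0, 1])  # hypothetical per-sequence labels

# Row count matches the number of inputs, so labels still line up.
assert X.shape[0] == labels.shape[0]

# Boolean mask of rows with at least one known (non-zero) index.
known = X.any(axis=1)

# Drop the all-unknown rows from both arrays with the same mask.
X_known, labels_known = X[known], labels[known]
print(known.tolist())  # [False, True]
```

Under the previous behaviour the mask could not be computed on the caller's side, because the dropped row left no trace of which input it came from.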

Development

Successfully merging this pull request may close these issues.

rna2vec silently drops sequences when all triplets map to unknown tokens, breaking input/output length parity