[ENH] move pairs_to_features, generate_kmer_vecs and other utilities to transformer interface #174 #402
Open
purvanshjoshi wants to merge 1 commit into gc-os-ai:main from
Reference Issues/PRs
Fixes #174. See also #106 and #170.
What does this implement/fix? Explain your changes.
This PR moves the feature extraction utilities used in AptaNet and other parts of the codebase to the BaseTransform interface, strictly following the design patterns defined in #106 and #170.
Key changes:
- KMerEncoder: Standardized k-mer frequency vectorizer inheriting from BaseTransform.
- PSeAACTransformer: A BaseTransform-compliant wrapper for the PSeAAC logic, supporting pd.DataFrame input/output.
- AptaNetFeatureExtractor: A composite transformer (capability:multivariate=True) that handles (aptamer, protein) sequence pairs.
- AptaNetPipeline: updated to use the new AptaNetFeatureExtractor directly, improving consistency with the library's transformer interface.
- generate_kmer_vecs and pairs_to_features in _aptanet_utils.py: changed to serve as wrappers for the new transformers, including DeprecationWarning logs.
- _tags and get_test_params(): provided for all new transformers to ensure compatibility with the skbase testing framework.
What should a reviewer concentrate their feedback on?
- The _transform methods in the new encoders, to ensure they correctly handle pd.DataFrame inputs.
- The capability:multivariate tag logic in AptaNetFeatureExtractor.
- Whether the wrappers in _aptanet_utils.py correctly relay parameters to the new transformers.
Did you add any tests for the change?
I verified the implementation using a comprehensive verification script that checks:
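The wrapper-and-warning behaviour that such a script would exercise can be illustrated with a minimal, dependency-free sketch. It is not the verification script from this PR: SketchKMerEncoder and kmer_vec_deprecated are hypothetical stand-ins, the fit/transform shape only loosely mirrors a transformer interface, and plain lists replace pd.DataFrame. Counting only k-mers of exactly length k over an assumed ACGU alphabet is also an assumption, not the library's definition.

```python
import warnings
from itertools import product


class SketchKMerEncoder:
    """Hypothetical stand-in for a k-mer frequency transformer (not the PR's class)."""

    def __init__(self, k=2, alphabet="ACGU"):
        self.k = k
        self.alphabet = alphabet

    def fit(self, X, y=None):
        # Fix the feature order once: every k-mer of length k over the alphabet.
        self.kmers_ = ["".join(p) for p in product(self.alphabet, repeat=self.k)]
        return self

    def transform(self, X):
        rows = []
        for seq in X:
            counts = dict.fromkeys(self.kmers_, 0)
            # Slide a window of width k over the sequence and count known k-mers.
            for i in range(len(seq) - self.k + 1):
                kmer = seq[i : i + self.k]
                if kmer in counts:
                    counts[kmer] += 1
            total = sum(counts.values())
            # Normalise counts to frequencies; an all-zero row stays all zero.
            rows.append([c / total if total else 0.0 for c in counts.values()])
        return rows


def kmer_vec_deprecated(seq, k=2, alphabet="ACGU"):
    """Hypothetical legacy-style wrapper: warns, then relays its parameters."""
    warnings.warn(
        "kmer_vec_deprecated is deprecated; use SketchKMerEncoder instead.",
        DeprecationWarning,
        stacklevel=2,
    )
    encoder = SketchKMerEncoder(k=k, alphabet=alphabet).fit([seq])
    return encoder.transform([seq])[0]


# Check 1: the transformer yields one frequency row per input sequence.
sequences = ["ACGU", "AAAA"]
encoder = SketchKMerEncoder(k=2).fit(sequences)
features = encoder.transform(sequences)

# Check 2: the wrapper warns and returns the same values as the transformer.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    legacy = kmer_vec_deprecated("ACGU", k=2)
```

Keeping the legacy function as a thin wrapper means both code paths share one implementation, so a check that compares their outputs also confirms that k and alphabet are passed through unchanged.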
Any other comments?
The implementation follows the GreedyEncoder pattern as a template and ensures that AptaNet features are now first-class citizens in the transformer interface.
PR checklist