Draft
fix column names for extra feature df
More diagnostic logging
more diagnostics log final dataframe show nan rows track df size
row logging log fixes
The current error is in the plotting step.
Incorporated changes from
One problem with the current implementation is file size: in the current TSV format, some data sets can reach 6 GB. We need to optimize how this file is loaded so that memory and I/O usage stay bounded.
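One common way to bound memory while loading a large TSV is to stream it in fixed-size chunks instead of reading the whole file at once. The Python sketch below is only an illustration of that idea, not the Rust library's implementation. `iter_feature_chunks` is a hypothetical helper, and it assumes the file has a header row followed by rows of gene A, gene B, and feature values, with `NA` marking a missing value. Values go into `array("f")` columns (4-byte floats) rather than lists of full Python float objects.

```python
import csv
from array import array
from typing import Iterable, Iterator, List, Tuple

# One chunk: the gene pairs plus one float32 column per feature.
Chunk = Tuple[List[Tuple[str, str]], List[array]]


def iter_feature_chunks(lines: Iterable[str], chunk_size: int = 100_000) -> Iterator[Chunk]:
    """Yield at most `chunk_size` parsed rows at a time.

    `lines` can be an open file, so only the current chunk is held in memory.
    Assumes a header row, then: gene A, gene B, feature values ("NA" = missing).
    """
    reader = csv.reader(lines, delimiter="\t")
    header = next(reader)
    n_features = len(header) - 2

    def empty_columns() -> List[array]:
        # array("f") stores 4-byte floats instead of full Python float objects.
        return [array("f") for _ in range(n_features)]

    pairs: List[Tuple[str, str]] = []
    columns = empty_columns()
    for row in reader:
        pairs.append((row[0], row[1]))
        for column, cell in zip(columns, row[2:]):
            column.append(float("nan") if cell == "NA" else float(cell))
        if len(pairs) == chunk_size:
            yield pairs, columns
            # Rebind (not clear) so the chunk already yielded is left untouched.
            pairs, columns = [], empty_columns()
    if pairs:  # flush the final, partially filled chunk
        yield pairs, columns
```

In use, the generator would wrap an open file, e.g. `with open(path, newline="") as fh:` followed by `for pairs, columns in iter_feature_chunks(fh): ...`, so at most one chunk is resident at a time. The trade-off of float32 storage is roughly 7 significant digits of precision.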
To support extra features, I am writing a Rust library that processes these large files with limited memory use. The current flow is shown in the diagram below.
```mermaid
graph TD
    A["Expression Data"]
    B["Extra Feature Files"]
    C["Identify All Unique Genes"]
    D["Save unified gene order to pkl file"]
    E["Realign extra features with unified gene order"]
    F["Save each feature as pkl file"]
    A --> C
    B --> C
    C --> D
    C --> E
    E --> F
```
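The flow in the diagram can be sketched in Python as follows. This is a hypothetical illustration rather than the Rust library's code: the function names, the use of a sorted gene order, treating gene pairs as unordered, and keying each realigned feature by `(i, j)` gene indices are all assumptions made for the example.

```python
import pickle
from typing import Any, Dict, Iterable, List, Tuple

Pair = Tuple[str, str]


def unified_gene_order(expression_genes: Iterable[str],
                       feature_pairs: Iterable[Iterable[Pair]]) -> List[str]:
    """Identify all unique genes across every input and fix one order (sorted here)."""
    genes = set(expression_genes)
    for pairs in feature_pairs:  # one iterable of pairs per extra feature file
        for gene_a, gene_b in pairs:
            genes.add(gene_a)
            genes.add(gene_b)
    return sorted(genes)


def realign_feature(values: Dict[Pair, float], order: List[str]) -> Dict[Tuple[int, int], float]:
    """Re-key one feature from gene names to positions in the unified order.

    Pairs are treated as unordered, so (A, B) and (B, A) share the key (i, j) with i < j.
    """
    index = {gene: i for i, gene in enumerate(order)}
    aligned: Dict[Tuple[int, int], float] = {}
    for (gene_a, gene_b), value in values.items():
        i, j = sorted((index[gene_a], index[gene_b]))
        aligned[(i, j)] = value
    return aligned


def save_pickle(obj: Any, path: str) -> None:
    """Persist the gene order or one realigned feature as a .pkl file."""
    with open(path, "wb") as handle:
        pickle.dump(obj, handle, protocol=pickle.HIGHEST_PROTOCOL)
```

A driver would call `save_pickle(order, "gene_order.pkl")` once and then `save_pickle(realign_feature(values, order), f"{name}.pkl")` for each feature; these file names are hypothetical. Fixing one gene order up front means every saved feature is indexed the same way, so features can later be loaded one at a time without re-reading the others.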
TODO:
Adds support for additional features to FunMap. Users can set the `extra_feature_file` key in their config to point to a TSV file that contains features for gene pairs; the feature values can be on any scale.

New Features
- Set `only_extra_features` to `true` in your config YAML file to ignore expression data.

Status
Implementation notes
- The `all_average` curve in the LLR plot does not use the extra features, only the cohort information.

Other Changes
- Uses `pyproject.toml` instead of `setup.py`.
- Uses `maturin` for Rust library integration.

Data Format
The format of the `extra_feature_file` is as follows. Columns are tab-separated. The first column is the first gene in the pair and the second column is the second gene; the remaining columns are the feature values. If a feature has no value for a given pair, its value should be `NA`.
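To make the row layout concrete, here is a hypothetical pair of helpers that write and read one data row in this format. The names `format_row` and `parse_row` are illustrative and not part of FunMap. The sketch handles data rows only (it makes no assumption about whether the file starts with a header row) and maps `NA` to `None`.

```python
import math
from typing import List, Optional, Sequence, Tuple

MISSING = "NA"  # token for a feature with no value for this pair


def format_row(gene_a: str, gene_b: str, values: Sequence[Optional[float]]) -> str:
    """Encode one gene pair: gene A, gene B, then one tab-separated column per feature."""
    cells = [MISSING if v is None or math.isnan(v) else repr(float(v)) for v in values]
    return "\t".join([gene_a, gene_b, *cells])


def parse_row(line: str, n_features: int) -> Tuple[str, str, List[Optional[float]]]:
    """Decode one line; NA becomes None. Raises ValueError on a wrong column count."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != n_features + 2:
        raise ValueError(f"expected {n_features + 2} columns, got {len(fields)}")
    values = [None if field == MISSING else float(field) for field in fields[2:]]
    return fields[0], fields[1], values
```

For example, `parse_row("BRCA1\tTP53\t0.5\tNA", 2)` returns `("BRCA1", "TP53", [0.5, None])`. Checking the column count per row catches rows that were written with a different number of features than the header declares.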