Open
Labels
enhancement (New feature or request)
Description
Right now, each of us has an ad-hoc method for loading the dataset. We need to unify the process. Here is the plan:
- a generic load_dataset() method that does the very basic processing of the dataset (e.g. cid > 30000)... an outcome can well be a .pkl file to speed up the processing
- we can compute/include nbyes features by default
- instead of our own server, I propose to use git-lfs https://git-lfs.github.com/ and this repo: https://gitlab.com/FAMILIAR-project/tuxml-size-analysis-datasets/ in order to host files
- to make it reusable (rather than copied and pasted), maybe a pure Python script is possible
- success for the points above would mean that all our procedures (neural network, linear regression, tree-based methods, etc.) rely on the same load_dataset()
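
To make the proposal concrete, here is a rough sketch of what such a load_dataset() could look like. It is only an illustration under several assumptions that are not stated in this issue: the dataset is a CSV file with an integer `cid` column, option columns hold `y`/`n`/`m` values, `nbyes` is the number of options set to `y` in a row, and the parameter names (`csv_path`, `min_cid`, `cache`) are hypothetical. It uses only the standard library so it can be dropped in as a single script; a pandas DataFrame may well be the more convenient return type in practice.

```python
import csv
import pickle
from pathlib import Path


def load_dataset(csv_path, min_cid=30000, cache=True):
    """Load the raw CSV, apply the basic filtering, and add an nbyes column.

    Assumptions (hypothetical, not from the issue): the CSV has an integer
    "cid" column, option columns hold "y"/"n"/"m", and nbyes counts the
    options set to "y" in each row.
    """
    csv_path = Path(csv_path)
    # The cache name includes min_cid so different filters do not collide.
    pkl_path = csv_path.with_name(f"{csv_path.stem}.cid{min_cid}.pkl")

    # Reuse the pickle when it is at least as recent as the CSV.
    if (cache and pkl_path.exists()
            and pkl_path.stat().st_mtime >= csv_path.stat().st_mtime):
        with pkl_path.open("rb") as f:
            return pickle.load(f)

    rows = []
    with csv_path.open(newline="") as f:
        for row in csv.DictReader(f):
            row["cid"] = int(row["cid"])
            if row["cid"] <= min_cid:  # basic filter, e.g. cid > 30000
                continue
            # Count the options enabled with "y" (the cid column is skipped).
            row["nbyes"] = sum(
                1 for key, value in row.items() if key != "cid" and value == "y"
            )
            rows.append(row)

    if cache:
        with pkl_path.open("wb") as f:
            pickle.dump(rows, f)
    return rows
```

Every procedure would then start from the same call, for example `rows = load_dataset("dataset.csv")` (the file name is a placeholder), with the CSV itself fetched from the git-lfs repository beforehand.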