Generic load dataset

Right now, we each have an ad-hoc method for loading the dataset. We need to unify the process. Here is the plan:
 - [x] a generic load_dataset() method that does the very basic processing of the dataset (e.g., cid > 30000); the outcome could well be a `.pkl` file to speed up later processing
 - [x] we can compute/include the `nbyes` feature by default
 - [x] instead of our own server, I propose to use git-lfs (https://git-lfs.github.com/) and this repo to host the files: https://gitlab.com/FAMILIAR-project/tuxml-size-analysis-datasets/
 - [x] to make it reusable (rather than copied and pasted), a pure Python script may be possible
 - [ ] a success story for the two previous points would be that all our procedures (neural network, linear regression, tree-based methods, etc.) rely on the same load_dataset()
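
To make the plan above concrete, here is a minimal sketch of what such a load_dataset() could look like as a pure Python script. It is not the actual implementation. It assumes the raw file is a CSV with a `cid` column, that option values are the strings "y", "n" or "m", that `nbyes` means the number of options set to "y", and that rows with cid > 30000 are the ones to keep. The `min_cid` and `cache` parameters and the cache file name are hypothetical.

```python
import csv
import pickle
from pathlib import Path


def load_dataset(csv_path, min_cid=30000, cache=True):
    """Load the raw CSV, apply the basic filtering, add nbyes, cache as .pkl.

    Assumptions (not confirmed by the project): a `cid` column exists,
    option values are "y"/"n"/"m", and only rows with cid > min_cid are kept.
    """
    csv_path = Path(csv_path)
    # The threshold is part of the cache name so that different filters
    # never share a stale pickle.
    cache_path = csv_path.with_name(f"{csv_path.stem}.cid{min_cid}.pkl")

    # Reuse the pickled result when it is at least as recent as the CSV.
    if (cache and cache_path.exists()
            and cache_path.stat().st_mtime >= csv_path.stat().st_mtime):
        with cache_path.open("rb") as f:
            return pickle.load(f)

    rows = []
    with csv_path.open(newline="") as f:
        for row in csv.DictReader(f):
            row["cid"] = int(row["cid"])
            if row["cid"] <= min_cid:
                continue
            # nbyes: number of options set to "y" in this configuration.
            row["nbyes"] = sum(1 for value in row.values() if value == "y")
            rows.append(row)

    if cache:
        with cache_path.open("wb") as f:
            pickle.dump(rows, f)
    return rows
```

Each procedure (neural network, linear regression, tree-based methods) would then start from the same `rows = load_dataset(path)` call and only diverge afterwards. The sketch uses only the standard library to stay copy-free and dependency-free; a pandas-based version would follow the same shape.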



Generic load dataset #10
