Generic load dataset #10

@FAMILIAR-project

Right now, each of us has an ad-hoc method for loading the dataset. We need to unify the process. Here is the plan:

  • a generic load_dataset() method that performs only the basic processing of the dataset (e.g. cid > 30000); its output could be a .pkl file to speed up later loading
  • nbyes features can be computed and included by default
  • instead of our own server, I propose hosting the files with git-lfs (https://git-lfs.github.com/) in this repo: https://gitlab.com/FAMILIAR-project/tuxml-size-analysis-datasets/
  • to make it reusable (rather than copied and pasted), a pure Python script may be possible
  • success for the two previous points means that all our procedures (neural network, linear regression, tree-based methods, etc.) rely on the same load_dataset()
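
To make the plan concrete, here is a minimal sketch of what such a shared load_dataset() could look like. It is only an illustration under several assumptions that are not settled in this issue: the raw data is a CSV file with a "cid" column, option values are encoded as "y"/"n"/"m", nbyes means the number of options set to "y", and the cache sits next to the CSV with a .pkl suffix. The parameter names (csv_path, cache_path, min_cid, add_nbyes) are hypothetical. It uses only the standard library to stay dependency-free; the same structure would carry over to a pandas DataFrame.

```python
import csv
import pickle
from pathlib import Path

MIN_CID = 30000  # threshold taken from the "cid > 30000" example in the plan


def load_dataset(csv_path, cache_path=None, min_cid=MIN_CID, add_nbyes=True):
    """Load the raw CSV, apply basic filtering, and cache the result as a .pkl file.

    Assumptions (not confirmed by the issue): a "cid" column exists, and
    nbyes is the number of columns whose value is "y" in a given row.
    """
    csv_path = Path(csv_path)
    cache_path = Path(cache_path) if cache_path else csv_path.with_suffix(".pkl")
    params = (min_cid, add_nbyes)

    # Reuse the cache only if it is at least as recent as the CSV
    # and was built with the same parameters.
    if cache_path.exists() and cache_path.stat().st_mtime >= csv_path.stat().st_mtime:
        with cache_path.open("rb") as fh:
            cached = pickle.load(fh)
        if cached.get("params") == params:
            return cached["rows"]

    rows = []
    with csv_path.open(newline="") as fh:
        for row in csv.DictReader(fh):
            # Basic processing: keep only rows with cid > min_cid.
            if int(row["cid"]) <= min_cid:
                continue
            if add_nbyes:
                # Count the options set to "y" before adding the new column.
                row["nbyes"] = sum(1 for value in row.values() if value == "y")
            rows.append(row)

    # Store the parameters alongside the rows so a stale cache is detected.
    with cache_path.open("wb") as fh:
        pickle.dump({"params": params, "rows": rows}, fh)
    return rows
```

With something along these lines, every procedure (neural network, linear regression, tree-based methods) would start from the same call, for example rows = load_dataset("dataset.csv"), where "dataset.csv" is a placeholder for a file fetched from the git-lfs repo.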

Labels

enhancement (New feature or request)
