
Options that matter: status #14

@FAMILIAR-project

Description


There are several "sources":

  • supervised learning algorithms with feature importances (e.g., random forest)
  • supervised learning algorithms with coefficients (e.g., Lasso)
  • supervised learning algorithms with partial dependence (e.g., gradient boosting)
  • (linear) correlation between options and size (e.g., Pearson correlation)
  • specialized options we used last summer: https://github.com/TuxML/ProjetIrma/blob/dev/compilation/tuxml.config

Feature importances and coefficients are serialized in CSV files (feature_importanceXXX.csv), but note that such files are not necessarily up to date: we are using a very small training set size and basic hyperparameters.
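As a minimal sketch of how such CSV files can be produced (the dataset, option names, and file names below are illustrative placeholders, not the project's actual data):

```python
# Sketch: export feature importances (random forest) and coefficients (Lasso)
# to CSV. The dataset, column names, and file names are illustrative only.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso

# Toy dataset: rows = configurations, columns = options (0/1), target = size
df = pd.DataFrame({
    "CONFIG_A": [0, 1, 1, 0, 1, 0],
    "CONFIG_B": [1, 1, 0, 0, 1, 1],
    "vmlinux":  [40, 55, 42, 28, 60, 35],
})
X, y = df.drop(columns="vmlinux"), df["vmlinux"]

rf = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)
importances = pd.Series(rf.feature_importances_, index=X.columns,
                        name="importance").sort_values(ascending=False)
importances.to_csv("feature_importance_sketch.csv")

lasso = Lasso(alpha=0.1).fit(X, y)
coefs = pd.Series(lasso.coef_, index=X.columns, name="coef")
coefs.to_csv("lasso_coef_sketch.csv")
```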

There is also https://github.com/TuxML/size-analysis/blob/master/correlations_vmlinux.csv
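The correlation side can be reproduced in a few lines with pandas (a sketch; option names and the `vmlinux` column are placeholders):

```python
# Sketch: Pearson correlation between each (binary) option and kernel size.
# Data and column names are illustrative.
import pandas as pd

df = pd.DataFrame({
    "CONFIG_DEBUG_INFO":           [0, 1, 1, 0, 1],
    "CONFIG_CC_OPTIMIZE_FOR_SIZE": [1, 0, 1, 1, 0],
    "vmlinux":                     [30, 70, 55, 28, 72],
})

# corrwith computes the Pearson correlation of every option column with size
corr = df.drop(columns="vmlinux").corrwith(df["vmlinux"])
corr.sort_values(ascending=False).to_csv("correlations_sketch.csv")
```

On this toy data, enabling debug info correlates positively with size and optimizing for size correlates negatively, as one would expect.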

About https://github.com/TuxML/ProjetIrma/blob/dev/compilation/tuxml.config (TuxML/ProjetIrma@87895c8), some comments:

  • it has been obtained through random forest when learning over ~60K configurations...
  • we reviewed some of the options (using the documentation) to see whether they indeed have an effect
  • we did all this work in summer 2018 (July)
  • we then performed randconfig with such pre-set options (from 90K to 120K cid)
    The list can be simplified: CONFIG_64BIT=y, CONFIG_AIC7XXX_BUILD_FIRMWARE=n, CONFIG_AIC79XX_BUILD_FIRMWARE=n, and CONFIG_WANXL_BUILD_FIRMWARE=n are mainly there to avoid compilation errors and have nothing to do with size...
CONFIG_DEBUG_INFO=n
CONFIG_DEBUG_INFO_SPLIT=n
CONFIG_UBSAN_SANITIZE_ALL=n
CONFIG_GCOV_PROFILE_ALL=n
CONFIG_DEBUG_INFO_REDUCED=n
CONFIG_RANDOMIZE_BASE=n
CONFIG_X86_NEED_RELOCS=n
CONFIG_KASAN_OUTLINE=n
CONFIG_UBSAN_ALIGNMENT=n
CONFIG_USB_SERIAL_OPTICON=n
CONFIG_KASAN=n
CONFIG_KCOV_INSTRUMENT_ALL=n
CONFIG_XFS_DEBUG=n
CONFIG_MAXSMP=n
CONFIG_FW_LOADER_USER_HELPER=n
CONFIG_STRICT_MODULE_RWX=n
CONFIG_DEBUG_INFO_DWARF4=n
CONFIG_LOCK_STAT=n
CONFIG_X86_VSMP=n
CONFIG_CC_OPTIMIZE_FOR_SIZE=y
CONFIG_OPTIMIZE_INLINING=y
CONFIG_SLOB=y
CONFIG_PCI=y
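To reuse such a pre-set list, one can force those values on top of a randconfig-generated .config. A small sketch (the file contents and the override policy are assumptions, not the project's actual tooling):

```python
# Sketch: force a list of pre-set options (KEY=y/n) onto a generated .config.
# File contents and the override policy are illustrative assumptions.
import re

def parse_presets(text):
    """Parse lines like CONFIG_FOO=y / CONFIG_BAR=n into a dict."""
    presets = {}
    for line in text.splitlines():
        m = re.match(r"(CONFIG_\w+)=(\w+)", line.strip())
        if m:
            presets[m.group(1)] = m.group(2)
    return presets

def apply_presets(config_text, presets):
    """Rewrite a .config, overriding any option present in presets.
    'n' is written in Kconfig style: '# CONFIG_FOO is not set'."""
    done, out = set(), []
    for line in config_text.splitlines():
        m = re.match(r"(?:# )?(CONFIG_\w+)[ =]", line)
        key = m.group(1) if m else None
        if key in presets:
            done.add(key)
            out.append(f"# {key} is not set" if presets[key] == "n"
                       else f"{key}={presets[key]}")
        else:
            out.append(line)
    # Append presets that were absent from the original .config
    for key, val in presets.items():
        if key not in done:
            out.append(f"# {key} is not set" if val == "n" else f"{key}={val}")
    return "\n".join(out)

presets = parse_presets("CONFIG_CC_OPTIMIZE_FOR_SIZE=y\nCONFIG_DEBUG_INFO=n")
new_config = apply_presets("CONFIG_DEBUG_INFO=y\nCONFIG_PCI=y", presets)
```

Options not touched by the preset list (CONFIG_PCI here) pass through unchanged, so the diversity introduced by randconfig is preserved.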

We had an issue "specializing" SLOB: https://bugzilla.kernel.org/show_bug.cgi?id=202437 (we do have a workaround now, though). We may need to increase the frequency of this option to further investigate its effect.

@llesoil made an interesting script to "merge" all the lists: https://github.com/TuxML/size-analysis/blob/master/vote_feature_selection.ipynb (rendered version: https://nbviewer.jupyter.org/github/TuxML/size-analysis/blob/master/vote_feature_selection.ipynb)
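The idea behind such a merge can be sketched as a simple vote: count in how many source lists each option appears, and keep those above a threshold (the list contents and threshold below are made up for illustration):

```python
# Sketch: "vote"-based merge of several option lists. An option is kept
# if it appears in at least `min_votes` lists. Contents are illustrative.
from collections import Counter

def vote_merge(lists, min_votes=2):
    votes = Counter(opt for lst in lists for opt in set(lst))
    return sorted(opt for opt, v in votes.items() if v >= min_votes)

rf_list    = ["CONFIG_DEBUG_INFO", "CONFIG_KASAN", "CONFIG_PCI"]
lasso_list = ["CONFIG_DEBUG_INFO", "CONFIG_CC_OPTIMIZE_FOR_SIZE"]
pearson    = ["CONFIG_DEBUG_INFO", "CONFIG_KASAN"]

merged = vote_merge([rf_list, lasso_list, pearson], min_votes=2)
# CONFIG_DEBUG_INFO gets 3 votes, CONFIG_KASAN gets 2; the rest are dropped
```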

Finally, I've started something to compute option frequencies in the dataset (options coming from the Linux documentation):
https://github.com/TuxML/size-analysis/blob/master/feature_frequency.ipynb
(update: full frequencies here: https://github.com/TuxML/size-analysis/blob/master/options_frequencydataset_wrt_linuxdoc.csv)
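Computing such frequencies boils down to counting values per column, restricted to documented options. A sketch (the dataset and the documented-option list are placeholders):

```python
# Sketch: frequency of each option's values across the dataset, restricted
# to options listed in the Linux documentation. Data is illustrative.
import pandas as pd

configs = pd.DataFrame({
    "CONFIG_DEBUG_INFO": ["y", "n", "y", "n", "n"],
    "CONFIG_KASAN":      ["n", "n", "n", "y", "n"],
    "CONFIG_FOO":        ["y", "y", "y", "y", "y"],  # not documented
})
doc_options = {"CONFIG_DEBUG_INFO", "CONFIG_KASAN"}

# value_counts(normalize=True) gives the proportion of each value (y/n/m...)
freq = {
    opt: configs[opt].value_counts(normalize=True).to_dict()
    for opt in configs.columns if opt in doc_options
}
```

Low-frequency values are the interesting signal here: an option almost always set to "n" (like CONFIG_KASAN above) is a candidate for forced exploration in future configurations.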

  • script to merge all lists
  • update of the feature_importance files
  • computation of the "overlap" between the doc and our merged list, as well as the differences on both sides (doc => merged list: which options do we miss? ||| merged list => doc: which options are not in the doc?)
  • for some options, we will recompile new configurations to explore more diverse values
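The overlap computation in the list above is essentially an intersection plus two set differences (the option lists below are made-up placeholders):

```python
# Sketch: overlap and two-sided differences between the documented options
# and the merged list. Both lists are illustrative placeholders.
doc_options    = {"CONFIG_DEBUG_INFO", "CONFIG_KASAN", "CONFIG_PCI"}
merged_options = {"CONFIG_DEBUG_INFO", "CONFIG_SLOB"}

overlap      = doc_options & merged_options  # options found by both
missed       = doc_options - merged_options  # doc => merged: what do we miss?
undocumented = merged_options - doc_options  # merged => doc: not in the doc
```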
