Options that matter: status

There are several "sources" for identifying options that matter:
 * supervised learning algorithms with feature importances (e.g., random forest)
 * supervised learning algorithms with coefficients (e.g., Lasso)
 * supervised learning algorithms with partial dependence (e.g., gradient boosting)
 * (linear) correlation between options and size (e.g., Pearson correlation)
 * specialized options we used last summer: https://github.com/TuxML/ProjetIrma/blob/dev/compilation/tuxml.config
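
As an illustration of the correlation-based source, here is a minimal sketch that ranks options by the absolute Pearson correlation between their value and the size. The toy data, the 0/1 encoding, and the function names `pearson` and `rank_by_correlation` are assumptions made for this example; this is not the code used to produce our CSV files.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # Constant column: the correlation is undefined, reported as 0 here.
        return 0.0
    return cov / math.sqrt(var_x * var_y)

def rank_by_correlation(rows, target="size"):
    """Rank options by absolute correlation with the target column."""
    options = [key for key in rows[0] if key != target]
    sizes = [row[target] for row in rows]
    scores = {opt: pearson([row[opt] for row in rows], sizes) for opt in options}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)

# Hypothetical toy data: one row per configuration, 1 = enabled, 0 = disabled.
configs = [
    {"CONFIG_DEBUG_INFO": 1, "CONFIG_PCI": 1, "size": 95.0},
    {"CONFIG_DEBUG_INFO": 0, "CONFIG_PCI": 1, "size": 20.0},
    {"CONFIG_DEBUG_INFO": 1, "CONFIG_PCI": 1, "size": 110.0},
    {"CONFIG_DEBUG_INFO": 0, "CONFIG_PCI": 1, "size": 25.0},
]

ranking = rank_by_correlation(configs)
```

An option that never varies in the dataset (`CONFIG_PCI` in the toy data) gets a score of 0, which is one reason why the frequency of each option matters when reading such rankings.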

Feature importances and coefficients are serialized in CSV files (`feature_importanceXXX.csv`), but **note that these files are not necessarily up to date, and that we are using a very small training set and basic hyperparameters**.

There is also https://github.com/TuxML/size-analysis/blob/master/correlations_vmlinux.csv

Some comments about https://github.com/TuxML/ProjetIrma/blob/dev/compilation/tuxml.config (https://github.com/TuxML/ProjetIrma/commit/87895c88419f7cd5040c8a25c1b781b4f2892757):
 * it was obtained through random forest, learning over ~60K configurations...
 * we reviewed some of the options (using the doc) to see whether they indeed have an effect
 * we did all this in summer 2018 (July)
 * we then performed `randconfig` with these pre-set options (cid from 90K to 120K)

The list can be simplified (CONFIG_64BIT=y, CONFIG_AIC7XXX_BUILD_FIRMWARE=n, CONFIG_AIC79XX_BUILD_FIRMWARE=n and CONFIG_WANXL_BUILD_FIRMWARE=n are mainly there to avoid compilation errors and have nothing to do with size...):
```
CONFIG_DEBUG_INFO=n
CONFIG_DEBUG_INFO_SPLIT=n
CONFIG_UBSAN_SANITIZE_ALL=n
CONFIG_GCOV_PROFILE_ALL=n
CONFIG_DEBUG_INFO_REDUCED=n
CONFIG_RANDOMIZE_BASE=n
CONFIG_X86_NEED_RELOCS=n
CONFIG_KASAN_OUTLINE=n
CONFIG_UBSAN_ALIGNMENT=n
CONFIG_USB_SERIAL_OPTICON=n
CONFIG_KASAN=n
CONFIG_KCOV_INSTRUMENT_ALL=n
CONFIG_XFS_DEBUG=n
CONFIG_MAXSMP=n
CONFIG_FW_LOADER_USER_HELPER=n
CONFIG_STRICT_MODULE_RWX=n
CONFIG_DEBUG_INFO_DWARF4=n
CONFIG_LOCK_STAT=n
CONFIG_X86_VSMP=n
CONFIG_CC_OPTIMIZE_FOR_SIZE=y
CONFIG_OPTIMIZE_INLINING=y
CONFIG_SLOB=y
CONFIG_PCI=y
```
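
To check that a generated configuration actually honours such a pre-set list, something like the following sketch could be used. The helper names `parse_presets` and `violations` are hypothetical, and treating an option that is absent from the configuration as `n` is a simplifying assumption of this example.

```python
def parse_presets(text):
    """Parse lines of the form CONFIG_NAME=value into a dict."""
    presets = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, _, value = line.partition("=")
        presets[name] = value
    return presets

def violations(presets, config):
    """Return {option: (expected, actual)} for options that differ from the presets.

    Assumption: an option absent from `config` is treated as 'n'.
    """
    result = {}
    for name, expected in presets.items():
        actual = config.get(name, "n")
        if actual != expected:
            result[name] = (expected, actual)
    return result

# A few entries copied from the list above.
SAMPLE = """\
CONFIG_DEBUG_INFO=n
CONFIG_KASAN=n
CONFIG_CC_OPTIMIZE_FOR_SIZE=y
CONFIG_SLOB=y
"""

presets = parse_presets(SAMPLE)

# Hypothetical configuration in which KASAN was enabled despite the preset.
config = {"CONFIG_SLOB": "y", "CONFIG_CC_OPTIMIZE_FOR_SIZE": "y", "CONFIG_KASAN": "y"}
mismatches = violations(presets, config)
```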

We had an issue when trying to "specialize" SLOB: https://bugzilla.kernel.org/show_bug.cgi?id=202437 (we now have a workaround, though). We may need to increase the frequency of this option to further investigate its effect.

@llesoil made an interesting script to "merge" all lists: https://github.com/TuxML/size-analysis/blob/master/vote_feature_selection.ipynb https://nbviewer.jupyter.org/github/TuxML/size-analysis/blob/master/vote_feature_selection.ipynb
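
The general idea of merging lists by vote can be sketched as follows. This is only an illustration of the principle, not the content of the notebook: the function name `vote`, the `top_k` parameter, and the example lists are assumptions.

```python
from collections import Counter

def vote(lists, top_k=None):
    """Count in how many lists each option appears.

    `lists` maps a source name to a list of options ordered by decreasing
    importance. If `top_k` is given, only the first `top_k` options of each
    list get a vote. Ties are broken alphabetically.
    """
    votes = Counter()
    for options in lists.values():
        selected = options if top_k is None else options[:top_k]
        votes.update(set(selected))  # one vote per source, even with duplicates
    return sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))

# Hypothetical rankings from three sources.
lists = {
    "random_forest": ["CONFIG_DEBUG_INFO", "CONFIG_KASAN", "CONFIG_SLOB"],
    "lasso": ["CONFIG_DEBUG_INFO", "CONFIG_MAXSMP", "CONFIG_KASAN"],
    "pearson": ["CONFIG_DEBUG_INFO", "CONFIG_SLOB", "CONFIG_PCI"],
}

merged = vote(lists, top_k=2)
```

With `top_k=2`, an option only counts for a source if that source ranks it among its two most important options, so the threshold controls how strict the merge is.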

Finally, I've started something to compute option frequencies in the dataset (for options coming from the Linux documentation):
https://github.com/TuxML/size-analysis/blob/master/feature_frequency.ipynb
(update: full frequencies here: https://github.com/TuxML/size-analysis/blob/master/options_frequencydataset_wrt_linuxdoc.csv)
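
A minimal sketch of what such a frequency computation could look like, assuming each configuration is a dict from option name to value and that "frequency" means the share of configurations in which the option is set to `y`. Both the data layout and this definition are assumptions of the example, not a description of the notebook.

```python
def option_frequency(configs, options):
    """Share of configurations in which each option is set to 'y'."""
    total = len(configs)
    if total == 0:
        return {opt: 0.0 for opt in options}
    return {
        opt: sum(1 for cfg in configs if cfg.get(opt) == "y") / total
        for opt in options
    }

# Hypothetical dataset of four configurations.
dataset = [
    {"CONFIG_SLOB": "y", "CONFIG_KASAN": "n"},
    {"CONFIG_SLOB": "n", "CONFIG_KASAN": "n"},
    {"CONFIG_SLOB": "n", "CONFIG_KASAN": "y"},
    {"CONFIG_SLOB": "n"},
]

freq = option_frequency(dataset, ["CONFIG_SLOB", "CONFIG_KASAN", "CONFIG_MAXSMP"])
```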

 - [ ] script to merge all lists
 - [ ] update of the feature_importance files
 - [ ] computation of the "overlap" between the doc and our merged list, as well as the differences on both sides (doc => merged list: what options do we miss? ||| merged list => doc: what options are not in the doc?)
 - [ ] for some options, we will compile new configurations to explore more diverse values
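
The overlap item in the checklist above boils down to set operations. A minimal sketch, where the function name `compare` and the example option lists are hypothetical:

```python
def compare(doc_options, merged_options):
    """Overlap and differences between the doc list and the merged list."""
    doc, merged = set(doc_options), set(merged_options)
    return {
        "both": sorted(doc & merged),
        "doc_only": sorted(doc - merged),     # doc => merged list: what we miss
        "merged_only": sorted(merged - doc),  # merged list => doc: not in the doc
    }

# Hypothetical inputs.
doc_options = ["CONFIG_SLOB", "CONFIG_CC_OPTIMIZE_FOR_SIZE", "CONFIG_KALLSYMS"]
merged_options = ["CONFIG_SLOB", "CONFIG_DEBUG_INFO", "CONFIG_CC_OPTIMIZE_FOR_SIZE"]

report = compare(doc_options, merged_options)
```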

Options that matter: status #14
