There are several "sources" for identifying options that affect size:
- supervised learning algorithms with feature importances (e.g., random forest)
- supervised learning algorithms with coefficients (e.g., Lasso)
- supervised learning algorithms with partial dependence (e.g., gradient boosting)
- (linear) correlation between options and size (e.g., Pearson correlation)
- specialized options we used last summer: https://github.com/TuxML/ProjetIrma/blob/dev/compilation/tuxml.config
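As an illustration of the correlation-based source, here is a minimal sketch that ranks options by the absolute Pearson correlation between their value and the kernel size. The function names (`pearson`, `rank_by_correlation`), the 0/1 encoding of options, and the toy data are assumptions made for this example; they are not taken from the size-analysis repository.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # Constant column: the correlation is undefined, reported as 0 here.
        return 0.0
    return cov / sqrt(var_x * var_y)

def rank_by_correlation(configs, sizes):
    """configs: list of dicts (option -> 0/1); sizes: one size per config."""
    options = sorted({opt for cfg in configs for opt in cfg})
    scores = {
        opt: pearson([cfg.get(opt, 0) for cfg in configs], sizes)
        for opt in options
    }
    # Strongest (positive or negative) correlation first.
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)

# Toy data, made up for illustration (not measured values).
toy_configs = [
    {"CONFIG_DEBUG_INFO": 1, "CONFIG_PCI": 1},
    {"CONFIG_DEBUG_INFO": 0, "CONFIG_PCI": 1},
    {"CONFIG_DEBUG_INFO": 1, "CONFIG_PCI": 0},
    {"CONFIG_DEBUG_INFO": 0, "CONFIG_PCI": 0},
]
toy_sizes = [100.0, 20.0, 95.0, 18.0]
ranking = rank_by_correlation(toy_configs, toy_sizes)
```

Pearson correlation only captures linear, per-option effects, which is one reason to combine it with the model-based sources listed above.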
Feature importances and coefficients are serialized in CSV files (feature_importanceXXX.csv). Note that these files are not necessarily up to date, and that we are currently using a very small training set and basic hyperparameters.
There is also https://github.com/TuxML/size-analysis/blob/master/correlations_vmlinux.csv
Some comments about https://github.com/TuxML/ProjetIrma/blob/dev/compilation/tuxml.config (TuxML/ProjetIrma@87895c8):
- it was obtained with a random forest trained on ~60K configurations...
- we reviewed some of the options (using the documentation) to check whether they indeed have an effect
- we did all this work in summer 2018 (July)
- we then ran `randconfig` with these pre-set options (cids from 90K to 120K)
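For reference, one way to run `randconfig` with pre-set options is the kernel's `KCONFIG_ALLCONFIG` environment variable, which points to a file of symbol values that the random configuration must respect. The sketch below only builds the command and environment. It assumes this mechanism applies here (the thread does not say how the pre-set options were passed), and the helper name `randconfig_command` and the kernel path are hypothetical.

```python
import os

def randconfig_command(preset_path, kernel_dir, seed=None):
    """Build (argv, env) for `make randconfig` constrained by a preset file."""
    env = dict(os.environ)
    # KCONFIG_ALLCONFIG: file whose symbol values randconfig must respect.
    env["KCONFIG_ALLCONFIG"] = os.path.abspath(preset_path)
    if seed is not None:
        # KCONFIG_SEED seeds the random generator, for reproducible draws.
        env["KCONFIG_SEED"] = str(seed)
    argv = ["make", "-C", kernel_dir, "randconfig"]
    return argv, env

# Hypothetical paths, for illustration only.
argv, env = randconfig_command("tuxml.config", "/path/to/linux", seed=42)
# To actually generate a configuration:
#   import subprocess; subprocess.run(argv, env=env, check=True)
```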
The list can be simplified: CONFIG_64BIT=y, CONFIG_AIC7XXX_BUILD_FIRMWARE=n, CONFIG_AIC79XX_BUILD_FIRMWARE=n and CONFIG_WANXL_BUILD_FIRMWARE=n are mainly there to avoid compilation errors and have nothing to do with size. The other pre-set options are:
CONFIG_DEBUG_INFO=n
CONFIG_DEBUG_INFO_SPLIT=n
CONFIG_UBSAN_SANITIZE_ALL=n
CONFIG_GCOV_PROFILE_ALL=n
CONFIG_DEBUG_INFO_REDUCED=n
CONFIG_RANDOMIZE_BASE=n
CONFIG_X86_NEED_RELOCS=n
CONFIG_KASAN_OUTLINE=n
CONFIG_UBSAN_ALIGNMENT=n
CONFIG_USB_SERIAL_OPTICON=n
CONFIG_KASAN=n
CONFIG_KCOV_INSTRUMENT_ALL=n
CONFIG_XFS_DEBUG=n
CONFIG_MAXSMP=n
CONFIG_FW_LOADER_USER_HELPER=n
CONFIG_STRICT_MODULE_RWX=n
CONFIG_DEBUG_INFO_DWARF4=n
CONFIG_LOCK_STAT=n
CONFIG_X86_VSMP=n
CONFIG_CC_OPTIMIZE_FOR_SIZE=y
CONFIG_OPTIMIZE_INLINING=y
CONFIG_SLOB=y
CONFIG_PCI=y
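The simplification mentioned above can be sketched as follows: parse the `NAME=value` lines and drop the four options that are only there to avoid compilation errors. The helper names (`parse_preset`, `drop_build_only`) are hypothetical, and the sample text is a short excerpt of the list rather than the full file.

```python
# Options described above as present only to avoid compilation errors.
BUILD_ONLY = {
    "CONFIG_64BIT",
    "CONFIG_AIC7XXX_BUILD_FIRMWARE",
    "CONFIG_AIC79XX_BUILD_FIRMWARE",
    "CONFIG_WANXL_BUILD_FIRMWARE",
}

def parse_preset(text):
    """Parse NAME=value lines into a dict, skipping blanks and comments."""
    presets = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.partition("=")
        presets[name] = value
    return presets

def drop_build_only(presets):
    """Keep every pre-set option except the build-only ones."""
    return {k: v for k, v in presets.items() if k not in BUILD_ONLY}

# Short excerpt of the list, for illustration.
sample = """\
CONFIG_64BIT=y
CONFIG_WANXL_BUILD_FIRMWARE=n
CONFIG_DEBUG_INFO=n
CONFIG_CC_OPTIMIZE_FOR_SIZE=y
"""
simplified = drop_build_only(parse_preset(sample))
```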
We had an issue when trying to "specialize" SLOB: https://bugzilla.kernel.org/show_bug.cgi?id=202437 (we now have a workaround, though). We may need to increase the frequency of this option to investigate its effect further.
@llesoil wrote an interesting script to "merge" all the lists: https://github.com/TuxML/size-analysis/blob/master/vote_feature_selection.ipynb (rendered version: https://nbviewer.jupyter.org/github/TuxML/size-analysis/blob/master/vote_feature_selection.ipynb)
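One plausible way to merge such lists is a simple vote: each source contributes the options it selects, and an option is kept when enough sources agree. The sketch below is an assumption about the general idea, not a description of what the notebook actually does; the function `vote`, the `min_votes` threshold, and the sample lists are made up for illustration.

```python
from collections import Counter

def vote(lists, min_votes=2):
    """Return (option, votes) pairs selected by at least min_votes sources."""
    votes = Counter()
    for options in lists.values():
        # Each source votes at most once per option.
        votes.update(set(options))
    kept = [(opt, n) for opt, n in votes.items() if n >= min_votes]
    # Most-voted first, ties broken alphabetically for a stable order.
    return sorted(kept, key=lambda kv: (-kv[1], kv[0]))

# Made-up selections per source, for illustration only.
sources = {
    "random_forest": ["CONFIG_DEBUG_INFO", "CONFIG_KASAN", "CONFIG_MAXSMP"],
    "lasso": ["CONFIG_DEBUG_INFO", "CONFIG_KASAN"],
    "pearson": ["CONFIG_DEBUG_INFO", "CONFIG_SLOB"],
}
merged = vote(sources)
```

A threshold of two votes is arbitrary here; a stricter or looser threshold trades precision against coverage of the merged list.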
Finally, I've started computing option frequencies in the dataset (for options coming from the Linux documentation):
https://github.com/TuxML/size-analysis/blob/master/feature_frequency.ipynb
(update: the full frequencies are here: https://github.com/TuxML/size-analysis/blob/master/options_frequencydataset_wrt_linuxdoc.csv)
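A frequency computation of this kind can be sketched as the fraction of configurations in which each option is enabled. The sketch assumes that "enabled" means a value of `y` or `m`, and that configurations are available as dicts; both are assumptions for this example, and the notebook may define frequency differently.

```python
def option_frequency(configs, options):
    """Fraction of configurations in which each option is set to y or m."""
    total = len(configs)
    freq = {}
    for opt in options:
        enabled = sum(1 for cfg in configs if cfg.get(opt) in ("y", "m"))
        freq[opt] = enabled / total if total else 0.0
    return freq

# Made-up configurations, for illustration only.
toy_dataset = [
    {"CONFIG_SLOB": "y", "CONFIG_PCI": "y"},
    {"CONFIG_SLOB": "n", "CONFIG_PCI": "y"},
    {"CONFIG_PCI": "y"},
    {"CONFIG_SLOB": "n", "CONFIG_PCI": "n"},
]
frequencies = option_frequency(toy_dataset, ["CONFIG_SLOB", "CONFIG_PCI"])
```

An option with a very low frequency, as in the SLOB case discussed above, gives the learning algorithms few examples to estimate its effect from.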
- script to merge all lists
- update of the feature_importance files
- computation of the "overlap" between the documentation and our merged list, as well as the differences in both directions (doc => merged list: which options do we miss? merged list => doc: which options are not in the doc?)
- for some options, we will compile new configurations to explore more diverse values
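The overlap computation in the list above amounts to three set operations. A minimal sketch, with a hypothetical helper `compare_lists` and made-up inputs:

```python
def compare_lists(doc_options, merged_options):
    """Overlap and two-way differences between the doc list and merged list."""
    doc, merged = set(doc_options), set(merged_options)
    return {
        "overlap": sorted(doc & merged),
        # doc => merged list: documented options that we miss.
        "doc_only": sorted(doc - merged),
        # merged list => doc: options we found that are not in the doc.
        "merged_only": sorted(merged - doc),
    }

# Made-up inputs, for illustration only.
report = compare_lists(
    ["CONFIG_DEBUG_INFO", "CONFIG_CC_OPTIMIZE_FOR_SIZE", "CONFIG_SLOB"],
    ["CONFIG_DEBUG_INFO", "CONFIG_KASAN", "CONFIG_SLOB"],
)
```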