This package provides a general function, imputation_test(), that produces a simple report comparing different imputation methods on your own data.
It takes the complete portion of your data (i.e., features with no missing values) and introduces random missing values at 5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, and 90%. The following imputation methods are then applied:
- 1/5th of lowest detected value (5th)
- left censored missing data (lcmd)
- k-nearest neighbours (KNN; does not work with >50% missingness, so the mean is used instead in that case)
- probabilistic PCA (PPCA; does not work with complete sample missingness)
- median
- mean
- random forest (RF)
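
The masking step and the three simplest methods in the list (1/5th of the lowest detected value, median, and mean) can be sketched in base R. This is a minimal illustration, not the package's implementation: the helper names introduce_missing() and impute_columns() are hypothetical, and it assumes that features are stored as columns and that each fill value is computed per feature from the observed values.

```r
# Hypothetical helper: mask a given fraction of entries completely at random.
introduce_missing <- function(x, prop, seed = 1) {
  stopifnot(is.matrix(x), prop >= 0, prop < 1)
  set.seed(seed)
  n_mask <- round(prop * length(x))
  idx <- sample(length(x), n_mask)
  x[idx] <- NA
  x
}

# Hypothetical helper: fill NAs column by column using a summary function.
impute_columns <- function(x, fill_fun) {
  for (j in seq_len(ncol(x))) {
    missing <- is.na(x[, j])
    observed <- x[!missing, j]
    # Skip columns with nothing to impute or nothing observed to learn from.
    if (any(missing) && length(observed) > 0) {
      x[missing, j] <- fill_fun(observed)
    }
  }
  x
}

# Simulated complete data: 20 samples x 5 features, strictly positive values.
set.seed(42)
complete <- matrix(rlnorm(100, meanlog = 5), nrow = 20, ncol = 5)

# Introduce 20% missingness, then impute with three simple strategies.
masked <- introduce_missing(complete, prop = 0.2)

imputed_fifth  <- impute_columns(masked, function(v) min(v) / 5)
imputed_median <- impute_columns(masked, median)
imputed_mean   <- impute_columns(masked, mean)
```

Because the original values in complete are known, the imputed entries at the masked positions (is.na(masked)) can be compared directly against the truth.
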
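After imputation, we compare the actual values with the imputed values using root-mean-square error (RMSE) and R2. NOTE: R2 does not reflect prediction accuracy, but it can indicate the correlation between the actual and imputed values. RMSE is calculated as:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left(x_i - \hat{x}_i\right)^2}$$

where $N$ is the number of values compared, $x_i$ is the actual value, and $\hat{x}_i$ is the corresponding imputed value.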
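
As a rough illustration of the two metrics, the sketch below computes RMSE and R2 for a small pair of vectors. The helper names rmse() and r_squared() are hypothetical, and R2 is taken here as the squared Pearson correlation, which is an assumption rather than a confirmed detail of the package. In practice the vectors would hold the actual and imputed values at the positions where missing values were introduced.

```r
# Hypothetical helper: root-mean-square error between actual and imputed values.
rmse <- function(actual, imputed) {
  sqrt(mean((actual - imputed)^2))
}

# Hypothetical helper: R2 taken as the squared Pearson correlation (assumption).
r_squared <- function(actual, imputed) {
  cor(actual, imputed)^2
}

# Toy example: five known values and their imputed counterparts.
actual  <- c(10, 12, 15, 20, 25)
imputed <- c(11, 12, 14, 22, 24)

rmse(actual, imputed)       # sqrt(1.4), approximately 1.183
r_squared(actual, imputed)  # a value between 0 and 1
```

A high R2 alongside a high RMSE would suggest the imputed values track the actual values but are systematically offset, which is why the two metrics are reported together.
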
The function takes a data frame, the directory where the report should be saved, and a label (subtitle) to attach to the report. The report and a cache are saved in the directory provided. A simulated data frame (data(data_features)) and a report generated from it are provided as an example. A preview of the report can be seen here
    ImputationReport::imputation_test(
      data = ImputationReport::data_features,
      output_dir = "inst/test/",
      knit_root_dir = here::here(),
      subtitle = "test")

- left-censored missing data imputation performed using {imputeLCMD}
- k nearest neighbours imputation performed using {impute}
- probabilistic PCA performed using {pcaMethods}
- median imputation performed using {missMethods}
- mean imputation performed using {missMethods}
- random forest imputation performed using {missForest}