IARCBiostat/ImputationReport

ImputationReport

What

This package provides a single function, imputation_test(), that produces a simple report comparing different imputation methods on your own data.

It takes the complete features (i.e., features with no missing values) from your data and introduces missing values at random at 5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, and 90% missingness. The following imputation methods are then applied:

  1. 1/5th of lowest detected value (5th)
  2. left censored missing data (lcmd)
  3. k-nearest neighbours (KNN; does not work with >50% missingness, so mean imputation is used instead above that level)
  4. probabilistic PCA (PPCA; does not work with complete sample missingness)
  5. median
  6. mean
  7. random forest (RF)
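
The masking step and the three simplest methods in the list above (5th, median, mean) can be sketched in base R. This is only an illustration under stated assumptions, not the package's internal code: the helper names introduce_missing() and impute_column() are hypothetical, missing values are assumed to be placed cell-wise at random, and "lowest detected value" is assumed to mean the per-feature minimum of the observed values.

```r
# Hypothetical helper (not part of ImputationReport): set a given
# proportion of cells to NA, chosen uniformly at random.
introduce_missing <- function(x, prop, seed = 1) {
  set.seed(seed)
  x <- as.matrix(x)
  n_missing <- round(prop * length(x))
  idx <- sample(length(x), n_missing)
  x[idx] <- NA
  x
}

# Hypothetical helper: fill the NAs in one feature (column) with a single value.
# "fifth" assumes 1/5th of the lowest observed value in that feature.
impute_column <- function(v, method = c("fifth", "median", "mean")) {
  method <- match.arg(method)
  fill <- switch(method,
    fifth  = min(v, na.rm = TRUE) / 5,
    median = median(v, na.rm = TRUE),
    mean   = mean(v, na.rm = TRUE)
  )
  v[is.na(v)] <- fill
  v
}

# Simulated complete data: 20 samples x 10 features, strictly positive.
set.seed(42)
complete <- matrix(rlnorm(200, meanlog = 2), nrow = 20, ncol = 10)

# Mask 20% of the cells, then impute each feature with its median.
masked  <- introduce_missing(complete, prop = 0.2)
imputed <- apply(masked, 2, impute_column, method = "median")
```

Because the original values are kept in complete, the imputed cells can afterwards be compared against the truth, which is the idea behind the report.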

After imputation, we compare each actual value with its imputed value using root-mean-square error (RMSE) and $R^2$. NOTE: $R^2$ does not reflect prediction accuracy, but it can indicate how strongly the actual and imputed values are correlated. RMSE is calculated as:

$\sqrt{\text{mean}\left((\text{actual} - \text{predicted})^2\right)}$

where $\text{actual}$ is the value from the complete data before it was replaced with NA, and $\text{predicted}$ is the value imputed for that missing entry. In the report, the figure gives the values for all models at every % missing, and the table shows the most accurate model at each % missing tested. The lower the RMSE, the better the model fit; the higher the $R^2$, the more correlated the actual and imputed values are.
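
As a small worked example, both metrics can be computed in base R. This is a sketch only: the function names rmse() and r_squared() are hypothetical, and $R^2$ is assumed here to be the squared Pearson correlation, which may differ from how the package computes it. The toy data shift every imputed value by a constant, which shows why a high $R^2$ does not imply accurate imputation.

```r
# RMSE as defined above: square root of the mean squared difference.
rmse <- function(actual, predicted) {
  sqrt(mean((actual - predicted)^2))
}

# Assumption: R2 taken as the squared Pearson correlation.
r_squared <- function(actual, predicted) {
  cor(actual, predicted)^2
}

actual    <- c(1, 2, 3, 4, 5)
predicted <- actual + 0.5  # every imputed value is off by 0.5

rmse_value <- rmse(actual, predicted)       # 0.5
r2_value   <- r_squared(actual, predicted)  # 1, up to floating-point error
```

The imputed values are perfectly correlated with the actual values, yet every one of them is wrong by 0.5, so RMSE is the metric to read for accuracy.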

How

A single function takes a data frame, the location where you want to save the report, and a label (subtitle) to attach to the report. The report and a cache are saved in the location provided. A simulated data frame (data(data_features)) and a report generated from it are provided as an example, and a preview of the report is available. The example call below also sets knit_root_dir to the project root using here::here().

ImputationReport::imputation_test(
  data = ImputationReport::data_features, 
  output_dir = "inst/test/", 
  knit_root_dir = here::here(), 
  subtitle = "test")

References

  1. left-censored missing data imputation performed using {imputeLCMD}
  2. k-nearest neighbours imputation performed using {impute}
  3. probabilistic PCA performed using {pcaMethods}
  4. median imputation performed using {missMethods}
  5. mean imputation performed using {missMethods}
  6. random forest imputation performed using {missForest}
