..new is calculated wrong in lencode steps #243

@EmilHvitfeldt

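The value for unseen levels (..new) is calculated from the mean of the coefficients rather than from the mean of the global data. This should be fixed to better reflect the literature. In the example below, ..new comes out as 0.256 even though 90% of the observations have an outcome near 10.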

Make sure that the documentation is changed accordingly.

This change should be largely backward compatible, since it only affects how new (unseen) levels are encoded.

data <- data.frame(
  outcome = rnorm(1000) + c(rep(10, 900), rep(0, 100)),
  predictor = c(rep("Big", 900), rep(letters[1:10], each = 10))
)

library(tidyverse)

data |>
  count(predictor)
#>    predictor   n
#> 1        Big 900
#> 2          a  10
#> 3          b  10
#> 4          c  10
#> 5          d  10
#> 6          e  10
#> 7          f  10
#> 8          g  10
#> 9          h  10
#> 10         i  10
#> 11         j  10

data |>
  summarize(
    mean = mean(outcome),
    .by = predictor
  )
#>    predictor        mean
#> 1        Big  9.92621834
#> 2          a -0.12884918
#> 3          b  0.24802560
#> 4          c  0.12339453
#> 5          d  0.33307724
#> 6          e  0.08705590
#> 7          f  0.86433875
#> 8          g  0.42452332
#> 9          h  0.42548890
#> 10         i -0.07257279
#> 11         j -0.67403943

embed:::glm_coefs(y = select(data, outcome), x = pull(data, predictor))
#> # A tibble: 12 × 2
#>    ..level ..value
#>    <chr>     <dbl>
#>  1 a       -0.129
#>  2 b        0.248
#>  3 Big      9.93
#>  4 c        0.123
#>  5 d        0.333
#>  6 e        0.0871
#>  7 f        0.864
#>  8 g        0.425
#>  9 h        0.425
#> 10 i       -0.0726
#> 11 j       -0.674
#> 12 ..new    0.256

mean(data$outcome, trim = 0.1)
#> [1] 9.717217
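
The gap between the two numbers can be checked by hand. The following is a minimal sketch, not the embed implementation: it copies the per-level means and counts from the output above into two hypothetical vectors (level_means, level_counts) and compares a 10% trimmed mean of the per-level values with the count-weighted mean. That the current ..new value is a trimmed mean of the coefficients is an inference from the printed numbers, not something confirmed from the package source.

```r
# Per-level means copied from the summarize() output above
level_means <- c(
  Big = 9.92621834, a = -0.12884918, b = 0.24802560, c = 0.12339453,
  d = 0.33307724, e = 0.08705590, f = 0.86433875, g = 0.42452332,
  h = 0.42548890, i = -0.07257279, j = -0.67403943
)

# Number of rows per level, copied from the count() output above
level_counts <- c(900, rep(10, 10))

# Every level counts equally here, so the trimming drops "Big" (the largest)
# and "j" (the smallest) and averages the nine remaining small levels
new_from_coefs <- mean(level_means, trim = 0.1)

# Weighting each level by its row count recovers the untrimmed global mean
global_mean <- weighted.mean(level_means, level_counts)

round(new_from_coefs, 3)
round(global_mean, 2)
```

The first value rounds to 0.256, matching the ..new row, while the second is close to 8.95. The trimmed global mean of 9.717217 shown above cannot be recovered from the per-level summaries alone, since trimming operates on the individual observations.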

Labels: bug, target encoding, tidy-dev-day 🤓
