Open
Labels
bug, target encoding, tidy-dev-day 🤓
Description
The value assigned to unseen levels is calculated from the mean of the coefficients rather than from the global mean of the data. This should be fixed to better reflect the literature.
Make sure that the documentation is changed accordingly.
This change should be largely backward compatible, since it only affects how new (unseen) levels are encoded.
data <- data.frame(
  outcome = rnorm(1000) + c(rep(10, 900), rep(0, 100)),
  predictor = c(rep("Big", 900), rep(letters[1:10], each = 10))
)
library(tidyverse)
data |>
  count(predictor)
#>    predictor   n
#> 1        Big 900
#> 2          a  10
#> 3          b  10
#> 4          c  10
#> 5          d  10
#> 6          e  10
#> 7          f  10
#> 8          g  10
#> 9          h  10
#> 10         i  10
#> 11         j  10
data |>
  summarize(
    mean = mean(outcome),
    .by = predictor
  )
#>    predictor        mean
#> 1        Big  9.92621834
#> 2          a -0.12884918
#> 3          b  0.24802560
#> 4          c  0.12339453
#> 5          d  0.33307724
#> 6          e  0.08705590
#> 7          f  0.86433875
#> 8          g  0.42452332
#> 9          h  0.42548890
#> 10         i -0.07257279
#> 11         j -0.67403943
embed:::glm_coefs(y = select(data, outcome), x = pull(data, predictor))
#> # A tibble: 12 × 2
#>    ..level ..value
#>    <chr>     <dbl>
#>  1 a       -0.129
#>  2 b        0.248
#>  3 Big      9.93
#>  4 c        0.123
#>  5 d        0.333
#>  6 e        0.0871
#>  7 f        0.864
#>  8 g        0.425
#>  9 h        0.425
#> 10 i       -0.0726
#> 11 j       -0.674
#> 12 ..new    0.256
mean(data$outcome, trim = 0.1)
#> [1] 9.717217
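To make the difference concrete: the `..new` value of 0.256 in the output above appears to match a 10% trimmed mean of the eleven per-level values (dropping the lowest and highest and averaging the remaining nine gives roughly 0.256). That is inferred from the printed numbers, not from reading the embed source. The base R sketch below contrasts the two ways of deriving a value for unseen levels. The names `level_means`, `new_from_levels`, and `new_from_data` are hypothetical, and the `set.seed()` call is added only so the sketch is repeatable, so the exact numbers will differ from the reprex above.

```r
# Illustrative sketch only; this is not the embed implementation.
set.seed(1)  # assumption: added here for repeatability, not part of the reprex
data <- data.frame(
  outcome = rnorm(1000) + c(rep(10, 900), rep(0, 100)),
  predictor = c(rep("Big", 900), rep(letters[1:10], each = 10))
)

# Per-level means of the outcome, one value per level of the predictor
level_means <- vapply(split(data$outcome, data$predictor), mean, numeric(1))

# Option 1: derive the unseen-level value from the per-level estimates.
# Every level counts equally, no matter how few rows it has.
new_from_levels <- mean(level_means, trim = 0.1)

# Option 2: derive the unseen-level value from the data itself.
# Every row counts equally, so the dominant "Big" level drives the result.
new_from_data <- mean(data$outcome, trim = 0.1)

print(c(from_levels = new_from_levels, from_data = new_from_data))
```

With 900 of the 1000 rows in the `Big` level, the data-based value lands near the outcome of a typical row, while the level-based value is pulled toward the ten rare levels.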