Description
From a recent e-mail:
Greetings!
I am Chris Swenson, a data scientist in the United States, working for a multi-state, non-profit hospital system called SSM Health. We have been using the R0 package in R for the past few months, and I have a question about how the smooth.Rt function works.
We’ve been calculating the R(0) / R(t) for specific regions where we have hospitals for the past few months, to alert the infection control staff of potential surges in COVID-19 cases. I’ve been using the data from Johns Hopkins to supply the county-level regional data as inputs to the estimation. (We’ve had to do some data corrections. For example, some locations appear to skip weekends and report very high numbers on Mondays.)
It appears that if we’re not careful handling the data, some unexpected results may occur. There have been some situations where the data was not entered correctly, and it appears that the smooth.Rt function will exclude any incomplete periods (e.g., 7-day periods, in our case). Below is some code where I ran the estimate and smoothing for the data, removed 1 day and estimated, removed 1 week and estimated, and compared all three estimates. I also included the final table, and since the data is included in R, you may run the code to compare.
My question is: What happens in the smooth.Rt function when a period is incomplete, for example, the input data only has 5 days instead of 7 in a week? It appears the most recent week of data generates a much smaller estimate after smoothing. Should we be cautious when using the most recent data?
Provided replicable example
library(R0)
data(Germany.1918)  # daily incidence data shipped with the R0 package
mGT <- generation.time("gamma", c(3,1.5))
TD <- estimate.R(Germany.1918, mGT, begin=as.integer(1), end=as.integer(length(Germany.1918)), methods="TD", nsim=100)
TD.weekly <- smooth.Rt(TD$estimates$TD, 7)
init <- TD.weekly$R
#TD.weekly
print(paste('Original: ', as.character(length(init))))
len <- length(Germany.1918)-1
test <- Germany.1918[1:len]
TD <- estimate.R(test, mGT, begin=as.integer(1), end=as.integer(length(test)), methods="TD", nsim=100)
TD.weekly <- smooth.Rt(TD$estimates$TD, 7)
new <- TD.weekly$R
#TD.weekly
print(paste('One Less Row: ', length(new)))
len <- length(Germany.1918)-7
test2 <- Germany.1918[1:len]
TD <- estimate.R(test2, mGT, begin=as.integer(1), end=as.integer(length(test2)), methods="TD", nsim=100)
TD.weekly <- smooth.Rt(TD$estimates$TD, 7)
new2 <- TD.weekly$R
#TD.weekly
print(paste('One Less Week: ', length(new2)))
df_init_full <- as.data.frame(init)
# Drop the last week so the rows line up; parentheses make the intended range 1..(n-1) explicit
df_init <- as.data.frame(init)[1:(length(init)-1),]
df_new <- as.data.frame(new)
df_new2 <- as.data.frame(new2)
df_compare <- cbind(df_init, df_new, df_new2)
Comparison of outputs
| Week | Full Data | Data Missing 1 Day | Data Missing 1 Week |
|---|---|---|---|
| 1 | 1.8784 | 1.8784 | 1.8784 |
| 2 | 1.5810 | 1.5810 | 1.5810 |
| 3 | 1.3569 | 1.3569 | 1.3569 |
| 4 | 1.1316 | 1.1316 | 1.1316 |
| 5 | 0.9615 | 0.9615 | 0.9615 |
| 6 | 0.8119 | 0.8119 | 0.8119 |
| 7 | 0.8045 | 0.8045 | 0.8045 |
| 8 | 0.8396 | 0.8396 | 0.8396 |
| 9 | 0.8543 | 0.8543 | 0.8543 |
| 10 | 0.8258 | 0.8258 | 0.8258 |
| 11 | 0.8544 | 0.8544 | 0.8544 |
| 12 | 0.9776 | 0.9776 | 0.9776 |
| 13 | 0.9517 | 0.9517 | 0.9517 |
| 14 | 0.9273 | 0.9273 | 0.9273 |
| 15 | 0.9635 | 0.9635 | 0.9635 |
| 16 | 0.9509 | 0.9509 | 0.9481 |
| 17 | 0.9827 | 0.9843 | 0.4994 |
| 18 | 0.5844 | | |
Note: I manually added week 18 from the original estimation for comparison.
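To make it explicit when the input does not divide evenly into 7-day periods, a small check could be run before estimation. The following is only a sketch in base R: `split_complete_periods` is a hypothetical helper, not part of the R0 package, and it assumes that periods are counted from the first observation, which is how the table above appears to behave.

```r
# Hypothetical helper (NOT part of the R0 package): split a daily series into
# the leading part that fills whole periods and the trailing leftover days.
# Assumption: periods are counted from the first element of x.
split_complete_periods <- function(x, period = 7L) {
  stopifnot(period >= 1L)
  n_complete <- (length(x) %/% period) * period
  list(
    complete = x[seq_len(n_complete)],
    leftover = x[seq_len(length(x) - n_complete) + n_complete]
  )
}

# Synthetic daily counts for illustration: 19 days = 2 full weeks + 5 extra days.
daily <- seq_len(19)
parts <- split_complete_periods(daily, 7L)

length(parts$complete)  # number of days that fill whole weeks
length(parts$leftover)  # number of trailing days in an incomplete week
```

Checking `length(parts$leftover)` before calling `smooth.Rt` at least shows how many trailing days will not form a full period. It does not by itself explain the lower estimate in the final week.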