RTutor/conf_mean_sol.Rmd at master · posnerab/RTutor · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
#< ignore
    Save the file as `conf_mean_sol.Rmd`.
    Run setup chunk in RStudio.

```{r ""}
library(RTutor)
setwd(getwd())
ps.name = "conf_mean"; sol.file = paste0(ps.name,"_sol.Rmd")
libs = c("tidyverse",
         "readxl",
         "epiR") # character vector of all packages you load in the problem set

name.rmd.chunks(sol.file)
create.ps(sol.file = sol.file, ps.name = ps.name, libs = libs, addons = "quiz")


# The following line directly shows the problem set
# in the browser
show.ps(ps.name, launch.browser = TRUE,  auto.save.code = FALSE, sample.solution = FALSE)
```
#>

## Exercise Overview

<h1>R Tutorial -- Calculating the Confidence Interval for a Population Mean</h1>


Author: Xander Posner, Epi/Biostat MPH '20

If you get stuck, use the <strong>hint</strong> in each code chunk first. If you're still having trouble, head to Piazza and post your question. GSIs will be available to respond at <strong>these times</strong>.

<h3>Tools Used in this Tutorial</h3>

<p>In this tutorial, we will: </p>

<ul>
<li>Find the mean and standard deviation of our variable of interest and calculate the standard error the estimator of the population mean</li>
<li>Use R functions to find critical t* values for 95% and 99% confidence intervals</li>
<li>Use two different methods in R to produce confidence intervals for a mean</li>
</ul>


<h3>R Packages Used in this Tutorial</h3>

<ul>
<li><code>"tidyverse"</code></li>
<li><code>"readr"</code></li>
<li><code>"readxl"</code></li>
<li><code>"epiR"</code></li>
<li><code>"knitr"</code></li>
<li><code>"tinytex"</code></li>
<li><code>"rdrop2"</code></li>
</ul>

<p>When practicing on your own in RStudio, you will first need to install each of these packages using the <code>install.packages("")</code> function, putting the name(s) of the package(s) inside the quotation marks like this:</p>

<div id="block-quotes" class="section level2">
<blockquote>
<p><code>install.packages("tidyverse")</code> OR <code>install.packages(c("tidyverse", "epiR"))</code></p>
</blockquote>
</div>

<p>You <strong>won't</strong> need to <i>install</i> packages more than once on the same computer.</p>

<p>After installing, you must load each package library <strong>separately</strong>, using the <code>library()</code> function, putting the name of each package inside the parentheses without quotation marks like this:</p>

<div id="block-quotes" class="section level2">
<blockquote>
<p><code>library(tidyverse)</code></p>
</blockquote>
</div>

<p>You <strong>will</strong> need to <i>load</i> packages each time you need to use them, using the <code>library()</code> function whenever you initiate a new R session.</p>

<p><strong>You don't need to install any packages for this tutorial</strong>. Just press <strong>check</strong> to run any chunk with a <code>library()</code> function in it.</p>


<h3>R Functions Used in this Tutorial</h3>

<ul>
<li><code>library()</code> to load each package library</li>
<li><code>read_excel()</code> to read in Excel spreadsheets</li>
<li><code>head()</code> to view the first few lines of your dataset</li>
<li><code>nrow()</code> to view the number of rows in your dataset</li>
<li><code>summary()</code> to produce summary statistics</li>
<li><code>mean()</code> to find the mean of a given variable</li>
<li><code>sd()</code> to find the standard deviation of a given variable</li>
<li><code>qt()</code> to find critical t* values to calculate confidence intervals</li>
<li><code>epi.conf()</code> to calculate the mean, standard error, and lower and upper confidence interval bounds for you</li>


Click the "Go to next exercise..." button to continue.


## Exercise 1 -- Get Started

<p>Some code chunks are finished for you; just press <strong>check</strong> to proceed. Other chunks require you to modify and/or add code before pressing <strong>check</strong> to move through the tutorial.</p>

<strong>Make sure you run each chunk in order!</strong>

<h3>Loading and getting to know your data</h3>

a) This chunk loads the package libraries, reads in the Excel spreadsheet using the <code>read_excel()</code> function, and assigns the spreadsheet to an object called <code>bmi_adol_men</code> in the global environment.
```{r "1_a"}
#< task
## Don't modify this code; just press check to continue.
library(tidyverse)
library(readxl)
library(epiR)


bmi_adol_men <- read_excel("bmi_young_men.xlsx", col_names = T)
#>
```


b) View just the first few lines of the <code>bmi_adol_men</code> dataset by using the <code>head()</code> function.
<p><code>head()</code> is a really useful function both when starting analysis and getting acquainted with your dataset or if you just need a quick reminder of the column names.</p>
```{r "1_b"}
#< task
head(bmi_adol_men)
#>
```

<p>The <code>head()</code> function by default shows you the first six lines, but you can change this if you'd like, using the <code>n = </code> argument inside the <code>head()</code> function.</p>


c) View the number of rows in your dataset, which here amounts to our sample size, $n$.
```{r "1_c"}
#< task
nrow(bmi_adol_men)
#>
```

<h3>Summarizing your data</h3>

<p>R has many ways to view summary statistics. The simplest is the <code>summary()</code> function.</p>

<p>The dollar sign <code>$</code> is used to index or "choose" column names that represent different variables in your dataset with this syntax:</p>

<div id="block-quotes" class="section level2">
<blockquote>
<p><code><"DATASET OBJECT">$<"COLUMN NAME"></code></p>
</blockquote>
</div>

d) Run the chunk below to see what summary statistics the <code>summary()</code> function produces.
```{r "1_d"}
#< task
summary(bmi_adol_men$bmi)
#>
```


#< award "Summary"
You summarized it! Click 'Go to next exercise...' to continue.
#>


## Exercise 2 -- Calculating Confidence Intervals for a Mean

a) Find the mean and standard deviation of <code>bmi_adol_men$bmi</code> using the <code>mean()</code> and <code>sd()</code> functions.

<p>The first is done for you; mean BMI is already assigned to an object called <code>mean_bmi</code>. Modify the code below it to assign the object <code>std_dev_bmi</code> to the standard deviation of <code>bmi_adol_men$bmi</code> using the <code>sd()</code> function.</p>
```{r "2_a"}
#< task
bmi_adol_men <- read_excel("bmi_young_men.xlsx", col_names = T)
mean_bmi <- mean(bmi_adol_men$bmi)
mean_bmi
#>
#< task_notest
std_dev_bmi <- "<<<<<<<YOUR CODE HERE>>>>>>>"
std_dev_bmi
#>
std_dev_bmi <- sd(bmi_adol_men$bmi)
#< hint
display("Delete the quotation marks and everything inside them. Use the sd() function to find the standard deviation of BMI and assign it to the std_dev_bmi object.")
#>
```


b) Find the standard error of the sample estimator of the population mean using this formula: $$\frac{σ}{\sqrt{n}}$$ where $σ$ is the standard deviation and $n$ is your sample size.
```{r "2_b"}
#< task_notest
se_bmi <- "<<<<<<<YOUR CODE HERE>>>>>>>"
se_bmi
#>
se_bmi <- sd(bmi_adol_men$bmi)/sqrt(50)
se_bmi
#< hint
display("Use the <code>sd()</code> function you used in the last chunk, and make sure you use the right sample size.")
#>
```


#< quiz "Alpha"
question: What would be your alpha for a 95% CI?
sc:
    - 0.95
    - 5
    - 0.025
    - 0.05*
success: Great job! Alpha is your confidence level, equal to 1-0.XX, where XX is your XX% confidence interval.
failure: Try again. Alpha is your confidence level, equal to 1-0.XX, where XX is your XX% confidence interval.
#>

#< quiz "p"
question: What is the probability, p, or area under the t-distribution curve for a 95% CI?
sc:
    - 0.95
    - 0.025
    - 0.975*
    - 0.05
success: Nice work! p is your area under the t-distribution curve with n-1 degrees of freedom (where n is your sample size) that's to the LEFT of your given confidence level for a one-sided or one-tailed t-test.
failure: Try again. p is your area under the t-distribution curve that's to the LEFT of your given confidence level for a one-sided or one-tailed t-test.
#>


c) Modify the code below to find the critical $t^*$ value for a 95% confidence interval (CI) using the <code>qt()</code> function.
```{r "2_c"}
#< task_notest
t_star_95 <- qt(p = "<<<<<<<YOUR CODE HERE>>>>>>>",
                df = "<<<<<<<YOUR CODE HERE>>>>>>>",
                lower.tail = "<<<<<<<TRUE OR FALSE>>>>>>>")
t_star_95
#>
t_star_95 <- qt(p = 0.975, df = 49, lower.tail = TRUE)
t_star_95
#< hint
display("p is the probability or 'area' under a t-distribution curve with df, degrees of freedom, equal to the sample size, n, minus 1 (n-1). p is equal to 1-(alpha/2) for a one-tailed t-test, where alpha is your confidence level (0.05 or 5% for a 95% CI). The lower.tail argument is known as a 'logical' operator and can only take the boolean values TRUE or FALSE. Set lower.tail equal to TRUE to get the area to the left of your probability, p, or FALSE to find the area to the right of p.")
#>

```


#< quiz "Alpha90"
question: What would be your alpha for a 90% CI?
sc:
    - 0.10*
    - 10
    - 0.05
    - 0.90
success: Great job! Alpha is your confidence level, equal to 1-0.XX, where XX is your XX% confidence interval.
failure: Try again. Alpha is your confidence level, equal to 1-0.XX, where XX is your XX% confidence interval. What is the "confidence level" for a 90% CI?
#>


#< quiz "p90"
question: What would be your p in the <code>qt()</code> function above for a 90% CI?
sc:
    - 0.90
    - 0.975
    - 0.95*
    - 0.10
success: Nice work! p is your probability under the t-distribution curve with n-1 degrees of freedom. It is equal to 1-alpha/2 where alpha is your confidence level.
failure: Try again. p is your probability under the t-distribution curve with n-1 degrees of freedom. It is equal to 1-alpha/2 where alpha is your confidence level. *Hint* You found alpha in the quiz above.
#>