-
Notifications
You must be signed in to change notification settings - Fork 10
Expand file tree
/
Copy pathREADME.Rmd
More file actions
401 lines (265 loc) · 15.9 KB
/
Copy pathREADME.Rmd
File metadata and controls
401 lines (265 loc) · 15.9 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```
## DoSStoolkit <img src="man/figures/stitches_hex_all.png" align="right" height="200" />
The DoSS Toolkit is a bunch of self-paced modules to help you learn and use R.
We all know that R is a critical part of applied statistics and data science these days, but it can have a steep learning curve and be intimidating to get started with.
The [Department of Statistical Sciences](http://www.utstat.utoronto.ca/) (DoSS) toolkit is a free series of open source online modules written by undergraduates, that their fellow students and the public can use to learn the essentials of R.
## How to use this resource
### If you have never used R before
You use this resource by running R code! This may sound intimidating if you've never used R before, so we've made a video that walks through what you need to do.
<iframe width="560" height="315" src="https://www.youtube.com/embed/7mymJEEXux4?start=3" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
Get started by going to R Studio Cloud - https://rstudio.cloud - and creating an account. When you've signed up, start a new project, and copy-paste the code below to install packages. (If you already have R and R Studio working on your local computer then you don't have to use R Studio Cloud, you can install the packages on your local machine instead.)
```{r, eval = FALSE}
install.packages('tidyverse')
install.packages('remotes')
install.packages('opendatatoronto')
remotes::install_github("rstudio-education/gradethis")
```
Then you can install the `DoSStoolkit`:
```{r, eval = FALSE}
remotes::install_github("RohanAlexander/DoSStoolkit")
```
You'll use the function `run_tutorial` to run each module. At the moment we have nine modules. So you can pick one to start with. For instance, if you wanted to run the 'hello world' module then run:
```{r, eval = FALSE}
learnr::run_tutorial("hello_world", package = "DoSStoolkit")
```
### If you already have R and R Studio on your computer
You can install `DoSStoolkit` from [GitHub](https://github.com/) with:
```{r, eval = FALSE}
# install.packages("devtools")
devtools::install_github("RohanAlexander/heapsofpapers")
# install.packages('tidyverse')
# install.packages('remotes')
# install.packages('opendatatoronto')
remotes::install_github("rstudio-education/gradethis")
```
Then you can install the `DoSStoolkit`:
```{r, eval = FALSE}
remotes::install_github("RohanAlexander/DoSStoolkit")
```
You'll use the function `run_tutorial` to run each module. At the moment we have ten modules. So you can pick one to start with. For instance, if you wanted to run the 'hello world' module then run:
```{r, eval = FALSE}
learnr::run_tutorial("hello_world", package = "DoSStoolkit")
```
## Content
We have ten modules. A complete collection is here:
```{r, eval = FALSE}
learnr::run_tutorial("hello_world", package = "DoSStoolkit")
learnr::run_tutorial("operating_in_an_error_prone_world", package = "DoSStoolkit")
learnr::run_tutorial("holding_the_chaos_at_bay", package = "DoSStoolkit")
learnr::run_tutorial("hand_me_my_plyrs", package = "DoSStoolkit")
learnr::run_tutorial("totally_addicted_to_base", package = "DoSStoolkit")
learnr::run_tutorial("he_was_a_d8er_boi", package = "DoSStoolkit")
learnr::run_tutorial("to_ggplot_or_not_to_ggplot", package = "DoSStoolkit")
learnr::run_tutorial("r_marky_markdown", package = "DoSStoolkit")
learnr::run_tutorial("git_outta_here", package = "DoSStoolkit")
learnr::run_tutorial("indistinguishable_from_magic", package = "DoSStoolkit")
```
### Hello world!
How to run this module:
```{r, eval = FALSE}
learnr::run_tutorial("hello_world", package = "DoSStoolkit")
```
Module content:
- Why I love R, by Liza Bolton.
- Setting up RStudio, by Annie Collins.
- Getting to know what is what - console, terminal, etc, by Annie Collins.
- A fun hello world exercise, by Annie Collins.
- Another fun hello world exercise, by Shirley Deng.
- R Weekly newsletter, R Ladies, R meetups, by Annie Collins.
### Operating in an error prone world
How to run this module:
```{r, eval = FALSE}
learnr::run_tutorial("operating_in_an_error_prone_world", package = "DoSStoolkit")
```
Module content:
- Why I love R, by Monica Alexander.
- Getting help is normal, by Michael Chong.
- Using Google and Stack Overflow, by Michael Chong.
- Stack overflow, by Annie Collins.
- How to problem solve when your code doesn't work, by Michael Chong.
- Making reproducible examples, by Marija Pejcinovska.
- How to make the most of R's cryptic error messages, by Shirley Deng.
### Holding the chaos at bay
How to run this module:
```{r, eval = FALSE}
learnr::run_tutorial("holding_the_chaos_at_bay", package = "DoSStoolkit")
```
Module content:
- Why I love R, by Samantha-Jo Caetano.
- R projects and `setwd()`, by Isaac Ehrlich.
- Folder set-up, by Isaac Ehrlich.
- Writing comments, by Isaac Ehrlich.
- `install.packages()`, by Haoluan Chen.
- `install_github()`, by Haoluan Chen.
- `library()`, by Mariam Walaa.
- `update.packages()`, by Mariam Walaa.
- `read_csv()`, by Marija Pejcinovska.
- `read_table()`, `read_dta()`, and other data types, by Isaac Ehrlich.
### Hand me my plyrs
How to run this module:
```{r, eval = FALSE}
learnr::run_tutorial("hand_me_my_plyrs", package = "DoSStoolkit")
```
Module content:
- Why I love R, by Sabrina Sixta.
- What is the tidyverse?, by Yena Joo.
- The pipe, by Mariam Walaa.
- `select()`, by Yena Joo.
- `filter()`, by Shirley Deng.
- `group_by()` and `ungroup()`, by Matthew Wankiewicz.
- `summarise()`, by Mariam Walaa.
- `arrange()`, by Isaac Ehrlich.
- `mutate()`, by Haoluan Chen.
- `pivot_wider()` and `pivot_longer()`, by Annie Collins.
- `rename()`, by Mariam Walaa.
- `count()` and `uncount()`, by Annie Collins.
- `slice()`, by Annie Collins.
- `c()`, `matrix()`, `data.frame()`, and `tibble()`, by Matthew Wankiewicz.
- `length()`, `nrow()`, and `ncol()`, by Isaac Ehrlich.
### Totally addicted to base
How to run this module:
```{r, eval = FALSE}
learnr::run_tutorial("totally_addicted_to_base", package = "DoSStoolkit")
```
Module content:
- Why I love R, by Rohan Alexander.
- `mean()`, `median()`, `sd()`, `lm()`, and `summary()`, by Mariam Walaa.
- `function()`, by Haoluan Chen.
- `for()` and `while()`, by Yena Joo.
- `if()`, `if_else()` and `case_when()`, by Haoluan Chen.
- `c()`, `seq()`, `seq_along()`, and `rep()`, by Matthew Wankiewicz.
- `hist()`, `plot()`, and `boxplot()`, by Yena Joo.
### He was a d8er boi
How to run this module:
```{r, eval = FALSE}
learnr::run_tutorial("he_was_a_d8er_boi", package = "DoSStoolkit")
```
Module content:
- `head()`, `tail()`, `glimpse()`, and `summary()`, written by Haoluan Chen.
- `paste()`, `paste0()`, `glue::glue()` and `stringr`, written by Marija Pejcinovska
- `names()`, `rbind()` and `cbind()`, written by Isaac Ehrlich.
- `left_join()`, `anti_join()`, `full_join()`, etc, written by Haoluan Chen.
- Looking for missing data, written by Mariam Walaa.
- `set.seed()`, `runif()`, `rnorm()`, and `sample()`, written by Haoluan Chen.
- Simulating datasets for regression, written by Mariam Walaa.
- Advanced mutating and summarising, written by Mariam Walaa.
- Tidying up datasets, written by Mariam Walaa.
- `pull()`, `pluck()`, and `unnest()`, by Isaac Ehrlich.
- `forcats` and factors, written by Matthew Wankiewicz.
- More on strings, written by Annie Collins.
- Regular expressions, written by Shirley Deng.
- Working with dates, written by Mariam Walaa.
- `janitor`, written by Mariam Walaa.
- `tidyr`, written by Mariam Walaa.
### To ggplot or not to ggplot
How to run this module:
```{r, eval = FALSE}
learnr::run_tutorial("to_ggplot_or_not_to_ggplot", package = "DoSStoolkit")
```
Module content:
- `ggplot2::ggplot()`, by Shirley Deng.
- Bar charts, by Matthew Wankiewicz.
- Histograms, by Haoluan Chen.
- Scatter plots, by Haoluan Chen.
- Various useful options, by Yena Joo.
- Saving graphs, by Yena Joo.
### R Marky Markdown and the Funky Docs
How to run this module:
```{r, eval = FALSE}
learnr::run_tutorial("r_marky_markdown", package = "DoSStoolkit")
```
Module content:
- Introduction to R Markdown, written by Shirley Deng.
- Top Matter: Title, Date, Author, Abstract, written by Yena Joo.
- Tables: `kable`, `kableextra`, `gt`, written by Yena Joo.
- Multiple plots with `patchwork`, written by Michael Chong
- References and Bibtex, written by Yena Joo.
- PDF outputs, written by Yena Joo.
- `here::here()` and filepaths, written by Matthew Wankiewicz.
### Git outta here
How to run this module:
```{r, eval = FALSE}
learnr::run_tutorial("git_outta_here", package = "DoSStoolkit")
```
Module content:
- What is version control and GitHub?, written by Mariam Walaa.
- Git: pull, status, add, commit, push, written by Mariam Walaa.
- Branches in GitHub, written by Matthew Wankiewicz.
- Dealing with Conflicts, written by Matthew Wankiewicz.
- Putting (G)it All together in RStudio, written by Matthew Wankiewicz.
### Indistinguishable from magic
How to run this module:
```{r, eval = FALSE}
learnr::run_tutorial("indistinguishable_from_magic", package = "DoSStoolkit")
```
Module content:
- Coding style, written by Marija Pejcinovska.
- Static maps with `ggmap`, by Annie Collins.
- Writing R Packages, written by Matthew Wankiewicz.
- Getting started with Blogdown, written by Annie Collins.
- Getting started with Shiny, written by Matthew Wankiewicz.
## Contributors
- Annie Collins is an undergraduate student in the Department of Mathematics specializing in applied mathematics and statistics with a minor in history and philosophy of science. In her free time, she focuses her efforts on student governance, promoting women's representation in STEM, and working with data in the non-profit and charitable sector.
- Haoluan Chen is an undergraduate student in the Department of Statistical Science specializing in applied statistics. He is interested in applying data science techniques, especially in NLP, to gain insight from the data.
- Isaac Ehrlich is an undergraduate student in Statistics and Cognitive Science at the University of Toronto. He enjoys using R for everything from analysing trends in his recent movie-viewing history, to his past research building models on human categorization.
- Mariam Walaa is an undergraduate student in the Department of Computer and Mathematical Sciences at University of Toronto Scarborough, majoring in Mathematics and minoring in GIS and statistics. Mariam enjoys learning about how to work with data such as spatial and text data to extract insights.
- Marija Pejcinovska is a graduate student in the Department of Statistical Sciences. Her research is motivated by modelling challenges that arise with "complicated" data (sparse/highly biased/poor quality data), usually in the context of social or health inequalities.
- Matthew Wankiewicz is an undergraduate student at the University of Toronto, majoring in Statistics and minoring in Mathematics and the History and Philosophy of Science.
- Michael Chong is a graduate student in the Department of Statistical Sciences. His research aims to build statistical models for demographic estimation in contexts where high quality data are unavailable. There is almost always an active R session on his computer!
- Paul Hodgetts is a Master of Information candidate concentrating in Human-Centred Data Science in the Faculty of Information, University of Toronto. He sincerely believes that Calvin and Hobbes is the greatest comic ever produced.
- Rohan Alexander is an assistant professor in the Faculty of Information and the Department of Statistical Sciences. Some people convert to catholicism upon marriage, Rohan converted to R. His greatest professional achievement is probably getting a pull request accepted into R for Data Science (it was just fixing a minor typo but still).
- Samantha-Jo Caetano is an assistant professor (teaching stream) in the Department of Statistical Sciences. She loves statistics, socializing, her family, and her dogs, not necessarily in that order.
- Shirley Deng is an undergraduate student specializing in Statistics and majoring in Mathematics. Meticulous and soft-hearted, she often finds herself engrossed in new pastimes by the second at the influence of her peers. One of which that has remained a longtime constant - spending an excessive amount of time helping people debug their R code.
- Yena Joo is an undergraduate student majoring in Economics and double minoring in Statistics and Computer Science.
## Pedagogical underpinnings
*Coming soon.*
## Acknowledgements
We gratefully acknowledge the support of Professor Bethany White, Chair Radu Craiu, and the U of T Faculty of Arts & Sciences Pedagogical Innovation and Experimentation Fund.
We'd like to acknowledge the help of:
- Liza Bolton
- Monica Alexander
- Sabrina Sixta
We'd like to thank Alex Cookson for his collection of datasets.
This toolkit builds on, and complements, the work of many others, including:
- Alex Stringer - https://github.com/awstringer1/course-materials
- Alison Gibbs, Jeff Rosenthal, Nathan Taback - https://stats.onlinelearning.utoronto.ca/
- Bethany White and Jennifer Peter - https://rscidata.utoronto.ca/
- Desirée De Leon and Alison Hill - https://rstudio4edu.github.io/rstudio4edu-book/
- Hasse Walum and Desirée De Leon - https://tinystats.github.io/teacups-giraffes-and-statistics/index.html
- Nathan Taback - https://ntaback.github.io/UofT_STA130/R_resources.html
- Radford Neal - http://www.cs.utoronto.ca/~radford/csc121.S17/
Rohan would like to thank Greg Wilson, for sharing his experience, thoughts, and leadership.
## References
We draw on the open-source statistical programming language R and a variety of packages. We are grateful for the work that we build on.
- Barret Schloerke, JJ Allaire, Barbara Borges and Garrick Aden-Buie (2021). learnr: Interactive Tutorials for R. https://rstudio.github.io/learnr/, https://github.com/rstudio/learnr.
- Bodwin, Kelly, and Hunter Glanz (2020). flair: Highlight, Annotate, and Format your R Source Code. R package version 0.0.2. https://CRAN.R-project.org/package=flair
- Chen, Daniel, Barret Schloerke, Garrick Aden-Buie and Garrett Grolemund (2021). gradethis: Automated Feedback for Student Exercises in 'learnr' Tutorials. https://rstudio-education.github.io/gradethis/, https://rstudio.github.io/learnr/, https://github.com/rstudio-education/gradethis.
- de Vries, Andrie, Barret Schloerke and Kenton Russell (2020). sortable: Drag-and-Drop in 'shiny' Apps with 'SortableJS'. R package version 0.4.4. https://CRAN.R-project.org/package=sortable
- Gelfand, Sharla, (2020). opendatatoronto: Access the City of Toronto Open Data Portal. R package version 0.1.4. https://CRAN.R-project.org/package=opendatatoronto
- Hester, Jim, Gábor Csárdi, Hadley Wickham, Winston Chang, Martin Morgan and Dan Tenenbaum (2020). remotes: R Package Installation from Remote Repositories, Including 'GitHub'. R package version 2.2.0. https://CRAN.R-project.org/package=remotes
- R Core Team (2020). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.
- Wickham et al., (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686, https://doi.org/10.21105/joss.01686
## Related packages
- TBD
## Next steps
- Assessment
- Teaching
- French language version
## Citation
We have a pre-print coming soon.
## How to contribute
The best way to contribute fixes and minor typos is to make a pull request on GitHub.
If you are interested in contributing lessons or modules, then please contact Rohan Alexander. We are particularly interested in partnering with an institution where the language of instruction is French to develop a French language version.
## Contact
Please contact Rohan (rohan.alexander@utoronto.ca) with any questions, comments, and suggestions.