---
title: "Fundamentals of Data Science (GSND 5345)"
date: Spring (Jan-Feb), 2026
output: word_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
### COURSE DESCRIPTION:
This class is an introduction to the ethics and essential computational tools and skills for data science. The course will cover command-line coding, literate programming, software development, version control, data wrangling and management, and visualization. The standards for open science, reproducibility, and ethical and responsible computing will also be discussed. Students will be expected to use R and GitHub throughout this course.
### COURSE OBJECTIVES:
Students who take this course will:
1. Gain experience with the fundamental tools and skills for data science
2. Develop an advanced understanding of the R programming language
3. Understand the principles and concepts surrounding reproducibility and open science
4. Discuss the ethical issues and potential bias in data and machine learning
5. Learn how to effectively plot and visualize data (know what to do and not to do!)
### PREREQUISITES
An introductory course in statistics, biostatistics, epidemiology, or equivalent experience in statistical analysis is recommended (but not required). Programming experience in R is also recommended (again not required). Students without this experience will be encouraged to utilize the asynchronous resources provided at the end of this syllabus to obtain these skills before or during the course. Please contact Dr. Johnson to obtain a list of the required proficiencies.
### COURSE FORMAT:
This class will be taught virtually using a synchronous remote modality, although students will be provided a classroom in which to gather for each lecture. A co-instructor will be present in the classroom for each lecture. Class will occur Mondays and Wednesdays from 12:00pm-1:50pm. Lectures may also be recorded and made available for students who need to miss classes due to personal reasons, illness, or research-related needs.
### ZOOM LINK AND CLASSROOM:
Zoom Meeting ID for all sessions is 95146491967, with the passcode: 236441, or use the following direct link (the link is also available through the course GitHub page): https://rutgers.zoom.us/j/95146491967?pwd=ySdIKF1NFl4wAOhtAwhop825QUYWWL.1.
Room B619 will also be available for the students to congregate for each lecture, with a co-instructor or TA present.
### FACULTY AND STAFF:
W. Evan Johnson, Ph.D.\
Email: w.evan.johnson@rutgers.edu\
Cell Phone: (801) 472-6951
Teaching Assistant: TBA\
Email: TBA
### OFFICE HOURS:
**Instructor:** Dr. Johnson will be available virtually by appointment only. Email or text him any time to set up a time to meet!
### GitHub REPOSITORY:
The course GitHub repository is located at: https://github.com/wevanjohnson/2026_Spring_FDS. This page will contain all information in this syllabus plus more. Homework assignments and other information pertinent to this course will be posted on this website, which will be updated frequently, so you should visit it regularly.
### COURSE TEXTBOOKS:
We will use multiple text resources in this class. None are required; all are freely available online or can be purchased in hard copy. Many of my materials are adapted from these resources (thanks to the authors for these):
1. _Modern Data Science with R_, 2nd edition, By Benjamin S. Baumer, Daniel T. Kaplan, Nicholas J. Horton, Chapman and Hall/CRC, 2021. https://mdsr-book.github.io/mdsr2e/
2. _Introduction to Data Science: Data Analysis and Prediction Algorithms with R_, 1st edition, By Rafael A. Irizarry, Chapman and Hall/CRC, 2020. https://rafalab.github.io/dsbook/
3. _R for Data Science: Import, Tidy, Transform, Visualize, and Model Data_, 1st edition, By Hadley Wickham, Garrett Grolemund, O'Reilly, 2017. https://r4ds.had.co.nz
4. _Mathematical Foundations for Data Analysis_, By Jeff M. Phillips: https://mathfordata.github.io.
### EVALUATION METHODS & COURSE GRADING
#### Assessment/Evaluation:
This course is a hands-on, project-based course. You will be graded based on homework assignments/mini projects (7 problem sets, each worth 100 points). There will be no final exam. Homework assignments and mini projects will usually be assigned at the beginning of each week and will be due by Wednesday of the week after the material is covered. The last homework assignment will include a presentation during the last week of class. However, please plan to be flexible on due dates based on the material covered in class.
#### Course Grading:
Grade Scale:

|Percentage|$\geq$ 90%|$\geq$ 85%|$\geq$ 80%|$\geq$ 75%|$\geq$ 70%|< 70%|
|:----|---:|---:|---:|---:|---:|---:|
|Grade|A|B+|B|C+|C|F|
### ATTENDANCE:
This course is being taught through a synchronous remote modality through Zoom. Attendance is mandatory; lecture recordings will only be available to students with university-approved absences or pre-approved special circumstances. If you are sick or have any other justified reason to miss a lecture, please reach out to Dr. Johnson in advance and you will be reasonably accommodated.
<!--
### SOME IMPORTANT DATES:
(See https://njms.rutgers.edu/sgs/current_students/events.php for more)
| | |
|:--------------------|:--------------------------|
|January 1 (Tuesday): | First day of Spring courses |
|January 15 (Monday):| Martin Luther King Jr. Day (no class) |
|February 21 (Wednesday): | Last day of class |
-->
### WORKLOAD:
This is an 8-week, 2.0 credit class at the beginning of Spring 2026. In general, you should expect four hours in class each week, and two hours outside of class for every hour in class.
### OTHER HELP:
I **strongly** encourage you to contact me early if you have difficulty with the material. This course builds on material from prior lectures, so do not fall behind! My job is to help you understand the material as well as possible, and I am flexible with meeting times.
### ACADEMIC INTEGRITY:
You are expected to have read and to follow the guidelines at the university’s academic integrity website (http://academicintegrity.rutgers.edu). These principles forbid plagiarism and require every Rutgers University student to:
* Properly acknowledge and cite all use of the ideas, results, or words of others
* Properly acknowledge all contributors to a given piece of work
* Make sure that all work submitted as his or her own in a course or other academic activity is produced without the aid of unsanctioned materials or unsanctioned collaboration
* Treat all other students in an ethical manner, respecting their integrity and right to pursue their educational goals without interference. This requires that a student neither facilitate academic dishonesty by others nor obstruct their academic progress (reproduced from: http://academicintegrity.rutgers.edu/academic-integrity-at-rutgers/).
Violations of academic integrity will be treated in accordance with university policy, and sanctions for violations may range from no credit for the assignment, to a failing course grade, to (for the most severe violations) dismissal from the university.
\newpage
### COURSE TOPICS AND OUTLINE (BY WEEK)
Introduction to and Ethics of Data Science (Week 1)
* 1/5/26: What is Data Science; Keeping the “science” in data science
* 1/7/26: Data ethics and violations; Open science and reproducibility
Data Science Ethics and Essential Tools (Week 2)
* 1/12/26: Introduction to R and RStudio
* 1/14/26: The terminal and Unix
Essential Tools for Data Science (Week 3)
* 1/19/26: Martin Luther King Jr. Day (No Class)
* 1/21/26: High performance computing (taught by OARC; Dr. Johnson at NIH study section)
Essential Tools for Data Science (Week 4)
* 1/26/26: Git and GitHub
* 1/28/26: Introduction to Advanced R Programming
Advanced Data Wrangling in R (Week 5)
* 2/2/26: RMarkdown; Data Structures
* 2/4/26: The tidyverse; Tidy data wrangling
Advanced R Tools (Week 6)
* 2/9/26: Creating R packages
* 2/11/26: R/Shiny
Data Visualization (Week 7)
* 2/16/26: General plotting principles; Intro to `ggplot2`
* 2/18/26: Advanced plotting with `ggplot2`
Final Project Presentations (Week 8)
* 2/23/26: Final student presentations
* 2/25/26: Final student presentations
\newpage
### ADDITIONAL (ASYNCHRONOUS) MODULES
Learning R:
* [RStudio Education](https://education.rstudio.com/learn/beginner/)
* [R Programming (Coursera/Johns Hopkins)](https://www.coursera.org/learn/r-programming?specialization=jhu-data-science&utm_medium=sem&utm_source=gg&utm_campaign=B2C_NAMER_jhu-data-science_jhu_FTCOF_specializations_country-US-country-CA&campaignid=313639147&adgroupid=121203872804&device=c&keyword=&matchtype=&network=g&devicemodel=&adposition=&creativeid=507187136066&hide_mobile_promo&gclid=Cj0KCQjw9MCnBhCYARIsAB1WQVUuUyr1GQeQWOkLR-d9lj60pyAih9-5wg__yNgm-L0-VQPrvuZQFtEaApQ5EALw_wcB)
* [Data Science R Basics (edx/Harvard University)](https://www.edx.org/learn/r-programming/harvard-university-data-science-r-basics?irclickid=V9eQWSwpwxyPTCxztt2SI17tUkFyAmzqk1fbyE0&utm_source=affiliate&utm_medium=Hackrio&utm_campaign=edX%20Tracking%20Link_&utm_content=TEXT_LINK&irgwc=1)
* [R Training Course (LinkedIn)](https://www.linkedin.com/learning/learning-r)
* [R Programming A - Z: R for Data Science (Udemy)](https://www.udemy.com/course/r-programming/?ranMID=39197&ranEAID=jU79Zysihs4&ranSiteID=jU79Zysihs4-5fxuDsdoyms05cRQ5nTs7Q&LSNPUBID=jU79Zysihs4&utm_source=aff-campaign&utm_medium=udemyads)
* [Programming with R (Pluralsight)](https://www.pluralsight.com/courses/programming-with-r?aid=7010a000001xAKZAA2&clickid=w9MUi9wpwxyPTCxztt2SI17tUkFyAj2Lk1fbyE0&irgwc=1&mpid=2890636&utm_campaign=2890636&utm_medium=digital_affiliate&utm_source=impactradius)
Here are some resources to learn basic statistics (and in some cases R simultaneously):
* [Data Analysis with R Specialization (Coursera/Duke University)](https://www.coursera.org/specializations/statistics?irclickid=w9MUi9wpwxyPTCxztt2SI17tUkFyAhzKk1fbyE0&irgwc=1&utm_medium=partners&utm_source=impact&utm_campaign=2890636&utm_content=b2c)
* [Introduction to Statistics (Coursera/Stanford)](https://www.coursera.org/learn/stanford-statistics)