---
title: "Statistical Modeling of Student Performance"
author: "Sana Ur Rehman Arain"
date: "`r Sys.Date()`"
output:
  html_document:
    theme: flatly
    toc: true
    toc_float: true
    number_sections: true
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(
  echo = FALSE,
  warning = FALSE,
  message = FALSE,
  fig.align = "center"
)
library(tidyverse)
library(knitr)
```
---
# Introduction
Academic performance is influenced by a complex interaction of social, economic, and behavioral factors.
Rather than focusing purely on grade prediction, this project emphasizes **statistical inference** to identify which factors *significantly influence* student outcomes when controlling for others.
**Objective:**
To identify statistically valid predictors of final Mathematics grades (`G3`) using the **UCI Student Performance Dataset**.
---
# Exploratory Data Analysis (EDA)
Before formal modeling, exploratory analysis was conducted to understand distributions and potential relationships between variables.
## Distribution of Final Grades
Apart from a notable spike at **0**, the final grade (`G3`) is approximately normally distributed and centered around **10–11**. The zeros likely represent dropouts or severe failures.
```{r}
knitr::include_graphics("results/distribution_G3.png")
```
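A minimal sketch of how the spike at 0 and the centre of the remaining grades could be checked numerically. The data here are simulated for illustration and are not the UCI dataset; the names `g3`, `zero_share`, and `nonzero` are assumptions introduced for this example.

```r
# Illustrative data only: simulated grades on the 0-20 scale, NOT the UCI dataset.
set.seed(1)
g3 <- c(rep(0, 40), pmin(20, pmax(1, round(rnorm(360, mean = 11.5, sd = 3.3)))))

# Share of students with a final grade of exactly 0 (the spike in the histogram).
zero_share <- mean(g3 == 0)

# Centre and spread of the remaining grades, where the bell shape is assessed.
nonzero <- g3[g3 > 0]
summary(nonzero)

# Frequency of each integer grade, a text version of the histogram.
table(g3)
```

Reporting the zero share separately keeps the description of the non-zero grades from being distorted by the spike.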
---
## Socioeconomic Factors
Family background appears to play a role in academic performance.
The boxplot below suggests that students whose mothers work in **Health** or **Teaching** professions tend to achieve higher grades.
```{r}
knitr::include_graphics("results/mjob_vs_grade.png")
```
These patterns are formally tested using ANOVA in Section 3.
---
## Study Habits
Initial exploration shows a weak positive association between study time and grades.
However, this relationship may be confounded by other factors such as prior failures and parental education.
```{r}
knitr::include_graphics("results/studytime_vs_grade.png")
```
---
## Correlation Matrix (Numeric Variables)
The correlation matrix shows relationships among numeric predictors and the target variable (`G3`).
This helps to identify multicollinearity and supports our feature selection.
```{r}
knitr::include_graphics("results/correlation matric of numeric variables.png")
```
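A correlation matrix of this kind could be computed roughly as follows. This is a sketch on simulated stand-in columns, not the project's actual code; the 0.7 cut-off for flagging collinear predictors is a rule of thumb assumed for illustration.

```r
# Illustrative data only: simulated stand-ins for a few numeric columns.
set.seed(42)
n <- 200
failures  <- rpois(n, lambda = 0.4)
studytime <- sample(1:4, n, replace = TRUE)
G3 <- pmin(20, pmax(0, round(11 - 2 * failures + 0.3 * studytime + rnorm(n, sd = 3))))
students <- data.frame(studytime, failures, G3)

# Pearson correlations between every pair of numeric columns.
corr <- cor(students, use = "complete.obs")
round(corr, 2)

# Flag predictor pairs whose absolute correlation exceeds the assumed cut-off.
predictors <- setdiff(colnames(corr), "G3")
pred_corr <- corr[predictors, predictors]
high <- which(abs(pred_corr) > 0.7 & upper.tri(pred_corr), arr.ind = TRUE)
```

An empty `high` means no predictor pair crosses the chosen threshold.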
---
# Statistical Inference
To validate findings from EDA, we performed formal hypothesis testing.
## Internet Access (Welch Two-Sample *t*-Test)
**Hypothesis:**
Students with internet access achieve higher final grades than those without.
**Results:**
* Mean (Internet = Yes): **10.62**
* Mean (Internet = No): **9.41**
* *p*-value = **0.0495**
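**Conclusion:**
The 1.21-point difference is statistically significant at the 5% level, but only marginally so (*p* = 0.0495), so the evidence should be read as modest.
Internet access is associated with higher final grades; the test does not establish a causal effect.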
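For reference, a Welch test of this form can be run with base R's `t.test()`. The sketch below uses simulated grades, so its numbers will not match the results above; the column names `internet` and `G3` follow the dataset, while the group sizes and means are assumptions.

```r
# Illustrative data only: simulated grades, NOT the UCI dataset.
set.seed(7)
students <- data.frame(
  internet = rep(c("yes", "no"), times = c(150, 50)),
  G3 = c(rnorm(150, mean = 10.6, sd = 4.5), rnorm(50, mean = 9.4, sd = 4.5))
)

# t.test() runs Welch's unequal-variance test by default (var.equal = FALSE).
# The default alternative is two-sided; a directional hypothesis would need
# alternative = "less" or "greater", read relative to the level order (no, yes).
welch <- t.test(G3 ~ internet, data = students)

welch$estimate   # group means
welch$p.value    # two-sided p-value
welch$conf.int   # 95% confidence interval for the difference in means
```

Reporting the confidence interval alongside the *p*-value shows how precisely the difference is estimated, which matters when the *p*-value sits close to the threshold.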
---
## Mother’s Job Type (ANOVA)
**Hypothesis:**
Student grades differ by mother’s occupation.
**Results:**
* ANOVA *p*-value = **0.00519**
**Post-hoc Analysis (Tukey HSD):**
* The only statistically significant pairwise difference was between **Health** and **At Home**.
* Students with mothers in Health professions score on average **2.99 points higher** than students whose mothers are at home (adjusted *p* = **0.018**).
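The ANOVA and Tukey HSD steps can be sketched with base R's `aov()` and `TukeyHSD()`. The data below are simulated, with group means chosen arbitrarily for illustration, so the output will not reproduce the figures reported here.

```r
# Illustrative data only: simulated grades per job category, NOT the UCI dataset.
set.seed(11)
jobs <- c("at_home", "health", "other", "services", "teacher")
students <- data.frame(Mjob = factor(rep(jobs, each = 40)))
group_means <- c(at_home = 9, health = 12, other = 10, services = 10.5, teacher = 11)
students$G3 <- rnorm(nrow(students),
                     mean = group_means[as.character(students$Mjob)], sd = 4.5)

# One-way ANOVA: tests whether mean G3 is equal across all job categories.
fit_aov <- aov(G3 ~ Mjob, data = students)
summary(fit_aov)

# Tukey HSD compares every pair of groups while controlling the family-wise error rate.
tukey <- TukeyHSD(fit_aov, "Mjob")
tukey$Mjob[, c("diff", "p adj")]
```

With five categories there are ten pairwise comparisons, which is why the adjusted *p*-values from Tukey HSD are reported instead of raw ones.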
---
# Multivariate Regression Model
A linear regression model was fitted to evaluate multiple predictors simultaneously.
## Model Specification
$$
\text{G3} = \beta_0 + \beta_1(\text{studytime}) + \beta_2(\text{failures}) +
\beta_3(\text{absences}) + \beta_4(\text{Medu}) +
\beta_5(\text{Fedu}) + \beta_6(\text{goout}) + \varepsilon
$$
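A model with this specification could be fitted using `lm()`, as in the sketch below. The data frame is simulated under assumed effect sizes, so the coefficients it produces are not the reported results; only the formula mirrors the specification above.

```r
# Illustrative data only: simulated predictors and grades, NOT the UCI dataset.
set.seed(123)
n <- 300
students <- data.frame(
  studytime = sample(1:4, n, replace = TRUE),
  failures  = sample(0:3, n, replace = TRUE, prob = c(0.8, 0.1, 0.06, 0.04)),
  absences  = rpois(n, 5),
  Medu      = sample(0:4, n, replace = TRUE),
  Fedu      = sample(0:4, n, replace = TRUE),
  goout     = sample(1:5, n, replace = TRUE)
)
students$G3 <- with(students,
                    10 - 2 * failures + 0.6 * Medu - 0.4 * goout + rnorm(n, sd = 4))

# Ordinary least squares fit; each predictor enters linearly as a numeric term.
fit <- lm(G3 ~ studytime + failures + absences + Medu + Fedu + goout, data = students)

coef(summary(fit))          # estimates, standard errors, t values, p-values
summary(fit)$r.squared      # share of variance in G3 explained by the model
summary(fit)$adj.r.squared  # R-squared penalised for the number of predictors
```

Treating ordinal variables such as `Medu` and `goout` as numeric assumes equal spacing between levels; coding them as factors is an alternative if that assumption looks doubtful.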
---
## Regression Results
```{r}
# Hard-coded coefficient table; the values are not recomputed in this chunk.
reg_table <- tibble(
  Term = c("Intercept", "Study Time", "Failures", "Absences",
           "Mother's Education", "Father's Education", "Going Out"),
  Estimate = c(10.32, 0.16, -1.93, 0.03, 0.65, -0.08, -0.42),
  `Std. Error` = c(1.03, 0.26, 0.31, 0.03, 0.25, 0.25, 0.19),
  `p-value` = c("< 2e-16", "0.536", "7.24e-10", "0.333", "0.011", "0.752", "0.029"),
  Significance = c("***", "", "***", "", "*", "", "*")
)
kable(reg_table, caption = "Linear Regression Results for Final Grade (G3)")
```
**Model Fit:**
* **R²:** 0.162
* **Adjusted R²:** 0.149
* **F-statistic p-value:** < 0.001
---
## Interpretation
* **Failures:**
  The strongest predictor. Each additional past failure is associated with a final grade about **2 points lower** (estimate -1.93), holding the other predictors constant.
* **Mother’s Education (`Medu`):**
  Each additional level of maternal education is associated with a **0.65-point** higher grade (*p* = 0.011).
* **Socializing (`goout`):**
  A modest “party penalty”: each one-unit increase in going out is associated with a **0.42-point** lower grade (*p* = 0.029).
* **Study Time:**
  Not statistically significant (*p* = 0.536) once failures and family background are controlled for. This is consistent with **quality and foundation mattering more than hours alone**, though the model does not test that directly.
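To make the coefficients concrete, the worked example below computes a predicted grade from the estimates in the regression table. The student profile is hypothetical and chosen only for illustration.

```r
# Coefficients copied from the regression table above.
beta <- c(intercept = 10.32, studytime = 0.16, failures = -1.93, absences = 0.03,
          Medu = 0.65, Fedu = -0.08, goout = -0.42)

# Hypothetical student profile (values chosen for illustration only).
student <- c(intercept = 1, studytime = 2, failures = 1, absences = 4,
             Medu = 3, Fedu = 2, goout = 3)

# Predicted grade is the sum of coefficient * value over all terms.
predicted <- sum(beta * student[names(beta)])

# Same profile with no past failures: the gap equals the failures coefficient.
no_failure <- replace(student, "failures", 0)
predicted_no_failure <- sum(beta * no_failure[names(beta)])
```

For this profile the predicted grade is about 9.36, and about 11.29 without the past failure, a gap of 1.93 points.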
---
# Model Diagnostics
Regression assumptions were evaluated using residual diagnostics.
```{r}
knitr::include_graphics("results/model_diagnostics.png")
```
The plots indicate:
* Approximate normality of residuals
* No major heteroscedasticity
* No extreme influential observations
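The visual checks can be complemented with numeric ones. The sketch below runs on a simulated model, not the fitted model from this report; the residual-spread check is an informal heuristic, and the 4/n threshold for Cook's distance is a rule of thumb assumed here.

```r
# Illustrative data only: a small simulated regression, NOT the report's model.
set.seed(99)
n <- 200
failures <- sample(0:3, n, replace = TRUE, prob = c(0.8, 0.1, 0.06, 0.04))
goout <- sample(1:5, n, replace = TRUE)
G3 <- 11 - 2 * failures - 0.4 * goout + rnorm(n, sd = 4)
fit <- lm(G3 ~ failures + goout)

# Normality of residuals: Shapiro-Wilk test (a small p-value signals departure).
shapiro <- shapiro.test(residuals(fit))

# Constant variance (informal): |residuals| should not trend with fitted values.
spread_trend <- cor(abs(residuals(fit)), fitted(fit))

# Influence: Cook's distance per observation, flagged above the assumed 4/n cut-off.
cooks <- cooks.distance(fit)
flagged <- which(cooks > 4 / n)

# The four standard diagnostic plots can be drawn with:
# par(mfrow = c(2, 2)); plot(fit)
```

Flagged observations are candidates for inspection, not automatic removal.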
---
# Conclusion
This analysis challenges the simplistic belief that *“studying more automatically leads to better grades.”*
Instead, the results show that:
* **Past academic failures**
* **Socioeconomic background**
* **Access to resources (internet)**
are more strongly associated with final grades than study time. The regression explains only about 16% of the variance in `G3` (R² = 0.162), so most of the variation remains unexplained.
## Recommendations
1. **Targeted Academic Intervention**
   Students with even a single prior failure should receive immediate support.
2. **Equitable Resource Access**
   Ensuring internet availability and academic support for disadvantaged students may help improve outcomes; the associations reported here are observational.
---
*End of Report*