---
title: "Stats141_Project_Q2"
author: "Yichen Wang"
date: "2025-03-05"
output: pdf_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

```{r}
df = read.csv("Informed Placement Process (IPP) 2024 CLEANED.csv")
```

```{r}
head(df)
```
```{r}
summary(df)
```
**Demographic Factors Influencing Placement Decisions (Yichen, Yash)**

- Examining how student demographics (e.g., underrepresented minority status, first-generation status, Pell Grant eligibility, and possibly gender) correlate with self-placement choices, faculty placement decisions, and discrepancies between student preferences and final placement.

Ordinal logistic regression estimates how predictors, including demographics and survey scores, shift the odds that a student ends up at a higher course level on an ordinal scale. Here the course levels are 1, 2, and 3.
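
For reference, the proportional-odds model that `MASS::polr` fits can be written as follows. The symbols ($Y$ for the course level, $x$ for the predictor vector, $\zeta_j$ for the cut-points, $\beta$ for the coefficients) are notation introduced here, not taken from the analysis itself.

$$
\mathrm{logit}\, P(Y \le j \mid x) = \zeta_j - \beta^\top x, \qquad j = 1, 2
$$

$$
\frac{P(Y > j \mid x)}{P(Y \le j \mid x)} = \exp\left(\beta^\top x - \zeta_j\right)
$$

Under this form, a one-unit increase in predictor $k$ multiplies the odds of landing above any cut-point by $\exp(\beta_k)$, and the same multiplier applies at both cut-points (the proportional-odds assumption).
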
```{r}
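# Packages for ordinal logistic regression
library(MASS)       # polr(); loaded before tidyverse so dplyr::select() is not masked by MASS::select()
library(ordinal)    # clm(), an alternative ordinal-model fitter (not used below)
library(tidyverse)

glimpse(df)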
```
```{r}
# Convert final placement to ordinal factors
df <- df %>%
  mutate(
    Final.ENG.COMP.Placement..Final.Score. = factor(
      Final.ENG.COMP.Placement..Final.Score.,
      levels = c(1, 2, 3), ordered = TRUE
    ),
    Student.s.Course.Preference = factor(
      Student.s.Course.Preference,
      levels = c(1, 2, 3), ordered = TRUE
    ),
    # Convert demographic variables to factors; the first level is the reference category
    urm = factor(urm, levels = c("Not URM", "URM")),  # example
    first_gen_bachelors = factor(first_gen_bachelors, levels = c("Not First Gen", "First Gen")),
    gender = factor(gender)
  )

# Drop rows with a missing outcome or demographic value.
# Values outside the `levels` given above also become NA and are dropped here.
df_clean <- df %>%
  filter(
    !is.na(Final.ENG.COMP.Placement..Final.Score.),
    !is.na(Student.s.Course.Preference),
    !is.na(urm), !is.na(first_gen_bachelors),
    !is.na(gender)
  )

```

Note that Final.ENG.COMP.Placement..Final.Score. and Student.s.Course.Preference are ordered factors, so R treats their levels as a progression (1 < 2 < 3). The demographic variables urm, first_gen_bachelors, and gender become unordered factors, which R treats as categorical.
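
A minimal sketch of how an ordered factor behaves, using made-up values rather than the IPP data. The names `raw_levels_demo` and `course_demo` are illustrative only.

```{r}
# Toy values only; 4 is deliberately outside the declared levels
raw_levels_demo <- c(1, 3, 2, 4)
course_demo <- factor(raw_levels_demo, levels = c(1, 2, 3), ordered = TRUE)

is.ordered(course_demo)          # ordered factors support comparisons
course_demo[1] < course_demo[2]  # level 1 sits below level 3
is.na(course_demo)               # the value 4 is not a declared level, so it becomes NA
as.integer(course_demo)          # integer codes match the labels because the levels are 1, 2, 3
```

Any value outside the declared levels silently turns into `NA`, which is why the missing-value filter runs after the factor conversion.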

# 1. Model for student self-placement
```{r}
# MASS::polr for ordinal logistic regression
model_self <- polr(
  Student.s.Course.Preference ~ urm + first_gen_bachelors + ever_pell_fl +
    gender + Post.IPP.Survey,
  data = df_clean,
  Hess = TRUE # keep the Hessian so summary() can compute standard errors
)

summary(model_self)

# coefficient table: estimates, standard errors, and t values
(ctab_self <- coef(summary(model_self)))
# odds ratios: exponentiate the log-odds coefficients (no confidence intervals are computed here)
OR_self <- exp(coef(model_self))
OR_self

```
Note that polr stands for Proportional Odds Logistic Regression (from the MASS package).
The summary prints the estimated coefficients for each predictor (like urmURM, genderMale) and the intercepts (1|2 and 2|3) that separate the three outcome categories.

- ctab_self is a table of the coefficients, standard errors, and t values.
- exp(coef(model_self)) transforms log-odds (the raw scale of the coefficients) into odds ratios.
    - An odds ratio > 1 means the predictor raises the odds of a higher course level.
    - An odds ratio < 1 means it lowers the odds of a higher course level.
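
`summary()` on a `polr` fit reports t values but no p-values, and `exp(coef(...))` gives point estimates only. The sketch below shows one common way to add Wald p-values and profile-likelihood confidence intervals. It uses the `housing` data that ships with MASS purely as a stand-in, and the `*_demo` names are illustrative; the same calls should apply to a model such as `model_self`.

```{r}
library(MASS)

# Stand-in model on MASS's built-in housing data (not the IPP data)
fit_demo <- polr(Sat ~ Infl + Type + Cont, weights = Freq,
                 data = housing, Hess = TRUE)

# Wald p-values from the t values, using a normal approximation
ctab_demo <- coef(summary(fit_demo))
p_demo <- 2 * pnorm(abs(ctab_demo[, "t value"]), lower.tail = FALSE)
ctab_demo <- cbind(ctab_demo, "p value" = p_demo)
ctab_demo

# Odds ratios with 95% profile-likelihood confidence intervals
or_demo <- exp(cbind(OR = coef(fit_demo), confint(fit_demo)))
or_demo
```

An interval that excludes 1 on the odds-ratio scale corresponds to a coefficient whose interval excludes 0 on the log-odds scale.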

Analysis of the results:

1. urmURM = 0.327: URM students are less likely to place themselves in higher-level composition classes. The coefficient of −1.12 corresponds to an odds ratio of about 0.33, a decrease of roughly 67% in the odds. Holding other variables constant, URM students’ odds of self-selecting a higher-level course (e.g., ENG COMP 3 rather than 2, or 2 rather than 1) are only about one third those of non-URM students, and the effect is statistically significant.

2. Post.IPP.Survey, which reflects reading/writing confidence and experiences, has a strongly positive effect. For every 1-point increase on the 0–48 survey scale, the odds of placing oneself into a higher course level increase by about 37%. This effect is large and significant, suggesting that self-perceived reading/writing confidence is a key factor in students’ self-placement choices.

3. First-generation status, Pell eligibility, and gender show no statistically significant differences for self-placement in this sample (p-values are moderate to large). Their effects are small or near zero once URM status and Post-IPP Survey are controlled for.

In summary, to answer the question "Which demographics predict how students choose their composition course?", the results suggest that URM students systematically place themselves lower and that confidence (Post-IPP Survey) is a key driver of placing oneself higher.
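
The arithmetic behind the percentages above can be checked directly. This sketch only reuses the two figures quoted in the analysis (the URM coefficient of −1.12 and the 37% increase per survey point); the variable names and the 5-point comparison are illustrative assumptions.

```{r}
# Figures quoted in the analysis above
beta_urm_demo  <- -1.12   # reported URM coefficient (log-odds)
or_survey_demo <- 1.37    # odds ratio implied by "about 37%" per survey point

or_urm_demo  <- exp(beta_urm_demo)        # odds ratio for URM vs. non-URM
pct_urm_demo <- (or_urm_demo - 1) * 100   # percent change in the odds

# A k-point survey difference multiplies the odds by or_survey_demo^k (here k = 5)
or_survey_5pts_demo <- or_survey_demo^5

round(c(or_urm = or_urm_demo, pct_urm = pct_urm_demo,
        or_survey_5pts = or_survey_5pts_demo), 2)
```

Because odds ratios multiply, a modest per-point effect compounds quickly across a 0–48 scale.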

# 2. Model for final faculty placement
```{r}
model_final <- polr(
  Final.ENG.COMP.Placement..Final.Score. ~ urm + first_gen_bachelors +
    ever_pell_fl + gender + IPP.Score.1 + Post.IPP.Survey,
  data = df_clean,
  Hess = TRUE
)

summary(model_final)
(ctab_final <- coef(summary(model_final)))
OR_final <- exp(coef(model_final))
OR_final

```

Analysis of the results:

1. URM is negative: even after controlling for IPP score, Post-IPP Survey, and other demographics, URM students are again less likely to receive a higher-level composition placement from the faculty. The coefficient −1.265 is significant; the odds ratio is about 0.28. This is a substantial effect, larger in absolute value than in the self-placement model.

2. IPP.Score.1 is extremely strong (odds ratio of about 127), indicating that the IPP reading/writing score is a dominant predictor of final placement. Students with higher IPP scores are far more likely to be placed into higher-level courses by the faculty. This dwarfs the effect of Pell status, gender, or the Post-IPP confidence measure, although each odds ratio is per one-unit change and so depends on the scale of its predictor.

3. Post.IPP.Survey is also significant but with a smaller magnitude (odds ratio of about 1.19). Even controlling for the “objective” IPP score, confidence still has some incremental relationship with final placement. It is possible that confident students produce better writing samples or that the IPP tasks partially capture confidence as well.

4. There is no strong evidence of differences by gender, first-generation status, or Pell eligibility.

In summary, IPP score is the overwhelming determinant. Even so, URM students, on average, still land in lower courses.
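
Odds ratios can be hard to read on their own, so it may help to translate a fitted model into predicted probabilities for a few profiles. The sketch below does this with `predict(..., type = "probs")` on the `housing` data from MASS, used purely as a stand-in; the profiles and `*_demo` names are hypothetical. For a model such as `model_final`, the analogous `newdata` would vary one predictor while holding the others fixed.

```{r}
library(MASS)

# Stand-in model on MASS's built-in housing data (not the IPP data)
fit_prob_demo <- polr(Sat ~ Infl + Type + Cont, weights = Freq,
                      data = housing, Hess = TRUE)

# Two hypothetical profiles that differ in a single predictor (Infl)
profiles_demo <- data.frame(
  Infl = factor(c("Low", "High"), levels = levels(housing$Infl)),
  Type = factor("Tower", levels = levels(housing$Type)),
  Cont = factor("Low", levels = levels(housing$Cont))
)

# One row per profile, one column per outcome level; each row sums to 1
probs_demo <- predict(fit_prob_demo, newdata = profiles_demo, type = "probs")
round(probs_demo, 3)
```

Comparing the rows shows how the probability mass shifts across the outcome levels when only one predictor changes.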

# 3. Model for placement discrepancy
```{r}
df_clean <- df_clean %>%
  mutate(
    # Coerce factors back to numeric to get the levels 1, 2, 3
    numeric_self = as.numeric(Student.s.Course.Preference),
    numeric_final = as.numeric(Final.ENG.COMP.Placement..Final.Score.),
    discrepancy = numeric_final - numeric_self
  )

```

```{r}
# Convert discrepancy to factor (ordered)
df_clean <- df_clean %>%
  mutate(discrepancy = factor(discrepancy, levels = c(-2, -1, 0, 1, 2), ordered = TRUE))

model_disc <- polr(
  discrepancy ~ urm + first_gen_bachelors + ever_pell_fl +
    gender + IPP.Score.1 + Post.IPP.Survey,
  data = df_clean,
  Hess = TRUE
)

summary(model_disc)
(ctab_disc <- coef(summary(model_disc)))
OR_disc <- exp(coef(model_disc))
OR_disc

```
The outcome of the discrepancy model is final placement minus self-placement, i.e., how much higher or lower the faculty placed the student than the student placed themselves.

- If discrepancy = 0, the faculty and student agreed on the same level.
- If discrepancy = 1, the faculty placed the student one level higher than the student did.
- If discrepancy = -1, the faculty placed them one level lower, and so on.
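
The coding above can be illustrated with a handful of made-up placements. The five students and the `*_demo` names below are hypothetical and are not drawn from the IPP data.

```{r}
# Made-up placements for five hypothetical students (levels 1 to 3)
self_demo  <- c(1, 2, 3, 2, 1)
final_demo <- c(2, 2, 2, 3, 3)

# Positive values: faculty placed the student above the self-placement
disc_demo <- final_demo - self_demo

# Keep all five possible values as levels, even if some never occur
disc_factor_demo <- factor(disc_demo, levels = -2:2, ordered = TRUE)
table(disc_factor_demo)
```

Tabulating first is a quick way to see which discrepancy values actually occur before an ordinal model is fit to them.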

Analysis of the results:

1. URM is positive for discrepancy: an odds ratio of ~1.55 means that URM students, on average, are more likely to have a final placement above their self-placement. URM students self-place lower (Model 1). The faculty also places URM students lower overall (Model 2), but in some cases not as drastically lower as the students place themselves. Net effect: URM students are more likely to “under-place” themselves relative to the faculty’s judgment, so the discrepancy tends to be positive (faculty above self).

2. IPP.Score.1 (odds ratio of about 11.41) has a strong positive association with discrepancy. If a student's IPP score is high but the student self-placed modestly, the faculty is more likely to “bump” the student up. Students with strong objective performance but conservative self-perception therefore tend to end up with a final placement above their self-placement.

3. Post.IPP.Survey (odds ratio of about 0.8) has a negative association with discrepancy. Students with higher self-reported confidence are more likely to place themselves high, reducing the chance that the faculty placement is even higher. In other words, students who believe strongly in their reading/writing skill are rarely placed above where they placed themselves, since they already place themselves high.

4. No strong effect for first-gen, Pell, or gender in discrepancy.

To summarize: students who are relatively “modest” in self-placement but do well on the IPP are often placed higher by faculty; URM students are especially prone to self-place lower, so final minus self is more likely to be positive; and students with high confidence often place themselves high already, so final minus self is smaller or negative.