At a school in Oregon, I was tasked with identifying trends and patterns using attendance, demographic, and student success data from Quarter 3 in order to inform more targeted academic interventions and trainings. This correlational/descriptive analysis was conducted using data derived from Synergy, cleaned in Google Sheets, analyzed in both SQL and R (exploration and analysis), and reported in Google Slides (report anonymized for school confidentiality).
- Which student factors are the largest predictors of student success?
- What should the school do to improve student outcomes?
- Norming Data in Google Sheets: Standardized table headers, cleaned student records, handled NULL values, and exported data for SQL processing
- Initial Data Exploration in SQL: Explored norms in grades overall, by department, by class type, etc.
- Data Cleaning in SQL: Created R-friendly table, merging necessary data from 10 tables and generating new columns to identify absence/trady per period, numeric and ordinal measures of attendance, etc.
- Data Aggregation in SQL: Summarized grades, attendance, and pass rates by demographics, course types, and departments
- Exploratory/Predictive Analysis in R: Explored data via ANOVAs and basic regressions, ran multiple linear regressions to identify most predictive model, calculated relative importance of each student factor
- Table Generation in SQL: Used stored procedures and other queries to create additional tables for the report
- Data Visualization in R: Created dashboards and faceted plots to illustrate trends across absence levels and student groups
- Report in Google Slides: Compiled visuals and findings into a Google Slides report to recommend next steps, demonstrate patterns, and guide staff discussions/interventions
- SQL Basics:
SELECT,FROM,WHERE,ORDER BY,LIMIT,DISTINCT,AS, basic operators and arithemetic - Aggregation:
GROUP BY,HAVING,COUNT(),AVG(),ROUND(),WITH ROLLUP - Joins/Combination Queries:
INNER JOIN,LEFT JOIN,USING(), subqueries inSELECT,FROM, andWHEREclauses - Conditional Expressions:
CASE WHEN ... THEN ... ELSE ... END,COALESCE(),IS NULL,IS NOT NULL - Table Design/Transformation:
CREATE TABLE AS,ALTER TABLE ADD COLUMN,GENERATED ALWAYS AS () STORED - Stored Procedures:
DROP PROCEDURE IF EXISTS,CREATE PROCEDURE (IN ...),DELIMITER,SET,CALL - Dynamic SQL:
PREPARE,EXECUTE,DEALLOCATE PREPARE,CONCAT()
- Data Cleaning and Manipulation:
filter(),mutate(),select(),left_join(),arrange(),group_by(),rename(),case_when(),if_else(),as.factor(),as.character(),round(),is.na() - Data Summarization and Aggregation:
summarize(),n(),mean(),sd(),min(),max(),count() - Statistical Modeling and Tests:
lm(),aov(),summary(),anova(),coef(),confint(),broom::tidy(),broom::glance() - Relative Importance Analysis:
relaimpo::calc.relimp(),relaimpo::boot.relimp(),summary() - Visualization and Plotting:
ggplot(),aes(),geom_point(),geom_bar(),geom_col(),geom_errorbar(),geom_boxplot(),geom_line(),geom_text(),facet_wrap(),facet_grid(),labs(),theme(),scale_fill_manual(),scale_color_manual(),coord_flip(), patchwork operators (+,/),ggsave() - Miscellaneous / Helper Functions:
library(),print(),head(),str(),%>%,glimpse(),unique(),levels(),table()
- Time with teacher (mostly absences, but also tardies) is largest predictor of grades/pass rates
- Averages, norms, and predictors varied by class type (see report for details)
- Shared largest predictors: support seminar/class absences
- Avg. core class grades are [redacted] grade point less than non-core avg
- Support needs had a [redacted] relationship with student success, differing by core/non-core classes
- Treat support seminar absences as attendance
- Currently, support seminar absences/tardies aren’t penalized the same way normal classes are (e.g., sports, detention, parking, etc.), suggesting to students that they're less important
- Enforcing support seminar attendance would align with the finding that support seminar attendance is an even larger predictor of success than normal class attendance
- Increase communications with parents about the importance of attendance for student success
- Post-COVID, student culture still deprioritizes physical presence
- Most absences are parent excused, so work must include parents -- consider reinstating the attendance task force with parents
- Focus teacher trainings on targeted demographics by core vs. non-core subjects (most-needed demographics are redacted for confidentiality)