This repository contains .R scripts and .txt R console transcripts that record my progress through the code worksheets for the PH525x genomics class (Rafael Irizarry and Michael Love, Harvard).
Status: Chapter 4 (Matrix Algebra) complete
Key learning progress:
Ch4:
- understanding matrix notation and operations
- calculating beta coefficients to quantify how the outcome changes with each variable
- calculating fitted Y values (predicted outcomes) as Xβ, which gives the regression (prediction) line
- using the residual sum of squares (RSS) to compare observed and predicted outcomes; minimising RSS (least squares, via lm())
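
As a reminder of how the Ch4 pieces fit together, here is a minimal sketch on simulated data. The variable names and the numbers are made up for illustration and are not taken from the course worksheets. It builds a design matrix, solves for beta by matrix algebra, computes the fitted values and RSS, and checks the result against lm().

```r
# Simulated data; names and numbers are illustrative assumptions
set.seed(1)
n <- 50
x <- runif(n, 0, 10)
y <- 2 + 3 * x + rnorm(n)

# Design matrix: a column of 1s for the intercept, then x
X <- cbind(1, x)

# Least squares estimate: solve (X'X) beta = X'y
beta_hat <- solve(crossprod(X), crossprod(X, y))

# Fitted values (X %*% beta) and residual sum of squares
y_hat <- X %*% beta_hat
rss <- sum((y - y_hat)^2)

# lm() minimises the same RSS, so the coefficients should agree
fit <- lm(y ~ x)
all.equal(as.numeric(beta_hat), as.numeric(coef(fit)))
```

Using solve() on the normal equations keeps the sketch close to the matrix notation; lm() itself uses a QR decomposition, which is numerically more stable.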
Ch3:
- experimenting with robust statistics: Spearman correlation, median absolute deviation (MAD), Wilcoxon test (wilcox.test())
- filtering data (ChickWeight), adding outliers, generating and comparing boxplots
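
The following sketch shows the general idea behind the Ch3 outlier experiments, using the built-in ChickWeight data. The choice of day 21, diets 1 and 4, and an outlier value of 2000 are my own assumptions for illustration, not necessarily what the worksheets use. Spearman correlation is not covered here.

```r
data(ChickWeight)

# Filter: weights at day 21 for two of the diets
day21 <- subset(ChickWeight, Time == 21)
x <- day21$weight[day21$Diet == 1]
y <- day21$weight[day21$Diet == 4]

# Add one artificial outlier to the first group (2000 is an arbitrary value)
x_out <- c(x, 2000)

# Compare non-robust summaries (mean, sd) with robust ones (median, MAD)
c(mean = mean(x), mean_out = mean(x_out))
c(median = median(x), median_out = median(x_out))
c(sd = sd(x), sd_out = sd(x_out))
c(mad = mad(x), mad_out = mad(x_out))

# Boxplots before and after adding the outlier
boxplot(list(diet1 = x, diet1_outlier = x_out, diet4 = y),
        ylab = "weight (g)")

# t-test vs rank-based Wilcoxon test on the contaminated data
# exact = FALSE avoids the warning about ties in the exact p-value
t.test(x_out, y)$p.value
wilcox.test(x_out, y, exact = FALSE)$p.value
```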
Ch2:
- when and how to use a histogram, qq-plot, boxplot, or scatterplot for exploratory data analysis
- univariate analysis vs 2-dimensional analysis
- stratifying 2-dimensional data
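
A short sketch of the Ch2 plots on simulated paired data follows. The father/son heights here are generated with made-up parameters and are only a stand-in for a real data set; the stratification rule (rounding to the nearest inch) is likewise an assumption.

```r
# Simulated paired heights in inches; parameters are illustrative assumptions
set.seed(1)
father <- rnorm(500, mean = 68, sd = 3)
son <- 34 + 0.5 * father + rnorm(500, sd = 2.5)

# Univariate views: histogram and qq-plot against the normal distribution
hist(son, main = "Son heights", xlab = "height (in)")
qqnorm(son)
qqline(son)

# Two-dimensional view: scatterplot of the pairs
plot(father, son, xlab = "father height (in)", ylab = "son height (in)")

# Stratify sons by father's height rounded to the nearest inch
groups <- split(son, round(father))
boxplot(groups, xlab = "father height (rounded, in)", ylab = "son height (in)")

# Average son height within each stratum
sapply(groups, mean)
```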
Ch1:
- understanding the theory behind statistics: CLT, Monte Carlo, t-distributions, confidence intervals, power
- using parametric simulations to generate population data
- modelling the null distribution to investigate differences between observed and expected data (permutation tests, association tests)
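
To illustrate the null-distribution idea from Ch1, here is a minimal permutation-test sketch. The two groups are generated by a parametric simulation with made-up means, standard deviation, and sample sizes, and the number of permutations (1000) is an arbitrary choice.

```r
# Parametric simulation of two small samples (all parameters are assumptions)
set.seed(1)
control <- rnorm(12, mean = 24, sd = 3.5)
treatment <- rnorm(12, mean = 27, sd = 3.5)
obs_diff <- mean(treatment) - mean(control)

# Null distribution: shuffle the group labels and recompute the difference
pooled <- c(control, treatment)
n_trt <- length(treatment)
null_diffs <- replicate(1000, {
  shuffled <- sample(pooled)
  mean(shuffled[seq_len(n_trt)]) - mean(shuffled[-seq_len(n_trt)])
})

# Two-sided permutation p-value; the +1 keeps it from being exactly zero
p_perm <- (sum(abs(null_diffs) >= abs(obs_diff)) + 1) / (length(null_diffs) + 1)
p_perm

# Observed difference against the simulated null distribution
hist(null_diffs, main = "Permutation null distribution")
abline(v = obs_diff, col = "red")

# Parametric comparison using the t-distribution
t.test(treatment, control)$p.value
```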