A comprehensive book that provides step-by-step instructions on data analysis for researchers and students in natural sciences using R. This book is designed to guide users through fundamental statistical concepts and practical data analysis techniques with a focus on ecological, environmental, and life sciences applications.
Online Version: https://jm0535.github.io/dains/
The book covers:
| Part | Topics |
|---|---|
| Getting Started | Introduction to R, data analysis fundamentals, data basics |
| Data Analysis Fundamentals | Exploratory data analysis, hypothesis testing, statistical tests |
| Data Visualization | Visualization techniques, advanced graphics with ggplot2 |
| Advanced Topics | Regression analysis, conservation applications |
| R in Context | Integrations with jamovi, JASP, Positron, Quarto, reticulate, plumber |
- Introduction to Data Analysis: R basics and analytical thinking
- Data Basics: Data structures, importing, and cleaning
- Exploratory Data Analysis: Descriptive statistics and pattern discovery
- Hypothesis Testing: Statistical inference fundamentals
- Statistical Tests: Common parametric and non-parametric tests, with assumption checks
- Data Visualization: Creating effective scientific graphics
- Advanced Visualization: Interactive and publication-quality figures
- Regression Analysis: Linear models, diagnostics, and the tidymodels framework
- Advanced Modeling: Mixed-effects, GLMs, and modern modeling approaches
- Conservation Applications: Real-world ecological case studies
- R in the Wider Ecosystem: Integrations with jamovi, JASP, Positron, VS Code, Jupyter, reticulate, plumber, and friends
All datasets live in the data/ directory, organized by scientific discipline. A few of the directory names reflect the chapter context in which the data are used rather than the literal subject of the CSV file (the files were sourced from public datasets and kept under their working names so chapter references stay stable). See data/MISMATCHES.md for the full audit.
| Directory | Chapter use | Actual data |
|---|---|---|
| `agriculture/` | Crop yields by country and year | Our World in Data: Wheat / Rice / Maize tonnes per hectare |
| `botany/` | Categorical analysis example | Break Free From Plastic brand audit (polymer types) |
| `ecology/` | Biodiversity and threat status | IUCN Red List records |
| `economics/` | Quality-vs-price regression | Coffee Quality Institute scores |
| `entomology/` | Categorical / counts example | Austin (and Australian) animal-shelter outcomes |
| `environmental/` | Continuous variables example | Palmer Penguins morphology |
| `epidemiology/` | Time-series / spatial example | Atlantic hurricane tracks |
| `forestry/` | Continuous variables example | Star Wars character measurements (used as a stand-in) |
| `geography/` | Categorical example | EMA medicine authorisations |
| `marine/` | Long-format time series | Great Lakes Fishery Commission fish populations |
Each dataset directory contains a CITATION.txt with source attribution. If you want a dataset whose contents match its directory name (e.g. real forestry inventory), drop it in and update the corresponding chapter reference.
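If you add or swap datasets, a small shell check can confirm that every dataset directory still carries its attribution file. The following is a minimal sketch, assuming that every subdirectory of `data/` is a dataset directory; the function name `check_citations` is hypothetical and not part of the repository.

```shell
#!/bin/sh
# Sketch: report dataset directories that lack a CITATION.txt.
# Assumption: each subdirectory of the given data directory is a dataset directory.
# check_citations is a hypothetical helper, not a script shipped with the book.
check_citations() {
  data_dir="${1:-data}"
  missing=0
  for dir in "$data_dir"/*/; do
    [ -d "$dir" ] || continue   # skip when the glob matches nothing
    if [ ! -f "${dir}CITATION.txt" ]; then
      echo "missing CITATION.txt: $dir"
      missing=1
    fi
  done
  # Exit status 0 when every directory has a citation file, 1 otherwise.
  return "$missing"
}
```

Run from the repository root, `check_citations data` prints one line per directory without a `CITATION.txt` and returns a non-zero status if any are missing.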
- R (version 4.0.0 or higher)
- RStudio (recommended IDE)
- Quarto (for building the book)
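Before installing anything, it can help to confirm which command-line tools are already on your `PATH`. The sketch below assumes a POSIX shell; the helper names `have_tool` and `report_prerequisites` are hypothetical. It only checks that each tool is present and does not verify that R is version 4.0.0 or higher.

```shell
#!/bin/sh
# Sketch: confirm that required command-line tools are on PATH.
# have_tool and report_prerequisites are hypothetical helper names.
have_tool() {
  command -v "$1" >/dev/null 2>&1
}

report_prerequisites() {
  status=0
  for tool in "$@"; do
    if have_tool "$tool"; then
      echo "found: $tool"
    else
      echo "missing: $tool"
      status=1
    fi
  done
  # Exit status 0 only when every listed tool was found.
  return "$status"
}
```

For this book, a typical call would be `report_prerequisites git Rscript quarto`. RStudio is an IDE rather than a build requirement, so it is left out of the check.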
1. Clone the repository:

       git clone https://github.com/jm0535/dains.git
       cd dains

2. Install required R packages:

       source("install_packages.R")

   Or manually install core packages:

       install.packages(c(
         "tidyverse", "tidymodels", "ggplot2", "rstatix",
         "knitr", "rmarkdown", "performance", "see"
       ))

3. Download datasets (if needed):

       source("download_datasets.R")
To build the HTML version of the book locally:
1. Install Quarto from quarto.org

2. Render the book:

       quarto render

3. Preview locally:

       quarto preview
The rendered book will be available in the docs/ directory.
    dains/
    ├── _quarto.yml      # Quarto configuration
    ├── index.qmd        # Book landing page
    ├── preface.qmd      # Preface chapter
    ├── references.qmd   # References chapter
    ├── chapters/        # Book chapters (01-10)
    ├── solutions/       # Instructor answer keys (instructor-solutions branch only)
    ├── data/            # Datasets by discipline
    ├── docs/            # Rendered HTML output
    ├── images/          # Book images and cover
    ├── R/               # Helper R functions
    ├── scripts/         # Utility scripts
    ├── styles.css       # Custom CSS styling
    ├── references.bib   # Bibliography
    └── apa.csl          # Citation style
Contributions to improve the book are welcome! Please follow these steps:
1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-improvement`)
3. Make your changes
4. Run `quarto render` to ensure everything builds correctly
5. Commit your changes (`git commit -m 'Add some amazing improvement'`)
6. Push to the branch (`git push origin feature/amazing-improvement`)
7. Open a Pull Request
Please read CONTRIBUTING.md for detailed guidelines.
A separate instructor-solutions branch carries worked answer keys for all chapter exercises under solutions/. The branch is intentionally kept out of the published book and out of main to keep solutions away from students. If you teach from this book and want access, open an issue with a brief verification request.
A few notes on how the book is built and maintained:
- CI/CD: Every push to `main` triggers `.github/workflows/publish.yml`, which renders the book with Quarto and deploys the output to GitHub Pages. The committed `docs/` folder is not what gets published; CI re-renders on every push, so figures regenerate from the R code in each chapter.
- Render-time gates: The workflow warns at 25 minutes of render time and fails at 40, so render regressions get caught before they reach production.
- Git LFS: Large binary assets (cover images, R logo, rendered PDFs) are tracked with Git LFS. See `docs/GIT_LFS_SETUP.md` for setup notes.
- Dependency tracking: `renv` pins the R package versions used to build the book. The most recent dependency audit lives in `renv-audit.md`.
- Statistical rigor: Chapter 5 (statistical tests) and Chapter 8 (regression) include explicit assumption-check callouts (Shapiro-Wilk, Levene's test, expected cell counts for chi-square, residual diagnostics, VIF for collinearity) before each test is applied.
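The render-time gate described in the notes above can be pictured as a simple threshold check. The sketch below is illustrative only and is not taken from `publish.yml`: the function name `check_render_time`, the use of whole minutes, and treating both thresholds as inclusive are all assumptions.

```shell
#!/bin/sh
# Sketch of a render-time gate: warn at 25 minutes, fail at 40.
# The thresholds come from the notes above; everything else (function name,
# whole-minute input, inclusive comparisons) is an assumption for illustration.
check_render_time() {
  elapsed_min="$1"
  if [ "$elapsed_min" -ge 40 ]; then
    echo "fail: render took ${elapsed_min} min (limit 40)"
    return 1
  elif [ "$elapsed_min" -ge 25 ]; then
    echo "warn: render took ${elapsed_min} min (warning threshold 25)"
  else
    echo "ok: render took ${elapsed_min} min"
  fi
  return 0
}

# Possible local use (left commented out so the sketch has no side effects):
#   start=$(date +%s)
#   quarto render
#   end=$(date +%s)
#   check_render_time $(( (end - start) / 60 ))
```

Returning a non-zero status on failure is what would let a CI step stop the job, while the warning case still succeeds and only leaves a message in the log.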
This project is licensed under the MIT License - see the LICENSE file for details.
Jimmy Moses
School of Forestry, Faculty of Natural Resources
Papua New Guinea University of Technology
PMB 411, Lae, Morobe Province, Papua New Guinea
- The R Core Team for developing R
- The tidyverse team for revolutionizing R programming
- The Quarto team for the publishing system
- All data providers who make their datasets openly available
- Students and colleagues who provided feedback
- Issues: GitHub Issues
- Discussions: GitHub Discussions
Last updated: May 2026