Data Analysis in Natural Sciences: An R-Based Approach

A comprehensive book that provides step-by-step guidance on data analysis in R for researchers and students in the natural sciences. It guides readers through fundamental statistical concepts and practical data analysis techniques, with a focus on ecological, environmental, and life-science applications.

📖 Read the Book

Online Version: https://jm0535.github.io/dains/

📚 Contents

The book covers:

| Part | Topics |
|---|---|
| Getting Started | Introduction to R, data analysis fundamentals, data basics |
| Data Analysis Fundamentals | Exploratory data analysis, hypothesis testing, statistical tests |
| Data Visualization | Visualization techniques, advanced graphics with ggplot2 |
| Advanced Topics | Regression analysis, conservation applications |
| R in Context | Integrations with jamovi, JASP, Positron, Quarto, reticulate, plumber |

Chapter Overview

Introduction to Data Analysis: R basics and analytical thinking
Data Basics: Data structures, importing, and cleaning
Exploratory Data Analysis: Descriptive statistics and pattern discovery
Hypothesis Testing: Statistical inference fundamentals
Statistical Tests: Common parametric and non-parametric tests, with assumption checks
Data Visualization: Creating effective scientific graphics
Advanced Visualization: Interactive and publication-quality figures
Regression Analysis: Linear models, diagnostics, and the tidymodels framework
Advanced Modeling: Mixed-effects, GLMs, and modern modeling approaches
Conservation Applications: Real-world ecological case studies
R in the Wider Ecosystem: Integrations with jamovi, JASP, Positron, VS Code, Jupyter, reticulate, plumber, and related tools

📊 Datasets

All datasets live in the data/ directory, organized by scientific discipline. Many of the directory names reflect the chapter context in which the data are used rather than the actual subject of the CSV file: the files were sourced from public datasets and kept under their working names so that chapter references stay stable. See data/MISMATCHES.md for the full audit.

| Directory | Chapter use | Actual data |
|---|---|---|
| `agriculture/` | Crop yields by country and year | Our World in Data: Wheat / Rice / Maize tonnes per hectare |
| `botany/` | Categorical analysis example | Break Free From Plastic brand audit (polymer types) |
| `ecology/` | Biodiversity and threat status | IUCN Red List records |
| `economics/` | Quality-vs-price regression | Coffee Quality Institute scores |
| `entomology/` | Categorical / counts example | Austin (and Australian) animal-shelter outcomes |
| `environmental/` | Continuous variables example | Palmer Penguins morphology |
| `epidemiology/` | Time-series / spatial example | Atlantic hurricane tracks |
| `forestry/` | Continuous variables example | Star Wars character measurements (used as a stand-in) |
| `geography/` | Categorical example | EMA medicine authorisations |
| `marine/` | Long-format time series | Great Lakes Fishery Commission fish populations |

Each dataset directory contains a CITATION.txt with source attribution. If you want a dataset whose contents match its directory name (e.g. a real forestry inventory), add it to that directory with its own CITATION.txt and update the corresponding chapter reference.
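Because every dataset directory is expected to carry a CITATION.txt, a quick base-R check can flag directories that are missing one, for example after adding a new dataset. This is a minimal sketch: the function names `missing_citations()` and `dataset_csvs()` are hypothetical (they are not part of the repository), and the defaults assume the code is run from the repository root so that `data/` resolves correctly.

```r
# Hypothetical helper (not part of the repository): list the immediate
# subdirectories of `data_dir` that do not contain a CITATION.txt.
# Assumes it is run from the repository root, where data/ lives.
missing_citations <- function(data_dir = "data") {
  dirs <- list.dirs(data_dir, full.names = TRUE, recursive = FALSE)
  has_citation <- file.exists(file.path(dirs, "CITATION.txt"))
  basename(dirs[!has_citation])
}

# Hypothetical helper: list the CSV files in each dataset directory,
# named by directory (e.g. "forestry", "marine").
dataset_csvs <- function(data_dir = "data") {
  dirs <- list.dirs(data_dir, full.names = TRUE, recursive = FALSE)
  stats::setNames(lapply(dirs, list.files, pattern = "\\.csv$"), basename(dirs))
}

# Usage (from the repository root):
# missing_citations()
# dataset_csvs()[["forestry"]]
```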

🚀 Getting Started

Prerequisites

R (version 4.0.0 or higher)
RStudio (recommended IDE)
Quarto (for building the book)

Installation

Clone the repository:

```
git clone https://github.com/jm0535/dains.git
cd dains
```

Install required R packages:

```
source("install_packages.R")
```

Or manually install core packages:

```
install.packages(c(
  "tidyverse",
  "tidymodels",
  "ggplot2",
  "rstatix",
  "knitr",
  "rmarkdown",
  "performance",
  "see"
))
```
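If you would rather install only the core packages that are not already present, the manual list above can be wrapped in a small base-R helper. This is a sketch under stated assumptions: `missing_packages()` and `install_missing()` are hypothetical names, not functions shipped with the book, and `install_packages.R` may already handle this differently.

```r
# Core packages from the manual install list above.
core_pkgs <- c("tidyverse", "tidymodels", "ggplot2", "rstatix",
               "knitr", "rmarkdown", "performance", "see")

# Hypothetical helper: return the subset of `pkgs` that is not installed
# in any library on .libPaths().
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(utils::installed.packages()))
}

# Hypothetical helper: install only the missing packages (needs network
# access) and return their names invisibly.
install_missing <- function(pkgs = core_pkgs) {
  todo <- missing_packages(pkgs)
  if (length(todo) > 0) {
    utils::install.packages(todo)
  }
  invisible(todo)
}

# Usage:
# install_missing()
```

Checking against `installed.packages()` avoids loading each package just to test whether it exists, which matters for large meta-packages such as tidyverse.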

Download datasets (if needed):
```
source("download_datasets.R")
```

🔨 Building the Book

To build the HTML version of the book locally:

Install Quarto from quarto.org
Render the book:
```
quarto render
```
Preview locally:
```
quarto preview
```

The rendered book will be available in the docs/ directory.

📁 Project Structure

```
dains/
├── _quarto.yml          # Quarto configuration
├── index.qmd            # Book landing page
├── preface.qmd          # Preface chapter
├── references.qmd       # References chapter
├── chapters/            # Book chapters (01-10)
├── solutions/           # Instructor answer keys (instructor-solutions branch only)
├── data/                # Datasets by discipline
├── docs/                # Rendered HTML output
├── images/              # Book images and cover
├── R/                   # Helper R functions
├── scripts/             # Utility scripts
├── styles.css           # Custom CSS styling
├── references.bib       # Bibliography
└── apa.csl              # Citation style
```

🤝 Contributing

Contributions to improve the book are welcome! Please follow these steps:

Fork the repository
Create a feature branch (git checkout -b feature/amazing-improvement)
Make your changes
Run quarto render to ensure everything builds correctly
Commit your changes (git commit -m 'Add some amazing improvement')
Push to the branch (git push origin feature/amazing-improvement)
Open a Pull Request

Please read CONTRIBUTING.md for detailed guidelines.

👩‍🏫 For Instructors

A separate instructor-solutions branch carries worked answer keys for all chapter exercises under solutions/. The branch is intentionally kept out of the published book and out of main to keep solutions away from students. If you teach from this book and want access, open an issue with a brief verification request.

🛠️ Project Infrastructure

A few notes on how the book is built and maintained:

CI/CD: Every push to main triggers .github/workflows/publish.yml, which renders the book with Quarto and deploys the output to GitHub Pages. The committed docs/ folder is not what gets published; CI re-renders on every push, so figures regenerate from the R code in each chapter.
Render-time gates: The workflow warns when a render takes longer than 25 minutes and fails at 40 minutes, so slow-render regressions are caught before they reach the published site.
Git LFS: Large binary assets (cover images, R logo, rendered PDFs) are tracked with Git LFS. See docs/GIT_LFS_SETUP.md for setup notes.
Dependency tracking: renv pins the R package versions used to build the book. The most recent dependency audit lives in renv-audit.md.
Statistical rigor: Chapter 5 (statistical tests) and Chapter 8 (regression) include explicit assumption-check callouts (Shapiro-Wilk, Levene's test, expected cell counts for chi-square, residual diagnostics, VIF for collinearity) before each test is applied.
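The assumption checks named above can be approximated with base R alone. The sketch below runs them on simulated data and is an illustration, not the book's own code (the chapters may rely on rstatix or performance instead). Levene's test is written out by hand in its median-centred form, a one-way ANOVA on absolute deviations from the group medians, which is the form car::leveneTest uses by default; the VIF is computed directly from its definition, VIF_j = 1 / (1 - R²_j). The helper names `levene_median()` and `vif_base()` are hypothetical.

```r
# Sketch: base-R versions of common assumption checks, on simulated data.
# levene_median() and vif_base() are hypothetical helpers, not book code.
set.seed(42)

## Two-group comparison ------------------------------------------------
grp <- factor(rep(c("control", "treatment"), each = 30))
y   <- c(rnorm(30, mean = 10, sd = 2), rnorm(30, mean = 11, sd = 2))

# Shapiro-Wilk normality test within each group
normality_p <- tapply(y, grp, function(v) shapiro.test(v)$p.value)

# Levene's test (median-centred): one-way ANOVA on |y - group median|
levene_median <- function(y, g) {
  dev <- abs(y - ave(y, g, FUN = median))
  anova(lm(dev ~ g))[["Pr(>F)"]][1]
}
levene_p <- levene_median(y, grp)

# R's t.test() defaults to Welch's test (var.equal = FALSE),
# which does not assume equal variances
tt <- t.test(y ~ grp)

## Chi-square: expected cell counts ------------------------------------
tab <- matrix(c(12, 8, 15, 5), nrow = 2,
              dimnames = list(site = c("A", "B"),
                              status = c("present", "absent")))
expected <- chisq.test(tab)$expected
# Common rule of thumb: all expected counts >= 5, otherwise consider fisher.test()
counts_ok <- all(expected >= 5)

## Regression: residual diagnostics and VIF ----------------------------
n  <- 50
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.3)   # deliberately collinear with x1
x3 <- rnorm(n)
resp <- 2 + 1.5 * x1 - 0.5 * x3 + rnorm(n)
fit <- lm(resp ~ x1 + x2 + x3)

# Normality of residuals; plot(fit) draws the standard diagnostic panels
resid_normality_p <- shapiro.test(residuals(fit))$p.value

# VIF_j = 1 / (1 - R^2_j), regressing each predictor on all the others
vif_base <- function(df) {
  vapply(names(df), function(v) {
    others <- setdiff(names(df), v)
    r2 <- summary(lm(reformulate(others, response = v), data = df))$r.squared
    1 / (1 - r2)
  }, numeric(1))
}
vifs <- vif_base(data.frame(x1, x2, x3))
```

In this simulation `x1` and `x2` should show large VIFs while `x3` stays near 1, which is the pattern a collinearity check is meant to surface.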

📜 License

This project is licensed under the MIT License - see the LICENSE file for details.

✍️ Author

Jimmy Moses
School of Forestry, Faculty of Natural Resources
Papua New Guinea University of Technology
PMB 411, Lae, Morobe Province, Papua New Guinea

🙏 Acknowledgments

The R Core Team for developing R
The tidyverse team for revolutionizing R programming
The Quarto team for the publishing system
All data providers who make their datasets openly available
Students and colleagues who provided feedback

📬 Contact

Issues: GitHub Issues (https://github.com/jm0535/dains/issues)
Discussions: GitHub Discussions (https://github.com/jm0535/dains/discussions)

Last updated: May 2026
