R Package Structure
The structure indicated in Database Backend can, in principle, be edited by any software capable of writing to DuckDB/Parquet. However, a language-agnostic representation of statistical model objects (e.g., random forests) is out of scope, so a concrete implementation must be tied to a specific software environment. The implementation of this logic will be in R and will be organised in a package structure.
See the package dev intro for an overview and principles. See the workflow phases for an outline of what package functions are associated with which phase.
`parquet_db` is the base database class, providing domain-agnostic parquet-backed storage with DuckDB for in-memory SQL operations.
Key features:
- Uses a folder structure where each table is stored as a parquet file (or hive-partitioned directory)
- DuckDB runs in-memory for SQL operations while data is persisted to disk in parquet format
- Supports ZSTD compression for efficient storage
- Can load DuckDB extensions (e.g., "spatial")
Core methods:
- `initialize(path, extensions)` - Create/connect to a database folder
- `execute(statement)` - Execute SQL statements
- `get_query(statement)` - Execute SQL queries and return `data.table` results
- `commit(x, table_name, method, ...)` - Write data to parquet files
- `fetch(table_name, where, limit, ...)` - Read data from parquet files
- `attach_table(table_name)` / `detach_table(table_name)` - Register/unregister tables in DuckDB
- `with_tables(tables, fn)` - Execute a function with tables temporarily attached
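To illustrate how these methods compose, here is a minimal sketch of driving the base class directly; in practice `evoland_db` wraps it. The table name and column layout are illustrative, and the exact forms of the `where` and `fn` arguments are assumptions:

```r
library(evoland)

# Create/connect to a parquet-backed database folder with an extension loaded
db <- parquet_db$new(path = "scratch_db", extensions = "spatial")

# Persist a data.table as a parquet file
db$commit(
  data.table::data.table(id = 1:3, value = c("a", "b", "c")),
  table_name = "example_t",
  method = "overwrite"
)

# Read it back, then run ad-hoc SQL with the table temporarily attached
db$fetch("example_t", limit = 10)
db$with_tables("example_t", function() {
  db$get_query("SELECT count(*) AS n FROM example_t")
})
```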
An R6 class that inherits from `parquet_db` and provides the domain-specific interface for land use change analysis. The class is defined across multiple files:

- `evoland_db.R` - Core class definition and methods
- `evoland_db_tables.R` - Table active bindings (read/write access)
- `evoland_db_views.R` - View active bindings and query methods
- `evoland_db_neighbors.R` - Neighbor analysis methods
```r
db <- evoland_db$new(
  path = "myproject.evolanddb",
  report_name = "my_scenario",
  report_name_pretty = "My Scenario Description"
)
```

Tables can be read and written using active bindings with automatic validation:
```r
# Read a table
coords <- db$coords_t

# Write/upsert a table
db$lulc_meta_t <- create_lulc_meta_t(lulc_spec)
db$lulc_data_t <- as_lulc_data_t(lulc_data)
```

Available table bindings: `reporting_t`, `coords_t`, `periods_t`, `runs_t`, `lulc_meta_t`, `lulc_data_t`, `pred_meta_t`, `pred_data_t_float`, `pred_data_t_int`, `pred_data_t_bool`, `trans_meta_t`, `trans_preds_t`, `trans_rates_t`, `intrv_meta_t`, `intrv_masks_t`, `trans_models_t`, `alloc_params_t`, `neighbors_t`
Computed views that don't store additional data:
- `lulc_meta_long_v` - Unrolled LULC metadata with one row per source class
- `pred_sources_v` - Distinct predictor URLs and their MD5 checksums
- `trans_v` - Land use transitions derived from consecutive LULC observations
- `extent` - Spatial extent of `coords_t` as a `terra::SpatExtent`
- `coords_minimal` - Minimal coordinate representation (id_coord, lon, lat)
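Views are read through active bindings in the same way as tables; a minimal sketch:

```r
trans <- db$trans_v      # transitions derived from consecutive LULC observations
ext <- db$extent         # terra::SpatExtent of the coordinate grid
xy <- db$coords_minimal  # data with id_coord, lon, lat
```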
Setup methods:

- `set_report(...)` - Set reporting metadata key-value pairs
- `set_coords(type, epsg, extent, resolution)` - Initialize coordinate grid
- `set_periods(period_length_str, start_observed, end_observed, end_extrapolated)` - Define time periods
- `set_neighbors(max_distance, distance_breaks)` - Compute neighbor relationships
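`set_coords()`, `set_periods()`, and `set_neighbors()` are demonstrated in the workflow further below; `set_report()` takes key-value pairs, for example (hypothetical keys, not a fixed schema):

```r
db$set_report(author = "Jane Doe", institution = "Example Lab")
```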
Ingestion methods:

- `add_predictor(pred_spec, pred_data, pred_type)` - Add a predictor variable to the database
Query methods:

- `trans_pred_data_v(id_trans, id_period, id_pred, na_value)` - Wide table of transition results and predictor data
- `pred_data_wide_v(id_trans, id_period, na_value)` - Wide predictor data for transition probability prediction
- `trans_rates_dinamica_v(id_period)` - Transition rates formatted for Dinamica export
- `lulc_data_as_rast(extent, resolution, id_period)` - Convert LULC data to a terra `SpatRaster`
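A short sketch of two of these methods; argument values are illustrative, and omitted arguments (`id_pred`, `extent`, `resolution`) are assumed to fall back to defaults stored in the database:

```r
# Wide table of transition outcomes joined with predictor values
wide <- db$trans_pred_data_v(id_trans = 1, id_period = 2, na_value = 0)

# Rasterize the LULC data of one period for visual inspection
r <- db$lulc_data_as_rast(id_period = 2)
terra::plot(r)
```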
Calibration methods:

- `set_full_trans_preds(overwrite)` - Initialize full transition-predictor relationships
- `get_pruned_trans_preds_t(filter_fun, na_value, cores, ...)` - Feature selection for transitions
- `fit_partial_models(fit_fun, gof_fun, sample_frac, seed, na_value, cores, ...)` - Fit models on stratified samples
- `fit_full_models(partial_models, gof_criterion, maximize, na_value, cores)` - Refit best models on full data
- `predict_trans_pot(id_period)` - Predict transition potential
Allocation methods:

- `create_alloc_params_t(n_perturbations, sd)` - Compute allocation parameters from historical data
- `eval_alloc_params_t(id_runs, work_dir, keep_intermediate)` - Evaluate allocation parameters via simulation
- `alloc_dinamica(id_periods, id_run, work_dir, keep_intermediate)` - Run Dinamica EGO simulation
For multi-run scenarios with hierarchical inheritance:
```r
db$use_run(id_run = 1) # Activate run context
# ... operations scoped to run 1 ...
db$use_run(NULL) # Return to global context
```

Each table in the schema has a corresponding S3 class that inherits from `data.table`.
Creating objects is done via `as_*` functions, for instance:

```r
coords <- as_coords_t(my_data)
periods <- as_periods_t(period_data)
```

Some tables also have `create_*` constructor functions that generate data from specifications:
```r
lulc_meta <- create_lulc_meta_t(list(
  forest = list(pretty_name = "Forest", src_classes = 1:3),
  urban = list(pretty_name = "Urban", src_classes = 4:6)
))

periods <- create_periods_t(
  period_length_str = "P10Y",
  start_observed = "1985-01-01",
  end_observed = "2020-01-01",
  end_extrapolated = "2060-01-01"
)
```

Upon creation, type coercion and validation are performed via `validate.*` S3 methods. A specific S3 print method is implemented for each class, showing the class name, summary statistics, and a preview of the data.
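As an illustration, a `validate.*` method might follow the pattern sketched below; the specific checks and column names (taken from `coords_minimal` above) are assumptions, not the actual implementation:

```r
# Hypothetical sketch of a validate.* method for the coords_t class
validate.coords_t <- function(x, ...) {
  stopifnot(
    data.table::is.data.table(x),
    all(c("id_coord", "lon", "lat") %in% names(x)),
    is.numeric(x$lon),
    is.numeric(x$lat)
  )
  invisible(x)
}
```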
The package supports four main phases: Setup, Ingestion, Calibration, and Prediction/Allocation.
Initialize the database with spatial and temporal configuration.
```r
library(evoland)

db <- evoland_db$new(
  path = "switzerland.evolanddb",
  report_name = "ch_lulc",
  report_name_pretty = "Swiss Land Use Change Model"
)

# Define coordinate grid
db$set_coords(
  type = "square",
  epsg = 2056,
  extent = terra::ext(c(
    xmin = 2480000,
    xmax = 2840000,
    ymin = 1070000,
    ymax = 1300000
  )),
  resolution = 100
)

# Define time periods
db$set_periods(
  period_length_str = "P10Y",
  start_observed = "1985-01-01",
  end_observed = "2020-01-01",
  end_extrapolated = "2060-01-01"
)
```

```r
# Define LULC classes with mappings from source data
db$lulc_meta_t <- create_lulc_meta_t(list(
  forest = list(
    pretty_name = "Forest",
    description = "All forest types",
    src_classes = c(50:60)
  ),
  urban = list(
    pretty_name = "Urban Areas",
    description = "Built-up areas",
    src_classes = c(1:14)
  )
  # ... more classes
))

# Ingest LULC observations
db$lulc_data_t <- as_lulc_data_t(lulc_observations)
```

```r
# Add predictors one at a time with metadata
db$add_predictor(
  pred_spec = list(
    elevation = list(
      unit = "masl",
      pretty_name = "Elevation",
      description = "Digital elevation model",
      sources = list(list(url = "...", md5sum = "..."))
    )
  ),
  pred_data = elevation_data, # data.table with id_coord, id_period, value
  pred_type = "float"
)
```

```r
# Compute spatial neighbors (can be slow for large datasets)
db$set_neighbors(
  max_distance = 1000,
  distance_breaks = c(0, 100, 500, 1000)
)

# Generate neighbor-based LULC count predictors
db$generate_neighbor_predictors()
```

```r
# Analyze observed transitions and determine viability
db$trans_meta_t <- create_trans_meta_t(
  db$trans_v,
  min_cardinality_abs = 10000,
  exclude_anterior = 9 # e.g., exclude "static" class
)
```

```r
# Initialize full predictor set
db$set_full_trans_preds(overwrite = TRUE)

# Apply covariance filtering
trans_preds_filtered <- db$get_pruned_trans_preds_t(
  filter_fun = covariance_filter,
  corcut = 0.7,
  na_value = 0,
  cores = 4
)
db$commit(trans_preds_filtered, "trans_preds_t", method = "overwrite")

# Optional: Apply guided regularized random forest filtering
trans_preds_grrf <- db$get_pruned_trans_preds_t(
  filter_fun = grrf_filter,
  num.trees = 100,
  gamma = 0.8,
  cores = 4
)
db$commit(trans_preds_grrf, "trans_preds_t", method = "overwrite")
```

```r
# Fit partial models with train/test split
partial_models <- db$fit_partial_models(
  fit_fun = fit_glm, # or fit_ranger for random forests
  gof_fun = gof_glm, # or gof_ranger
  sample_frac = 0.7,
  seed = 42,
  na_value = 0,
  cores = 4
)

# Select best models and refit on full data
full_models <- db$fit_full_models(
  partial_models = partial_models,
  gof_criterion = "auc",
  maximize = TRUE,
  cores = 4
)
db$trans_models_t <- full_models
```

```r
# Calculate observed historical rates
obs_rates <- create_obs_trans_rates_t(db$trans_v, db$trans_meta_t)
db$trans_rates_t <- obs_rates

# Extrapolate to future periods
db$trans_rates_t <- create_extr_trans_rates_t(obs_rates, db$periods_t)
```

```r
# Compute patch expansion/patcher parameters from historical data
db$alloc_params_t <- db$create_alloc_params_t(
  n_perturbations = 5,
  sd = 0.05
)

# Optional: Evaluate parameters against observed data (requires Dinamica EGO)
db$alloc_params_t <- db$eval_alloc_params_t()
```

```r
# Predict transition potential for a future period
trans_pot <- db$predict_trans_pot(id_period = 5)

# Run full simulation with Dinamica EGO
db$alloc_dinamica(
  id_periods = c(4, 5, 6, 7, 8),
  id_run = 0,
  work_dir = "dinamica_runs"
)
```

It is generally easiest to follow the structures shown in Hadley Wickham's and Jennifer Bryan's *R Packages*.
- `DESCRIPTION` holds metadata and declares dependencies, but does not import them.
- `NAMESPACE` exports and imports objects out of and into the package namespace. We use roxygen2 to populate this file.
- `LICENSE.md` holds the license text. We use the AGPL.
- `README.md` should welcome developers, whom we can assume to be identical with users for now.
- `R/` contains all exportable R logic. No nested directories are allowed.
- `src/` contains C++ code interfacing with R via Rcpp.
- `man/` and `vignettes/` contain manual pages and vignettes, respectively. The former is populated using roxygen2; the latter can be written in Quarto's markdown.
- `data-raw/` contains logic to populate `data/`, which is used to deliver (really small) sample datasets.
- `inst/tinytest/` contains tests using the tinytest framework.
- `inst/` contains any data that should be available verbatim when the package is installed.
- `.Rbuildignore` indicates which files from the source package structure should not be included in built/installed packages.
The `R/` directory is organized as follows:
| File(s) | Purpose |
|---|---|
| `parquet_db.R` | Base database class (domain-agnostic) |
| `evoland_db.R` | Main evoland database class |
| `evoland_db_tables.R` | Table active bindings |
| `evoland_db_views.R` | View methods and active bindings |
| `evoland_db_neighbors.R` | Neighbor analysis methods |
| `*_t.R` (e.g., `coords_t.R`) | Table class definitions with `as_*`, `create_*`, `validate.*`, `print.*` |
| `trans_models_glm.R`, `trans_models_rf.R` | Model fitting implementations |
| `covariance_filter.R`, `grrf_filter.R` | Feature selection algorithms |
| `alloc_dinamica.R` | Dinamica EGO integration |
| `fuzzy_similarity.R` | Map comparison metrics |
| `util*.R` | Utility functions |
| `init.R` | Package initialization |
The style should follow the tidyverse style guide. The code should be autoformatted with air before committing; the configuration for air is in `air.toml`.
The dependencies used in the package are declared in `DESCRIPTION`; see the R Packages chapter.
Dependencies should be kept minimal.
Core dependencies (Imports):
- `R6` - OOP for database classes
- `data.table` - Efficient data manipulation
- `DBI`, `duckdb` - Database operations
- `terra` - Spatial data handling
- `glue` - String interpolation
- `qs2` - Fast object serialization
- `Rcpp` - C++ integration
- `stringi`, `curl` - Utilities
Optional dependencies (Suggests):
- `ranger` - Random forest models
- `pROC` - ROC/AUC calculations
- `processx` - External process management (Dinamica)
- `tinytest` - Testing framework
- `quarto` - Vignette building
The roxygen `@importFrom` tag should be used for functions that are called so often that the `package::function()` syntax becomes cumbersome.
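For example, a package-level roxygen block might import a handful of frequently used `data.table` functions (which functions warrant this is a case-by-case judgment):

```r
# Import frequently used functions into the package namespace via roxygen2
#' @importFrom data.table data.table setnames
NULL
```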
This package uses tinytest for testing:
```sh
# Test the full package (build, install, test)
R -e "tinytest::build_install_test()"

# Test individual files during development
R -e "pkgload::load_all(); tinytest::run_test_file('inst/tinytest/test_coords_t.R')"
```

Non-exported functions are tested using the `evoland:::private_function` syntax.