This document describes the changes and improvements in BATD v1.5.0, with a focus on the new unified extraction workflow and configuration system.
The new BATD_extract() function provides a simplified interface with automatic format detection:
# Automatic format detection (recommended)
data <- BATD_extract(
  list_of_filenames = my_files,
  site = "UCLA"
)

# Explicit format specification (optional)
data <- BATD_extract(
  list_of_filenames = my_files,
  site = "UCLA",
  format = "NF"  # or "OF" for older format
)

# Mixed-format batch processing (automatic routing)
data <- BATD_extract(
  list_of_filenames = c(nf_files, of_files),
  site = "UCLA"
)

The system now automatically detects whether input files are in:
- NF (Newer Format): Single text file with colon-delimited data (date:, protocol:, gender:, birthYear:, etc.)
- OF (Older Format): Nested folder structure with tab-delimited files (Subject_Number, Gender, Handedness, Birthdate, etc.)
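To illustrate how the two layouts can be told apart, here is a minimal sketch that counts format-specific markers in a file's lines and turns the count into a 0-1 score. The function name `sketch_detect_format`, the marker lists (taken from the field names above), and the scoring rule are assumptions for illustration; the package's own detection logic may work differently.

```r
# Hypothetical sketch: score a file's lines against NF and OF markers.
# The scoring rule (fraction of markers found) is an assumption,
# not BATD's actual implementation.
sketch_detect_format <- function(lines) {
  nf_markers <- c("date:", "protocol:", "gender:", "birthYear:")
  of_markers <- c("Subject_Number", "Gender", "Handedness", "Birthdate")

  # Count how many markers of a given kind appear anywhere in the text
  # (case-sensitive, literal matching)
  count_hits <- function(markers) {
    sum(vapply(markers,
               function(m) any(grepl(m, lines, fixed = TRUE)),
               logical(1)))
  }
  nf_hits <- count_hits(nf_markers)
  of_hits <- count_hits(of_markers)

  if (nf_hits == 0 && of_hits == 0) {
    return(list(format = NA_character_, confidence = 0,
                reasoning = "No known markers found"))
  }

  if (nf_hits >= of_hits) {
    list(format = "NF", confidence = nf_hits / length(nf_markers),
         reasoning = sprintf("Found %d NF markers", nf_hits))
  } else {
    list(format = "OF", confidence = of_hits / length(of_markers),
         reasoning = sprintf("Found %d OF markers", of_hits))
  }
}

result <- sketch_detect_format(c("date: 2024-01-15", "protocol: 100", "gender: M"))
```

A real detector would also need to inspect the folder structure for OF data; this sketch only looks at text content.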
Detection uses confidence scoring (0-1 scale) and provides detailed reasoning:
format_result <- detect_file_format("path/to/file.txt")
# Returns: list(format = "NF", confidence = 0.95, reasoning = "Found 5 NF markers")

Site-specific protocol information is now centralized in YAML format:

sites:
  UCLA:
    name: University of California, Los Angeles
    location: Los Angeles, CA
    protocols:
      - id: 100
        name: Static Detection Threshold
        parameters:
          num_test_trials: 20
      - id: 900
        name: Simultaneous Amplitude Discrimination
        parameters:
          num_test_trials: 30

Global analysis parameters are configured in config/defaults.yaml:
analysis:
  session_gap_threshold: 24  # hours
  min_trials_threshold: 5
extraction:
  min_confidence_threshold: 0.7
  default_site: "NA"
plot:
  theme: "classic"
  dpi: 300

New functions for accessing configuration:
# Load site protocols
kki_protocols <- get_site_protocols("KKI")
# Get specific protocol name
protocol_name <- get_protocol_name(protocol_id = 100, site = "UCLA")
# Access analysis defaults
analysis_config <- get_defaults("analysis")
# Load entire configuration
all_defaults <- get_defaults()

All existing function calls continue to work without changes:
# v1.4.0 code still works in v1.5.0
data_nf <- BATD_extract_NF(my_nf_files, site = "UCLA")
data_of <- BATD_extract_OF(my_of_files, site = "UCLA")
# These internally now use the refactored implementation
# with shared helpers and improved code organization

R/
├── BATD_extract_NF.R            # Public API wrapper (backward compat)
├── BATD_extract_OF.R            # Public API wrapper (backward compat)
├── BATD_analyze.R               # Existing analysis functions
├── BATD_plot.R                  # Existing plotting functions
├── extract/
│   ├── extract.R                # New unified wrapper function
│   ├── extract_nf.R             # Refactored NF extraction (388 lines, -35%)
│   ├── extract_of.R             # Refactored OF extraction (254 lines, -30%)
│   └── helpers/
│       ├── common_extraction.R  # Shared helper functions
│       └── format_detection.R   # Format detection logic
├── config/
│   └── load_config.R            # Configuration loading functions
└── utils/
    └── operators.R              # Utility operators (%ni%)
config/
├── site_protocols.yaml          # Site-specific protocol definitions
└── defaults.yaml                # Global analysis defaults
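As an illustration of how `load_config.R` might read the files under `config/`, the sketch below parses a small YAML document and returns either one section or the whole configuration. `sketch_get_defaults` and the inlined YAML text are hypothetical stand-ins; only `yaml::yaml.load()` is a real function from the yaml package, and the package's own loader may behave differently.

```r
library(yaml)

# Hypothetical stand-in for config/defaults.yaml, inlined so the sketch
# runs without the package's files
defaults_text <- "
analysis:
  session_gap_threshold: 24
  min_trials_threshold: 5
extraction:
  min_confidence_threshold: 0.7
"

# Assumed behaviour: return one section when `section` is given,
# otherwise the whole configuration as a nested list
sketch_get_defaults <- function(section = NULL, text = defaults_text) {
  config <- yaml::yaml.load(text)
  if (is.null(section)) {
    return(config)
  }
  if (!section %in% names(config)) {
    stop("Unknown configuration section: ", section)
  }
  config[[section]]
}

analysis <- sketch_get_defaults("analysis")
everything <- sketch_get_defaults()
```

Failing loudly on an unknown section name is a design choice of this sketch: a typo in a section name then surfaces immediately instead of silently returning `NULL`.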
- BATD_extract_NF: 595 lines → 388 lines (-35%, 207 lines eliminated)
- BATD_extract_OF: 361 lines → 254 lines (-30%, 107 lines eliminated)
- Total duplication eliminated: 314 lines
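The percentages can be re-derived from the line counts listed above; the variable names in this short check are illustrative only.

```r
# Line counts before and after refactoring, taken from the list above
before <- c(BATD_extract_NF = 595, BATD_extract_OF = 361)
after  <- c(BATD_extract_NF = 388, BATD_extract_OF = 254)

removed       <- before - after                  # lines eliminated per file
pct           <- round(100 * removed / before)   # percentage reduction per file
total_removed <- sum(removed)                    # lines eliminated overall
overall_pct   <- round(100 * total_removed / sum(before))
```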
The following helper functions now replace duplicated code:
- normalize_demographics() - Converts demographic formats (M→Male, F→Female, etc.)
- account_for_repeated_runs() - Identifies and numbers repeated protocol runs
- standardize_column_types() - Ensures consistent column data types
- extract_participant_metadata() - Format-aware demographic extraction
- validate_extracted_data() - Quality validation for extracted data
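To give a feel for what two of these helpers do, here is a minimal sketch of demographic normalization and run numbering. The function names, the column names (`id`, `protocol`, `gender`), and the one-row-per-run layout are assumptions for illustration, not the package's documented schema or implementation.

```r
# Hypothetical sketch: map common gender codes to full labels
sketch_normalize_gender <- function(x) {
  lookup <- c(M = "Male", F = "Female", MALE = "Male", FEMALE = "Female")
  key <- toupper(trimws(x))
  out <- unname(lookup[key])
  # Leave unrecognised values untouched rather than guessing
  out[is.na(out)] <- x[is.na(out)]
  out
}

# Hypothetical sketch: number repeated runs of the same protocol
# (assumes one row per run, in chronological order)
sketch_number_runs <- function(data) {
  data$run <- ave(seq_len(nrow(data)), data$id, data$protocol, FUN = seq_along)
  data
}

runs <- data.frame(
  id = c("P1", "P1", "P1", "P2"),
  protocol = c(100, 100, 900, 100),
  gender = c("M", "M", "M", "f"),
  stringsAsFactors = FALSE
)
runs$gender <- sketch_normalize_gender(runs$gender)
runs <- sketch_number_runs(runs)
```

Here participant P1 ran protocol 100 twice, so those rows are numbered 1 and 2, while every other participant/protocol pair gets run 1.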
No migration needed! All existing code continues to work. To use new features:
# Old approach: one call per format
data_nf <- BATD_extract_NF(files, "UCLA")
data_of <- BATD_extract_OF(files, "UCLA")

# New approach: a single call with auto-detection
data <- BATD_extract(files, "UCLA")

The new modular structure makes development easier:
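# Access shared helpers in custom code. Run this from a source checkout of
# the package: an installed package does not ship its R/ source files, so
# system.file() cannot locate them.
source("R/extract/helpers/common_extraction.R")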
# Normalize demographics in your function
my_data <- normalize_demographics(my_data)
# Validate extracted data
validate_extracted_data(my_data)

Edit config/site_protocols.yaml:
sites:
  NEW_SITE:
    name: Your Institution Name
    location: City, State
    contact: contact@institution.edu
    protocols:
      - id: 100
        name: Static Detection Threshold
        description: Optional description
        parameters:
          num_test_trials: 20

Edit config/defaults.yaml:
analysis:
  session_gap_threshold: 48  # Changed from 24 hours
  min_trials_threshold: 8    # Changed from 5

v1.5.0 depends on the following packages; those marked NEW were added in this release:
- yaml (NEW) - For YAML configuration parsing
- dplyr, stringr - Data manipulation
- ggplot2, ggpubr - Visualization
- data.table, plyr - Data processing
Development dependency:
- testthat (NEW) - For comprehensive test suite (57 tests)
Run the new test suite to verify functionality:
# Install test dependencies if needed
devtools::install(dependencies = TRUE)
# Run all tests
devtools::test()
# Run specific test file
devtools::test(filter = "config")
# Check coverage (run from the package root)
covr::package_coverage()

Test categories:
- Operators (6 tests)
- Demographics normalization (4 tests)
- Format detection (6 tests)
- Common extraction helpers (8 tests)
- Unified wrapper (7 tests)
- Backward compatibility (8 tests)
- Configuration loading (18 tests)
Total: 57 tests
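For orientation, a test in the Operators category might look like the sketch below. The definition of `%ni%` shown here is the common "not in" idiom and is an assumption; the package's `operators.R` and its actual tests may differ.

```r
library(testthat)

# Assumed definition: the usual "not in" idiom
`%ni%` <- Negate(`%in%`)

test_that("%ni% is the complement of %in%", {
  expect_true("OF" %ni% c("NF", "XX"))
  expect_false("NF" %ni% c("NF", "OF"))
  expect_equal(c(1, 5) %ni% c(1, 2, 3), c(FALSE, TRUE))
})
```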
Refactoring improvements:
- Code maintainability: 314 fewer duplicated lines (about 33% of the original 956 lines across both extractors)
- Development velocity: Shared helpers reduce new-feature development time
- Quality: Format detection prevents user errors from format misspecification
- Backward compatibility: 100% - no breaking changes
For issues or feature requests:
- Email: jasonhe93@gmail.com
- Report issues with specific format problems to enable better format detection
- Share site-specific protocol information to expand configuration coverage
- v1.5.0 (Jan 2024) - Major refactoring with modular architecture, YAML configuration, auto-detection, 57 tests
- v1.4.0 (Previous) - Original monolithic implementation