Open
Conversation
…n daily checks on macos-latest
P3: list_cansim_cached_tables single-pass optimization - Consolidate three separate lapply calls into a single iteration - Collects timeCached, rawSize, and title in one pass per cached table - Avoids repeated dir() and file read operations - Use vapply for type-safe extraction from collected metadata - Expected improvement: ~65-85% for list_cansim_cached_tables() P10: Avoid unnecessary tibble conversion - Check tibble::is_tibble() before calling as_tibble() - Skips conversion when data is already a tibble - Expected improvement: ~5-15% for normalize_cansim_values() Note: P6 (field cache utilization) and P7 (csv2sqlite transform copies) were evaluated but not implemented: - P6: Could not identify specific field cache location in current code - P7: Conditional piping would harm readability for minor gains Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Performance optimizations for caching and I/O operations.
Changes
P3: list_cansim_cached_tables single-pass optimization
lapply()calls into a single iteration over cached table pathstimeCached,rawSize, andtitlein one pass per cached tabledir()calls and file read operations that were being done 3x per pathvapply()for type-safe extraction from collected metadataP10: Avoid unnecessary tibble conversion
tibble::is_tibble()before callingas_tibble()Files Modified
R/cansim.R: tibble conversion checkR/cansim_parquet.R: single-pass cache metadata collectionBenchmark Notes
P3: list_cansim_cached_tables - Not directly benchmarked
Reason: Requires a populated cache directory with multiple cached tables to meaningfully benchmark.
Expected improvement: The optimization reduces file I/O operations from 3N to N (where N is number of cached tables):
To benchmark manually:
P10: tibble conversion check - Not directly benchmarked
Reason: Negligible impact - this is primarily a code quality improvement.
Analysis: The
is_tibble()check is O(1) and very fast. Most data flowing throughnormalize_cansim_values()is already a tibble from prior processing, so theas_tibble()call was largely unnecessary. The improvement is in avoiding the conversion overhead when data is already the correct type.Deferred Optimizations
P6 (field cache utilization): Evaluated but not implemented
P7 (csv2sqlite transform copies): Evaluated but not implemented
{if (...) ... else .}would harm code readabilitySummary
Test Plan
devtools::check()passes (0 errors, 0 warnings)🤖 Generated with Claude Code