Skip to content

Harden tip-count guards and remove fixed-size RF complement buffers for SL_MAX_TIPS resilience#194

Open
Copilot wants to merge 4 commits intomainfrom
copilot/start-implementation-tree-dist
Open

Harden tip-count guards and remove fixed-size RF complement buffers for SL_MAX_TIPS resilience#194
Copilot wants to merge 4 commits intomainfrom
copilot/start-implementation-tree-dist

Conversation

Copy link
Copy Markdown
Contributor

Copilot AI commented Apr 15, 2026

This change starts implementation of the SL_MAX_TIPS resilience work by tightening guard behavior and removing assumptions that depend on fixed compile-time split dimensions. It aligns RF paths with existing guard semantics and makes failure modes explicit when tip counts exceed supported bounds.

  • Guard semantics made consistent across RF C++ entry points

    • cpp_robinson_foulds_distance() now enforces check_ntip() (previously missing).
    • check_ntip() now throws a runtime R error (Rcpp::stop) with the existing user-facing message instead of relying on assertion behavior.
    • Bound is explicitly computed as min(SL_MAX_TIPS, int16_t::max()), with a static_assert documenting SL_MAX_TIPS/int32 assumptions.
  • Removed fixed-size RF complement stack arrays

    • Replaced:
      • splitbit b_complement[SL_MAX_SPLITS][SL_MAX_BINS]
    • With runtime-sized heap storage in:
      • robinson_foulds_distance()
      • robinson_foulds_info()
    • This avoids hard coupling to static dimensions and prevents oversized stack allocations as limits evolve.
  • Centralized R-side tip-limit check in tree-distance utilities

    • Added internal helper in R/tree_distance_utilities.R:
      • .AssertNtipSupported()
      • .MaxSupportedTips (documented to stay aligned with C++ guard semantics)
    • Utility guard call sites now route through this helper.
  • Focused regression coverage for boundary behavior

    • Added tests in tests/testthat/test-tree_distance_utilities.R for:
      • valid boundaries (1000, 32766, 32767)
      • reject boundary (32768)
      • direct RF C++ entry points
      • wrapper path (.SplitDistanceAllPairs) using the same guard message
void check_ntip(const double n) {
  static_assert(SL_MAX_TIPS <= std::numeric_limits<int32>::max(),
                "SL_MAX_TIPS must fit in int32");
  constexpr int32 max_supported_tips = std::min(
    int32(SL_MAX_TIPS), int32(std::numeric_limits<int16_t>::max())
  );
  if (n > max_supported_tips) {
    Rcpp::stop("This many tips are not (yet) supported.");
  }
}
Original prompt

Start implementation

The user has attached the following file paths as relevant context:

  • .github\copilot-instructions.md
  • AGENTS.md
[Chronological Review: The conversation began with the user requesting a comparison between PR #193 and their own branch, focusing on how both address the issue of making TreeDist resilient to changes in SL_MAX_TIPS. The user detailed the differences in implementation, strengths, and weaknesses of both approaches. Following this, the user requested to start the implementation based on the evaluation provided, indicating a shift from analysis to action.]

[Intent Mapping: The user explicitly requested a comparison of PR #193 against their branch, highlighting specific features and shortcomings. They then requested to "start implementation," indicating readiness to proceed with coding based on the evaluation.]

[Technical Inventory: The conversation involved technical concepts such as:

  • PR Prepare for large-tree support #193: Focused on static array elimination, centralized R guard, C++ guard, user interrupts, and CI/memcheck improvements.
  • User's branch: Comprehensive migration addressing int16 to int32 migration, overflow protection, and extensive testing for large trees.
  • Tools mentioned: Rcpp for user interrupts and error handling, and valgrind for memory checking.]

[Code Archaeology: The user is currently working on tree_distances.cpp, which is central to the implementation of distance calculations in the TreeDist project. The conversation highlighted specific functions and checks that need to be modified or added.]

[Progress Assessment: The user has completed the analysis of both PR #193 and their branch, identifying key areas for improvement. They are now poised to implement changes based on the recommendations provided.]

[Context Validation: All critical information for continuation is captured, including the need to cherry-pick improvements from PR #193 into the user's branch and the specific changes to be made in tree_distances.cpp.]

[Recent Commands Analysis: The last agent command was the user requesting to "start implementation." There were no tool results from this command as it was a directive to begin coding. The immediate pre-summarization state involved the user preparing to implement the changes discussed in the evaluation of PR #193.]

1. Conversation Overview: - Primary Objectives: The user requested a comparison of PR #193 against their branch, stating, "Both address the same core problem — making TreeDist resilient to SL_MAX_TIPS being raised in TreeTools." They then requested to "start implementation" based on the evaluation. - Session Context: The conversation transitioned from a detailed analysis of the two approaches to a directive to begin implementing the recommended changes. - User Intent Evolution: The user moved from analysis and comparison to readiness for implementation, indicating a clear plan to enhance their branch.
  1. Technical Foundation:
  • PR Prepare for large-tree support #193: Focused on static array elimination, centralized R guard, and user interrupts.
  • User's Branch: Comprehensive migration to support larger trees, including int32 migration and overflow protection.
  1. Codebase Status:
  • File Name: tree_distances.cpp
  • Purpose: Central to distance calculations in TreeDist.
  • Current State: The user is preparing to implement changes based on the evaluation of PR Prepare for large-tree support #193.
  • Key Code Segments: Functions related to distance calculations and checks for SL_MAX_TIPS.
  • Dependencies: Changes will impact how distances are calculated and how the system handles larger trees.
  1. Problem Resolution:
  1. Progress Tracking:
  • Completed Tasks: Analysis of PR Prepare for large-tree support #193 and identification of areas for improvement.
  • Partially Complete Work: Implementation of changes in tree_distances.cpp is pending.
  • Validated Outcomes: The user has a clear plan for the next steps.
  1. Active Work State:
  • Current Focus: The user is ready to implement changes in tree_distances.cpp.
  • Recent Context: The user has just requested to start implementation after a detailed evaluation of PR Prepare for large-tree support #193.
  • Working Code: The user is focused on modifying tree_distances.cpp based on the evaluation.
  • Immediate Context: The user is set to begin coding the recommended changes.
  1. Recent Operations:
  • Last Agent Commands: The user requested to "start implementation."
  • Tool Results Summary: No tool results were generated as this was a directive to begin coding.
  • Pre-Summary State: The user was preparing to implement changes in tree_distances.cpp based on the evaluation of PR Prepare for large-tree support #193.
  • Operation Context: The user is focused on enhancing their branch by integrating the best aspects of PR Prepare for large-tree support #193.
  1. Continuation Plan:
  • Pending Task: Implement changes in tree_distances.cpp ...

Created from VS Code.

@ms609 ms609 marked this pull request as ready for review April 15, 2026 18:43
Copilot AI changed the title [WIP] Start implementation for TreeDist resiliency to SL_MAX_TIPS Harden tip-count guards and remove fixed-size RF complement buffers for SL_MAX_TIPS resilience Apr 15, 2026
Copilot AI requested a review from ms609 April 15, 2026 18:44
@codecov
Copy link
Copy Markdown

codecov bot commented Apr 15, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 95.84%. Comparing base (65d9a37) to head (dd78ff5).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #194      +/-   ##
==========================================
+ Coverage   95.78%   95.84%   +0.05%     
==========================================
  Files          57       57              
  Lines        5508     5514       +6     
==========================================
+ Hits         5276     5285       +9     
+ Misses        232      229       -3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@github-actions
Copy link
Copy Markdown

Performance benchmark results

Call Status Change Time (ms)
ClusteringInfoDistance(tr200) ⚪ NSD 1.13% 15.3 →
15.1, 15.3
ClusteringInfoDistance(tr50) ⚪ NSD 0.99% 11.6 →
11.4, 11.5
LAPJV(test2000) ⚪ NSD 0.9% 97.1 →
96.5, 95.4
LAPJV(test40) ⚪ NSD -0.11% 0.0176 →
0.0177, 0.0175
LAPJV(test400) ⚪ NSD 0.01% 3.19 →
3.2, 3.19
MutualClusteringInfo(tr200) ⚪ NSD 1.86% 22.1 →
22, 21.3
MutualClusteringInfo(tr50) ⚪ NSD 1.14% 23.9 →
23.9, 23.5
PathDist(postTrees) ⚪ NSD 1.44% 3.64 →
3.57, 3.6
PhylogeneticInfoDistance(tr200) ⚪ NSD 0.23% 235 →
234, 234
PhylogeneticInfoDistance(tr50) ⚪ NSD 0.92% 81.9 →
81.5, 81
RobinsonFoulds(tr200) ⚪ NSD 1.12% 2.91 →
2.88, 2.88
RobinsonFoulds(tr200) ⚪ NSD 1.1% 2.68 →
2.65, 2.64
RobinsonFoulds(tr50) ⚪ NSD 0.22% 4.45 →
4.46, 4.43

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants