Skip to content

Commit bce885c

Browse files
roblanfclaude
andcommitted
Fix --scfl gap detection for models with rate heterogeneity
The gap detection filter in computeSubtreeAncestralState() used a fixed threshold (sum > 1.0) that broke for models with multiple rate categories (ncat_mix > 1) and for deep subtrees with scaled partial likelihoods. This caused --scfl to incorrectly filter nearly all sites as gaps, producing wrong sCF values for any model using +G, +R, +I+G, +I+R, or mixture models. The fix uses scale_num to distinguish real data from gaps at internal nodes: scaled sites (scale_num > 0) are always real data, while all-gap subtrees are detected by sum > ncat_mix with no scaling. Leaf nodes retain the original threshold. Both normal and SAFE_NUMERIC scale_num layouts are handled. Fixes #137 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 26b610e commit bce885c

1 file changed

Lines changed: 23 additions & 3 deletions

File tree

tree/discordance.cpp

Lines changed: 23 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -421,9 +421,29 @@ void PhyloTree::computeSubtreeAncestralState(PhyloNeighbor *dad_branch, PhyloNod
421421
state_best = i;
422422
}
423423
}
424-
if (params->ancestral_site_concordance == 2 && sum > 1.0) {
425-
// UPDATE for sCLF: ignoring sites where one subtree shows just gap/ambiguity
426-
state_best = aln->STATE_UNKNOWN;
424+
// For sCFL: detect all-gap/ambiguity subtrees and mark them STATE_UNKNOWN.
425+
// Leaf nodes: sum = 1.0 (known) or nstates (gap); no category summation.
426+
// Internal nodes: sum ~ ncat_mix (real data) or nstates*ncat_mix (gap).
427+
// Scaled internal nodes (scale_num > 0) always have real data — gaps never
428+
// trigger scaling because their partial_lh stays at 1.0.
429+
if (params->ancestral_site_concordance == 2) {
430+
bool is_gap;
431+
if (dad_branch->node->isLeaf()) {
432+
is_gap = sum > 1.0;
433+
} else {
434+
// Check if ANY category has been scaled (=> real data, not gap)
435+
bool any_scaled = false;
436+
if (safe_numeric) {
437+
for (size_t c = 0; c < ncat_mix && !any_scaled; c++)
438+
any_scaled = dad_branch->scale_num[ptn * ncat_mix + c] != 0;
439+
} else {
440+
any_scaled = dad_branch->scale_num[ptn] != 0;
441+
}
442+
is_gap = !any_scaled && sum > (double)ncat_mix;
443+
}
444+
if (is_gap) {
445+
state_best = aln->STATE_UNKNOWN;
446+
}
427447
}
428448
sum = 1.0/sum;
429449
for (size_t i = 0; i < nstates; i++) {

0 commit comments

Comments
 (0)