Skip to content

Apply mAIC to PartitionFinder#127

Open
HuaiyanRen wants to merge 49 commits intomasterfrom
huaiyan
Open

Apply mAIC to PartitionFinder#127
HuaiyanRen wants to merge 49 commits intomasterfrom
huaiyan

Conversation

@HuaiyanRen
Copy link
Copy Markdown
Collaborator

No description provided.

thomaskf and others added 30 commits July 2, 2025 18:24
…. 2. greedy algorithm added. 3. calculation works for keep_empty_seq. 4 pairs generated in previous iteration are reconsidered for merging. 5. multi-thread works properly. 6. avoid unnecessary warning.

Q-mixture model now can be estimated by iteratively adding components from one
@HuaiyanRen HuaiyanRen requested a review from bqminh January 30, 2026 05:22
@bqminh
Copy link
Copy Markdown
Member

bqminh commented Mar 5, 2026

Claude Code:

PR #127 — "Apply mAIC to PartitionFinder"

Author: HuaiyanRen | Branch: huaiyan → master | 36 commits | ~178 additions / ~1,108 deletions across 26 files

What It Does

The stated goal is to integrate marginal AIC (mAIC) into the PartitionFinder merging algorithm — specifically:

  1. Remove the !params.contain_nonrev guard so mAIC is computed even when non-reversible models are present
  2. Remove the outError() that blocked the "1-taxon intersection + non-reversible model" case in partitionmodel.cpp
  3. Refactor checkAbsentStates() from void to int to aggregate absent-state counts across partitions

In practice, the PR is a snapshot of a development branch that diverged from master ~8 months ago and has not been rebased. The majority of the diff is unintended reversion of features and bug fixes added to master in the interim.

Critical Issues (Block Merge)

1. Compilation error — ASSSERT typo
In main/phylotesting.cpp:
ASSSERT(check); // three S's — undefined macro, build fails
ASSERT is the correct macro. This is a build-breaking error.

2. Memory leak in checkAbsentStates()
double *state_freq = new double[num_states]; // allocated
if (seq_type == SEQ_POMO)
return 0; // leaked — never freed
The early PoMo return exits before delete[] state_freq.

3. Mathematically incorrect mAIC for non-reversible models
The PR removes the !params.contain_nonrev guard and also removes the outError() that protected against "1-taxon intersection + non-reversible model". The fallthrough now silently uses equilibrium state frequencies (log_state_freq[char_id]) as the marginal approximation — which is only valid for reversible models. For non-reversible models, root placement matters and the approximation produces biased marginal likelihoods with no warning.

4. Massive unintended regressions (~1,000 lines of production code removed)

The PR inadvertently drops everything added to master while huaiyan was diverged:

  • MrBayes export: entire --mrbayes / printMrBayesBlockFile() feature + all printMrBayesModelText() implementations across 5 model classes
  • AliSim: simulate_alignment() C API, --pop-size, --skip-bl-check, seed enforcement, branch-length safety check, double-free fix for first_insertion
  • FunDi: iterative convergence loop with fallback replaced by a single optimization call
  • Weighted NNI perturbation: --weighted-perturbation, branchLengthSafe(), computeRankOverE(), sampleIndexByWeights()
  • MixtureFinder: protein/multistate support (--force-aa-mix-finder), per-data-type initial frequency set selection (initFreqSet() removed, hardcoded "FO" used instead)
  • libiqtree C API: other_options, blfix, simulate_alignment(), iqtree_free() all removed (breaking API change)
  • Bug fixes: split.cpp fix for ntaxa % UINT_BITS == 0; alignment parsimony auto-switch for large datasets

Moderate Issues

5. checkAbsentStates() semantics changed
The severity logic is altered: count >= num_states-1 (all-but-one absent) now produces the same fatal error as all-gaps. Warning messages downgraded from outWarning() to cout <<, losing standard warning machinery. The BRLEN_FIX bypass has no documented rationale.

6. MixtureFinder initFreqSet() removal
Replacing per-seq_type frequency selection with hardcoded "FO" silently produces wrong starting frequency sets for codon data (should be F1X4,F3X4) and morphological data (should be FQ).

7. Spelling errors introduced

  • "not asigned by user" (missing 's') in phyloanalysis.cpp
  • "defaut: 5" (missing 'l') in tools.cpp help text

8. CI weakened
CI restricted to master pushes only; GCC 9 and GCC 12 test matrix entries removed.

What Is Valuable

The core mAIC contribution (return-count refactor of checkAbsentStates, per-partition aggregation, enabling mAIC for non-reversible models) is a legitimate and useful addition — it is just buried under a large unintended reversion.

Verdict

Not ready to merge. The branch needs to be rebased onto current master and submitted as a targeted PR containing only the mAIC-specific changes, retaining all features and bug fixes that master accumulated in the interim. At minimum, the ASSSERT typo and PoMo memory leak must be fixed, and the non-reversible mAIC path needs either a correct mathematical implementation or a restored guard with a clear roadmap comment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants