
Avoid memory-universe copies during conformational topology discovery #371

Merged
harryswift01 merged 7 commits into main from 365-add-dask-execution-for-conformationdag
Jun 24, 2026


@harryswift01

Member

Summary

This PR improves conformational dihedral topology discovery by avoiding repeated construction of standalone in-memory MDAnalysis universes during static topology setup.

This optimisation builds directly on the previous conformational refactor PRs. Those changes separated conformational analysis from the static LevelDAG stage, introduced a dedicated ConformationDAG, and split the conformational workflow into clearer phases for topology discovery, angle collection, peak construction, state assignment, and reduction. That architecture made it possible to profile the conformational path more precisely and identify the true bottleneck.

While investigating Dask-backed conformational parallelisation, profiling showed that Dask introduced additional runtime overhead for the current conformational workload. Further investigation showed that the main bottleneck was not the serial conformational algorithm itself, but repeated MDAnalysis memory-universe creation during molecule and residue topology discovery.
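
The description does not say which profiler was used. As a minimal sketch of how a hotspot like this can be localised, the following uses Python's standard `cProfile` on toy stand-in functions. `build_memory_copy` and `discover_topology` are hypothetical names for illustration, not project code. Sorting by cumulative time makes the per-fragment copy show up near the top of the report.

```python
import cProfile
import io
import pstats


def build_memory_copy(n_frames, n_atoms):
    # Hypothetical stand-in for building an in-memory universe:
    # allocates one coordinate triple per atom per frame.
    return [[(0.0, 0.0, 0.0) for _ in range(n_atoms)] for _ in range(n_frames)]


def discover_topology(n_fragments):
    # Hypothetical stand-in for topology discovery that copies once per fragment.
    for _ in range(n_fragments):
        build_memory_copy(n_frames=200, n_atoms=50)


profiler = cProfile.Profile()
profiler.enable()
discover_topology(n_fragments=20)
profiler.disable()

# Render the ten most expensive entries by cumulative time into a string.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(10)
report = stream.getvalue()
print(report)
```

In a report like this, a helper that is cheap per call but invoked once per molecule or residue stands out through its cumulative time, which is the pattern the paragraph above describes.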

Instead of adding Dask to conformational analysis, this PR applies a smaller targeted optimisation: topology discovery now uses lightweight AtomGroup selections where possible, while preserving the existing serial conformational workflow and regression behaviour.

Changes

Lightweight topology fragment extraction:

  • Added extract_fragment_atomgroup(...) to UniverseOperations.
  • The new helper mirrors the atom-index range used by extract_fragment(...) but returns an AtomGroup instead of building a standalone in-memory universe.
  • This avoids copying trajectory coordinates and forces during static conformational topology discovery.
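
The difference between the two extraction paths can be sketched without MDAnalysis. The classes and functions below are toy stand-ins (all names are hypothetical, and the half-open index range is an assumption of the sketch); they are not the `UniverseOperations` implementation. The point is only that the copy path duplicates every frame, while the view path stores an index range and a reference to the parent data.

```python
from dataclasses import dataclass


@dataclass
class ToyTrajectory:
    """Stand-in for trajectory data: frames[f][i] is the (x, y, z) of atom i in frame f."""
    frames: list


@dataclass
class ToyAtomView:
    """Stand-in for an AtomGroup: atom indices plus a reference to the parent data."""
    parent: ToyTrajectory
    indices: range

    def positions(self, frame):
        # Coordinates are read from the parent on demand; nothing is copied up front.
        return [self.parent.frames[frame][i] for i in self.indices]


def extract_fragment_copy(traj, start, stop):
    # Heavy path: build standalone data by copying every frame's coordinates.
    return ToyTrajectory([[tuple(xyz) for xyz in frame[start:stop]] for frame in traj.frames])


def extract_fragment_view(traj, start, stop):
    # Light path: same atom-index range, but only a view object is created.
    return ToyAtomView(traj, range(start, stop))


# Three frames of six atoms each.
traj = ToyTrajectory([[(float(f), float(i), 0.0) for i in range(6)] for f in range(3)])
heavy = extract_fragment_copy(traj, 2, 5)
light = extract_fragment_view(traj, 2, 5)

# Both paths expose the same atoms; only the copy path allocated per-frame data.
print(heavy.frames[1] == light.positions(1))
print(light.parent is traj)
```

The cost of the copy path grows with the number of frames, whereas the view path costs the same regardless of trajectory length, which matters when extraction is repeated for every molecule.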

Avoid memory-universe copies in dihedral topology discovery:

  • Updated conformational dihedral topology discovery to use lightweight fragment extraction when available.
  • Replaced heavy residue-selection calls through UniverseOperations.select_atoms(...) with lightweight AtomGroup selection.
  • Preserved the existing MDAnalysis AtomGroup/list-based topology contract used by the current conformational state builder.
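
The "when available" behaviour can be sketched as a simple capability check. This is an illustration under assumptions: the stub classes and the wrapper `extract_fragment_preferring_view` are hypothetical, and arguments are passed through untouched because the real signatures are not shown in this PR description.

```python
class LegacyOps:
    """Stub that only offers the in-memory extraction path."""

    def extract_fragment(self, *args, **kwargs):
        return ("universe-copy", args)


class NewOps(LegacyOps):
    """Stub that additionally offers the lightweight path."""

    def extract_fragment_atomgroup(self, *args, **kwargs):
        return ("atomgroup-view", args)


def extract_fragment_preferring_view(ops, *args, **kwargs):
    # Use the lightweight helper if the operations object provides one,
    # otherwise fall back to the original in-memory extraction.
    lightweight = getattr(ops, "extract_fragment_atomgroup", None)
    if callable(lightweight):
        return lightweight(*args, **kwargs)
    return ops.extract_fragment(*args, **kwargs)


new_result = extract_fragment_preferring_view(NewOps(), "universe", 0)
old_result = extract_fragment_preferring_view(LegacyOps(), "universe", 0)
print(new_result[0], old_result[0])
```

A check of this kind keeps older or mocked operations objects working, at the price of two code paths that both need test coverage.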

Preserve residue and united-atom topology behaviour:

  • Kept united-atom dihedral discovery restricted to valid four-atom dihedrals contained within the selected heavy-residue AtomGroup.
  • Updated residue-level dihedral construction to avoid global resindex selection strings when operating on lightweight molecule AtomGroups.
  • Added handling to skip invalid residue-level dihedral candidates instead of passing malformed AtomGroups into MDAnalysis.analysis.dihedrals.Dihedral.
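
The filtering rules above can be illustrated on plain atom indices. This is a sketch only: the PR works with MDAnalysis AtomGroups, and `dihedrals_within` is a hypothetical helper. It keeps a candidate only if it has exactly four distinct atoms and all of them lie inside the selection.

```python
def dihedrals_within(selection_indices, candidate_dihedrals):
    """Keep well-formed four-atom candidates fully contained in the selection."""
    selected = set(selection_indices)
    kept = []
    for quad in candidate_dihedrals:
        # Skip malformed candidates: wrong length or repeated atoms.
        if len(quad) != 4 or len(set(quad)) != 4:
            continue
        # Skip candidates that reach outside the selected atoms.
        if not selected.issuperset(quad):
            continue
        kept.append(tuple(quad))
    return kept


selection = [0, 1, 2, 3, 4]
candidates = [
    (0, 1, 2, 3),  # valid and contained
    (1, 2, 3, 4),  # valid and contained
    (2, 3, 4, 5),  # atom 5 is outside the selection
    (0, 1, 2),     # only three atoms
    (0, 1, 1, 2),  # repeated atom
]
valid = dihedrals_within(selection, candidates)
print(valid)
```

Rejecting bad candidates before they reach the dihedral analysis keeps the failure local and silent, rather than surfacing as an error inside the analysis run.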

Update unit test coverage:

  • Updated topology unit tests to cover lightweight fragment extraction.
  • Added coverage for fallback behaviour when lightweight extraction is unavailable.
  • Added coverage for heavy-residue selection without calling UniverseOperations.select_atoms(...).
  • Added coverage for united-atom dihedral filtering, residue-level dihedral construction, invalid residue windows, and bonded-atom helper behaviour.
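
As a rough idea of what a "without calling `UniverseOperations.select_atoms(...)`" test can look like, the sketch below uses `unittest.mock`. The function under test, `select_heavy_atoms`, and the selection string are assumptions made for illustration; the real tests in this PR may be structured differently.

```python
from unittest.mock import MagicMock

# Assumed selection string, for illustration only.
HEAVY_SELECTION = "prop mass > 1.1"


def select_heavy_atoms(universe_operations, atomgroup):
    # Hypothetical function under test: selects directly on the AtomGroup
    # and deliberately bypasses universe_operations.select_atoms.
    return atomgroup.select_atoms(HEAVY_SELECTION)


def test_heavy_selection_bypasses_universe_operations():
    ops = MagicMock(name="UniverseOperations")
    atomgroup = MagicMock(name="AtomGroup")

    result = select_heavy_atoms(ops, atomgroup)

    # The heavy path must not be touched; the lightweight path is used once.
    ops.select_atoms.assert_not_called()
    atomgroup.select_atoms.assert_called_once_with(HEAVY_SELECTION)
    assert result is atomgroup.select_atoms.return_value


test_heavy_selection_bypasses_universe_operations()
```

Asserting on the call that must not happen guards against a later refactor quietly reintroducing the memory-universe copy.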

Impact

  • Reduces conformational topology discovery runtime significantly by avoiding repeated trajectory copying.
  • Keeps conformational analysis serial; Dask is intentionally not added to this path because profiling showed it would add overhead for the current workload.
  • Confirms that the main conformational bottleneck was MDAnalysis memory-universe creation, not the lack of Dask parallelism.
  • Builds on the previous conformational architecture PRs by using the clearer stage and phase boundaries to target the actual bottleneck.
  • Preserves the existing conformational state-building architecture and MDAnalysis Dihedral.run(...) workflow.

@harryswift01 harryswift01 added this to the 2.3.0 milestone Jun 23, 2026
@harryswift01 harryswift01 requested a review from jimboid June 23, 2026 15:54
@harryswift01 harryswift01 self-assigned this Jun 23, 2026
@harryswift01 harryswift01 added the feature request New feature or request label Jun 23, 2026
@harryswift01 harryswift01 linked an issue Jun 23, 2026 that may be closed by this pull request

@jimboid jimboid left a comment

Member


Happy to approve this PR. We discussed this both at the catch-up on Monday and at the all-hands meeting on Tuesday this week.

…om-axes

Cache customised united-atom axes topology for frame covariance
@harryswift01 harryswift01 merged commit e075ae1 into main Jun 24, 2026
41 of 45 checks passed
@harryswift01 harryswift01 deleted the 365-add-dask-execution-for-conformationdag branch June 24, 2026 10:30

Labels

feature request New feature or request


Development

Successfully merging this pull request may close these issues.

Add Dask Execution for ConformationDAG

2 participants