2.75x faster sampling via higher-order solvers and engineering optimizations #8

Open
jsilter wants to merge 2 commits into NVIDIA-Digital-Bio:main from jsilter:JS_efficiency_v01

jsilter commented Mar 16, 2026

Summary

  • Tier 1 engineering: SDPA attention and BF16 autocast; free speedups with no quality change
  • Higher-order solvers: Heun (ODE), Stochastic Heun (SDE), and Adaptive Heun (ODE with automatic step sizing), selectable via --solver
  • Evaluation pipeline: fast structural eval (seconds) and full designability eval (ProteinMPNN + ESMFold scRMSD)
  • Benchmark script and results for reproducible comparison

Key result

Stochastic Heun with 100 steps (200 NFEs) vs the Euler-400 baseline:

| Metric | Euler-400 | Stoch-Heun-100 |
|---|---|---|
| Designability | 69.4% | 85.1% |
| Median scRMSD | 0.836 Å | 0.695 Å |
| Outlier rate | 19.8% | 1.3% |
| Wall time (150 samples) | ~1314 s | ~478 s (2.75x) |
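
As a quick sanity check on the comparison above, the arithmetic can be sketched as follows. The wall times and step counts are copied from this description; the only outside fact used is that Euler costs one network evaluation per step while Heun-type solvers cost two. The variable names are illustrative and do not come from the repository.

```python
# Figures copied from the comparison above.
euler_wall_s = 1314.0        # Euler-400, 150 samples
stoch_heun_wall_s = 478.0    # Stoch-Heun-100, 150 samples

# Network function evaluations (NFEs) per sample.
euler_nfe = 400 * 1          # Euler: one evaluation per step
stoch_heun_nfe = 100 * 2     # Heun-type: predictor + corrector per step

wall_speedup = euler_wall_s / stoch_heun_wall_s
nfe_ratio = euler_nfe / stoch_heun_nfe
# Part of the speedup that the NFE reduction alone does not account for.
residual = wall_speedup / nfe_ratio

print(f"wall-clock speedup: {wall_speedup:.2f}x")
print(f"NFE reduction:      {nfe_ratio:.2f}x")
print(f"residual factor:    {residual:.2f}x")
```

The NFE reduction is 2x, so the rest of the 2.75x has to come from something other than step count. This description does not break that remainder down.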

Quality improves because Stochastic Heun has strong convergence order 1.0 (vs 0.5 for Euler-Maruyama in the general case). The diffusion term is additive noise (it depends only on t), so no Itô correction is needed and the trapezoidal rule on the drift is valid.
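
To make the predictor-corrector idea concrete, here is a minimal scalar sketch of one Euler-Maruyama step and one Stochastic Heun step for an SDE with additive noise, dx = f(x, t) dt + g(t) dW. It is not the code in this PR: the function names, the toy Ornstein-Uhlenbeck drift, and the choice to reuse g(t) dW unchanged in the corrector are all assumptions made for illustration.

```python
import math
import random

def euler_maruyama_step(f, g, x, t, h, dw):
    """One Euler-Maruyama step: a single drift evaluation."""
    return x + f(x, t) * h + g(t) * dw

def stochastic_heun_step(f, g, x, t, h, dw):
    """One Stochastic Heun step for additive noise: two drift evaluations.

    The same noise increment dw is used in the predictor and the corrector;
    only the drift is averaged (trapezoidal rule).
    """
    drift0 = f(x, t)
    x_pred = x + drift0 * h + g(t) * dw          # Euler-Maruyama predictor
    drift1 = f(x_pred, t + h)                    # drift at the predicted point
    return x + 0.5 * (drift0 + drift1) * h + g(t) * dw

def simulate(step, f, g, x0, t_end, nsteps, seed):
    """Integrate from t=0 to t_end with a fixed step and a seeded RNG."""
    rng = random.Random(seed)
    h = t_end / nsteps
    x, t = x0, 0.0
    for _ in range(nsteps):
        dw = rng.gauss(0.0, math.sqrt(h))        # Brownian increment ~ N(0, h)
        x = step(f, g, x, t, h, dw)
        t += h
    return x

# Toy Ornstein-Uhlenbeck process (hypothetical stand-in for the real model).
def ou_drift(x, t):
    return -2.0 * x

def ou_noise(t):
    return 0.5

x_em = simulate(euler_maruyama_step, ou_drift, ou_noise, 1.0, 1.0, 400, seed=0)
x_sh = simulate(stochastic_heun_step, ou_drift, ou_noise, 1.0, 1.0, 100, seed=0)
print("Euler-Maruyama, 400 steps:", x_em)
print("Stochastic Heun, 100 steps:", x_sh)
```

With the noise switched off, the Stochastic Heun step reduces to the deterministic Heun step, which is a convenient check when testing an implementation.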

Usage

# Recommended default for unconditional generation
python proteinfoundation/generate.py \
    --config_name inference_ucond_notri_stochastic_heun

# Or override solver on any config
python proteinfoundation/generate.py \
    --config_name inference_ucond_notri \
    --solver stochastic_heun --nsteps 100

# Reproduce the full benchmark
bash scripts/benchmark_solvers.sh 50 "100 200 300"

jsilter and others added 2 commits March 16, 2026 18:20
Engineering: SDPA attention (F.scaled_dot_product_attention),
BF16 autocast for forward passes. These are free-lunch speedups with
no quality impact.
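
For reference, the operation that SDPA computes is softmax(Q K^T / sqrt(d)) V. The plain-Python sketch below shows only that math on nested lists; it is not the fused kernel and not the code in this PR. In PyTorch the fused call is `torch.nn.functional.scaled_dot_product_attention(q, k, v)`, and BF16 autocast is typically enabled with `torch.autocast(device_type="cuda", dtype=torch.bfloat16)`; how this PR wires them into the model is not shown here.

```python
import math

def scaled_dot_product_attention(q, k, v):
    """Reference attention on nested lists.

    q: [Lq][d], k: [Lk][d], v: [Lk][dv]. Returns [Lq][dv].
    """
    d = len(q[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for qi in q:
        # Scaled dot product of this query with every key.
        scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
        # Numerically stable softmax over the keys.
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        weights = [w / z for w in weights]
        # Weighted average of the value vectors.
        out.append([sum(w * vj[c] for w, vj in zip(weights, v))
                    for c in range(len(v[0]))])
    return out

# Two identical keys receive equal weight, so the output is the mean of the values.
result = scaled_dot_product_attention([[1.0, 0.0]],
                                      [[1.0, 0.0], [1.0, 0.0]],
                                      [[1.0, 2.0], [3.0, 4.0]])
print(result)
```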

Higher-order solvers for generation sampling:
- Heun (ODE): 2nd-order trapezoidal rule, halves step count at equal quality
- Adaptive Heun (ODE): Heun-Euler embedded error control with automatic
  step sizing; allocates fewer NFEs to easy structures, more to hard ones
- Stochastic Heun (SDE): 2nd-order SDE solver with strong convergence
  order 1.0 (vs 0.5 for Euler-Maruyama); achieves 85% designability
  vs 69% baseline at 2.75x speedup
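
The two ODE solvers listed above can be sketched on a scalar problem as follows. This is an illustrative sketch, not the implementation in this PR: the function names, tolerance, step-size limits, and safety factor 0.9 are assumptions. The adaptive variant uses the difference between the Heun and Euler estimates from the same two evaluations as its error estimate, which is the usual Heun-Euler embedded pair.

```python
import math

def heun_step(f, x, t, h):
    """Return (heun_estimate, euler_estimate); costs two evaluations of f."""
    k1 = f(x, t)
    x_euler = x + h * k1                 # 1st-order predictor
    k2 = f(x_euler, t + h)
    x_heun = x + 0.5 * h * (k1 + k2)     # 2nd-order trapezoidal corrector
    return x_heun, x_euler

def integrate_heun(f, x0, t0, t1, nsteps):
    """Fixed-step Heun integration."""
    h = (t1 - t0) / nsteps
    x, t = x0, t0
    for _ in range(nsteps):
        x, _ = heun_step(f, x, t, h)
        t += h
    return x

def integrate_adaptive_heun(f, x0, t0, t1, tol=1e-4, h_init=0.1):
    """Heun-Euler embedded pair with a simple step-size controller.

    Returns (x_final, nfe), where nfe counts evaluations of f,
    including those spent on rejected steps.
    """
    x, t, h = x0, t0, h_init
    nfe = 0
    while t1 - t > 1e-12:
        h = min(h, t1 - t)               # do not step past the end time
        x_heun, x_euler = heun_step(f, x, t, h)
        nfe += 2
        err = abs(x_heun - x_euler)      # local error estimate, O(h^2)
        if err <= tol:
            x, t = x_heun, t + h         # accept the step
        # Rescale h; exponent 1/2 because the estimate scales as h^2.
        factor = 0.9 * math.sqrt(tol / err) if err > 0 else 2.0
        h *= min(2.0, max(0.2, factor))
    return x, nfe

# Toy problem dx/dt = -x with exact solution exp(-t).
def decay(x, t):
    return -x

x_fixed = integrate_heun(decay, 1.0, 0.0, 1.0, nsteps=100)
x_adapt, nfe_adapt = integrate_adaptive_heun(decay, 1.0, 0.0, 1.0)
print("exact:        ", math.exp(-1.0))
print("Heun-100:     ", x_fixed)
print("adaptive Heun:", x_adapt, "using", nfe_adapt, "NFEs")
```

Because the step size grows where the error estimate is small, the adaptive solver spends fewer evaluations on smooth stretches of a trajectory and more on stiff ones.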

Stochastic Heun-100 (200 NFEs) is the recommended default for
unconditional generation: better quality than Euler-400 in ~1/3 the time.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Evaluation tools:
- eval_trajectory.py: full designability eval (ProteinMPNN + ESMFold scRMSD)
- eval_trajectory_fast.py: fast structural metrics (seconds, no GPU needed)
- backbone_perplexity.py: ProteinMPNN unconditional perplexity metric
- evaluate.py: --input_dir support for standalone eval of generated PDBs
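
As a rough illustration of how per-sample scRMSD values turn into the summary numbers reported in this PR, the sketch below computes designability and median scRMSD. It is not taken from the eval scripts: the function name, the 2.0 Å designability cutoff (a common convention, not stated in this PR), and the sample values are all assumptions. The outlier-rate definition is not given in this PR, so it is left out.

```python
from statistics import median

def summarize_scrmsd(scrmsd_values, designable_cutoff=2.0):
    """Summarize per-sample scRMSD values (in angstroms).

    designable_cutoff is an assumed threshold: a sample counts as
    designable when its scRMSD is below it.
    """
    n = len(scrmsd_values)
    designable = sum(1 for r in scrmsd_values if r < designable_cutoff)
    return {
        "n": n,
        "designability": designable / n,
        "median_scrmsd": median(scrmsd_values),
    }

# Made-up values for illustration only; not results from this PR.
summary = summarize_scrmsd([0.6, 0.7, 0.9, 3.5])
print(summary)
```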

Solver configs for Heun (ODE), Stochastic Heun (SDE), and Adaptive Heun,
all inheriting from the base unconditional config.

Benchmark script (scripts/benchmark_solvers.sh) runs a head-to-head
comparison of Euler-400, Heun-100, and Stochastic-Heun-100 with both
fast and full eval. No infrastructure dependencies; uses generate.py
and eval scripts directly.

Stochastic Heun-100 benchmark results included: 85.1% designability
(vs 69.4% Euler-400), 0.695 Å median scRMSD, ~2.75x wall-clock speedup.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
jsilter marked this pull request as ready for review March 16, 2026 22:25