2.75x faster sampling via higher-order solvers and engineering optimizations by jsilter · Pull Request #8 · NVIDIA-Digital-Bio/la-proteina

jsilter · 2026-03-16T22:25:02Z

Summary

- Tier 1 engineering: SDPA attention and BF16 autocast; free speedups with no quality change
- Higher-order solvers: Heun (ODE), Stochastic Heun (SDE), and Adaptive Heun (ODE with automatic step sizing), selectable via --solver
- Evaluation pipeline: fast structural eval (seconds) and full designability eval (ProteinMPNN + ESMFold scRMSD)
- Benchmark script and results for reproducible comparison
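A minimal sketch of the two engineering changes, assuming PyTorch 2.x. It checks that F.scaled_dot_product_attention matches a hand-written softmax(QKᵀ/√d)V, then runs the fused call under torch.autocast with bfloat16. The tensor shapes and the helper names (manual_attention, sdpa_attention) are hypothetical, not La-Proteina's actual modules.

```python
import math

import torch
import torch.nn.functional as F


def manual_attention(q, k, v):
    """Reference attention: softmax(q k^T / sqrt(d)) v, materializing the full score matrix."""
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.shape[-1])
    return torch.softmax(scores, dim=-1) @ v


def sdpa_attention(q, k, v):
    """Same computation through PyTorch's fused SDPA kernel (default scale is 1/sqrt(d))."""
    return F.scaled_dot_product_attention(q, k, v)


torch.manual_seed(0)
# Hypothetical shapes: (batch, heads, residues, head_dim).
q, k, v = (torch.randn(2, 4, 128, 32) for _ in range(3))

ref = manual_attention(q, k, v)
fused = sdpa_attention(q, k, v)

# Inside autocast, ops on PyTorch's autocast lists run in bfloat16; whether SDPA itself is
# cast depends on the PyTorch version and device. "cpu" keeps this runnable without a GPU.
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    fused_bf16 = sdpa_attention(q, k, v)

print("fp32 max |diff|:", (ref - fused).abs().max().item())
print("autocast output dtype:", fused_bf16.dtype)
```

On a GPU the same pattern would use device_type="cuda"; the fp32 comparison is the part that confirms the swap is numerically a drop-in replacement.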

Key result

Stochastic Heun with 100 steps (200 NFEs) vs the Euler-400 baseline:

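| Metric                  | Euler-400 | Stoch-Heun-100 |
|-------------------------|-----------|----------------|
| Designability           | 69.4%     | 85.1%          |
| Median scRMSD           | 0.836 Å   | 0.695 Å        |
| Outlier rate            | 19.8%     | 1.3%           |
| Wall time (150 samples) | ~1314 s   | ~478 s (2.75x) |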

Quality improves because Stochastic Heun has strong convergence order 1.0 (vs 0.5 for Euler-Maruyama). The noise is additive (the diffusion coefficient depends only on t, not on the state), so no Itô correction is needed and applying the trapezoidal rule to the drift is valid.
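The argument above can be made concrete with a minimal scalar sketch of one common form of the stochastic Heun scheme for dx = f(x, t) dt + g(t) dW, assuming additive noise as described. The same Brownian increment ΔW drives both the Euler-Maruyama predictor and the trapezoidal corrector; with two drift evaluations per step, 100 steps cost 200 NFEs, matching the table. Because g does not depend on x, the Itô-Stratonovich correction term (½ g ∂g/∂x) vanishes. The function name and the toy Ornstein-Uhlenbeck drift are illustrative assumptions; in the real sampler the state is a tensor and f wraps the network.

```python
import math
import random
from typing import Callable


def stochastic_heun(
    f: Callable[[float, float], float],  # drift f(x, t); in a real sampler this wraps the network
    g: Callable[[float], float],         # diffusion g(t): additive noise, no dependence on x
    x0: float,
    t0: float,
    t1: float,
    n_steps: int,
    rng: random.Random,
) -> float:
    """Integrate dx = f(x, t) dt + g(t) dW with a stochastic Heun predictor-corrector (2 drift evals/step)."""
    h = (t1 - t0) / n_steps
    x = x0
    for i in range(n_steps):
        t = t0 + i * h
        dw = rng.gauss(0.0, math.sqrt(h))      # one Brownian increment, shared by predictor and corrector
        f0, g0 = f(x, t), g(t)                 # drift evaluation 1
        x_pred = x + f0 * h + g0 * dw          # Euler-Maruyama predictor
        f1, g1 = f(x_pred, t + h), g(t + h)    # drift evaluation 2
        # Trapezoidal average of drift and diffusion; valid without an Ito correction because g ignores x.
        x = x + 0.5 * (f0 + f1) * h + 0.5 * (g0 + g1) * dw
    return x


# Toy Ornstein-Uhlenbeck process dx = -x dt + 0.5 dW on [0, 1]: 100 steps = 200 drift evaluations.
x_T = stochastic_heun(lambda x, t: -x, lambda t: 0.5, 1.0, 0.0, 1.0, 100, random.Random(0))
print(x_T)
```

Setting g to zero reduces the scheme to the deterministic Heun (ODE) solver, which is a quick sanity check on any implementation.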

Usage

```bash
# Recommended default for unconditional generation
python proteinfoundation/generate.py \
    --config_name inference_ucond_notri_stochastic_heun

# Or override the solver on any config
python proteinfoundation/generate.py \
    --config_name inference_ucond_notri \
    --solver stochastic_heun --nsteps 100

# Reproduce the full benchmark
bash scripts/benchmark_solvers.sh 50 "100 200 300"
```
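To make the step/NFE bookkeeping behind these commands explicit, here is a hypothetical sketch of how a --solver/--nsteps pair could map to a network-evaluation budget. Only stochastic_heun appears as a flag value above; the other solver names, the defaults, the SOLVER_NFE_PER_STEP table, and the nfe_budget helper are assumptions for illustration, not generate.py's actual argument handling.

```python
import argparse

# Hypothetical table: drift (network) evaluations per fixed step for each solver.
# Euler uses one evaluation per step; Heun-type schemes add a corrector evaluation.
SOLVER_NFE_PER_STEP = {
    "euler": 1,
    "heun": 2,
    "stochastic_heun": 2,
}


def nfe_budget(solver: str, nsteps: int) -> int:
    """Total network evaluations for a fixed-step run (an adaptive solver decides this at runtime)."""
    return SOLVER_NFE_PER_STEP[solver] * nsteps


def parse(argv):
    parser = argparse.ArgumentParser(description="Hypothetical solver flags")
    # Defaults are placeholders, not the repository's actual defaults.
    parser.add_argument("--solver", choices=sorted(SOLVER_NFE_PER_STEP), default="euler")
    parser.add_argument("--nsteps", type=int, default=400)
    return parser.parse_args(argv)


args = parse(["--solver", "stochastic_heun", "--nsteps", "100"])
print(args.solver, args.nsteps, nfe_budget(args.solver, args.nsteps))  # stochastic_heun 100 200
```

Under this accounting, Euler-400 and Stoch-Heun-100 cost 400 and 200 NFEs respectively, the 2:1 ratio behind the comparison in the table.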

Engineering: SDPA attention (F.scaled_dot_product_attention) and BF16 autocast for forward passes. These are free speedups with no quality impact.

Higher-order solvers for generation sampling:
- Heun (ODE): 2nd-order trapezoidal rule; halves the step count at equal quality
- Adaptive Heun (ODE): Heun-Euler embedded error control with automatic step sizing; allocates fewer NFEs to easy structures and more to hard ones
- Stochastic Heun (SDE): 2nd-order SDE solver with strong convergence order 1.0 (vs 0.5 for Euler-Maruyama); achieves 85% designability vs 69% for the baseline at a 2.75x speedup

Stochastic Heun-100 (200 NFEs) is the recommended default for unconditional generation: better quality than Euler-400 in roughly 1/3 of the time.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
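A minimal scalar sketch of Heun-Euler embedded error control, one standard way to realize the Adaptive Heun idea above. Each attempted step evaluates the velocity twice; the gap between the Heun (2nd-order) and Euler (1st-order) updates estimates the local error, and the step size grows or shrinks to keep that estimate within rtol/atol. The tolerances, safety factor, and step-size limits are assumed defaults, not the values in the PR's configs, and a real sampler integrates tensors rather than floats.

```python
import math
from typing import Callable, Tuple


def adaptive_heun(
    f: Callable[[float, float], float],  # velocity field dx/dt = f(x, t)
    x0: float,
    t0: float,
    t1: float,
    h0: float = 0.1,
    rtol: float = 1e-3,
    atol: float = 1e-6,
    safety: float = 0.9,
    fac_min: float = 0.2,
    fac_max: float = 5.0,
) -> Tuple[float, int]:
    """Integrate from t0 to t1 with a Heun-Euler embedded pair; returns (x(t1), NFEs used)."""
    x, t, h, nfe = x0, t0, h0, 0
    while t < t1:
        remaining = t1 - t
        last = h >= remaining
        if last:
            h = remaining                     # clip the final step to land exactly on t1
        k1 = f(x, t)
        k2 = f(x + h * k1, t + h)
        nfe += 2                              # two evaluations per attempted step, accepted or not
        x_heun = x + 0.5 * h * (k1 + k2)      # 2nd-order solution (the one we keep)
        err = abs(0.5 * h * (k2 - k1))        # |x_heun - x_euler|, with x_euler = x + h * k1
        scale = atol + rtol * max(abs(x), abs(x_heun))
        ratio = err / scale
        if ratio <= 1.0:                      # accept the step
            x = x_heun
            t = t1 if last else t + h
        # The error estimate is O(h^2), so rescale h by ratio^(-1/2), clipped to [fac_min, fac_max].
        factor = fac_max if ratio == 0.0 else safety * ratio ** -0.5
        h *= min(fac_max, max(fac_min, factor))
    return x, nfe


# "Easy" constant field vs a rapidly oscillating "hard" one: the hard case spends far more NFEs.
easy_x, easy_nfe = adaptive_heun(lambda x, t: 1.0, 0.0, 0.0, 1.0)
hard_x, hard_nfe = adaptive_heun(lambda x, t: math.cos(20.0 * t), 0.0, 0.0, 1.0)
decay_x, decay_nfe = adaptive_heun(lambda x, t: -x, 1.0, 0.0, 1.0)
print(easy_nfe, hard_nfe, decay_nfe)
```

Keeping the Heun solution while controlling the Euler-based error estimate (local extrapolation) is the usual design choice for this pair: the estimate is cheap because both updates share k1 and k2.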

Evaluation tools:
- eval_trajectory.py: full designability eval (ProteinMPNN + ESMFold scRMSD)
- eval_trajectory_fast.py: fast structural metrics (seconds, no GPU needed)
- backbone_perplexity.py: ProteinMPNN unconditional perplexity metric
- evaluate.py: --input_dir support for standalone eval of generated PDBs

Solver configs for Heun (ODE), Stochastic Heun (SDE), and Adaptive Heun, all inheriting from the base unconditional config.

Benchmark script (scripts/benchmark_solvers.sh) runs a head-to-head comparison of Euler-400, Heun-100, and Stochastic-Heun-100 with both fast and full eval. No infrastructure dependencies; it uses generate.py and the eval scripts directly.

Stochastic Heun-100 benchmark results included: 85.1% designability (vs 69.4% for Euler-400), 0.695 Å median scRMSD, ~2.75x wall-clock speedup.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
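For readers reproducing the table, a minimal sketch of turning per-sample scRMSD values into the three summary numbers. It assumes two common conventions that the PR does not spell out: each backbone's scRMSD is the minimum over its ProteinMPNN sequences refolded by ESMFold, and a sample counts as designable when that value is below 2 Å. The outlier cutoff is left as a required, hypothetical parameter because the PR does not define its outlier rate; the function names are illustrative, not those of eval_trajectory.py.

```python
from dataclasses import dataclass
from statistics import median
from typing import List, Sequence


@dataclass
class DesignabilitySummary:
    designability: float  # fraction of samples with scRMSD < designable_below
    median_scrmsd: float  # in Å
    outlier_rate: float   # fraction of samples with scRMSD > outlier_above


def best_scrmsd_per_sample(per_sequence: Sequence[Sequence[float]]) -> List[float]:
    """Assumed convention: keep the lowest scRMSD among a backbone's designed sequences."""
    return [min(rmsds) for rmsds in per_sequence]


def summarize(
    scrmsd: Sequence[float],
    *,
    outlier_above: float,          # hypothetical cutoff; the PR does not define it
    designable_below: float = 2.0, # assumed common designability threshold, in Å
) -> DesignabilitySummary:
    n = len(scrmsd)
    return DesignabilitySummary(
        designability=sum(r < designable_below for r in scrmsd) / n,
        median_scrmsd=median(scrmsd),
        outlier_rate=sum(r > outlier_above for r in scrmsd) / n,
    )


# Toy values (Å) for five backbones, each with a few refolded sequences.
per_sequence = [[1.2, 0.5, 3.0], [0.8, 0.9], [2.4, 1.9], [2.5, 4.0], [6.0, 7.5]]
summary = summarize(best_scrmsd_per_sample(per_sequence), outlier_above=5.0)  # cutoff chosen only for illustration
print(summary)  # DesignabilitySummary(designability=0.6, median_scrmsd=1.9, outlier_rate=0.2)
```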

jsilter and others added 2 commits March 16, 2026 18:20

jsilter marked this pull request as ready for review March 16, 2026 22:25

jsilter wants to merge 2 commits into NVIDIA-Digital-Bio:main from jsilter:JS_efficiency_v01
