Document parallelization tuning and automatic sparse diagonalization #36

Open
kochjens wants to merge 10 commits into main from docs/multiprocessing-tuning
kochjens (Member) commented Jun 1, 2026

Documents the parallelization and diagonalization changes shipping in this release.

What changes

  • Parallel Processing guide: when num_cpus > 1 helps (the grid break-even), recommend_parallelization / num_cpus="auto" / AUTO_PARALLEL, calibrate_parallelization, the MULTIPROC_BLAS_THREADS cap, worker-pool reuse, the platform-determined fork/spawn start method, and the __main__ guard.
  • Diagonalization guide: new "Automatic sparse diagonalization" section (default behavior, the dim/evals gating, dense fallback, how to disable).
  • Settings table: MULTIPROC_BLAS_THREADS, AUTO_PARALLEL, PARALLEL_CALIBRATION_PATH, AUTO_SPARSE_DIAG.
  • Changelog entry (4.3.2).

Pairs with the corresponding scqubits code PRs (multiprocessing core, automatic sparse diagonalization, parallelization heuristic + calibration) and the demo_multiprocessing examples notebook.

- Add MULTIPROC_BLAS_THREADS to the user-accessible settings table.
- Extend the Parallel Processing guide with a "Limiting BLAS threads per
  worker" section (programmatic cap, threadpoolctl requirement for fork-based
  workers, no-op cases incl. Apple Accelerate, scipy-OpenBLAS caveat) and a
  "Worker-pool reuse" section.
- Add a changelog entry for MULTIPROC_BLAS_THREADS, pool reuse, and the
  per-grid-point bare-eigensystem shipping in ParameterSweep.
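
To illustrate what worker-pool reuse means in general, here is a minimal standard-library sketch. It is not the scqubits implementation: `get_pool` and `shutdown_pool` are hypothetical helper names, and the cache-by-worker-count policy is an assumption made for the example.

```python
import atexit
from concurrent.futures import ProcessPoolExecutor

# Module-level cache: one executor is kept alive between sweeps.
_pool = None
_pool_size = None


def get_pool(num_workers):
    """Return the cached executor, rebuilding it only if the worker count changed."""
    global _pool, _pool_size
    if _pool is None or _pool_size != num_workers:
        if _pool is not None:
            _pool.shutdown(wait=True)  # retire the old workers first
        _pool = ProcessPoolExecutor(max_workers=num_workers)
        _pool_size = num_workers
    return _pool


def shutdown_pool():
    """Release the cached workers, for example at interpreter exit."""
    global _pool, _pool_size
    if _pool is not None:
        _pool.shutdown(wait=True)
        _pool, _pool_size = None, None


# Avoid leaving worker processes behind when the session ends.
atexit.register(shutdown_pool)
```

With a cache like this, consecutive sweeps that request the same number of workers pay the process start-up cost once rather than once per sweep.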
@review-notebook-app


Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.



kochjens added 5 commits June 1, 2026 17:44
Mirror the expectation-setting from the rewritten demo_multiprocessing notebook.
Parallelization helps only when grid size x per-point cost exceeds the per-task
overhead, so small or cheap sweeps see no speedup (or even a slowdown), and that is
normal. Also note the two usual causes of confusing num_cpus comparisons (the sweep
is below break-even, or BLAS threads are not capped), and that sparse diagonalization
is often a bigger lever.
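
The break-even argument can be made concrete with a toy cost model. This is a deliberately simplified sketch, not the heuristic scqubits uses: the function names, the default overhead values, and the assumption of perfect load balancing are all illustrative.

```python
def estimated_times(n_points, per_point_s, num_cpus,
                    per_task_overhead_s=1e-3, pool_startup_s=0.5):
    """Return (serial, parallel) wall-time estimates in seconds.

    Assumed model: every grid point costs `per_point_s`; dispatching a point
    to a worker adds `per_task_overhead_s`; starting the pool costs
    `pool_startup_s` once; work divides evenly over `num_cpus` workers.
    """
    serial = n_points * per_point_s
    parallel = pool_startup_s + n_points * (per_point_s + per_task_overhead_s) / num_cpus
    return serial, parallel


def worth_parallelizing(n_points, per_point_s, num_cpus, **overheads):
    """True when the modeled parallel run beats the serial one."""
    serial, parallel = estimated_times(n_points, per_point_s, num_cpus, **overheads)
    return parallel < serial


# A small, cheap sweep stays below break-even; a large, expensive one does not.
cheap = worth_parallelizing(n_points=100, per_point_s=1e-4, num_cpus=4)
heavy = worth_parallelizing(n_points=10_000, per_point_s=0.05, num_cpus=4)
```

In this model the cheap sweep takes about 0.01 s serially but more than 0.5 s in parallel, which is the "no speedup, or a slowdown" case described above.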
…TART_METHOD

- Add a "Process start method (fork vs spawn)" section to the Parallel Processing
  guide: the platform defaults (fork on Linux, spawn on macOS/Windows), why fork is
  unsafe on macOS (both Intel and Apple Silicon), the __main__-guard requirement for
  plain scripts under spawn, the once-per-session (not per-sweep) cost thanks to pool
  reuse, the override, and the orphaned-worker note.
- Add MULTIPROC_START_METHOD to the user-accessible settings table.
- Add a 4.3.2 changelog entry.
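
For readers unfamiliar with the `__main__`-guard requirement, the following standard-library sketch shows the pattern for a plain script. It uses `multiprocessing` directly rather than any scqubits API; `platform_start_method` and `run_sweep` are hypothetical names, and the fork/spawn choice simply mirrors the platform defaults listed above.

```python
import math
import multiprocessing
import sys


def platform_start_method(platform=sys.platform):
    """Mirror the stated platform defaults: fork on Linux, spawn elsewhere."""
    return "fork" if platform.startswith("linux") else "spawn"


def run_sweep(values, num_workers=2):
    """Map a picklable function over the values in a worker pool."""
    ctx = multiprocessing.get_context(platform_start_method())
    with ctx.Pool(processes=num_workers) as pool:
        return pool.map(math.sqrt, values)


# Under spawn, each worker re-imports this file. Without the guard, the
# import would try to start another pool and Python raises a RuntimeError.
if __name__ == "__main__":
    print(run_sweep([1.0, 4.0, 9.0]))
```

Interactive sessions and notebooks do not need the guard in the same way, because there is no script file whose top-level code the workers would re-run.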
…-determined

The start method is no longer a public setting: fork on Linux and spawn on
macOS/Windows are the only safe choices per platform. Remove the settings
row, rewrite the parallel-guide section to present the start method as
platform-determined rather than configurable, delete the override code cell,
and fix the stale "macOS forks" bullet in the BLAS-cap section.
…elization

Rewrite the parallel guide around the shipped tuning API: a new "Letting scqubits
choose the settings" section covering recommend_parallelization, the num_cpus="auto"
sentinel, settings.AUTO_PARALLEL, and the one-time calibrate_parallelization(); drop
the dead tools/autotune_multiprocessing.py reference. Add AUTO_PARALLEL and
PARALLEL_CALIBRATION_PATH to the settings table, and changelog entries.
The three multiprocessing/diagonalization PRs release together, so the docs cover
the sparse-diag default alongside the parallelization work already on this branch.

- custom_diagonalization notebook: new "Automatic sparse diagonalization" section
  explaining the default-on behavior (esys/evals_method = None -> sparse eigsh for
  large spectra with few eigenvalues), the dim/evals gating, the dense fallback on
  raise or residual-check failure, and how to disable.
- settings guide: AUTO_SPARSE_DIAG row added to the settings table.
- changelog (4.3.2): automatic sparse diagonalization entry.
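
The gating-plus-fallback idea can be sketched with NumPy and SciPy alone. This is an illustration of the general technique, not the scqubits code path: `lowest_evals`, the `min_dim` and `max_fraction` thresholds, and the residual tolerance are assumptions chosen for the example.

```python
import numpy as np
from scipy.sparse.linalg import eigsh


def lowest_evals(hamiltonian, evals_count, min_dim=500, max_fraction=0.1, tol=1e-8):
    """Return the `evals_count` lowest eigenvalues of a Hermitian matrix.

    Gating (assumed thresholds): try sparse eigsh only when the matrix is
    large and few eigenvalues are requested. Fall back to dense eigh if
    eigsh raises or the eigenpair residuals are too large.
    """
    dim = hamiltonian.shape[0]
    if dim >= min_dim and evals_count <= max_fraction * dim:
        try:
            evals, evecs = eigsh(hamiltonian, k=evals_count, which="SA")
            order = np.argsort(evals)
            evals, evecs = evals[order], evecs[:, order]
            # Residual check: ||H v - lambda v|| for each returned eigenpair.
            residuals = np.linalg.norm(hamiltonian @ evecs - evecs * evals, axis=0)
            if np.all(residuals <= tol * max(1.0, np.abs(evals).max())):
                return evals
        except Exception:
            pass  # any solver failure falls through to the dense path
    # Dense fallback: full diagonalization, then keep the lowest eigenvalues.
    dense = hamiltonian.toarray() if hasattr(hamiltonian, "toarray") else np.asarray(hamiltonian)
    return np.linalg.eigh(dense)[0][:evals_count]
```

Because the dense path is always available as a fallback, callers get the same eigenvalues either way; only the runtime differs.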
kochjens changed the title from "docs(settings): document MULTIPROC_BLAS_THREADS and worker-pool reuse" to "Document parallelization tuning and automatic sparse diagonalization" on Jun 4, 2026
kochjens added 4 commits June 5, 2026 23:18
MULTIPROC_BLAS_THREADS now defaults to "auto" (cap each worker to
cores // num_cpus), so parallel sweeps no longer oversubscribe the cores out
of the box. Update the docs to match:

- parallel.ipynb no longer opens with the manual "export thread-count env
  vars before importing numpy" workaround; it states that capping is automatic
  and documents the "auto"/int/None values. The break-even diagnostic no longer
  lists oversubscription as a default failure mode.
- guide-settings table and changelog: default "auto", and threadpoolctl is now
  a runtime dependency rather than optional.
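
The `cores // num_cpus` arithmetic behind the "auto" cap can be sketched as follows. The helper names are hypothetical and the floor of one thread is an assumption; `threadpool_limits` is the threadpoolctl context manager, used here only if the package is installed.

```python
import os


def blas_threads_per_worker(total_cores, num_cpus):
    """Integer-divide the cores among the workers, never going below one thread."""
    if num_cpus < 1:
        raise ValueError("num_cpus must be at least 1")
    return max(1, total_cores // num_cpus)


def run_capped(task, limit):
    """Run task() with BLAS thread pools limited to `limit` threads.

    If threadpoolctl is not importable, the task runs without a cap.
    """
    try:
        from threadpoolctl import threadpool_limits
    except ImportError:
        return task()
    with threadpool_limits(limits=limit, user_api="blas"):
        return task()


# Example: the cap each of 4 workers would get on the current machine.
cap = blas_threads_per_worker(os.cpu_count() or 1, num_cpus=4)
```

On an 8-core machine with 4 workers this gives 2 BLAS threads per worker, so the workers together use all 8 cores instead of competing for them.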
… model, calibration guidance

- New top section 'Updating scqubits -- what changes for you?': reassures that
  existing code is unchanged, explains the automatic BLAS-thread cap, and gives a
  do-this table + checklist for getting more speed (auto / calibrate / AUTO_PARALLEL).
- Rewrote 'Letting scqubits choose the settings' with a 30-second mental model
  (the switch, num_cpus, versus the map, the calibration data), an explicit
  num_cpus='auto' vs AUTO_PARALLEL precedence block, and expanded calibration
  guidance: re-running overwrites the file, and calibration should run on an idle,
  plugged-in machine (battery saving or CPU throttling skews the measurements).