67 changes: 67 additions & 0 deletions docs/source/changelog.rst
@@ -12,6 +12,73 @@ Version 4.3.2

**ADDITIONS**

- New function `scqubits.recommend_parallelization(...)`: a
workload-aware heuristic that picks `num_cpus` and a per-worker
BLAS-thread cap from the Hilbert-space dimension, grid size,
eigenvalue count, and sparse-vs-dense regime. It applies the
choice live (no kernel restart) and starts no worker processes,
so it is safe to call from Jupyter and from plain scripts.
Sweep and spectrum methods accept `num_cpus="auto"` to tune
themselves before running; setting
`scqubits.settings.AUTO_PARALLEL = True` (default `False`) applies
the same tuning whenever `num_cpus` is not specified. See
the :ref:`settings guide <guide-settings>`.

- New function `scqubits.calibrate_parallelization()`: a one-time
measurement that times a short battery of sweeps in isolated
subprocesses and records this machine's per-task overhead,
pool-startup cost, and per-point diagonalization cost to
`~/.scqubits/parallel_calibration.json` (override with
`scqubits.settings.PARALLEL_CALIBRATION_PATH`). When present, the
recommendation uses this measured break-even instead of the
built-in defaults.

- Parallel sweeps now use the ``spawn`` process start method on
macOS and Windows, and ``fork`` on Linux. ``fork`` is unsafe on
macOS: Apple's Accelerate/GCD and the Objective-C runtime are
not fork-safe, so forking a worker pool after the numerics have
started threads can crash or hang. (CPython itself has defaulted
to ``spawn`` on macOS since 3.8; this affects both Intel and
Apple Silicon.) With ``spawn``, a plain script that uses
``num_cpus > 1`` must guard its entry point with
``if __name__ == "__main__":`` (Jupyter/IPython are unaffected;
a one-time reminder is emitted otherwise). The worker pool is
cached and reused, so the ``spawn`` startup cost is paid once
per session, not per sweep.

- New setting `scqubits.settings.MULTIPROC_BLAS_THREADS`
(`"auto"`, a positive int, or `None`; default `"auto"`): caps the
number of BLAS/OpenMP threads per worker process during parallel
sweeps (`NUM_CPUS` > 1) to avoid core oversubscription. The
default `"auto"` caps each worker to `cores // num_cpus`, so
parallel sweeps no longer oversubscribe the cores out of the box;
a positive int sets a fixed cap, and `None` leaves threading
untouched. The cap is applied only while the worker pool is
created and the parent environment is restored afterwards
(serial work is unaffected). It reaches spawn-based workers
(macOS, Windows) via the thread-count environment variables; for
fork-based workers (Linux) it uses `threadpoolctl` (now a
scqubits dependency). A one-time warning is emitted when the cap
cannot take effect. See the :ref:`settings guide <guide-settings>`.

- `ParameterSweep` now reuses a single worker pool across the
per-subsystem and dressed sweeps within one run (cached in
`scqubits.settings.POOL` and shut down automatically at
interpreter exit), instead of starting a fresh pool for each,
and ships only the per-grid-point bare eigensystem to each
worker, reducing inter-process serialization on large sweeps.

- Automatic sparse diagonalization: when `esys_method` /
`evals_method` are left at their default (`None`), `scqubits`
now uses sparse `scipy` `eigsh` instead of dense diagonalization
when only a small fraction of the spectrum of a large
Hamiltonian is requested -- the dressed-spectrum regime of
composite `HilbertSpace` objects -- where it is dramatically
faster and avoids forming the full dense matrix (which may not
even fit in memory). Controlled by
`scqubits.settings.AUTO_SPARSE_DIAG` (default `True`; thresholds
`SPARSE_DIAG_MIN_DIM` and `SPARSE_DIAG_MAX_EVALS_FRAC`); it falls
back to the dense solver if the sparse solver raises an exception
or its result fails a residual check. Set
`AUTO_SPARSE_DIAG = False` to always use the dense path.

- Named constructors for `Circuit` from a YAML description:
`Circuit.from_yaml_file(path, ...)` (path on disk) and
`Circuit.from_yaml_string(yaml_text, ...)` (inline YAML). These
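The entry-point guard required by the ``spawn`` changelog entry can be illustrated with the standard library alone. This is a minimal sketch, not scqubits code: `start_method_for`, `square`, and `main` are hypothetical names, and the platform rule simply restates the entry above (``fork`` on Linux, ``spawn`` on macOS and Windows).

```python
import multiprocessing as mp
import sys


def start_method_for(platform: str) -> str:
    """Hypothetical helper: 'fork' on Linux, 'spawn' elsewhere (macOS, Windows)."""
    return "fork" if platform.startswith("linux") else "spawn"


def square(x: int) -> int:
    # Defined at module top level so that spawn-based workers, which
    # re-import this module, can find the function by name.
    return x * x


def main() -> None:
    ctx = mp.get_context(start_method_for(sys.platform))
    with ctx.Pool(processes=2) as pool:
        print(pool.map(square, range(5)))


if __name__ == "__main__":
    # With spawn, each worker re-imports this file; the guard keeps the
    # workers from re-running main() and trying to start pools of their own.
    main()
```

In a notebook the guard is unnecessary, which matches the changelog's remark that Jupyter/IPython are unaffected.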
23 changes: 23 additions & 0 deletions docs/source/guide/settings/guide-settings.rst
@@ -34,6 +34,29 @@ scqubits has a few internal parameters that can be changed by the user:
+------------------------------+------------------------------+-------------------------------------------------------------------+
| ``NUM_CPUS``                 | int                          | Number of cores to be used in parallelization (default: 1)        |
+------------------------------+------------------------------+-------------------------------------------------------------------+
| ``MULTIPROC_BLAS_THREADS``   | "auto", int, or None         | Cap BLAS/OpenMP threads per worker during parallel sweeps         |
|                              | (default: "auto")            | (``NUM_CPUS`` > 1). Default "auto" caps each worker to            |
|                              |                              | cores // num_cpus so workers never oversubscribe; an int          |
|                              |                              | sets a fixed cap; None leaves threading untouched. Uses           |
|                              |                              | ``threadpoolctl`` (a dependency) for fork-based (Linux)           |
|                              |                              | workers; no effect when NumPy's BLAS exposes no thread            |
|                              |                              | control (e.g. Apple Accelerate).                                  |
+------------------------------+------------------------------+-------------------------------------------------------------------+
| ``AUTO_PARALLEL``            | True / False (default: False)| When True, sweeps called without an explicit ``num_cpus`` use the |
|                              |                              | parallelization heuristic (``recommend_parallelization``) to pick |
|                              |                              | ``num_cpus`` and a BLAS-thread cap. Per-call opt-in is also       |
|                              |                              | available via ``num_cpus="auto"``.                                |
+------------------------------+------------------------------+-------------------------------------------------------------------+
| ``PARALLEL_CALIBRATION_PATH``| str or None (default: None)  | Location of the one-time machine calibration written by           |
|                              |                              | ``calibrate_parallelization``. None uses                          |
|                              |                              | ``~/.scqubits/parallel_calibration.json``.                        |
+------------------------------+------------------------------+-------------------------------------------------------------------+
| ``AUTO_SPARSE_DIAG``         | True / False (default: True) | When True, default diagonalization (esys_method/evals_method =    |
|                              |                              | None) uses sparse scipy eigsh for large Hamiltonians where only a |
|                              |                              | few eigenvalues are needed, with automatic dense fallback         |
|                              |                              | (thresholds SPARSE_DIAG_MIN_DIM, SPARSE_DIAG_MAX_EVALS_FRAC).     |
|                              |                              | See the diagonalization guide.                                    |
+------------------------------+------------------------------+-------------------------------------------------------------------+
| ``FUZZY_SLICING``            | True / False (default: False)| Whether to enable approximate value-based slicing                 |
+------------------------------+------------------------------+-------------------------------------------------------------------+
| ``FUZZY_WARNING``            | True / False (default: True) | Whether to warn user about use of approximate values in slicing   |
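The ``MULTIPROC_BLAS_THREADS`` row can be made concrete with a small stand-alone sketch. It is not the scqubits implementation: `blas_thread_cap` and `capped_thread_env` are hypothetical names, the floor of one thread per worker is an assumption, and the environment variables listed are the ones commonly read by OpenMP, OpenBLAS, and MKL rather than a list taken from the PR.

```python
import contextlib
import os

# Assumption: thread-count variables commonly honored by OpenMP, OpenBLAS, MKL.
THREAD_ENV_VARS = ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS")


def blas_thread_cap(setting, num_cpus, cores=None):
    """Resolve an "auto" / int / None setting to a per-worker cap (or None)."""
    if setting is None:
        return None  # leave threading untouched
    if setting == "auto":
        cores = cores or os.cpu_count() or 1
        # cores // num_cpus as in the table; the floor of 1 is an assumption.
        return max(1, cores // num_cpus)
    if isinstance(setting, int) and not isinstance(setting, bool) and setting > 0:
        return setting  # fixed cap
    raise ValueError(f"unsupported setting: {setting!r}")


@contextlib.contextmanager
def capped_thread_env(cap):
    """Export the cap while workers are started, then restore the parent env."""
    if cap is None:
        yield
        return
    saved = {name: os.environ.get(name) for name in THREAD_ENV_VARS}
    try:
        for name in THREAD_ENV_VARS:
            os.environ[name] = str(cap)
        yield
    finally:
        for name, old in saved.items():
            if old is None:
                os.environ.pop(name, None)
            else:
                os.environ[name] = old
```

A pool created inside `with capped_thread_env(blas_thread_cap("auto", num_cpus=4)):` would start spawn-based workers with the capped environment, while the parent process gets its original variables back on exit.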
18 changes: 18 additions & 0 deletions docs/source/guide/settings/ipynb/custom_diagonalization.ipynb
@@ -51,6 +51,24 @@
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Automatic sparse diagonalization\n",
"\n",
"When `esys_method` and `evals_method` are left at their default (`None`), `scqubits` does not always use the same dense solver. For a **large** Hamiltonian of which only a **small fraction** of the spectrum is requested — the typical situation for the dressed spectrum of a composite `HilbertSpace` — it automatically switches to sparse `scipy` `eigsh`, which is dramatically faster than dense diagonalization in this regime (and avoids forming the full dense matrix, which may not even fit in memory).\n",
"\n",
"This is controlled by `scqubits.settings.AUTO_SPARSE_DIAG` (default `True`). Sparse diagonalization is selected only when\n",
"\n",
"- the Hilbert-space dimension is at least `settings.SPARSE_DIAG_MIN_DIM` (default `1000`), **and**\n",
"- the number of requested eigenvalues is at most `settings.SPARSE_DIAG_MAX_EVALS_FRAC` times the dimension (default `0.1`).\n",
"\n",
"Otherwise — and whenever an explicit `esys_method`/`evals_method` is set — the behavior is unchanged. If the sparse solver raises, or its result fails a cheap residual check (a safeguard against the rare case where `eigsh` returns an inaccurate subspace without raising), `scqubits` automatically falls back to the dense solver.\n",
"\n",
"To disable automatic sparse diagonalization and always use the dense path, set `scqubits.settings.AUTO_SPARSE_DIAG = False`."
]
},
{
"cell_type": "markdown",
"metadata": {},
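The selection rule stated in the notebook cell (dimension at least `SPARSE_DIAG_MIN_DIM`, requested eigenvalues at most `SPARSE_DIAG_MAX_EVALS_FRAC` times the dimension, no explicit method set) can be written as a single predicate. This is a sketch of the rule as documented, not the code in the PR; the function name `use_sparse_solver` and its keyword arguments are hypothetical, and the defaults `1000` and `0.1` are the ones quoted in the cell.

```python
def use_sparse_solver(
    dim,
    evals_count,
    *,
    auto_sparse=True,       # stands in for settings.AUTO_SPARSE_DIAG
    min_dim=1000,           # stands in for settings.SPARSE_DIAG_MIN_DIM
    max_evals_frac=0.1,     # stands in for settings.SPARSE_DIAG_MAX_EVALS_FRAC
    explicit_method=None,   # an explicit esys_method/evals_method, if any
):
    """Return True when the documented rule would select the sparse path."""
    if not auto_sparse or explicit_method is not None:
        return False  # behavior unchanged: dense or user-chosen method
    return dim >= min_dim and evals_count <= max_evals_frac * dim


# A composite system of dimension 5000 with 10 requested levels qualifies;
# a single qubit of dimension 500 does not.
print(use_sparse_solver(5000, 10), use_sparse_solver(500, 10))
```

The dense fallback on a solver exception or a failed residual check is a separate step that this predicate does not model.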