
feat(modules): Swin-style PatchMerging with register-row support #122

Open
Dafidofff wants to merge 75 commits into main from feat/patch-merging

@Dafidofff Dafidofff commented May 25, 2026

Collaborator

Summary

Adds PatchMerging, ViT5HierarchicalClassificationNet, and the full v6_hierarchical example suite to the repo, along with docs-tracker updates.


Changes

nvsubquadratic/modules/patch_merging.py — NEW

Swin-style 2×2 spatial patch merging for hierarchical ViT-5/Hyena networks.

  • Pure-spatial layout: [B, H·W, C] → [B, (H/2)·(W/2), out_dim]
  • Register-row layout — first grid_w tokens are a register row; register tokens pass through a dedicated reg_proj Linear (independent of the patch path) and are repacked to width grid_w // 2; padding slots come from a non-persistent zero buffer
  • Post-concat norm (configurable via LazyConfig) + bias-free reduction Linear, trunc_normal_(std=0.02) init
  • flop_count() helper
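
The 2×2 merge in the pure-spatial layout can be sketched in dependency-free Python. This is only an illustration under stated assumptions: tokens are taken to be row-major, the neighbour order (top-left, bottom-left, top-right, bottom-right) is borrowed from the Swin reference code rather than confirmed by this PR, and `merge_indices` / `merge_tokens` are hypothetical helper names. The post-concat norm and the reduction Linear are left out.

```python
def merge_indices(grid_h: int, grid_w: int) -> list[tuple[int, int, int, int]]:
    """For each merged token, return the flat indices of its 2x2 source patch.

    Assumes row-major tokens: flat index = row * grid_w + col.
    """
    if grid_h % 2 or grid_w % 2:
        raise ValueError(f"grid must be even to merge 2x2, got {grid_h}x{grid_w}")
    groups = []
    for r in range(0, grid_h, 2):
        for c in range(0, grid_w, 2):
            tl = r * grid_w + c
            # Assumed order (TL, BL, TR, BR), as in the Swin reference code.
            groups.append((tl, tl + grid_w, tl + 1, tl + grid_w + 1))
    return groups


def merge_tokens(tokens: list[list[float]], grid_h: int, grid_w: int) -> list[list[float]]:
    """Concatenate each 2x2 neighbourhood along channels: [H*W, C] -> [(H/2)*(W/2), 4C]."""
    if len(tokens) != grid_h * grid_w:
        raise ValueError("token count does not match grid_h * grid_w")
    return [[v for i in group for v in tokens[i]] for group in merge_indices(grid_h, grid_w)]


groups = merge_indices(4, 4)
print(len(groups))  # 4
print(groups[0])    # (0, 4, 1, 5)
```

In the module itself, the 4C-wide result would then go through the norm and the bias-free reduction Linear to reach out_dim.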

tests/modules/test_patch_merging.py — NEW

7 tests: output shapes (parametrised), zero-pad correctness, independent reg/patch paths, error guards (odd grid, over-large num_registers), FLOP count sanity.
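
The register-row bookkeeping that these tests exercise can be sketched as follows. The helper names are hypothetical, and two details are assumptions not confirmed by the PR text: that the over-large guard fires when num_registers exceeds grid_w // 2, and that the zero padding is placed after the registers in the halved row.

```python
def register_row_layout(grid_h: int, grid_w: int, num_registers: int) -> dict[str, int]:
    """Token bookkeeping for one merge step in the register-row layout."""
    if grid_h % 2 or grid_w % 2:
        raise ValueError(f"grid must be even to merge 2x2, got {grid_h}x{grid_w}")
    new_h, new_w = grid_h // 2, grid_w // 2
    # Assumed guard: the registers must still fit in the halved row.
    if num_registers > new_w:
        raise ValueError(f"num_registers={num_registers} exceeds halved width {new_w}")
    return {
        "in_tokens": grid_w + grid_h * grid_w,
        "out_tokens": new_w + new_h * new_w,
        "pad_slots": new_w - num_registers,
    }


def repack_registers(projected: list[list[float]], new_w: int, out_dim: int) -> list[list[float]]:
    """Registers first, then zero padding up to the halved width (assumed order)."""
    if len(projected) > new_w:
        raise ValueError("more registers than slots in the halved row")
    pad = [[0.0] * out_dim for _ in range(new_w - len(projected))]
    return [list(r) for r in projected] + pad


layout = register_row_layout(grid_h=28, grid_w=28, num_registers=4)
print(layout)  # {'in_tokens': 812, 'out_tokens': 210, 'pad_slots': 10}
```

Because the padding is built from literal zeros rather than passed through a projection, the pad slots stay zero by construction, which is the property the zero-pad test checks.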

nvsubquadratic/networks/vit5_hierarchical_classification.py — NEW

Swin-style 4-stage hierarchical ViT-5 classifier built on PatchMerging.

  • Two layouts: pure (GAP over flat patch grid) and register_row (FiLM registers as first grid row at every stage, excluded from GAP)
  • Per-stage dims / depths / block configs fully specified via LazyConfig
  • flop_count() aggregates patch-embed + blocks + merges + head
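
A minimal sketch of the readout difference between the two layouts, assuming the register row occupies the first grid_w tokens of a row-major sequence for a single sample. The function name is hypothetical and the real network presumably does this on batched tensors.

```python
def gap_readout(tokens: list[list[float]], grid_w: int, has_register_row: bool) -> list[float]:
    """Global average pool over patch tokens, skipping the register row if present."""
    start = grid_w if has_register_row else 0
    patches = tokens[start:]
    if not patches:
        raise ValueError("no patch tokens left to pool")
    dim = len(patches[0])
    return [sum(tok[d] for tok in patches) / len(patches) for d in range(dim)]


# One register row of width 2, followed by a 2x2 patch grid.
tokens = [[100.0, 100.0], [100.0, 100.0],
          [1.0, 3.0], [3.0, 5.0], [1.0, 3.0], [3.0, 5.0]]
print(gap_readout(tokens, grid_w=2, has_register_row=True))  # [2.0, 4.0]
```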

tests/networks/test_vit5_hierarchical_classification.py — NEW

6 tests: forward shapes, param-count comparison, FLOP count, error guards, backward pass (spot-checks patch_embed and reg_proj grads).

examples/vit5_imagenet/v6_hierarchical/ — NEW (14 files)

  • _base_config.py — shared Swin-T-like Hyena hierarchy config (p=4, dims [96,192,384,768], depths [2,2,6,2], 800-epoch ImageNet recipe)
  • hyena_hier_p4_pure.py / hyena_hier_p4_film.py — pure and register-row FiLM ImageNet configs
  • cifar10_hyena_hier.py / cifar10_hyena_flat.py — CIFAR-10 Hyena hier/flat configs
  • _cifar10_patch_ablation_base.py + 6 leaf configs — patch-size (p4/p8/p16) × hier/flat ablation grid for CIFAR-10
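
To make the hierarchy concrete, the per-stage grid and token counts can be derived from the patch size and the stage dims and depths. This sketch assumes square inputs, a 224-pixel ImageNet crop (the PR does not state the resolution), and one 2×2 merge between consecutive stages; `stage_schedule` is a hypothetical helper, not part of the configs.

```python
def stage_schedule(image_size: int, patch_size: int,
                   dims: list[int], depths: list[int]) -> list[dict[str, int]]:
    """Grid size, token count, width and depth for each stage of the hierarchy."""
    if image_size % patch_size:
        raise ValueError("image_size must be divisible by patch_size")
    if len(dims) != len(depths):
        raise ValueError("dims and depths must have the same length")
    grid = image_size // patch_size
    stages = []
    for i, (dim, depth) in enumerate(zip(dims, depths)):
        stages.append({"stage": i, "grid": grid, "tokens": grid * grid,
                       "dim": dim, "depth": depth})
        if i < len(dims) - 1:
            # A 2x2 merge halves each spatial side, so the grid must be even.
            if grid % 2:
                raise ValueError(f"cannot merge an odd grid of size {grid}")
            grid //= 2
    return stages


for stage in stage_schedule(224, 4, [96, 192, 384, 768], [2, 2, 6, 2]):
    print(stage)
```

Under these assumptions the grids run 56, 28, 14, 7, so the token count drops by 4× per merge while the channel width doubles.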

docs-tracker.md

Mark patch_merging.py and vit5_hierarchical_classification.py as [x].


Commits

  1. feat(modules): PatchMerging + test
  2. feat(networks+examples): ViT5HierarchicalClassificationNet, v6 example configs, tracker update

🤖 Generated with Claude Code

Copilot AI review requested due to automatic review settings May 25, 2026 13:03

Copilot AI left a comment


Pull request overview

Adds a Swin-style 2×2 spatial downsampling block (PatchMerging) to support hierarchical ViT/Hyena-style models, including an optional “register-row” token layout, and introduces a dedicated test suite to validate shapes, padding behavior, guards, and FLOP accounting.

Changes:

  • Introduce nvsubquadratic.modules.PatchMerging supporting both pure-spatial [B, H·W, C] and register-row [B, grid_w + H·W, C] layouts.
  • Add flop_count() for bookkeeping and validation guards for even grid sizes / register limits.
  • Add a new pytest suite covering output shapes, zero-padding invariants, routing independence, constructor guards, and FLOP counts.
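
A rough sketch of the kind of bookkeeping a flop_count() helper for this block might do. The actual formula is not shown in this PR page, so everything here is an assumption: only the two Linear layers are counted, the norm is ignored, and a multiply-accumulate is counted as 2 FLOPs by default.

```python
def patch_merging_flops(in_dim: int, out_dim: int, grid_h: int, grid_w: int,
                        num_registers: int = 0, flops_per_mac: int = 2) -> int:
    """Estimate FLOPs per sample for one merge step (Linear layers only)."""
    merged_tokens = (grid_h // 2) * (grid_w // 2)
    # Reduction Linear: 4 * in_dim -> out_dim, applied to every merged token.
    reduction_macs = merged_tokens * (4 * in_dim) * out_dim
    # Register path: in_dim -> out_dim, applied to each register token.
    reg_macs = num_registers * in_dim * out_dim
    return flops_per_mac * (reduction_macs + reg_macs)


print(patch_merging_flops(in_dim=32, out_dim=64, grid_h=28, grid_w=28, num_registers=4))
```

A sanity test along these lines can check that the register path adds exactly num_registers · in_dim · out_dim multiply-accumulates on top of the pure-spatial count.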

Reviewed changes

Copilot reviewed 18 out of 19 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| nvsubquadratic/modules/patch_merging.py | New patch-merging module with optional register-row path + FLOP counting. |
| tests/modules/test_patch_merging.py | New tests validating shapes, padding, error guards, routing, and FLOPs. |



regs_proj = self.reg_proj(regs)  # [B, num_regs, out_dim]
if self.reg_zero_pad is not None:
    pad = self.reg_zero_pad.expand(B, -1, -1)
Comment on lines +10 to +14
tokens form a "register row" (the layout used by ``ViT5ClassificationNet``
with ``prepend_registers=True`` and no CLS). The patch grid is merged as
above; register tokens are projected independently with their own linear so
the FiLM conditioning signal survives the channel-dim change, then re-padded
to the new (halved) grid width.
Comment on lines +53 to +65
def test_register_row_pad_is_zero(device) -> None:
    """Output register-row padding slots must remain zero after the projection."""
    B, in_dim, out_dim, grid, num_regs = 2, 32, 64, 28, 4
    pm = PatchMerging(
        in_dim=in_dim,
        out_dim=out_dim,
        grid_h=grid,
        grid_w=grid,
        norm_cfg=LazyConfig(RMSNorm)(dim=4 * in_dim, eps=1e-6, use_quack=False),
        num_registers=num_regs,
        has_register_row=True,
    ).to(device)

David Wessels added 27 commits May 25, 2026 16:04
David Wessels and others added 26 commits May 25, 2026 22:30
…_conv1d): add module and class docstrings with math context
…_purpose_resnet,classification_resnet): add module and class docstrings with math/arch context
…l_purpose_resnet, classification_resnet as done
…id,qk_norm,quack_utils): add module/class docstrings
…trainer, default_cfg, lightning_wrappers, datamodules, and utils
…ics, utils, testing, and experiments as done
…e docstrings and expand MixupConfig/AugmentConfig
The docs/reviews/ files were intermediate artifacts from the
write→review→integrate docstring pipeline. Their content has been
fully integrated into the source docstrings. They were being picked
up by Sphinx (source_suffix includes .md) but not referenced in any
toctree, generating 'document not in toctree' warnings — which fail
the CI build with -W --keep-going.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
residual_block.py: GeneralPurposeResnet → ResidualNetwork and
ClassificationResnet → ClassificationResNet in both See Also blocks.
The old names were non-existent and would have rendered as broken
links in the Sphinx API reference.

vit5_residual_block.py: expand token layout description to include
optional zero-padding tokens that ViT5ClassificationNet appends for
Hyena blocks when _block_needs_padding is true. Both the module
docstring and the forward() Args block now document the full layout:
[patches, (CLS,) registers, (padding,)] with T % grid_w == 0 for
padded blocks.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…es are Hyena short-conv configs

The module docstring claimed CausalConv1D was used in mamba_nd.py,
but it is not imported there at all. The actual call sites are the
Hyena short-conv configuration helpers:
  examples/spatial_recall_v2/mixer_defaults.py
  examples/spatial_recall_1d/mixer_defaults.py

Update the 'Use in …' section header and body to reflect reality.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…s div-by-zero

Copilot flagged that the docstring implied drop_prob=1.0 would zero
every sample cleanly, but worried the keep_prob=0 division would
produce inf/NaN first. The implementation already has an explicit
'if keep_prob > 0.0' guard (line 67) that skips the rescaling
division, so Bernoulli(0) produces an all-zero mask and x*0=0 with
no numerical issue. Update the docstring to document this guarantee.
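
The guard described above can be illustrated with a dependency-free sketch. This is not the module's code: `drop_path` here is a hypothetical stand-in operating on plain lists, with one Bernoulli draw per sample, written only to show why drop_prob=1.0 yields zeros without a division by zero.

```python
import random


def drop_path(batch: list[list[float]], drop_prob: float,
              training: bool = True, rng=random) -> list[list[float]]:
    """Per-sample stochastic depth on a list of feature vectors."""
    if not 0.0 <= drop_prob <= 1.0:
        raise ValueError("drop_prob must lie in [0, 1]")
    if not training or drop_prob == 0.0:
        return batch
    keep_prob = 1.0 - drop_prob
    out = []
    for sample in batch:
        # rng.random() is in [0, 1), so this is never true when keep_prob == 0.
        mask = 1.0 if rng.random() < keep_prob else 0.0
        if keep_prob > 0.0:
            # Rescale kept samples; skipped entirely when keep_prob == 0.
            mask /= keep_prob
        out.append([v * mask for v in sample])
    return out


print(drop_path([[1.0, 2.0], [3.0, 4.0]], drop_prob=1.0))  # [[0.0, 0.0], [0.0, 0.0]]
```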

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ored

CONVENTIONS.md documented that the pre-commit ruff hook catches D100,
but pyproject.toml had D100 in the global ignore list, so both the
pre-commit hook and the CI diff-check silently skipped missing module
docstrings (ruff applies config-file ignores on top of CLI --select).

Fix:
- Remove D100 from pyproject.toml ignore so the hook matches the docs
- Add missing module docstring to nvsubquadratic/parallel/utils.py
  (the only file in scope that was missing one)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…forced

Four files were missing module-level docstrings that ruff D100 now
catches (after removing D100 from the global ignore list):

- docs/conf.py — Sphinx configuration file
- examples/imagenet_diffusion/ccnn_jit_baseline.py — CCNN-Hyena JiT-B-matched diffusion baseline
- examples/imagenet_diffusion/hf_uvit_baseline.py — HuggingFace UViT diffusion baseline
- examples/imagenet_diffusion/jit_baseline.py — JiT-B flow-matching diffusion baseline

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Implements 2x2 spatial patch merging for hierarchical ViT-5/Hyena networks.

Key features:
- Pure-spatial layout: [B, H*W, C] -> [B, (H/2)*(W/2), out_dim]
- Register-row layout: passes the leading grid_w register tokens through
  a dedicated reg_proj Linear so FiLM conditioning survives the
  channel-dim change, then repacks them to width grid_w//2
- Post-concat norm (configurable via LazyConfig) + bias-free
  reduction Linear, both trunc_normal_ initialised (std=0.02)
- flop_count() helper for FLOP bookkeeping
- Full test suite covering shape, zero-pad, independent paths,
  error guards, and FLOP count

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Swin-style 4-stage hierarchical ViT-5 classifier with PatchMerging between
stages. Two readout layouts:
- pure: flat patch grid [B, H*W, C] + GAP
- register_row: FiLM register tokens prepended as first grid row at every
  stage; GAP excludes them

Key properties:
- Per-stage dims/depths fully configurable via LazyConfig
- flop_count() aggregates patch-embed + blocks + merges + head
- Backward tested: patch_embed and reg_proj grads both non-zero

ImageNet configs for 4-stage Swin-T-like Hyena hierarchy (p=4, dims
[96,192,384,768], depths [2,2,6,2]) in both 'pure' and 'register_row' FiLM
variants, plus CIFAR-10 patch and capacity ablation configs.

Mark patch_merging.py and vit5_hierarchical_classification.py as [x].

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…r; align ImageNet configs with vit5_hybrid reference

## CIFAR-10 subfolder
Move all CIFAR-10 experiment files into examples/vit5_imagenet/v6_hierarchical/cifar10/:
  _cifar10_patch_ablation_base.py  → cifar10/_base.py
  cifar10_{flat,hier}_p{4,8,16}.py → cifar10/{flat,hier}_p{4,8,16}.py
  cifar10_hyena_{hier,flat}.py     → cifar10/hyena_{hier,flat}.py

Leaf configs import updated to reference cifar10._base.

## _base_config.py — align with vit5_hybrid/_film.py + _learnable_omega.py
- Mask: swap torch.nn.Identity for BlockAlignedGaussianModulationND
  (data_dim=2, extent=1.0, direct parametrization) matching
  apply_learnable_omega_blockdiag_overrides in the reference config.
- FiLM constants: add FILM_INIT_TYPE='identity', FILM_WEIGHT_DECAY=5e-3,
  FILM_AFTER_POS_EMBED=True matching vit5_hybrid/_film.py best-run defaults.
- KernelFiLMGenerator: add init_type + no_weight_decay; bump num_film_layers
  from KERNEL_NUM_LAYERS-1 to KERNEL_NUM_LAYERS (3) since film_after_pos_embed
  adds one extra pair for the positional-embedding sine.
- _siren_kernel_cfg: set cfg.film_after_pos_embed=True when film_cfg provided.

## hyena_hier_p4_{pure,film}.py — match full_hyena_learnable_omega_blockdiag.py
- Append MaskMonitorCallback + OmegaScaleMonitorCallback (log every 50 steps).
- Set distinct wandb.job_group: 'v6_hier_pure' / 'v6_hier_film'.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@Dafidofff Dafidofff force-pushed the feat/patch-merging branch from 59a29ec to 20e3be7 May 26, 2026 14:53
…far10 configs

The v6_hierarchical CIFAR-10 example configs added in this PR import
`experiments.datamodules.cifar10.CIFAR10DataModule` but the file was
local-only. Add it to the PR with docstrings on all public methods so
ruff D102/D107 pass under the new docstring CI.

Also apply mdformat to docs-tracker.md to satisfy the pre-commit hook
(column-width normalization).
farhadrgh added a commit that referenced this pull request May 26, 2026
…t__s

PR B of the documentation follow-up: bring docs-tracker.md to an honest
state so every file under nvsubquadratic/ and experiments/ is either an
[x] or a documented exclusion.

Tracker rows resolved (closed [ ] -> [x]):
- huggingface_diffusers.py — expanded module docstring (DiT/UVit adapter
  contract, BHL↔BCHW translation, dtype handling, shared timestep state
  monkey-patch registration).  Per-class docstrings for the four public
  classes.
- jit_utils.py — expanded module docstring linking to the upstream JiT
  repo and enumerating helpers (VisionRotaryEmbedding{,Fast}, RMSNorm,
  sin-cos PE).  License header left untouched per repo policy on this
  branch.
- jit.py — already had per-class docstrings; no change.

Tracker rows still [ ] (PR pointer):
- patch_merging.py / vit5_hierarchical_classification.py — annotate the
  pending row with #122 so reviewers can track when this flips.

Tracker rows newly added (missing entries):
- baselines/unet_convnext.py and unet_convnext_v2.py (both already had
  good module docstrings).
- parallel/utils.py — CP comm utilities.
- parallel/test_a2a_comms.py — kept in place with a tracker note;
  moving to tests/ would expand this PR's scope.
- datamodules/emnist.py, datamodules/pde/well.py,
  datamodules/utils/dali_rand_augment.py.

Package __init__.py tidy-ups (license headers untouched):
- nvsubquadratic/__init__.py — drop dead `# TODO: Import …` block and
  `# TODO: Add main exports` placeholders in __all__.
- nvsubquadratic/modules/__init__.py — add a one-line module docstring.
- nvsubquadratic/ops/__init__.py — add a one-line module docstring.

Drive-by: AutoregressiveWrapper docstring formatting (`.. todo::` block
collapsed to one paragraph so the strict Sphinx build is clean).

Verification:
- `ruff check --select D100,D101,D102,D103,D301,D417` clean on the three
  __init__.py files and the touched network files.
- `make -C docs html ... SPHINXOPTS=-W --keep-going` clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
farhadrgh added a commit that referenced this pull request May 27, 2026
…ase (#123)

* docs(write/sequence_mixer): add module and class docstrings

* docs(write/hyena_nd): add module and class docstrings with math context

* docs(review/sequence_mixer): reviewer feedback

* docs(review/hyena_nd): reviewer feedback

* docs(integrate/sequence_mixer): apply reviewer feedback

* docs(write/mixed_fftconv): add module and function docstrings

* docs(write/kernels_nd): add module and class docstrings with math context

* docs(review/mixed_fftconv): reviewer feedback

* docs(review/kernels_nd): reviewer feedback

* docs(integrate/mixed_fftconv): apply reviewer feedback

* docs(integrate/kernels_nd): apply reviewer feedback

* docs(tracker): mark mixed_fftconv, hyena_nd, kernels_nd, sequence_mixer as done

* docs(write/patchify): add module and class docstrings

* docs(write/residual_block): add module and class docstrings

* docs(write/film): add module and class docstrings with math context

* docs(review/residual_block): reviewer feedback

* docs(review/film): reviewer feedback

* docs(review/patchify): reviewer feedback

* docs(integrate/residual_block): apply reviewer feedback

* docs(integrate/patchify): apply reviewer feedback

* docs(write/ckconv_nd): add module and class docstrings with math context

* docs(review/ckconv_nd): reviewer feedback

* docs(integrate/ckconv_nd): apply reviewer feedback

* docs(integrate/film): apply reviewer feedback (from worktree)

* docs(tracker): mark ckconv_nd, residual_block, patchify, film as done

* docs(write/position_encoding): add module and class docstrings with math context

* docs(review/position_encoding): reviewer feedback

* docs(write/attention): add module and class docstrings with math context

* docs(integrate/position_encoding): apply reviewer feedback

* docs(review/attention): reviewer feedback

* docs(integrate/attention): apply reviewer feedback

* docs(integrate/vit5_residual_block+ckconv_multihead_nd): apply reviewer feedback

* docs(tracker): mark attention, position_encoding, ckconv_multihead_nd, vit5_residual_block as done

* docs(write/vit5_hyena_adapter): add module and class docstrings

* docs(write/condition_mixer): add module and class docstrings

* docs(write/mamba_nd): add module and class docstrings with math context

* docs(review/condition_mixer): reviewer feedback

* docs(integrate/condition_mixer): apply reviewer feedback

* docs(write/vit5_attention): add module and class docstrings

* docs(write/vit5_hyena_adapter): add module and class docstrings

* docs(write/vit5_attention): add module and class docstrings

* docs(review/vit5_hyena_adapter): reviewer feedback

* docs(review/vit5_attention): reviewer feedback

* docs(integrate/vit5_hyena_adapter): apply reviewer feedback

* docs(write/mamba_nd): add module and class docstrings with math context

* docs(review/mamba_nd): reviewer feedback

* docs(integrate/mamba_nd): apply reviewer feedback

* docs(integrate/vit5_attention): apply reviewer feedback

* docs(tracker): mark vit5_attention, vit5_hyena_adapter, condition_mixer, mamba_nd as done

* docs(write+integrate/mlp,grn,layer_scale,masks_nd): add module and class docstrings with math context

* docs(tracker): mark mlp, grn, layer_scale, masks_nd as done

* docs(write+integrate/rms_norm,rms_norm_channel_first,drop_path,causal_conv1d): add module and class docstrings with math context

* docs(tracker): mark rms_norm, rms_norm_channel_first, drop_path, causal_conv1d as done

* docs(write+integrate/schedulers,distributed_depthwise_conv_nd,general_purpose_resnet,classification_resnet): add module and class docstrings with math/arch context

* docs(tracker): mark schedulers, distributed_depthwise_conv_nd, general_purpose_resnet, classification_resnet as done

* docs(write+integrate/vit5_classification,a2a_comms,lazy_config,cleanfid,qk_norm,quack_utils): add module/class docstrings

* docs(write+integrate/experiments): add module docstrings across run, trainer, default_cfg, lightning_wrappers, datamodules, and utils

* docs(tracker): mark vit5_classification, a2a_comms, lazy_config, metrics, utils, testing, and experiments as done

* docs(write+integrate/callbacks,ucf101,dali_imagenet_fused): add module docstrings and expand MixupConfig/AugmentConfig

* docs(tracker): mark callbacks, ucf101, dali_imagenet_fused as done — documentation complete

* docs: add CONVENTIONS.md with docstring style guide and PR enforcement strategy

* docs: add docstring CI workflow, extend PR template with doc checklist, expand README docs section

* docs: remove stale review artifacts causing Sphinx warnings

The docs/reviews/ files were intermediate artifacts from the
write→review→integrate docstring pipeline. Their content has been
fully integrated into the source docstrings. They were being picked
up by Sphinx (source_suffix includes .md) but not referenced in any
toctree, generating 'document not in toctree' warnings — which fail
the CI build with -W --keep-going.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs: fix Sphinx cross-ref class names and ViT-5 token layout doc

residual_block.py: GeneralPurposeResnet → ResidualNetwork and
ClassificationResnet → ClassificationResNet in both See Also blocks.
The old names were non-existent and would have rendered as broken
links in the Sphinx API reference.

vit5_residual_block.py: expand token layout description to include
optional zero-padding tokens that ViT5ClassificationNet appends for
Hyena blocks when _block_needs_padding is true. Both the module
docstring and the forward() Args block now document the full layout:
[patches, (CLS,) registers, (padding,)] with T % grid_w == 0 for
padded blocks.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs(causal_conv1d): fix incorrect Mamba usage note — actual call sites are Hyena short-conv configs

The module docstring claimed CausalConv1D was used in mamba_nd.py,
but it is not imported there at all. The actual call sites are the
Hyena short-conv configuration helpers:
  examples/spatial_recall_v2/mixer_defaults.py
  examples/spatial_recall_1d/mixer_defaults.py

Update the 'Use in …' section header and body to reflect reality.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs(drop_path): clarify drop_prob=1.0 is safe — implementation guards div-by-zero

Copilot flagged that the docstring implied drop_prob=1.0 would zero
every sample cleanly, but worried the keep_prob=0 division would
produce inf/NaN first. The implementation already has an explicit
'if keep_prob > 0.0' guard (line 67) that skips the rescaling
division, so Bernoulli(0) produces an all-zero mask and x*0=0 with
no numerical issue. Update the docstring to document this guarantee.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(ruff): enforce D100 (missing module docstring) — was silently ignored

CONVENTIONS.md documented that the pre-commit ruff hook catches D100,
but pyproject.toml had D100 in the global ignore list, so both the
pre-commit hook and the CI diff-check silently skipped missing module
docstrings (ruff applies config-file ignores on top of CLI --select).

Fix:
- Remove D100 from pyproject.toml ignore so the hook matches the docs
- Add missing module docstring to nvsubquadratic/parallel/utils.py
  (the only file in scope that was missing one)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs: add missing module docstrings to satisfy D100 now that it is enforced

Four files were missing module-level docstrings that ruff D100 now
catches (after removing D100 from the global ignore list):

- docs/conf.py — Sphinx configuration file
- examples/imagenet_diffusion/ccnn_jit_baseline.py — CCNN-Hyena JiT-B-matched diffusion baseline
- examples/imagenet_diffusion/hf_uvit_baseline.py — HuggingFace UViT diffusion baseline
- examples/imagenet_diffusion/jit_baseline.py — JiT-B flow-matching diffusion baseline

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(layer_scale): revert init_values → init_value rename that broke callers

The docs pass accidentally renamed the public API parameter from
init_value to init_values.  ViT5ResidualBlock and all tests call
LayerScale(..., init_value=...), so every test that constructs a
residual block with layer_scale_init > 0 crashed with:

  TypeError: LayerScale.__init__() got an unexpected keyword argument 'init_value'

Restore the original parameter name and update the docstring to match.
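
The failure mode is easy to reproduce with stand-in classes. The two classes below are hypothetical and only mimic the keyword names involved; they are not the real LayerScale.

```python
class RenamedLayerScale:
    """Stand-in for the accidentally renamed signature (init_values)."""

    def __init__(self, dim: int, init_values: float = 1e-5) -> None:
        self.dim = dim
        self.scale = init_values


class RestoredLayerScale:
    """Stand-in for the restored signature (init_value)."""

    def __init__(self, dim: int, init_value: float = 1e-5) -> None:
        self.dim = dim
        self.scale = init_value


def build(cls, dim: int, scale: float):
    # Callers pass the original keyword name, as the residual block does.
    return cls(dim, init_value=scale)


try:
    build(RenamedLayerScale, 8, 1e-4)
except TypeError as err:
    print(f"renamed signature fails: {err}")

layer = build(RestoredLayerScale, 8, 1e-4)
print(layer.scale)  # 0.0001
```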

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs: expand api.rst to cover full PR surface; fix new docstring warnings

Surface every nvsubquadratic module this PR documented in the Sphinx
site:
  - Ops: add mixed-precision FFT conv section
  - Modules: add Kernels & filters (SIREN/RFF/masks), Normalization
    (RMSNorm, GRN, LayerScale), Position encoding & patching, Gating &
    conditioning (FiLM, DropPath, QKVConditionMixer), Residual blocks,
    Schedulers; expand Mixers and Convolutions
  - Top-level: add Networks, Parallel, Utilities, Metrics sections

Fix the 27 new docstring warnings surfaced by the wider autodoc coverage,
all in modules the docstring-rewrite PR also touched:
  - kernels_nd: convert comma-grouped Args lists to per-arg entries and
    a prose paragraph for inherited args; fix `*spatial` emphasis in
    SIRENKernelND.forward; remove indented production-defaults block
    in BlockDiagonalMultiOmegaSIRENKernelND
  - ckconv_nd, film: rewrite flop_count docstrings to use proper RST
    bullets and inline code instead of indented continuation lines
  - rms_norm_channel_first: drop the manual `channels_first` Attributes
    entry that duplicated the auto-discovered class attribute
  - vit5_residual_block: collapse the multi-line inline-literal `T = ...`
    expression onto a single bullet description

Build is strict-clean under `SPHINXOPTS=-W --keep-going`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: split api.rst into per-area sub-pages; mock cleanfid

Reorganise the API reference from a single 19-section page into five
sub-pages, each with its own sidebar TOC entry:

  - docs/api/ops.rst        — 7 ops sections (fp32 ref, CUDA, circular,
                               multi-head, chunking, mixed precision,
                               direct 1D causal)
  - docs/api/modules.rst    — 8 module sections (mixers, convs, kernels,
                               norms, position/patching, gating,
                               residual blocks, schedulers)
  - docs/api/networks.rst   — end-to-end classification networks
  - docs/api/parallel.rst   — context-parallel comm primitives
  - docs/api/utilities.rst  — QK-norm, RoPE, metrics

The top-level docs/api.rst becomes a thin landing index with a maxdepth-2
toctree pointing at the five sub-pages.

Also mock `cleanfid` in autodoc_mock_imports — the external `cleanfid`
package isn't installed on the docs runner; its import in
nvsubquadratic.metrics.cleanfid was failing under strict-mode CI.  Add
docs/api/generated/ to .gitignore so the per-sub-page autosummary stubs
don't get tracked.

Strict build (`-W --keep-going`) succeeds locally.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: collapse sidebar to top-level pages; drop redundant pdoc guide

Theme tweaks in docs/conf.py:
  navigation_depth = 2  — primary sidebar stops at api/ops, api/modules,
                          etc.  The flat list of autosummary-generated
                          function/class stubs no longer clutters the
                          left sidebar; per-page H2 group headers stay
                          accessible via the right "On this page" panel.
  show_nav_level = 1
  show_toc_level = 2

README cleanup: drop the pdoc-and-IDE-hover docs viewing options.  Sphinx
is the canonical site (every PR-documented module is now in
docs/api/*.rst), so pdoc was redundant.  Replace with a one-line note
that IDE hover / help() also work since docstrings are inline.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(examples): co-locate fid_stats with imagenet_diffusion configs

The two FID stats .npz files are only consumed by examples in
examples/imagenet_diffusion/ and by scripts/generate_jit_fid_stats.py.
Move them under the example directory so the data sits next to the
only thing that uses it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(api): rename docs/api → docs/api_reference; add core + experiments

PR A of the documentation follow-up: surface every documented module in
the rendered Sphinx site without rewriting any docstrings.

Layout:
  docs/api_reference/index.rst        — toctree landing
  docs/api_reference/ops.rst          — (renamed from docs/api/ops.rst)
  docs/api_reference/modules.rst      — (renamed)
  docs/api_reference/networks.rst     — (renamed)
  docs/api_reference/parallel.rst     — (renamed)
  docs/api_reference/core.rst         — NEW: lazy_config, metrics, utils
                                          (qk_norm, rope, quack), testing
                                          helpers — supersedes the old
                                          api/utilities.rst, deleted here
  docs/api_reference/experiments.rst  — NEW: experiments.run, .trainer,
                                          .default_cfg dataclasses, all
                                          Lightning wrappers, callbacks,
                                          datamodules, utils

docs/index.rst now points the toctree at `api_reference/index`; the
thin `docs/api.rst` stub is gone (one landing page, not two).

CI infra adjustments needed to surface the experiments package:
  * autodoc_mock_imports gains pytorch_lightning, lightning, matplotlib,
    PIL, datasets, h5py, scipy, the_well, torch_fidelity, torchmetrics,
    torchvision, timm, wandb, rich, tqdm — pure-Python deps the docs
    runner skips to stay lean.  Also lift `nvidia` to a top-level mock
    so any future `nvidia.*` sub-import is covered.
  * sphinx.ext.todo enabled with `todo_include_todos = True` so
    `.. todo::` blocks in docstrings render rather than erroring out.
  * .gitignore: add `docs/build/` (legacy artifact path) and
    `docs/api_reference/generated/` (per-sub-page autosummary stubs);
    whitelist `docs/api_reference/core.*` so the doc page isn't caught
    by the repo-wide `core.*` core-dump pattern.

Tighten `_rewrite_repo_links` in conf.py with a comment explaining the
regex's anchoring so future edits don't accidentally rewrite intra-docs
links.

Drive-by fix: the `AutoregressiveWrapper.__doc__` had an indented bullet
list under a `.. todo::` directive that RST mis-parsed.  Convert to a
single-paragraph note so the strict build is clean.

No docstrings were rewritten in this PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(tracker): close open rows + add absent files; tidy package __init__s

PR B of the documentation follow-up: bring docs-tracker.md to an honest
state so every file under nvsubquadratic/ and experiments/ is either an
[x] or a documented exclusion.

Tracker rows resolved (closed [ ] -> [x]):
- huggingface_diffusers.py — expanded module docstring (DiT/UVit adapter
  contract, BHL↔BCHW translation, dtype handling, shared timestep state
  monkey-patch registration).  Per-class docstrings for the four public
  classes.
- jit_utils.py — expanded module docstring linking to the upstream JiT
  repo and enumerating helpers (VisionRotaryEmbedding{,Fast}, RMSNorm,
  sin-cos PE).  License header left untouched per repo policy on this
  branch.
- jit.py — already had per-class docstrings; no change.

Tracker rows still [ ] (PR pointer):
- patch_merging.py / vit5_hierarchical_classification.py — annotate the
  pending row with #122 so reviewers can track when this flips.

Tracker rows newly added (missing entries):
- baselines/unet_convnext.py and unet_convnext_v2.py (both already had
  good module docstrings).
- parallel/utils.py — CP comm utilities.
- parallel/test_a2a_comms.py — kept in place with a tracker note;
  moving to tests/ would expand this PR's scope.
- datamodules/emnist.py, datamodules/pde/well.py,
  datamodules/utils/dali_rand_augment.py.

Package __init__.py tidy-ups (license headers untouched):
- nvsubquadratic/__init__.py — drop dead `# TODO: Import …` block and
  `# TODO: Add main exports` placeholders in __all__.
- nvsubquadratic/modules/__init__.py — add a one-line module docstring.
- nvsubquadratic/ops/__init__.py — add a one-line module docstring.

Drive-by: AutoregressiveWrapper docstring formatting (`.. todo::` block
collapsed to one paragraph so the strict Sphinx build is clean).

Verification:
- `ruff check --select D100,D101,D102,D103,D301,D417` clean on the three
  __init__.py files and the touched network files.
- `make -C docs html ... SPHINXOPTS=-W --keep-going` clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: narrative pages — getting started, architecture, examples, benchmarks

PR C of the documentation follow-up: a new reader landing on the site
can install the library, see the three-layer architecture, find an
example for their task, and read the throughput numbers — all without
leaving the docs.

New pages (toctree'd into docs/index.rst above the API Reference):

- docs/getting_started.md — distilled install matrix (links the README
  for long-form), CUDA/GPU requirements, and a minimal "Hello, Hyena"
  snippet that ends in a working `fftconv2d_fp32_bhl(x, kernel)`
  forward pass.
- docs/architecture.md — ASCII diagram of the
  nvsubquadratic / subquadratic-ops / megatron-core layering, what each
  layer owns, the BHL/BLH/`_w_reshape`/`_chunked`/fp16 naming
  conventions, the QKVSequenceMixer operator-agnostic dispatch story,
  and the LazyConfig system.
- docs/examples/index.md — one paragraph per top-level recipe under
  examples/ (classification, diffusion, spatial recall, benchmarks,
  scientific).  Links to each example's README or primary config and
  to examples/overview_tracker.md for the active roadmap.
- docs/benchmarks.md — FLOP-scaling plot (symlinked into _static/ so
  the file stays single-source under benchmarks/) and a MyST
  `{include}` of benchmarks/README.md for the ViT-5-Small throughput
  tables.  Links out to benchmarks/ops/FP16_FFTCONV_RESULTS.md.
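
For readers who want to see the operation behind a call like
`fftconv2d_fp32_bhl(x, kernel)`, here is a minimal NumPy sketch of a 2D
linear convolution computed with zero-padded FFTs.  It is not the
library's implementation: the name `fft_linear_conv2d`, the unbatched
2D shapes, and the choice to return the full (uncropped) output are all
assumptions made for illustration.

```python
import numpy as np


def fft_linear_conv2d(x, kernel):
    """Full 2D linear convolution of x with kernel via zero-padded FFTs."""
    # Full linear-convolution output size along each axis.
    h = x.shape[0] + kernel.shape[0] - 1
    w = x.shape[1] + kernel.shape[1] - 1
    # Zero-padding both operands to (h, w) makes the circular convolution
    # that the FFT computes equal to the linear one.
    fx = np.fft.rfft2(x, s=(h, w))
    fk = np.fft.rfft2(kernel, s=(h, w))
    return np.fft.irfft2(fx * fk, s=(h, w))


x = np.arange(12, dtype=np.float64).reshape(3, 4)
kernel = np.array([[1.0, 0.0], [0.0, -1.0]])
y = fft_linear_conv2d(x, kernel)
```

Skipping the zero-padding would give the periodic-boundary (circular)
result instead of the linear one.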

Scope note in docs-tracker.md updated to acknowledge that docs/
narrative pages are now in scope on this branch (read-only links to
the README and examples/overview_tracker.md — they're not duplicated).

Verification:
- `make -C docs html ... SPHINXOPTS=-W --keep-going` clean with the
  new pages, the `{include}` resolves, and the symlinked PNG is copied
  into _build/.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(api): surface huggingface/jit/baselines/fp16 ops

Close the remaining acceptance-check gaps from PR A's deferred-by-design
list:

api_reference/networks.rst:
  - Diffusion — Hugging Face adapters section (HF DiT/UVit configs and
    wrappers).
  - Diffusion — JiT backbone section (JiT, JiTBlock, the 8 model classes,
    the 7 JiT_* factory functions, plus jit_utils helpers).
  - Baselines section (UNet-ConvNeXt v1/v2 + their Well-task wrappers).

api_reference/ops.rst:
  - FFT convolutions (fp16) — half-precision linear-conv variants
    (BHL + _w_reshape + _chunked, 12 functions).
  - Circular FFT convolutions (fp16) — periodic-boundary fp16 variants
    (6 functions).  Link the FP16 derivation page from the section
    intro for the dual-mean-centering background.

Drive-by: collapse `[B, C, *spatial]` -> ``[B, C, *spatial]`` in
UNetConvNext.forward and UNetConvNextV2.forward docstrings so the
`*spatial` token doesn't get parsed as an RST emphasis open.
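
As an illustration of this docstring fix, the sketch below uses a
hypothetical function (`forward_sketch`, not the real
`UNetConvNext.forward`) whose shape annotation is wrapped in an RST
inline literal.  Inside double backticks the `*` is taken literally
rather than as the start of emphasis.

```python
def forward_sketch(x):
    """Return the input unchanged (placeholder body for illustration).

    Args:
        x: Input tensor of shape ``[B, C, *spatial]``.  The double
            backticks mark an RST inline literal, so ``*spatial`` is
            not read as emphasis markup.
    """
    return x
```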

Final acceptance: 13 -> 6 tracked-but-not-in-API entries, and all six
remaining are correctly excluded (4 are still `[ ]` in the tracker, one
is a test file, one is an internal helper).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(docs): mock diffusers for autodoc

nvsubquadratic.networks.huggingface_diffusers imports DiTTransformer2DModel
and UVit2DModel from `diffusers` at module load.  diffusers is a runtime
dep (installed in the conda env) but isn't pulled in on the docs runner
under `pip install -e . --no-deps`, so autodoc fails to import the module
when the api_reference/networks.rst page references it.

Adding diffusers to autodoc_mock_imports lets autodoc resolve the dotted
references (`diffusers.models.DiTTransformer2DModel`, etc.) via the mock
attribute chain.

Verified locally with `pip uninstall diffusers` then strict build:
`make -C docs html ... SPHINXOPTS=-W --keep-going` succeeds.
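
A minimal sketch of the corresponding Sphinx setting, assuming the
config lives at `docs/conf.py` and that `sphinx.ext.autodoc` is enabled
there.  `autodoc_mock_imports` is the standard autodoc option; any
other entries the project's list may already contain are omitted here.

```python
# docs/conf.py (excerpt; file path assumed)
extensions = ["sphinx.ext.autodoc"]

# Modules named here are replaced by mock objects when autodoc imports
# the documented code, so the docs build does not need them installed.
autodoc_mock_imports = ["diffusers"]
```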

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(repo-organization): benchmarks + scripts/visualization READMEs; … (#124)

* docs(repo-organization): benchmarks + scripts/visualization READMEs; migrate reports

Brings benchmarks/, scripts/visualization/, and the top-level
visualizations/ directory onto the same documentation footing as the
library code, and migrates point-in-time profiling artefacts into the
reports/ framework where they belong.

Tasks 1–2: benchmarks/
- New per-subdirectory READMEs: benchmarks/ops/, vit5_imagenet/, well/.
- Module-level docstrings (4-question format) on every previously-short
  benchmark script.
- One-line driver-header comments on the three SLURM .sh runners.
- ruff D100 is clean across benchmarks/.

Task 3: scripts/visualization/
- README documents the Streamlit (.json) vs Gradio (.npz) kernel-viewer
  divergence and when to reach for each.
- visualize_patch_size_throughput.py moved into the visualization/
  subdir; usage example in its docstring updated.
- Stale `scripts/visualize_kernels*.py` usage paths in the two kernel
  viewers' docstrings corrected to `scripts/visualization/...`.

Task 4: visualizations/ -> reports/spatial_recall/
- 10 PNGs migrated via `git mv` (history preserved).
- New REPORT.md narrates each 1D / 2D / 3D task, links each figure to
  its `examples/spatial_recall_*` config, and documents the
  regeneration command.
- Top-level visualizations/ directory removed.

Task 5: dated profile artefacts -> reports/vit5_imagenet_dataloader_profiling/
- Seven files migrated via `git mv` (Day 1 + Day 2 .jsonl runs and the
  two profilers + their SLURM drivers).
- New REPORT.md folds the old `dataloader_profile_2026-02-25.md` prose
  into a two-day write-up that also covers the Day 2
  `step_breakdown_2026-02-26.jsonl` (instrumented step + GPU-event view
  + theoretical-min gap).
- Pointer line added in benchmarks/vit5_imagenet/README.md.
- Old .md removed (content now lives in REPORT.md).

Task 6: docs/ link
- docs/reports.md standalone landing page indexing every topic by
  absolute GitHub blob URL (the `{include}` fallback path — the
  index-table links in reports/README.md use bare relative paths that
  don't resolve under MyST cross-references).
- Added to the docs/index.rst toctree above Ops Overview.
- Strict Sphinx build (`-W --keep-going`) clean.

Tracker:
- docs-tracker.md Scope extended: benchmarks/, scripts/visualization/,
  and reports/ are now in scope at a lighter bar (module docstrings +
  per-subdir README; no Sphinx API entry).
- Three new progress tables (benchmarks/, scripts/visualization/,
  reports/) with a row per file.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(repo): move root slurm/ -> scripts/slurm/

Consolidate SLURM submit / driver scripts under one tree.  The root
slurm/ directory held a mix of per-experiment submit scripts and
cluster-specific helpers that predate the portable wrapper introduced
in PR #113.  Move them all under scripts/slurm/ so there's a single
place for SLURM-related scripts.

Name-collision resolution:
  slurm/submit.sh -> scripts/slurm/submit_imagenet64_diff.sh

The root slurm/submit.sh (2026-03-16) was a one-off ImageNet-64
diffusion submit script with hardcoded
``--account=healthcareeng_research``, 4 nodes, 4 h time limit, and the
``imagenet64.n4`` singleton job name.  scripts/slurm/submit.sh
(2026-05-18, PR #113) is the new portable wrapper that auto-detects
project root and reads cluster.env for per-cluster overrides.  Both
are kept; the older one is renamed to its actual job-name semantics.

No other collisions — the rest of the root slurm/ tree (queue.sh,
queue_well.sh, submit_hybrid*.sh, submit_in1k_*.sh, submit_well.sh,
download_well.sh, tg_download_well.sh, the diffusion/ subdir, and
enroot/build_sqsh.sh) lands flat under scripts/slurm/.

All moves via ``git mv`` so ``git log --follow`` traces history through
the rename.  Updated documentation references to the new paths in:

  - README.md (Enroot section: scripts/slurm/enroot/build_sqsh.sh).
  - examples/imagenet_diffusion/README.md (sbatch path + SLURM-scripts
    list).
  - examples/vit5_imagenet/vit5_hybrid/plan.md (historical
    retrospective; bulk-rewrite of all script paths).
  - examples/well/README.md, examples/well/v{1,2}/TRACKER.md
    (sbatch scripts/slurm/download_well.sh ...).
  - reports/ckconv_block_diagonal_kernel/REPORT.md (submit_hybrid.sh
    paths in the run commands).

``#SBATCH --output=slurm/%x_%j.out`` lines in the per-experiment
sbatch scripts are left alone — that's a runtime log-directory
convention, not a source-tree reference; sbatch creates the directory
when the job runs.

Strict Sphinx build (``-W --keep-going``) clean after the move.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(repo): move benchmark scripts from scripts/ to benchmarks/

Address the scripts/ vs benchmarks/ overlap: scripts/ was a catch-all
that included four benchmark-shaped files, one a stale duplicate.

  scripts/benchmark_imagenet_diffusion_gpu.py   -> (deleted — duplicate)
  scripts/benchmark_imagenet_throughput.py      -> benchmarks/vit5_imagenet/
  scripts/benchmark_patch_size_2d.py            -> benchmarks/
  scripts/profile_batch_size.py                 -> benchmarks/well/
                                                   (supernova_explosion_64
                                                   is a WELL sub-dataset)

Each move via `git mv` so `git log --follow` traces through the rename.
All four carry the 4-question module docstring format (three already
did; profile_batch_size.py expanded here).

References updated:
- scripts/visualization/README.md (pointer to benchmark_patch_size_2d.py).
- benchmarks/vit5_imagenet/README.md (new section for
  benchmark_imagenet_throughput.py).
- benchmarks/well/README.md (bullet for profile_batch_size.py).
- docs-tracker.md (three new `[x]` rows under benchmarks/).

After this change scripts/ holds only utility / glue scripts (data
prep, GPU/license sanity, kernel viewers, SLURM submit) and
benchmarks/ is the single home for performance measurement.

ruff D100 clean on benchmarks/; strict Sphinx build clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(tests): move test_a2a_comms into tests/parallel

Tests should live under the canonical tests/ tree, not inside the
library package.  Move nvsubquadratic/parallel/test_a2a_comms.py into
tests/parallel/test_a2a_comms.py (with a new __init__.py) so
collection happens through the standard tests/ root.

Updated references:
- docs-tracker.md: the tracker note for `test_a2a_comms.py` now points
  at the new path instead of explaining why it lived next to its
  target.
- tests/README.md: added a `parallel/` row to the directory diagram.

License headers on this file and on tests/conftest.py left untouched
per repo policy on this branch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(scripts): co-locate jit fid stats with other data prep

scripts/generate_jit_fid_stats.py lived at the top of scripts/ but is
purely a data-prep script — it pre-computes FID reference statistics
on ImageNet for the JiT diffusion eval path.  Other data-prep helpers
(extract_imagenet_to_folder.py, compute_imagenet_stats.py,
stage_imagenet.sh, …) already live under scripts/data/, so move this
one in too.

  scripts/generate_jit_fid_stats.py
    -> scripts/data/generate_jit_fid_stats.py

Move via `git mv` so `git log --follow` traces the rename.  The file
had no module docstring; added one in the 4-question format (what /
hardware / how to invoke / where output goes) referencing the new
path.
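
A rough sketch of what a docstring in the 4-question format could look
like, with a small checker for the four parts.  The section labels
(`What`, `Hardware`, `Usage`, `Output`), the placeholder text, and the
helper `missing_sections` are assumptions for illustration, not the
wording used in the repository.

```python
# Labels are hypothetical; the repo's docstrings may phrase them differently.
FOUR_QUESTIONS = ("What", "Hardware", "Usage", "Output")

# Placeholder docstring text following the what / hardware / how to
# invoke / where output goes order.
EXAMPLE_DOCSTRING = """What: <one-sentence summary of the script>.

Hardware: <what the script needs in order to run>.

Usage: python <path/to/script.py> <arguments>

Output: <where the results are written>.
"""


def missing_sections(doc):
    """Return the labels from FOUR_QUESTIONS that do not appear in doc."""
    return [label for label in FOUR_QUESTIONS if f"{label}:" not in doc]
```

Calling `missing_sections` on a script's docstring returns an empty
list when all four parts are present.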

`grep -rn` confirms no other tracked files referenced the old path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(benchmarks): flatten vit5_imagenet/scripts/ runner directory

benchmarks/vit5_imagenet/scripts/ held only three SLURM drivers
(bench_compile.sh, bench_optimized.sh, bench_profile.sh) for the
matching bench_vit5_*.py files in the parent directory.  An extra
one-level-deep subdirectory adds nothing, so move them up next to the
.py files they invoke and drop the empty scripts/ subdir.

  benchmarks/vit5_imagenet/scripts/bench_compile.sh   -> ../bench_compile.sh
  benchmarks/vit5_imagenet/scripts/bench_optimized.sh -> ../bench_optimized.sh
  benchmarks/vit5_imagenet/scripts/bench_profile.sh   -> ../bench_profile.sh

Each .sh runner invokes its target via the repo-root path
``PYTHONPATH=. python benchmarks/vit5_imagenet/bench_vit5_*.py``, so
no in-file path changes are needed.

Updated references:
- benchmarks/README.md: sbatch invocations now point at the flat path.
- benchmarks/vit5_imagenet/README.md: SLURM-scripts section reflects
  the new layout.
- docs-tracker.md: the tracker row globs updated from `scripts/bench_*.sh`
  to `bench_*.sh`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: add package overview / library tour

Bottom-up tour of nvsubquadratic/ between Architecture and Examples
in the docs site: one short paragraph each for ops/, modules/,
networks/, parallel/, and the supporting utils/metrics/testing/
lazy_config layer.  Opens with the bottom-up organising principle and
closes with a "Where to go next" pointer at the API reference,
Examples, and the Ops Overview math primer.

docs/index.rst:
- "Where to go next" bullet list gains a Package Overview entry.
- Toctree gains `Package Overview <package_overview>` between
  Architecture and Examples.

Strict Sphinx build (`-W --keep-going`) clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: rewrite package overview around a directory tree; tracker follow-ups

Two related changes:

1. Package overview: revise docs/package_overview.md to lead with an
   actual `nvsubquadratic/` directory tree (one line per .py with a
   short purpose tag) and then narrate each area underneath.  The
   previous version was a wall of cross-references with no layout
   signal — the tree gives readers a fast scan of what lives where
   before they read the prose.

2. docs-tracker.md follow-ups:
   - Scope sentence enumerates the docs/ narrative pages explicitly
     ("Getting Started, Architecture, Package Overview, Examples,
     Benchmarks, Reports") rather than leaving Package Overview
     implicit.
   - parallel/ table loses the `test_a2a_comms.py` row (the file has
     moved to tests/parallel/, which is out of tracker scope) and
     gains a one-line pointer underneath so the move is documented
     without leaving a misleading row in the parallel/ table.

Strict Sphinx build (`-W --keep-going`) clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: replace package overview with repository overview

The previous "Package Overview" page only mapped `nvsubquadratic/`,
which left every other top-level directory (experiments/, examples/,
benchmarks/, reports/, scripts/, tests/, docs/) invisible to a new
reader.  Promote the page to a true repository overview:

  docs/package_overview.md -> docs/repository_overview.md   (git mv)

Content reorganised around the repo tree:
- ASCII tree of the repo root (every top-level dir + the notable
  files: README, pyproject.toml, Dockerfile, CONVENTIONS.md,
  docs-tracker.md, setup_conda_env.sh, nvsubquadratic.def).
- Second tree drills into `nvsubquadratic/` itself (carried over from
  the previous page — that's still useful, just no longer the
  whole page).
- "What each top-level directory does" paragraph per area
  (nvsubquadratic / experiments / examples / benchmarks / reports /
  scripts / tests / docs) and what makes each distinct.
- Closes with the "Where to go next" pointer.

docs/index.rst:
- Toctree entry "Package Overview <package_overview>" renamed to
  "Repository Overview <repository_overview>".
- "Where to go next" bullet copy reflects the broader scope.

docs-tracker.md:
- Scope sentence enumerates Repository Overview alongside the other
  narrative pages.

Strict Sphinx build (`-W --keep-going`) clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: link CONVENTIONS.md + docs-tracker.md from the landing page

Earlier in this session I prototyped a docs/conventions.md page that
{include}-d the root CONVENTIONS.md so the docstring guide would
become searchable from the docs site.  Replaced that approach with a
simpler link — no extra docs page, no GitHub-web rendering wrinkle
where the {include} directive shows up as raw markdown.  Add a small
"Contributor docs" section to docs/index.rst with absolute GitHub
URLs to both root files (CONVENTIONS.md and docs-tracker.md).

Source of truth stays at the root files; the docs site just points
at them.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: David Wessels <dwessel@hipster-l1.science.uva.nl>
Co-authored-by: David Wessels <dwessel@hipster-l2.science.uva.nl>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Farhad Ramezanghorbani <farhadr@nvidia.com>
Co-authored-by: Farhad Ramezanghorbani <farhadrgh@users.noreply.github.com>