feat(modules): Swin-style PatchMerging with register-row support #122
Open
Dafidofff wants to merge 75 commits into
Pull request overview
Adds a Swin-style 2×2 spatial downsampling block (PatchMerging) to support hierarchical ViT/Hyena-style models, including an optional “register-row” token layout, and introduces a dedicated test suite to validate shapes, padding behavior, guards, and FLOP accounting.
Changes:
- Introduce `nvsubquadratic.modules.PatchMerging` supporting both pure-spatial `[B, H·W, C]` and register-row `[B, grid_w + H·W, C]` layouts.
- Add `flop_count()` for bookkeeping and validation guards for even grid sizes / register limits.
- Add a new `pytest` suite covering output shapes, zero-padding invariants, routing independence, constructor guards, and FLOP counts.
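As a rough illustration of the two layouts listed above, the token bookkeeping for one 2×2 merge can be sketched in plain Python. The helper name `merged_token_count` is hypothetical and not part of the PR; the only facts taken from the description are the even-grid guard and the `[B, H·W, C]` / `[B, grid_w + H·W, C]` layouts.

```python
def merged_token_count(grid_h: int, grid_w: int, has_register_row: bool = False) -> int:
    """Number of tokens after one 2x2 merge (hypothetical helper, not the PR's API)."""
    # The PR describes a guard for even grid sizes; this mirrors that idea.
    if grid_h % 2 or grid_w % 2:
        raise ValueError(f"grid must be even in both dims, got {grid_h}x{grid_w}")
    new_h, new_w = grid_h // 2, grid_w // 2
    tokens = new_h * new_w
    if has_register_row:
        # Assumption: the register row is repacked to the halved grid width.
        tokens += new_w
    return tokens


print(merged_token_count(28, 28))                         # pure-spatial layout
print(merged_token_count(28, 28, has_register_row=True))  # register-row layout
```

With a 28×28 grid this gives 14·14 tokens for the pure-spatial layout, plus one extra row of 14 slots when the register row is present.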
Reviewed changes
Copilot reviewed 18 out of 19 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `nvsubquadratic/modules/patch_merging.py` | New patch-merging module with optional register-row path + FLOP counting. |
| `tests/modules/test_patch_merging.py` | New tests validating shapes, padding, error guards, routing, and FLOPs. |
    regs_proj = self.reg_proj(regs)  # [B, num_regs, out_dim]
    if self.reg_zero_pad is not None:
        pad = self.reg_zero_pad.expand(B, -1, -1)
Comment on lines +10 to +14

    tokens form a "register row" (the layout used by ``ViT5ClassificationNet``
    with ``prepend_registers=True`` and no CLS). The patch grid is merged as
    above; register tokens are projected independently with their own linear so
    the FiLM conditioning signal survives the channel-dim change, then re-padded
    to the new (halved) grid width.
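The repacking step in the docstring above can be sketched without any tensor library. This is only an illustration under stated assumptions: `repack_register_row` and `toy_project` are hypothetical names, the projection is a stand-in for the module's learned register linear, and placing the registers first in the row with zero slots after them is assumed rather than confirmed by the excerpt.

```python
def repack_register_row(regs, new_grid_w, out_dim, project):
    """Project each register token, then zero-pad the row to the halved grid width.

    regs:     list of register vectors (each a list of floats).
    project:  callable mapping one input vector to a vector of length out_dim
              (hypothetical stand-in for a learned linear layer).
    """
    if len(regs) > new_grid_w:
        raise ValueError(f"{len(regs)} registers do not fit in a row of width {new_grid_w}")
    projected = [project(r) for r in regs]
    for p in projected:
        if len(p) != out_dim:
            raise ValueError("projection must produce out_dim channels")
    # Remaining slots in the row are filled with zero vectors.
    pad = [[0.0] * out_dim for _ in range(new_grid_w - len(regs))]
    return projected + pad


def toy_project(vec):
    # Toy projection for illustration only: duplicates channels (2 -> 4).
    return list(vec) + list(vec)


regs = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
row = repack_register_row(regs, new_grid_w=14, out_dim=4, project=toy_project)
print(len(row), row[0], row[-1])
```

The numbers follow the test excerpt in this review (grid 28, four registers), so the halved row has 14 slots, 10 of which stay zero.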
Comment on lines +53 to +65

    def test_register_row_pad_is_zero(device) -> None:
        """Output register-row padding slots must remain zero after the projection."""
        B, in_dim, out_dim, grid, num_regs = 2, 32, 64, 28, 4
        pm = PatchMerging(
            in_dim=in_dim,
            out_dim=out_dim,
            grid_h=grid,
            grid_w=grid,
            norm_cfg=LazyConfig(RMSNorm)(dim=4 * in_dim, eps=1e-6, use_quack=False),
            num_registers=num_regs,
            has_register_row=True,
        ).to(device)
added 27 commits May 25, 2026 16:04
…er, mamba_nd as done
…ass docstrings with math context
…_conv1d): add module and class docstrings with math context
…al_conv1d as done
…_purpose_resnet,classification_resnet): add module and class docstrings with math/arch context
…l_purpose_resnet, classification_resnet as done
…id,qk_norm,quack_utils): add module/class docstrings
…trainer, default_cfg, lightning_wrappers, datamodules, and utils
…ics, utils, testing, and experiments as done
…e docstrings and expand MixupConfig/AugmentConfig
…documentation complete
…t, expand README docs section
The docs/reviews/ files were intermediate artifacts from the write→review→integrate docstring pipeline. Their content has been fully integrated into the source docstrings. They were being picked up by Sphinx (source_suffix includes .md) but not referenced in any toctree, generating 'document not in toctree' warnings — which fail the CI build with -W --keep-going. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
residual_block.py: GeneralPurposeResnet → ResidualNetwork and ClassificationResnet → ClassificationResNet in both See Also blocks. The old names were non-existent and would have rendered as broken links in the Sphinx API reference. vit5_residual_block.py: expand token layout description to include optional zero-padding tokens that ViT5ClassificationNet appends for Hyena blocks when _block_needs_padding is true. Both the module docstring and the forward() Args block now document the full layout: [patches, (CLS,) registers, (padding,)] with T % grid_w == 0 for padded blocks. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…es are Hyena short-conv configs
The module docstring claimed CausalConv1D was used in mamba_nd.py, but it is not imported there at all. The actual call sites are the Hyena short-conv configuration helpers:
- examples/spatial_recall_v2/mixer_defaults.py
- examples/spatial_recall_1d/mixer_defaults.py
Update the 'Use in …' section header and body to reflect reality.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…s div-by-zero Copilot flagged that the docstring implied drop_prob=1.0 would zero every sample cleanly, but worried the keep_prob=0 division would produce inf/NaN first. The implementation already has an explicit 'if keep_prob > 0.0' guard (line 67) that skips the rescaling division, so Bernoulli(0) produces an all-zero mask and x*0=0 with no numerical issue. Update the docstring to document this guarantee. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
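The guard described in this commit message can be illustrated with a small stochastic-depth sketch. It is a plain-Python approximation under assumptions, not the repository's `drop_path` implementation: the function name, the list-based samples, and the `rng` parameter are all hypothetical.

```python
import random


def drop_path(samples, drop_prob, training=True, rng=random):
    """Per-sample stochastic depth on lists of floats (illustrative sketch only)."""
    if not training or drop_prob == 0.0:
        return [list(s) for s in samples]
    keep_prob = 1.0 - drop_prob
    out = []
    for s in samples:
        # Bernoulli(keep_prob) draw; with keep_prob == 0 this is always 0.
        keep = 1.0 if rng.random() < keep_prob else 0.0
        # Guard: only rescale when keep_prob > 0, so there is no division by zero.
        scale = keep / keep_prob if keep_prob > 0.0 else keep
        out.append([x * scale for x in s])
    return out


print(drop_path([[1.0, 2.0], [3.0, 4.0]], drop_prob=1.0))
```

With `drop_prob=1.0` the mask is all zeros and the rescaling division is skipped, so every finite input maps to zero rather than to inf or NaN.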
…ored
CONVENTIONS.md documented that the pre-commit ruff hook catches D100, but pyproject.toml had D100 in the global ignore list, so both the pre-commit hook and the CI diff-check silently skipped missing module docstrings (ruff applies config-file ignores on top of CLI --select).
Fix:
- Remove D100 from pyproject.toml ignore so the hook matches the docs
- Add missing module docstring to nvsubquadratic/parallel/utils.py (the only file in scope that was missing one)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…forced
Four files were missing module-level docstrings that ruff D100 now catches (after removing D100 from the global ignore list):
- docs/conf.py — Sphinx configuration file
- examples/imagenet_diffusion/ccnn_jit_baseline.py — CCNN-Hyena JiT-B-matched diffusion baseline
- examples/imagenet_diffusion/hf_uvit_baseline.py — HuggingFace UViT diffusion baseline
- examples/imagenet_diffusion/jit_baseline.py — JiT-B flow-matching diffusion baseline
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Implements 2x2 spatial patch merging for hierarchical ViT-5/Hyena networks.
Key features:
- Pure-spatial layout: [B, H*W, C] -> [B, (H/2)*(W/2), out_dim]
- Register-row layout: passes the leading grid_w register tokens through a dedicated reg_proj Linear so FiLM conditioning survives the channel-dim change, then repacks them to width grid_w//2
- Post-concat norm (configurable via LazyConfig) + bias-free reduction Linear, both trunc_normal_ initialised (std=0.02)
- flop_count() helper for FLOP bookkeeping
- Full test suite covering shape, zero-pad, independent paths, error guards, and FLOP count
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
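For readers unfamiliar with the 2x2 merge itself, the gather-and-concatenate step (before the norm and the reduction Linear) might look like the sketch below. It uses plain lists for one batch element; the function name `merge_2x2` and the neighbour ordering (top-left, bottom-left, top-right, bottom-right, as in common Swin implementations) are assumptions and may not match the PR's code.

```python
def merge_2x2(tokens, grid_h, grid_w):
    """Concatenate each 2x2 neighbourhood of a row-major token grid.

    tokens: list of grid_h * grid_w vectors, each with C channels.
    Returns (grid_h // 2) * (grid_w // 2) vectors with 4 * C channels each.
    """
    if grid_h % 2 or grid_w % 2:
        raise ValueError("grid_h and grid_w must both be even")
    if len(tokens) != grid_h * grid_w:
        raise ValueError("token count does not match the grid")
    merged = []
    for i in range(0, grid_h, 2):
        for j in range(0, grid_w, 2):
            block = []
            # Assumed ordering of the four neighbours within each window.
            for di, dj in ((0, 0), (1, 0), (0, 1), (1, 1)):
                block.extend(tokens[(i + di) * grid_w + (j + dj)])
            merged.append(block)
    return merged


grid = [[float(k), float(k) + 0.5] for k in range(16)]  # 4x4 grid, C = 2
out = merge_2x2(grid, grid_h=4, grid_w=4)
print(len(out), len(out[0]))
```

In the module described above, the resulting 4·C-channel tokens would then pass through the post-concat norm and the bias-free reduction Linear to reach `out_dim`.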
Swin-style 4-stage hierarchical ViT-5 classifier with PatchMerging between stages.
Two readout layouts:
- pure: flat patch grid [B, H*W, C] + GAP
- register_row: FiLM register tokens prepended as first grid row at every stage; GAP excludes them
Key properties:
- Per-stage dims/depths fully configurable via LazyConfig
- flop_count() aggregates patch-embed + blocks + merges + head
- Backward tested: patch_embed and reg_proj grads both non-zero
ImageNet configs for 4-stage Swin-T-like Hyena hierarchy (p=4, dims [96,192,384,768], depths [2,2,6,2]) in both 'pure' and 'register_row' FiLM variants, plus CIFAR-10 patch and capacity ablation configs.
Mark patch_merging.py and vit5_hierarchical_classification.py as [x].
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
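To make the stage geometry concrete, the per-stage grid sizes and token counts for the quoted configuration (p=4, dims [96,192,384,768], depths [2,2,6,2]) can be tabulated as below. The 224×224 input resolution is an assumption, since the commit message does not state it, and `stage_grids` is a hypothetical helper.

```python
def stage_grids(image_size, patch_size, num_stages):
    """Grid side length at each stage, halving between stages (illustrative)."""
    grid = image_size // patch_size
    grids = []
    for stage in range(num_stages):
        grids.append(grid)
        if stage < num_stages - 1:
            if grid % 2:
                raise ValueError(f"stage {stage} grid {grid} is odd; cannot merge 2x2")
            grid //= 2
    return grids


dims = [96, 192, 384, 768]
depths = [2, 2, 6, 2]
grids = stage_grids(image_size=224, patch_size=4, num_stages=len(dims))  # 224 is assumed

for stage, (g, d, n) in enumerate(zip(grids, dims, depths)):
    pure_tokens = g * g
    register_row_tokens = g * g + g  # one extra grid row holds the registers
    print(f"stage {stage}: grid {g}x{g}, dim {d}, depth {n}, "
          f"pure {pure_tokens}, register_row {register_row_tokens}")
```

Under that assumed resolution the grid shrinks from 56 to 7 across the four stages, and each halving stays even until the last stage, so the even-grid guard is never triggered.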
…r; align ImageNet configs with vit5_hybrid reference
## CIFAR-10 subfolder
Move all CIFAR-10 experiment files into examples/vit5_imagenet/v6_hierarchical/cifar10/:
_cifar10_patch_ablation_base.py → cifar10/_base.py
cifar10_{flat,hier}_p{4,8,16}.py → cifar10/{flat,hier}_p{4,8,16}.py
cifar10_hyena_{hier,flat}.py → cifar10/hyena_{hier,flat}.py
Leaf configs import updated to reference cifar10._base.
## _base_config.py — align with vit5_hybrid/_film.py + _learnable_omega.py
- Mask: swap torch.nn.Identity for BlockAlignedGaussianModulationND
(data_dim=2, extent=1.0, direct parametrization) matching
apply_learnable_omega_blockdiag_overrides in the reference config.
- FiLM constants: add FILM_INIT_TYPE='identity', FILM_WEIGHT_DECAY=5e-3,
FILM_AFTER_POS_EMBED=True matching vit5_hybrid/_film.py best-run defaults.
- KernelFiLMGenerator: add init_type + no_weight_decay; bump num_film_layers
from KERNEL_NUM_LAYERS-1 to KERNEL_NUM_LAYERS (3) since film_after_pos_embed
adds one extra pair for the positional-embedding sine.
- _siren_kernel_cfg: set cfg.film_after_pos_embed=True when film_cfg provided.
## hyena_hier_p4_{pure,film}.py — match full_hyena_learnable_omega_blockdiag.py
- Append MaskMonitorCallback + OmegaScaleMonitorCallback (log every 50 steps).
- Set distinct wandb.job_group: 'v6_hier_pure' / 'v6_hier_film'.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
59a29ec to
20e3be7
…far10 configs
The v6_hierarchical CIFAR-10 example configs added in this PR import `experiments.datamodules.cifar10.CIFAR10DataModule` but the file was local-only. Add it to the PR with docstrings on all public methods so ruff D102/D107 pass under the new docstring CI. Also apply mdformat to docs-tracker.md to satisfy the pre-commit hook (column-width normalization).
farhadrgh added a commit that referenced this pull request May 26, 2026
…t__s
PR B of the documentation follow-up: bring docs-tracker.md to an honest
state so every file under nvsubquadratic/ and experiments/ is either an
[x] or a documented exclusion.
Tracker rows resolved (closed [ ] -> [x]):
- huggingface_diffusers.py — expanded module docstring (DiT/UVit adapter
contract, BHL↔BCHW translation, dtype handling, shared timestep state
monkey-patch registration). Per-class docstrings for the four public
classes.
- jit_utils.py — expanded module docstring linking to the upstream JiT
repo and enumerating helpers (VisionRotaryEmbedding{,Fast}, RMSNorm,
sin-cos PE). License header left untouched per repo policy on this
branch.
- jit.py — already had per-class docstrings; no change.
Tracker rows still [ ] (PR pointer):
- patch_merging.py / vit5_hierarchical_classification.py — annotate the
pending row with #122 so reviewers can track when this flips.
Tracker rows newly added (missing entries):
- baselines/unet_convnext.py and unet_convnext_v2.py (both already had
good module docstrings).
- parallel/utils.py — CP comm utilities.
- parallel/test_a2a_comms.py — kept in place with a tracker note;
moving to tests/ would expand this PR's scope.
- datamodules/emnist.py, datamodules/pde/well.py,
datamodules/utils/dali_rand_augment.py.
Package __init__.py tidy-ups (license headers untouched):
- nvsubquadratic/__init__.py — drop dead `# TODO: Import …` block and
`# TODO: Add main exports` placeholders in __all__.
- nvsubquadratic/modules/__init__.py — add a one-line module docstring.
- nvsubquadratic/ops/__init__.py — add a one-line module docstring.
Drive-by: AutoregressiveWrapper docstring formatting (`.. todo::` block
collapsed to one paragraph so the strict Sphinx build is clean).
Verification:
- `ruff check --select D100,D101,D102,D103,D301,D417` clean on the three
__init__.py files and the touched network files.
- `make -C docs html ... SPHINXOPTS=-W --keep-going` clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
farhadrgh added a commit that referenced this pull request May 27, 2026
…ase (#123)
* docs(write/sequence_mixer): add module and class docstrings
* docs(write/hyena_nd): add module and class docstrings with math context
* docs(review/sequence_mixer): reviewer feedback
* docs(review/hyena_nd): reviewer feedback
* docs(integrate/sequence_mixer): apply reviewer feedback
* docs(write/mixed_fftconv): add module and function docstrings
* docs(write/kernels_nd): add module and class docstrings with math context
* docs(review/mixed_fftconv): reviewer feedback
* docs(review/kernels_nd): reviewer feedback
* docs(integrate/mixed_fftconv): apply reviewer feedback
* docs(integrate/kernels_nd): apply reviewer feedback
* docs(tracker): mark mixed_fftconv, hyena_nd, kernels_nd, sequence_mixer as done
* docs(write/patchify): add module and class docstrings
* docs(write/residual_block): add module and class docstrings
* docs(write/film): add module and class docstrings with math context
* docs(review/residual_block): reviewer feedback
* docs(review/film): reviewer feedback
* docs(review/patchify): reviewer feedback
* docs(integrate/residual_block): apply reviewer feedback
* docs(integrate/patchify): apply reviewer feedback
* docs(write/ckconv_nd): add module and class docstrings with math context
* docs(review/ckconv_nd): reviewer feedback
* docs(integrate/ckconv_nd): apply reviewer feedback
* docs(integrate/film): apply reviewer feedback (from worktree)
* docs(tracker): mark ckconv_nd, residual_block, patchify, film as done
* docs(write/position_encoding): add module and class docstrings with math context
* docs(review/position_encoding): reviewer feedback
* docs(write/attention): add module and class docstrings with math context
* docs(integrate/position_encoding): apply reviewer feedback
* docs(review/attention): reviewer feedback
* docs(integrate/attention): apply reviewer feedback
* docs(integrate/vit5_residual_block+ckconv_multihead_nd): apply reviewer feedback
* docs(tracker): mark attention, position_encoding, ckconv_multihead_nd, vit5_residual_block as done
* docs(write/vit5_hyena_adapter): add module and class docstrings
* docs(write/condition_mixer): add module and class docstrings
* docs(write/mamba_nd): add module and class docstrings with math context
* docs(review/condition_mixer): reviewer feedback
* docs(integrate/condition_mixer): apply reviewer feedback
* docs(write/vit5_attention): add module and class docstrings
* docs(write/vit5_hyena_adapter): add module and class docstrings
* docs(write/vit5_attention): add module and class docstrings
* docs(review/vit5_hyena_adapter): reviewer feedback
* docs(review/vit5_attention): reviewer feedback
* docs(integrate/vit5_hyena_adapter): apply reviewer feedback
* docs(write/mamba_nd): add module and class docstrings with math context
* docs(review/mamba_nd): reviewer feedback
* docs(integrate/mamba_nd): apply reviewer feedback
* docs(integrate/vit5_attention): apply reviewer feedback
* docs(tracker): mark vit5_attention, vit5_hyena_adapter, condition_mixer, mamba_nd as done
* docs(write+integrate/mlp,grn,layer_scale,masks_nd): add module and class docstrings with math context
* docs(tracker): mark mlp, grn, layer_scale, masks_nd as done
* docs(write+integrate/rms_norm,rms_norm_channel_first,drop_path,causal_conv1d): add module and class docstrings with math context
* docs(tracker): mark rms_norm, rms_norm_channel_first, drop_path, causal_conv1d as done
* docs(write+integrate/schedulers,distributed_depthwise_conv_nd,general_purpose_resnet,classification_resnet): add module and class docstrings with math/arch context
* docs(tracker): mark schedulers, distributed_depthwise_conv_nd, general_purpose_resnet, classification_resnet as done
* docs(write+integrate/vit5_classification,a2a_comms,lazy_config,cleanfid,qk_norm,quack_utils): add module/class docstrings
* docs(write+integrate/experiments): add module docstrings across run, trainer, default_cfg, lightning_wrappers, datamodules, and utils
* docs(tracker): mark vit5_classification, a2a_comms, lazy_config, metrics, utils, testing, and experiments as done
* docs(write+integrate/callbacks,ucf101,dali_imagenet_fused): add module docstrings and expand MixupConfig/AugmentConfig
* docs(tracker): mark callbacks, ucf101, dali_imagenet_fused as done — documentation complete
* docs: add CONVENTIONS.md with docstring style guide and PR enforcement strategy
* docs: add docstring CI workflow, extend PR template with doc checklist, expand README docs section
* docs: remove stale review artifacts causing Sphinx warnings
The docs/reviews/ files were intermediate artifacts from the write→review→integrate docstring pipeline. Their content has been fully integrated into the source docstrings. They were being picked up by Sphinx (source_suffix includes .md) but not referenced in any toctree, generating 'document not in toctree' warnings — which fail the CI build with -W --keep-going.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* docs: fix Sphinx cross-ref class names and ViT-5 token layout doc
residual_block.py: GeneralPurposeResnet → ResidualNetwork and ClassificationResnet → ClassificationResNet in both See Also blocks. The old names were non-existent and would have rendered as broken links in the Sphinx API reference.
vit5_residual_block.py: expand token layout description to include optional zero-padding tokens that ViT5ClassificationNet appends for Hyena blocks when _block_needs_padding is true. Both the module docstring and the forward() Args block now document the full layout: [patches, (CLS,) registers, (padding,)] with T % grid_w == 0 for padded blocks.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* docs(causal_conv1d): fix incorrect Mamba usage note — actual call sites are Hyena short-conv configs
The module docstring claimed CausalConv1D was used in mamba_nd.py, but it is not imported there at all.
The actual call sites are the Hyena short-conv configuration helpers: examples/spatial_recall_v2/mixer_defaults.py examples/spatial_recall_1d/mixer_defaults.py Update the 'Use in …' section header and body to reflect reality. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * docs(drop_path): clarify drop_prob=1.0 is safe — implementation guards div-by-zero Copilot flagged that the docstring implied drop_prob=1.0 would zero every sample cleanly, but worried the keep_prob=0 division would produce inf/NaN first. The implementation already has an explicit 'if keep_prob > 0.0' guard (line 67) that skips the rescaling division, so Bernoulli(0) produces an all-zero mask and x*0=0 with no numerical issue. Update the docstring to document this guarantee. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(ruff): enforce D100 (missing module docstring) — was silently ignored CONVENTIONS.md documented that the pre-commit ruff hook catches D100, but pyproject.toml had D100 in the global ignore list, so both the pre-commit hook and the CI diff-check silently skipped missing module docstrings (ruff applies config-file ignores on top of CLI --select). 
Fix: - Remove D100 from pyproject.toml ignore so the hook matches the docs - Add missing module docstring to nvsubquadratic/parallel/utils.py (the only file in scope that was missing one) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * docs: add missing module docstrings to satisfy D100 now that it is enforced Four files were missing module-level docstrings that ruff D100 now catches (after removing D100 from the global ignore list): - docs/conf.py — Sphinx configuration file - examples/imagenet_diffusion/ccnn_jit_baseline.py — CCNN-Hyena JiT-B-matched diffusion baseline - examples/imagenet_diffusion/hf_uvit_baseline.py — HuggingFace UViT diffusion baseline - examples/imagenet_diffusion/jit_baseline.py — JiT-B flow-matching diffusion baseline Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(layer_scale): revert init_values → init_value rename that broke callers The docs pass accidentally renamed the public API parameter from init_value to init_values. ViT5ResidualBlock and all tests call LayerScale(..., init_value=...), so every test that constructs a residual block with layer_scale_init > 0 crashed with: TypeError: LayerScale.__init__() got an unexpected keyword argument 'init_value' Restore the original parameter name and update the docstring to match. 
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * docs: expand api.rst to cover full PR surface; fix new docstring warnings Surface every nvsubquadratic module this PR documented in the Sphinx site: - Ops: add mixed-precision FFT conv section - Modules: add Kernels & filters (SIREN/RFF/masks), Normalization (RMSNorm, GRN, LayerScale), Position encoding & patching, Gating & conditioning (FiLM, DropPath, QKVConditionMixer), Residual blocks, Schedulers; expand Mixers and Convolutions - Top-level: add Networks, Parallel, Utilities, Metrics sections Fix the 27 new docstring warnings surfaced by the wider autodoc coverage, all in modules the docstring-rewrite PR also touched: - kernels_nd: convert comma-grouped Args lists to per-arg entries and a prose paragraph for inherited args; fix `*spatial` emphasis in SIRENKernelND.forward; remove indented production-defaults block in BlockDiagonalMultiOmegaSIRENKernelND - ckconv_nd, film: rewrite flop_count docstrings to use proper RST bullets and inline code instead of indented continuation lines - rms_norm_channel_first: drop the manual `channels_first` Attributes entry that duplicated the auto-discovered class attribute - vit5_residual_block: collapse the multi-line inline-literal `T = ...` expression onto a single bullet description Build is strict-clean under `SPHINXOPTS=-W --keep-going`. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: split api.rst into per-area sub-pages; mock cleanfid Reorganise the API reference from a single 19-section page into five sub-pages, each with its own sidebar TOC entry: - docs/api/ops.rst — 7 ops sections (fp32 ref, CUDA, circular, multi-head, chunking, mixed precision, direct 1D causal) - docs/api/modules.rst — 8 module sections (mixers, convs, kernels, norms, position/patching, gating, residual blocks, schedulers) - docs/api/networks.rst — end-to-end classification networks - docs/api/parallel.rst — context-parallel comm primitives - docs/api/utilities.rst — QK-norm, RoPE, metrics The top-level docs/api.rst becomes a thin landing index with a maxdepth-2 toctree pointing at the five sub-pages. Also mock `cleanfid` in autodoc_mock_imports — the external `cleanfid` package isn't installed on the docs runner; its import in nvsubquadratic.metrics.cleanfid was failing under strict-mode CI. Add docs/api/generated/ to .gitignore so the per-sub-page autosummary stubs don't get tracked. Strict build (`-W --keep-going`) succeeds locally. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: collapse sidebar to top-level pages; drop redundant pdoc guide Theme tweaks in docs/conf.py: navigation_depth = 2 — primary sidebar stops at api/ops, api/modules, etc. The flat list of autosummary-generated function/class stubs no longer clutters the left sidebar; per-page H2 group headers stay accessible via the right "On this page" panel. show_nav_level = 1 show_toc_level = 2 README cleanup: drop the pdoc-and-IDE-hover docs viewing options. Sphinx is the canonical site (every PR-documented module is now in docs/api/*.rst), so pdoc was redundant. Replace with a one-line note that IDE hover / help() also work since docstrings are inline. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(examples): co-locate fid_stats with imagenet_diffusion configs The two FID stats .npz files are only consumed by examples in examples/imagenet_diffusion/ and by scripts/generate_jit_fid_stats.py. Move them under the example directory so the data sits next to the only thing that uses it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(api): rename docs/api → docs/api_reference; add core + experiments PR A of the documentation follow-up: surface every documented module in the rendered Sphinx site without rewriting any docstrings. Layout: docs/api_reference/index.rst — toctree landing docs/api_reference/ops.rst — (renamed from docs/api/ops.rst) docs/api_reference/modules.rst — (renamed) docs/api_reference/networks.rst — (renamed) docs/api_reference/parallel.rst — (renamed) docs/api_reference/core.rst — NEW: lazy_config, metrics, utils (qk_norm, rope, quack), testing helpers — supersedes the old api/utilities.rst, deleted here docs/api_reference/experiments.rst — NEW: experiments.run, .trainer, .default_cfg dataclasses, all Lightning wrappers, callbacks, datamodules, utils docs/index.rst now points the toctree at `api_reference/index`; the thin `docs/api.rst` stub is gone (one landing page, not two). CI infra adjustments needed to surface the experiments package: * autodoc_mock_imports gains pytorch_lightning, lightning, matplotlib, PIL, datasets, h5py, scipy, the_well, torch_fidelity, torchmetrics, torchvision, timm, wandb, rich, tqdm — pure-Python deps the docs runner skips to stay lean. Also lift `nvidia` to a top-level mock so any future `nvidia.*` sub-import is covered. * sphinx.ext.todo enabled with `todo_include_todos = True` so `.. todo::` blocks in docstrings render rather than erroring out. 
* .gitignore: add `docs/build/` (legacy artifact path) and `docs/api_reference/generated/` (per-sub-page autosummary stubs); whitelist `docs/api_reference/core.*` so the doc page isn't caught by the repo-wide `core.*` core-dump pattern. Tighten `_rewrite_repo_links` in conf.py with a comment explaining the regex's anchoring so future edits don't accidentally rewrite intra-docs links. Drive-by fix: the `AutoregressiveWrapper.__doc__` had an indented bullet list under a `.. todo::` directive that RST mis-parsed. Convert to a single-paragraph note so the strict build is clean. No docstrings were rewritten in this PR. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(tracker): close open rows + add absent files; tidy package __init__s PR B of the documentation follow-up: bring docs-tracker.md to an honest state so every file under nvsubquadratic/ and experiments/ is either an [x] or a documented exclusion. Tracker rows resolved (closed [ ] -> [x]): - huggingface_diffusers.py — expanded module docstring (DiT/UVit adapter contract, BHL↔BCHW translation, dtype handling, shared timestep state monkey-patch registration). Per-class docstrings for the four public classes. - jit_utils.py — expanded module docstring linking to the upstream JiT repo and enumerating helpers (VisionRotaryEmbedding{,Fast}, RMSNorm, sin-cos PE). License header left untouched per repo policy on this branch. - jit.py — already had per-class docstrings; no change. Tracker rows still [ ] (PR pointer): - patch_merging.py / vit5_hierarchical_classification.py — annotate the pending row with #122 so reviewers can track when this flips. Tracker rows newly added (missing entries): - baselines/unet_convnext.py and unet_convnext_v2.py (both already had good module docstrings). - parallel/utils.py — CP comm utilities. - parallel/test_a2a_comms.py — kept in place with a tracker note; moving to tests/ would expand this PR's scope. 
- datamodules/emnist.py, datamodules/pde/well.py, datamodules/utils/dali_rand_augment.py. Package __init__.py tidy-ups (license headers untouched): - nvsubquadratic/__init__.py — drop dead `# TODO: Import …` block and `# TODO: Add main exports` placeholders in __all__. - nvsubquadratic/modules/__init__.py — add a one-line module docstring. - nvsubquadratic/ops/__init__.py — add a one-line module docstring. Drive-by: AutoregressiveWrapper docstring formatting (`.. todo::` block collapsed to one paragraph so the strict Sphinx build is clean). Verification: - `ruff check --select D100,D101,D102,D103,D301,D417` clean on the three __init__.py files and the touched network files. - `make -C docs html ... SPHINXOPTS=-W --keep-going` clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: narrative pages — getting started, architecture, examples, benchmarks PR C of the documentation follow-up: a new reader landing on the site can install the library, see the three-layer architecture, find an example for their task, and read the throughput numbers — all without leaving the docs. New pages (toctree'd into docs/index.rst above the API Reference): - docs/getting_started.md — distilled install matrix (links the README for long-form), CUDA/GPU requirements, and a minimal "Hello, Hyena" snippet that ends in a working `fftconv2d_fp32_bhl(x, kernel)` forward pass. - docs/architecture.md — ASCII diagram of the nvsubquadratic / subquadratic-ops / megatron-core layering, what each layer owns, the BHL/BLH/`_w_reshape`/`_chunked`/fp16 naming conventions, the QKVSequenceMixer operator-agnostic dispatch story, and the LazyConfig system. - docs/examples/index.md — one paragraph per top-level recipe under examples/ (classification, diffusion, spatial recall, benchmarks, scientific). Links to each example's README or primary config and to examples/overview_tracker.md for the active roadmap. 
- docs/benchmarks.md — FLOP-scaling plot (symlinked into _static/ so the file stays single-source under benchmarks/) and an MyST `{include}` of benchmarks/README.md for the ViT-5-Small throughput tables. Links out to benchmarks/ops/FP16_FFTCONV_RESULTS.md. Scope note in docs-tracker.md updated to acknowledge that docs/ narrative pages are now in scope on this branch (read-only links to the README and examples/overview_tracker.md — they're not duplicated). Verification: - `make -C docs html ... SPHINXOPTS=-W --keep-going` clean with the new pages, the `{include}` resolves, and the symlinked PNG is copied into _build/. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(api): surface huggingface/jit/baselines/fp16 ops Close the remaining acceptance-check gaps from PR A's deferred-by-design list: api_reference/networks.rst: - Diffusion — Hugging Face adapters section (HF DiT/UVit configs and wrappers). - Diffusion — JiT backbone section (JiT, JiTBlock, the 8 model classes, the 7 JiT_* factory functions, plus jit_utils helpers). - Baselines section (UNet-ConvNeXt v1/v2 + their Well-task wrappers). api_reference/ops.rst: - FFT convolutions (fp16) — half-precision linear-conv variants (BHL + _w_reshape + _chunked, 12 functions). - Circular FFT convolutions (fp16) — periodic-boundary fp16 variants (6 functions). Link the FP16 derivation page from the section intro for the dual-mean-centering background. Drive-by: collapse `[B, C, *spatial]` -> ``[B, C, *spatial]`` in UNetConvNext.forward and UNetConvNextV2.forward docstrings so the `*spatial` token doesn't get parsed as an RST emphasis open. Final acceptance: 13 -> 6 tracked-but-not-in-API entries, and all six remaining are correctly excluded (4 are still `[ ]` in the tracker, one is a test file, one is an internal helper). 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(docs): mock diffusers for autodoc nvsubquadratic.networks.huggingface_diffusers imports DiTTransformer2DModel and UVit2DModel from `diffusers` at module load. diffusers is a runtime dep (installed in the conda env) but isn't pulled in on the docs runner under `pip install -e . --no-deps`, so autodoc fails to import the module when the api_reference/networks.rst page references it. Adding diffusers to autodoc_mock_imports lets autodoc resolve the dotted references (`diffusers.models.DiTTransformer2DModel`, etc.) via the mock attribute chain. Verified locally with `pip uninstall diffusers` then strict build: `make -C docs html ... SPHINXOPTS=-W --keep-going` succeeds. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(repo-organization): benchmarks + scripts/visualization READMEs; … (#124) * docs(repo-organization): benchmarks + scripts/visualization READMEs; migrate reports Brings benchmarks/, scripts/visualization/, and the top-level visualizations/ directory onto the same documentation footing as the library code, and migrates point-in-time profiling artefacts into the reports/ framework where they belong. Tasks 1–2: benchmarks/ - New per-subdirectory READMEs: benchmarks/ops/, vit5_imagenet/, well/. - Module-level docstrings (4-question format) on every previously-short benchmark script. - One-line driver-header comments on the three SLURM .sh runners. - ruff D100 is clean across benchmarks/. Task 3: scripts/visualization/ - README documents the Streamlit (.json) vs Gradio (.npz) kernel-viewer divergence and when to reach for each. - visualize_patch_size_throughput.py moved into the visualization/ subdir; usage example in its docstring updated. - Stale `scripts/visualize_kernels*.py` usage paths in the two kernel viewers' docstrings corrected to `scripts/visualization/...`. 
Task 4: visualizations/ -> reports/spatial_recall/ - 10 PNGs migrated via `git mv` (history preserved). - New REPORT.md narrates each 1D / 2D / 3D task, links each figure to its `examples/spatial_recall_*` config, and documents the regeneration command. - Top-level visualizations/ directory removed. Task 5: dated profile artefacts -> reports/vit5_imagenet_dataloader_profiling/ - Seven files migrated via `git mv` (Day 1 + Day 2 .jsonl runs and the two profilers + their SLURM drivers). - New REPORT.md folds the old `dataloader_profile_2026-02-25.md` prose into a two-day write-up that also covers the Day 2 `step_breakdown_2026-02-26.jsonl` (instrumented step + GPU-event view + theoretical-min gap). - Pointer line added in benchmarks/vit5_imagenet/README.md. - Old .md removed (content now lives in REPORT.md). Task 6: docs/ link - docs/reports.md standalone landing page indexing every topic by absolute GitHub blob URL (the `{include}` fallback path — the index-table links in reports/README.md use bare relative paths that don't resolve under MyST cross-references). - Added to the docs/index.rst toctree above Ops Overview. - Strict Sphinx build (`-W --keep-going`) clean. Tracker: - docs-tracker.md Scope extended: benchmarks/, scripts/visualization/, and reports/ are now in scope at a lighter bar (module docstrings + per-subdir README; no Sphinx API entry). - Three new progress tables (benchmarks/, scripts/visualization/, reports/) with a row per file. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(repo): move root slurm/ -> scripts/slurm/ Consolidate SLURM submit / driver scripts under one tree. The root slurm/ directory held a mix of per-experiment submit scripts and cluster-specific helpers that predate the portable wrapper introduced in PR #113. Move them all under scripts/slurm/ so there's a single place for SLURM-related scripts. 
Name-collision resolution: slurm/submit.sh -> scripts/slurm/submit_imagenet64_diff.sh The root slurm/submit.sh (2026-03-16) was a one-off ImageNet-64 diffusion submit script with hardcoded ``--account=healthcareeng_research``, 4 nodes, 4 h time limit, and the ``imagenet64.n4`` singleton job name. scripts/slurm/submit.sh (2026-05-18, PR #113) is the new portable wrapper that auto-detects project root and reads cluster.env for per-cluster overrides. Both are kept; the older one is renamed to its actual job-name semantics. No other collisions — the rest of the root slurm/ tree (queue.sh, queue_well.sh, submit_hybrid*.sh, submit_in1k_*.sh, submit_well.sh, download_well.sh, tg_download_well.sh, the diffusion/ subdir, and enroot/build_sqsh.sh) lands flat under scripts/slurm/. All moves via ``git mv`` so ``git log --follow`` traces history through the rename. Updated documentation references to the new paths in: - README.md (Enroot section: scripts/slurm/enroot/build_sqsh.sh). - examples/imagenet_diffusion/README.md (sbatch path + SLURM-scripts list). - examples/vit5_imagenet/vit5_hybrid/plan.md (historical retrospective; bulk-rewrite of all script paths). - examples/well/README.md, examples/well/v{1,2}/TRACKER.md (sbatch scripts/slurm/download_well.sh ...). - reports/ckconv_block_diagonal_kernel/REPORT.md (submit_hybrid.sh paths in the run commands). ``#SBATCH --output=slurm/%x_%j.out`` lines in the per-experiment sbatch scripts are left alone — that's a runtime log-directory convention, not a source-tree reference; sbatch creates the directory when the job runs. Strict Sphinx build (``-W --keep-going``) clean after the move. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(repo): move benchmark scripts from scripts/ to benchmarks/ Address the scripts/ vs benchmarks/ overlap: scripts/ was a catch-all that included four benchmark-shaped files, one a stale duplicate. 
scripts/benchmark_imagenet_diffusion_gpu.py -> (deleted — duplicate) scripts/benchmark_imagenet_throughput.py -> benchmarks/vit5_imagenet/ scripts/benchmark_patch_size_2d.py -> benchmarks/ scripts/profile_batch_size.py -> benchmarks/well/ (supernova_explosion_64 is a WELL sub-dataset) Each move via `git mv` so `git log --follow` traces through the rename. All four carry the 4-question module docstring format (three already did; profile_batch_size.py expanded here). References updated: - scripts/visualization/README.md (pointer to benchmark_patch_size_2d.py). - benchmarks/vit5_imagenet/README.md (new section for benchmark_imagenet_throughput.py). - benchmarks/well/README.md (bullet for profile_batch_size.py). - docs-tracker.md (three new `[x]` rows under benchmarks/). After this change scripts/ holds only utility / glue scripts (data prep, GPU/license sanity, kernel viewers, SLURM submit) and benchmarks/ is the single home for performance measurement. ruff D100 clean on benchmarks/; strict Sphinx build clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(tests): move test_a2a_comms into tests/parallel Tests should live under the canonical tests/ tree, not inside the library package. Move nvsubquadratic/parallel/test_a2a_comms.py into tests/parallel/test_a2a_comms.py (with a new __init__.py) so collection happens through the standard tests/ root. Updated references: - docs-tracker.md: the tracker note for `test_a2a_comms.py` now points at the new path instead of explaining why it lived next to its target. - tests/README.md: added a `parallel/` row to the directory diagram. License headers on this file and on tests/conftest.py left untouched per repo policy on this branch. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(scripts): co-locate jit fid stats with other data prep scripts/generate_jit_fid_stats.py lived at the top of scripts/ but is purely a data-prep script — it pre-computes FID reference statistics on ImageNet for the JiT diffusion eval path. Other data-prep helpers (extract_imagenet_to_folder.py, compute_imagenet_stats.py, stage_imagenet.sh, …) already live under scripts/data/, so move this one in too. scripts/generate_jit_fid_stats.py -> scripts/data/generate_jit_fid_stats.py Move via `git mv` so `git log --follow` traces the rename. The file had no module docstring; added one in the 4-question format (what / hardware / how to invoke / where output goes) referencing the new path. `grep -rn` confirms no other tracked files referenced the old path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(benchmarks): flatten vit5_imagenet/scripts/ runner directory benchmarks/vit5_imagenet/scripts/ held only three SLURM drivers (bench_compile.sh, bench_optimized.sh, bench_profile.sh) for the matching bench_vit5_*.py files in the parent directory. An extra one-file-deep subdirectory adds nothing — move them up next to the .py files they invoke and drop the empty scripts/ subdir. benchmarks/vit5_imagenet/scripts/bench_compile.sh -> ../bench_compile.sh benchmarks/vit5_imagenet/scripts/bench_optimized.sh -> ../bench_optimized.sh benchmarks/vit5_imagenet/scripts/bench_profile.sh -> ../bench_profile.sh Each .sh runner invokes its target via the repo-root path ``PYTHONPATH=. python benchmarks/vit5_imagenet/bench_vit5_*.py``, so no in-file path changes are needed. Updated references: - benchmarks/README.md: sbatch invocations now point at the flat path. - benchmarks/vit5_imagenet/README.md: SLURM-scripts section reflects the new layout. - docs-tracker.md: the tracker row globs updated from `scripts/bench_*.sh` to `bench_*.sh`. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: add package overview / library tour Bottom-up tour of nvsubquadratic/ between Architecture and Examples in the docs site: one short paragraph each for ops/, modules/, networks/, parallel/, and the supporting utils/metrics/testing/ lazy_config layer. Opens with the bottom-up organising principle and closes with a "Where to go next" pointer at the API reference, Examples, and the Ops Overview math primer. docs/index.rst: - "Where to go next" bullet list gains a Package Overview entry. - Toctree gains `Package Overview <package_overview>` between Architecture and Examples. Strict Sphinx build (`-W --keep-going`) clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: rewrite package overview around a directory tree; tracker follow-ups Two related changes: 1. Package overview: revise docs/package_overview.md to lead with an actual `nvsubquadratic/` directory tree (one line per .py with a short purpose tag) and then narrate each area underneath. The previous version was a wall of cross-references with no layout signal — the tree gives readers a fast scan of what lives where before they read the prose. 2. docs-tracker.md follow-ups: - Scope sentence enumerates the docs/ narrative pages explicitly ("Getting Started, Architecture, Package Overview, Examples, Benchmarks, Reports") rather than leaving Package Overview implicit. - parallel/ table loses the `test_a2a_comms.py` row (the file has moved to tests/parallel/, which is out of tracker scope) and gains a one-line pointer underneath so the move is documented without leaving a misleading row in the parallel/ table. Strict Sphinx build (`-W --keep-going`) clean. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: replace package overview with repository overview The previous "Package Overview" page only mapped `nvsubquadratic/`, which left every other top-level directory (experiments/, examples/, benchmarks/, reports/, scripts/, tests/, docs/) invisible to a new reader. Promote the page to a true repository overview: docs/package_overview.md -> docs/repository_overview.md (git mv) Content reorganised around the repo tree: - ASCII tree of the repo root (every top-level dir + the notable files: README, pyproject.toml, Dockerfile, CONVENTIONS.md, docs-tracker.md, setup_conda_env.sh, nvsubquadratic.def). - Second tree drills into `nvsubquadratic/` itself (carried over from the previous page — that's still useful, just no longer the whole page). - "What each top-level directory does" paragraph per area (nvsubquadratic / experiments / examples / benchmarks / reports / scripts / tests / docs) and what makes each distinct. - Closes with the "Where to go next" pointer. docs/index.rst: - Toctree entry "Package Overview <package_overview>" renamed to "Repository Overview <repository_overview>". - "Where to go next" bullet copy reflects the broader scope. docs-tracker.md: - Scope sentence enumerates Repository Overview alongside the other narrative pages. Strict Sphinx build (`-W --keep-going`) clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: link CONVENTIONS.md + docs-tracker.md from the landing page Earlier in this session I prototyped a docs/conventions.md page that {include}-d the root CONVENTIONS.md so the docstring guide would become searchable from the docs site. Replaced that approach with a simpler link — no extra docs page, no GitHub-web rendering wrinkle where the {include} directive shows up as raw markdown. Add a small "Contributor docs" section to docs/index.rst with absolute GitHub URLs to both root files (CONVENTIONS.md and docs-tracker.md). 
Source of truth stays at the root files; the docs site just points at them.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: David Wessels <dwessel@hipster-l1.science.uva.nl>
Co-authored-by: David Wessels <dwessel@hipster-l2.science.uva.nl>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Farhad Ramezanghorbani <farhadr@nvidia.com>
Co-authored-by: Farhad Ramezanghorbani <farhadrgh@users.noreply.github.com>
Summary

Adds `PatchMerging`, `ViT5HierarchicalClassificationNet`, and the full `v6_hierarchical` example suite to the repo, along with docs-tracker updates.

Changes

`nvsubquadratic/modules/patch_merging.py` (NEW)

Swin-style 2×2 spatial patch merging for hierarchical ViT-5/Hyena networks.

- Pure-spatial layout: `[B, H·W, C]` → `[B, (H/2)·(W/2), out_dim]`
- Register-row layout: the first `grid_w` tokens are a register row; register tokens pass through a dedicated `reg_proj` Linear (independent of the patch path) and are repacked to width `grid_w // 2`; padding slots come from a non-persistent zero buffer
- (... `LazyConfig`) + bias-free `reduction` Linear, `trunc_normal_(std=0.02)` init
- `flop_count()` helper

`tests/modules/test_patch_merging.py` (NEW)

7 tests: output shapes (parametrised), zero-pad correctness, independent reg/patch paths, error guards (odd grid, over-large `num_registers`), FLOP count sanity.
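The 2×2 merge and the token counts described for `PatchMerging` can be illustrated with a small pure-Python sketch. It is not the module's code: `gather_2x2` and `merged_token_count` are hypothetical helper names, the neighbour ordering follows the Swin reference implementation and is an assumption about this PR, and the norm and `reduction` Linear that follow the gather are omitted.

```python
def merged_token_count(grid_h, grid_w, register_row=False):
    """Token count after one 2x2 merge, following the shapes in the PR text."""
    if grid_h % 2 or grid_w % 2:
        raise ValueError("grid_h and grid_w must both be even")
    patches = (grid_h // 2) * (grid_w // 2)
    # In the register-row layout the register row is repacked to grid_w // 2.
    return patches + (grid_w // 2 if register_row else 0)


def gather_2x2(tokens, grid_h, grid_w):
    """Concatenate the channels of every 2x2 neighbourhood.

    tokens: row-major list of grid_h * grid_w channel vectors.
    Returns (grid_h // 2) * (grid_w // 2) vectors of length 4 * C.
    Neighbour order (assumed, Swin-style): top-left, bottom-left,
    top-right, bottom-right.
    """
    if len(tokens) != grid_h * grid_w:
        raise ValueError("token count does not match the grid")
    if grid_h % 2 or grid_w % 2:
        raise ValueError("grid_h and grid_w must both be even")
    out = []
    for i in range(0, grid_h, 2):
        for j in range(0, grid_w, 2):
            merged = []
            for di, dj in ((0, 0), (1, 0), (0, 1), (1, 1)):
                merged.extend(tokens[(i + di) * grid_w + (j + dj)])
            out.append(merged)
    return out


# 4x4 grid with C = 1: token k carries the single channel value k.
tokens = [[float(k)] for k in range(16)]
merged = gather_2x2(tokens, 4, 4)
print(len(merged), merged[0])  # 4 [0.0, 4.0, 1.0, 5.0]
print(merged_token_count(4, 4), merged_token_count(4, 4, register_row=True))  # 4 6
```

In the real module the gathered `4·C` vector would then be normalised and projected to `out_dim`; the sketch stops at the gather so that the shape bookkeeping stays easy to check by hand.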
`nvsubquadratic/networks/vit5_hierarchical_classification.py` (NEW)

Swin-style 4-stage hierarchical ViT-5 classifier built on `PatchMerging`.

- Modes: `pure` (GAP over flat patch grid) and `register_row` (FiLM registers as first grid row at every stage, excluded from GAP)
- ... `LazyConfig`
- `flop_count()` aggregates patch-embed + blocks + merges + head

`tests/networks/test_vit5_hierarchical_classification.py` (NEW)

6 tests: forward shapes, param-count comparison, FLOP count, error guards, backward pass (spot-checks `patch_embed` and `reg_proj` grads).

`examples/vit5_imagenet/v6_hierarchical/` (NEW, 14 files)

- `_base_config.py`: shared Swin-T-like Hyena hierarchy config (p=4, dims `[96,192,384,768]`, depths `[2,2,6,2]`, 800-epoch ImageNet recipe)
- `hyena_hier_p4_pure.py` / `hyena_hier_p4_film.py`: pure and register-row FiLM ImageNet configs
- `cifar10_hyena_hier.py` / `cifar10_hyena_flat.py`: CIFAR-10 Hyena hier/flat configs
- `_cifar10_patch_ablation_base.py` + 6 leaf configs: patch-size (p4/p8/p16) × hier/flat ablation grid for CIFAR-10

`docs-tracker.md`

Mark `patch_merging.py` and `vit5_hierarchical_classification.py` as `[x]`.

Commits

- `feat(modules)`: `PatchMerging` + test
- `feat(networks+examples)`: `ViT5HierarchicalClassificationNet`, v6 example configs, tracker update

🤖 Generated with Claude Code