feat(projects): Add comprehensive example projects #176

Merged: ibro45 merged 61 commits into main from projects on Dec 9, 2025

@ibro45 ibro45 commented Dec 9, 2025

Summary

  • Add 8 example projects demonstrating Lighter across diverse ML domains: image classification (CIFAR-10), EEG analysis, HuggingFace LLM text classification, LoRA fine-tuning, medical segmentation, self-supervised learning, video recognition, and vision-language models
  • Create documentation index for example projects with quick reference table
  • Refactor cifar10 project structure to use standardized models/ and networks/ directories

Projects Added

| Project | Domain | Key Features |
| --- | --- | --- |
| cifar10 | Image Classification | Basic LighterModule, MetricCollection, FileWriter callback |
| eeg | EEG Analysis | Braindecode integration, EEGDash data loading, NeurIPS 2025 Challenge |
| huggingface_llm | Text Classification | HuggingFace Transformers, tokenizer config |
| lora | Parameter-Efficient Fine-Tuning | PEFT LoRA adapters, memory-efficient training |
| medical_segmentation | Medical Imaging | MONAI 3D UNet, ITKWriter, Dice loss |
| self_supervised | SSL Computer Vision | SimCLR, NT-Xent loss, Lightly integration |
| video_recognition | Video Understanding | PyTorchVideo, R3D/Transformer architectures |
| vision_language | Vision-Language | CLIP training, image-text contrastive learning |

Summary by CodeRabbit

  • New Features

    • Added many new example projects with ready-to-run configs and READMEs (EEG regression, LoRA fine-tuning, HuggingFace LLMs, self-supervised SimCLR, medical segmentation, video recognition, vision-language, CIFAR10, and more).
  • Documentation

    • Reorganized examples into an "Example Projects" hub and updated links/navigation across docs.
    • Removed legacy standalone image-classification and multi-GPU example pages; navigation updated to reference the new hub.

ibro45 and others added 30 commits November 16, 2025 17:35
- Add explicit Callable[..., Any] type hints in adapters and callbacks
- Add proper generic types for dicts and lists in schema and utils
- Add type annotations to writer functions (write_tensor, write_image, write_video)
- Improve type safety with Optional and Union types where appropriate
- Add type: ignore directives for complex generic scenarios

These changes improve IDE support, catch potential type errors earlier,
and make the codebase more maintainable.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Replace stateful self.mode attribute with dynamic _get_current_mode() method
that queries the trainer's state (training, validating, testing, predicting).

Benefits:
- Mode is always in sync with trainer state
- Eliminates potential stale mode bugs
- Makes code more robust and easier to reason about
- Properly handles sanity check as validation mode

This change affects _prepare_batch, _forward, _calculate_loss, and
_calculate_metrics methods, all of which now query mode dynamically.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
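A minimal sketch of the dynamic mode lookup described in the commit above, assuming the trainer exposes boolean `training`, `validating`, `testing`, `predicting`, and `sanity_checking` flags. The `Mode` enum, the free function `get_current_mode`, and the `make_trainer` stub are hypothetical names used only for illustration; Lighter's actual `_get_current_mode()` may differ.

```python
from enum import Enum
from types import SimpleNamespace


class Mode(str, Enum):
    """Hypothetical mode names; the real enum may use different values."""

    TRAIN = "train"
    VAL = "val"
    TEST = "test"
    PREDICT = "predict"


def get_current_mode(trainer) -> Mode:
    """Derive the mode from trainer state flags instead of a cached attribute."""
    if trainer.training:
        return Mode.TRAIN
    # The sanity check runs validation batches, so it is treated as validation.
    if trainer.validating or trainer.sanity_checking:
        return Mode.VAL
    if trainer.testing:
        return Mode.TEST
    if trainer.predicting:
        return Mode.PREDICT
    raise RuntimeError("Trainer is not in a recognized stage.")


def make_trainer(**flags):
    """Stand-in for a trainer object (assumption: only these flags matter)."""
    state = dict(
        training=False,
        validating=False,
        testing=False,
        predicting=False,
        sanity_checking=False,
    )
    state.update(flags)
    return SimpleNamespace(**state)


print(get_current_mode(make_trainer(sanity_checking=True)).value)
```

Because nothing is stored on the module, there is no attribute that can go stale when the trainer moves from one stage to the next.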
Replace separate config and overrides parameters with unified inputs list.
Sparkwheel now auto-detects file paths vs overrides based on content.

Changes:
- Replace config + overrides params with single inputs list
- Remove comma-separated config file parsing (use space separation)
- Simplify CLI examples: 'lighter fit base.yaml exp.yaml lr=0.001'
- Update Runner.run() to use Config.update() with auto-detection
- Improve CLI help text and examples

This aligns with Sparkwheel's idiomatic usage pattern and simplifies
the user experience.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
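To illustrate the unified `inputs` list from the commit above, here is a rough sketch of separating config files from overrides. The heuristic (an `=` marks an override, a `.yaml`/`.yml` suffix marks a file) and the name `split_inputs` are assumptions made for this example; they are not Sparkwheel's actual detection logic.

```python
from pathlib import Path

# Assumption: config files are recognized by these suffixes.
CONFIG_SUFFIXES = {".yaml", ".yml"}


def split_inputs(inputs: list[str]) -> tuple[list[str], dict[str, str]]:
    """Split a mixed list of CLI inputs into config files and key=value overrides."""
    files: list[str] = []
    overrides: dict[str, str] = {}
    for item in inputs:
        if "=" in item:
            # Split on the first "=" only, so values may themselves contain "=".
            key, value = item.split("=", 1)
            overrides[key] = value
        elif Path(item).suffix in CONFIG_SUFFIXES:
            files.append(item)
        else:
            raise ValueError(f"Cannot classify input: {item!r}")
    return files, overrides


# Mirrors the CLI example: lighter fit base.yaml exp.yaml lr=0.001
files, overrides = split_inputs(["base.yaml", "exp.yaml", "lr=0.001"])
print(files, overrides)
```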
- Upgrade sparkwheel to >=0.0.6 for improved auto-detection features
- Update uv.lock with new dependency versions
- Fix GitHub workflow YAML formatting for ignoreLabels field

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Update documentation to reflect dynamic mode detection from trainer state.
Add explanations of how _get_current_mode() queries trainer.training,
trainer.validating, etc. to ensure mode is always in sync.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Update CLI tests to use new inputs parameter instead of config+overrides
- Update Runner tests to pass inputs list instead of separate parameters
- Remove tests for self.mode attribute (replaced with _get_current_mode())
- Update integration tests to use new CLI argument format
- Fix type annotations in test fixtures

All tests updated to align with the refactored Runner CLI interface
and System mode detection mechanism.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Major architectural changes:
- Replace System class with LighterModule (model.py) for cleaner separation
- Add Data module (data.py) for dataloader configuration
- Remove adapters pattern - functionality integrated into LighterModule
- Delete utility modules: model.py, patches.py, types/containers.py
- Update type enums and utility functions to support new architecture
- Simplify data utilities and misc helpers

This refactor simplifies the core API by removing the adapter abstraction
layer and providing a more direct interface through LighterModule.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Callback system improvements:
- Flatten writer callback structure (remove nested writer/ directory)
- Create base_writer.py with BaseWriter abstract class
- Consolidate csv_writer.py and file_writer.py at top level
- Enhance Freezer callback with improved parameter handling
- Remove callbacks/utils.py (functionality moved to callbacks themselves)

This simplifies the callback organization and makes writers easier to
discover and import.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Runner improvements:
- Leverage Sparkwheel's auto-detection for config validation
- Remove separate schema.py module (validation now handled by Sparkwheel)
- Simplify config loading and pruning logic
- Improve error handling and CLI interface
- Better integration with LighterModule instead of System

This change reduces code complexity by relying on Sparkwheel's built-in
schema validation capabilities rather than maintaining a separate schema module.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Example project changes:
- Add __lighter__.py for Sparkwheel auto-discovery
- Add model.py with CIFAR10Module (replaces inline System config)
- Move experiments/ → configs/ directory
- Create example.yaml, example_autodiscovery.yaml, example_overrides.yaml
- Update configurations to use LighterModule and new structure

This demonstrates the new pattern where models are defined in Python files
and referenced in configs, rather than being fully defined in YAML.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Test suite changes:
- Add conftest.py with shared fixtures
- Add tests/fixtures/ with plain Lightning module fixtures
- Refactor unit tests for LighterModule (replace System tests)
- Update callback tests for new writer structure
- Refactor runner tests for schema-less validation
- Add test_plain_lightning.py for plain Lightning integration
- Remove obsolete tests: adapters, schema, containers, model utils
- Update integration tests for CIFAR and config validation

All tests now work with the refactored Model/Data/Callback architecture
and Sparkwheel-based configuration system.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Documentation overhaul:
- Delete old structure: design/, how-to/, tutorials/, migration/
- Add comprehensive guides: best-practices, configuration, custom-code,
  lighter-module, lightning-module, training
- Add practical examples: image-classification, multi-gpu
- Add quickstart guide and CLI reference
- Rewrite index.md and FAQ for new architecture
- Update mkdocs.yml navigation structure
- Remove outdated images (coverage.svg, feature screenshots, diagrams)

New docs focus on practical guides and examples rather than abstract
design documents, making it easier for users to get started.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Project infrastructure updates:
- Add .github/workflows/release.yml for automated releases
- Update pyproject.toml dependencies and metadata
- Expand .gitignore for better coverage
- Rewrite README.md to reflect new architecture and features
- Update uv.lock with latest dependency versions

These changes complete the redesign by updating all project-level
configuration to support the new architecture.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Documentation improvements:
- Add comprehensive coverage of _args_, _mode_, and _disabled_ syntax
- Document callable and debug modes with practical examples
- Expand logging documentation with detailed loss/metrics/optimizer stats
- Add examples for multi-component loss logging
- Clarify automatic optimizer stats logging (LR, momentum, betas, weight decay)
- Update quick reference table with new syntax elements

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Breaking change to metrics API:
- Remove automatic list-to-MetricCollection conversion in LighterModule
- Require explicit MetricCollection wrapper in configs for multiple metrics
- Add validation with helpful error messages showing correct syntax
- Support both single Metric and MetricCollection in logging
- Update CIFAR10 examples to use MetricCollection wrapper
- Update test suite for new metrics API

This change makes metric configuration more explicit and aligns with
torchmetrics best practices for handling multiple metrics.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Add DDP testing infrastructure:
- Add test_callbacks_writers_ddp.py with 10 DDP tests for writers
- Test rank-specific temp files, distributed gathering, and cleanup
- Test barrier synchronization and edge cases (empty ranks, None paths)
- Mock distributed environment to enable fast, debuggable tests on single device
- Update existing error tests for consistency

This mock-based approach provides fast feedback during development while
ensuring comprehensive coverage of distributed code paths.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Configuration improvements:
- Clarify codecov threshold comment for better understanding
- Use wildcard pattern for docs coverage ignore (docs/**/* instead of docs/*)
- Add test artifact ignores (config.yaml files in root and project directories)

These changes improve coverage reporting accuracy and prevent test-generated
configuration files from being accidentally committed.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Major improvements to Runner class:
- Remove OutputDir class; save config directly to trainer.log_dir instead
- Add _save_config() method that integrates with Lightning's logging system
- Improve workflow documentation with detailed step-by-step docstrings
- Refactor CLI to use shared add_common_args() function (DRY principle)
- Change CLI boolean flags to action="store_true" for better UX
- Remove unused datetime and dataclass imports
- Remove config parameter from _execute() (no longer needed)

Benefits:
- Simpler architecture: leverages PyTorch Lightning's built-in directory management
- Better integration: config.yaml appears alongside checkpoints in version directories
- Cleaner CLI: --verbose instead of --verbose True
- More maintainable: shared argument definitions prevent drift

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
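The shared-argument pattern and the `store_true` flag from the commit above might look roughly like the following `argparse` sketch. Only `fit`, the `inputs` list, `--verbose`, and the function name `add_common_args` come from the text; the other subcommand names and `build_parser` are assumptions.

```python
import argparse


def add_common_args(parser: argparse.ArgumentParser) -> None:
    """Attach the arguments every subcommand shares, defined in one place."""
    parser.add_argument("inputs", nargs="+", help="Config files and key=value overrides")
    # store_true: the flag takes no value; passing --verbose sets it to True.
    parser.add_argument("--verbose", action="store_true", help="Enable verbose output")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="lighter")
    subparsers = parser.add_subparsers(dest="command", required=True)
    # Assumption: subcommand names other than "fit" are illustrative.
    for command in ("fit", "validate", "test", "predict"):
        add_common_args(subparsers.add_parser(command))
    return parser


args = build_parser().parse_args(["fit", "base.yaml", "lr=0.001", "--verbose"])
print(args.command, args.inputs, args.verbose)
```

Defining the arguments once keeps the subcommands from drifting apart, which is the maintainability benefit the commit lists.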
Add safety mechanism to prevent infinite loops when replacement samples
are also corrupted:
- Add max_retries parameter (default: 100) to limit retry iterations
- Raise RuntimeError with detailed message if limit is exceeded
- Track iteration count separately from corrupted sample count
- Update docstring with parameter documentation and exception info

This prevents the collate function from hanging indefinitely when datasets
have high corruption rates or systematic corruption issues.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Cleanup changes:
- Remove setattr_dot_notation() function from misc.py (unused in codebase)
- Remove tests for setattr_dot_notation
- Remove version-specific comment from model.py header
- Update test file docstring to remove version reference

This simplifies the codebase by removing dead code and outdated version
references, making the code more maintainable.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Documentation improvements:
- Add missing torch imports to README and index examples
- Fix parameter name in examples (lr → learning_rate for consistency)
- Clarify DDP behavior: devices=k selects k physical GPUs (no virtualization)
- Improve multi-node DDP command examples with environment variables
- Update CIFAR10 model docstring to remove version reference

These changes improve documentation accuracy and help users understand
that DDP requires actual physical GPUs, preventing confusion about
GPU "simulation" or virtualization.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Update tests to match new Runner architecture:
- Add mocks for new _save_config() and _save_hyperparameters() methods
- Remove OutputDir tests (class no longer exists)
- Add new tests for _save_config() with trainer.log_dir
- Remove config parameter from _execute() calls in all tests
- Test that _save_config() handles missing log_dir gracefully
- Update all integration tests with new mock structure

All tests pass with the refactored Runner implementation.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Update image-classification examples to use CLI overrides instead of
  separate config files for different architectures
- Clarify config composition via CLI (passing multiple config files)
- Document that _disabled_ components are removed from lists/dicts
- Simplify pretrained finetuning example
- Add info box explaining Sparkwheel merge behavior

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
ibro45 and others added 16 commits December 1, 2025 16:44
CLIP-style dual-encoder for image-text alignment:
- ResNet-50 image encoder + Transformer text encoder
- Freezer callback for backbone warmup
- Differential learning rates per component
- Learnable temperature parameter
- Custom collate function with _mode_: callable

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Parameter-efficient fine-tuning on CIFAR-100:
- Built-in LoRA wrapper (~1% trainable parameters)
- Freezer callback for backbone freezing
- $ expressions for trainable parameter filtering
- Configurable rank, alpha, dropout
- CsvWriter for prediction logging

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Remove model.py and models/net.py that were part of the old architecture.
The current project uses networks/net.py instead, and training works correctly.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Prevents dataset directory from being tracked.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Remove per-project whitelist approach. Now just ignores generated files
(lightning_logs/, .datasets/, outputs/) within projects folder.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Rename project from efficient_finetuning to lora
- Replace custom LoRA implementation with HuggingFace PEFT library
- Update config and README for PEFT-based workflow
- Add adapter save/load/merge functionality

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Simplify main README to just list projects with dependencies
- Standardize usage: pip install lighter <deps>, cd to project, run lighter
- Remove pyproject.toml references (projects don't have them)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Add num_workers to vars section (default: 4)
- Replace hardcoded num_workers: 0 with reference to vars
- Fix explicit re-exports in __init__.py

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Move MONAI transforms from Python functions to YAML config
- Replace nibabel with MONAI's ITKWriter for NRRD output
- Make num_workers configurable via vars

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Expand README with detailed SSL explanation
- Simplify dataset.py (remove helper functions)
- Enhance SimCLR model with better documentation

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Remove limit_train_batches and limit_val_batches from config.
Users can add trainer::fast_dev_run=true for quick testing.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Fix explicit re-exports in dataset.py
- Update config defaults

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Fix explicit re-exports in dataset.py
- Update config defaults

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@dosubot dosubot bot added the size:XXL This PR changes 1000+ lines, ignoring generated files. label Dec 9, 2025
coderabbitai bot commented Dec 9, 2025

📥 Commits

Reviewing files that changed from the base of the PR and between 7b529d0 and 26f4227.

📒 Files selected for processing (3)
  • .github/PULL_REQUEST_TEMPLATE.md (2 hunks)
  • projects/medical_segmentation/writers.py (1 hunk)
  • projects/vision_language/dataset.py (1 hunk)

Walkthrough

Consolidates example docs into a new examples index, removes legacy example pages, restructures docs navigation and mkdocs.yml, updates .gitignore, adds eight new example project directories with configs, datasets, models, and utilities, removes a future import in dynamic_imports, and bumps sparkwheel dependency.

Changes

| Cohort | File(s) | Summary |
| --- | --- | --- |
| Docs removed & added | `docs/examples/image-classification.md`, `docs/examples/multi-gpu.md`, `docs/examples/index.md` | Removed two legacy example pages and added a centralized examples index. |
| Docs navigation updates | `docs/index.md`, `docs/quickstart.md`, `docs/guides/*.md`, `docs/reference/cli.md`, `mkdocs.yml` | Updated links and navigation to point to the new examples index and restructured Examples subtree. |
| .gitignore changes | `.gitignore`, `projects/video_recognition/.gitignore` | Reorganized ignore patterns, removed some central entries, added/relocated outputs and `.datasets/` ignore rules. |
| Projects root | `projects/README.md` | Renamed Projects README to Example Projects and added a project table. |
| CIFAR-10 project (updates) | `projects/cifar10/README.md`, `projects/cifar10/configs/example.yaml`, `projects/cifar10/models/model.py`, `projects/cifar10/networks/__init__.py` | Added README, updated config module paths, removed defensive criterion checks in training/validation, and added networks package initializer with Net re-export. |
| EEG project (new) | `projects/eeg/...` (`README.md`, `__init__.py`, `__lighter__.py`, `configs/*.yaml`, `data/*`, `models/*`, `submission.py`) | New end-to-end EEG example: dataset loaders, subject/window splitting, models (EEGRegressionModel, aggregator), configs, submission tooling, and package exports. |
| HuggingFace LLM project (new) | `projects/huggingface_llm/...` (`README.md`, `__lighter__.py`, `configs/imdb.yaml`, `dataset.py`, `models/*`) | New text-classification example with dataset helper, model wrapper, and config. |
| LoRA project (new) | `projects/lora/...` (`README.md`, `__init__.py`, `__lighter__.py`, `configs/lora.yaml`, `dataset.py`, `models/*`, `networks/*`) | New LoRA/PEFT example: datasets, LoRA wrapper, backbone networks, LoRA model, configs, and package exports. |
| Medical segmentation project (new) | `projects/medical_segmentation/...` (`README.md`, `__init__.py`, `__lighter__.py`, `configs/spleen.yaml`, `models/*`, `writers.py`) | MONAI-based 3D segmentation example: config, model (SegmentationModel), writer util, and metadata. |
| Self-supervised project (new) | `projects/self_supervised/...` (`README.md`, `__init__.py`, `__lighter__.py`, `configs/simclr.yaml`, `models/*`, `networks/*`) | SimCLR example: encoder builders, SimCLRModel, configs, and exports. |
| Video recognition project (new) | `projects/video_recognition/...` (`.gitignore`, `README.md`, `__init__.py`, `__lighter__.py`, `configs/*.yaml`, `dataset.py`, `models/*`, `networks/*`, `writers.py`) | Video examples: VideoClipDataset, collate fn, R3D and VideoTransformer implementations, models, writers, and configs. |
| Vision-language project (new) | `projects/vision_language/...` (`README.md`, `__init__.py`, `__lighter__.py`, `configs/clip.yaml`, `dataset.py`, `models/*`, `networks/*`) | CLIP-style project: datasets, CLIP networks and encoders, CLIPLighterModel, collate, and config. |
| Framework & tooling | `src/lighter/utils/dynamic_imports.py`, `pyproject.toml` | Removed `from __future__ import annotations` and bumped sparkwheel dependency 0.0.9 → 0.0.10. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Areas needing extra attention:

  • projects/eeg/data/hbn_dataset.py — subject/window splitting, filtering, and dataset returns.
  • projects/eeg/submission.py — checkpoint state_dict extraction and weight export/strip logic.
  • projects/lora/networks/lora.py — PEFT integration, adapter save/load/merge correctness and parameter selection.
  • projects/video_recognition/networks/video_models.py — 3D convs, patch embedding, temporal reshaping and transformer tokenization.
  • projects/vision_language/networks/clip_model.py — encoder normalization, temperature/learnable log-temperature handling, and similarity computation.
  • projects/cifar10/models/model.py — removed defensive criterion checks (risk of silent runtime errors if criterion missing).

Suggested reviewers

  • surajpaib

Poem

🐰 I nibbled through docs and code with care,

Planted eight projects in a neat row there,
From EEG beats to video frames that play,
LoRA, CLIP, and SimCLR hop into the day,
A jubilant twitch — the repo's spry and fair!

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 51.55% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly describes the main purpose of the changeset: adding comprehensive example projects to the Lighter framework. |

codecov bot commented Dec 9, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 15

Note

Due to the large number of review comments, Critical, Major severity comments were prioritized as inline comments.

♻️ Duplicate comments (1)
projects/huggingface_llm/networks/__init__.py (1)

1-5: Broken import until create_model exists in model_factory

As written, from .model_factory import create_model will fail because create_model is not defined there (see comment on projects/huggingface_llm/networks/model_factory.py). Once the factory is added, this re-export pattern is fine.

🟡 Minor comments (11)
projects/self_supervised/README.md-1-110 (1)

1-110: Fix YAML configuration example in README—target path is incomplete.

The documented configuration example on line 72 shows an incomplete _target_ path:

model:
  _target_: project.models.SimCLRModel

However, the actual configs/simclr.yaml uses the full submodule path:

model:
  _target_: project.models.model.SimCLRModel

Users copying the example from the README will encounter module resolution errors. Update line 72 to include the submodule name: _target_: project.models.model.SimCLRModel.

Additionally, verify the network configuration example (if shown in README) against the actual config, which uses project.networks.encoder.create_simclr_model.

projects/medical_segmentation/dataset.py-1-11 (1)

1-11: Docstring mentions datasets not imported.

The module docstring lists three MONAI datasets (DecathlonDataset, MedNISTDataset, TciaDataset), but only DecathlonDataset is imported. This mismatch could confuse users.

Consider either:

  1. Importing all mentioned datasets, or
  2. Updating the docstring to focus on DecathlonDataset specifically

Apply this diff for option 2:

-"""Medical imaging datasets using MONAI.
-
-MONAI provides ready-to-use medical imaging datasets including:
-- DecathlonDataset: Medical Segmentation Decathlon (10 tasks)
-- MedNISTDataset: Medical version of MNIST
-- TciaDataset: The Cancer Imaging Archive datasets
-
-All transforms are defined directly in the YAML config using MONAI's transform classes.
-"""
+"""Medical imaging dataset using MONAI.
+
+This module exposes MONAI's DecathlonDataset for the Medical Segmentation Decathlon.
+All transforms are defined directly in the YAML config using MONAI's transform classes.
+
+MONAI provides other ready-to-use datasets (MedNISTDataset, TciaDataset) that can be
+imported as needed.
+"""

docs/examples/index.md-13-20 (1)

13-20: Align vision_language description with actual project behavior

The table currently describes vision_language as “BLIP-2 image captioning,” but elsewhere in this PR the project is documented/implemented as a CLIP-style vision-language model with contrastive training. It’d be good to update this row so the description matches what the code actually does (or vice versa if BLIP-2 is the intended direction).

projects/medical_segmentation/writers.py-40-53 (1)

40-53: Validate batch dimension and spatial rank before writing NRRD

The docstring claims the function accepts "(D, H, W) or (C, D, H, W)", but the code silently accepts 5D input and drops all but the first batch element with data = data[0]. This means if someone passes a batch with B > 1, the extra data is discarded without warning. Additionally, after the optional batch squeeze, there's no validation that the result is actually 3D or 4D—arbitrary shapes pass through to ITKWriter.

Consider making the expectations explicit:

-    # Remove batch dim if present: (B, C, D, H, W) -> (C, D, H, W) or (D, H, W)
-    if data.ndim == 5:
-        data = data[0]
+    # Remove batch dim if present: expect a single example
+    if data.ndim == 5:
+        if data.shape[0] != 1:
+            raise ValueError(
+                f"write_nrrd expects a batch dimension of size 1, got {data.shape[0]}."
+            )
+        data = data[0]
+
+    # After optional batch squeeze, require 3D (DHW) or 4D (CDHW)
+    if data.ndim not in (3, 4):
+        raise ValueError(
+            f"write_nrrd expects a 3D or 4D tensor (DHW or CDHW), got shape {data.shape}."
+        )

Also update the docstring to document the optional (1, C, D, H, W) batch shape if this behavior is intentional.
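The shape rules proposed above can be checked without tensors. The following sketch works on shape tuples only; the function name `squeeze_single_batch` is hypothetical and is not part of `writers.py`.

```python
def squeeze_single_batch(shape: tuple[int, ...]) -> tuple[int, ...]:
    """Drop a leading batch dim of size 1, then require a DHW or CDHW shape."""
    if len(shape) == 5:
        if shape[0] != 1:
            raise ValueError(f"Expected a batch dimension of size 1, got {shape[0]}.")
        shape = shape[1:]
    if len(shape) not in (3, 4):
        raise ValueError(f"Expected a 3D or 4D shape (DHW or CDHW), got {shape}.")
    return shape


print(squeeze_single_batch((1, 2, 8, 8, 8)))
```

A batch of two volumes, such as `(2, 1, 8, 8, 8)`, now raises instead of silently writing only the first volume.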

projects/vision_language/configs/clip.yaml-144-144 (1)

144-144: Reconsider drop_last=true for validation.

Setting drop_last=true for the validation dataloader will discard incomplete batches, potentially excluding data from evaluation metrics. Unless there's a specific reason (e.g., batch normalization requirements), validation should typically use drop_last=false to evaluate on all samples.
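A quick back-of-the-envelope check of how many validation samples `drop_last` discards. The dataset size and batch size below are made-up numbers, not values from `clip.yaml`, and `samples_dropped` is a hypothetical helper.

```python
def samples_dropped(num_samples: int, batch_size: int, drop_last: bool) -> int:
    """Number of samples that never reach the metrics in one pass over the data."""
    if not drop_last:
        return 0
    # The final, incomplete batch holds the remainder and is discarded.
    return num_samples % batch_size


# Hypothetical example: 1000 validation samples, batch size 64.
print(samples_dropped(1000, 64, drop_last=True))   # 40
print(samples_dropped(1000, 64, drop_last=False))  # 0
```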

projects/huggingface_llm/models/model.py-42-52 (1)

42-52: Remove unnecessary input_ids from predict_step return value.

The predict_step returns "input_ids", but the CsvWriter callback in imdb.yaml (line 30) only specifies keys: ["prediction"]. CsvWriter only writes keys explicitly listed in its configuration, so input_ids is ignored and serves no purpose. Remove it from the return dictionary.

projects/video_recognition/dataset.py-79-80 (1)

79-80: Unreachable truncation branch.

Since max_t is computed as max(v.shape[1] for v in videos), no video can have v.shape[1] > max_t. This branch is dead code.

If truncation is intended for a fixed max_t parameter, consider making it configurable. Otherwise, remove the unreachable branch:

     for v in videos:
         if v.shape[1] < max_t:
             pad_size = max_t - v.shape[1]
             v = torch.nn.functional.pad(v, (0, 0, 0, 0, 0, pad_size))
-        elif v.shape[1] > max_t:
-            v = v[:, :max_t]
         padded_videos.append(v)

projects/video_recognition/models/model.py-65-66 (1)

65-66: Potential failure if fewer than 5 classes.

probs.topk(5, dim=1) will raise a runtime error if the model outputs fewer than 5 classes. Consider clamping k to the number of classes.

         # Get top-5 predictions
-        top5_probs, top5_indices = probs.topk(5, dim=1)
+        k = min(5, probs.shape[1])
+        topk_probs, topk_indices = probs.topk(k, dim=1)

Committable suggestion skipped: line range outside the PR's diff.

projects/vision_language/dataset.py-52-59 (1)

52-59: Missing f-string prefix causes literal {self.root} in error message.

Lines 55-57 are plain strings, not f-strings, so {self.root} will be displayed literally instead of interpolated.

         raise FileNotFoundError(
             f"No data found for Flickr30k dataset at '{self.root}'.\n\n"
             "Please download the dataset:\n"
             "1. Request access: https://shannon.cs.illinois.edu/DenotationGraph/\n"
-            "2. Extract images to: {self.root}/flickr30k-images/\n"
-            "3. Download captions to: {self.root}/results_20130124.token\n\n"
+            f"2. Extract images to: {self.root}/flickr30k-images/\n"
+            f"3. Download captions to: {self.root}/results_20130124.token\n\n"
             "Or use Flickr8kDataset for a smaller, easier-to-obtain alternative."
         )
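
The underlying behavior is easy to reproduce in isolation: when adjacent string literals are concatenated, each literal needs its own f prefix. A small sketch, where the `root` value is a made-up path used only for illustration:

```python
root = "/data/flickr30k"  # hypothetical path for illustration

plain = "2. Extract images to: {root}/flickr30k-images/"
formatted = f"2. Extract images to: {root}/flickr30k-images/"

# Implicit concatenation does not spread the f prefix to later literals.
mixed = (
    f"No data found at '{root}'.\n"
    "Extract images to: {root}/flickr30k-images/\n"  # missing f prefix
)

print(plain)      # the placeholder is printed literally
print(formatted)  # the placeholder is replaced by the value of root
print(mixed)      # first line interpolated, second line not
```
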
projects/eeg/data/hbn_dataset.py-373-384 (1)

373-384: Potential division by zero if valid_frac + test_frac is 0.

If both valid_frac and test_frac are 0, the second train_test_split call at line 381 will divide by zero. Additionally, if valid_frac + test_frac >= 1.0, test_size becomes invalid.

Consider adding validation:

 def get_train_val_test_split(
     subjects: list[str],
     valid_frac: float = 0.1,
     test_frac: float = 0.1,
     seed: int = 2025,
 ) -> tuple[list[str], list[str], list[str]]:
+    if valid_frac + test_frac <= 0:
+        return subjects, [], []
+    if valid_frac + test_frac >= 1.0:
+        raise ValueError("valid_frac + test_frac must be < 1.0")

Committable suggestion skipped: line range outside the PR's diff.
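
For illustration, the same guards can be sketched as a self-contained function. This is an assumption-laden stand-in: it uses the standard library's `random` module instead of the train_test_split call in the PR, and `split_subjects` is a hypothetical name.

```python
import random


def split_subjects(subjects, valid_frac=0.1, test_frac=0.1, seed=2025):
    """Split subjects into train/val/test lists after validating the fractions."""
    if valid_frac < 0 or test_frac < 0:
        raise ValueError("valid_frac and test_frac must be non-negative")
    holdout_frac = valid_frac + test_frac
    if holdout_frac >= 1.0:
        raise ValueError("valid_frac + test_frac must be < 1.0")
    if holdout_frac == 0:
        # Nothing to hold out, so no second split (and no division) is needed.
        return list(subjects), [], []

    shuffled = list(subjects)
    random.Random(seed).shuffle(shuffled)

    n_valid = round(len(shuffled) * valid_frac)
    n_test = round(len(shuffled) * test_frac)
    n_train = len(shuffled) - n_valid - n_test
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]
    return train, valid, test


subjects = [f"sub-{i:02d}" for i in range(20)]
train, valid, test = split_subjects(subjects)
print(len(train), len(valid), len(test))
```

Checking the fractions up front keeps the failure mode explicit, rather than surfacing later as a ZeroDivisionError or an invalid test_size.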

projects/eeg/models/eeg_model.py-159-175 (1)

159-175: "weighted" strategy documented but not implemented.

The docstring mentions "weighted: Weighted average based on confidence" but the implementation falls back to mean. Either implement it or remove from documentation.

     Aggregation strategies:
     - mean: Average of all window predictions
     - median: Median of all window predictions
-    - weighted: Weighted average based on confidence

Committable suggestion skipped: line range outside the PR's diff.
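
If the strategy is kept rather than removed from the docstring, one possible shape for it is sketched below. The PR does not say where per-window confidences would come from, so the `confidences` argument and the `aggregate` function name are assumptions made for illustration.

```python
import statistics
from typing import Optional, Sequence


def aggregate(
    predictions: Sequence[float],
    strategy: str = "mean",
    confidences: Optional[Sequence[float]] = None,
) -> float:
    """Combine per-window predictions into a single value."""
    if strategy == "mean":
        return statistics.fmean(predictions)
    if strategy == "median":
        return statistics.median(predictions)
    if strategy == "weighted":
        # Hypothetical: one non-negative confidence per window prediction.
        if confidences is None or len(confidences) != len(predictions):
            raise ValueError("weighted aggregation needs one confidence per prediction")
        total = sum(confidences)
        if total <= 0:
            raise ValueError("confidences must sum to a positive value")
        return sum(p * c for p, c in zip(predictions, confidences)) / total
    # Fail loudly instead of silently falling back to the mean.
    raise ValueError(f"Unknown aggregation strategy: {strategy!r}")


print(aggregate([1.0, 2.0, 3.0]))
print(aggregate([1.0, 3.0], strategy="weighted", confidences=[1.0, 3.0]))
```

Raising on an unknown strategy also avoids the silent fallback that caused the documentation and the implementation to drift apart.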

🧹 Nitpick comments (31)
projects/cifar10/README.md (1)

38-38: Add trailing newline.

The file does not end with a trailing newline. Ending text files with one is a common convention.

 - [CIFAR-10 Dataset](https://www.cs.toronto.edu/~kriz/cifar.html)
+
projects/lora/README.md (1)

56-63: Optionally specify language identifiers for fenced code blocks.

The mathematical/pseudocode blocks could optionally use language identifiers like text or python for better rendering and consistency with markdown linting tools.

Apply this diff:

-```
+```text
 h = Wx

 LoRA adds low-rank decomposition BA:

-```
+```text
 h = Wx + (BA)x * (alpha/rank)

projects/lora/dataset.py (1)

51-104: Consider randomizing or allowing class selection control.

The few-shot dataset implementation always selects the first num_classes classes deterministically (Line 84: selected_classes = list(class_indices.keys())[:num_classes]). While this ensures reproducibility, it limits flexibility for few-shot learning experiments that typically evaluate across different class subsets.

Consider one of these approaches:

Option 1: Add random class selection with seed control:

     def __init__(
         self,
         base_dataset: Dataset,
         samples_per_class: int = 5,
         num_classes: int = 10,
         seed: int = 42,
+        random_classes: bool = True,
     ) -> None:
         self.samples_per_class = samples_per_class
         self.num_classes = num_classes
 
         # Group samples by class
         class_indices: dict[int, list[int]] = {}
         for idx in range(len(base_dataset)):
             _, label = base_dataset[idx]
             if label not in class_indices:
                 class_indices[label] = []
             class_indices[label].append(idx)
 
         # Sample k examples from n classes
         generator = torch.Generator().manual_seed(seed)
-        selected_classes = list(class_indices.keys())[:num_classes]
+        all_classes = list(class_indices.keys())
+        if random_classes:
+            perm = torch.randperm(len(all_classes), generator=generator)
+            selected_classes = [all_classes[i] for i in perm[:num_classes].tolist()]
+        else:
+            selected_classes = all_classes[:num_classes]

Option 2: Allow explicit class specification:

     def __init__(
         self,
         base_dataset: Dataset,
         samples_per_class: int = 5,
         num_classes: int = 10,
         seed: int = 42,
+        selected_classes: list[int] | None = None,
     ) -> None:
+        # ... existing class grouping code ...
+        
+        if selected_classes is None:
+            selected_classes = list(class_indices.keys())[:num_classes]
+        else:
+            # Validate that all selected classes exist
+            for cls in selected_classes:
+                if cls not in class_indices:
+                    raise ValueError(f"Class {cls} not found in base dataset")
+        
+        self.num_classes = len(selected_classes)
projects/lora/configs/lora.yaml (2)

95-98: Hardcoded T_max should reference max_epochs for maintainability.

T_max: 20 duplicates the value from trainer.max_epochs. If someone changes max_epochs, they may forget to update T_max, causing the learning rate schedule to mismatch the training duration.

Consider referencing the trainer's max_epochs if the config system supports cross-references:

    T_max: "%trainer::max_epochs"

Or add a variable:

 vars:
+  max_epochs: 20
   # Task configuration
   num_classes: 100  # CIFAR100

Then reference it in both places.
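
To see why the mismatch matters, the closed-form cosine annealing expression can be evaluated directly. This is a sketch in plain Python rather than a call into PyTorch; the function name and the learning-rate values are illustrative assumptions, and the scheduler's step-by-step implementation may differ in detail.

```python
import math


def cosine_annealing_lr(epoch: int, base_lr: float, t_max: int, eta_min: float = 0.0) -> float:
    """Closed form: eta_min + (base_lr - eta_min) * (1 + cos(pi * epoch / t_max)) / 2."""
    return eta_min + 0.5 * (base_lr - eta_min) * (1.0 + math.cos(math.pi * epoch / t_max))


# Hypothetical scenario: T_max left at 20 while training is extended to 40 epochs.
for epoch in (0, 10, 20, 30, 40):
    print(epoch, round(cosine_annealing_lr(epoch, base_lr=0.1, t_max=20), 4))
```

Under this formula the rate reaches eta_min at epoch 20 and then climbs back to the base rate by epoch 40, which is rarely what is intended when only max_epochs was changed.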


162-166: Missing pin_memory in test_dataloader.

The train_dataloader and val_dataloader both specify pin_memory: true, but test_dataloader omits it. This is a minor inconsistency.

   test_dataloader:
     _target_: torch.utils.data.DataLoader
     batch_size: "%vars::batch_size"
     num_workers: "%vars::num_workers"
+    pin_memory: true
     shuffle: false
projects/lora/networks/lora.py (1)

115-116: Consider using logging instead of print statements.

Using print() for status messages works for examples but doesn't integrate with logging configuration. For a reusable wrapper, logging.info() would be more appropriate.

+import logging
+
+logger = logging.getLogger(__name__)
+
 ...
-        self.model.print_trainable_parameters()
+        # Log trainable parameters summary
+        trainable, total = self.model.get_nb_trainable_parameters()
+        logger.info(f"Trainable params: {trainable:,} / {total:,} ({100 * trainable / total:.2f}%)")
 ...
-        print(f"LoRA adapter saved to: {path}")
+        logger.info(f"LoRA adapter saved to: {path}")
 ...
-        print(f"LoRA adapter loaded from: {path}")
+        logger.info(f"LoRA adapter loaded from: {path}")

This is optional since print_trainable_parameters() is a common pattern in PEFT examples.

Also applies to: 130-131, 140-141
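
A standalone sketch of the logging pattern, independent of PEFT. The logger name, the `report_trainable` helper, and the parameter counts are hypothetical and serve only to show the message format.

```python
import logging

logger = logging.getLogger("lora_example")  # hypothetical logger name


def report_trainable(trainable: int, total: int) -> str:
    """Log a trainable-parameter summary and return the formatted message."""
    pct = 100 * trainable / total
    message = f"Trainable params: {trainable:,} / {total:,} ({pct:.2f}%)"
    logger.info(message)
    return message


if __name__ == "__main__":
    # Applications decide how (and whether) messages are shown.
    logging.basicConfig(level=logging.INFO)
    report_trainable(1_000, 100_000)
```

Unlike print(), the output can then be silenced, redirected, or reformatted by whoever configures logging for the run.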

docs/examples/index.md (1)

26-34: Add a language identifier to the project-structure code block

markdownlint (MD040) is complaining because the fenced block showing the projects/<name>/ tree has no language set. You can keep it as plain text and still satisfy the linter by specifying a language:

-```
+```text
 projects/<name>/
 ├── __lighter__.py      # Project marker (enables project.* imports)
 ├── __init__.py
 ├── *.py                # Custom modules
 ├── configs/
 │   └── *.yaml          # Experiment configs
 └── README.md           # Project-specific documentation
-```
+```

Based on static analysis hints.

projects/video_recognition/__lighter__.py (1)

1-7: Marker metadata looks good; keep it aligned with actual features

Comments accurately describe the intended capabilities; just make sure they stay in sync with the actual dataset, writer, and config implementations as the example evolves (e.g., CsvWriter usage, multi-GPU settings).

projects/vision_language/__lighter__.py (1)

1-8: Align freezer callback description with README/implementation

This file says “Freezer callback for frozen text encoder”, while projects/vision_language/README.md describes freezing the image encoder backbone. Please pick the correct behavior (image vs text) and align wording here, in the README, and in configs to avoid confusion.

docs/index.md (1)

343-371: Example projects section looks solid; minor naming nit

The new Examples link and project table are clear and match the added projects. One small polish suggestion: in the video_recognition row, consider spelling the library name as “PyTorchVideo” instead of “PytorchVideo” to match the official branding.

projects/vision_language/README.md (1)

11-17: Replace bare URLs with Markdown links (markdownlint MD034)

To satisfy no-bare-urls (MD034) and keep formatting consistent, you can wrap the dataset URLs in Markdown links, e.g.:

-**Flickr8k** (default) - 8,000 images with 5 captions each
-- Download from Kaggle: https://www.kaggle.com/datasets/adityajn105/flickr8k
+**Flickr8k** (default) - 8,000 images with 5 captions each
+- Download from Kaggle: [Flickr8k](https://www.kaggle.com/datasets/adityajn105/flickr8k)
@@
-**Flickr30k** (larger) - 31,000 images with 5 captions each
-- Request access: https://shannon.cs.illinois.edu/DenotationGraph/
+**Flickr30k** (larger) - 31,000 images with 5 captions each
+- Request access: [Flickr30k](https://shannon.cs.illinois.edu/DenotationGraph/)

As per static analysis hints.

projects/video_recognition/writers.py (1)

23-24: Remove redundant isinstance check.

The type hint tensor: torch.Tensor already documents the expected type. The runtime isinstance check is redundant for type-hinted code.

-    if isinstance(tensor, torch.Tensor):
-        tensor = tensor.detach().cpu()
+    tensor = tensor.detach().cpu()
projects/huggingface_llm/dataset.py (1)

13-17: Hardcoded dataset sizes limit flexibility.

The dataset sizes (2000 for train, 500 for test) are hardcoded, making it difficult to adjust for different use cases without modifying the code. Consider accepting these as optional constructor parameters.

-class TextClassificationDataset(Dataset):
-    def __init__(self, dataset_name, split, tokenizer_name, max_length=128):
+class TextClassificationDataset(Dataset):
+    def __init__(self, dataset_name, split, tokenizer_name, max_length=128, train_size=2000, test_size=500):
         self.tokenizer = AutoTokenizer.from_pretrained(tokenizer_name)

         # Load the dataset
         self.dataset = load_dataset(dataset_name, split=split)

         # Reduce dataset size for faster demo
         if "train" in split:
-            self.dataset = self.dataset.shuffle(seed=42).select(range(2000))
+            self.dataset = self.dataset.shuffle(seed=42).select(range(train_size))
         else:
-            self.dataset = self.dataset.shuffle(seed=42).select(range(500))
+            self.dataset = self.dataset.shuffle(seed=42).select(range(test_size))
projects/huggingface_llm/configs/imdb.yaml (1)

45-50: Consider using dict format for metrics.

The metrics are defined as a list, but using a dict format with explicit names would make the metric keys more predictable and easier to reference in callbacks.

     train_metrics:
         _target_: torchmetrics.MetricCollection
         metrics:
-            - _target_: torchmetrics.Accuracy
-              task: multiclass
-              num_classes: "%vars::num_classes"
+            accuracy:
+                _target_: torchmetrics.Accuracy
+                task: multiclass
+                num_classes: "%vars::num_classes"
projects/medical_segmentation/configs/spleen.yaml (1)

194-227: Add pin_memory to test_dataloader for consistency.

The train and validation dataloaders specify pin_memory: true (lines 103, 162), but the test dataloader omits it. For consistency and potential performance benefits, consider adding it to the test dataloader as well.

   test_dataloader:
     _target_: monai.data.DataLoader
     batch_size: 1
     num_workers: "%vars::num_workers"
+    pin_memory: true
     shuffle: false
projects/video_recognition/dataset.py (1)

10-11: Redundant aliased imports.

The imports UCF101 as UCF101 and Kinetics as Kinetics are redundant since they alias to the same name.

-from torchvision.datasets import UCF101 as UCF101
-from torchvision.datasets import Kinetics as Kinetics
+from torchvision.datasets import UCF101, Kinetics
projects/medical_segmentation/models/model.py (3)

63-64: Move repeated import to module level.

sliding_window_inference is imported inside three separate methods. Move to module-level for cleaner code.

At the top of the file:

 import torch
 
 from lighter import LighterModule
+from monai.inferers import sliding_window_inference

Then remove the inline imports from validation_step, test_step, and predict_step.

Also applies to: 87-88, 108-109


37-39: Redundant forward override.

LighterModule already provides forward() that delegates to self.network. This override can be removed. Based on learnings.

-    def forward(self, x: torch.Tensor) -> torch.Tensor:
-        """Forward pass."""
-        return self.network(x)
-

58-59: Potentially duplicate loss logging.

LighterModule automatically logs the loss key from returned dicts (as {mode}/loss/step and {mode}/loss/epoch). The manual self.log("train/loss", ...) call creates a separate log entry. Consider removing manual logging for consistency with the framework's automatic logging.

-        self.log("train/loss", loss, prog_bar=True)
         return {"loss": loss}

If prog_bar=True is needed, you can configure this via Lightning's logging options.

projects/vision_language/models/model.py (1)

80-83: Unused metrics parameter.

The metrics parameter is accepted but never used. If CLIP uses retrieval metrics computed inline (i2t/t2i accuracy), consider removing the parameter or documenting why it's kept for future extensibility.

-    def _shared_step(self, batch: dict, metrics) -> dict:
+    def _shared_step(self, batch: dict) -> dict:
         """Shared logic for train/val/test steps."""
         ...
-        # Update metrics if available
-        if metrics is not None:
-            # Note: CLIP typically uses retrieval metrics, not classification metrics
-            pass

And update callers to not pass metrics.

projects/vision_language/networks/clip_model.py (1)

157-164: Unused embed_dim parameter.

The embed_dim parameter is accepted but never used. Consider removing it or using it to validate encoder output dimensions.

     def __init__(
         self,
         image_encoder: nn.Module,
         text_encoder: nn.Module,
-        embed_dim: int = 512,
         temperature: float = 0.07,
         learnable_temperature: bool = True,
     ) -> None:
projects/vision_language/dataset.py (4)

113-120: Silent failure in _load_image could mask real errors.

The broad except Exception silently returns a zero tensor, which could hide issues like corrupt images, permission errors, or missing files. Consider at minimum logging a warning.

     def _load_image(self, path: Path) -> torch.Tensor:
         """Load image from file."""
         try:
             from torchvision.io import read_image
 
             return read_image(str(path)).float() / 255.0
-        except Exception:
+        except Exception as e:
+            import warnings
+            warnings.warn(f"Failed to load image {path}: {e}. Returning zeros.")
             return torch.zeros(3, 224, 224)
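
The same warn-and-fall-back pattern, reduced to a runnable sketch that reads raw bytes instead of decoding an image. `load_bytes_or_default` is a hypothetical helper, and narrowing the handler to OSError is a design choice of this sketch, not something the PR does.

```python
import warnings
from pathlib import Path


def load_bytes_or_default(path: Path, default: bytes = b"") -> bytes:
    """Read a file, warning (instead of failing silently) when it cannot be loaded."""
    try:
        return path.read_bytes()
    except OSError as exc:
        # Surface the failure so corrupt or missing files are noticed.
        warnings.warn(f"Failed to load {path}: {exc}. Returning default.", stacklevel=2)
        return default


data = load_bytes_or_default(Path("/nonexistent/dir/image.jpg"))
print(len(data))
```

Catching a narrower exception type keeps programming errors (for example a wrong argument type) from being masked by the fallback.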

250-254: Inconsistent error handling compared to Flickr30kDataset._load_image.

This method lacks exception handling while Flickr30kDataset._load_image catches exceptions and returns zeros. Consider consistent behavior across dataset classes.


314-323: Missing image existence check during annotation loading.

Unlike Flickr30kDataset and Flickr8kDataset which check if image_path.exists(), this method adds entries without validation. This defers errors to __getitem__ time, making debugging harder.

             for ann in coco["annotations"]:
                 image_id = ann["image_id"]
                 if image_id in id_to_file:
+                    image_path = self.root / id_to_file[image_id]
+                    if not image_path.exists():
+                        continue
                     data.append(
                         {
-                            "image_path": self.root / id_to_file[image_id],
+                            "image_path": image_path,
                             "caption": ann["caption"],
                         }
                     )

329-354: COCOCaptionsDataset.__getitem__ doesn't return "caption" key unlike Flickr datasets.

The Flickr datasets return "caption" in their output dict (e.g., line 110), but COCOCaptionsDataset omits it. While collate_fn doesn't use it, this inconsistency could cause issues if downstream code expects uniform outputs.

         return {
             "image": image,
             "input_ids": input_ids,
             "attention_mask": attention_mask,
+            "caption": item["caption"],
         }
projects/eeg/submission.py (1)

160-178: Consider guarding against empty network_state when exporting weights

If the checkpoint does not contain any keys starting with network. (or they are unexpectedly named), network_state will be empty and you’ll silently write an essentially useless weights_challenge_*.pt.

You might want to add a small guard and warning:

         # Extract network weights (remove 'network.' prefix from LighterModule)
         network_state = {}
         for key, value in state_dict.items():
             if key.startswith("network."):
@@
                 if new_key.startswith("model."):
                     new_key = new_key[6:]
                 network_state[new_key] = value
-
-        # Save weights
-        output_file = output_path / f"weights_challenge_{challenge}.pt"
-        torch.save(network_state, output_file)
-        print(f"Exported Challenge {challenge} weights to: {output_file}")
+        if not network_state:
+            print(
+                f"Warning: No 'network.' parameters found in checkpoint for "
+                f"Challenge {challenge}; skipping export."
+            )
+            continue
+
+        # Save weights
+        output_file = output_path / f"weights_challenge_{challenge}.pt"
+        torch.save(network_state, output_file)
+        print(f"Exported Challenge {challenge} weights to: {output_file}")

This makes misconfigured checkpoints much easier to debug.
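
The prefix handling and the guard can also be isolated into a small helper, which makes the behavior easy to check. A minimal sketch using plain dictionaries in place of tensors; `extract_network_state` is a hypothetical name, and raising instead of warning and skipping is a choice made here for brevity.

```python
def extract_network_state(state_dict: dict, prefix: str = "network.") -> dict:
    """Keep only entries under `prefix`, stripping it and an optional 'model.' wrapper."""
    network_state = {}
    for key, value in state_dict.items():
        if key.startswith(prefix):
            new_key = key[len(prefix):]
            if new_key.startswith("model."):
                new_key = new_key[len("model."):]
            network_state[new_key] = value
    if not network_state:
        # An empty result almost certainly means the checkpoint layout is unexpected.
        raise ValueError(f"No parameters with prefix {prefix!r} found in checkpoint")
    return network_state


checkpoint = {"network.model.fc.weight": 1, "criterion.weight": 2}
print(extract_network_state(checkpoint))
```
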

projects/eeg/README.md (1)

17-27: Align README model naming with actual exported class

The README refers to “EEGNet”, while the code exposes EEGNetv4 from braindecode. To avoid confusion when users write configs, consider naming it explicitly as “EEGNetv4” here as well (or adding a short note that the implementation uses EEGNetv4 under the hood).

projects/eeg/configs/challenge1.yaml (2)

86-89: Hardcoded T_max may drift from max_epochs.

T_max is hardcoded to 100 while max_epochs is defined in vars/trainer. If epochs change, the scheduler won't adapt correctly.

Consider referencing the trainer's max_epochs or adding a var:

   scheduler:
     _target_: torch.optim.lr_scheduler.CosineAnnealingLR
     optimizer: "@model::optimizer"
-    T_max: 100
+    T_max: "%trainer::max_epochs"

132-143: Missing pin_memory in test dataloader.

Train and val dataloaders have pin_memory: true, but test_dataloader omits it. Add for consistency:

   test_dataloader:
     _target_: torch.utils.data.DataLoader
     batch_size: "%vars::batch_size"
     num_workers: "%vars::num_workers"
+    pin_memory: true
     shuffle: false
projects/eeg/data/hbn_dataset.py (2)

162-173: Consider reusing get_train_val_test_split to reduce duplication.

The split logic here duplicates the utility function defined at lines 353-385. Both HBNDatasetChallenge1 and HBNDatasetChallenge2 could call get_train_val_test_split internally.


327-350: O(n) linear search in __getitem__ is inefficient for large datasets.

Each __getitem__ call iterates through all window datasets to find the correct one. For large datasets with many windows, this becomes a bottleneck.

Precompute a cumulative sum array in __init__ so that each lookup becomes an O(log n) binary search:

# In __init__, after building _window_datasets:
self._cumulative_sizes = []
cumsum = 0
for wds in self._window_datasets:
    cumsum += len(wds)
    self._cumulative_sizes.append(cumsum)

# In __getitem__:
import bisect
dataset_idx = bisect.bisect_right(self._cumulative_sizes, idx)
local_idx = idx if dataset_idx == 0 else idx - self._cumulative_sizes[dataset_idx - 1]
wds = self._window_datasets[dataset_idx]
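
The lookup can be exercised on its own with made-up sizes. In this sketch `build_cumulative_sizes` and `locate` are hypothetical helper names, and the sizes [3, 2, 4] stand in for the lengths of the window datasets.

```python
import bisect
from itertools import accumulate


def build_cumulative_sizes(sizes: list[int]) -> list[int]:
    """Running totals of the dataset sizes, e.g. [3, 2, 4] -> [3, 5, 9]."""
    return list(accumulate(sizes))


def locate(cumulative_sizes: list[int], idx: int) -> tuple[int, int]:
    """Map a global index to (dataset index, local index) via binary search."""
    if not cumulative_sizes or idx < 0 or idx >= cumulative_sizes[-1]:
        raise IndexError(f"index {idx} out of range")
    dataset_idx = bisect.bisect_right(cumulative_sizes, idx)
    start = 0 if dataset_idx == 0 else cumulative_sizes[dataset_idx - 1]
    return dataset_idx, idx - start


cumulative = build_cumulative_sizes([3, 2, 4])
for global_idx in (0, 2, 3, 5, 8):
    print(global_idx, locate(cumulative, global_idx))
```

bisect_right is the appropriate variant here because a global index equal to a running total belongs to the next dataset, not the one that just ended.
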
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8361ca1 and 0702c65.

📒 Files selected for processing (89)
  • .gitignore (0 hunks)
  • docs/examples/image-classification.md (0 hunks)
  • docs/examples/index.md (1 hunks)
  • docs/examples/multi-gpu.md (0 hunks)
  • docs/guides/best-practices.md (1 hunks)
  • docs/guides/custom-code.md (1 hunks)
  • docs/guides/training.md (1 hunks)
  • docs/index.md (2 hunks)
  • docs/quickstart.md (1 hunks)
  • docs/reference/cli.md (1 hunks)
  • mkdocs.yml (1 hunks)
  • projects/README.md (1 hunks)
  • projects/cifar10/README.md (1 hunks)
  • projects/cifar10/configs/example.yaml (1 hunks)
  • projects/cifar10/models/__init__.py (1 hunks)
  • projects/cifar10/models/model.py (0 hunks)
  • projects/cifar10/networks/__init__.py (1 hunks)
  • projects/eeg/README.md (1 hunks)
  • projects/eeg/__init__.py (1 hunks)
  • projects/eeg/__lighter__.py (1 hunks)
  • projects/eeg/configs/challenge1.yaml (1 hunks)
  • projects/eeg/configs/challenge2.yaml (1 hunks)
  • projects/eeg/data/__init__.py (1 hunks)
  • projects/eeg/data/hbn_dataset.py (1 hunks)
  • projects/eeg/models/__init__.py (1 hunks)
  • projects/eeg/models/eeg_model.py (1 hunks)
  • projects/eeg/networks/__init__.py (1 hunks)
  • projects/eeg/networks/eeg_networks.py (1 hunks)
  • projects/eeg/submission.py (1 hunks)
  • projects/huggingface_llm/README.md (1 hunks)
  • projects/huggingface_llm/__lighter__.py (1 hunks)
  • projects/huggingface_llm/configs/imdb.yaml (1 hunks)
  • projects/huggingface_llm/dataset.py (1 hunks)
  • projects/huggingface_llm/models/__init__.py (1 hunks)
  • projects/huggingface_llm/models/model.py (1 hunks)
  • projects/huggingface_llm/networks/__init__.py (1 hunks)
  • projects/huggingface_llm/networks/model_factory.py (1 hunks)
  • projects/lora/README.md (1 hunks)
  • projects/lora/__init__.py (1 hunks)
  • projects/lora/__lighter__.py (1 hunks)
  • projects/lora/configs/lora.yaml (1 hunks)
  • projects/lora/dataset.py (1 hunks)
  • projects/lora/models/__init__.py (1 hunks)
  • projects/lora/models/model.py (1 hunks)
  • projects/lora/networks/__init__.py (1 hunks)
  • projects/lora/networks/lora.py (1 hunks)
  • projects/lora/networks/network.py (1 hunks)
  • projects/medical_segmentation/README.md (1 hunks)
  • projects/medical_segmentation/__init__.py (1 hunks)
  • projects/medical_segmentation/__lighter__.py (1 hunks)
  • projects/medical_segmentation/configs/spleen.yaml (1 hunks)
  • projects/medical_segmentation/dataset.py (1 hunks)
  • projects/medical_segmentation/models/__init__.py (1 hunks)
  • projects/medical_segmentation/models/model.py (1 hunks)
  • projects/medical_segmentation/networks/__init__.py (1 hunks)
  • projects/medical_segmentation/networks/unet.py (1 hunks)
  • projects/medical_segmentation/writers.py (1 hunks)
  • projects/self_supervised/README.md (1 hunks)
  • projects/self_supervised/__init__.py (1 hunks)
  • projects/self_supervised/__lighter__.py (1 hunks)
  • projects/self_supervised/configs/simclr.yaml (1 hunks)
  • projects/self_supervised/dataset.py (1 hunks)
  • projects/self_supervised/models/__init__.py (1 hunks)
  • projects/self_supervised/models/model.py (1 hunks)
  • projects/self_supervised/networks/__init__.py (1 hunks)
  • projects/self_supervised/networks/encoder.py (1 hunks)
  • projects/video_recognition/.gitignore (1 hunks)
  • projects/video_recognition/README.md (1 hunks)
  • projects/video_recognition/__init__.py (1 hunks)
  • projects/video_recognition/__lighter__.py (1 hunks)
  • projects/video_recognition/configs/base.yaml (1 hunks)
  • projects/video_recognition/configs/r3d.yaml (1 hunks)
  • projects/video_recognition/configs/transformer.yaml (1 hunks)
  • projects/video_recognition/dataset.py (1 hunks)
  • projects/video_recognition/models/__init__.py (1 hunks)
  • projects/video_recognition/models/model.py (1 hunks)
  • projects/video_recognition/networks/__init__.py (1 hunks)
  • projects/video_recognition/networks/video_models.py (1 hunks)
  • projects/video_recognition/writers.py (1 hunks)
  • projects/vision_language/README.md (1 hunks)
  • projects/vision_language/__init__.py (1 hunks)
  • projects/vision_language/__lighter__.py (1 hunks)
  • projects/vision_language/configs/clip.yaml (1 hunks)
  • projects/vision_language/dataset.py (1 hunks)
  • projects/vision_language/models/__init__.py (1 hunks)
  • projects/vision_language/models/model.py (1 hunks)
  • projects/vision_language/networks/__init__.py (1 hunks)
  • projects/vision_language/networks/clip_model.py (1 hunks)
  • src/lighter/utils/dynamic_imports.py (0 hunks)
💤 Files with no reviewable changes (5)
  • .gitignore
  • docs/examples/image-classification.md
  • docs/examples/multi-gpu.md
  • projects/cifar10/models/model.py
  • src/lighter/utils/dynamic_imports.py
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-12-07T17:00:14.929Z
Learnt from: ibro45
Repo: project-lighter/lighter PR: 173
File: projects/cifar10/model.py:6-11
Timestamp: 2025-12-07T17:00:14.929Z
Learning: When reviewing code that uses LighterModule as a base class, the subclass does not need to define __init__ or forward methods. LighterModule provides __init__ that accepts network, criterion, optimizer, scheduler, and metrics parameters, and forward() that delegates to self.network. Subclasses only need to implement step methods (training_step, validation_step, test_step, predict_step). The network architecture is defined separately and passed via YAML configuration.

Applied to files:

  • projects/vision_language/models/__init__.py
  • projects/self_supervised/models/model.py
  • projects/huggingface_llm/models/model.py
  • projects/self_supervised/__lighter__.py
  • projects/self_supervised/networks/encoder.py
  • projects/self_supervised/models/__init__.py
  • projects/lora/models/model.py
  • projects/medical_segmentation/models/model.py
  • projects/video_recognition/models/__init__.py
  • projects/lora/models/__init__.py
  • projects/cifar10/models/__init__.py
  • projects/medical_segmentation/models/__init__.py
🧬 Code graph analysis (33)
projects/eeg/models/__init__.py (1)
projects/eeg/models/eeg_model.py (1)
  • EEGRegressionModel (15-131)
projects/huggingface_llm/dataset.py (1)
tests/conftest.py (1)
  • DummyDataset (51-76)
projects/vision_language/configs/clip.yaml (1)
src/lighter/callbacks/freezer.py (1)
  • Freezer (13-148)
projects/self_supervised/networks/__init__.py (1)
projects/self_supervised/networks/encoder.py (4)
  • BYOLNetwork (114-137)
  • SimCLRNetwork (52-71)
  • create_byol_model (74-111)
  • create_simclr_model (12-49)
projects/huggingface_llm/models/__init__.py (1)
projects/huggingface_llm/models/model.py (1)
  • TextClassificationModel (6-52)
projects/vision_language/__lighter__.py (2)
src/lighter/callbacks/freezer.py (1)
  • Freezer (13-148)
src/lighter/model.py (1)
  • LighterModule (20-423)
projects/video_recognition/writers.py (2)
src/lighter/callbacks/file_writer.py (3)
  • write_tensor (72-75)
  • write_image_3d (90-100)
  • write_image_2d (79-86)
tests/unit/test_callbacks_writers.py (1)
  • test_tensor_writer (333-344)
projects/huggingface_llm/models/model.py (3)
src/lighter/model.py (2)
  • LighterModule (20-423)
  • training_step (130-155)
projects/cifar10/model.py (1)
  • CIFAR10Model (6-61)
tests/fixtures/plain_lightning_modules.py (1)
  • MyLighterModule (101-118)
projects/vision_language/models/model.py (2)
src/lighter/model.py (1)
  • LighterModule (20-423)
projects/vision_language/networks/clip_model.py (4)
  • temperature (177-178)
  • forward (56-67)
  • forward (108-141)
  • forward (194-213)
projects/lora/networks/__init__.py (2)
projects/lora/networks/lora.py (1)
  • LoRAWrapper (23-165)
projects/lora/networks/network.py (2)
  • ImageClassifier (8-63)
  • VisionTransformerForClassification (66-95)
projects/lora/networks/network.py (1)
projects/lora/networks/lora.py (1)
  • forward (118-120)
projects/eeg/data/__init__.py (1)
projects/eeg/data/hbn_dataset.py (3)
  • HBNDatasetChallenge1 (71-196)
  • HBNDatasetChallenge2 (199-350)
  • get_train_val_test_split (353-386)
projects/lora/dataset.py (1)
tests/fixtures/plain_lightning_modules.py (1)
  • SimpleDataset (16-28)
projects/self_supervised/networks/encoder.py (1)
projects/self_supervised/models/model.py (1)
  • forward (86-97)
projects/video_recognition/__lighter__.py (1)
src/lighter/model.py (1)
  • LighterModule (20-423)
projects/eeg/__lighter__.py (1)
tests/unit/test_engine_runner.py (1)
  • test_auto_discover_project_with_marker (182-209)
projects/lora/__lighter__.py (3)
tests/integration/test_plain_lightning.py (1)
  • test_lighter_with_lightning_module_and_external_dataloaders (45-66)
src/lighter/callbacks/freezer.py (1)
  • Freezer (13-148)
src/lighter/model.py (1)
  • LighterModule (20-423)
projects/self_supervised/models/__init__.py (1)
projects/self_supervised/models/model.py (1)
  • SimCLRModel (24-176)
projects/lora/models/model.py (3)
src/lighter/model.py (1)
  • LighterModule (20-423)
projects/cifar10/model.py (1)
  • CIFAR10Model (6-61)
tests/fixtures/plain_lightning_modules.py (2)
  • MyLighterModule (101-118)
  • PlainLightningModule (31-73)
projects/video_recognition/networks/__init__.py (1)
projects/video_recognition/networks/video_models.py (2)
  • R3D (59-119)
  • VideoTransformer (145-227)
projects/medical_segmentation/models/model.py (3)
src/lighter/model.py (1)
  • LighterModule (20-423)
projects/cifar10/model.py (1)
  • CIFAR10Model (6-61)
tests/fixtures/plain_lightning_modules.py (1)
  • MyLighterModule (101-118)
projects/eeg/submission.py (1)
src/lighter/engine/runner.py (1)
  • load (47-62)
projects/video_recognition/models/__init__.py (2)
projects/video_recognition/models/model.py (1)
  • VideoClassificationModel (8-79)
src/lighter/model.py (1)
  • LighterModule (20-423)
projects/lora/models/__init__.py (1)
projects/lora/models/model.py (1)
  • LoRAClassificationModel (8-80)
projects/vision_language/networks/__init__.py (1)
projects/vision_language/networks/clip_model.py (3)
  • CLIPModel (144-221)
  • ImageEncoder (13-67)
  • TextEncoder (70-141)
projects/cifar10/models/__init__.py (1)
projects/cifar10/models/model.py (1)
  • CIFAR10Model (6-57)
projects/medical_segmentation/models/__init__.py (1)
projects/medical_segmentation/models/model.py (1)
  • SegmentationModel (8-120)
projects/video_recognition/configs/base.yaml (1)
tests/integration/test_plain_lightning.py (1)
  • test_lighter_with_lightning_module_and_external_dataloaders (45-66)
projects/cifar10/networks/__init__.py (1)
projects/cifar10/networks/net.py (1)
  • Net (6-23)
projects/medical_segmentation/writers.py (1)
src/lighter/callbacks/file_writer.py (2)
  • write_image_3d (90-100)
  • write_image_2d (79-86)
projects/medical_segmentation/__lighter__.py (1)
src/lighter/model.py (1)
  • LighterModule (20-423)
projects/eeg/configs/challenge2.yaml (1)
tests/integration/test_plain_lightning.py (1)
  • test_lighter_with_lightning_module_and_external_dataloaders (45-66)
projects/vision_language/networks/clip_model.py (1)
projects/vision_language/models/model.py (1)
  • forward (48-55)
🪛 LanguageTool
projects/eeg/README.md

[grammar] ~11-~11: Ensure spelling is correct
Context: ...ession on 129-channel EEG, 2s windows @ 100Hz. ## Dataset HBN-EEG - ~3,000 part...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

🪛 markdownlint-cli2 (0.18.1)
docs/examples/index.md

26-26: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

projects/vision_language/README.md

12-12: Bare URL used

(MD034, no-bare-urls)


16-16: Bare URL used

(MD034, no-bare-urls)

projects/lora/README.md

56-56: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


61-61: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

Comment on lines +190 to +196
    def __getitem__(self, idx: int) -> tuple:
        X, y, window_info = self.dataset[idx]
        # Get target from cached metadata
        target = float(self._metadata.iloc[idx].get("rt_from_stimulus", y))
        # Convert to float32 for MPS compatibility (MPS doesn't support float64)
        X = X.astype(np.float32)
        return X, np.float32(target)

⚠️ Potential issue | 🟠 Major

Potential index mismatch with iloc on metadata.

After splitting and creating BaseConcatDataset, the metadata DataFrame index may not be contiguous (0 to N-1). iloc[idx] is positional, so it returns the right target only if the cached metadata rows have the same order and length as the dataset; any label-based lookup on the retained original index would return incorrect targets.

Consider resetting the index when caching metadata:

         # Cache metadata for efficient access
-        self._metadata = self.dataset.get_metadata() if self.dataset else None
+        self._metadata = self.dataset.get_metadata().reset_index(drop=True) if self.dataset else None
🤖 Prompt for AI Agents
In projects/eeg/data/hbn_dataset.py around lines 190-196, the code uses
self._metadata.iloc[idx] but the cached metadata may retain original
(non-contiguous) indices after dataset splitting/concatenation, causing index
mismatches; fix by ensuring the cached metadata has a contiguous 0..N-1 index
(e.g., call reset_index(drop=True) when caching or after any split/concat
operation) or map dataset indices to metadata indices explicitly so iloc[idx]
always returns the correct row.

Comment on lines +181 to +257
def create_submission_zip(
    output_dir: str = "submission",
    zip_name: str = "submission.zip",
) -> None:
    """Create submission zip file.

    Args:
        output_dir: Directory containing submission files.
        zip_name: Name of the output zip file.
    """
    output_path = Path(output_dir)
    zip_path = output_path / zip_name

    required_files = [
        "submission.py",
        "weights_challenge_1.pt",
        "weights_challenge_2.pt",
    ]

    # Copy submission.py to output directory
    import shutil

    shutil.copy(__file__, output_path / "submission.py")

    # Create zip
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for filename in required_files:
            filepath = output_path / filename
            if filepath.exists():
                zf.write(filepath, filename)
                print(f"Added to zip: {filename}")
            else:
                print(f"Warning: Missing file: {filename}")

    print(f"\nCreated submission: {zip_path}")


def test_submission(output_dir: str = "submission") -> None:
    """Test submission locally.

    Args:
        output_dir: Directory containing submission files.
    """
    import sys

    sys.path.insert(0, str(Path(output_dir)))

    device = "cuda" if torch.cuda.is_available() else "cpu"
    print(f"Testing on device: {device}")

    try:
        sub = Submission(SFREQ=DEFAULT_SFREQ, DEVICE=device)

        # Test Challenge 1
        print("\nTesting Challenge 1 model...")
        model1 = sub.get_model_challenge_1()
        x = torch.randn(2, DEFAULT_N_CHANS, DEFAULT_N_TIMES).to(device)
        with torch.no_grad():
            y = model1(x)
        print(f" Input shape: {x.shape}")
        print(f" Output shape: {y.shape}")
        print(f" Output: {y.squeeze().tolist()}")

        # Test Challenge 2
        print("\nTesting Challenge 2 model...")
        model2 = sub.get_model_challenge_2()
        with torch.no_grad():
            y = model2(x)
        print(f" Input shape: {x.shape}")
        print(f" Output shape: {y.shape}")
        print(f" Output: {y.squeeze().tolist()}")

        print("\nSubmission test PASSED!")

    except Exception as e:
        print(f"\nSubmission test FAILED: {e}")
        raise

⚠️ Potential issue | 🟠 Major

test_submission does not actually test the packaged submission module / weight paths

Currently test_submission:

  • Adds output_dir to sys.path, but still uses the Submission class and constants from the current module (projects.eeg.submission).
  • resolve_path never uses output_dir, so if weights only exist in output_dir (the default from export_weights), --test will typically fail to find them.

To truly exercise the self-contained submission bundle (the submission.py copied into output_dir and its colocated weights), you can import the submission module from output_dir and use its Submission class and constants:

 def test_submission(output_dir: str = "submission") -> None:
     """Test submission locally.
@@
-    import sys
-
-    sys.path.insert(0, str(Path(output_dir)))
-
-    device = "cuda" if torch.cuda.is_available() else "cpu"
+    import sys
+    import importlib
+
+    output_path = Path(output_dir)
+    sys.path.insert(0, str(output_path))
+
+    # Import the self-contained submission module from the output directory
+    submission_module = importlib.import_module("submission")
+
+    device = "cuda" if torch.cuda.is_available() else "cpu"
     print(f"Testing on device: {device}")
 
     try:
-        sub = Submission(SFREQ=DEFAULT_SFREQ, DEVICE=device)
+        sub = submission_module.Submission(
+            SFREQ=submission_module.DEFAULT_SFREQ,
+            DEVICE=device,
+        )
@@
-        # Test Challenge 1
+        # Test Challenge 1
         print("\nTesting Challenge 1 model...")
         model1 = sub.get_model_challenge_1()
-        x = torch.randn(2, DEFAULT_N_CHANS, DEFAULT_N_TIMES).to(device)
+        x = torch.randn(
+            2,
+            submission_module.DEFAULT_N_CHANS,
+            submission_module.DEFAULT_N_TIMES,
+        ).to(device)

This way:

  • __file__ inside the imported submission module points to output_dir/submission.py, so resolve_path will correctly see colocated weights_challenge_*.pt.
  • The local test matches what Codabench will see in the actual submission bundle.
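
An alternative to relying on sys.path ordering is to load the copied file by its path. The sketch below is only an illustration, not code from this PR: load_module_from_dir is a hypothetical helper, and it assumes the bundle directory contains a submission.py as described above.

```python
import importlib.util
from pathlib import Path
from types import ModuleType


def load_module_from_dir(output_dir: str, module_name: str = "submission") -> ModuleType:
    """Load `<output_dir>/<module_name>.py` by path (hypothetical helper)."""
    module_path = Path(output_dir) / f"{module_name}.py"
    if not module_path.is_file():
        raise FileNotFoundError(f"No such module file: {module_path}")

    # Build the spec from the file location so the import cannot silently
    # resolve to another module of the same name elsewhere on sys.path.
    spec = importlib.util.spec_from_file_location(module_name, module_path)
    if spec is None or spec.loader is None:
        raise ImportError(f"Cannot build an import spec for {module_path}")

    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module
```

Because the module is executed from its own path, __file__ inside it refers to the copy in output_dir, which is what any path resolution relative to the bundle depends on.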

Comment on lines +1 to +5
"""HuggingFace model utilities."""

from transformers import AutoModelForSequenceClassification

__all__ = ["AutoModelForSequenceClassification"]

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

cat -n projects/huggingface_llm/networks/__init__.py

Repository: project-lighter/lighter

Length of output: 230


🏁 Script executed:

rg -n "create_model" projects/huggingface_llm --type py

Repository: project-lighter/lighter

Length of output: 229


🏁 Script executed:

cat -n projects/huggingface_llm/networks/model_factory.py

Repository: project-lighter/lighter

Length of output: 247


🏁 Script executed:

rg -n "create_model" projects/huggingface_llm -A 2 -B 2 --type py

Repository: project-lighter/lighter

Length of output: 435


Define create_model factory to match networks package import

projects/huggingface_llm/networks/__init__.py imports create_model from this module (line 3), but the function is not defined in model_factory.py. This will raise an ImportError when importing the networks package.

Add a factory function around AutoModelForSequenceClassification and export it:

-"""HuggingFace model utilities."""
-
-from transformers import AutoModelForSequenceClassification
-
-__all__ = ["AutoModelForSequenceClassification"]
+"""HuggingFace model utilities and factory functions."""
+
+from transformers import AutoModelForSequenceClassification
+
+
+def create_model(
+    pretrained_model_name_or_path: str,
+    num_labels: int,
+    **kwargs,
+):
+    """Create a sequence classification model from a pretrained checkpoint."""
+    return AutoModelForSequenceClassification.from_pretrained(
+        pretrained_model_name_or_path,
+        num_labels=num_labels,
+        **kwargs,
+    )
+
+
+__all__ = ["create_model", "AutoModelForSequenceClassification"]
🤖 Prompt for AI Agents
In projects/huggingface_llm/networks/model_factory.py around lines 1 to 5, the
module currently only re-exports AutoModelForSequenceClassification but
networks/__init__.py expects a create_model factory; add a create_model function
that wraps AutoModelForSequenceClassification.from_pretrained (accepting
pretrained_model_name_or_path, num_labels, and **kwargs), return the
instantiated model, and export it by including "create_model" in __all__; ensure
the function signature and behavior match how networks/__init__.py calls it so
imports succeed without ImportError.

Comment on lines +8 to +9
from lightly.data import LightlyDataset as LightlyDataset
from lightly.transforms import SimCLRTransform as SimCLRTransform

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check project dependencies and lightly specifications

echo "=== Searching for dependency files ==="
find . -maxdepth 2 -type f \( -name "setup.py" -o -name "setup.cfg" -o -name "pyproject.toml" -o -name "requirements*.txt" -o -name "*.txt" \) | head -20

echo -e "\n=== Checking for lightly references ==="
rg "lightly" --type py --type toml --type cfg --type txt -n 2>/dev/null || echo "Search completed"

Repository: project-lighter/lighter

Length of output: 180


🏁 Script executed:

cat pyproject.toml

Repository: project-lighter/lighter

Length of output: 4073


Add lightly to project dependencies.

The lightly library is imported in this file but is not declared in pyproject.toml. This will cause an ImportError at runtime. Add lightly to the dependencies list in pyproject.toml with an appropriate version constraint.

🤖 Prompt for AI Agents
In projects/self_supervised/dataset.py lines 8-9 the module imports from the
third-party package "lightly" but that package is not listed in pyproject.toml;
add "lightly" to the dependencies list under [project] in pyproject.toml with an
appropriate version constraint (ideally bounded to the release series the
example was tested against), then re-sync the environment (for example with
uv sync, since the repository tracks a uv.lock) so the import will succeed at
runtime.
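
Until the dependency is declared, an example project can at least fail with a clear message instead of a bare ImportError. A minimal sketch using only the standard library follows; missing_modules is a hypothetical helper name, and the check is generic rather than specific to lightly.

```python
import importlib.util


def missing_modules(names: list[str]) -> list[str]:
    """Return the top-level module names that cannot be found for import."""
    # find_spec returns None for a top-level module that is not installed;
    # it does not import the module, so the check itself is cheap.
    return [name for name in names if importlib.util.find_spec(name) is None]


# Example: missing_modules(["lightly"]) yields ["lightly"] when the package
# is absent and [] when it is installed.
```

This only detects that a package is absent; it does not replace declaring the dependency in pyproject.toml.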

Comment on lines +45 to +52
train_dataloader:
  _target_: torch.utils.data.DataLoader
  num_workers: "%vars::num_workers"
  pin_memory: true
  shuffle: true
  collate_fn:
    _target_: project.dataset.video_collate_fn
    _mode_: callable

⚠️ Potential issue | 🔴 Critical

Add dataset to train_dataloader.

The train dataloader is missing a dataset key. Config composition requires that either this base config or the model-specific config provides the dataset. Document which config should define it, or add a placeholder here.

🤖 Prompt for AI Agents
In projects/video_recognition/configs/base.yaml around lines 45 to 52, the
train_dataloader block lacks a dataset key required by config composition;
either add a placeholder dataset entry here (e.g., dataset: "${...}" or a
minimal _target_ stub) or explicitly document in this file that every
model-specific config must supply train_dataloader.dataset. Update the YAML to
include a dataset key placeholder if you want the base to guarantee presence, or
add a short comment above train_dataloader stating that dataset must be defined
in model configs and remove ambiguity.

⚠️ Potential issue | 🔴 Critical

Add batch_size to train_dataloader.

The train dataloader is missing a batch_size parameter. This is a base config intended for composition, but if no config supplies the value, torch.utils.data.DataLoader falls back to its default of batch_size=1, which is rarely what is intended for training. Consider adding it here or documenting that it must be provided in the model-specific config.

🤖 Prompt for AI Agents
In projects/video_recognition/configs/base.yaml around lines 45 to 52, the
train_dataloader block is missing a required batch_size setting; add a
batch_size entry (e.g., batch_size: "%vars::batch_size" or a sensible default)
to this base config or explicitly document that every model-specific config must
supply batch_size; update the YAML line to include batch_size and ensure any
variable reference matches existing vars definitions so DataLoader always
receives a batch size.

Comment on lines +127 to +136
      image_transform:
        _target_: torchvision.transforms.Compose
        transforms:
          - _target_: torchvision.transforms.RandomResizedCrop
            size: "%vars::image_size"
            scale: [0.8, 1.0]
          - _target_: torchvision.transforms.RandomHorizontalFlip
          - _target_: torchvision.transforms.Normalize
            mean: [0.485, 0.456, 0.406]
            std: [0.229, 0.224, 0.225]

⚠️ Potential issue | 🔴 Critical

Add ToTensor transform before Normalize.

The transform pipeline applies Normalize directly after RandomHorizontalFlip, but Normalize expects a tensor input while torchvision datasets typically return PIL Images. Without a conversion step, this will raise a runtime error.

Apply this diff to add the missing transform:

           - _target_: torchvision.transforms.RandomHorizontalFlip
+          - _target_: torchvision.transforms.ToTensor
           - _target_: torchvision.transforms.Normalize
🤖 Prompt for AI Agents
In projects/vision_language/configs/clip.yaml around lines 127 to 136, the
transform pipeline applies Normalize to PIL images which will error because
Normalize expects tensors; insert a torchvision.transforms.ToTensor (or
equivalent) transform immediately before Normalize so the image is converted to
a tensor prior to normalization, preserving the order: RandomResizedCrop,
RandomHorizontalFlip, ToTensor, Normalize.
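
The ordering rule can also be checked mechanically on the parsed config. The sketch below is a hypothetical helper, not part of Lighter or torchvision: it assumes the transforms are available as a list of dicts with a `_target_` key, as in the YAML above, and it only recognises ToTensor as the conversion step.

```python
def tensor_conversion_precedes_normalize(transforms: list[dict]) -> bool:
    """Check that a ToTensor entry appears before the first Normalize entry.

    `transforms` mirrors the YAML list: one dict per transform, with a
    `_target_` key holding the dotted class path.
    """
    targets = [t.get("_target_", "") for t in transforms]
    normalize_positions = [i for i, t in enumerate(targets) if t.endswith(".Normalize")]
    if not normalize_positions:
        return True  # No Normalize entry, so no ordering constraint applies.
    first_normalize = normalize_positions[0]
    return any(t.endswith(".ToTensor") for t in targets[:first_normalize])


# The pipeline flagged in the review (no conversion step before Normalize).
flagged = [
    {"_target_": "torchvision.transforms.RandomHorizontalFlip"},
    {"_target_": "torchvision.transforms.Normalize"},
]
# The same pipeline with the suggested ToTensor entry inserted.
suggested = [
    {"_target_": "torchvision.transforms.RandomHorizontalFlip"},
    {"_target_": "torchvision.transforms.ToTensor"},
    {"_target_": "torchvision.transforms.Normalize"},
]
```

A pipeline that converts images some other way would need the check extended accordingly.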

Comment on lines +153 to +160
      image_transform:
        _target_: torchvision.transforms.Compose
        transforms:
          - _target_: torchvision.transforms.Resize
            size: ["%vars::image_size", "%vars::image_size"]
          - _target_: torchvision.transforms.Normalize
            mean: [0.485, 0.456, 0.406]
            std: [0.229, 0.224, 0.225]

⚠️ Potential issue | 🔴 Critical

Add ToTensor transform before Normalize.

Same issue as the training transforms - Normalize requires tensor input but the pipeline lacks a conversion step.

Apply this diff:

           - _target_: torchvision.transforms.Resize
             size: ["%vars::image_size", "%vars::image_size"]
+          - _target_: torchvision.transforms.ToTensor
           - _target_: torchvision.transforms.Normalize
🤖 Prompt for AI Agents
projects/vision_language/configs/clip.yaml around lines 153-160: the
image_transform pipeline applies Resize then Normalize but is missing a ToTensor
conversion, so Normalize will receive PIL images not tensors; insert a
torchvision.transforms.ToTensor transform between Resize and Normalize (i.e.,
Resize -> ToTensor -> Normalize) using the same _target_ structure to ensure
images are converted to tensors before normalization.

Comment on lines +175 to +182
      image_transform:
        _target_: torchvision.transforms.Compose
        transforms:
          - _target_: torchvision.transforms.Resize
            size: ["%vars::image_size", "%vars::image_size"]
          - _target_: torchvision.transforms.Normalize
            mean: [0.485, 0.456, 0.406]
            std: [0.229, 0.224, 0.225]

⚠️ Potential issue | 🔴 Critical

Add ToTensor transform before Normalize.

The prediction dataloader has the same missing transform.

Apply this diff:

           - _target_: torchvision.transforms.Resize
             size: ["%vars::image_size", "%vars::image_size"]
+          - _target_: torchvision.transforms.ToTensor
           - _target_: torchvision.transforms.Normalize
🤖 Prompt for AI Agents
In projects/vision_language/configs/clip.yaml around lines 175-182, the
transforms list is missing a ToTensor step before Normalize; insert a
torchvision.transforms.ToTensor entry immediately before the Normalize transform
in the image_transform.transforms list so Normalize receives tensor input, and
make the identical change in the prediction dataloader config (add ToTensor
before Normalize there as well).

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (2)
projects/video_recognition/dataset.py (2)

28-30: Remove unused audio variable.

The audio variable is unpacked but never used. If audio data is not needed, consider using _ to make this explicit.

Apply this diff:

     def __getitem__(self, idx: int) -> tuple[torch.Tensor, int]:
         # UCF101/Kinetics return (video, audio, label)
-        video, audio, label = self.dataset[idx]
+        video, _, label = self.dataset[idx]

34-36: Consider more robust normalization check.

The normalization logic assumes that video.max() > 1.0 implies the video is in the [0, 255] range. While this works for standard video datasets, a value only slightly above 1.0 (for example from interpolation overshoot on already-normalized data) would trigger rescaling by mistake. Consider a higher threshold (e.g., video.max() > 1.5 or video.max() > 127) for better robustness against such edge cases.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0702c65 and 15b08e8.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (8)
  • docs/examples/index.md (1 hunks)
  • projects/eeg/configs/challenge1.yaml (1 hunks)
  • projects/eeg/configs/challenge2.yaml (1 hunks)
  • projects/eeg/data/hbn_dataset.py (1 hunks)
  • projects/huggingface_llm/configs/imdb.yaml (1 hunks)
  • projects/self_supervised/README.md (1 hunks)
  • projects/video_recognition/dataset.py (1 hunks)
  • pyproject.toml (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • projects/eeg/data/hbn_dataset.py
  • projects/self_supervised/README.md
  • projects/huggingface_llm/configs/imdb.yaml
🔇 Additional comments (5)
docs/examples/index.md (1)

1-106: Well-structured and comprehensive documentation.

This new index page effectively consolidates example projects with clear descriptions, running instructions, and a standardized template. The project table accurately reflects all eight projects and their dependencies per the PR objectives, the getting-started walkthrough is clear and actionable, and the frontmatter/Markdown formatting is clean.

A couple of minor observations for your consideration (not blockers):

  1. Project Highlights are intentionally selective (5 of 8): eeg, video_recognition, and vision_language are absent from the highlights section. This is fine for curating "flagship" examples, but consider adding brief tech-stack callouts (e.g., "Braindecode integration", "PyTorchVideo") in the table's description column if discovery is a concern.

  2. GitHub links depend on post-merge branch state: All links assume the projects exist at projects/<name>/ in the main branch after this PR is merged. Verify that all eight projects are present in the branch and at these paths before merge.

projects/eeg/configs/challenge2.yaml (3)

16-28: Verify dataset class signature matches parameter names.

Lines 24-25 define releases as a list, but the related Challenge 1 config uses a scalar release. Each key must match the corresponding constructor signature: if HBNDatasetChallenge2.__init__() expects release (singular), passing releases will cause a runtime TypeError.

Verify the dataset class signature by checking the HBNDatasetChallenge2 constructor in projects/eeg/data/hbn_dataset.py to confirm the correct parameter name(s).

Also applies to: 87-132
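
A check of this kind can be scripted with inspect.signature before instantiation. The sketch below is illustrative only: unexpected_kwargs is a hypothetical helper and ExampleDataset is a stand-in, not the PR's HBNDatasetChallenge2 class, whose real signature is not shown here.

```python
import inspect


def unexpected_kwargs(cls: type, kwargs: dict) -> list[str]:
    """Return the keys of `kwargs` that `cls.__init__` would reject."""
    params = inspect.signature(cls.__init__).parameters
    # A **kwargs catch-all accepts any keyword, so nothing can be flagged.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return []
    return [name for name in kwargs if name not in params]


class ExampleDataset:
    """Stand-in for illustration only; not the PR's dataset class."""

    def __init__(self, release: str, task: str) -> None:
        self.release = release
        self.task = task
```

Calling unexpected_kwargs(ExampleDataset, {"releases": ["R5"], "task": "contrastChangeDetection"}) reports the misspelled key without constructing the dataset.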


50-51: Verify Lighter's Hydra resolver handles the project module prefix.

The config references project.models.eeg_model.EEGRegressionModel (line 51) and project.data.hbn_dataset.HBNDatasetChallenge2 (line 97), relying on Lighter's Hydra instantiation logic to resolve project as a module. If Lighter doesn't register project in the module search path or doesn't handle package-relative imports, instantiation will fail at runtime.

Verify how Lighter resolves custom module paths in Hydra configs (e.g., whether project is a reserved/registered package prefix or if the config should use absolute import paths like projects.eeg.models.eeg_model instead).

Also applies to: 87-87, 97-97
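
Whether a dotted target resolves can be tested outside the config system with a plain import. The sketch below is a generic illustration and makes no claim about how Lighter itself resolves the project prefix; resolve_target is a hypothetical helper.

```python
import importlib
from typing import Any


def resolve_target(dotted_path: str) -> Any:
    """Import `package.module.Name` and return the `Name` attribute."""
    module_path, _, attr_name = dotted_path.rpartition(".")
    if not module_path:
        raise ValueError(f"Expected a dotted path, got {dotted_path!r}")
    # Raises ModuleNotFoundError when the module cannot be located.
    module = importlib.import_module(module_path)
    # Raises AttributeError when the module lacks the requested name.
    return getattr(module, attr_name)
```

If a path such as project.models.eeg_model.EEGRegressionModel fails here with ModuleNotFoundError in the environment where the config is run, the prefix is not importable by ordinary means and must be registered by the framework.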


26-26: Verify task name matches HBNDatasetChallenge2 enumeration.

Line 26 specifies task: contrastChangeDetection in camelCase. Confirm this matches the expected task names/enum values in HBNDatasetChallenge2 to avoid validation errors at instantiation.

Also applies to: 100-100, 115-115, 129-129

projects/eeg/configs/challenge1.yaml (1)

16-26: Verify dataset class signature for release parameter (singular).

Line 24 specifies release: R5 as a scalar, intentionally differing from Challenge 2's releases: [R5] (list). This is passed to HBNDatasetChallenge1 at lines 113, 127, and 140. Verify that HBNDatasetChallenge1.__init__() accepts release (singular) and not releases, to avoid runtime TypeErrors.

Also applies to: 101-143

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 15b08e8 and 02feb2b.

📒 Files selected for processing (1)
  • .gitignore (1 hunks)

Comment on lines +159 to +165
# Outputs, logs, dataset files
**/lightning_logs/
**/outputs/
**/.datasets/
**/*.zip
**/predictions/
checkpoints/

⚠️ Potential issue | 🟡 Minor

Inconsistent pattern for checkpoints directory matching.

Lines 160–164 all use the **/ prefix, but line 165 uses only checkpoints/. In gitignore syntax a pattern whose only slash is the trailing one already matches at any depth, so checkpoints/ still ignores checkpoints directories inside the example project subdirectories; the difference is stylistic. Using the same prefix as the neighbouring entries keeps the block consistent and makes the intent explicit.

Apply this diff for consistency:

 # Outputs, logs, dataset files
 **/lightning_logs/
 **/outputs/
 **/.datasets/
 **/*.zip
 **/predictions/
-checkpoints/
+**/checkpoints/
🤖 Prompt for AI Agents
In .gitignore around lines 159 to 165, the pattern for ignoring checkpoints is
written differently from its neighbours (other entries use **/ but line 165 uses
only checkpoints/); both forms match checkpoints directories at any depth, so
this is a consistency fix: change checkpoints/ to **/checkpoints/ so the updated
pattern matches the style used by the surrounding entries.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (5)
projects/vision_language/configs/clip.yaml (3)

127-136: Add missing ToTensor transform before Normalize in training pipeline.

The training transforms apply Normalize directly after RandomHorizontalFlip. Since torchvision datasets return PIL Images and Normalize expects tensor input, this will raise a runtime error at training time.

           - _target_: torchvision.transforms.RandomHorizontalFlip
+          - _target_: torchvision.transforms.ToTensor
           - _target_: torchvision.transforms.Normalize
             mean: [0.485, 0.456, 0.406]
             std: [0.229, 0.224, 0.225]

153-160: Add missing ToTensor transform before Normalize in validation pipeline.

The validation transforms have the same issue: Normalize is applied without first converting the PIL Image to a tensor.

           - _target_: torchvision.transforms.Resize
             size: ["%vars::image_size", "%vars::image_size"]
+          - _target_: torchvision.transforms.ToTensor
           - _target_: torchvision.transforms.Normalize
             mean: [0.485, 0.456, 0.406]
             std: [0.229, 0.224, 0.225]

175-182: Add missing ToTensor transform before Normalize in prediction pipeline.

The prediction dataloader has the same missing transform.

           - _target_: torchvision.transforms.Resize
             size: ["%vars::image_size", "%vars::image_size"]
+          - _target_: torchvision.transforms.ToTensor
           - _target_: torchvision.transforms.Normalize
             mean: [0.485, 0.456, 0.406]
             std: [0.229, 0.224, 0.225]
projects/vision_language/dataset.py (2)

34-49: split parameter on Flickr30kDataset is unused and can cause split leakage

split is accepted and stored (Line 37, Line 43) but _load_captions() (Lines 61–77) ignores it and always loads all captions, so callers requesting "val"/"test" silently get the full dataset. This is the same issue previously raised and should be fixed, not left as a misleading API.

Consider one of:

  • A. Enforce only a single split for now (least effort, avoids leakage):
 class Flickr30kDataset(Dataset):
@@
-        split: str = "train",
+        split: str = "train",
@@
-        self.split = split
+        if split != "train":
+            raise ValueError(
+                f"Flickr30kDataset currently only supports split='train', got {split!r}"
+            )
+        self.split = split

and update the docstring Args: section to say it currently only supports "train".

  • B. Implement split-aware loading, by loading a split manifest and filtering:
     def _load_captions(self) -> list[dict]:
         """Load image-caption pairs."""
         caption_file = self.root / "results_20130124.token"
         image_dir = self.root / "flickr30k-images"
+        split_file = self.root / f"flickr30k_{self.split}.txt"
+
+        valid_images: set[str] | None = None
+        if split_file.exists():
+            with open(split_file) as sf:
+                valid_images = {ln.strip() for ln in sf if ln.strip()}
 
         data = []
         if caption_file.exists():
             with open(caption_file) as f:
                 for line in f:
@@
-                        image_id = parts[0].split("#")[0]
+                        image_id = parts[0].split("#")[0]
                         caption = parts[1]
                         image_path = image_dir / image_id
-                        if image_path.exists():
+                        if image_path.exists() and (
+                            valid_images is None or image_id in valid_images
+                        ):
                             data.append({"image_path": image_path, "caption": caption})

(Exact split-file naming/format can be adjusted to whatever the project standard is.)

Leaving it as-is is dangerous because experiments that think they’re using val/test may accidentally see training data.

Also applies to: 61-77
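
The filtering step in option B can be isolated from file I/O so it is easy to test. This is a minimal sketch under stated assumptions: parse_split_manifest and filter_pairs_by_split are hypothetical helpers, the manifest is assumed to hold one image name per line, and each pair is assumed to carry an "image_name" key.

```python
from typing import Optional


def parse_split_manifest(lines: list[str]) -> set[str]:
    """Collect non-empty, stripped image names from manifest lines."""
    return {ln.strip() for ln in lines if ln.strip()}


def filter_pairs_by_split(
    pairs: list[dict],
    split_images: Optional[set[str]],
) -> list[dict]:
    """Keep only the pairs whose image name is in `split_images`.

    Passing None disables filtering and returns every pair, which mirrors
    the fallback in the suggested diff when no split file exists.
    """
    if split_images is None:
        return list(pairs)
    return [p for p in pairs if p["image_name"] in split_images]
```

Keeping the None fallback explicit makes the leakage risk visible: a caller who requests "val" but supplies no manifest still receives the full dataset, so raising an error in that case may be the safer design.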


140-161: Flickr8kDataset.split and SPLITS are defined but never used for filtering

SPLITS (Lines 140–145) and split (Line 150, Line 156) are defined, but _load_data() (Lines 173–217) ignores them and loads all captions, causing the same train/val/test leakage concern as before. This was already pointed out in an earlier review and is still unresolved.

You should either enforce/implement split handling or remove the parameter and mapping. A concrete split-aware implementation might look like:

     def _load_data(self) -> list[dict]:
         """Load image-caption pairs."""
         caption_file = self.root / "captions.txt"
         image_dir = self.root / "Images"
@@
-        data = []
-        if caption_file.exists():
-            with open(caption_file) as f:
+        # Load split image names if split is specified
+        valid_images: set[str] | None = None
+        if self.split is not None:
+            if self.split not in self.SPLITS:
+                raise ValueError(
+                    f"Invalid split {self.split!r}. Expected one of {tuple(self.SPLITS)}"
+                )
+            split_path = self.root / self.SPLITS[self.split]
+            if not split_path.exists():
+                raise FileNotFoundError(
+                    f"Split file not found for split={self.split!r}: {split_path}"
+                )
+            with open(split_path) as sf:
+                valid_images = {ln.strip() for ln in sf if ln.strip()}
+
+        data: list[dict] = []
+        if caption_file.exists():
+            with open(caption_file) as f:
@@
-                    image_path = image_dir / image_name
-                    if image_path.exists():
-                        data.append({"image_path": image_path, "caption": caption})
+                    image_path = image_dir / image_name
+                    if (
+                        image_path.exists()
+                        and (valid_images is None or image_name in valid_images)
+                    ):
+                        data.append({"image_path": image_path, "caption": caption})

If split management is deliberately left to the experiment code, then drop split and SPLITS entirely from this dataset and its docstring to avoid implying behavior that doesn’t exist.

Also applies to: 173-217

🧹 Nitpick comments (6)
projects/video_recognition/models/model.py (1)

52-80: Consider documenting the performance and memory trade-offs.

The implementation correctly supports CSV/FileWriter compatibility as documented. However, there are trade-offs worth noting:

  1. Lines 70-73, 78: The .tolist() calls synchronize GPU operations and transfer data to CPU. For large batches or high-throughput inference, this may become a bottleneck.
  2. Line 74: Including the video tensor [B, C, T, H, W] in outputs can be memory-intensive with large batches or long videos.

Both design choices are appropriate for the stated use case (CSV/FileWriter output), but adding a brief inline comment about these trade-offs could help users understand when to modify this pattern for their needs.

Example comment additions:

         # Get top-k predictions (up to 5, or fewer if model has fewer classes)
         k = min(5, probs.shape[1])
         topk_probs, topk_indices = probs.topk(k, dim=1)
 
+        # Note: .tolist() calls below synchronize GPU and transfer to CPU.
+        # For high-throughput inference, consider keeping tensors or batching writes.
         result = {
             "prediction": pred.tolist(),
             "confidence": probs.max(dim=1).values.tolist(),
             "top5_classes": topk_indices.tolist(),
             "top5_probs": topk_probs.tolist(),
-            "video": video,  # Include video tensor for FileWriter
+            "video": video,  # Video tensor for FileWriter (memory-intensive for large batches)
         }
projects/huggingface_llm/models/model.py (1)

6-12: Consider expanding docstring to mention configuration.

The docstring correctly notes that HuggingFace models compute their own loss. Consider adding a note that the criterion parameter should be omitted from the YAML configuration, as it's not used by this model.

Example addition:

     HuggingFace text classification model wrapper.
 
     The HuggingFace model computes its own loss when labels are provided,
-    so we don't need a separate criterion.
+    so we don't need a separate criterion. When configuring this model in YAML,
+    omit the 'criterion' parameter.
projects/vision_language/dataset.py (4)

113-120: Catching all exceptions in _load_image and returning zeros may hide real dataset issues

_load_image wraps both the import and the read in a broad except Exception: and returns an all-zero tensor (Lines 115–120). This makes it hard to notice missing images, path mistakes, or torchvision misconfiguration, especially during training.

Consider tightening this:

  • Import read_image once at module level, so import errors are immediate.
  • Distinguish between import failure and I/O failure, or at least log a warning when returning zeros.

For example:

+from torchvision.io import read_image
@@
     def _load_image(self, path: Path) -> torch.Tensor:
         """Load image from file."""
-        try:
-            from torchvision.io import read_image
-
-            return read_image(str(path)).float() / 255.0
-        except Exception:
-            return torch.zeros(3, 224, 224)
+        try:
+            return read_image(str(path)).float() / 255.0
+        except FileNotFoundError:
+            raise
+        except Exception as e:
+            # Optionally: log a warning here instead of silently masking the error
+            raise RuntimeError(f"Failed to load image at {path}") from e

This keeps failures visible instead of silently corrupting the batch.


250-255: Inconsistent image-loading behavior between Flickr30k and Flickr8k

Flickr8kDataset._load_image (Lines 250–255) always calls read_image without handling failures, whereas Flickr30kDataset._load_image catches all exceptions and returns a dummy tensor. The inconsistency can make debugging harder and lead to different behavior across example datasets.

Consider standardizing on one approach (ideally fail-fast with a clear error, plus optional logging) across all three datasets:

  • Either let exceptions propagate for all datasets, or
  • Provide a shared utility that handles logging and error conversion consistently.
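
One possible shape for such a shared utility is sketched below. It is an illustration only: `ImageLoadError` and `load_image_or_raise` are hypothetical names, and the reader is passed in as a callable so the sketch does not depend on torchvision being installed.

```python
import logging
from pathlib import Path
from typing import Any, Callable

logger = logging.getLogger(__name__)


class ImageLoadError(RuntimeError):
    """Hypothetical error type raised when an image cannot be read."""


def load_image_or_raise(path: Path, reader: Callable[[str], Any]) -> Any:
    """Read an image with `reader`, logging and re-raising any failure.

    The original exception is chained so the root cause stays visible.
    """
    try:
        return reader(str(path))
    except Exception as exc:
        logger.error("Failed to load image at %s: %s", path, exc)
        raise ImageLoadError(f"Failed to load image at {path}") from exc
```

Each dataset's `_load_image` could then call `load_image_or_raise(path, read_image)` and apply its own scaling afterwards, so all three datasets fail in the same way.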

329-355: COCO dataset API doesn’t expose raw caption text like the Flickr datasets

COCOCaptionsDataset.__getitem__ (Lines 329–354) returns image, input_ids, and attention_mask, but not the original caption string, while both Flickr datasets include "caption" in their returned dicts (Lines 106–111, 243–248).

For downstream logging/inspection and API consistency, consider adding the caption field here as well:

     def __getitem__(self, idx: int) -> dict:
         item = self.data[idx]
@@
-        return {
-            "image": image,
-            "input_ids": input_ids,
-            "attention_mask": attention_mask,
-        }
+        return {
+            "image": image,
+            "input_ids": input_ids,
+            "attention_mask": attention_mask,
+            "caption": item["caption"],
+        }

collate_fn ignores extra keys, so this won’t affect batching.


363-369: collate_fn assumes homogeneous image tensor shapes across the batch

The collate function stacks item["image"] tensors directly (Lines 366–368). That’s standard in PyTorch, but it will fail if any dataset returns images with differing spatial sizes or channel counts (e.g., grayscale vs RGB) and the user forgets to pass a resizing/normalizing image_transform.

You may want to:

  • Document that datasets/transforms must ensure all image tensors have identical shape, or
  • Add a lightweight assertion before torch.stack to produce a clearer error message when shapes differ.

Example:

 def collate_fn(batch: list[dict]) -> dict:
     """Collate function for vision-language batches."""
+    # Optional: sanity check shapes for a clearer error if transforms are misconfigured
+    shapes = {tuple(item["image"].shape) for item in batch}
+    if len(shapes) != 1:
+        raise ValueError(f"All images in a batch must share the same shape, got: {shapes}")
     return {
         "image": torch.stack([item["image"] for item in batch]),
         "input_ids": torch.stack([item["input_ids"] for item in batch]),
         "attention_mask": torch.stack([item["attention_mask"] for item in batch]),
     }

This keeps the example friendlier to users wiring up their own transforms.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 02feb2b and 7b529d0.

📒 Files selected for processing (7)
  • projects/eeg/data/hbn_dataset.py (1 hunks)
  • projects/eeg/models/eeg_model.py (1 hunks)
  • projects/huggingface_llm/models/model.py (1 hunks)
  • projects/video_recognition/dataset.py (1 hunks)
  • projects/video_recognition/models/model.py (1 hunks)
  • projects/vision_language/configs/clip.yaml (1 hunks)
  • projects/vision_language/dataset.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • projects/eeg/data/hbn_dataset.py
  • projects/video_recognition/dataset.py
  • projects/eeg/models/eeg_model.py
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-12-07T17:00:14.929Z
Learnt from: ibro45
Repo: project-lighter/lighter PR: 173
File: projects/cifar10/model.py:6-11
Timestamp: 2025-12-07T17:00:14.929Z
Learning: When reviewing code that uses LighterModule as a base class, the subclass does not need to define __init__ or forward methods. LighterModule provides __init__ that accepts network, criterion, optimizer, scheduler, and metrics parameters, and forward() that delegates to self.network. Subclasses only need to implement step methods (training_step, validation_step, test_step, predict_step). The network architecture is defined separately and passed via YAML configuration.

Applied to files:

  • projects/huggingface_llm/models/model.py
🧬 Code graph analysis (1)
projects/huggingface_llm/models/model.py (2)
src/lighter/model.py (1)
  • LighterModule (20-423)
tests/fixtures/plain_lightning_modules.py (1)
  • MyLighterModule (101-118)
🔇 Additional comments (8)
projects/video_recognition/models/model.py (4)

17-39: LGTM!

The shared step pattern is clean and promotes code reuse across train/val/test phases. The logic correctly handles forward pass, loss computation, prediction derivation, and metric updates.


41-42: LGTM!

Clean delegation to the shared step logic.


44-45: LGTM!

Consistent delegation pattern matching training_step.


47-50: Verify the design intent for removing loss in test phase.

The test step explicitly removes the loss from results, while validation_step retains it. This may be intentional (e.g., test phase focuses purely on predictions), but confirming the design rationale would be helpful to ensure consistency across the example.
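
For reference, the asymmetry being questioned can be reduced to the sketch below. `StepSketch` is a stand-in written for this comment, not the reviewed class, and the placeholder values in `_shared_step` are assumptions; only the pattern of dropping `"loss"` in the test step reflects the behavior described above.

```python
class StepSketch:
    """Stand-in for the module under review; not an actual LighterModule subclass."""

    def _shared_step(self, batch: dict) -> dict:
        # Placeholder: a real step would run the network and the criterion.
        return {"loss": 0.0, "pred": batch["input"], "target": batch["target"]}

    def validation_step(self, batch: dict) -> dict:
        # Validation keeps the loss so it can be logged and monitored.
        return self._shared_step(batch)

    def test_step(self, batch: dict) -> dict:
        # Test drops the loss and returns only predictions and targets.
        result = self._shared_step(batch)
        result.pop("loss", None)
        return result
```

If the omission is intentional, a one-line comment in `test_step` stating why would settle the question for future readers.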

projects/huggingface_llm/models/model.py (4)

1-3: LGTM!

Clean imports and clear module documentation.


14-28: LGTM!

The _shared_step implementation correctly handles HuggingFace model API and metrics updates with proper null checking.


30-40: LGTM!

The training, validation, and test steps correctly follow the LighterModule pattern and return the expected dict format with loss.

Based on learnings, this implementation correctly delegates to _shared_step and uses the appropriate metrics attributes.


42-49: No issues found. The .tolist() conversion is correct and compatible with CsvWriter. The callback framework explicitly handles Python lists via isinstance(value, (list, tuple)) checks in the sequence length and record value extraction methods. This is a tested and documented pattern used consistently across projects.
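
To illustrate why list outputs are convenient for CSV writing, here is a minimal sketch of how a batch of list-valued outputs could be expanded into per-sample rows. It is not Lighter's CsvWriter implementation: `batch_to_records` and `records_to_csv` are hypothetical helpers, and only the `isinstance(value, (list, tuple))` check mirrors the behavior described above.

```python
import csv
import io
from typing import Any


def batch_to_records(outputs: dict[str, Any]) -> list[dict[str, Any]]:
    """Split a dict of per-batch values into one record per sample.

    List and tuple values are indexed per sample; other values are repeated.
    """
    lengths = {len(v) for v in outputs.values() if isinstance(v, (list, tuple))}
    if len(lengths) > 1:
        raise ValueError(f"Inconsistent sequence lengths: {sorted(lengths)}")
    n = lengths.pop() if lengths else 1
    return [
        {k: (v[i] if isinstance(v, (list, tuple)) else v) for k, v in outputs.items()}
        for i in range(n)
    ]


def records_to_csv(records: list[dict[str, Any]]) -> str:
    """Serialize non-empty records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0]))
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()
```

For example, `batch_to_records({"prediction": [1, 0], "confidence": [0.9, 0.8]})` yields two records, one per sample, which `records_to_csv` then writes as two data rows under a shared header.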

@ibro45 ibro45 merged commit 4fa281e into main Dec 9, 2025
7 checks passed
@ibro45 ibro45 deleted the projects branch December 9, 2025 03:50

Labels

size:XXL This PR changes 1000+ lines, ignoring generated files.
