[Torchvision API] Input metadata #6364

Open
mdabek-nvidia wants to merge 13 commits into NVIDIA:main from mdabek-nvidia:torchvision_image_metadata

Conversation

@mdabek-nvidia
Collaborator

Category:

New feature

Description:

Torchvision functional API operator to get metadata of input:

  • get_image_size
  • get_dimensions
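
As a rough illustration of what the two functions are expected to return, here is a minimal sketch that follows the torchvision convention noted in the review summary ([W, H] for the image size, [C, H, W] for the dimensions). The helpers `sketch_get_dimensions` and `sketch_get_image_size` are hypothetical names that operate on a plain shape tuple; they are not the functions added in this PR and make no claims about its implementation.

```python
from typing import List, Sequence


def sketch_get_dimensions(shape: Sequence[int]) -> List[int]:
    """Return [channels, height, width] for a (..., C, H, W) shape (assumed layout)."""
    if len(shape) < 2:
        raise ValueError(f"expected at least 2 dimensions, got {len(shape)}")
    height, width = shape[-2], shape[-1]
    # A 2-D input is treated as a single-channel image.
    channels = 1 if len(shape) == 2 else shape[-3]
    return [channels, height, width]


def sketch_get_image_size(shape: Sequence[int]) -> List[int]:
    """Return [width, height]; the order is reversed relative to the shape."""
    _, height, width = sketch_get_dimensions(shape)
    return [width, height]
```

Note the asymmetry: the image size is reported width-first, while the dimensions are reported channels-first with height before width. Leading batch dimensions are ignored in this sketch.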

Additional information:

Affected modules and functionalities:

Key points relevant for the review:

Tests:

  • Existing tests apply
  • New tests added
    • Python tests
    • GTests
    • Benchmark
    • Other
  • N/A

Checklist

Documentation

  • Existing documentation applies
  • Documentation updated
    • Docstring
    • Doxygen
    • RST
    • Jupyter
    • Other
  • N/A

DALI team only

Requirements

  • Implements new requirements
  • Affects existing requirements
  • N/A

REQ IDs: N/A

JIRA TASK: N/A

Signed-off-by: Marek Dabek <mdabek@nvidia.com>
Signed-off-by: Marek Dabek <mdabek@nvidia.com>
Signed-off-by: Marek Dabek <mdabek@nvidia.com>
Signed-off-by: Marek Dabek <mdabek@nvidia.com>
Signed-off-by: Marek Dabek <mdabek@nvidia.com>
Signed-off-by: Marek Dabek <mdabek@nvidia.com>
Signed-off-by: Marek Dabek <mdabek@nvidia.com>
Signed-off-by: Marek Dabek <mdabek@nvidia.com>
@mdabek-nvidia
Collaborator Author

@greptileai review

@greptile-apps
Contributor

greptile-apps Bot commented May 22, 2026

Greptile Summary

This PR adds three new torchvision-compatible functional APIs to the experimental torchvision module: crop (functional), get_image_size, and get_dimensions, together with the RandomCrop transform class. The image_metadata functions are pure Python/PIL/torch wrappers with no DALI dependency, mirroring torchvision's deprecated get_image_size and get_dimensions for drop-in compatibility. RandomCrop and the functional crop both delegate to DALI's slice primitive, using a negative anchor plus out_of_bounds_policy to implement padding in a single pass.
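
The "negative anchor plus out-of-bounds padding" idea can be sketched in plain Python. This is only an illustration of the technique under stated assumptions: `slice_with_pad` is a hypothetical stand-in written over nested lists, it does not call DALI, and the real slice operator may behave differently in its details.

```python
from typing import List, Tuple


def slice_with_pad(
    image: List[List[int]],
    anchor: Tuple[int, int],
    shape: Tuple[int, int],
    fill: int = 0,
) -> List[List[int]]:
    """Cut a (height, width) window starting at anchor=(top, left).

    The anchor may be negative or the window may extend past the image;
    every out-of-bounds position is filled with `fill` instead of raising.
    """
    top, left = anchor
    out_h, out_w = shape
    in_h = len(image)
    in_w = len(image[0]) if in_h else 0
    out = []
    for y in range(top, top + out_h):
        row = []
        for x in range(left, left + out_w):
            inside = 0 <= y < in_h and 0 <= x < in_w
            row.append(image[y][x] if inside else fill)
        out.append(row)
    return out


# Padding by 1 pixel on the top and left, then cropping 3x3 at the origin of
# the padded image, is the same as slicing the original at anchor (-1, -1).
padded_crop = slice_with_pad([[1, 2], [3, 4]], anchor=(-1, -1), shape=(3, 3))
```

Shifting the anchor by the padding amount means the padded image never has to be materialized, which is why padding and cropping can happen in one pass.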

  • image_metadata.py correctly returns [W, H] / [C, H, W] to match torchvision, and works on CUDA tensors via shape metadata alone.
  • RandomCrop._randint clamps max_value to 0 (fix from [Torchvision API] crop #6353) so a crop larger than the padded input collapses to position 0 rather than passing an inverted range to DALI.
  • Tests cross-validate against torchvision across CPU/GPU, multiple PIL modes, batched tensors, and all padding configurations; @unittest.skip with a comment is used to defer unsupported dict-fill cases.
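
The clamping behaviour described for `RandomCrop._randint` can be illustrated with a small sketch. `sample_offset` is a hypothetical helper built on Python's standard `random` module, assumed here for illustration only; it is not the code from this PR.

```python
import random


def sample_offset(padded_size: int, crop_size: int, rng=random) -> int:
    """Pick a random crop offset along one axis.

    When the crop is larger than the padded input, the upper bound would be
    negative; clamping it to 0 pins the crop to position 0 instead of
    requesting a random integer from an inverted range.
    """
    max_value = max(padded_size - crop_size, 0)
    # randint is inclusive on both ends, so offsets lie in [0, max_value].
    return rng.randint(0, max_value)
```

Without the clamp, `random.randint(0, -1)` raises `ValueError` in plain Python; the summary above describes the analogous problem of passing an inverted range to DALI.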

Confidence Score: 5/5

Safe to merge; all changes are additive and no existing operator or public API is modified.

The new operators are purely additive and well-tested against torchvision reference outputs. The only findings are minor style issues: a missing underscore prefix on an internal flag and missing glob patterns in four error tests in the new metadata test file.

No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| dali/python/nvidia/dali/experimental/torchvision/v2/functional/image_metadata.py | New pure Python/PIL/torch compatibility layer implementing get_image_size and get_dimensions; no DALI-specific logic, straightforward mirroring of torchvision behavior. |
| dali/python/nvidia/dali/experimental/torchvision/v2/functional/crop.py | New functional crop operator delegating to ndd.slice with out_of_bounds_policy=pad; validates PIL vs tensor layout, rounds PIL box coordinates, delegates size validation to RandomCrop.verify_args. |
| dali/python/nvidia/dali/experimental/torchvision/v2/randomcrop.py | New RandomCrop operator; uses fn.slice with negative anchor + out_of_bounds_policy for padding; _randint clamps negative max_value to 0; needs_padding should be _needs_padding per internal-state convention. |
| dali/test/python/torchvision/test_tv_image_metadata.py | New tests cross-validating against torchvision for both PIL and tensor inputs including GPU; error-case tests are missing glob= patterns on assert_raises calls. |
| dali/test/python/torchvision/test_tv_randomcrop.py | Comprehensive RandomCrop tests covering identity crops, padding modes, pad_if_needed, asymmetric padding, randomness sampling, and invalid-arg rejection; two dict-fill tests correctly skipped with @unittest.skip. |
| dali/test/python/torchvision/test_tv_crop.py | New crop functional tests; covers tensor, PIL (L/RGB/RGBA), batched tensor, dtype preservation, float-param rejection, and out-of-bounds; validates against torchvision reference. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[User input: PIL Image or torch.Tensor] --> B{Input type}
    B -->|torch.Tensor| C[adjust_input decorator converts to DALI tensor]
    B -->|PIL Image| D[adjust_input decorator converts to DALI tensor HWC]
    C --> E[crop or RandomCrop._kernel]
    D --> E
    E --> F{needs padding?}
    F -->|Yes| G[fn.slice negative anchor + out_of_bounds_policy=pad]
    F -->|No| H[fn.slice / ndd.slice no padding policy]
    G --> I[DALI output tensor / batch]
    H --> I

Reviews (4): Last reviewed commit: "Review fixes"

Comment thread dali/python/nvidia/dali/experimental/torchvision/v2/randomcrop.py Outdated
@mdabek-nvidia mdabek-nvidia force-pushed the torchvision_image_metadata branch from 15c1775 to c024bac Compare May 25, 2026 12:13
Signed-off-by: Marek Dabek <mdabek@nvidia.com>
@mdabek-nvidia
Collaborator Author

@greptileai re-review

@mdabek-nvidia
Collaborator Author

!build

@dali-automaton
Collaborator

CI MESSAGE: [52613366]: BUILD STARTED

@dali-automaton
Collaborator

CI MESSAGE: [52613366]: BUILD PASSED

* Different index support for tensor and PILImages

Signed-off-by: Marek Dabek <mdabek@nvidia.com>
Signed-off-by: Marek Dabek <mdabek@nvidia.com>
Signed-off-by: Marek Dabek <mdabek@nvidia.com>
@mdabek-nvidia mdabek-nvidia force-pushed the torchvision_image_metadata branch from c024bac to b88990d Compare May 26, 2026 15:20

5 participants