Skip to content

[Torchvision API] resized_crop and RandomResizedCrop#6369

Draft
mdabek-nvidia wants to merge 38 commits into
NVIDIA:mainfrom
mdabek-nvidia:torchvision_resizedcrop
Draft

[Torchvision API] resized_crop and RandomResizedCrop#6369
mdabek-nvidia wants to merge 38 commits into
NVIDIA:mainfrom
mdabek-nvidia:torchvision_resizedcrop

Conversation

@mdabek-nvidia
Copy link
Copy Markdown
Collaborator

Category:

New feature

Description:

Torchvision API operators:

  • RandomResizedCrop
  • Functional resized_crop functional
  • Unification of InterpolationMode validation

Additional information:

Affected modules and functionalities:

Key points relevant for the review:

Tests:

  • Existing tests apply
  • New tests added
    • Python tests
    • GTests
    • Benchmark
    • Other
  • N/A

Checklist

Documentation

  • Existing documentation applies
  • Documentation updated
    • Docstring
    • Doxygen
    • RST
    • Jupyter
    • Other
  • N/A

DALI team only

Requirements

  • Implements new requirements
  • Affects existing requirements
  • N/A

REQ IDs: N/A

JIRA TASK: N/A

@mdabek-nvidia
Copy link
Copy Markdown
Collaborator Author

@greptileai review please

Comment thread dali/test/python/torchvision/test_tv_randomresizedcrop.py Dismissed
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 25, 2026

Greptile Summary

This PR adds RandomCrop, RandomResizedCrop, crop, resized_crop, get_image_size, and get_dimensions to the experimental torchvision compatibility layer, along with a centralised normalize_interpolation helper in Resize that converts legacy PIL integer codes to InterpolationMode enums.

  • RandomCrop fuses explicit padding and pad_if_needed directly into a single fn.slice call using DALI's out_of_bounds_policy, avoiding a separate pad pass; RandomResizedCrop delegates entirely to fn.random_resized_crop.
  • resized_crop (functional) chains _crop + ndd.resize and reuses Resize.calculate_target_size_dynamic_mode for aspect-ratio-aware sizing; get_image_size/get_dimensions are pure-Python helpers that mirror the torchvision API.
  • Test coverage is thorough: pixel-level comparison against torchvision for tensors and PIL images, boundary conditions (negative anchors, out-of-bounds crop, crop larger than padded input), and validation error cases.

Confidence Score: 4/5

Safe to merge; all findings are cosmetic or test-quality issues with no impact on correctness or runtime behaviour.

The core logic for RandomCrop, RandomResizedCrop, crop, and resized_crop is correct and well-tested against torchvision references. The padding-fused-into-slice approach for RandomCrop is verified by dedicated boundary tests. The issues found are limited to: one error message that prints type(interpolation) instead of its value, a minimal crop() docstring, misleading parameter names ('size' vs 'height'/'width') in validation error messages, two TODO-marked skipped tests, and one test that omits a glob= message check. None of these affect operator output.

crop.py and randomcrop.py — the validation error messages in _verify_crop_args and _ValidateRandomResizedCropInterpolation could be improved before the API is considered stable.

Important Files Changed

Filename Overview
dali/python/nvidia/dali/experimental/torchvision/v2/randomcrop.py New file implementing RandomCrop and RandomResizedCrop operators. Logic is sound; the padding-fused-into-slice approach for RandomCrop is clever and tested. One error message issue: _ValidateRandomResizedCropInterpolation shows type(interpolation) instead of the actual value.
dali/python/nvidia/dali/experimental/torchvision/v2/functional/crop.py New functional API crop and resized_crop. Delegates size validation to RandomCrop.verify_args, which produces misleading 'size' error messages when height/width is invalid; crop() docstring is also uninformative.
dali/python/nvidia/dali/experimental/torchvision/v2/functional/image_metadata.py New get_image_size / get_dimensions helpers mirroring torchvision. Pure Python with no DALI calls; logic is correct and well-documented.
dali/python/nvidia/dali/experimental/torchvision/v2/resize.py Adds normalize_interpolation class method and int_to_interpolation_mode map to centralize legacy PIL integer code handling. Clean addition; Resize.init now calls normalize_interpolation before validation.
dali/test/python/torchvision/test_tv_crop.py New test file for crop/resized_crop functional. Compares against torchvision reference for tensors and PIL images. test_crop_invalid_output_size lacks a glob= pattern to verify the error message names the right parameter.
dali/test/python/torchvision/test_tv_randomcrop.py New tests for RandomCrop. Good coverage including padding variants, pad_if_needed, and boundary behavior (crop-larger-than-input). Two dict-fill tests are skipped with unresolved TODO comments.
dali/test/python/torchvision/test_tv_randomresizedcrop.py New tests for RandomResizedCrop covering shape, identity (fixed scale/ratio), interpolation modes, antialias, and validation. Good coverage.
dali/test/python/torchvision/test_tv_resized_crop.py New tests for resized_crop functional against torchvision reference images and tensors. Covers sizes, crop positions, interpolation modes, antialias, and validation error cases.

Sequence Diagram

sequenceDiagram
    participant User
    participant crop_fn as crop / resized_crop
    participant RandomCrop
    participant ndd_slice as ndd.slice
    participant ndd_resize as ndd.resize

    User->>crop_fn: crop(inpt, top, left, height, width)
    crop_fn->>crop_fn: _verify_crop_args (type + size checks)
    crop_fn->>crop_fn: adjust_input (PIL/Tensor to ndd.Tensor)
    crop_fn->>ndd_slice: "slice(anchor=(top,left), shape=(h,w), out_of_bounds_policy=pad)"
    ndd_slice-->>crop_fn: cropped ndd.Tensor
    crop_fn-->>User: PIL Image / torch.Tensor

    User->>crop_fn: resized_crop(inpt, top, left, h, w, size)
    crop_fn->>crop_fn: _verify_crop_args + Resize.verify_args
    crop_fn->>ndd_slice: _crop (same as above)
    ndd_slice-->>crop_fn: cropped
    crop_fn->>ndd_resize: "resize(cropped, size=(target_h, target_w))"
    ndd_resize-->>crop_fn: resized ndd.Tensor
    crop_fn-->>User: PIL Image / torch.Tensor

    User->>RandomCrop: __call__(inpt)
    RandomCrop->>RandomCrop: get_HWC_from_layout_pipeline
    RandomCrop->>RandomCrop: _randint(max_top), _randint(max_left)
    RandomCrop->>ndd_slice: "fn.slice(anchor=(left-pad_left, top-pad_top), shape=(crop_w,crop_h))"
    ndd_slice-->>RandomCrop: cropped tensor
    RandomCrop-->>User: torch.Tensor / PIL Image
Loading

Comments Outside Diff (2)

  1. dali/test/python/torchvision/test_tv_randomcrop.py, line 1349-1371 (link)

    P2 [Style] TODO comments in merged code

    Two tests are skipped with # TODO: Fill using dictionary pattern is currently not supported. Per the doc-no-todo-on-merge rule, TODOs should either be resolved before merge or replaced with a reference to a tracked issue. The current state leaves unanswered functionality gaps without a clear follow-up path.

    This pattern appears in both test_random_crop_fill_dict_matches_torchvision_tensor and test_random_crop_fill_dict_matches_torchvision_pil.

    Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

  2. dali/test/python/torchvision/test_tv_crop.py, line 809-819 (link)

    P2 [Style] test_crop_invalid_output_size doesn't check the error message

    The test only verifies that (TypeError, ValueError) is raised — it has no glob= pattern to assert that the message names the relevant parameter. Per the test-edge-cases rule, assert_raises should include a glob= argument to confirm that the exception message is meaningful and targets the right field. Without it a completely wrong (but typed) exception would make this test pass.

Reviews (1): Last reviewed commit: "Torchvision's Resized crop and random re..." | Re-trigger Greptile

Comment thread dali/python/nvidia/dali/experimental/torchvision/v2/randomcrop.py
Comment on lines +43 to +53

def _verify_crop_args(top: int, left: int, height: int, width: int) -> None:
_verify_crop_coordinate(top, "top")
_verify_crop_coordinate(left, "left")
RandomCrop.verify_args(
size=(height, width),
padding=None,
pad_if_needed=False,
padding_mode="constant",
fill=0,
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 [Style] Misleading error messages for invalid height/width

_verify_crop_args delegates size validation to RandomCrop.verify_args(size=(height, width), ...). Any validation failure (e.g. height=0, width=1.5) surfaces with "Values size must be positive numbers" or "Size values must be integers" — both mention size, not height or width. Users calling crop(..., height=0, ...) or resized_crop(..., height=0, ...) will see an error that doesn't name the parameter they actually passed, making it hard to debug.

Per the error-message-with-context rule, error messages must name the affected parameter.

@mdabek-nvidia mdabek-nvidia force-pushed the torchvision_resizedcrop branch 2 times, most recently from 22b511a to 6b11eb1 Compare May 27, 2026 11:46
mdabek-nvidia and others added 24 commits May 27, 2026 16:50
Signed-off-by: Marek Dabek <mdabek@nvidia.com>
Remove the custom FFmpeg build from the DALI Conda recipe and use the
upstream package instead.

Disable the PackedBFrames Conda QA case because the upstream FFmpeg build
currently fails that coverage.

Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
* Improve call site resolution
* Improve AST caching
* Cache calls by lines and spans to avoid full AST rewalks
---------

Signed-off-by: Rostan Tabet <rtabet@nvidia.com>
The DALI remap headers included obsolete <ctgmath>. This PR replaces
those includes with <cmath> in the sphere and water remap operators.

The CVCUDA legacy OSD header uses uint8_t directly but did not include
a header that defines it. Some toolchains no longer provide uint8_t
transitively through the current include graph, which makes the CVCUDA
build fail with unknown type name 'uint8_t'. This PR updates the
CVCUDA submodule to a fork commit that adds the missing <stdint.h>
include.

The CVCUDA submodule URL is changed to
https://github.com/JanuszL/CV-CUDA.git so CI can fetch the fix commit.

Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
* Update Python 3.14 QA dependency support

Enable the conda build matrix for Python 3.14 and remove the
Python 3.14 experimental-support warning now that the QA environment
uses compatible packages.

Refresh QA package selections for OpenCV, CuPy, TensorFlow/tf_keras,
Numba, and numba-cuda, including Python version bounds, removal of
obsolete Python 3.9-only variants, and NumPy constraints needed by
Python 3.14. Add TensorFlow 2.21 cuSolver library-path handling in
QA templates and patch the TensorFlow/Horovod test build for current
TensorFlow include paths and stream APIs.

Update CuPy DLPack integration to use the current from_dlpack and
__dlpack__ APIs in PythonFunction, tests, and examples. Adjust tests
for newer Python and NumPy behavior around one-element array scalar
conversion, bool conversion, refcount accounting, and pipeline
reference-release checks.

Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>

* Update Python 3.14 conda and AST test support

- Adds explicit conda variants for regular Python builds and include the
  Python ABI suffix plus GIL/free-threading dependencies where applicable.
  This prepares the package recipe for Python 3.14 while leaving
  free-threaded builds disabled until dependencies support them.
- Replaces deprecated ast.Num-based autograph test fixtures with
  ast.parse-generated nodes so the tests remain compatible with Python
  3.14, where ast.Num has been removed.

Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>

* Accept Python 3.14 pickle error wording

Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>

* Fix

Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>

* Fix DALI build with Clang 21

Replace obsolete <ctgmath> includes in the remap headers with <cmath> so
they build with newer Clang toolchains.

Update the CVCUDA submodule to a fork commit that adds the missing
<stdint.h> include to CvCudaOSD.hpp. The legacy OSD header uses
uint8_t directly and no longer gets it transitively with the new
toolchain include graph.

Point the CVCUDA submodule URL to the fork that contains the fix so
CI can fetch the updated submodule commit.

Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>

---------

Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
* Disallow GPU inputs from multiple devices
* Disallow consuming inputs from a device other than the current context uses
* Exempt `copy` from the above check (it's the only way to safely cross device boundary)

---------
Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
…cript. (NVIDIA#6346)

Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
* Convert numpy types to DALIDataType.
* Extend tests

Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
Signed-off-by: Marek Dabek <mdabek@nvidia.com>
Signed-off-by: Marek Dabek <mdabek@nvidia.com>
Signed-off-by: Marek Dabek <mdabek@nvidia.com>
Signed-off-by: Marek Dabek <mdabek@nvidia.com>
* Different index support for tensor and PILImages

Signed-off-by: Marek Dabek <mdabek@nvidia.com>
Signed-off-by: Marek Dabek <mdabek@nvidia.com>
Signed-off-by: Marek Dabek <mdabek@nvidia.com>
Signed-off-by: Marek Dabek <mdabek@nvidia.com>
* RandomResizedCrop
* resized_crop functional API
* InterpoltionMode validation and unification

Signed-off-by: Marek Dabek <mdabek@nvidia.com>
Signed-off-by: Marek Dabek <mdabek@nvidia.com>
The test passess without a segfault (DALI-4655). It has not been deliberately fixed, just stopped appearing.

Signed-off-by: Marek Dabek <mdabek@nvidia.com>
Signed-off-by: Marek Dabek <mdabek@nvidia.com>
Removes opencv_imgcodecs from the production libdali link line.

The CPU JpegCompressionDistortion operator was the only consumer of cv::imencode / cv::imdecode in libdali, which dragged opencv_imgcodecs (and its libjpeg/libpng/libtiff dependency closure) into the production runtime. The operator is rewritten on top of nvImageCodec, freeing the production link line from imgcodecs. opencv_imgcodecs is still linked into test binaries via the DALI_OPENCV_TEST_EXTRA_LIBS cache variable (golden-image fixtures use cv::imread/imwrite).

Signed-off-by: Joaquin Anton Guirao <janton@nvidia.com>
JanuszL and others added 13 commits May 27, 2026 16:59
- Adds bulk DLPack TensorList constructors for lists of external tensors. The C++ path builds
  CPU and GPU TensorLists in one pass after preflighting devices and peeking all capsule dtypes,
  so dtype or device mismatches can fall back without partially consuming single-use capsules.
- Adds `ndd.Batch` fast paths for:
  - lists of already evaluated `ndd.Tensor` objects, preserving storage, layout, dtype, enum
    metadata, laziness behavior where applicable, and CUDA stream order;
  - lists of external GPU tensors exposing DLPack, including stream-keyword handshakes and
    same-device validation;
  - lists of CPU and CUDA-host/pinned CPU DLPack tensors.
- Keeps requested dtype conversion on the existing slow path, so external DLPack capsules are not
  consumed before conversion fallback.
- Improves `ndd.Tensor` DLPack ingestion:
  - CPU read-only arrays that raise `BufferError` fall back to an explicit copy through the array
    interface;
  - GPU objects retry `__dlpack__()` without the `stream` keyword only when the producer rejects
    the keyword, and fall back to `__cuda_array_interface__` only when DLPack is unavailable.
- Fixes mixed pinned/non-pinned non-contiguous GPU TensorLists. GPU sample sharing no longer
  requires uniform `is_pinned()` state, which unblocks mixed-GPU batches where cross-device copies
  use pinned staging memory. CPU TensorLists still enforce pinned-state compatibility.
- Makes `UserStream::GetStream(size_t dev)` public so the new bulk DLPack path can derive a
  concrete per-device consumer stream.

Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
* Update preprocessor to 1.91.0
* Update cutlass to 4.5.0
* Update pybind11 to 3.0.4
* Updated packaging to <= 26.2
* Updated DALI_Deps commit to the latest version

Signed-off-by: Marek Dabek <mdabek@nvidia.com>
The docs version selector generated legacy-version entries without marking them
as short user-guide links. For DALI 2.x pages, that made the selector add an
unnecessary `docs/` path component.

Mark generated DALI 2.x entries with the existing `short_user` path mode so
the selector builds links that match the published layout.

Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
Generate operator `See also entries` with Sphinx doc references instead of
raw root-relative HTML links. The root-relative form resolves outside the
DALI docs tree on docs.nvidia.com, so links from generated operator pages
can point at dead /examples/... URLs.

Let Sphinx resolve the target document from the generated page location so
the resulting href stays within the deployed user guide for both pipeline
and dynamic operation pages.

Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
- Exposes batch_size and device as distinct dynamic operator wrapper
  parameters so they can be tracked as reserved kwargs. Add generated
  numpydoc entries for batch_size, device, rng, and seed when those
  arguments are present in dynamic operator signatures.
- Fixes hiding of `seed` argument in the dynamic mode.

Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
)

The nvImageCodec-based image decoder (dali/operators/imgcodec) sets the
default `hw_decoder_load` to 0.90, whereas the legacy decoder
(dali/operators/decoder) used 0.65. Profiling on
EfficientNet+AutoAugment ImageNet training showed that the higher
default increases p99 decoder latency (24.5 ms -> 34.3 ms) and the
worst-case batch decode (30.8 ms -> 72.7 ms).

Revert the default to 0.65 to match the legacy behavior and restore
decoder determinism. Users who want the prior 0.90 behavior can opt in
explicitly via the operator argument.

Signed-off-by: Joaquin Anton Guirao <janton@nvidia.com>
Limit CPU video decoder codec support

Restrict the CPU frames decoder to codecs supported by the currently
compiled libavcodec configuration. H264 and HEVC are no longer
advertised for the CPU variant while VP8, VP9, and MJPEG remain
enabled.

Make ReadRegularFrame mark end-of-stream by setting next_frame_idx_
to -1 when the index reaches NumFrames(), mirroring the existing
guard in ReadFlushFrame. Without this, codecs with no decoder latency
(VP9 on the new test inputs) deliver the final frame via the regular
path, leaving next_frame_idx_ at NumFrames() and causing
VideoInput depletion to be reported one batch late.

Reset the decoder when an indexed next frame falls outside the valid
range, avoiding reuse of an invalid decoder position.

Update video decoder tests to expect CPU failures for unsupported
codecs instead of skipping only MPEG4. Use VP9 CFR/VFR test inputs
and device-less CPU pipelines where appropriate. Point the CFR/VFR
reference frame folders at `frames_{1,2}_vp9/` so CPU decode of the
new VP9 fixtures matches at the existing eps=10 tolerance. Drop the
CPU HEVC frames-decoder tests (`ConstantFrameRateHevc`,
`VariableFrameRateHevc`, `VariableFrameRateHevcNoIndex`) — HEVC is
no longer in the CPU codec allow-list.

Tolerate up to 16 isolated subpixel deviations exceeding eps in
TestVideo::CompareFrame (out of ~2.7M subpixels per frame). The CPU
VP9 decode path occasionally produces a single byte that differs by
~32 — a SIMD glitch inside libavcodec/sws_scale that Valgrind cannot
instrument. The budget is orders of magnitude below what any genuine
regression would produce, so test sensitivity is preserved.

In dali/test/python/input/test_video.py, filter out h264 from the
round-robin fixture (the unsuffixed test_{1,2}.mp4 in cfr//vfr/
are h264) and restrict test_video_input_audio_stream to the mixed
backend — the only DALI_extra video with an audio stream is h264.

Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
Hide WorkerThread implementation details

Makes WorkerThread a virtual interface with a factory that returns the private
implementation.

Moves WorkerThreadImpl and its helper state into an internal header so async
pipelined executors can include it and store WorkerThreadImpl directly,
without exposing the implementation through the public WorkerThread API.

Updates public/template-visible users such as InputOperator and the legacy
video loader to store WorkerThread through the interface and factory, keeping
private WorkerThread layout out of external operator plugin headers.

Keeps OldThreadPool and NewThreadPool as direct concrete implementations.
ThreadPool and ThreadPoolBase already provide the virtual interface used by
operators, so the concrete pools do not add internal pImpl storage.

Makes WorkerThread::CheckForErrors surface queued worker failures and routes
async executor worker errors through their stop-signaling paths.

Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
- Pin MLPerf and JAX Rosetta links to specific GitHub
  tags for consistency and reliability.
- Remove references to hardcoded "Getting Started"
  paths in the DALI GitHub repository.

Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
Require dynamic batch_permutation calls to provide an explicit batch_size when it cannot be inferred.

Dynamic mode has no pipeline-level batch size to inherit. BatchPermutation has no data inputs, so calling it without batch_size previously continued into operator setup as a sample-mode call and failed later with less actionable errors. This PR adds an early validation for the generated functional and class APIs.

Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
Validate the number of values in each COCO annotation bbox before copying parsed JSON values into the fixed-size box array. Reject overlong arrays before they can write past the array and reject short arrays to avoid partially initialized boxes.

Use typed invalid_argument exceptions with actual element counts and add a COCO reader regression test that exercises the real reader path with empty, short, and overlong bbox arrays.

Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
* Support transferring operator instances to executor
* Include the reader in compiled pipelines
* Mark reader invalid in case of failure after operator transfer
---------

Signed-off-by: Rostan Tabet <rtabet@nvidia.com>
Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
@mdabek-nvidia mdabek-nvidia force-pushed the torchvision_resizedcrop branch from 7075ce9 to dc208ff Compare May 27, 2026 15:02
@review-notebook-app
Copy link
Copy Markdown

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants