[Torchvision API] resized_crop and RandomResizedCrop#6369
[Torchvision API] resized_crop and RandomResizedCrop#6369mdabek-nvidia wants to merge 38 commits into
Conversation
|
@greptileai review please |
|
| Filename | Overview |
|---|---|
| dali/python/nvidia/dali/experimental/torchvision/v2/randomcrop.py | New file implementing RandomCrop and RandomResizedCrop operators. Logic is sound; the padding-fused-into-slice approach for RandomCrop is clever and tested. One error message issue: _ValidateRandomResizedCropInterpolation shows type(interpolation) instead of the actual value. |
| dali/python/nvidia/dali/experimental/torchvision/v2/functional/crop.py | New functional API crop and resized_crop. Delegates size validation to RandomCrop.verify_args, which produces misleading 'size' error messages when height/width is invalid; crop() docstring is also uninformative. |
| dali/python/nvidia/dali/experimental/torchvision/v2/functional/image_metadata.py | New get_image_size / get_dimensions helpers mirroring torchvision. Pure Python with no DALI calls; logic is correct and well-documented. |
| dali/python/nvidia/dali/experimental/torchvision/v2/resize.py | Adds normalize_interpolation class method and int_to_interpolation_mode map to centralize legacy PIL integer code handling. Clean addition; Resize.init now calls normalize_interpolation before validation. |
| dali/test/python/torchvision/test_tv_crop.py | New test file for crop/resized_crop functional. Compares against torchvision reference for tensors and PIL images. test_crop_invalid_output_size lacks a glob= pattern to verify the error message names the right parameter. |
| dali/test/python/torchvision/test_tv_randomcrop.py | New tests for RandomCrop. Good coverage including padding variants, pad_if_needed, and boundary behavior (crop-larger-than-input). Two dict-fill tests are skipped with unresolved TODO comments. |
| dali/test/python/torchvision/test_tv_randomresizedcrop.py | New tests for RandomResizedCrop covering shape, identity (fixed scale/ratio), interpolation modes, antialias, and validation. Good coverage. |
| dali/test/python/torchvision/test_tv_resized_crop.py | New tests for resized_crop functional against torchvision reference images and tensors. Covers sizes, crop positions, interpolation modes, antialias, and validation error cases. |
Sequence Diagram
sequenceDiagram
participant User
participant crop_fn as crop / resized_crop
participant RandomCrop
participant ndd_slice as ndd.slice
participant ndd_resize as ndd.resize
User->>crop_fn: crop(inpt, top, left, height, width)
crop_fn->>crop_fn: _verify_crop_args (type + size checks)
crop_fn->>crop_fn: adjust_input (PIL/Tensor to ndd.Tensor)
crop_fn->>ndd_slice: "slice(anchor=(top,left), shape=(h,w), out_of_bounds_policy=pad)"
ndd_slice-->>crop_fn: cropped ndd.Tensor
crop_fn-->>User: PIL Image / torch.Tensor
User->>crop_fn: resized_crop(inpt, top, left, h, w, size)
crop_fn->>crop_fn: _verify_crop_args + Resize.verify_args
crop_fn->>ndd_slice: _crop (same as above)
ndd_slice-->>crop_fn: cropped
crop_fn->>ndd_resize: "resize(cropped, size=(target_h, target_w))"
ndd_resize-->>crop_fn: resized ndd.Tensor
crop_fn-->>User: PIL Image / torch.Tensor
User->>RandomCrop: __call__(inpt)
RandomCrop->>RandomCrop: get_HWC_from_layout_pipeline
RandomCrop->>RandomCrop: _randint(max_top), _randint(max_left)
RandomCrop->>ndd_slice: "fn.slice(anchor=(left-pad_left, top-pad_top), shape=(crop_w,crop_h))"
ndd_slice-->>RandomCrop: cropped tensor
RandomCrop-->>User: torch.Tensor / PIL Image
Comments Outside Diff (2)
-
dali/test/python/torchvision/test_tv_randomcrop.py, line 1349-1371 (link)[Style] TODO comments in merged code
Two tests are skipped with
# TODO: Fill using dictionary pattern is currently not supported. Per thedoc-no-todo-on-mergerule, TODOs should either be resolved before merge or replaced with a reference to a tracked issue. The current state leaves unanswered functionality gaps without a clear follow-up path.This pattern appears in both
test_random_crop_fill_dict_matches_torchvision_tensorandtest_random_crop_fill_dict_matches_torchvision_pil.Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
-
dali/test/python/torchvision/test_tv_crop.py, line 809-819 (link)[Style]
test_crop_invalid_output_sizedoesn't check the error messageThe test only verifies that
(TypeError, ValueError)is raised — it has noglob=pattern to assert that the message names the relevant parameter. Per thetest-edge-casesrule,assert_raisesshould include aglob=argument to confirm that the exception message is meaningful and targets the right field. Without it a completely wrong (but typed) exception would make this test pass.
Reviews (1): Last reviewed commit: "Torchvision's Resized crop and random re..." | Re-trigger Greptile
|
|
||
| def _verify_crop_args(top: int, left: int, height: int, width: int) -> None: | ||
| _verify_crop_coordinate(top, "top") | ||
| _verify_crop_coordinate(left, "left") | ||
| RandomCrop.verify_args( | ||
| size=(height, width), | ||
| padding=None, | ||
| pad_if_needed=False, | ||
| padding_mode="constant", | ||
| fill=0, | ||
| ) |
There was a problem hiding this comment.
[Style] Misleading error messages for invalid
height/width
_verify_crop_args delegates size validation to RandomCrop.verify_args(size=(height, width), ...). Any validation failure (e.g. height=0, width=1.5) surfaces with "Values size must be positive numbers" or "Size values must be integers" — both mention size, not height or width. Users calling crop(..., height=0, ...) or resized_crop(..., height=0, ...) will see an error that doesn't name the parameter they actually passed, making it hard to debug.
Per the error-message-with-context rule, error messages must name the affected parameter.
22b511a to
6b11eb1
Compare
Signed-off-by: Marek Dabek <mdabek@nvidia.com>
Remove the custom FFmpeg build from the DALI Conda recipe and use the upstream package instead. Disable the PackedBFrames Conda QA case because the upstream FFmpeg build currently fails that coverage. Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
…6328) Signed-off-by: Rostan Tabet <rtabet@nvidia.com>
* Improve call site resolution * Improve AST caching * Cache calls by lines and spans to avoid full AST rewalks --------- Signed-off-by: Rostan Tabet <rtabet@nvidia.com>
The DALI remap headers included obsolete <ctgmath>. This PR replaces those includes with <cmath> in the sphere and water remap operators. The CVCUDA legacy OSD header uses uint8_t directly but did not include a header that defines it. Some toolchains no longer provide uint8_t transitively through the current include graph, which makes the CVCUDA build fail with unknown type name 'uint8_t'. This PR updates the CVCUDA submodule to a fork commit that adds the missing <stdint.h> include. The CVCUDA submodule URL is changed to https://github.com/JanuszL/CV-CUDA.git so CI can fetch the fix commit. Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
* Update Python 3.14 QA dependency support Enable the conda build matrix for Python 3.14 and remove the Python 3.14 experimental-support warning now that the QA environment uses compatible packages. Refresh QA package selections for OpenCV, CuPy, TensorFlow/tf_keras, Numba, and numba-cuda, including Python version bounds, removal of obsolete Python 3.9-only variants, and NumPy constraints needed by Python 3.14. Add TensorFlow 2.21 cuSolver library-path handling in QA templates and patch the TensorFlow/Horovod test build for current TensorFlow include paths and stream APIs. Update CuPy DLPack integration to use the current from_dlpack and __dlpack__ APIs in PythonFunction, tests, and examples. Adjust tests for newer Python and NumPy behavior around one-element array scalar conversion, bool conversion, refcount accounting, and pipeline reference-release checks. Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com> * Update Python 3.14 conda and AST test support - Adds explicit conda variants for regular Python builds and include the Python ABI suffix plus GIL/free-threading dependencies where applicable. This prepares the package recipe for Python 3.14 while leaving free-threaded builds disabled until dependencies support them. - Replaces deprecated ast.Num-based autograph test fixtures with ast.parse-generated nodes so the tests remain compatible with Python 3.14, where ast.Num has been removed. Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com> * Accept Python 3.14 pickle error wording Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com> * Fix Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com> * Fix DALI build with Clang 21 Replace obsolete <ctgmath> includes in the remap headers with <cmath> so they build with newer Clang toolchains. Update the CVCUDA submodule to a fork commit that adds the missing <stdint.h> include to CvCudaOSD.hpp. The legacy OSD header uses uint8_t directly and no longer gets it transitively with the new toolchain include graph. Point the CVCUDA submodule URL to the fork that contains the fix so CI can fetch the updated submodule commit. Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com> --------- Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
* Disallow GPU inputs from multiple devices * Disallow consuming inputs from a device other than the current context uses * Exempt `copy` from the above check (it's the only way to safely cross device boundary) --------- Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
…cript. (NVIDIA#6346) Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
* Convert numpy types to DALIDataType. * Extend tests Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
Signed-off-by: Marek Dabek <mdabek@nvidia.com>
Signed-off-by: Marek Dabek <mdabek@nvidia.com>
Signed-off-by: Marek Dabek <mdabek@nvidia.com>
Signed-off-by: Marek Dabek <mdabek@nvidia.com>
* Different index support for tensor and PILImages Signed-off-by: Marek Dabek <mdabek@nvidia.com>
Signed-off-by: Marek Dabek <mdabek@nvidia.com>
Signed-off-by: Marek Dabek <mdabek@nvidia.com>
Signed-off-by: Marek Dabek <mdabek@nvidia.com>
* RandomResizedCrop * resized_crop functional API * InterpoltionMode validation and unification Signed-off-by: Marek Dabek <mdabek@nvidia.com>
Signed-off-by: Marek Dabek <mdabek@nvidia.com>
The test passess without a segfault (DALI-4655). It has not been deliberately fixed, just stopped appearing. Signed-off-by: Marek Dabek <mdabek@nvidia.com>
This reverts commit 9911739.
Signed-off-by: Marek Dabek <mdabek@nvidia.com>
Removes opencv_imgcodecs from the production libdali link line. The CPU JpegCompressionDistortion operator was the only consumer of cv::imencode / cv::imdecode in libdali, which dragged opencv_imgcodecs (and its libjpeg/libpng/libtiff dependency closure) into the production runtime. The operator is rewritten on top of nvImageCodec, freeing the production link line from imgcodecs. opencv_imgcodecs is still linked into test binaries via the DALI_OPENCV_TEST_EXTRA_LIBS cache variable (golden-image fixtures use cv::imread/imwrite). Signed-off-by: Joaquin Anton Guirao <janton@nvidia.com>
- Adds bulk DLPack TensorList constructors for lists of external tensors. The C++ path builds
CPU and GPU TensorLists in one pass after preflighting devices and peeking all capsule dtypes,
so dtype or device mismatches can fall back without partially consuming single-use capsules.
- Adds `ndd.Batch` fast paths for:
- lists of already evaluated `ndd.Tensor` objects, preserving storage, layout, dtype, enum
metadata, laziness behavior where applicable, and CUDA stream order;
- lists of external GPU tensors exposing DLPack, including stream-keyword handshakes and
same-device validation;
- lists of CPU and CUDA-host/pinned CPU DLPack tensors.
- Keeps requested dtype conversion on the existing slow path, so external DLPack capsules are not
consumed before conversion fallback.
- Improves `ndd.Tensor` DLPack ingestion:
- CPU read-only arrays that raise `BufferError` fall back to an explicit copy through the array
interface;
- GPU objects retry `__dlpack__()` without the `stream` keyword only when the producer rejects
the keyword, and fall back to `__cuda_array_interface__` only when DLPack is unavailable.
- Fixes mixed pinned/non-pinned non-contiguous GPU TensorLists. GPU sample sharing no longer
requires uniform `is_pinned()` state, which unblocks mixed-GPU batches where cross-device copies
use pinned staging memory. CPU TensorLists still enforce pinned-state compatibility.
- Makes `UserStream::GetStream(size_t dev)` public so the new bulk DLPack path can derive a
concrete per-device consumer stream.
Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
* Update preprocessor to 1.91.0 * Update cutlass to 4.5.0 * Update pybind11 to 3.0.4 * Updated packaging to <= 26.2 * Updated DALI_Deps commit to the latest version Signed-off-by: Marek Dabek <mdabek@nvidia.com>
The docs version selector generated legacy-version entries without marking them as short user-guide links. For DALI 2.x pages, that made the selector add an unnecessary `docs/` path component. Mark generated DALI 2.x entries with the existing `short_user` path mode so the selector builds links that match the published layout. Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
Generate operator `See also entries` with Sphinx doc references instead of raw root-relative HTML links. The root-relative form resolves outside the DALI docs tree on docs.nvidia.com, so links from generated operator pages can point at dead /examples/... URLs. Let Sphinx resolve the target document from the generated page location so the resulting href stays within the deployed user guide for both pipeline and dynamic operation pages. Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
- Exposes batch_size and device as distinct dynamic operator wrapper parameters so they can be tracked as reserved kwargs. Add generated numpydoc entries for batch_size, device, rng, and seed when those arguments are present in dynamic operator signatures. - Fixes hiding of `seed` argument in the dynamic mode. Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
) The nvImageCodec-based image decoder (dali/operators/imgcodec) sets the default `hw_decoder_load` to 0.90, whereas the legacy decoder (dali/operators/decoder) used 0.65. Profiling on EfficientNet+AutoAugment ImageNet training showed that the higher default increases p99 decoder latency (24.5 ms -> 34.3 ms) and the worst-case batch decode (30.8 ms -> 72.7 ms). Revert the default to 0.65 to match the legacy behavior and restore decoder determinism. Users who want the prior 0.90 behavior can opt in explicitly via the operator argument. Signed-off-by: Joaquin Anton Guirao <janton@nvidia.com>
Limit CPU video decoder codec support
Restrict the CPU frames decoder to codecs supported by the currently
compiled libavcodec configuration. H264 and HEVC are no longer
advertised for the CPU variant while VP8, VP9, and MJPEG remain
enabled.
Make ReadRegularFrame mark end-of-stream by setting next_frame_idx_
to -1 when the index reaches NumFrames(), mirroring the existing
guard in ReadFlushFrame. Without this, codecs with no decoder latency
(VP9 on the new test inputs) deliver the final frame via the regular
path, leaving next_frame_idx_ at NumFrames() and causing
VideoInput depletion to be reported one batch late.
Reset the decoder when an indexed next frame falls outside the valid
range, avoiding reuse of an invalid decoder position.
Update video decoder tests to expect CPU failures for unsupported
codecs instead of skipping only MPEG4. Use VP9 CFR/VFR test inputs
and device-less CPU pipelines where appropriate. Point the CFR/VFR
reference frame folders at `frames_{1,2}_vp9/` so CPU decode of the
new VP9 fixtures matches at the existing eps=10 tolerance. Drop the
CPU HEVC frames-decoder tests (`ConstantFrameRateHevc`,
`VariableFrameRateHevc`, `VariableFrameRateHevcNoIndex`) — HEVC is
no longer in the CPU codec allow-list.
Tolerate up to 16 isolated subpixel deviations exceeding eps in
TestVideo::CompareFrame (out of ~2.7M subpixels per frame). The CPU
VP9 decode path occasionally produces a single byte that differs by
~32 — a SIMD glitch inside libavcodec/sws_scale that Valgrind cannot
instrument. The budget is orders of magnitude below what any genuine
regression would produce, so test sensitivity is preserved.
In dali/test/python/input/test_video.py, filter out h264 from the
round-robin fixture (the unsuffixed test_{1,2}.mp4 in cfr//vfr/
are h264) and restrict test_video_input_audio_stream to the mixed
backend — the only DALI_extra video with an audio stream is h264.
Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
Hide WorkerThread implementation details Makes WorkerThread a virtual interface with a factory that returns the private implementation. Moves WorkerThreadImpl and its helper state into an internal header so async pipelined executors can include it and store WorkerThreadImpl directly, without exposing the implementation through the public WorkerThread API. Updates public/template-visible users such as InputOperator and the legacy video loader to store WorkerThread through the interface and factory, keeping private WorkerThread layout out of external operator plugin headers. Keeps OldThreadPool and NewThreadPool as direct concrete implementations. ThreadPool and ThreadPoolBase already provide the virtual interface used by operators, so the concrete pools do not add internal pImpl storage. Makes WorkerThread::CheckForErrors surface queued worker failures and routes async executor worker errors through their stop-signaling paths. Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
- Pin MLPerf and JAX Rosetta links to specific GitHub tags for consistency and reliability. - Remove references to hardcoded "Getting Started" paths in the DALI GitHub repository. Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
Require dynamic batch_permutation calls to provide an explicit batch_size when it cannot be inferred. Dynamic mode has no pipeline-level batch size to inherit. BatchPermutation has no data inputs, so calling it without batch_size previously continued into operator setup as a sample-mode call and failed later with less actionable errors. This PR adds an early validation for the generated functional and class APIs. Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
Validate the number of values in each COCO annotation bbox before copying parsed JSON values into the fixed-size box array. Reject overlong arrays before they can write past the array and reject short arrays to avoid partially initialized boxes. Use typed invalid_argument exceptions with actual element counts and add a COCO reader regression test that exercises the real reader path with empty, short, and overlong bbox arrays. Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
* Support transferring operator instances to executor * Include the reader in compiled pipelines * Mark reader invalid in case of failure after operator transfer --------- Signed-off-by: Rostan Tabet <rtabet@nvidia.com>
Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
7075ce9 to
dc208ff
Compare
|
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
Category:
New feature
Description:
Torchvision API operators:
Additional information:
Affected modules and functionalities:
Key points relevant for the review:
Tests:
Checklist
Documentation
DALI team only
Requirements
REQ IDs: N/A
JIRA TASK: N/A