
fix(torch): honor drop_last in all to_dataloader() modes#207

Merged: d-laub merged 8 commits into main from fix/dataloader-drop-last on Jun 5, 2026

@d-laub (Collaborator) commented Jun 5, 2026

Summary

Dataset.to_dataloader() did not honor drop_last consistently across modes. This fixes both directions of the defect:

  • Buffered / double_buffered modes silently dropped the final partial batch even when drop_last=False. Now the trailing partial batch is kept.
  • Default mode (mode=None) crashed with drop_last=True because drop_last was forwarded to td.DataLoader alongside batch_size=None (PyTorch rejects that combination). The BatchSampler is now the sole authority on dropping the partial batch.
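For illustration, the default-mode contract can be modeled in plain Python. A batch sampler groups indices and alone decides whether the short final batch survives, while the DataLoader (built with batch_size=None, so it does no batching of its own) just forwards each batch. The sketch below mirrors the behavior of torch.utils.data.BatchSampler without importing PyTorch; `iter_batches` and `num_batches` are hypothetical helper names, not part of this project.

```python
from typing import Iterable, Iterator, List


def iter_batches(
    indices: Iterable[int], batch_size: int, drop_last: bool
) -> Iterator[List[int]]:
    """Group indices into batches, as a BatchSampler would (hypothetical helper)."""
    batch: List[int] = []
    for idx in indices:
        batch.append(idx)
        if len(batch) == batch_size:
            yield batch
            batch = []
    # The sampler is the only place that decides whether the short
    # trailing batch is kept; the loader never sees drop_last.
    if batch and not drop_last:
        yield batch


def num_batches(n: int, batch_size: int, drop_last: bool) -> int:
    """Batch count implied by the semantics above: floor if dropping, else ceil."""
    if drop_last:
        return n // batch_size
    return (n + batch_size - 1) // batch_size


if __name__ == "__main__":
    for drop_last in (False, True):
        sizes = [len(b) for b in iter_batches(range(10), 4, drop_last)]
        print(drop_last, sizes, num_batches(10, 4, drop_last))
```

With 10 indices and a batch size of 4 this yields batch sizes [4, 4, 2] when drop_last=False and [4, 4] when drop_last=True, matching the ceil(N/bs) and N//bs counts in the test plan.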

Changes

  • _chunked.py: ChunkPlanner no longer requires the index count to be a multiple of batch_size; it emits the trailing partial batch as a remainder entry in batch_totals and clamps the final chunk slice to n.
  • _torch.py: _resolve_buffered_inputs gates partial-batch truncation on drop_last; get_dataloader stops forwarding drop_last to td.DataLoader in default mode; a directly-passed BatchSampler's batch_size is adopted for buffered re-batching (with a warning when it conflicts with an explicit batch_size).
  • _buffered_loader.py: __len__ now rounds up (floor → ceil) so it matches the number of batches iteration actually yields.
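A minimal sketch of the planning change, assuming batch_totals is a list of per-batch sizes and that chunks are contiguous index slices spanning a fixed number of batches. `plan_chunks` and `batches_per_chunk` are hypothetical names, since ChunkPlanner's actual interface is not shown here; for brevity, the drop_last truncation that lives in _resolve_buffered_inputs is folded into the same function.

```python
from typing import List, Tuple


def plan_chunks(
    n: int, batch_size: int, batches_per_chunk: int, drop_last: bool
) -> Tuple[List[int], List[slice]]:
    """Hypothetical planner: per-batch sizes plus one index slice per chunk."""
    if drop_last:
        # Truncate to whole batches only when the caller asked for it.
        n -= n % batch_size
    full, remainder = divmod(n, batch_size)
    batch_totals = [batch_size] * full
    if remainder:
        # The trailing partial batch becomes a remainder entry instead of
        # being silently discarded.
        batch_totals.append(remainder)

    chunk_len = batch_size * batches_per_chunk
    # Clamp each chunk's end to n so the final chunk may be short.
    chunks = [
        slice(start, min(start + chunk_len, n)) for start in range(0, n, chunk_len)
    ]
    return batch_totals, chunks
```

For n=10, batch_size=4 and two batches per chunk, this plans batch_totals [4, 4, 2] with chunks slice(0, 8) and slice(8, 10); len(batch_totals) is then ceil(10/4) = 3, the value a ceil-based __len__ reports. With drop_last=True the plan shrinks to [4, 4] and a single slice(0, 8).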

Test Plan

  • pytest tests/unit/test_chunk_planner.py tests/unit/test_torch.py tests/unit/test_buffered_loader.py tests/unit/test_double_buffered_loader.py → 39 passed, 1 skipped (1kg-gated)
  • New tests cover the full mode × drop_last matrix, partial-batch instance count, default-mode drop_last=True regression, and a DDP-shaped custom-BatchSampler case
  • End-to-end repro on the dummy dataset confirms ceil(N/bs) batches with drop_last=False and N//bs with drop_last=True in both default and buffered modes
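The DDP-shaped custom-BatchSampler case exercises the adoption rule from the Changes list: when a BatchSampler is passed directly, buffered re-batching uses its batch_size and warns if an explicit batch_size disagrees. A minimal sketch, assuming the sampler exposes a `batch_size` attribute (as torch.utils.data.BatchSampler does); `ShardedBatchSampler` and `resolve_batch_size` are hypothetical names, not this project's API.

```python
import warnings
from typing import Iterator, List, Optional


class ShardedBatchSampler:
    """Hypothetical DDP-style sampler: each rank batches every world_size-th index."""

    def __init__(
        self, n: int, batch_size: int, rank: int, world_size: int, drop_last: bool
    ) -> None:
        self.indices = list(range(rank, n, world_size))
        self.batch_size = batch_size
        self.drop_last = drop_last

    def __iter__(self) -> Iterator[List[int]]:
        for start in range(0, len(self.indices), self.batch_size):
            batch = self.indices[start : start + self.batch_size]
            # Only a short final batch can fail this check.
            if len(batch) < self.batch_size and self.drop_last:
                return
            yield batch


def resolve_batch_size(batch_size: Optional[int], sampler: object) -> Optional[int]:
    """Prefer the sampler's batch_size; warn when an explicit value conflicts."""
    sampler_bs = getattr(sampler, "batch_size", None)
    if sampler_bs is None:
        return batch_size
    if batch_size is not None and batch_size != sampler_bs:
        warnings.warn(
            f"batch_size={batch_size} conflicts with "
            f"BatchSampler.batch_size={sampler_bs}; using {sampler_bs}."
        )
    return sampler_bs
```

Rank 1 of 2 over 10 indices with batch_size=2 yields [[1, 3], [5, 7], [9]] (or [[1, 3], [5, 7]] with drop_last=True), and resolve_batch_size(4, sampler) returns 2 while emitting a warning.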

🤖 Generated with Claude Code

d-laub and others added 8 commits June 5, 2026 01:07
Two defects: buffered modes ignore drop_last=False (unconditional n_keep
truncation + ChunkPlanner divisibility requirement), and default mode
crashes on drop_last=True (drop_last forwarded to DataLoader alongside
batch_size=None). Design teaches ChunkPlanner about a trailing partial
batch and stops forwarding drop_last in the default path.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
d-laub merged commit 5e49833 into main on Jun 5, 2026
7 checks passed
d-laub deleted the fix/dataloader-drop-last branch on June 5, 2026 at 09:25