Introduce sampling strategies #81

Merged
tizianocitro merged 4 commits into main from
feat/sampling-strategies
Mar 2, 2026

@tizianocitro tizianocitro commented Feb 25, 2026

Summary

Dataset.__getitem__ was hardcoded to sample by node: __len__ returned num_nodes, and each item returned all hyperedges incident to a given node. This had two drawbacks:

  • Batch count was tied to node count, so adding negative hyperedge samples had no effect on __len__.
  • Hyperedge coverage per batch was uneven and unpredictable because node degrees vary.

I introduced a configurable sampling strategy that allows Dataset to sample either by node (existing behavior) or by hyperedge (new).
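
As a rough illustration of the idea (not the code in this PR), the strategy switch could look like the sketch below. `SamplingStrategy`, `ToyDataset`, and the `(node_id, hyperedge_id)` incidence pairs are hypothetical stand-ins, not the library's actual API.

```python
from enum import Enum


class SamplingStrategy(Enum):  # hypothetical name, for illustration only
    NODE = "node"
    HYPEREDGE = "hyperedge"


class ToyDataset:
    """Toy stand-in for Dataset; incidences are (node_id, hyperedge_id) pairs."""

    def __init__(self, incidences, strategy=SamplingStrategy.HYPEREDGE):
        self.incidences = list(incidences)
        self.strategy = strategy
        self.num_nodes = len({n for n, _ in self.incidences})
        self.num_hyperedges = len({e for _, e in self.incidences})

    def __len__(self):
        # The number of items follows the chosen strategy.
        if self.strategy is SamplingStrategy.NODE:
            return self.num_nodes
        return self.num_hyperedges


incidences = [(0, 0), (1, 0), (1, 1), (2, 1), (3, 1), (0, 2)]
# A negative hyperedge (id 3) over existing nodes 2 and 3.
with_negative = incidences + [(2, 3), (3, 3)]

by_node = (len(ToyDataset(incidences, SamplingStrategy.NODE)),
           len(ToyDataset(with_negative, SamplingStrategy.NODE)))
by_hyperedge = (len(ToyDataset(incidences)), len(ToyDataset(with_negative)))
```

In this toy setup the node-based length does not change when the negative hyperedge is added, while the hyperedge-based length grows by one.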

Hyperedge sampling (now the default):

  • __len__ returns num_hyperedges.
  • __getitem__(i) returns the i-th hyperedge and all its incidences, so that each batch contains a fixed, predictable number of hyperedges.
  • No deduplication needed since each hyperedge appears exactly once.
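
A minimal sketch of hyperedge sampling, assuming incidences are stored as `(node_id, hyperedge_id)` pairs and hyperedge ids run from 0 to num_hyperedges - 1. The helper names (`build_hyperedge_index`, `get_hyperedge_item`, `collate`) are made up for this example and are not the PR's actual functions.

```python
from collections import defaultdict


def build_hyperedge_index(incidences):
    """Map each hyperedge id to the list of nodes it contains."""
    members = defaultdict(list)
    for node, edge in incidences:
        members[edge].append(node)
    return members


def get_hyperedge_item(members, i):
    """Return the i-th hyperedge together with all its incident nodes."""
    return {i: sorted(members[i])}


def collate(items):
    """Merge items into one batch; keys never collide under hyperedge sampling."""
    batch = {}
    for item in items:
        batch.update(item)
    return batch


incidences = [(0, 0), (1, 0), (1, 1), (2, 1), (3, 1), (0, 2)]
members = build_hyperedge_index(incidences)
batch = collate([get_hyperedge_item(members, i) for i in (0, 1)])
```

Two sampled indices give a batch with exactly two hyperedges, regardless of how many nodes each one contains.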

Node sampling:

  • __len__ returns num_nodes.
  • __getitem__(i) returns all hyperedges incident to node i (and all nodes in them).
  • Deduplication in Loader.collate handles any overlap.
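
To show why node sampling needs deduplication, here is a small sketch under the same assumption that incidences are `(node_id, hyperedge_id)` pairs. `get_node_item` and `collate_dedup` are hypothetical helpers and do not reproduce the real Loader.collate.

```python
def get_node_item(incidences, i):
    """Return every hyperedge incident to node i, with all nodes in each one."""
    edges = {e for n, e in incidences if n == i}
    return {e: sorted(n for n, e2 in incidences if e2 == e) for e in edges}


def collate_dedup(items):
    """Merge items, keeping a single copy of any hyperedge seen more than once."""
    batch = {}
    for item in items:
        for edge, nodes in item.items():
            batch.setdefault(edge, nodes)
    return batch


incidences = [(0, 0), (1, 0), (1, 1), (2, 1), (3, 1), (0, 2)]
items = [get_node_item(incidences, i) for i in (0, 1)]
raw_count = sum(len(item) for item in items)  # hyperedge 0 is counted twice
batch = collate_dedup(items)
```

Nodes 0 and 1 share hyperedge 0, so the two items carry four hyperedge entries in total but the collated batch holds only three distinct hyperedges. The batch size in hyperedges therefore depends on node degrees and overlap.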

@tizianocitro tizianocitro marked this pull request as draft February 25, 2026 12:22
@tizianocitro tizianocitro self-assigned this Feb 25, 2026
This was linked to issues Feb 25, 2026
@tizianocitro tizianocitro marked this pull request as ready for review March 2, 2026 10:25
@tizianocitro tizianocitro merged commit 1031533 into main Mar 2, 2026
15 checks passed
@tizianocitro tizianocitro deleted the feat/sampling-strategies branch March 2, 2026 10:26

Development

Successfully merging this pull request may close these issues.

  • Add shuffle() to HData
  • Support hyperedge-based sampling in Dataset