Historical negative sampling by ntgbaoo · Pull Request #406 · tgm-team/tgm

ntgbaoo · 2026-05-11T01:55:01Z

Summary / Description

This PR handles:

Added historical negative sampling
Reorganized hooks into sub-modules (if needed)
Split negative samplers into files
Added base class for tgb negative samplers (tgbl, tkgl, thgl) to avoid duplicated code

Related Issues: #405

Type of Change

Test Evidence

Describe how this PR has been tested.

Unit tests
Integration tests
Performance tests

Questions / Discussion Points

List any areas where you’d like reviewer input or have open questions.

codecov · 2026-05-11T01:56:31Z

Codecov Report

❌ Patch coverage is 96.95946% with 9 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
tgm/hooks/negatives/tgb_sampler.py	91.42%	9 Missing ⚠️

ntgbaoo · 2026-05-11T02:32:38Z

+        self._memory[0, self._count : self._count + batch_size] = batch.edge_src
+        self._memory[1, self._count : self._count + batch_size] = batch.edge_dst


One drawback of this is that the amortized space complexity is O(number of observed edge events) rather than O(number of unique edges).

isn't this better? you mean we have to keep a buffer in space now?

Ideally, the space complexity would scale linearly with the number of unique edges in the graph. However, the amortized space complexity of this implementation is O(number of observed edge events), which means the memory will contain duplicate edges. One benefit of this implementation is that edges that appeared more frequently in the past naturally have a higher probability of being sampled.
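The trade-off described above can be illustrated with a minimal sketch. It is not the PR's implementation: the class `EdgeEventMemory` and its methods are hypothetical, and plain Python lists stand in for the preallocated `self._memory` tensor. It only shows that an append-only buffer grows with the number of events, and that uniform sampling over it favors frequent edges.

```python
import random
from collections import Counter


class EdgeEventMemory:
    """Hypothetical append-only buffer: one slot per observed edge event."""

    def __init__(self):
        self.src = []
        self.dst = []

    def update(self, edge_src, edge_dst):
        # Every event is appended, so repeated edges are stored repeatedly.
        self.src.extend(edge_src)
        self.dst.extend(edge_dst)

    def sample_dst(self, src, rng):
        # Uniform choice over stored events for this source; duplicates
        # make frequent destinations proportionally more likely.
        candidates = [d for s, d in zip(self.src, self.dst) if s == src]
        return rng.choice(candidates) if candidates else None


memory = EdgeEventMemory()
memory.update([0, 0, 0, 1], [5, 5, 5, 6])  # edge (0, 5) observed three times
memory.update([0], [7])                    # edge (0, 7) observed once

rng = random.Random(0)
counts = Counter(memory.sample_dst(0, rng) for _ in range(4000))
num_events = len(memory.src)                              # 5 stored events
num_unique_edges = len(set(zip(memory.src, memory.dst)))  # 3 unique edges
```

Here the buffer holds five events for three unique edges, and destination 5 is drawn for source 0 far more often than destination 7.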

shenyangHuang

Thanks Bao, two major things:

Historical negatives and random negatives should be two hooks: one is stateful, the other is stateless.
I think we should now seriously consider a folder or a base class that handles negative sampling, rather than putting everything in a single script.

shenyangHuang · 2026-05-13T11:40:44Z



-class NegativeEdgeSamplerHook(StatelessHook):
+class NegativeEdgeSamplerHook(StatefulHook):


I think we should have two classes, one for random negatives and one for historical negatives; that is also easier for me to understand conceptually. Then I can mix and match the two by having two hooks in my hm.

Also, one of them is stateless and the other is stateful, so we should definitely separate them. Otherwise, by the same logic, we should merge uniform sampling with recency sampling.
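A rough sketch of the proposed split, under stated assumptions: the class names, the dict-based batch, and `run_hooks` are all hypothetical and are not the tgm hook API. `None` stands in for the padded sentinel. The point is only that the random hook keeps no memory of past batches while the historical hook does, and that the two can be chained.

```python
import random


class RandomNegativeHook:
    """Hypothetical hook that keeps no memory of past batches."""

    def __init__(self, num_nodes, seed=0):
        self.num_nodes = num_nodes
        self.rng = random.Random(seed)

    def __call__(self, batch):
        batch["neg_rnd"] = [self.rng.randrange(self.num_nodes) for _ in batch["src"]]
        return batch


class HistoricalNegativeHook:
    """Hypothetical hook that remembers past destinations per source."""

    def __init__(self, seed=0):
        self.history = {}
        self.rng = random.Random(seed)

    def reset_state(self):
        # Stateful hooks need an explicit reset between runs.
        self.history.clear()

    def __call__(self, batch):
        # Sample from interactions seen before this batch, then record the batch.
        batch["neg_hist"] = [
            self.rng.choice(self.history[s]) if s in self.history else None
            for s in batch["src"]
        ]
        for s, d in zip(batch["src"], batch["dst"]):
            self.history.setdefault(s, []).append(d)
        return batch


hooks = [RandomNegativeHook(num_nodes=10), HistoricalNegativeHook()]


def run_hooks(batch):
    # Stand-in for a hook manager: apply each hook in order.
    for hook in hooks:
        batch = hook(batch)
    return batch


first = run_hooks({"src": [0, 1], "dst": [5, 6]})
second = run_hooks({"src": [0, 2], "dst": [7, 8]})
```

In the second batch, source 0 already has a recorded destination (5), while source 2 has none and gets `None`.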

shenyangHuang · 2026-05-13T11:43:58Z

+
+        For each source node in the batch, randomly selects a destination node from
+        its past interactions stored in memory. If a source node has no recorded past
+        interactions, its corresponding negative sample is set to PADDED_NODE_ID as


Setting the negative sample to PADDED_NODE_ID is fine, but we need to remind users to mask those out correctly. Alternatively, could the hook report how many entries are padded?

This function returns PADDED_NODE_ID for nodes that don't have past interactions. However, those PADDED_NODE_ID entries are replaced with random dsts before __call__ returns. Here is the logic from __call__:

        elif self.strategy == 'hist_rnd':
            if self._count == 0:
                neg, neg_time = self._random_sampling(dg, batch)
                neg, neg_time = neg[: size[0]], neg_time[: size[0]]
            else:
                # replace PADDED_NODE_ID with random dst
                rnd_size = round(size[0] * 0.5)
                hst_size = size[0] - rnd_size
                neg_rnd, neg_time_rnd = self._random_sampling(dg, batch)
                neg_hst, neg_time_hst = self._random_hist_sampling(dg, batch)
                original_valid_mask = neg_hst != PADDED_NODE_ID
                valid_idx = torch.where(original_valid_mask)[0]
                cutoff = min(hst_size, valid_idx.size(0))
                neg = neg_rnd.clone()
                neg_time = neg_time_rnd.clone()
                chosen = valid_idx[:cutoff]
                neg[chosen] = neg_hst[chosen]
                neg_time[chosen] = neg_time_hst[chosen]

So PADDED_NODE_ID is never propagated downstream; for nodes that don't have past interactions, we fall back to random sampling.
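The fallback can be traced on a toy example. This is a simplified sketch of the merge step only, using plain lists instead of tensors; the function `merge_hist_and_random` and the sentinel value -1 are assumptions for illustration, not names or values taken from the PR.

```python
PADDED_NODE_ID = -1  # assumed sentinel value, for illustration only


def merge_hist_and_random(neg_hst, neg_rnd, hst_size):
    """Start from random negatives, then overwrite up to hst_size
    positions with valid (non-padded) historical negatives."""
    valid_idx = [i for i, dst in enumerate(neg_hst) if dst != PADDED_NODE_ID]
    chosen = valid_idx[:hst_size]
    neg = list(neg_rnd)  # copy so the random negatives stay untouched
    for i in chosen:
        neg[i] = neg_hst[i]
    return neg, len(chosen)


neg_hst = [7, PADDED_NODE_ID, 9, PADDED_NODE_ID, 3, 4]  # two sources lack history
neg_rnd = [10, 11, 12, 13, 14, 15]
neg, n_hist = merge_hist_and_random(neg_hst, neg_rnd, hst_size=3)
# neg == [7, 11, 9, 13, 3, 15]
```

Because the result starts as a copy of the random negatives, padded positions keep their random destination. Returning the number of historical entries actually used is one way a hook could report how much of the batch fell back to random sampling.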

shenyangHuang · 2026-05-13T11:44:43Z

+        self._memory[0, self._count : self._count + batch_size] = batch.edge_src
+        self._memory[1, self._count : self._count + batch_size] = batch.edge_dst


isn't this better? you mean we have to keep a buffer in space now?

shenyangHuang

Thanks Bao, looks good on my end

ntgbaoo added 2 commits May 10, 2026 14:00

WIP

76feb07

Added unit tests

09cbcd6

Added unit test to enhance codecov

aef379d

ntgbaoo commented May 11, 2026

ntgbaoo self-assigned this May 11, 2026

ntgbaoo added the enhancement, DGHooks, and DGBatch labels May 11, 2026

ntgbaoo requested review from Jacob-Chmura and shenyangHuang May 11, 2026 17:44

Fixed minor bug where duplicated src in a batch

7b15258

shenyangHuang requested changes May 13, 2026

ntgbaoo added 4 commits May 13, 2026 16:48

Organized hooks into sub-module and create Histotical negative sampler

3e3a66a

Updated doc

a67a1cb

Fixed doc build failure

b723ea5

Fixed doc build failure

c421fb6

ntgbaoo requested a review from shenyangHuang May 14, 2026 00:45

shenyangHuang approved these changes May 15, 2026

ntgbaoo merged commit aadb103 into main May 15, 2026
7 checks passed

ntgbaoo merged 8 commits into main from 405-HistoricalSampling
