Optimize tensor deduplication function #23

mefor44 · 2025-10-29T10:04:32Z

During inference (`src.inference`), there is a step that deduplicates the semantic IDs of items. Items that share the same semantic ID get an additional integer token that tells them apart, counting from 0; items without duplicates get an additional 0 token. The function `deduplicate_rows_in_tensor` from `src/utils/tensor_utils.py` is very slow when the number of items is large (millions). This PR provides much more efficient implementations of this function.
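The PR description does not show code. The sketch below is one way this step could be vectorized in PyTorch. It assumes the semantic IDs are stored as an integer tensor of shape (N, D), with one row per item. `deduplicate_rows_vectorized` is a hypothetical name: this is neither the PR's implementation nor the original `deduplicate_rows_in_tensor`. The k-th repeat of a row, in original row order, gets index k, so unique rows and first occurrences get 0.

```python
import torch


def deduplicate_rows_vectorized(codes: torch.Tensor) -> torch.Tensor:
    """Append a per-row duplicate index to an (N, D) tensor of semantic IDs.

    Hypothetical helper (illustration only): unique rows get 0, and the
    k-th repeat of a row (in original order) gets k.
    """
    if codes.dim() != 2:
        raise ValueError("expected a 2-D tensor of shape (N, D)")
    n = codes.size(0)
    if n == 0:
        return codes.new_empty((0, codes.size(1) + 1))
    # Map every row to the id of its unique group, without a Python loop.
    _, group_ids = torch.unique(codes, dim=0, return_inverse=True)
    # A stable sort keeps duplicates in their original order inside a group.
    sorted_ids, order = torch.sort(group_ids, stable=True)
    # Start position of each group inside the sorted order.
    counts = torch.bincount(group_ids)
    starts = torch.cumsum(counts, dim=0) - counts
    # Rank within a group = sorted position minus that group's start.
    positions = torch.arange(n, device=codes.device)
    ranks_sorted = positions - starts[sorted_ids]
    # Scatter the ranks back to the original row order.
    dup_index = torch.empty_like(group_ids)
    dup_index[order] = ranks_sorted
    return torch.cat([codes, dup_index.to(codes.dtype).unsqueeze(1)], dim=1)
```

For example, the rows `[[1, 2], [3, 4], [1, 2], [1, 2], [5, 6]]` get the extra column `[0, 0, 1, 2, 0]`. All work happens in a few tensor ops (`torch.unique`, one stable sort, `bincount`, `cumsum`), so the cost is dominated by sorting, roughly O(N log N). This avoids any per-item Python loop.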

Commit ea082c8: Optimize tensor deduplication function
