mefor44 commented Oct 29, 2025

During src.inference, one step deduplicates items that share the same semantic ID. Items with the same semantic ID each get an additional integer token (starting from 0), and items without duplicates get an additional 0 token. The function deduplicate_rows_in_tensor from src.utils.tensor_utils is very slow when the number of items is large (in the millions). This PR provides much more efficient implementations of this function.
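
To make the intended behavior concrete, here is a minimal pure-Python sketch of the deduplication semantics described above. It is not the implementation from this PR and does not operate on tensors. The name append_dedup_token is hypothetical, and numbering duplicates in order of appearance is an assumption. It makes a single pass with a hash map, so the work grows linearly with the number of items.

```python
from collections import defaultdict
from typing import List, Sequence, Tuple


def append_dedup_token(semantic_ids: Sequence[Sequence[int]]) -> List[Tuple[int, ...]]:
    """Append a counter token to each semantic ID (hypothetical helper).

    The first occurrence of an ID gets 0, the next identical ID gets 1, and so on.
    Numbering in order of appearance is an assumption, not taken from the PR.
    """
    seen = defaultdict(int)  # semantic ID -> number of earlier occurrences
    result = []
    for row in semantic_ids:
        key = tuple(row)
        result.append(key + (seen[key],))  # extra token makes the row unique
        seen[key] += 1
    return result


rows = [(3, 1, 4), (2, 7, 1), (3, 1, 4), (3, 1, 4)]
deduplicated = append_dedup_token(rows)
print(deduplicated)
```

In this example the three copies of (3, 1, 4) receive the tokens 0, 1 and 2, while the unique row (2, 7, 1) receives 0, so every output row is distinct.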
