Labels
enhancement (New feature or request)
Hi!
I compared the searchsorted function implemented here, which does torch.sum(inputs[..., None] >= bin_locations, dim=-1) - 1, with the C++ implementation at https://github.com/aliutkus/torchsearchsorted, and it appears to be a lot slower, at least on CPU.
I modified benchmark.py in torchsearchsorted and copy-pasted the function from nflows for comparison. The output (all on CPU, times in seconds) was:
Benchmark searchsorted:
- a [5000 x 16]
- v [5000 x 1]
- reporting fastest time of 10 runs
- each run executes searchsorted 100 times
Numpy: 0.9516626670001642
torchsearchsorted: 0.009861100999842165
nflows: 50.19729063499926
i.e. sorting 5000 inputs into 5000 individual sets of 16 bins.
Am I missing something here? If not, it looks like the spline flows could be sped up quite a bit by using torchsearchsorted or something similar?
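For reference, here is a minimal sketch of the equivalence I'm relying on (assuming PyTorch >= 1.6, which ships a built-in torch.searchsorted; the function and variable names below are mine, not nflows'). With right=True, torch.searchsorted counts bin edges <= input, which matches the sum-based version exactly:

```python
import torch

def searchsorted_nflows(bin_locations, inputs):
    # nflows-style: count how many bin edges each input is >= to, minus 1
    return torch.sum(inputs[..., None] >= bin_locations, dim=-1) - 1

def searchsorted_builtin(bin_locations, inputs):
    # Built-in equivalent (PyTorch >= 1.6): right=True returns the number
    # of edges <= input, so subtracting 1 reproduces the sum-based result.
    return torch.searchsorted(bin_locations, inputs[..., None], right=True)[..., 0] - 1

torch.manual_seed(0)
# 5000 independent sets of 16 sorted bin edges, one query value per set
bins = torch.sort(torch.randn(5000, 16), dim=-1).values
values = torch.randn(5000)

assert torch.equal(searchsorted_nflows(bins, values),
                   searchsorted_builtin(bins, values))
```

So the built-in (or torchsearchsorted) should be a drop-in replacement in terms of results; only the performance differs.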
Cheers.