Reading NDArrays from arrow records Using Binary data #184
ShamsUlAzeem wants to merge 2 commits into fr_fastpy from
…om float, double to int for TypeConversion
@agibsonccc @AlexDBlack, please take a look at the changes in this commit: 6f97e6a
The context here is that when we save an NDArrayWritable as an Arrow record, it is saved in a binary data format, and while deserialising we don't really know what that format could be. The schema type is NDArray when the record is saved and Bytes when it is deserialised.
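To make the mismatch concrete, here is a minimal sketch in plain Python, not DataVec or Arrow code. It assumes the writer could attach a key/value hint to the field (Arrow schemas allow key/value metadata on fields); the `Field` class, the `logical_type` key, and the `"ndarray"` value are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Field:
    """Hypothetical stand-in for a schema field; not the Arrow or DataVec class."""
    name: str
    storage_type: str                      # what is physically stored, e.g. "binary"
    metadata: dict = field(default_factory=dict)


def logical_type(f: Field) -> str:
    """Return the type recorded at write time, else fall back to the storage type."""
    # "logical_type" is an assumed metadata key, not an established convention.
    return f.metadata.get("logical_type", f.storage_type)


# Written with a hint: the reader can recover that the bytes hold an NDArray.
tagged = Field("features", "binary", {"logical_type": "ndarray"})
# Written without a hint: the reader only sees the storage type.
untagged = Field("features", "binary")
```

With the hint present, `logical_type(tagged)` yields the write-time type; without it, the reader is left with only the binary storage type, which is the situation described above.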
Storing NDArrays as bytes in Arrow format could be problematic, but it's probably fine if the only use is within DataVec. In general, a "try/catch for a bunch of common formats" approach will be too brittle; let's avoid that and do it properly. Arrow supports n-dimensional arrays.
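One way to read "do it properly" is to make the payload self-describing, so the reader never has to guess. The sketch below is an illustration only, written with Python's `struct` module: the header layout (one dtype byte, one rank byte, then the dimensions) and the dtype codes are assumptions made for this example, not the ND4J, NumPy, or Arrow wire format.

```python
import struct

# Hypothetical dtype codes mapped to struct format characters.
DTYPES = {0: "f", 1: "d", 2: "i"}          # float32, float64, int32
CODES = {fmt: code for code, fmt in DTYPES.items()}


def _count(shape):
    """Number of elements implied by a shape."""
    total = 1
    for dim in shape:
        total *= dim
    return total


def encode(shape, fmt, values):
    """Pack dtype code, rank, dimensions, and values into one little-endian buffer."""
    if _count(shape) != len(values):
        raise ValueError("shape does not match number of values")
    header = struct.pack("<BB", CODES[fmt], len(shape))
    dims = struct.pack(f"<{len(shape)}q", *shape)
    body = struct.pack(f"<{len(values)}{fmt}", *values)
    return header + dims + body


def decode(buf):
    """Read the header first, then use it to interpret the rest of the buffer."""
    code, rank = struct.unpack_from("<BB", buf, 0)
    shape = struct.unpack_from(f"<{rank}q", buf, 2)
    fmt = DTYPES[code]
    values = struct.unpack_from(f"<{_count(shape)}{fmt}", buf, 2 + 8 * rank)
    return shape, fmt, list(values)


payload = encode((2, 3), "i", [1, 2, 3, 4, 5, 6])
shape, fmt, values = decode(payload)
```

Because the dtype and shape travel with the data, decoding needs no try/catch over candidate formats; an unknown dtype code fails immediately with a `KeyError` instead of being misread.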
So, my first use case for storing an NDArray was like this: when we store an NDArrayWritable using […]
Keeping that in mind, when we try to read it back again into an Arrow record using the following code: […]
The internal mapper does map it to a […], but when we try to fetch the data using the get method: […]
The function doesn't really know how to map it to a suitable writable, and the output comes out to be a […], which hits when this is evaluated: […]
So, for the mappings for binary data here: […]
we should know what the binary data could potentially contain, and if it's not an array, we can assume it's something else. For now, I can think of arrays in the form of binary ND4J or NumPy data.
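As a rough illustration of "knowing what the binary data could contain", the sketch below checks a payload for the magic prefix that `.npy` files start with (`\x93NUMPY`) and otherwise treats it as opaque bytes. The function name and the returned labels are hypothetical, and no check for the ND4J binary format is attempted here because its layout is not described in this thread.

```python
NPY_MAGIC = b"\x93NUMPY"  # every .npy file begins with these six bytes


def classify_binary(payload: bytes) -> str:
    """Label a binary payload as NumPy data or as plain bytes (illustrative labels)."""
    if payload.startswith(NPY_MAGIC):
        return "numpy"
    # Anything unrecognised is left as raw bytes rather than guessed at.
    return "bytes"
```

A prefix check like this is cheap and avoids attempting a full parse, but it is still guessing from content, which is the brittleness raised earlier in the conversation; it only works for formats that carry a recognisable signature.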
Or maybe Arrow has a datatype for tensors that we can use here instead?
@ShamsUlAzeem No, there is only the tensor container type. Nd4j-arrow covers this pretty well already.
What changes were proposed in this pull request?
Reading NDArrays from arrow records Using Binary data
How was this patch tested?
(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)