Currently we use `numpy.array.__getitem__` on the host to do the indexing operation. This is somewhat inefficient, as it requires both a DtoH and an HtoD `Memcpy`.
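To make the cost concrete, here is a minimal sketch of the host round trip. It does not use a real GPU library: `FakeDeviceArray`, `to_host`, `to_device`, and `getitem_via_host` are hypothetical stand-ins invented for illustration, and the "copies" are only counted, not actually performed across a PCIe bus.

```python
import numpy as np

# Counts the simulated transfers; a real implementation would issue Memcpys.
copies = {"DtoH": 0, "HtoD": 0}


class FakeDeviceArray:
    """Hypothetical stand-in for an array that lives in device memory."""

    def __init__(self, data):
        self._data = np.array(data)  # pretend this buffer is on the device


def to_host(dev):
    """Simulated device-to-host copy."""
    copies["DtoH"] += 1
    return dev._data.copy()


def to_device(host):
    """Simulated host-to-device copy."""
    copies["HtoD"] += 1
    return FakeDeviceArray(host)


def getitem_via_host(dev, index):
    """Index on the host: copy down, index with NumPy, copy the result back up."""
    host = to_host(dev)        # DtoH: the whole array is transferred
    result = host[index]       # the indexing itself runs on the host
    return to_device(result)   # HtoD: the result is transferred back


dev = FakeDeviceArray(np.arange(10))
sub = getitem_via_host(dev, slice(2, 5))
```

Note that in this sketch the entire source array is copied to the host even when only a small slice is requested, which is the main reason an on-device indexing kernel would be preferable.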