When an array has more than a few hundred tiles, I've noticed that performance drops significantly. This is almost certainly due to the various extent operations needed to compute tiles. Moving the extent code to Cython should give us a big speedup.
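To make the cost concrete, here is a minimal sketch of a scan-based tile lookup. It assumes each tile is described by a hypothetical `TileExtent` with an inclusive upper-left corner `ul` and an exclusive lower-right corner `lr`. These names are illustrative, not our actual extent API. Every lookup is a Python-level loop over all tiles, so its cost grows linearly with the tile count. That is consistent with the slowdown past a few hundred tiles.

```python
from typing import NamedTuple, Sequence, Tuple


class TileExtent(NamedTuple):
    """Hypothetical tile extent: ul is inclusive, lr is exclusive, per axis."""
    ul: Tuple[int, ...]
    lr: Tuple[int, ...]

    def contains(self, pos: Tuple[int, ...]) -> bool:
        # pos lies inside the extent iff ul[i] <= pos[i] < lr[i] on every axis.
        return all(lo <= p < hi for lo, p, hi in zip(self.ul, pos, self.lr))


def find_tile_by_scan(pos: Tuple[int, ...], extents: Sequence[TileExtent]) -> int:
    """Return the index of the first tile whose extent contains pos.

    Cost is O(num_tiles) per lookup, because every extent may be checked.
    """
    for idx, ext in enumerate(extents):
        if ext.contains(pos):
            return idx
    raise IndexError(f"position {pos} is outside every tile")
```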
Also, the vast majority of arrays have tiles that all share the same shape. We can exploit this: instead of scanning the tile list, compute the target tile directly from the tile shape, e.g.:
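def pos_to_tile(pos, tile_shape, array):
    # Integer (floor) division gives the tile coordinate along each axis.
    tx = pos[0] // tile_shape[0]
    ty = pos[1] // tile_shape[1]
    # ...
    # Round up so that a partial trailing tile is still counted.
    num_tiles_x = (array.shape[0] + tile_shape[0] - 1) // tile_shape[0]
    return ty * num_tiles_x + tx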
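The same idea generalizes to any number of dimensions. Below is a minimal sketch. It assumes every tile has shape `tile_shape`, except possibly the last tile along each axis. It keeps the ordering used by `pos_to_tile`, where the first axis varies fastest. The name `pos_to_tile_nd` and its `array_shape` parameter are hypothetical. Each lookup costs O(ndim), regardless of how many tiles the array has.

```python
from typing import Sequence


def pos_to_tile_nd(pos: Sequence[int], tile_shape: Sequence[int],
                   array_shape: Sequence[int]) -> int:
    """Map an element position to a flat tile index without scanning tiles.

    Assumes uniform tiles (only trailing tiles may be partial) and the same
    ordering as pos_to_tile: axis 0 varies fastest.
    """
    if not (len(pos) == len(tile_shape) == len(array_shape)):
        raise ValueError("pos, tile_shape and array_shape must have the same length")
    index = 0
    stride = 1  # number of tiles skipped by one step along the current axis
    for p, t, n in zip(pos, tile_shape, array_shape):
        if not 0 <= p < n:
            raise IndexError(f"position {tuple(pos)} is out of bounds for {tuple(array_shape)}")
        index += (p // t) * stride
        # Ceiling division: a partial trailing tile still counts as a tile.
        stride *= -(-n // t)
    return index
```

The ceiling division only matters when the array shape is not a multiple of the tile shape. With floor division, a 10-wide axis split into 4-wide tiles would report 2 tiles instead of 3, and positions in different rows of tiles would collide on the same index. For the minority of arrays whose tiles are not uniform, a scan over the tile list would still be needed as a fallback.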