It is clear we want to avoid pyactivestorage opening a file inside each dask chunk when every such open triggers a fresh remote index read.
A quick hack to fix this (in the pyfive branch) would be to stop keeping the File instance open (the optimal_kerchunk branch already does this). With that one change, users could at least reuse active storage instances many times without worrying about the file open count.
A better long-term solution may involve lifting the internal s3fs instance out of the per-chunk code path so we can take advantage of s3fs caching.
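To illustrate the idea, here is a minimal sketch of the "lift the filesystem out" pattern. The `FakeS3FileSystem` class and `get_filesystem` helper are hypothetical stand-ins (real s3fs/fsspec already does instance caching of its own); the point is simply that constructing the filesystem once, outside the per-chunk function, lets every chunk share its connection and internal caches instead of paying the open cost repeatedly.

```python
import functools

# Hypothetical stand-in for s3fs.S3FileSystem: counts constructions so the
# effect of sharing one instance across many chunks is visible.
class FakeS3FileSystem:
    instances = 0

    def __init__(self, **storage_options):
        FakeS3FileSystem.instances += 1
        self.storage_options = storage_options

@functools.lru_cache(maxsize=None)
def get_filesystem(endpoint_url):
    # One filesystem per endpoint, created outside the per-chunk code path,
    # so its caches (index reads, listings) are shared by all chunks.
    return FakeS3FileSystem(endpoint_url=endpoint_url)

def process_chunk(chunk_id, endpoint_url):
    # Each dask task looks the filesystem up instead of opening a new one.
    fs = get_filesystem(endpoint_url)
    return (chunk_id, id(fs))

results = [process_chunk(i, "https://example-object-store") for i in range(100)]
print(FakeS3FileSystem.instances)  # one open serves all 100 chunks
```

Running this shows a single filesystem construction for all one hundred chunk calls, which is the behaviour we would want from the real pyactivestorage code path.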