bug, data-loading
Hi Popper Team,
I noticed a small issue in the Popper.register_data method. Currently, the check for the existence of the bio_database directory and the subsequent call to download_all_data happens before the loader_type parameter is evaluated.
# In popper/popper.py -> Popper.register_data
# ...
self.data_path = data_path
# This check runs regardless of loader_type
if not os.path.exists(os.path.join(data_path, 'bio_database')):
print('It will take a few minutes to download the data for the first time...')
self.download_all_data()
else:
print('Data already exists, loading...')
if loader_type == 'bio':
# ... load bio data ...
elif loader_type == 'custom':
# ... load custom data ...
# ... etc ...
This means that even if a user specifies loader_type='custom' or loader_type='discovery_bench', the code still attempts to download the large biological dataset if the bio_database directory isn't found in the provided data_path. This can lead to unnecessary downloads and delays for users who only intend to use their own local data. The download of the biological dataset should only be triggered if the loader_type is set to 'bio' or 'bio_selected' (or any other type that explicitly requires this specific dataset).
Suggested Fix:
Moving the existence check and the download_all_data() call inside the conditional blocks for the relevant loader_types would resolve this. For example:
# In popper/popper.py -> Popper.register_data (Proposed Change)
# ...
self.data_path = data_path
# Download check removed from here
if loader_type == 'bio':
# Check and download only if loader_type is 'bio'
if not os.path.exists(os.path.join(data_path, 'bio_database')):
print('It will take a few minutes to download the data for the first time...')
self.download_all_data()
else:
print('Bio data already exists, loading...')
self.data_loader = ExperimentalDataLoader(...)
elif loader_type == 'bio_selected':
# Similar check for 'bio_selected'
if not os.path.exists(os.path.join(data_path, 'bio_database')):
# ... download ...
# ...
elif loader_type == 'custom':
# No download check needed here
print('Loading custom data...')
self.data_loader = CustomDataLoader(data_folder=data_path)
# ... etc ...
This change would ensure the download only happens when necessary based on the user's specified data loading strategy.
Thanks for considering this improvement!
bug,data-loadingHi Popper Team,
I noticed a small issue in the
Popper.register_datamethod. Currently, the check for the existence of thebio_databasedirectory and the subsequent call todownload_all_datahappens before theloader_typeparameter is evaluated.This means that even if a user specifies
loader_type='custom'orloader_type='discovery_bench', the code still attempts to download the large biological dataset if thebio_databasedirectory isn't found in the provideddata_path. This can lead to unnecessary downloads and delays for users who only intend to use their own local data. The download of the biological dataset should only be triggered if theloader_typeis set to'bio'or'bio_selected'(or any other type that explicitly requires this specific dataset).Suggested Fix:
Moving the existence check and the
download_all_data()call inside the conditional blocks for the relevantloader_types would resolve this. For example:This change would ensure the download only happens when necessary based on the user's specified data loading strategy.
Thanks for considering this improvement!