A tool for downloading ERA5 climate data from the Copernicus Climate Data Store (CDS) using multiple API keys concurrently to improve download speeds.
- Dynamic task assignment system that automatically balances workload among multiple API keys
- Each API key runs a dedicated request pipeline: one sequential request thread (submitting the next retrieve() as soon as the previous result is ready) plus a configurable number of parallel download threads
- This ensures the CDS server queue is always occupied while downloading proceeds concurrently
- Robust download mechanism:
  - Automatic fallback download if the CDS API method fails
  - Exponential backoff retry strategy for failed downloads
  - Resumable downloads after interruption
- Smart file handling:
  - Optional automatic variable short name extraction from NetCDF files
  - Skips already-downloaded files when variable short names are provided
- Supports both ERA5 single-level and pressure-level datasets
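The retry behaviour described above can be sketched as follows. This is a minimal illustration only; the actual attempt counts, delays, and function names in downloader_ERA5.py may differ:

```python
import random
import time


def download_with_retries(download_fn, max_attempts=5, base_delay=1.0):
    """Retry a download callable with exponential backoff.

    Hypothetical sketch: `download_fn` stands in for whatever performs
    one download attempt; it is retried with doubling delays plus a
    small random jitter until it succeeds or attempts are exhausted.
    """
    for attempt in range(max_attempts):
        try:
            return download_fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Double the wait after each failure, with jitter to avoid
            # retrying in lockstep across threads.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            time.sleep(delay)
```
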
- Clone or download this repository:
  ```shell
  git clone https://github.com/Hem-W/ERA5_toolbox.git
  cd ERA5_toolbox
  ```
- Create a conda environment using the provided environment.yml file:
  ```shell
  conda env create -f environment.yml
  conda activate era5_toolbox
  ```
- Alternatively, install the required dependencies manually:

  ```shell
  pip install cdsapi json5 tqdm urllib3 netcdf4 xarray
  ```
- Configure your API keys by creating or modifying the `cdsapi_keys.json` file:

  ```json
  {
    "keys": [
      "your-first-api-key",
      "your-second-api-key",
      "your-third-api-key"
    ]
  }
  ```
You can obtain CDS API keys by registering at https://cds.climate.copernicus.eu/
- Make sure the `cdsapi_keys.json` file is in the same directory as the script, or specify a different location in the `api_keys_file` parameter.
To download ERA5 data, modify the user specifications in the main section of downloader_ERA5.py and run:
```shell
python -u downloader_ERA5.py
```

Or run in the background to avoid interruption when closing the terminal:

```shell
nohup python -u downloader_ERA5.py &
```

Edit these parameters in the main section of the script:
```python
# User Specification
years = range(2019, 2025)
variables = ["10m_u_component_of_wind", "2m_temperature"]
dataset = "reanalysis-era5-single-levels"
pressure_levels = None     # List of pressure levels (hPa)
api_keys_file = None       # Use default 'cdsapi_keys.json'
download_workers = 1       # Number of parallel download threads per key
skip_existing = True       # Whether to skip downloading existing files

# Optional: Provide short names for variables (recommended when skip_existing=True)
short_names = {
    '10m_u_component_of_wind': 'u10',
    '2m_temperature': 't2m'
}
```

To download single-level ERA5 data:
```python
years = range(1940, 2025)
variables = ["toa_incident_solar_radiation", "2m_temperature", "total_precipitation"]
dataset = "reanalysis-era5-single-levels"

# Define short names for better file naming and skipping existing files
short_names = {
    "toa_incident_solar_radiation": "tisr",
    "2m_temperature": "t2m",
    "total_precipitation": "tp"
}
```

To download pressure-level ERA5 data:
```python
years = range(1940, 2025)
variables = ["geopotential", "u_component_of_wind", "v_component_of_wind"]
dataset = "reanalysis-era5-pressure-levels"
pressure_levels = ["500", "700"]  # Pressure levels in hPa
short_names = {
    "geopotential": "z",
    "u_component_of_wind": "u",
    "v_component_of_wind": "v"
}
```

Output files are named with a prescribed pattern:
- Single-level: `era5.reanalysis.[variable_shortname].1hr.0p25deg.global.[year].nc`
- Pressure-level: `era5.reanalysis.[variable_shortname].[pressure_level]hpa.1hr.0p25deg.global.[year].nc`
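As an illustration, the naming pattern above can be reproduced with a small helper (a hypothetical function name, not necessarily the one used inside downloader_ERA5.py):

```python
def output_filename(short_name, year, pressure_level=None):
    """Build the prescribed output filename for a variable/year pair.

    Illustrative sketch of the documented naming pattern; pass
    `pressure_level` (in hPa) only for pressure-level datasets.
    """
    if pressure_level is None:
        # Single-level pattern
        return f"era5.reanalysis.{short_name}.1hr.0p25deg.global.{year}.nc"
    # Pressure-level pattern
    return (f"era5.reanalysis.{short_name}.{pressure_level}hpa"
            f".1hr.0p25deg.global.{year}.nc")
```
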
The variable short name can be:
- Provided by the user via the `short_names` dictionary (recommended)
- Automatically extracted from the downloaded NetCDF file (when `short_names` is not provided)
The script loads API keys from a separate JSON file, which:
- Keeps sensitive credentials out of source code
- Makes it easier to maintain and update keys
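A minimal sketch of that loading step, using the standard-library `json` module (the project itself lists `json5` among its dependencies, which additionally tolerates comments in the key file; the function name here is hypothetical):

```python
import json


def load_api_keys(path="cdsapi_keys.json"):
    """Load the list of CDS API keys from a JSON config file.

    Sketch only: reads the {"keys": [...]} structure shown above and
    fails loudly if no keys are present.
    """
    with open(path) as f:
        config = json.load(f)
    keys = config.get("keys", [])
    if not keys:
        raise ValueError(f"no API keys found in {path}")
    return keys
```
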