We would like to explore multithreading in areas where parallel processing could significantly improve response times across the API. It adds complexity, though, so we should weigh it carefully as we implement it in the future. An example of how this could be used is shown here for the zonal statistics:

    # Use threading with a small number of workers for parallel
    # processing of the zonal statistics
    workers = min(4, max(1, cpu_count() // 2))

    # Only use parallelization if we have enough combinations
    # to justify overhead of starting them up
    use_parallel = MULTIPROCESSING and workers > 1 and len(dimension_combinations) > 20

    if use_parallel:
        with ThreadPoolExecutor(max_workers=workers) as threads:
            results = list(
                threads.map(
                    lambda combo: (
                        combo,
                        calculate_zonal_stats(
                            da_i.sel(combo),
                            rasterized_polygon_array,
                            x_dim,
                            y_dim,
                            compute_full_stats,
                        ),
                    ),
                    dimension_combinations,
                )
            )
    else:
        results = [
            (
                combo,
                calculate_zonal_stats(
                    da_i.sel(combo),
                    rasterized_polygon_array,
                    x_dim,
                    y_dim,
                    compute_full_stats,
                ),
            )
            for combo in dimension_combinations
        ]

    return results

data-api/zonal_stats.py
Lines 224 to 264 in 9d6cd62
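
If we apply this pattern in more places, one option is to pull the "parallel only when it is worth it" decision into a small helper so the two branches do not duplicate the call. The following is a minimal sketch, not the code in `zonal_stats.py`: `pick_workers`, `map_maybe_parallel`, `PARALLEL_THRESHOLD`, and `toy_stat` are hypothetical names, and `toy_stat` only stands in for `calculate_zonal_stats(da_i.sel(combo), ...)`.

```python
from concurrent.futures import ThreadPoolExecutor
from os import cpu_count

# Hypothetical constant mirroring the `> 20` cutoff in the excerpt above.
PARALLEL_THRESHOLD = 20


def pick_workers(max_workers=4):
    """Use at most `max_workers` threads and at most half the available cores."""
    # os.cpu_count() may return None, so fall back to a single core.
    return min(max_workers, max(1, (cpu_count() or 1) // 2))


def map_maybe_parallel(func, items, enabled=True, threshold=PARALLEL_THRESHOLD):
    """Apply `func` to each item and return (item, result) pairs in input order."""
    items = list(items)
    workers = pick_workers()
    if enabled and workers > 1 and len(items) > threshold:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # Executor.map yields results in the same order as the inputs.
            return list(zip(items, pool.map(func, items)))
    # Too few items (or threading disabled): a plain loop avoids pool overhead.
    return [(item, func(item)) for item in items]


def toy_stat(combo):
    """Stand-in for the real per-combination statistics call (assumption)."""
    return sum(combo.values())


# 30 combinations, so the threaded path is taken when enough cores are available.
combos = [{"time": t, "band": b} for t in range(6) for b in range(5)]
results = map_maybe_parallel(toy_stat, combos)
```

Both paths return the same list of `(combo, result)` pairs, so callers do not need to know which one ran. Threads mainly help when the per-item work is I/O bound or releases the GIL, so it would be worth measuring each call site before enabling this for it.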