Describe the feature or idea you want to propose
Related to #2724 and #1794; see #3361.
I have been running experiments on a 32/32 core machine. By default, we use the joblib backend. Joblib's own docs are explicit that the threading backend is mainly effective when the hot part of the workload is in compiled code that releases the GIL; if the workload manipulates Python objects a lot, scaling is limited.
For example, DrCIF (the classifier I'm running) does not have one big compiled kernel in its pipeline. It builds lots of intervals, loops over representations and features, and dispatches many relatively small operations. The Catch22 path in aeon, for example, includes Python-level loops and then calls into NumPy/Numba feature functions case by case. That kind of mixed workload often produces many live threads with modest actual CPU throughput.
This produces limited CPU usage.
If I switch to the loky backend, I get this:
The loky version runs in 30 mins. The threads one takes
The standard behaviour when `prefer` is not set is to use loky. We explicitly choose threads in some classifiers (those that are not numba-threaded with `prange`):
```python
fit = Parallel(
    n_jobs=self._n_jobs,
    backend=self.parallel_backend,
    prefer="threads",
)(
    ...
)
```
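To make the two configurations concrete, here is a minimal sketch of the difference between pinning `prefer="threads"` and letting joblib fall back to its default loky backend. `python_heavy` is a hypothetical stand-in for the Python-heavy per-interval work, not aeon code:

```python
from joblib import Parallel, delayed


def python_heavy(x):
    # Pure-Python work: it holds the GIL, so a threading backend
    # cannot run these calls concurrently across cores.
    total = 0
    for i in range(10_000):
        total += (x * i) % 7
    return total


inputs = list(range(8))

# Current behaviour: prefer="threads" pins joblib to the threading backend.
threaded = Parallel(n_jobs=2, prefer="threads")(
    delayed(python_heavy)(x) for x in inputs
)

# Without prefer, joblib uses the default loky (process-based) backend,
# which sidesteps the GIL at the cost of process start-up and pickling.
processed = Parallel(n_jobs=2)(delayed(python_heavy)(x) for x in inputs)

assert threaded == processed  # same results either way; only throughput differs
```

The outputs are identical by construction; the proposal is purely about which backend gets the CPU time.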
Describe your proposed solution
I think we should just remove `prefer` for some classifiers such as the interval forests, but with some benchmarking first. I will do some benchmarking to try to quantify any difference.
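The benchmark could be as simple as timing the same work under both backends. A rough sketch, using a synthetic GIL-bound function in place of an actual aeon estimator (the workload and sizes here are assumptions, not measurements):

```python
import time

from joblib import Parallel, delayed


def feature_extract(x):
    # Hypothetical stand-in for mixed Python/NumPy feature extraction.
    return sum((x * i) % 13 for i in range(50_000))


inputs = list(range(16))
timings = {}
for prefer in ("threads", None):  # None -> joblib's default loky backend
    start = time.perf_counter()
    results = Parallel(n_jobs=4, prefer=prefer)(
        delayed(feature_extract)(x) for x in inputs
    )
    timings[prefer or "loky"] = time.perf_counter() - start

print(timings)  # on a GIL-bound workload, "threads" is typically the slower entry
```

For the real benchmark this would be a `fit` call on a fixed dataset rather than a synthetic loop, repeated a few times to average out loky's process start-up cost.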
Describe alternatives you've considered, if relevant
Leave as is or explicitly default to loky
Additional context
This affects the following regressors and classifiers: