
[ENH] Switch backend to loky if it improves performance #3360

@TonyBagnall


Describe the feature or idea you want to propose

Related to #2724 and #1794; see also #3361.

I have been running experiments on a 32/32-core machine. By default, several of our estimators ask joblib for its threading backend. Joblib's own docs are explicit that the threading backend is mainly effective when the hot part of the workload is in compiled code that releases the GIL; if the workload manipulates Python objects a lot, scaling is limited.
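One rough way to observe the GIL effect is to compare process CPU time with wall-clock time while a pure-Python task runs on joblib's threading backend. Worker threads live in the parent process, so time.process_time() includes their CPU time; if they truly ran in parallel, the ratio would approach n_jobs, whereas on a standard (GIL) CPython build it tends to stay near 1. This is a minimal sketch assuming joblib is installed; busy_python_loop and threading_cpu_ratio are hypothetical helpers, not aeon code, and the measured ratio will vary by machine.

```python
import time

from joblib import Parallel, delayed


def busy_python_loop(n):
    # Pure-Python arithmetic: the interpreter holds the GIL for the whole loop.
    total = 0
    for i in range(n):
        total += i * i % 7
    return total


def threading_cpu_ratio(n_jobs=4, n_tasks=8, n=200_000):
    """Return (process CPU seconds / wall seconds, results) for the threading backend.

    Threads share this process, so process_time() counts CPU used by all of them.
    A ratio near 1 means roughly one core was busy even though n_jobs > 1.
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    results = Parallel(n_jobs=n_jobs, backend="threading")(
        delayed(busy_python_loop)(n) for _ in range(n_tasks)
    )
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return cpu / wall, results


if __name__ == "__main__":
    ratio, _ = threading_cpu_ratio()
    print(f"CPU/wall ratio with threads: {ratio:.2f}")
```

This only works for the threading backend: with loky, the work happens in child processes whose CPU time the parent's process_time() does not see.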

For example, DrCIF (the classifier I'm running) does not have a pipeline that is one big compiled kernel. It builds lots of intervals, loops over representations and features, and dispatches many relatively small operations. The Catch22 path in aeon, for instance, includes Python-level loops and then calls into NumPy/Numba feature functions case by case. That kind of mixed workload often produces many live threads with modest actual CPU throughput.
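To make the "many small operations" point concrete, here is a hypothetical and heavily simplified interval-feature extractor in the spirit of DrCIF. It is not aeon's implementation: a Python loop draws random intervals and, for each case, calls a few small NumPy reductions on a short slice. Each NumPy call finishes quickly, so for short series much of the time is spent in interpreter code that holds the GIL, which is the kind of work that scales poorly on threads.

```python
import numpy as np


def toy_interval_features(X, n_intervals=50, min_length=3, seed=0):
    """Hypothetical DrCIF-style transform: mean, std and slope per random interval.

    X has shape (n_cases, n_timepoints); returns shape (n_cases, 3 * n_intervals).
    """
    rng = np.random.default_rng(seed)
    n_cases, n_timepoints = X.shape
    out = np.empty((n_cases, 3 * n_intervals))
    for j in range(n_intervals):
        # Draw one interval [start, end) containing at least min_length points.
        start = rng.integers(0, n_timepoints - min_length + 1)
        end = rng.integers(start + min_length, n_timepoints + 1)
        t = np.arange(end - start)
        for i in range(n_cases):  # Python-level loop: one small slice per case
            segment = X[i, start:end]
            out[i, 3 * j] = segment.mean()
            out[i, 3 * j + 1] = segment.std()
            out[i, 3 * j + 2] = np.polyfit(t, segment, 1)[0]  # least-squares slope
    return out
```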

With the threading backend, this produces limited CPU usage:

[Image: CPU usage with the threading backend]

If I switch to the loky backend, I get this:

[Image: CPU usage with the loky backend]

The loky version runs in 30 minutes. The threading version takes

The standard behaviour when prefer is not set is to use loky. We explicitly choose threads in some classifiers (those that are not Numba-threaded with prange):

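```python
# prefer is a soft hint: it is ignored when self.parallel_backend is set
fit = Parallel(
    n_jobs=self._n_jobs,
    backend=self.parallel_backend,
    prefer="threads",
)(
    ...  # delayed(...) tasks, not shown in the original snippet
)
```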
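The interaction between backend and prefer can be checked directly. In joblib, prefer only chooses between the default thread-based (threading) and process-based (loky) backends when no backend is given; an explicit backend argument takes precedence. A minimal sketch, assuming joblib is installed; worker_pids is a hypothetical helper that records which process ran each task. If every PID equals the parent's, the tasks ran on threads; if none do, they ran in loky worker processes.

```python
import os

from joblib import Parallel, delayed


def worker_pids(backend=None, prefer=None, n_jobs=2, n_tasks=4):
    # Each task reports the PID of the process that executed it.
    return Parallel(n_jobs=n_jobs, backend=backend, prefer=prefer)(
        delayed(os.getpid)() for _ in range(n_tasks)
    )


if __name__ == "__main__":
    parent = os.getpid()
    # No backend and no prefer: joblib's default process-based backend (loky).
    print("default uses parent process:", parent in worker_pids())
    # No backend, prefer="threads": threading backend, so every PID is the parent's.
    print("prefer threads stays in parent:", set(worker_pids(prefer="threads")) == {parent})
    # Explicit backend wins: prefer="threads" is ignored and loky workers run the tasks.
    print("loky + prefer threads uses parent:", parent in worker_pids(backend="loky", prefer="threads"))
```

In the estimator snippet, this means prefer="threads" only takes effect when parallel_backend is None, so removing prefer (or passing parallel_backend="loky") is enough to switch those estimators to processes.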

Describe your proposed solution

I think we should remove prefer for some classifiers, such as the interval forests, but only after benchmarking. I will run some benchmarks to try to quantify any difference.
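A benchmark along these lines could time the same job with prefer="threads" and with prefer=None (joblib's loky default) over a few repeats and compare median wall times. This is a sketch under stated assumptions: joblib is installed, and task and median_wall_time are hypothetical placeholders; a real comparison would time the actual estimators (for example fitting DrCIF) on real datasets. The first loky run includes worker start-up, which is one reason to take the median of several repeats.

```python
import statistics
import time

from joblib import Parallel, delayed


def task(n):
    # Placeholder pure-Python workload; substitute a real estimator's fit or transform.
    return sum(i * i % 7 for i in range(n))


def median_wall_time(prefer, n_jobs=4, n_tasks=16, n=100_000, repeats=3):
    """Median wall-clock seconds for n_tasks calls of task(n) under a given prefer hint."""
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        Parallel(n_jobs=n_jobs, prefer=prefer)(delayed(task)(n) for _ in range(n_tasks))
        times.append(time.perf_counter() - start)
    return statistics.median(times)


if __name__ == "__main__":
    t_threads = median_wall_time(prefer="threads")
    t_loky = median_wall_time(prefer=None)  # None falls back to joblib's default, loky
    print(f"threads: {t_threads:.2f}s  loky: {t_loky:.2f}s  ratio: {t_threads / t_loky:.2f}")
```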

Describe alternatives you've considered, if relevant

Leave as is, or explicitly default to loky.

Additional context

This affects the following regressors and classifiers:

  • Rotation Forest
  • Arsenal
  • Muse
  • Boss
  • Weasel
  • Proximity Forest
  • Teaser
  • TDMVDC
  • Ordinal TDE
  • SFA
  • Catch22
  • RandomIntervals
  • SupervisedIntervals
  • ShapeletTransform: I don't think we get any benefit here

Labels: classification, enhancement, regression