Skip to content

Speed up process startup time by avoiding importing packages on version check#9048

Merged
fjetter merged 3 commits into
dask:mainfrom
fjetter:avoid_importing_packages
Apr 17, 2025
Merged

Speed up process startup time by avoiding importing packages on version check#9048
fjetter merged 3 commits into
dask:mainfrom
fjetter:avoid_importing_packages

Conversation

@fjetter
Copy link
Copy Markdown
Member

@fjetter fjetter commented Apr 16, 2025

importlib.metadata can check the version of a package without importing it. That makes our nanny and worker processes come up much faster, especially considering that the Nanny doesn't even have any usage for this (i.e. this also saves a bit on memory but that should be marginal in most cases).

Launching a LocalCluster with 10 workers on my machine is 2-3x faster with this change.

# Main
▶ time python -c "import distributed; distributed.LocalCluster(n_workers=10)"
10.98s user 3.07s system 147% cpu 9.516 total
▶ time python -c "import distributed; distributed.LocalCluster(n_workers=10)"
10.31s user 2.67s system 158% cpu 8.166 total

# This PR
▶ time python -c "import distributed; distributed.LocalCluster(n_workers=10)"
6.73s user 1.89s system 216% cpu 3.976 total
▶ time python -c "import distributed; distributed.LocalCluster(n_workers=10)"
6.52s user 1.76s system 252% cpu 3.275 total
▶ time python -c "import distributed; distributed.LocalCluster(n_workers=10)"
6.61s user 1.79s system 268% cpu 3.125 total

I hope this helps speed up our CI infrastructure and helps us avoid a couple of flaky tests which are related to Nanny processes startup failures

@fjetter fjetter marked this pull request as ready for review April 16, 2025 13:37
@mrocklin
Copy link
Copy Markdown
Member

mrocklin commented Apr 16, 2025 via email

Copy link
Copy Markdown
Member

@jrbourbeau jrbourbeau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice find -- thanks @fjetter. This LGTM if tests are happy enough

Comment thread distributed/tests/test_versions.py Outdated
Comment on lines 159 to 161
# Use custom function
("distributed", lambda mod: "123"),
"distributed",
# Use version_of_package
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can get rid of these comments since we're not using a custom function for getting the version

@fjetter
Copy link
Copy Markdown
Member Author

fjetter commented Apr 16, 2025

curiously, CI performance is much worse 🤔

@fjetter fjetter force-pushed the avoid_importing_packages branch from c4b6104 to 345c0eb Compare April 16, 2025 15:31
@fjetter
Copy link
Copy Markdown
Member Author

fjetter commented Apr 16, 2025

Well, importlib.metadata.version is much faster than an initial import but it is much slower compared to an import that already happened. Therefore, all of our test that do not use dedicated processes got slower... that should be fixed with a simple cache

Test runtime on my machine for test_scheduler.py

main: ~60s
PR w/out cache: ~120s
PR w/ cache: ~45s

Let's see if this also translate to CI

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 16, 2025

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

    27 files  ±0      27 suites  ±0   11h 10m 36s ⏱️ + 1m 9s
 4 100 tests ±0   3 983 ✅ + 2    112 💤 ± 0  4 ❌  - 1  1 🔥  - 1 
51 401 runs  ±0  49 095 ✅ +11  2 297 💤  - 11  8 ❌ +1  1 🔥  - 1 

For more details on these failures and errors, see this check.

Results for commit 5e62cb5. ± Comparison against base commit 48fcf48.

♻️ This comment has been updated with latest results.

@fjetter fjetter force-pushed the avoid_importing_packages branch from 345c0eb to 5e62cb5 Compare April 17, 2025 09:10
@fjetter
Copy link
Copy Markdown
Member Author

fjetter commented Apr 17, 2025

Ok, there doesn't seem to be a noticeable difference for CI runtime. At least nothing I could pick out easily considering that the runtimes vary between 20 and 45min per job.

@fjetter fjetter force-pushed the avoid_importing_packages branch from 5e62cb5 to 6200a25 Compare April 17, 2025 12:28
@fjetter fjetter merged commit 0503225 into dask:main Apr 17, 2025
4 of 5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants