Skip to content

Spectral clustering: support pre-calculated affinity matrix, revamp API#1835

Open
ClaudiaComito wants to merge 45 commits into
mainfrom
features/1740-Support_pre-calculated_affinity_matrix_in_ht_cluster_Spectral
Open

Spectral clustering: support pre-calculated affinity matrix, revamp API#1835
ClaudiaComito wants to merge 45 commits into
mainfrom
features/1740-Support_pre-calculated_affinity_matrix_in_ht_cluster_Spectral

Conversation

@ClaudiaComito

@ClaudiaComito ClaudiaComito commented Mar 21, 2025

Copy link
Copy Markdown
Member

Due Diligence

  • General:
  • Implementation:
    • unit tests: all split configurations tested
    • unit tests: multiple dtypes tested
    • benchmarks: created for new functionality
    • benchmarks: performance improved or maintained
    • documentation updated where needed

Description

This PR introduces some changes to the Spectral clustering class (see below), this work in connection with parallelization efforts of the SCIMES package with @dcolombo

Issue/s resolved: #1740

Changes proposed:

Type of change

Memory requirements

NA

Performance

NA

Does this change modify the behaviour of other functions? If so, which?

no

@github-actions

Copy link
Copy Markdown
Contributor

Thank you for the PR!

@ClaudiaComito ClaudiaComito added this to the 1.6 milestone Mar 31, 2025
@github-project-automation github-project-automation Bot moved this to Todo in Roadmap Mar 31, 2025
@ClaudiaComito ClaudiaComito requested a review from mrfh92 March 31, 2025 09:46
@github-actions

github-actions Bot commented Apr 4, 2025

Copy link
Copy Markdown
Contributor

Thank you for the PR!

@codecov

codecov Bot commented Apr 4, 2025

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 86.20690% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 91.61%. Comparing base (206a523) to head (fb921ea).

Files with missing lines Patch % Lines
heat/cluster/spectral.py 85.71% 8 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1835      +/-   ##
==========================================
+ Coverage   88.76%   91.61%   +2.84%     
==========================================
  Files          89       89              
  Lines       14012    14037      +25     
==========================================
+ Hits        12438    12860     +422     
+ Misses       1574     1177     -397     
Flag Coverage Δ
unit 91.61% <86.20%> (+2.84%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@mrfh92

mrfh92 commented Apr 7, 2025

Copy link
Copy Markdown
Collaborator

see #1824 for the eigh function prototype

Comment thread heat/cluster/spectral.py Outdated
raise NotImplementedError("Not implemented for other splitting-axes")

_, eigenvectors = self._spectral_embedding(x)
_, eigenvectors = self.spectral_embedding(x, self.eigen_solver)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, the eigenvalue problem is solved another time for the input data of predict aka the "test-data". However, I think the same spectral embedding of the "training data" (input of fit) should be re-used here.

Comment thread heat/cluster/spectral.py Outdated
raise NotImplementedError("Not implemented for other splitting-axes")
# 2. Embed Dataset into lower-dimensional Eigenvector space
eigenvalues, eigenvectors = self._spectral_embedding(x)
eigenvalues, eigenvectors = self.spectral_embedding(x, self.eigen_solver)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is the by far most expensive step of the algorithm, it could make sense to save the result of this as an _-attribute of the clusterer-object for later re-usage, e.g., in predict.

@mrfh92 mrfh92 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The API-changes look fine to me 👍

I have only the two comments directly attached to the code regarding re-usability of the (expensive) results of the eigenvalue decomposition. I would suggest to save them somewhere in the object in order to avoid recomputation.

@github-project-automation github-project-automation Bot moved this from Todo to In Progress in Roadmap Apr 7, 2025
@mrfh92

mrfh92 commented Apr 7, 2025

Copy link
Copy Markdown
Collaborator

addition to my review: just saw that scikit-learn's SpectralClustering does not have a predict on its own but only a fit_predict.

…D_based_on_Zolotarev-polar_decomposition' into features/1740-Support_pre-calculated_affinity_matrix_in_ht_cluster_Spectral
@github-actions

Copy link
Copy Markdown
Contributor

Thank you for the PR!

@ClaudiaComito

Copy link
Copy Markdown
Member Author

Update after #1964 is merged

@ClaudiaComito ClaudiaComito modified the milestones: 1.7.0, 1.8.0 Dec 2, 2025
@ClaudiaComito ClaudiaComito modified the milestones: 1.8.0, 1.9.0 Mar 3, 2026

@ClaudiaComito ClaudiaComito left a comment

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

various updates after merging #1457

Comment thread heat/cluster/spectral.py Outdated
Comment thread heat/cluster/spectral.py Outdated
Comment thread heat/cluster/spectral.py Outdated
Comment thread heat/cluster/spectral.py Outdated
Comment thread heat/cluster/spectral.py Outdated
Comment thread heat/cluster/spectral.py Outdated
Comment thread heat/cluster/spectral.py Outdated
Comment thread heat/cluster/spectral.py Outdated
Comment thread heat/cluster/spectral.py Outdated
Comment thread heat/cluster/spectral.py Outdated
@github-actions

Copy link
Copy Markdown
Contributor

This pull request is stale because it has been open for 60 days with no activity.

@github-actions github-actions Bot added the stale label May 18, 2026
Comment thread .talismanrc
Comment on lines +4 to +10

fileignoreconfig:
- filename: .github/workflows/ci.yaml
checksum: 3bf095a6ff388eb7aacb80e3b33ed600b377bbce6a97f01ac1579dc702024a13
- filename: .github/workflows/ci_full.yaml
checksum: 54389cc7b9ddd0010ed99d1194da724dbcc51922358f739a5aa30e65e8e2c0e0
version: "1.0"

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
fileignoreconfig:
- filename: .github/workflows/ci.yaml
checksum: 3bf095a6ff388eb7aacb80e3b33ed600b377bbce6a97f01ac1579dc702024a13
- filename: .github/workflows/ci_full.yaml
checksum: 54389cc7b9ddd0010ed99d1194da724dbcc51922358f739a5aa30e65e8e2c0e0
version: "1.0"

@github-actions github-actions Bot removed the stale label May 25, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Status: In Progress

Development

Successfully merging this pull request may close these issues.

Support pre-calculated affinity matrix in ht.cluster.Spectral

3 participants