Spectral clustering: support pre-calculated affinity matrix, revamp API#1835
Spectral clustering: support pre-calculated affinity matrix, revamp API#1835ClaudiaComito wants to merge 45 commits into
Conversation
…y_matrix_in_ht_cluster_Spectral
…y_matrix_in_ht_cluster_Spectral
|
Thank you for the PR! |
…y_matrix_in_ht_cluster_Spectral
for more information, see https://pre-commit.ci
|
Thank you for the PR! |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1835 +/- ##
==========================================
+ Coverage 88.76% 91.61% +2.84%
==========================================
Files 89 89
Lines 14012 14037 +25
==========================================
+ Hits 12438 12860 +422
+ Misses 1574 1177 -397
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
see #1824 for the eigh function prototype |
| raise NotImplementedError("Not implemented for other splitting-axes") | ||
|
|
||
| _, eigenvectors = self._spectral_embedding(x) | ||
| _, eigenvectors = self.spectral_embedding(x, self.eigen_solver) |
There was a problem hiding this comment.
Here, the eigenvalue problem is solved another time for the input data of predict aka the "test-data". However, I think the same spectral embedding of the "training data" (input of fit) should be re-used here.
| raise NotImplementedError("Not implemented for other splitting-axes") | ||
| # 2. Embed Dataset into lower-dimensional Eigenvector space | ||
| eigenvalues, eigenvectors = self._spectral_embedding(x) | ||
| eigenvalues, eigenvectors = self.spectral_embedding(x, self.eigen_solver) |
There was a problem hiding this comment.
Since this is the by far most expensive step of the algorithm, it could make sense to save the result of this as an _-attribute of the clusterer-object for later re-usage, e.g., in predict.
mrfh92
left a comment
There was a problem hiding this comment.
The API-changes look fine to me 👍
I have only the two comments directly attached to the code regarding re-usability of the (expensive) results of the eigenvalue decomposition. I would suggest to save them somewhere in the object in order to avoid recomputation.
|
addition to my review: just saw that scikit-learn's SpectralClustering does not have a predict on its own but only a fit_predict. |
…y_matrix_in_ht_cluster_Spectral
…y_matrix_in_ht_cluster_Spectral
…D_based_on_Zolotarev-polar_decomposition' into features/1740-Support_pre-calculated_affinity_matrix_in_ht_cluster_Spectral
|
Thank you for the PR! |
|
Update after #1964 is merged |
…y_matrix_in_ht_cluster_Spectral
…y_matrix_in_ht_cluster_Spectral
ClaudiaComito
left a comment
There was a problem hiding this comment.
various updates after merging #1457
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
Change return type from Tuple to DNDarray for the function.
for more information, see https://pre-commit.ci
Removed outdated notes about eigenvalues and added comments regarding the Laplacian matrix properties.
|
This pull request is stale because it has been open for 60 days with no activity. |
|
|
||
| fileignoreconfig: | ||
| - filename: .github/workflows/ci.yaml | ||
| checksum: 3bf095a6ff388eb7aacb80e3b33ed600b377bbce6a97f01ac1579dc702024a13 | ||
| - filename: .github/workflows/ci_full.yaml | ||
| checksum: 54389cc7b9ddd0010ed99d1194da724dbcc51922358f739a5aa30e65e8e2c0e0 | ||
| version: "1.0" |
There was a problem hiding this comment.
| fileignoreconfig: | |
| - filename: .github/workflows/ci.yaml | |
| checksum: 3bf095a6ff388eb7aacb80e3b33ed600b377bbce6a97f01ac1579dc702024a13 | |
| - filename: .github/workflows/ci_full.yaml | |
| checksum: 54389cc7b9ddd0010ed99d1194da724dbcc51922358f739a5aa30e65e8e2c0e0 | |
| version: "1.0" |
Due Diligence
Description
This PR introduces some changes to the Spectral clustering class (see below), this work in connection with parallelization efforts of the SCIMES package with @dcolombo
Issue/s resolved: #1740
Changes proposed:
SpectralClustering(formerlySpectral) to match API of the current scikit-learn versionmetricparameter toaffinityto match sklearneigen_solver, with optionslanczosandzolotarev(Default:zolotarev) (e.g. Implement polar decomposition #1697 and Features/1723 Symmetric Eigenvalue Decomposition (eigh) and full SVD (svd) based on Zolotarev Polar Decomposition #1824 )n_components,random_stateto match scikit-learn API__spectral_embeddingis now just the embedding, like for scikit-learn (instead of eigenvalues and eigenvectors);DNDarray.diagonal()for convenience and to match numpy APIType of change
Memory requirements
NA
Performance
NA
Does this change modify the behaviour of other functions? If so, which?
no