FuzzyClusteringSimilarity

A collection of methods for computing similarity scores (also called similarity indexes) between two (possibly fuzzy) clusterings. Indexes typically lie in the range [0, 1], but many pairs of random clusterings produce values very close to 1, which makes raw values difficult to interpret. An adjusted similarity index corrects for this using the following equation:

$$\text{Adjusted Index} = \frac{\text{Index} - \mathbb{E}[\text{Index}]}{\max(\text{Index}) - \mathbb{E}[\text{Index}]}$$

Adjusted indexes lie in the range $(-\infty, 1]$, and an adjusted index of $0$ means that the clusterings are no more similar than expected if they had been selected at random. A popular adjusted similarity index is the Adjusted Rand Index (ARI).
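
As a hypothetical worked example (the numbers are illustrative and are not taken from the package or the cited papers), suppose two clusterings score $\text{Index} = 0.90$, the random model gives $\mathbb{E}[\text{Index}] = 0.80$, and $\max(\text{Index}) = 1$. Then:

$$\text{Adjusted Index} = \frac{0.90 - 0.80}{1 - 0.80} = 0.5$$

A raw score of 0.90 looks high, but after adjustment the clusterings sit only halfway between chance-level agreement and perfect agreement.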

Computing an adjusted similarity index requires choosing both an index and a random model. This package provides implementations of several indexes and random models that extend the Rand Index to fuzzy clusterings. The choice of a particular index and random model is problem dependent. See [1] and [2] for a discussion of this selection.

The following indexes and random models are implemented. If you use any of these indexes, please consider citing the paper where it was introduced.

Indexes

Normalized Degree of Concordance (NDC) [3]
Frobenius [4]
Jousselme [5]
Belief [5]
Consistency [5]

Random Models

Permutation
Fit [1]
Sym [1]
Flat [1]

Adjusting the NDC with the permutation model gives the Adjusted Concordance Index [6]. Note that the Frobenius index may only be adjusted with the permutation model (see [4] for details).

Getting Started

This package is available from the Julia General registry.

using Pkg
Pkg.add("FuzzyClusteringSimilarity")

Then import the module.

using FuzzyClusteringSimilarity

You can run the unit tests to ensure the package was installed properly.

Pkg.test("FuzzyClusteringSimilarity")

Documentation

The package exports three main functions. It also exports a type hierarchy of similarity indexes and random models, which is used to dispatch each function to the desired implementation. For an example of using the package, see the code that produced the figures in [1].

function adjustedsimilarity(
    z1::AbstractMatrix{<:Real},
    z2::AbstractMatrix{<:Real},
    index::AbstractIndex,
    model::AbstractRandAdjustment;
    onesided::Bool=true
)

This is the main function for comparing two fuzzy clusterings, z1 and z2. Each clustering is a c x n matrix that assigns n objects to c clusters, with clusters as rows and objects as columns. If the clustering is hard, ensure the matrix entries have an integer type (<:Integer) so that the proper random model is called.
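
A minimal sketch of the input layout, using only base Julia. The helper name `hardmembership` and the example data are hypothetical and are not part of the package; the sketch only illustrates the c x n convention and the difference between integer-typed (hard) and real-valued (fuzzy) matrices described above.

```julia
# Hypothetical helper (not part of FuzzyClusteringSimilarity): build a
# c x n hard membership matrix from a vector of cluster labels in 1:c.
function hardmembership(labels::AbstractVector{<:Integer}, c::Integer)
    z = zeros(Int, c, length(labels))   # integer entries mark a hard clustering
    for (j, k) in enumerate(labels)
        z[k, j] = 1                     # object j belongs to cluster k
    end
    return z
end

# Five objects assigned to three clusters.
zhard = hardmembership([1, 1, 2, 3, 3], 3)

# A fuzzy clustering of the same five objects: each column holds one
# object's membership degrees, which sum to 1 in this example.
zfuzzy = [0.9 0.8 0.1 0.0 0.2;
          0.1 0.1 0.7 0.1 0.1;
          0.0 0.1 0.2 0.9 0.7]

size(zhard)                 # (3, 5): c = 3 clusters, n = 5 objects
eltype(zhard) <: Integer    # true
eltype(zfuzzy) <: Integer   # false, the entries are floating point
```

Matrices built this way have the shape that the z1 and z2 arguments above expect; the concrete index and random model types to pass alongside them are not listed in this README, so they are left out of the sketch.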

function similarity(
    z1::AbstractMatrix{<:Real},
    z2::AbstractMatrix{<:Real},
    index::AbstractIndex
)

The similarity function can be used to compute an unadjusted index.

function expectedsimilarity(
    z1::AbstractMatrix,
    z2::AbstractMatrix,
    index::AbstractIndex,
    model::AbstractRandAdjustment;
    onesided::Bool=true
)

The expected similarity function computes the expected similarity index between random clusterings (using the provided random model).

function massageMatrix(matrix::AbstractMatrix)

Massage a matrix to enable Julia's multiple dispatch. The matrix is formatted with objects as columns and clusters as rows. If the matrix is a hard clustering, its element type is converted to Bool.
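
The idea behind the type conversion can be sketched as follows. This is an illustration under assumptions, not the package's implementation of massageMatrix: the function name `tohard` is hypothetical, and it only covers the Bool conversion, not any reshaping the real function may perform.

```julia
# Hypothetical sketch (not the package's massageMatrix): if every entry is
# 0 or 1, return a Bool matrix so that methods written for hard clusterings
# are selected by dispatch; otherwise return the matrix unchanged.
function tohard(z::AbstractMatrix{<:Real})
    return all(x -> x == 0 || x == 1, z) ? Bool.(z) : z
end

hard = tohard([1.0 0.0 0.0;
               0.0 1.0 1.0])    # 0/1 entries stored as Float64
fuzzy = tohard([0.6 0.5 0.1;
                0.4 0.5 0.9])   # genuinely fuzzy memberships

eltype(hard)    # Bool
eltype(fuzzy)   # Float64
```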

References

[1] DeWolfe, R., Andrews, J.L. Random models for adjusting fuzzy rand index extensions. Adv Data Anal Classif (2025). https://doi.org/10.1007/s11634-025-00625-w

[2] Gates AJ, Ahn Y-Y (2017) The impact of random models on clustering similarity. J Mach Learn Res 18(87):1–28. http://jmlr.org/papers/v18/17-039.html

[3] E. Hullermeier, M. Rifqi, S. Henzgen and R. Senge, "Comparing Fuzzy Partitions: A Generalization of the Rand Index and Related Measures," in IEEE Transactions on Fuzzy Systems, vol. 20, no. 3, pp. 546-556, (2012). https://doi.org/10.1109/TFUZZ.2011.2179303

[4] Andrews, J.L., Browne, R. & Hvingelby, C.D. On Assessments of Agreement Between Fuzzy Partitions. J Classif 39, 326–342 (2022). https://doi.org/10.1007/s00357-021-09407-3

[5] T. Denoux, S. Li, and S. Sriboonchitta, “Evaluating and Comparing Soft Partitions: An Approach Based on Dempster–Shafer Theory,” IEEE Trans. Fuzzy Syst., vol. 26, no. 3, pp. 1231–1244, (2018), https://doi.org/10.1109/TFUZZ.2017.2718484.

[6] D’Ambrosio, A., Amodio, S., Iorio, C. et al. Adjusted Concordance Index: an Extension of the Adjusted Rand Index to Fuzzy Partitions. J Classif 38, 112–128 (2021). https://doi.org/10.1007/s00357-020-09367-0
