mariolpantunes/pyNNMF

pyNNMF - Non-Negative Matrix Factorization

pyNNMF is a Python library for computing Non-Negative Matrix Factorization (NMF) with built-in support for missing value imputation. Unlike standard NMF libraries (e.g., scikit-learn), pyNNMF is resilient to missing data (NaN values and unobserved entries) and handles them natively using optimized NumPy routines.
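To make the idea of "handling missing values natively" concrete, here is a minimal NumPy sketch of masked multiplicative updates for the Frobenius objective. It is not pyNNMF's implementation or API: the function name `masked_nmf_mu` and all of its parameters are hypothetical, and it assumes that missing entries are marked with NaN. The key point is that unobserved entries are excluded from both the numerator and the denominator of each update, so they do not pull the factors toward zero.

```python
import numpy as np


def masked_nmf_mu(X, rank, n_iter=500, eps=1e-9, seed=0):
    """Hypothetical sketch: factorize X ~= W @ H using only non-NaN entries."""
    rng = np.random.default_rng(seed)
    mask = ~np.isnan(X)              # True where an entry is observed
    M = mask.astype(float)
    Xz = np.where(mask, X, 0.0)      # zero-fill so NaN never enters the arithmetic
    n_rows, n_cols = X.shape
    W = rng.random((n_rows, rank)) + eps
    H = rng.random((rank, n_cols)) + eps
    for _ in range(n_iter):
        # Multiplicative updates for || M * (X - W @ H) ||_F^2;
        # the mask removes unobserved entries from the reconstruction term.
        H *= (W.T @ Xz) / (W.T @ (M * (W @ H)) + eps)
        W *= (Xz @ H.T) / ((M * (W @ H)) @ H.T + eps)
    return W, H


# Small demonstration on a synthetic rank-2 matrix with two hidden entries.
rng = np.random.default_rng(1)
X_true = rng.random((6, 2)) @ rng.random((2, 5))
X_obs = X_true.copy()
X_obs[0, 1] = np.nan
X_obs[3, 4] = np.nan
W, H = masked_nmf_mu(X_obs, rank=2, n_iter=200)
X_hat = W @ H                        # dense estimate, including the hidden cells
print("imputed values:", X_hat[0, 1], X_hat[3, 4])
```

The product `W @ H` is dense, so the entries that were NaN in the input receive an estimate as a by-product of the factorization.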


Features

  • Native Python & NumPy: Highly optimized vectorized linear algebra operations without compiled or external C/C++ dependencies.
  • Missing Value Resiliency: Handles missing values (NaN) and observed/unobserved zeroes without failing or distorting optimization gradients.
  • Multiple Solvers: Supports Multiplicative Updates (MU), Alternating Least Squares (ALS), and Hierarchical Alternating Least Squares (HALS).
  • Multiple Cost Functions: Minimizes Frobenius Norm (Euclidean), Kullback-Leibler (KL) Divergence, and Itakura-Saito (IS) Divergence.

Installation

To install pyNNMF locally or prepare it for development:

git clone https://github.com/mariolpantunes/pyNNMF.git
cd pyNNMF
pip install -e .

Solver & Cost Function Selection Guide

To get the best speed and accuracy (matrix completion / imputation) out of pyNNMF, select the solver and cost function pair according to the data's noise distribution and missingness ratio:

1. By Noise Distribution

| Noise Type | Recommended Cost Function | Recommended Solver | Rationale |
|---|---|---|---|
| Additive / Gaussian | Frobenius Norm (cost_fb) | nmf_als (or nmf_mu) | The squared Frobenius norm corresponds, up to constants, to the negative log-likelihood under Gaussian noise. ALS converges fast. |
| Count / Poisson / Sparse | KL Divergence (cost_kl) | nmf_mu(cost='kl') (or nmf_mu_kl) | KL divergence corresponds to the Poisson likelihood and enforces sparsity naturally. |
| Scale-Invariant / Audio | IS Divergence (cost_is) | nmf_mu(cost='is') (or nmf_mu_is) | IS divergence measures relative rather than absolute errors, protecting small-magnitude values. |
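For reference, the three objectives can be written as short NumPy functions evaluated over observed entries only. This is an illustrative sketch, not the library's `cost_fb`, `cost_kl`, or `cost_is`: the function names, the `eps` smoothing term, and the choice to omit a 1/2 factor on the squared error are all assumptions made here.

```python
import numpy as np


def _observed(X, X_hat):
    """Return the paired observed values and their reconstructions."""
    mask = ~np.isnan(X)
    return X[mask], X_hat[mask]


def squared_frobenius(X, X_hat):
    x, y = _observed(X, X_hat)
    return float(np.sum((x - y) ** 2))


def kl_divergence(X, X_hat, eps=1e-12):
    # Generalized KL: sum(x * log(x / y) - x + y)
    x, y = _observed(X, X_hat)
    return float(np.sum(x * np.log((x + eps) / (y + eps)) - x + y))


def is_divergence(X, X_hat, eps=1e-12):
    # Itakura-Saito: sum(x / y - log(x / y) - 1)
    x, y = _observed(X, X_hat)
    r = (x + eps) / (y + eps)
    return float(np.sum(r - np.log(r) - 1.0))


X = np.array([[1.0, 2.0], [np.nan, 4.0]])
X_hat = np.array([[1.5, 2.0], [3.0, 3.0]])

# Rescaling data and reconstruction by the same factor leaves the IS
# divergence (almost) unchanged, while the squared error grows with the scale.
print(is_divergence(X, X_hat), is_divergence(10 * X, 10 * X_hat))
print(squared_frobenius(X, X_hat), squared_frobenius(10 * X, 10 * X_hat))
```

The two printed pairs illustrate the rationale column: the IS divergence depends only on the ratio between data and reconstruction, which is why it does not let large-magnitude entries dominate the fit.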

2. By Missingness Ratio

  • Low-to-Moderate Missingness (< 30%): The HALS solver (nmf_hals) is recommended. It updates the factors coordinate-wise and typically reaches a low training objective in few iterations. (Note: HALS only supports the Frobenius norm.)
  • High Missingness (≥ 30%) / Highly Noisy: The MU and ALS solvers (nmf_mu, nmf_als, rwnmf) are recommended. Their slower, diagonally scaled update trajectories act as an implicit regularizer, which helps prevent overfitting to the small number of observed entries.
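The rule of thumb above can be applied programmatically by measuring the fraction of missing entries first. The helper below is a hypothetical convenience, not part of pyNNMF: it assumes missing entries are encoded as NaN and simply returns the name of the solver family suggested by the 30% guideline.

```python
import numpy as np


def missingness_ratio(X):
    """Fraction of entries that are NaN."""
    return float(np.isnan(X).mean())


def suggest_solver(X, threshold=0.30):
    """Hypothetical helper: map the missingness ratio to a solver name."""
    ratio = missingness_ratio(X)
    # Below the threshold the guide favors HALS (Frobenius norm only);
    # at or above it, the slower MU/ALS updates are preferred.
    name = "nmf_hals" if ratio < threshold else "nmf_mu"
    return name, ratio


X = np.ones((10, 10))
X[0, :] = np.nan                      # hide one row: 10% missing
print(suggest_solver(X))
```

The threshold is a guideline rather than a hard boundary, so for ratios close to 30% or for very noisy data it is worth comparing both solver families on held-out entries.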

Usage Examples

Demonstration scripts are available in the examples directory:

1. Basic Imputation Example

Demonstrates how to initialize a low-rank matrix, mask entries as missing (NaN), and reconstruct/impute them:

PYTHONPATH=src python examples/imputation_example.py
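The script itself is not reproduced here. As a rough sketch of the data preparation step it describes, a synthetic low-rank non-negative matrix can be built and partially hidden as follows. The function name, default sizes, and the uniformly random masking scheme are assumptions made for illustration, not a description of what `imputation_example.py` does internally.

```python
import numpy as np


def make_masked_low_rank(n_rows=50, n_cols=40, rank=3, missing=0.2, seed=0):
    """Hypothetical helper: return (X_true, X_obs, hidden) for an imputation test."""
    rng = np.random.default_rng(seed)
    W = rng.random((n_rows, rank))        # non-negative factors
    H = rng.random((rank, n_cols))
    X_true = W @ H                        # exact rank-`rank` ground truth
    hidden = rng.random(X_true.shape) < missing   # entries to hide, chosen at random
    X_obs = X_true.copy()
    X_obs[hidden] = np.nan                # mark hidden entries as missing
    return X_true, X_obs, hidden


X_true, X_obs, hidden = make_masked_low_rank()
print("fraction hidden:", hidden.mean())
```

Keeping `X_true` and the `hidden` mask around makes it possible to score the reconstruction on exactly the entries the solver never saw.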

2. Solver Validation CLI

Benchmark execution times and validate prediction accuracy (out-of-sample RMSE/MAE) across different noise distributions and missingness ratios using the exectimeit library:

PYTHONPATH=src python examples/validate_solvers.py --size 100 --noise gaussian --ratio 0.15
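"Out-of-sample" here means that the error is measured only on entries that were hidden from the solver. A minimal sketch of that computation is shown below; the function name and argument layout are hypothetical and are not taken from `validate_solvers.py`.

```python
import numpy as np


def held_out_errors(X_true, X_pred, hidden):
    """Return (RMSE, MAE) computed only over the entries flagged in `hidden`."""
    diff = X_pred[hidden] - X_true[hidden]
    rmse = float(np.sqrt(np.mean(diff ** 2)))
    mae = float(np.mean(np.abs(diff)))
    return rmse, mae


X_true = np.array([[1.0, 2.0], [3.0, 4.0]])
X_pred = np.array([[1.0, 4.0], [3.0, 1.0]])
hidden = np.array([[False, True], [False, True]])   # second column was held out
rmse, mae = held_out_errors(X_true, X_pred, hidden)
print(rmse, mae)
```

Scoring on observed entries instead would reward solvers that overfit, which is the failure mode the selection guide warns about at high missingness.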

3. Initialization Comparison

Evaluate the impact of different initialization strategies (random, nndsvd, svd_impute) on the convergence speed and final reconstruction error:

PYTHONPATH=src python examples/init_comparison.py
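How pyNNMF implements each strategy is not described in this README. Purely as an illustration of what an SVD-based initialization on incomplete data can look like, the sketch below fills NaN entries with column means, takes a truncated SVD, and uses absolute values to obtain non-negative starting factors. The function name and every step are assumptions, and this is a simpler scheme than NNDSVD.

```python
import numpy as np


def svd_mean_impute_init(X, rank, eps=1e-6):
    """Hypothetical sketch: non-negative (W, H) start from a mean-imputed SVD."""
    # Assumes every column has at least one observed value.
    col_means = np.nanmean(X, axis=0)
    filled = np.where(np.isnan(X), col_means, X)    # column-mean imputation
    U, s, Vt = np.linalg.svd(filled, full_matrices=False)
    root = np.sqrt(s[:rank])                        # split singular values evenly
    W = np.abs(U[:, :rank]) * root + eps            # abs() enforces non-negativity
    H = np.abs(Vt[:rank]) * root[:, None] + eps
    return W, H


rng = np.random.default_rng(0)
X = rng.random((8, 3)) @ rng.random((3, 6))
X[1, 2] = np.nan
X[5, 0] = np.nan
W0, H0 = svd_mean_impute_init(X, rank=3)
print(W0.shape, H0.shape)
```

Compared with a purely random start, a data-driven start like this usually begins closer to the scale and structure of the data, which is the kind of difference the comparison script is meant to quantify.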

Running Tests & Checks

The test suite can be run using the standard Python unittest module:

PYTHONPATH=src python -m unittest discover -s test

To run linting and static type checking:

ruff check src/ test/ examples/
npx pyright src/ test/ examples/

Documentation

Detailed package documentation is hosted on GitHub Pages.


Authors


License

This project is licensed under the MIT License - see the LICENSE file for details.
