Solver Testing and Addressed Bugs Along the Way#134
Conversation
a non-permuted LU factorization isn't actually numerically correct. Modified the implementation (and interfaces of the solvers) to use LU and the appropriate triangular solve calls. Other solvers still relies on Cholesky when available.
… tests for Serinv. - Figured out fixture structure - Added fixtures factory for testing utils functions - Added structured solver testing for solver.factorize workflow
- This raised an unexpected behavior in the `_strutured_to_spmatrix` mapping.
- Separated sequential / distributed tests related fixtures
- Solved several issues in the implementation of the distributed solver - Fixed the structured_to_sparse mapping function that simply wasn't working
- Adapted `runner.sh` - Updated `README.md` with new test descriptions
There was a problem hiding this comment.
Pull Request Overview
This PR implements a comprehensive testing suite for DALIA's solver implementations and fixes several critical bugs discovered during test development. The testing infrastructure is organized into unit tests and component integration tests, covering sequential and distributed solver configurations across CPU and GPU backends.
Key changes:
- Added 300-500 tests for dense, sparse, and structured solvers (sequential and distributed)
- Fixed scipy sparse solver to use LU decomposition instead of fake Cholesky
- Corrected distributed structured solver issues with RHS handling, array gathering, and mapping
- Included automated test runner script for convenient backend and test category selection
Reviewed Changes
Copilot reviewed 49 out of 49 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/runner.sh | Shell script for automated test execution with backend detection and selection |
| tests/test_config.py | Test configuration constants (tolerances, random seed) |
| tests/structured_solvers_utils.py | Utility functions for structured solver test generation and validation |
| tests/component_integration/solvers/utils.py | Utility functions for component integration tests |
| tests/unit/solvers/structured_solvers/sequential/test_spmatrix_mapping.py | Unit tests for sequential structured solver matrix mapping |
| tests/unit/solvers/structured_solvers/distributed/test_spmatrix_mapping.py | Unit tests for distributed structured solver matrix mapping |
| tests/component_integration/solvers/structured_solvers/sequential/test_*.py | Component integration tests for sequential structured solvers |
| tests/component_integration/solvers/structured_solvers/distributed/test_*.py | Component integration tests for distributed structured solvers |
| tests/component_integration/solvers/sparse_solvers/sequential/test_*.py | Component integration tests for sparse solvers |
| tests/component_integration/solvers/dense_solvers/sequential/test_*.py | Component integration tests for dense solvers |
| src/dalia/solvers/sparse_solver.py | Fixed to use LU decomposition with proper solve and logdet implementations |
| src/dalia/solvers/distributed_structured_solver.py | Fixed RHS handling, gather operations, and multiple RHS support |
| src/dalia/solvers/structured_solver.py | Added symmetrize parameter to _structured_to_spmatrix |
| src/dalia/solvers/dense_solver.py | Renamed cholesky to factorize |
| src/dalia/core/solver.py | Updated abstract method from cholesky to factorize |
| src/dalia/utils/multiprocessing.py | Fixed allgather to properly return list of arrays instead of concatenating |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull Request Overview
Copilot reviewed 49 out of 49 changed files in this pull request and generated 8 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| @@ -763,16 +793,17 @@ def _structured_to_spmatrix( | |||
|
|
|||
| # TODO: Need to communicate to agregates/Map the local B matrix to all ranks | |||
There was a problem hiding this comment.
someone once told me i shouldnt have any TODOs left in my PRs.
There was a problem hiding this comment.
what are your thoughts on this "symmetrize" option? do you want it to be enabled by default? when is it relevant / not relevant?
There was a problem hiding this comment.
after the discussions of the last week, i wonder whether we should have a non-diagonally dominant test case?
There was a problem hiding this comment.
can't you test the solver by checking if L is lower triangular and L*L^T = A?
- Removed DiagDom caracteristic in dense solver testing - Minor modifications in the sparse solver implementation - Improved test for the scipy sparse solver
Added testing suite for sequential and distributed solvers. Tests are separated in two parts:
unit/andcomponent_integration/.Regarding #129 , the tests are passing for CPU/GPU arrays using MPI (not tested with
mpicudaaware and NCCL, need environment development from branchinstallation_environements), it might still be that the exact case that fail is not tested for and / or I have overlooked something.Closes #72 , as now the testing suite structure will be defined and about 300–500 tests are available. A lot of work remaining to test the entire framework → will be done incrementally
The PR: Mpi issues #124 has been merged in subsequent "AR1/gamma_prior" branches, so this is gonna be dealt with in rebasing to these branches. For this reason, I would suggest closing
Mpi issues #124