Initial implementation of SmartGradient#118
Conversation
Merging to main the cleaned up changes from the SC25 improvements
Co-authored-by: esmail-abdulfattah <esmail.abdulfattah@kaust.edu.sa>
however smart_gradient needs to be debugged
There was a problem hiding this comment.
Pull Request Overview
This PR introduces an initial implementation of the SmartGradient method, an adaptive finite difference gradient estimation technique that improves upon vanilla gradient computation by adaptively updating the basis used for gradient estimation based on parameter changes.
- Implements a new
SmartGradientclass that uses QR decomposition and adaptive basis updates - Adds a
VanillaGradientclass for standard finite difference gradient computation - Integrates both gradient methods into the main DALIA framework with configuration support
Reviewed Changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
src/dalia/gradient_methods/smart_gradient.py |
Core implementation of the SmartGradient algorithm with adaptive basis updates |
src/dalia/gradient_methods/vanilla_gradient.py |
Standard finite difference gradient computation implementation |
src/dalia/core/gradient_method.py |
Abstract base class defining the gradient method interface |
src/dalia/gradient_methods/__init__.py |
Module initialization exposing gradient method classes |
src/dalia/configs/gradient_method_config.py |
Configuration classes for gradient method settings |
src/dalia/configs/dalia_config.py |
Integration of gradient method config into main DALIA config |
src/dalia/core/dalia.py |
Integration of gradient methods into the main DALIA class |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
There was a problem hiding this comment.
Pull Request Overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
| self.curr_theta = current_theta | ||
| self.temp_basis = xp.roll(self.temp_basis, 1, axis=1) | ||
| xdiff = current_theta - self.prev_theta | ||
| xdiff += get_device(self.rng.normal(0.0, self.diagonal_noise, self.basis_size)) |
There was a problem hiding this comment.
The get_device function is being applied to numpy random values, but self.rng is a numpy random generator. This will likely fail when using GPU backends since numpy arrays cannot be directly converted to GPU arrays. The noise should be generated using the appropriate backend (xp) instead of numpy.
| xdiff += get_device(self.rng.normal(0.0, self.diagonal_noise, self.basis_size)) | |
| xdiff += xp.random.normal(0.0, self.diagonal_noise, self.basis_size) |
There was a problem hiding this comment.
is there a reason why you are using get_device? shouldn't this all be done on GPU when using cuda? also, in my INLA_DIST code im using ThetaDiff = ThetaDiff + eps*MatrixXd::Identity(dim_th,dim_th); so not a random term, but just a small known addition to the diagonal. no idea what is better.
There was a problem hiding this comment.
- Regarding the random term instead of a scaled identity, I followed Esmail implementation. The noise is still sampled from a know, and parametrizable distribution.
- I use the host
numpyrandom number generator:numpy.random.default_rnginstead of a generalizationnumpy/cupythoughxpbecause this interface doesn't exist incupy(you havecupy.random.Generatortho).
There was a problem hiding this comment.
ok, nevertheless I don't know if i would do something to our implementation that will make it non-reproduceable as there is also no random seed set.
| ) / (2.0 * self.finite_difference_epsilon) | ||
|
|
||
| # Transform back the gradient into the original basis | ||
| gradient[:] = xp.linalg.solve(self.basis.T, gradient) |
There was a problem hiding this comment.
Using xp.linalg.solve with self.basis.T assumes that self.basis.T is square and invertible, but self.basis is the Q matrix from QR decomposition which is orthogonal. For orthogonal matrices, the inverse is simply the transpose, so this should be gradient[:] = self.basis.T @ gradient for better numerical stability and performance.
| gradient[:] = xp.linalg.solve(self.basis.T, gradient) | |
| gradient[:] = self.basis.T @ gradient |
| # --- Set up recurrent variables | ||
| self.gradient_f = xp.zeros(self.model.n_hyperparameters, dtype=xp.float64) | ||
| self.f_values_i = xp.zeros(self.n_f_evaluations, dtype=xp.float64) | ||
| self.gradient_basis = xp.eye(self.model.n_hyperparameters, dtype=xp.float64) |
There was a problem hiding this comment.
The self.gradient_basis variable is defined but never used in the updated code. This appears to be leftover from the previous implementation and should be removed.
| self.gradient_basis = xp.eye(self.model.n_hyperparameters, dtype=xp.float64) |
There was a problem hiding this comment.
k = self.basis_size
eps = self.finite_difference_epsilon
deltas = eps * self.basis
direction_matrix[:, 0] = theta_dev
direction_matrix[:, 1:1+k] = deltas
direction_matrix[:, 1+k:1+2*k] = -deltas
for j in range(1, direction_matrix.shape[1]):
direction_matrix[:, j] = self._transformed_fun(phi=direction_matrix[:, j])
There was a problem hiding this comment.
I think copilot might actually be right on this one. I didn't see it being used somewhere else either.
|
This is how I did it in the smartGrad package and the c++ implementation.
Best regards,
Esmail
…On Wed, Oct 15, 2025 at 12:07 PM Vincent MAILLOU ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In src/dalia/gradient_methods/smart_gradient.py
<https://urldefense.com/v3/__https://github.com/dalia-project/DALIA/pull/118*discussion_r2431751242__;Iw!!Nmw4Hv0!wB1sCbC6G8CPPuTCDKk-H58wKYeZnkRZlzDonYeOQvs0fypFIw41ZQCyNNS6qnPDYkk6mQUuD_wl3m2RQcAAMUXXumSdhiIVfnLO$>
:
> + def _update_basis(self, current_theta) -> None:
+ """Update the basis using the current theta.
+
+ Parameters
+ ----------
+ current_theta : xp.ndarray
+ The current (hyper)parameter values.
+
+ Returns
+ -------
+ None
+ """
+ self.curr_theta = current_theta
+ self.temp_basis = xp.roll(self.temp_basis, 1, axis=1)
+ xdiff = current_theta - self.prev_theta
+ xdiff += get_device(self.rng.normal(0.0, self.diagonal_noise, self.basis_size))
- Regarding the random term instead of a scaled identity, I followed
Esmail implementation. The noise is still sampled from a know, and
parametrizable distribution.
- I use the host numpy random number generator:
numpy.random.default_rng instead of a generalization numpy/cupy though
xp because this interface doesn't exist in cupy (you have
cupy.random.Generator).
—
Reply to this email directly, view it on GitHub
<https://urldefense.com/v3/__https://github.com/dalia-project/DALIA/pull/118*discussion_r2431751242__;Iw!!Nmw4Hv0!wB1sCbC6G8CPPuTCDKk-H58wKYeZnkRZlzDonYeOQvs0fypFIw41ZQCyNNS6qnPDYkk6mQUuD_wl3m2RQcAAMUXXumSdhiIVfnLO$>,
or unsubscribe
<https://urldefense.com/v3/__https://github.com/notifications/unsubscribe-auth/AOBONKQ4GMMPWPYFJCECDYD3XYFDTAVCNFSM6AAAAACGBFRAGGVHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMZTGMZZGI2DIMRQGI__;!!Nmw4Hv0!wB1sCbC6G8CPPuTCDKk-H58wKYeZnkRZlzDonYeOQvs0fypFIw41ZQCyNNS6qnPDYkk6mQUuD_wl3m2RQcAAMUXXumSdhnqggGMg$>
.
You are receiving this because you commented.Message ID:
***@***.***>
--
This message and its contents, including attachments are intended solely
for the original recipient. If you are not the intended recipient or have
received this message in error, please notify me immediately and delete
this message from your computer system. Any unauthorized use or
distribution is prohibited. Please consider the environment before printing
this email.
|
Co-authored-by: esmail-abdulfattah esmail.abdulfattah@kaust.edu.sa
Should close #103