Initial implementation of SmartGradient #118

Open
vincent-maillou wants to merge 8 commits into dalia-project:add_gradient_methods from vincent-maillou:add_smart_gradient

Conversation

@vincent-maillou

@vincent-maillou vincent-maillou commented Sep 9, 2025

Collaborator

Co-authored-by: esmail-abdulfattah esmail.abdulfattah@kaust.edu.sa

Should close #103

@vincent-maillou vincent-maillou marked this pull request as ready for review September 11, 2025 13:16

Copilot AI left a comment

Pull Request Overview

This PR introduces an initial implementation of the SmartGradient method, an adaptive finite difference gradient estimation technique that improves upon vanilla gradient computation by adaptively updating the basis used for gradient estimation based on parameter changes.

  • Implements a new SmartGradient class that uses QR decomposition and adaptive basis updates
  • Adds a VanillaGradient class for standard finite difference gradient computation
  • Integrates both gradient methods into the main DALIA framework with configuration support
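As a rough illustration of the idea summarized above, the following sketch estimates a gradient with central differences along the columns of an orthonormal basis and then maps the result back to the original coordinates. The names `directional_gradient` and `update_basis`, and the basis-update rule (a QR factorization of a matrix of recent parameter differences), are assumptions made for this example and are not taken from the code in this PR.

```python
import numpy as np


def directional_gradient(f, theta, basis, eps=1e-5):
    """Central differences along the columns of an orthonormal `basis`."""
    k = theta.size
    g_basis = np.empty(k)
    for i in range(k):
        step = eps * basis[:, i]
        g_basis[i] = (f(theta + step) - f(theta - step)) / (2.0 * eps)
    # g_basis approximates basis.T @ grad; for an orthonormal basis,
    # the gradient in the original coordinates is basis @ g_basis.
    return basis @ g_basis


def update_basis(theta_diffs):
    """Orthonormalize recent parameter differences (hypothetical update rule)."""
    q, _ = np.linalg.qr(theta_diffs)
    return q


def f(x):
    """Toy objective with a known gradient: x + [cos(x0), 0, 0]."""
    return 0.5 * x @ x + np.sin(x[0])


theta = np.array([0.3, -1.2, 2.0])
rng = np.random.default_rng(0)
basis = update_basis(rng.normal(size=(3, 3)))  # stand-in for past theta differences
grad = directional_gradient(f, theta, basis)
exact = theta + np.array([np.cos(theta[0]), 0.0, 0.0])
```

With the identity as basis this reduces to the usual coordinate-wise central difference, which is presumably what the vanilla method computes.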

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| src/dalia/gradient_methods/smart_gradient.py | Core implementation of the SmartGradient algorithm with adaptive basis updates |
| src/dalia/gradient_methods/vanilla_gradient.py | Standard finite difference gradient computation implementation |
| src/dalia/core/gradient_method.py | Abstract base class defining the gradient method interface |
| src/dalia/gradient_methods/__init__.py | Module initialization exposing gradient method classes |
| src/dalia/configs/gradient_method_config.py | Configuration classes for gradient method settings |
| src/dalia/configs/dalia_config.py | Integration of gradient method config into main DALIA config |
| src/dalia/core/dalia.py | Integration of gradient methods into the main DALIA class |

Comment thread src/dalia/gradient_methods/smart_gradient.py
Comment thread src/dalia/gradient_methods/smart_gradient.py Outdated
Comment thread src/dalia/gradient_methods/smart_gradient.py
Comment thread src/dalia/configs/gradient_method_config.py Outdated

Copilot AI left a comment

Pull Request Overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.


self.curr_theta = current_theta
self.temp_basis = xp.roll(self.temp_basis, 1, axis=1)
xdiff = current_theta - self.prev_theta
xdiff += get_device(self.rng.normal(0.0, self.diagonal_noise, self.basis_size))

Copilot AI Sep 11, 2025

The get_device function is being applied to numpy random values, but self.rng is a numpy random generator. This will likely fail when using GPU backends since numpy arrays cannot be directly converted to GPU arrays. The noise should be generated using the appropriate backend (xp) instead of numpy.

Suggested change
xdiff += get_device(self.rng.normal(0.0, self.diagonal_noise, self.basis_size))
xdiff += xp.random.normal(0.0, self.diagonal_noise, self.basis_size)

Collaborator

Is there a reason why you are using get_device? Shouldn't this all be done on the GPU when using CUDA? Also, in my INLA_DIST code I'm using ThetaDiff = ThetaDiff + eps*MatrixXd::Identity(dim_th,dim_th); so not a random term, just a small known addition to the diagonal. No idea which is better.

@vincent-maillou vincent-maillou Oct 15, 2025

Collaborator Author

  • Regarding the random term instead of a scaled identity, I followed Esmail's implementation. The noise is still sampled from a known, parametrizable distribution.
  • I use the host numpy random number generator, numpy.random.default_rng, instead of a numpy/cupy generalization through xp because this interface doesn't exist in cupy (you have cupy.random.Generator, though).

Collaborator

OK. Nevertheless, I'm not sure I would add something to our implementation that makes it non-reproducible, since no random seed is set either.
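To make the two options in this thread concrete, here is a minimal sketch, assuming plain NumPy on the host. The `make_noise` helper and its `seed` parameter are hypothetical and are not part of this PR's configuration; the second part mirrors the deterministic diagonal shift quoted from the INLA_DIST code.

```python
import numpy as np


def make_noise(seed, scale, size):
    """Draw Gaussian noise from a seeded host generator (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    return rng.normal(0.0, scale, size)


# Option 1: random perturbation, reproducible because the seed is fixed.
noise_a = make_noise(seed=1234, scale=1e-3, size=4)
noise_b = make_noise(seed=1234, scale=1e-3, size=4)

# Option 2: deterministic alternative, a small known addition to the diagonal.
theta_diff = np.zeros((4, 4))
theta_diff_reg = theta_diff + 1e-3 * np.eye(4)
```

Either way the perturbation keeps the matrix of parameter differences from becoming rank deficient. Only the seeded or deterministic variants give identical results across runs.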

) / (2.0 * self.finite_difference_epsilon)

# Transform back the gradient into the original basis
gradient[:] = xp.linalg.solve(self.basis.T, gradient)

Copilot AI Sep 11, 2025

Using xp.linalg.solve with self.basis.T assumes that self.basis.T is square and invertible, but self.basis is the Q matrix from a QR decomposition, which is orthogonal. For an orthogonal matrix the inverse is simply the transpose, so the inverse of self.basis.T is self.basis itself. The solve can therefore be replaced by gradient[:] = self.basis @ gradient, which avoids a factorization and is cheaper.

Suggested change
gradient[:] = xp.linalg.solve(self.basis.T, gradient)
gradient[:] = self.basis @ gradient
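A quick numerical check of the orthogonality argument, using NumPy in place of xp and a random orthogonal matrix as a stand-in for self.basis (both are assumptions for this example). For an orthogonal Q, solving against Q.T gives the same result as multiplying by Q, not by Q.T.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for self.basis: the Q factor of a random square matrix.
q, _ = np.linalg.qr(rng.normal(size=(4, 4)))
g = rng.normal(size=4)

via_solve = np.linalg.solve(q.T, g)  # what the current code computes
via_matmul = q @ g                   # equivalent product, no factorization
via_transpose = q.T @ g              # a different vector in general
```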

Comment thread src/dalia/core/dalia.py
# --- Set up recurrent variables
self.gradient_f = xp.zeros(self.model.n_hyperparameters, dtype=xp.float64)
self.f_values_i = xp.zeros(self.n_f_evaluations, dtype=xp.float64)
self.gradient_basis = xp.eye(self.model.n_hyperparameters, dtype=xp.float64)

Copilot AI Sep 11, 2025

The self.gradient_basis variable is defined but never used in the updated code. This appears to be leftover from the previous implementation and should be removed.

Suggested change
self.gradient_basis = xp.eye(self.model.n_hyperparameters, dtype=xp.float64)

k = self.basis_size
eps = self.finite_difference_epsilon
deltas = eps * self.basis

direction_matrix[:, 0] = theta_dev
direction_matrix[:, 1:1+k] = deltas
direction_matrix[:, 1+k:1+2*k] = -deltas

for j in range(1, direction_matrix.shape[1]):
    direction_matrix[:, j] = self._transformed_fun(phi=direction_matrix[:, j])

Collaborator

I think Copilot might actually be right on this one. I didn't see it being used anywhere else either.

@esmail-abdulfattah

esmail-abdulfattah commented Oct 15, 2025 via email


Development

Successfully merging this pull request may close these issues.

Adding "SmartGradient" Approach for the Hyperparameter Gradient Computation

4 participants