[Example] Backward function of RGMS kernel #77

The forward function of the RGMS kernel is (relation-related information is ignored for simplicity):

$$ Y = AXW $$

We already have an implementation of it written in SparseTIR using composable formats and tensor cores.

The backward function of the RGMS kernel needs to compute the gradients of both $X$ and $W$, using the gradient of the intermediate product $XW$:

$$\nabla (XW) = A^T \nabla Y$$

$$\nabla X = \nabla (XW) W^T $$

$$\nabla W = X^T \nabla (XW) $$

All three formulas can be computed inside the same kernel, with the intermediate $\nabla (XW)$ stored in shared memory. The same optimizations (composable formats + tensorization) can be applied to the backward kernel as well.
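
As a sanity check on the three formulas, here is a minimal dense reference sketch in pure Python. It is not SparseTIR code and uses no composable formats or tensor cores. The helper names (`matmul`, `transpose`, `backward`, `numeric_grad`), the matrix shapes, and the scalar loss $\sum_{ij} Y_{ij} (\nabla Y)_{ij}$ are all assumptions chosen for illustration. The sketch compares the analytic gradients against central finite differences.

```python
import random

def matmul(P, Q):
    """Dense product of two list-of-lists matrices."""
    cols = list(zip(*Q))
    return [[sum(p * q for p, q in zip(row, col)) for col in cols] for row in P]

def transpose(P):
    return [list(col) for col in zip(*P)]

def backward(A, X, W, grad_Y):
    """Gradients of Y = A X W with respect to X and W, given grad_Y."""
    grad_XW = matmul(transpose(A), grad_Y)   # grad(XW) = A^T grad(Y)
    grad_X = matmul(grad_XW, transpose(W))   # grad(X)  = grad(XW) W^T
    grad_W = matmul(transpose(X), grad_XW)   # grad(W)  = X^T grad(XW)
    return grad_X, grad_W

def loss(A, X, W, grad_Y):
    """Scalar sum(Y * grad_Y); its gradient with respect to Y is grad_Y."""
    Y = matmul(matmul(A, X), W)
    return sum(y * g for ry, rg in zip(Y, grad_Y) for y, g in zip(ry, rg))

def numeric_grad(f, M, eps=1e-6):
    """Central finite differences of f() over the entries of M (restored after)."""
    G = [[0.0] * len(M[0]) for _ in M]
    for i in range(len(M)):
        for j in range(len(M[0])):
            old = M[i][j]
            M[i][j] = old + eps
            hi = f()
            M[i][j] = old - eps
            lo = f()
            M[i][j] = old
            G[i][j] = (hi - lo) / (2 * eps)
    return G

def max_abs_diff(P, Q):
    return max(abs(p - q) for rp, rq in zip(P, Q) for p, q in zip(rp, rq))

rng = random.Random(0)

def rand_matrix(m, n, density=1.0):
    """Random m x n matrix; entries are zeroed with probability 1 - density."""
    return [[rng.uniform(-1, 1) if rng.random() < density else 0.0
             for _ in range(n)] for _ in range(m)]

# Illustrative shapes: A is (4, 5) and mostly zero, X is (5, 3), W is (3, 2).
A = rand_matrix(4, 5, density=0.4)
X = rand_matrix(5, 3)
W = rand_matrix(3, 2)
grad_Y = rand_matrix(4, 2)

grad_X, grad_W = backward(A, X, W, grad_Y)
f = lambda: loss(A, X, W, grad_Y)
err_X = max_abs_diff(grad_X, numeric_grad(f, X))
err_W = max_abs_diff(grad_W, numeric_grad(f, W))
print(err_X < 1e-6, err_W < 1e-6)
```

Note that `grad_XW` is computed once and reused for both `grad_X` and `grad_W`, which mirrors the reason for keeping $\nabla (XW)$ in shared memory in a fused kernel.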
