Memory issue due to A and B matrix computation

Hi,
Thanks for providing the Mamba implementation. Is there a workaround that computes `deltaA` and `deltaB_u` more efficiently, so that the GPU does not run out of memory? These are the parameters I used to create the Mamba instance:
```
d_model: 1024
n_layer: 4
d_state: 1024
expand: 2
```
The other parameters are set to their default values. 

This results in a model of ~60M parameters. However, I run out of memory (the GPU has 24 GB) when I train with a batch size of 256, or even one as low as 64. This is probably due to the large matrix computations for `deltaA` and `deltaB_u`.
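
To make the scale concrete, here is a rough back-of-the-envelope sketch of how large each of these tensors could get. It assumes that `deltaA` and `deltaB_u` are each materialized at full shape `(batch, d_inner, seq_len, d_state)` in fp32, with `d_inner = expand * d_model`, which is what a non-fused, reference-style scan would do. The helper name `scan_tensor_bytes` and the sequence length of 128 are illustrative assumptions and are not taken from my actual setup.

```python
def scan_tensor_bytes(batch, seq_len, d_model, d_state, expand=2, bytes_per_elem=4):
    """Bytes needed for ONE tensor of shape (batch, d_inner, seq_len, d_state).

    Assumption: the tensor is fully materialized (no fused kernel), and
    d_inner = expand * d_model. bytes_per_elem=4 corresponds to fp32.
    """
    d_inner = expand * d_model
    return batch * d_inner * seq_len * d_state * bytes_per_elem


if __name__ == "__main__":
    # seq_len=128 is a hypothetical value chosen only for illustration.
    for batch in (64, 256):
        n_bytes = scan_tensor_bytes(batch, seq_len=128, d_model=1024, d_state=1024)
        print(f"batch={batch}: {n_bytes / 2**30:.0f} GiB per tensor, per layer")
```

Under these assumptions a single token of a single sequence already costs 8 MiB per tensor (2048 x 1024 fp32 values), so even a batch size of 64 at the assumed sequence length would need 64 GiB for `deltaA` alone, well beyond 24 GB.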



Memory issue due to A and B matrix computation #25
