OOM on 24GB GPU (RTX 4090) when loading DeepSeek model - suggestion to support BERT or add memory optimization #2

Describe the issue

I'm trying to run the multivariate anomaly detection experiment on the Weather dataset, but I hit a CUDA out-of-memory (OOM) error on a 24 GB GPU. The model always seems to load the full DeepSeek-Qwen2 architecture, even after I change DEEPSEEK_PATH to BERT, and this exceeds the available memory.

Environment

GPU: RTX 4090D (24GB)

CUDA: 12.4

PyTorch: 2.4.1+cu121

Python: 3.10

OS: Ubuntu 22.04
Steps to reproduce

Run the multivariate anomaly detection experiment on the Weather dataset.

Error message (relevant part)

torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 4.38 GiB.
GPU 0 has a total capacity of 23.52 GiB of which 2.83 GiB is free.
... (loading Qwen2ForCausalLM)

What I tried

Changed DEEPSEEK_PATH from deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B to bert-base-uncased in ts_benchmark/baselines/utils.py

Set PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True

Cleared previous results
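For reference, PYTORCH_CUDA_ALLOC_CONF only takes effect if it is set before PyTorch initializes its CUDA caching allocator, so the simplest safe pattern is to set it before `import torch` (or on the command line, e.g. `PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True python ...`). A minimal sketch, assuming the experiment is launched from a Python entry point you control:

```python
import os

# Must run before torch initializes CUDA; doing it before `import torch`
# guarantees that. setdefault() keeps any value already set in the shell.
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "expandable_segments:True")

# import torch  # import torch only after the variable is set
```

Note that expandable segments mainly reduce fragmentation; they cannot help when the model and activations genuinely need more memory than the GPU has, which matches the 20+ GB usage reported here.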

Observation

The tokenizer is downloaded from bert-base-uncased as expected, but the model still loads the Qwen2ForCausalLM architecture (the DeepSeek stack) and consumes more than 20 GB of VRAM before running out of memory.
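Since the tokenizer follows DEEPSEEK_PATH but the model class apparently does not, one direction worth checking is to derive the backbone from the checkpoint's own config instead of hard-coding Qwen2ForCausalLM. The sketch below is hypothetical: `resolve_backbone` and the "encoder"/"decoder" labels are not from the repository, and it takes a plain dict so it runs without downloading anything. In transformers, `AutoConfig.from_pretrained(path)` exposes the same `model_type` and `hidden_size` fields.

```python
def resolve_backbone(config: dict) -> tuple[str, int]:
    """Return (backbone kind, hidden size) for a Hugging Face-style config dict.

    The kinds "encoder"/"decoder" are hypothetical labels for this sketch.
    """
    model_type = config.get("model_type")
    if model_type == "bert":
        kind = "encoder"   # e.g. load with transformers.AutoModel
    elif model_type == "qwen2":
        kind = "decoder"   # e.g. load with transformers.AutoModelForCausalLM
    else:
        raise ValueError(f"unsupported backbone model_type: {model_type!r}")
    # Downstream projection layers should be sized from the backbone's
    # hidden size rather than hard-coded for the DeepSeek model.
    return kind, config["hidden_size"]


# A real loader would pass AutoConfig.from_pretrained(DEEPSEEK_PATH).to_dict().
print(resolve_backbone({"model_type": "bert", "hidden_size": 768}))
```

The hidden size matters as much as the class: bert-base-uncased uses 768, so any layers built for the DeepSeek backbone's width would also need to follow `hidden_size`.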

Questions

Is there a way to run the model using the lightweight BERT architecture instead of DeepSeek?

If DeepSeek is required, could you provide guidance on memory optimization (e.g., a smaller batch_size or seq_len, gradient checkpointing, or mixed precision) to fit into 24 GB of VRAM?

Are there any configuration flags or scripts specifically designed for consumer GPUs (24GB)?
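A back-of-the-envelope estimate helps separate model-state memory from activation memory. The sketch below assumes the nominal 1.5B parameters implied by the model name (the exact count may differ), about 110M parameters for bert-base-uncased, and common bytes-per-parameter rules of thumb. `attention_scores_gib` assumes eager attention that materializes the full score matrix for one layer, and the 12 heads used in the example are a hypothetical value. Whether the backbone is fine-tuned or kept frozen is not stated in this issue, so both cases are shown.

```python
GIB = 1024 ** 3

# Bytes per parameter (rules of thumb):
#   fp32_inference: fp32 weights only
#   bf16_inference: bf16 weights only
#   fp32_adam:      fp32 weights + fp32 grads + two fp32 Adam moments
BYTES_PER_PARAM = {
    "fp32_inference": 4,
    "bf16_inference": 2,
    "fp32_adam": 4 + 4 + 8,
}


def model_state_gib(num_params: float, setup: str) -> float:
    """Approximate memory (GiB) for weights/grads/optimizer state."""
    return num_params * BYTES_PER_PARAM[setup] / GIB


def attention_scores_gib(batch_size: int, num_heads: int, seq_len: int,
                         bytes_per_elem: int = 2) -> float:
    """Size of one layer's (batch, heads, seq, seq) attention-score tensor."""
    return batch_size * num_heads * seq_len ** 2 * bytes_per_elem / GIB


if __name__ == "__main__":
    backbones = {"DeepSeek-R1-Distill-Qwen-1.5B": 1.5e9, "bert-base-uncased": 110e6}
    for name, n in backbones.items():
        for setup in BYTES_PER_PARAM:
            print(f"{name:30s} {setup:15s} {model_state_gib(n, setup):6.2f} GiB")
    # Doubling seq_len quadruples the score tensor; halving batch_size halves it.
    for batch_size, seq_len in [(32, 512), (32, 1024), (16, 1024)]:
        gib = attention_scores_gib(batch_size, 12, seq_len)  # 12 heads: hypothetical
        print(batch_size, seq_len, f"{gib:.4f} GiB")
```

Under fp32 weights, gradients and Adam states, the 1.5B backbone alone would need about 22.4 GiB, close to the 23.52 GiB total capacity in the error. A frozen bf16 copy needs roughly 2.8 GiB, which leaves activations (driven by batch_size and seq_len) as the main cost. For bert-base-uncased, even the fp32 + Adam case is only about 1.6 GiB.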

Additional context

Table 3 of your paper shows that BERT achieves competitive performance, so supporting BERT as a lightweight backbone would greatly benefit users without A100/H800 GPUs.

Thank you for your great work!
