
OOM #178

@Huster-Hq

Description

I tried running this code on 4× RTX 4090 GPUs (24 GB each). I have already set num_generations to 1, but I still hit an OOM error (GPU memory overflow). Is this expected?
Here is my launch script:

CUDA_VISIBLE_DEVICES="0,1,2,3" torchrun \
    --nproc_per_node="3" \
    --nnodes="1" \
    --node_rank="0" \
    --master_addr="127.0.0.1" \
    --master_port="12355" \
    src/open_r1/grpo.py \
    --use_vllm true \
    --output_dir ${OUTPUT_DIR} \
    --model_name_or_path ${QWEN_PATH} \
    --dataset_name ${HF_DATASET} \
    --max_prompt_length 4096 \
    --max_completion_length 2048 \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 4 \
    --learning_rate 1e-6 \
    --lr_scheduler_type "constant" \
    --logging_steps 1 \
    --bf16 true \
    --gradient_checkpointing true \
    --attn_implementation flash_attention_2 \
    --min_pixels 3136 \
    --max_pixels 501760 \
    --num_train_epochs 2 \
    --run_name ${RUN_NAME} \
    --save_steps 200 \
    --save_total_limit 3 \
    --save_only_model true \
    --report_to wandb \
    --temperature 1.0 \
    --num_generations 1 \
    --vllm_device "cuda:3" \
    --vllm_gpu_memory_utilization 0.8 \
    --deepspeed ${DS_CONFIG} \
    2>&1 | tee "${OUTPUT_DIR}/training_log.txt"
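
For context, the contents of ${DS_CONFIG} are not shown above, and they strongly affect memory usage on 24 GB cards. Below is only a minimal sketch of what a ZeRO stage-3 DeepSpeed config with CPU offloading might look like; the file name (ds_zero3_offload.json) and all values are illustrative placeholders, not the actual ${DS_CONFIG} used above.

# Illustrative sketch only: writes a minimal ZeRO stage-3 DeepSpeed config
# with optimizer/parameter offload to CPU. File name and values are
# placeholders, not the actual ${DS_CONFIG} passed to the command above.
cat > ds_zero3_offload.json <<'EOF'
{
  "bf16": { "enabled": true },
  "zero_optimization": {
    "stage": 3,
    "offload_optimizer": { "device": "cpu", "pin_memory": true },
    "offload_param": { "device": "cpu", "pin_memory": true },
    "overlap_comm": true,
    "contiguous_gradients": true
  },
  "gradient_accumulation_steps": "auto",
  "gradient_clipping": "auto",
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto"
}
EOF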
