
About training hyperparameters #60

@weiminw


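Hi, I'm fine-tuning (with LoRA) a single-label reward model on the HelpSteer2 data. The label is a weighted blend, 0.6 × helpfulness + 0.4 × correctness, and I'm using lora_rank = 64, alpha = 128, and learning_rate = 1e-5. The trained model performs poorly: its R² (r2) stays at around 0.37. Could you advise how this reward model should be trained?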
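To make the setup concrete, here is a minimal Python sketch of how the blended target and the R² metric could be computed. It is an illustration, not code from the issue or from any HelpSteer2 tooling. It assumes the attribute scores are the raw HelpSteer2 ratings (0 to 4) and that "r2" means the standard coefficient of determination, 1 − SS_res / SS_tot. The function names `combined_label` and `r_squared` are hypothetical.

```python
from typing import Sequence

# Weights taken from the issue: 0.6 * helpfulness + 0.4 * correctness.
# Assumption: inputs are raw HelpSteer2 attribute ratings (0 to 4 scale).
HELPFULNESS_WEIGHT = 0.6
CORRECTNESS_WEIGHT = 0.4


def combined_label(helpfulness: float, correctness: float) -> float:
    """Blend two attribute scores into a single regression target (hypothetical helper)."""
    return HELPFULNESS_WEIGHT * helpfulness + CORRECTNESS_WEIGHT * correctness


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean = sum(y_true) / len(y_true)
    # Total variance of the targets around their mean.
    ss_tot = sum((y - mean) ** 2 for y in y_true)
    # Squared error of the model's predicted rewards.
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    if ss_tot == 0:
        raise ValueError("y_true has zero variance; R^2 is undefined")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    targets = [combined_label(h, c) for h, c in [(4, 3), (2, 2), (1, 0)]]
    print(targets)
    # A model that always predicts the mean target scores R^2 = 0.
    mean_pred = [sum(targets) / len(targets)] * len(targets)
    print(r_squared(targets, mean_pred))
```

For reference, an R² of 0.37 means the predicted rewards explain roughly 37% of the variance of the blended target, whereas always predicting the mean would give 0.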
