
Scores not bounded to 0 and 1 #58

@mgutierrezc

Description

Hi again! I was testing the reward model for a project and got confused when I found that the scores are not actually bounded to 0 and 1.

Not only are some of the scores negative, but some are also greater than 1.

(screenshots of the returned scores, showing negative values and values above 1)

This is the snippet I used to obtain the scores from ArmoRM. I assumed the scores would be bounded to that scale out of the box, but let me know if an additional transformation is needed!

import torch

def calculate_rewards(prompt, response, model, tokenizer):
    """
    Calculates reward using both the prompt and response
    
    ["helpsteer-helpfulness","helpsteer-correctness","helpsteer-coherence",
    "helpsteer-complexity","helpsteer-verbosity","ultrafeedback-overall_score",
    "ultrafeedback-instruction_following", "ultrafeedback-truthfulness",
    "ultrafeedback-honesty","ultrafeedback-helpfulness","beavertails-is_safe",
    "prometheus-score","argilla-overall_quality","argilla-judge_lm","code-complexity",
    "code-style","code-explanation","code-instruction-following","code-readability"]
    """

    # preparing inputs for tokenizer
    messages = [{"role": "user", "content": prompt},
                {"role": "assistant", "content": response}]
    device = model.device
    
    input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to(device)

    # multi-objective rewards for the response
    with torch.no_grad():
        output = model(input_ids)
        multi_obj_rewards = output.rewards.cpu().float()

    helpsteer_rewards_pred_norm = multi_obj_rewards[0, :5]
    ultrafeedback_rewards_pred_norm = multi_obj_rewards[0, 5:10]
    beavertails_rewards_pred_norm = multi_obj_rewards[0, 10:11]
    other_rewards_pred_norm = multi_obj_rewards[0, 11:]

    return {
            "helpsteer-helpfulness": helpsteer_rewards_pred_norm[0].item(),
            "helpsteer-correctness": helpsteer_rewards_pred_norm[1].item(),
            "helpsteer-coherence": helpsteer_rewards_pred_norm[2].item(),
            "helpsteer-complexity": helpsteer_rewards_pred_norm[3].item(),
            "helpsteer-verbosity": helpsteer_rewards_pred_norm[4].item(),
            "ultrafeedback-overall_score": ultrafeedback_rewards_pred_norm[0].item(),
            "ultrafeedback-instruction_following": ultrafeedback_rewards_pred_norm[1].item(),
            "ultrafeedback-truthfulness": ultrafeedback_rewards_pred_norm[2].item(),
            "ultrafeedback-honesty": ultrafeedback_rewards_pred_norm[3].item(),
            "ultrafeedback-helpfulness": ultrafeedback_rewards_pred_norm[4].item(),
            "beavertails-is_safe": beavertails_rewards_pred_norm[0].item(),
            "prometheus-score": other_rewards_pred_norm[0].item(),
            "argilla-overall_quality": other_rewards_pred_norm[1].item(),
            "argilla-judge_lm": other_rewards_pred_norm[2].item(),
            "code-complexity": other_rewards_pred_norm[3].item(),
            "code-style": other_rewards_pred_norm[4].item(),
            "code-explanation": other_rewards_pred_norm[5].item(),
            "code-instruction-following": other_rewards_pred_norm[6].item(),
            "code-readability": other_rewards_pred_norm[7].item()
        }
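In case it helps others: since the raw rewards come straight out of the model's regression head, one workaround I tried is rescaling them relative to a batch of candidate responses. This is my own min-max normalization sketch, not something ArmoRM provides, so the resulting [0, 1] scores are only comparable within the batch:

```python
import torch

def minmax_normalize(scores: torch.Tensor) -> torch.Tensor:
    """Rescale each objective's scores across a batch to [0, 1].

    Expects `scores` of shape (num_responses, num_objectives).
    Normalization is relative to this batch, not an absolute calibration.
    """
    lo = scores.min(dim=0, keepdim=True).values
    hi = scores.max(dim=0, keepdim=True).values
    # clamp the denominator to avoid division by zero when all
    # responses get an identical score on some objective
    return (scores - lo) / (hi - lo).clamp_min(1e-8)

# example: three responses scored on two objectives,
# including a negative score and one above 1
raw = torch.tensor([[0.3, -0.2],
                    [1.4,  0.5],
                    [0.9,  0.1]])
print(minmax_normalize(raw))  # every value now lies in [0, 1]
```

The obvious caveat is that a single response cannot be normalized this way; you need several scores per objective to define the range.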
