Problem about reward and loss in online training #9

@Enjia

Hi,
Thank you for your dedicated work on PCC-Uspace.
When I followed the instructions in Deep_Learning_Readme.md, I found that both Reward and Ewma Reward were extremely high, as in the output below:
Reward: 1360096.79, Ewma Reward: 21968834.42
Reward: 1013840.44, Ewma Reward: 21759284.48
Reward: 425067.66, Ewma Reward: 21545942.31
Reward: 327400.01, Ewma Reward: 21333756.89
Reward: 154455.32, Ewma Reward: 21121963.88
Reward: 115554.43, Ewma Reward: 20911899.78
Reward: 140730.04, Ewma Reward: 20704188.08
Reward: 112697.73, Ewma Reward: 20498273.18
Reward: 107894.34, Ewma Reward: 20294369.39
...
Worse still, the values of loss_vf_loss were also unexpectedly large; one of them reached 4512207000000.0 (about 4.5e12).
Have you ever run into this problem, and could you tell me the possible reason behind it? Thanks!
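
For anyone comparing against their own runs, here is a quick sanity check on the numbers above. It is only a sketch and is not taken from the PCC-Uspace code. It assumes the Ewma Reward is updated as `decay * previous + (1 - decay) * reward`, with `decay = 0.99` inferred from the logged values rather than read from the source, and that loss_vf_loss is a mean squared error on the value target. The names `ewma_step`, `max_replay_error`, and `implied_value_error` are hypothetical helpers for illustration.

```python
import math

# (Reward, Ewma Reward) pairs copied from the log in this issue.
LOG = [
    (1360096.79, 21968834.42),
    (1013840.44, 21759284.48),
    (425067.66, 21545942.31),
    (327400.01, 21333756.89),
    (154455.32, 21121963.88),
    (115554.43, 20911899.78),
    (140730.04, 20704188.08),
    (112697.73, 20498273.18),
    (107894.34, 20294369.39),
]

# Assumption: inferred from the logged numbers, not read from the training code.
DECAY = 0.99


def ewma_step(prev_ewma, reward, decay=DECAY):
    """One exponentially weighted moving average update."""
    return decay * prev_ewma + (1.0 - decay) * reward


def max_replay_error(log, decay=DECAY):
    """Replay each update from the previous logged EWMA and return the
    largest absolute difference from the EWMA that was actually logged."""
    worst = 0.0
    for (_, prev_ewma), (reward, logged_ewma) in zip(log, log[1:]):
        predicted = ewma_step(prev_ewma, reward, decay)
        worst = max(worst, abs(predicted - logged_ewma))
    return worst


def implied_value_error(vf_loss):
    """Assumption: vf_loss is a mean squared error, so its square root is a
    typical value-prediction error in the same units as the reward."""
    return math.sqrt(vf_loss)


if __name__ == "__main__":
    print("max EWMA replay error:", max_replay_error(LOG))
    print("implied value error:", implied_value_error(4512207000000.0))
```

Under these assumptions the logged Ewma Reward is reproduced to within rounding, so the large Ewma value may simply be a slow average still decaying from much larger earlier rewards. Likewise, a squared-error loss of about 4.5e12 corresponds to value errors on the order of 2e6, which is the same scale as the raw rewards. That would point to unscaled rewards rather than a separate bug in the loss, though this is a guess until checked against the code.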
