Discount Factor Possibly Applied Twice #13

@Silent-Zebra

Thanks to @cool-RR for pointing this out.

get_gae_advantages already applies the discount factor, and its output is later multiplied by cum_discount, which is another discount factor. Thus, the discount factor is counted twice. I may have been misled by the bottom of page 5 of the Loaded DiCE paper (Loaded DiCE: Trading off Bias and Variance in Any-Order Score Function Estimators for Reinforcement Learning), where the formula includes both a cumulative discount and an advantage. I took the advantage to be the GAE, which already includes discounting, and so didn't follow the common practice of omitting the cumulative discount.

The implication is that my discount factor may be applied twice relative to common practice, so for a given discount factor (e.g. 0.96) there may be more discounting going on than you would otherwise expect. I don't expect this to materially change the overall results, though it might affect learning dynamics, and it could be confusing or inconsistent when comparing discount factors with other codebases.
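
To make the effect concrete, here is a minimal, self-contained sketch in plain Python. It is not the repository's code: gae_advantages and apply_cum_discount are hypothetical stand-ins for get_gae_advantages and the cum_discount multiplication, and the constant rewards, zero value estimates, and lambda = 1 are assumptions chosen only to keep the arithmetic easy to follow.

```python
def gae_advantages(rewards, values, gamma, lam):
    """Generalized advantage estimates (hypothetical stand-in, not the repo's function).

    `values` must have len(rewards) + 1 entries; the last one is the bootstrap value.
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        # One-step TD error; gamma is applied here the first time.
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


def apply_cum_discount(advantages, gamma):
    """Multiply the advantage at step t by gamma**t (the extra cumulative discount)."""
    return [gamma ** t * a for t, a in enumerate(advantages)]


gamma = 0.96
rewards = [1.0] * 5   # assumed toy rewards
values = [0.0] * 6    # assumed zero value estimates, including the bootstrap value

adv = gae_advantages(rewards, values, gamma, lam=1.0)
weighted = apply_cum_discount(adv, gamma)

for t, (a, w) in enumerate(zip(adv, weighted)):
    print(f"t={t}  advantage={a:.4f}  with cum_discount={w:.4f}")
```

With these assumptions, each advantage is already a discounted return-to-go, and the extra gamma**t factor shrinks later time steps further. Dropping the apply_cum_discount step would correspond to the common practice mentioned above.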

I'm leaving things as they are, even though the fix is very quick (e.g. just remove cum_discount in https://github.com/Silent-Zebra/POLA/blob/master/jax_files/POLA_dice_jax.py#L85), because I don't have time to rerun the experiments now. I also don't expect the results to change materially, and even if they do, I could likely use a different discount factor to get similar results.
